# Lab 10

Using a dataset, for example one from https://archive.ics.uci.edu/ml/datasets.php?format=&task=&att=&area=&numAtt=&numIns=&type=text&sort=taskDown&view=table, solve a classification or regression problem that starts from text inputs.

You may choose a BOW, n-gram, word2vec, or other suitable encoding. The prediction models can come from the scikit-learn library. For preprocessing you can use the [NLTK](https://www.nltk.org) library, among others.
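The encodings named above can be sketched on a tiny corpus. This is only an illustration under stated assumptions: the three sentences are hypothetical, and `CountVectorizer` / `TfidfVectorizer` from scikit-learn are one possible choice among several.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, used only for illustration
corpus = ["the mic is great", "the case is not great", "great value"]

# Bag of words: one column per distinct word, values are raw counts
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)

# Unigrams and bigrams: ngram_range=(1, 2) also adds a column for each pair of adjacent words
ngrams = CountVectorizer(ngram_range=(1, 2))
X_ngrams = ngrams.fit_transform(corpus)

# TF-IDF: counts reweighted so that words present in many documents weigh less
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)

print("BOW vocabulary:", sorted(bow.vocabulary_))
print("BOW shape:", X_bow.shape, "| n-gram shape:", X_ngrams.shape)
```

Bigrams such as "not great" keep a little word order that a plain bag of words loses, at the cost of a larger feature space.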

For classification, optimize the F1 score and report F1 and accuracy. For regression, optimize the mean squared error (MSE) and report MSE, mean absolute error (MAE), and R².
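As a minimal sketch of the metrics listed above, the snippet below computes them with `sklearn.metrics`. The label and prediction arrays are hypothetical values chosen only to show the calls.

```python
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# Hypothetical labels and predictions for a binary classification task
y_true_cls = [1, 0, 1, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1]
f1 = f1_score(y_true_cls, y_pred_cls)
acc = accuracy_score(y_true_cls, y_pred_cls)

# Hypothetical targets and predictions for a regression task
y_true_reg = [2.0, 0.5, 1.0, 3.0]
y_pred_reg = [1.5, 0.5, 2.0, 3.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)
mae = mean_absolute_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)

print(f"F1={f1:.3f} accuracy={acc:.3f}")
print(f"MSE={mse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```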

Examples:
1. [SMS classification](https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection)
1. [Sentence Classification Data Set](https://archive.ics.uci.edu/ml/datasets/Sentence+Classification#)
1. [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences)
1. [Victorian Era Authorship Attribution Data Set](https://archive.ics.uci.edu/ml/datasets/Victorian+Era+Authorship+Attribution)
1. [Amazon Commerce reviews set Data Set](https://archive.ics.uci.edu/ml/datasets/Amazon+Commerce+reviews+set)
1. [Farm Ads Data Set](https://archive.ics.uci.edu/ml/datasets/Farm+Ads)
1. etc...


Investigate at least 2 datasets and, for each of them, at least 4 classification or regression models. If the dataset is already split into train and test, use it as is and further split the training set into train + validation; otherwise, use k-fold CV with k=5. The optimal hyperparameter values will be chosen through random search and grid search.
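The two data-splitting cases described above could look roughly like this. The toy arrays, the 80/20 ratio, and the use of stratification are assumptions made for the sketch, not requirements of the assignment.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical toy data: 20 documents (represented by their index) with balanced binary labels
X = np.arange(20).reshape(-1, 1)
y = np.array([0, 1] * 10)

# Case 1: a train/test split is already given, so only the training part is
# split further into train + validation (80% / 20% here, an arbitrary choice)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Case 2: no predefined split, so use 5-fold cross-validation
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [(len(train_idx), len(val_idx)) for train_idx, val_idx in kfold.split(X, y)]

print("train/validation sizes:", len(X_train), len(X_val))
print("fold sizes:", fold_sizes)
```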

For each dataset:
1. Describe the dataset in Romanian (content, origin, problem, etc.).
1. Perform exploratory analysis using Python code: the distribution of the classes or of the continuous output values, both numerically and graphically, and statistics on the texts (for example: minimum/mean/maximum length; the k most frequent words; clustering, etc.). Explain each step and what it is meant to show. Plots must have labeled axes (what is represented and, where applicable, the unit of measurement).
1. Preprocess the data; explain in Romanian which preprocessing methods are used, their effect on the input data, and the form of the resulting output; show the effects of the preprocessing steps on the dataset (new number of documents, vocabulary dynamics, resulting features, etc.). Plots and tables may be added at this step.
1. Classification or regression, as appropriate: describe the models considered, in Romanian; describe how the hyperparameters are searched; present the results in tabular form, as in the previous assignment.
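For the preprocessing step in the list above, one way to show the "vocabulary dynamics" is to record the vocabulary size after each preprocessing variant. The corpus and the three variants below are hypothetical and serve only as a sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only for illustration
docs = [
    "The phone is great and the battery is great",
    "The battery is bad",
    "Great food and great service",
    "The service is bad and the food is cold",
]

# Each entry is a preprocessing variant; the vocabulary size is recorded for each one
variants = {
    "raw": CountVectorizer(),
    "stop words removed": CountVectorizer(stop_words="english"),
    "stop words removed, min_df=2": CountVectorizer(stop_words="english", min_df=2),
}
vocab_sizes = {name: len(v.fit(docs).vocabulary_) for name, v in variants.items()}

for name, size in vocab_sizes.items():
    print(f"{name}: {size} distinct terms")
```

The same dictionary of sizes can be fed to a bar chart or a table in the report.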

The descriptions of the models and of the preprocessing steps may be placed in separate sections, with references to them where needed. The part specific to applying the steps to the data under consideration must be presented in the order in which the steps are applied.

Examples:
1. [Working With Text Data](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)
1. [Text Classification with Python and Scikit-Learn](https://stackabuse.com/text-classification-with-python-and-scikit-learn/)
1. [How to Prepare Text Data for Machine Learning with scikit-learn](https://machinelearningmastery.com/prepare-text-data-machine-learning-scikit-learn/)

The assignment will be presented during the week of May 20-24.

# 1
- Content: sentences labeled with positive or negative sentiment.
- Origin:

    - This dataset was created for the paper "From Group to Individual Labels using Deep Features", Kotzias et al., KDD 2015.
    
- Details:
    - The score is 1 (positive) or 0 (negative).
    
    - The examples come from three different websites / domains:
        - imdb.com
        - amazon.com
        - yelp.com 
    - For each website, there are 500 positive and 500 negative sentences. These were selected at random from larger datasets of reviews.
    - The dataset authors tried to select sentences with a clearly positive or negative connotation, the goal being that no neutral sentences are selected.
- Attributes:
    - The attributes are text sentences extracted from reviews of products, movies, and restaurants.
    

In [1]:
from sklearn.datasets import load_files
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

In [2]:
path=r"D:\ids\sentiment labelled sentences"  # raw string, so the backslashes are not read as escape sequences
path1=path+"\\amazon_cells_labelled.txt"
path2=path+"\\imdb_labelled.txt"
path3=path+"\\yelp_labelled.txt"
data1 = pd.read_csv(path1, sep="\t", header=None)
data2=pd.read_csv(path2, sep="\t", header=None)
data3=pd.read_csv(path3, sep="\t", header=None)
data= [data1, data2, data3]
data= pd.concat(data)
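The dataset description lists 500 positive and 500 negative sentences per site, which would give 3000 rows, while a later cell reports 2748 loaded texts. One possible explanation, stated here as an assumption rather than a verified diagnosis, is that double quotes inside some sentences make the default parser merge several lines into one field. The sketch below uses a hypothetical three-line sample to show how `quoting=csv.QUOTE_NONE` makes `pd.read_csv` treat quotes as ordinary characters.

```python
import csv
import io

import pandas as pd

# Hypothetical three-line sample in the same "sentence<TAB>label" layout;
# the first and third sentences contain unbalanced double quotes
raw = '"Unbalanced start\t0\nSecond sentence\t1\nThird" sentence\t1\n'

# quoting=csv.QUOTE_NONE tells pandas to treat double quotes as ordinary characters
sample = pd.read_csv(io.StringIO(raw), sep="\t", header=None, quoting=csv.QUOTE_NONE)

print("rows read:", len(sample))
print("labels:", sample[1].tolist())
```

If the same option were passed when reading the three files, comparing `len(data)` with the expected 3000 would confirm or rule out this explanation.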

In [3]:
data.head()

|   | 0 | 1 |
|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 |
| 1 | Good case, Excellent value. | 1 |
| 2 | Great for the jawbone. | 1 |
| 3 | Tied to charger for conversations lasting more... | 0 |
| 4 | The mic is great. | 1 |


In [4]:
vect2 = CountVectorizer()
text=data[0].values
vect2.fit_transform(text)
print(vect2.vocabulary_) # the distinct words in the texts

# vocabulary sorted by column index, in ascending order
print(sorted(vect2.vocabulary_.items(), key = lambda item: item[1]))



In [5]:
import operator
cuvinte=vect2.vocabulary_
sorted_cuvinte = sorted(cuvinte.items(), key=operator.itemgetter(0))
sorted_cuvinte
print("Number of distinct words: ", len(sorted_cuvinte))
# The vocabulary is sorted alphabetically, so this slice holds the last 100 words in alphabetical order, not the most frequent ones
print("The last 100 words in alphabetical order: ", sorted_cuvinte[5055:])

Number of distinct words:  5155
The last 100 words in alphabetical order:  [('window', 5055), ('windows', 5056), ('wine', 5057), ('wines', 5058), ('wings', 5059), ('winner', 5060), ('wiping', 5061), ('wire', 5062), ('wired', 5063), ('wirefly', 5064), ('wireless', 5065), ('wise', 5066), ('wish', 5067), ('wit', 5068), ('with', 5069), ('within', 5070), ('without', 5071), ('witnessed', 5072), ('witticisms', 5073), ('witty', 5074), ('woa', 5075), ('wobbly', 5076), ('women', 5077), ('won', 5078), ('wonder', 5079), ('wondered', 5080), ('wonderful', 5081), ('wonderfully', 5082), ('wong', 5083), ('wont', 5084), ('wontons', 5085), ('woo', 5086), ('wood', 5087), ('wooden', 5088), ('word', 5089), ('words', 5090), ('work', 5091), ('worked', 5092), ('worker', 5093), ('workers', 5094), ('working', 5095), ('works', 5096), ('world', 5097), ('worn', 5098), ('worries', 5099), ('worry', 5100), ('worse', 5101), ('worst', 5102), ('worth', 5103), ('worthless', 5104), ('worthwhile', 5105), ('worthy', 

In [6]:
print("Minimum word length:", len(min(cuvinte,key=len)))
print("Maximum word length: ", len(max(cuvinte,key=len)))
print("Mean word length: ", sum(map(len, cuvinte))/len(cuvinte))

Minimum word length: 2
Maximum word length:  17
Mean word length:  6.573617846750728
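The statistics above describe the length of the vocabulary words. For the length of the texts themselves, a sketch along the following lines could be used; the three sentences are hypothetical stand-ins for the dataset's texts.

```python
# Hypothetical sentences; in the notebook this would be the array of texts
sentences = ["Great for the jawbone.", "The mic is great.", "Works well."]

# Length of each sentence in whitespace-separated tokens and in characters
token_lengths = [len(s.split()) for s in sentences]
char_lengths = [len(s) for s in sentences]

def describe(lengths):
    """Return (minimum, mean, maximum) of a list of lengths."""
    return min(lengths), sum(lengths) / len(lengths), max(lengths)

print("tokens (min, mean, max):", describe(token_lengths))
print("characters (min, mean, max):", describe(char_lengths))
```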


In [8]:
X=data[0].values
Y=data[1].values
print('How are texts organized: ', type(X))
print('How many texts in train subset', len(X))
print('The first text: ', X[0])
print('Associated emotion:', Y[0])

How are texts organized:  <class 'numpy.ndarray'>
How many texts in train subset 2748
The first text:  So there is no way for me to plug it in here in the US unless I go by a converter.
Associated emotion: 0


In [9]:
print('Classes in set: ', np.unique(Y))

Classes in set:  [0 1]


In [10]:
print('Samples per class in set: {0}'.format(np.bincount(Y)))

Samples per class in set: [1362 1386]


In [11]:
vect = CountVectorizer()
X_transformed = vect.fit_transform(X)  # fit_transform already learns the vocabulary, so a separate fit call is not needed

In [12]:
print('Distinct words in the texts:\n', vect.vocabulary_)

Distinct words in the texts:


In [13]:
print(f'Sparse vector representation:\n{X_transformed}')

Sparse vector representation:
  (0, 1013)	1
  (0, 647)	1
  (0, 1982)	1
  (0, 4793)	1
  (0, 4835)	1
  (0, 4531)	1
  (0, 2158)	1
  (0, 2314)	2
  (0, 2432)	1
  (0, 3387)	1
  (0, 4609)	1
  (0, 2809)	1
  (0, 1829)	1
  (0, 4987)	1
  (0, 3043)	1
  (0, 2427)	1
  (0, 4545)	1
  (0, 4161)	1
  (1, 4861)	1
  (1, 1603)	1
  (1, 713)	1
  (1, 1993)	1
  (2, 2449)	1
  (2, 2023)	1
  (2, 4531)	1
  :	:
  (2747, 1387)	1
  (2747, 5110)	1
  (2747, 3437)	1
  (2747, 3857)	1
  (2747, 2068)	1
  (2747, 588)	1
  (2747, 4630)	1
  (2747, 785)	1
  (2747, 4548)	1
  (2747, 1540)	1
  (2747, 2630)	1
  (2747, 3166)	1
  (2747, 297)	1
  (2747, 4593)	1
  (2747, 2982)	1
  (2747, 4974)	1
  (2747, 3097)	1
  (2747, 4542)	1
  (2747, 2282)	1
  (2747, 647)	1
  (2747, 4531)	3
  (2747, 2314)	1
  (2747, 2432)	1
  (2747, 4609)	1
  (2747, 4545)	1


# Regression implementation

In [7]:
x_text, y_train=data[0].values, data[1].values

In [8]:
x_text

array(['So there is no way for me to plug it in here in the US unless I go by a converter.',
       'Good case, Excellent value.', 'Great for the jawbone.', ...,
       'Overall I was not impressed and would not go back.',
       "The whole experience was underwhelming, and I think we'll just go to Ninja Sushi next time.",
       "Then, as if I hadn't wasted enough of my life there, they poured salt in the wound by drawing out the time it took to bring the check."],
      dtype=object)

In [9]:
y_train

array([0, 1, 1, ..., 0, 0, 0], dtype=int64)

In [1]:
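# Bag of words without English stop words; only terms found in at least 5 documents are kept
vect = CountVectorizer(min_df=5, stop_words='english')
x1= vect.fit_transform(x_text)
x2=vect.transform(x_text)  # same matrix as x1, because the vectorizer was fitted on x_text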

In [12]:
print('Vector representation:\n', x2.toarray())

Vector representation:
 [[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [13]:
print(f'Vocabulary size: {len(vect.vocabulary_)}')

Vocabulary size: 642


In [14]:
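# get_feature_names() is no longer available in recent scikit-learn releases; get_feature_names_out() replaces it
feature_names = vect.get_feature_names_out().tolist()
print('Vocabulary size:', len(feature_names))

Vocabulary size: 642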


In [15]:
print(feature_names[:20])

['10', '20', '30', '40', '90', 'ability', 'absolutely', 'acting', 'action', 'actor', 'actors', 'actually', 'adorable', 'ago', 'amazing', 'amazon', 'ambiance', 'annoying', 'anytime', 'area']


In [64]:
param_grid = {'C':[0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(LogisticRegression(solver='lbfgs', max_iter=1000), param_grid=param_grid, cv=5, n_jobs=4)
grid.fit(x1, y_train)
print('best cross validation score:', grid.best_score_)
print('best params:', grid.best_params_)

best cross validation score: 0.9811557788944724
best params: {'C': 10}
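The grid search above uses the default scorer and a vectorizer fitted on all texts beforehand. As a hedged alternative that follows the assignment's F1 requirement, the sketch below chains the vectorizer and the classifier in a `Pipeline`, so the vocabulary is rebuilt on the training part of each fold. The toy sentences and the `C` grid are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 10 positive and 10 negative sentences
positive = ["great phone", "excellent value", "love it", "works great", "very good"] * 2
negative = ["bad phone", "terrible value", "hate it", "does not work", "very poor"] * 2
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Vectorizer and classifier are chained, so each CV fold fits its own vocabulary
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

# Both metrics are reported; refit="f1" selects the best C by mean F1
search = GridSearchCV(pipe, param_grid=param_grid, cv=5,
                      scoring=["f1", "accuracy"], refit="f1")
search.fit(texts, labels)

print("best params:", search.best_params_)
print("best mean F1:", search.best_score_)
```

The per-candidate accuracy is available in `search.cv_results_["mean_test_accuracy"]`.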


# KNeighborsRegressor

In [17]:
parameter_grid = {'n_neighbors': list(range(1,10)), 'p': [1, 2]}
# The iid argument was removed from scikit-learn's search classes, so it is no longer passed
grid_search = GridSearchCV(estimator=KNeighborsRegressor(), param_grid=parameter_grid, scoring='neg_mean_squared_error', cv=3, verbose=10, n_jobs=-1)
cv_results = cross_validate(grid_search, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    9.6s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   12.1s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   19.7s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   28.5s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   39.9s
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   54.2s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:  1.1min finished


Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    6.6s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    8.7s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   15.8s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   23.7s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   32.3s
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   42.2s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:   52.9s finished


Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    7.0s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    9.7s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   16.8s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   24.2s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   32.4s
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   42.2s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:   52.6s finished


Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    6.8s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    8.6s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   15.9s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   25.8s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   34.5s
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   46.3s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:   58.6s finished


Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    9.0s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   11.0s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   19.2s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   29.3s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   38.7s
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   49.2s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:   59.6s finished


In [18]:
df=pd.DataFrame()

In [19]:
# Average each metric over the 5 outer folds and add one row to the results table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
row = {'Model': 'KNeighborsRegressor()', 'Type_of_Search': 'GridSearchCv',
       'fit_time': cv_results['fit_time'].mean(),
       'test_neg_mean_absolute_error': cv_results['test_neg_mean_absolute_error'].mean(),
       'test_neg_mean_squared_error': cv_results['test_neg_mean_squared_error'].mean(),
       'test_r2': cv_results['test_r2'].mean(),
       'train_neg_mean_absolute_error': cv_results['train_neg_mean_absolute_error'].mean(),
       'train_neg_mean_squared_error': cv_results['train_neg_mean_squared_error'].mean(),
       'train_r2': cv_results['train_r2'].mean()}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

|   | Model | Type_of_Search | fit_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_r2 | train_neg_mean_absolute_error | train_neg_mean_squared_error | train_r2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KNeighborsRegressor() | GridSearchCv | 57.538566 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
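Because the same averaging is repeated for every model, it could be factored into a small helper. The function name `summarize_cv` and the fake per-fold values are hypothetical; the dictionary keys mirror those returned by `cross_validate`.

```python
import numpy as np
import pandas as pd

def summarize_cv(cv_results, model_name, search_name):
    """Average every per-fold array returned by cross_validate into one table row."""
    row = {"Model": model_name, "Type_of_Search": search_name}
    for key, values in cv_results.items():
        row[key] = float(np.mean(values))
    return row

# Hypothetical per-fold results with the same kind of keys that cross_validate returns
fake_results = {
    "fit_time": np.array([1.0, 3.0]),
    "test_neg_mean_squared_error": np.array([-0.2, -0.4]),
    "test_r2": np.array([0.1, 0.3]),
}

rows = [summarize_cv(fake_results, "ExampleModel", "GridSearchCV")]
summary = pd.DataFrame(rows)

# Flip the sign of the "neg_" scorer so that the column holds the plain MSE
summary["test_mse"] = -summary["test_neg_mean_squared_error"]
print(summary)
```

Collecting the rows in a list and building the DataFrame once also avoids growing the table cell by cell.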


In [20]:
# The iid argument was removed from scikit-learn's search classes, so it is no longer passed
gs_random = RandomizedSearchCV(estimator=KNeighborsRegressor(), param_distributions=parameter_grid,
                               scoring='neg_mean_squared_error', cv=3, n_iter=15, verbose=10, n_jobs=-1)
cv_results = cross_validate(gs_random, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

Fitting 3 folds for each of 15 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    8.9s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   16.2s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   29.9s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   31.8s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   39.9s
[Parallel(n_jobs=-1)]: Done  43 out of  45 | elapsed:   53.5s remaining:    2.4s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:   55.7s finished


Fitting 3 folds for each of 15 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    3.0s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    6.5s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   20.0s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   27.9s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   32.6s
[Parallel(n_jobs=-1)]: Done  43 out of  45 | elapsed:   39.4s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:   40.0s finished


Fitting 3 folds for each of 15 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    3.6s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    6.7s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:    9.6s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   24.5s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   36.2s
[Parallel(n_jobs=-1)]: Done  43 out of  45 | elapsed:   46.0s remaining:    2.0s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:   48.8s finished


Fitting 3 folds for each of 15 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    7.9s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   15.7s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   26.8s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   45.7s
[Parallel(n_jobs=-1)]: Done  43 out of  45 | elapsed:   55.4s remaining:    2.5s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:   56.3s finished


Fitting 3 folds for each of 15 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   11.5s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   24.9s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:   31.6s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   40.9s
[Parallel(n_jobs=-1)]: Done  43 out of  45 | elapsed:   55.5s remaining:    2.5s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:   56.3s finished


In [21]:
# Average each metric over the 5 outer folds and add one row to the results table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
row = {'Model': 'KNeighborsRegressor()', 'Type_of_Search': 'RandomizedSearchCV',
       'fit_time': cv_results['fit_time'].mean(),
       'test_neg_mean_absolute_error': cv_results['test_neg_mean_absolute_error'].mean(),
       'test_neg_mean_squared_error': cv_results['test_neg_mean_squared_error'].mean(),
       'test_r2': cv_results['test_r2'].mean(),
       'train_neg_mean_absolute_error': cv_results['train_neg_mean_absolute_error'].mean(),
       'train_neg_mean_squared_error': cv_results['train_neg_mean_squared_error'].mean(),
       'train_r2': cv_results['train_r2'].mean()}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

|   | Model | Type_of_Search | fit_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_r2 | train_neg_mean_absolute_error | train_neg_mean_squared_error | train_r2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KNeighborsRegressor() | GridSearchCv | 57.538566 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 1 | KNeighborsRegressor() | RandomizedSearchCV | 51.528217 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |


# LogisticRegression

In [22]:
param_grid = {'C':[0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(LogisticRegression(solver='lbfgs', max_iter=1000), param_grid=param_grid, cv=5, n_jobs=4, scoring='neg_mean_squared_error')
cv_results = cross_validate(grid, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

In [23]:
# Average each metric over the 5 outer folds and add one row to the results table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
row = {'Model': 'LogisticRegression()', 'Type_of_Search': 'GridSearchCV',
       'fit_time': cv_results['fit_time'].mean(),
       'test_neg_mean_absolute_error': cv_results['test_neg_mean_absolute_error'].mean(),
       'test_neg_mean_squared_error': cv_results['test_neg_mean_squared_error'].mean(),
       'test_r2': cv_results['test_r2'].mean(),
       'train_neg_mean_absolute_error': cv_results['train_neg_mean_absolute_error'].mean(),
       'train_neg_mean_squared_error': cv_results['train_neg_mean_squared_error'].mean(),
       'train_r2': cv_results['train_r2'].mean()}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

|   | Model | Type_of_Search | fit_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_r2 | train_neg_mean_absolute_error | train_neg_mean_squared_error | train_r2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KNeighborsRegressor() | GridSearchCv | 57.538566 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 1 | KNeighborsRegressor() | RandomizedSearchCV | 51.528217 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 2 | LogisticRegression() | GridSearchCV | 1.976412 | -0.219089 | -0.219089 | 0.123575 | -0.117268 | -0.117268 | 0.530893 |


In [24]:
# The iid argument was removed from scikit-learn's search classes, so it is no longer passed
gs_random = RandomizedSearchCV(estimator=LogisticRegression(solver='lbfgs', max_iter=1000), param_distributions=param_grid,
                               scoring='neg_mean_squared_error', cv=3, n_iter=4)
cv_results = cross_validate(gs_random, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

In [25]:
# Average each metric over the 5 outer folds and add one row to the results table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
row = {'Model': 'LogisticRegression()', 'Type_of_Search': 'RandomizedSearchCV',
       'fit_time': cv_results['fit_time'].mean(),
       'test_neg_mean_absolute_error': cv_results['test_neg_mean_absolute_error'].mean(),
       'test_neg_mean_squared_error': cv_results['test_neg_mean_squared_error'].mean(),
       'test_r2': cv_results['test_r2'].mean(),
       'train_neg_mean_absolute_error': cv_results['train_neg_mean_absolute_error'].mean(),
       'train_neg_mean_squared_error': cv_results['train_neg_mean_squared_error'].mean(),
       'train_r2': cv_results['train_r2'].mean()}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

|   | Model | Type_of_Search | fit_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_r2 | train_neg_mean_absolute_error | train_neg_mean_squared_error | train_r2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KNeighborsRegressor() | GridSearchCv | 57.538566 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 1 | KNeighborsRegressor() | RandomizedSearchCV | 51.528217 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 2 | LogisticRegression() | GridSearchCV | 1.976412 | -0.219089 | -0.219089 | 0.123575 | -0.117268 | -0.117268 | 0.530893 |
| 3 | LogisticRegression() | RandomizedSearchCV | 1.872318 | -0.219089 | -0.219089 | 0.123575 | -0.117268 | -0.117268 | 0.530893 |
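In the table above, the randomized searches report the same scores as the grid searches, which is plausible when both draw from the same short list of values. A randomized search can instead sample from a continuous distribution. The sketch below assumes `scipy.stats.loguniform` for `C` and uses synthetic data in place of the vectorized texts; both are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical synthetic data standing in for the vectorized texts
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=0)

# C is drawn from a log-uniform distribution over [1e-3, 10] instead of a fixed list
param_distributions = {"C": loguniform(1e-3, 10)}
search = RandomizedSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    param_distributions=param_distributions,
    n_iter=10, cv=5, scoring="f1", random_state=0)
search.fit(X_demo, y_demo)

print("best C:", search.best_params_["C"])
print("best mean F1:", search.best_score_)
```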


# DecisionTreeRegressor

In [26]:
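# For DecisionTreeRegressor, 'auto' meant "use all features"; recent scikit-learn releases expect None for that
params = {'max_features': [None, 'sqrt', 'log2'],
          'min_samples_split':list(range(2,15)), 
          'min_samples_leaf':list(range(2,11)),
         }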
grid = GridSearchCV(DecisionTreeRegressor(), param_grid=params, cv=5, n_jobs=4, scoring='neg_mean_squared_error', verbose=10)
cv_results = cross_validate(grid, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

Fitting 5 folds for each of 351 candidates, totalling 1755 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    1.1s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    1.8s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    3.0s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    4.0s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    5.9s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    6.8s
[Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:    8.1s
[Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:    9.3s
[Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:   10.6s
[Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:   11.8s
[Parallel(n_jobs=4)]: Done 105 tasks      | elapsed:   13.4s
[Parallel(n_jobs=4)]: Done 120 tasks      | elapsed:   14.9s
[Parallel(n_jobs=4)]: Done 137 tasks      | elapsed:   16.6s
[Parallel(n_jobs=4)]: Done 154 tasks      | elapsed:   18.2s
[Parallel(n_jobs=4)]: Done 173 tasks      | elapsed:   19.8s
[Parallel(

Fitting 5 folds for each of 351 candidates, totalling 1755 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    1.1s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    1.9s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    3.1s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    3.8s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    5.2s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    6.2s
[Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:    7.6s
[Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:    9.1s
[Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:   10.7s
[Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:   12.5s
[Parallel(n_jobs=4)]: Done 105 tasks      | elapsed:   14.7s
[Parallel(n_jobs=4)]: Done 120 tasks      | elapsed:   16.3s
[Parallel(n_jobs=4)]: Done 137 tasks      | elapsed:   17.8s
[Parallel(n_jobs=4)]: Done 154 tasks      | elapsed:   19.2s
[Parallel(n_jobs=4)]: Done 173 tasks      | elapsed:   20.8s
[Parallel(

Fitting 5 folds for each of 351 candidates, totalling 1755 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    1.2s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    1.8s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    2.8s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    3.6s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    4.8s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    5.7s
[Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:    7.0s
[Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:    7.9s
[Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:    9.3s
[Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:   10.4s
[Parallel(n_jobs=4)]: Done 105 tasks      | elapsed:   11.8s
[Parallel(n_jobs=4)]: Done 120 tasks      | elapsed:   13.1s
[Parallel(n_jobs=4)]: Done 137 tasks      | elapsed:   14.7s
[Parallel(n_jobs=4)]: Done 154 tasks      | elapsed:   16.1s
[Parallel(n_jobs=4)]: Done 173 tasks      | elapsed:   17.7s
[Parallel(

Fitting 5 folds for each of 351 candidates, totalling 1755 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    0.9s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    1.5s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    2.5s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    3.1s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    4.2s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    5.0s
[Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:    6.2s
[Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:    7.3s
[Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:    8.6s
[Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:    9.6s
[Parallel(n_jobs=4)]: Done 105 tasks      | elapsed:   11.0s
[Parallel(n_jobs=4)]: Done 120 tasks      | elapsed:   12.4s
[Parallel(n_jobs=4)]: Done 137 tasks      | elapsed:   14.0s
[Parallel(n_jobs=4)]: Done 154 tasks      | elapsed:   16.7s
[Parallel(n_jobs=4)]: Done 173 tasks      | elapsed:   19.1s
[Parallel(

Fitting 5 folds for each of 351 candidates, totalling 1755 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    1.9s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:    2.7s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    4.9s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    6.1s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:    7.9s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:    9.3s
[Parallel(n_jobs=4)]: Done  53 tasks      | elapsed:   11.5s
[Parallel(n_jobs=4)]: Done  64 tasks      | elapsed:   13.7s
[Parallel(n_jobs=4)]: Done  77 tasks      | elapsed:   15.0s
[Parallel(n_jobs=4)]: Done  90 tasks      | elapsed:   16.5s
[Parallel(n_jobs=4)]: Done 105 tasks      | elapsed:   18.0s
[Parallel(n_jobs=4)]: Done 120 tasks      | elapsed:   19.4s
[Parallel(n_jobs=4)]: Done 137 tasks      | elapsed:   20.7s
[Parallel(n_jobs=4)]: Done 154 tasks      | elapsed:   22.4s
[Parallel(n_jobs=4)]: Done 173 tasks      | elapsed:   24.5s
[Parallel(

In [27]:
# Average each metric over the 5 outer folds and add one row to the results table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
row = {'Model': 'DecisionTreeRegressor()', 'Type_of_Search': 'GridSearchCV',
       'fit_time': cv_results['fit_time'].mean(),
       'test_neg_mean_absolute_error': cv_results['test_neg_mean_absolute_error'].mean(),
       'test_neg_mean_squared_error': cv_results['test_neg_mean_squared_error'].mean(),
       'test_r2': cv_results['test_r2'].mean(),
       'train_neg_mean_absolute_error': cv_results['train_neg_mean_absolute_error'].mean(),
       'train_neg_mean_squared_error': cv_results['train_neg_mean_squared_error'].mean(),
       'train_r2': cv_results['train_r2'].mean()}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

|   | Model | Type_of_Search | fit_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_r2 | train_neg_mean_absolute_error | train_neg_mean_squared_error | train_r2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KNeighborsRegressor() | GridSearchCv | 57.538566 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 1 | KNeighborsRegressor() | RandomizedSearchCV | 51.528217 | -0.405359 | -0.21156 | 0.14655 | -0.342298 | -0.15938 | 0.362111 |
| 2 | LogisticRegression() | GridSearchCV | 1.976412 | -0.219089 | -0.219089 | 0.123575 | -0.117268 | -0.117268 | 0.530893 |
| 3 | LogisticRegression() | RandomizedSearchCV | 1.872318 | -0.219089 | -0.219089 | 0.123575 | -0.117268 | -0.117268 | 0.530893 |
| 4 | DecisionTreeRegressor() | GridSearchCV | 74.335318 | -0.308674 | -0.174669 | 0.295722 | -0.233775 | -0.116887 | 0.532182 |


In [28]:
# The iid argument was removed from scikit-learn's search classes, so it is no longer passed
gs_random = RandomizedSearchCV(estimator=DecisionTreeRegressor(), param_distributions=params,
                               scoring='neg_mean_squared_error', cv=3, n_iter=250, verbose=10, n_jobs=-1)
cv_results = cross_validate(gs_random, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error','r2'),cv=5, return_train_score=True)

Fitting 3 folds for each of 250 candidates, totalling 750 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0987s.) Setting batch_size=4.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done  44 tasks      | elapsed:    2.2s
[Parallel(n_jobs=-1)]: Done  72 tasks      | elapsed:    3.4s
[Parallel(n_jobs=-1)]: Done 108 tasks      | elapsed:    4.7s
[Parallel(n_jobs=-1)]: Done 144 tasks      | elapsed:    5.9s
[Parallel(n_jobs=-1)]: Done 188 tasks      | elapsed:    7.3s
[Parallel(n_jobs=-1)]: Done 232 tasks      | elapsed:    8.9s
[Parallel(n_jobs=-1)]: Done 284 tasks      | elapsed:   10.5s
[Parallel(n_jobs=-1)]: Done 336 tasks      | elapsed:   12.4s
[Parallel(n_jobs=-1)]: Done 396 tasks      | elapsed:   14.0s
[Parallel(n_jobs=-1)]: Done 456 tasks      | elapsed:   15.8s
[Parallel(n_jobs=-1)]: Done 524 tasks      | elapsed:   17.9s
[Parallel(n_jobs=-1)]: Done 592 tas


In [29]:
# Average every metric over the 5 outer folds and add one summary row to df.
# pd is assumed to be pandas, imported earlier in the notebook.
metrics = ('neg_mean_squared_error', 'neg_mean_absolute_error', 'r2')
row = {'Type_of_Search': 'RandomizedSearchCV', 'Model': 'DecisionTreeRegressor()',
       'fit_time': cv_results['fit_time'].mean()}
for split in ('train', 'test'):
    for metric in metrics:
        row[f'{split}_{metric}'] = cv_results[f'{split}_{metric}'].mean()
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

Unnamed: 0,Model,Type_of_Search,fit_time,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_r2,train_neg_mean_absolute_error,train_neg_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.538566,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.528217,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.976412,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.872318,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.335318,-0.308674,-0.174669,0.295722,-0.233775,-0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.82573,-0.308306,-0.178904,0.278364,-0.229674,-0.114837,0.540403
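
The same bookkeeping (average each per-fold score, then add a row to the results table) is repeated after every search in this notebook. A minimal sketch of how it could be factored out is given below. The helper name `summarize_cv` and the toy input are hypothetical; the only assumption about `cross_validate` is that it returns a dict mapping names such as `fit_time` or `test_r2` to per-fold values.

```python
from statistics import fmean


def summarize_cv(cv_results, model, search, skip=("score_time",)):
    """Average every per-fold entry of a cross_validate-style dict into one row.

    Keys listed in `skip` are left out (the notebook's table has no score_time column).
    """
    row = {"Model": model, "Type_of_Search": search}
    for key, values in cv_results.items():
        if key in skip:
            continue
        row[key] = fmean(values)
    return row


# Toy per-fold values standing in for a real cross_validate result.
toy = {
    "fit_time": [1.0, 3.0],
    "score_time": [0.1, 0.1],
    "test_r2": [0.25, 0.75],
    "train_r2": [0.5, 1.0],
}
row = summarize_cv(toy, "DecisionTreeRegressor()", "RandomizedSearchCV")
print(row)
```

The returned dict has the same shape as the rows added to `df` above, so it could be passed to whatever row-appending call the notebook uses.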


# MLPRegressor

In [30]:
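# Hyperparameter grid for the MLP: 4 activations x 3 solvers x 3 alphas = 36 candidates.
paramsM = {'activation': ['identity', 'logistic', 'tanh', 'relu'],
           'solver': ['lbfgs', 'sgd', 'adam'],
           'alpha': [0.1, 0.001, 0.0001]}
grid = GridSearchCV(MLPRegressor(max_iter=10000), param_grid=paramsM, cv=3, n_jobs=-1,
                    scoring='neg_mean_squared_error', verbose=10)
# Nested CV: the outer 5-fold cross_validate scores the whole grid search,
# which itself tunes the hyperparameters on 3 inner folds.
cv_results = cross_validate(grid, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error', 'r2'),
                            cv=5, return_train_score=True)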

Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   44.5s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   56.5s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  2.1min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  4.8min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  6.8min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  9.4min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 12.0min
[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 13.8min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 15.8min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 18.0min finished


Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   40.8s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   57.0s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  2.0min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  4.0min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  5.8min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  8.7min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 11.0min
[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 12.6min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 14.6min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 16.3min finished


Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   36.1s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   52.8s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  4.6min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  6.2min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  9.0min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 11.1min
[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 12.9min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 14.8min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 16.4min finished


Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   37.5s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   55.0s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  2.0min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  4.2min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  6.3min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  9.2min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 11.3min
[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 13.1min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 15.3min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 17.1min finished


Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   35.2s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   55.2s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  3.7min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  6.3min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  8.9min
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 11.6min
[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 13.5min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 15.6min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 17.5min finished
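
The log above prints "Fitting 3 folds for each of 36 candidates, totalling 108 fits" five times because the grid search is itself wrapped in a 5-fold `cross_validate`. The arithmetic behind those numbers can be checked with a short sketch. It uses only the standard library and copies the grid from the cell above; the extra fit per outer fold assumes scikit-learn's default `refit=True`, which retrains the best candidate once after each search.

```python
from math import prod

# Grid copied from the MLPRegressor cell above.
paramsM = {
    "activation": ["identity", "logistic", "tanh", "relu"],
    "solver": ["lbfgs", "sgd", "adam"],
    "alpha": [0.1, 0.001, 0.0001],
}

inner_folds = 3  # cv=3 inside GridSearchCV
outer_folds = 5  # cv=5 in cross_validate

# One candidate per combination of the listed values.
candidates = prod(len(values) for values in paramsM.values())
fits_per_search = candidates * inner_folds
# Each outer fold runs one full search plus one refit of the best candidate.
total_fits = outer_folds * (fits_per_search + 1)

print(candidates, fits_per_search, total_fits)  # 36 108 545
```

This multiplication is why the MLP grid search is by far the slowest entry in the results table.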


In [31]:
# Average every metric over the 5 outer folds and add one summary row to df.
# pd is assumed to be pandas, imported earlier in the notebook.
metrics = ('neg_mean_squared_error', 'neg_mean_absolute_error', 'r2')
row = {'Type_of_Search': 'GridSearchCV', 'Model': 'MLPRegressor()',
       'fit_time': cv_results['fit_time'].mean()}
for split in ('train', 'test'):
    for metric in metrics:
        row[f'{split}_{metric}'] = cv_results[f'{split}_{metric}'].mean()
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

Unnamed: 0,Model,Type_of_Search,fit_time,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_r2,train_neg_mean_absolute_error,train_neg_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.538566,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.528217,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.976412,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.872318,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.335318,-0.308674,-0.174669,0.295722,-0.233775,-0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.82573,-0.308306,-0.178904,0.278364,-0.229674,-0.114837,0.540403
6,MLPRegressor(),GridSearchCV,1052.335578,-0.361469,-0.188195,0.241424,-0.25908,-0.102066,0.591432


In [32]:
# Randomized search over the same grid: 20 of the 36 combinations are evaluated per search.
gs_random = RandomizedSearchCV(estimator=MLPRegressor(max_iter=10000), param_distributions=paramsM,
                               scoring='neg_mean_squared_error', cv=3, n_iter=20, verbose=10, n_jobs=-1)
cv_results = cross_validate(gs_random, x1, y_train,
                            scoring=('neg_mean_squared_error', 'neg_mean_absolute_error', 'r2'),
                            cv=5, return_train_score=True)

Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:  2.6min
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  5.5min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  6.3min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  7.2min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  8.0min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  8.9min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:  9.6min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:  9.6min finished


Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   30.9s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   58.3s
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.7min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  2.4min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  5.9min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  6.5min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  8.2min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:  8.9min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:  8.9min finished


Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   53.1s
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:  1.7min
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  4.0min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  6.5min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  7.0min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  8.1min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed: 11.2min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 12.3min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 12.3min finished


Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:  3.3min
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  4.3min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  5.8min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  9.0min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed: 11.0min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed: 12.6min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 13.5min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 13.5min finished


Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:  4.1min
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  5.3min
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  6.4min
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  8.0min
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed: 10.2min
[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed: 11.2min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 11.9min remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 11.9min finished
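
Each randomized search above fits 20 candidates instead of 36. When every entry of `param_distributions` is a plain list, as in `paramsM`, scikit-learn draws the candidates without replacement, so the 20 combinations are distinct. The sketch below imitates that behaviour with the standard library only; it is not scikit-learn's own sampler, and the fixed seed is an assumption added for reproducibility (the notebook sets no `random_state`).

```python
import random
from itertools import product

# Grid copied from the MLPRegressor cell above.
paramsM = {
    "activation": ["identity", "logistic", "tanh", "relu"],
    "solver": ["lbfgs", "sgd", "adam"],
    "alpha": [0.1, 0.001, 0.0001],
}

# Enumerate the full grid as a list of parameter dicts.
keys = list(paramsM)
grid = [dict(zip(keys, combo)) for combo in product(*paramsM.values())]

rng = random.Random(0)  # hypothetical seed, only to make the sketch repeatable
sampled = rng.sample(grid, k=20)  # n_iter=20, drawn without replacement

print(len(grid), len(sampled))  # 36 20
for params in sampled[:3]:
    print(params)
```

Because 16 of the 36 combinations are never tried, the randomized search can miss the combination that the grid search would pick, which is one plausible reason the two MLP rows in the table differ.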


In [33]:
# Average every metric over the 5 outer folds and add one summary row to df.
# pd is assumed to be pandas, imported earlier in the notebook.
metrics = ('neg_mean_squared_error', 'neg_mean_absolute_error', 'r2')
row = {'Type_of_Search': 'RandomizedSearchCV', 'Model': 'MLPRegressor()',
       'fit_time': cv_results['fit_time'].mean()}
for split in ('train', 'test'):
    for metric in metrics:
        row[f'{split}_{metric}'] = cv_results[f'{split}_{metric}'].mean()
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df

Unnamed: 0,Model,Type_of_Search,fit_time,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_r2,train_neg_mean_absolute_error,train_neg_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.538566,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.528217,-0.405359,-0.21156,0.14655,-0.342298,-0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.976412,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.872318,-0.219089,-0.219089,0.123575,-0.117268,-0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.335318,-0.308674,-0.174669,0.295722,-0.233775,-0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.82573,-0.308306,-0.178904,0.278364,-0.229674,-0.114837,0.540403
6,MLPRegressor(),GridSearchCV,1052.335578,-0.361469,-0.188195,0.241424,-0.25908,-0.102066,0.591432
7,MLPRegressor(),RandomizedSearchCV,708.481184,-0.369802,-0.184784,0.255138,-0.310682,-0.127174,0.491017


In [34]:
# The neg_* scorers return negated errors; take the absolute value to report plain MSE and MAE.
for col in ['train_neg_mean_squared_error', 'train_neg_mean_absolute_error',
            'test_neg_mean_squared_error', 'test_neg_mean_absolute_error']:
    df[col] = df[col].abs()
df

Unnamed: 0,Model,Type_of_Search,fit_time,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_r2,train_neg_mean_absolute_error,train_neg_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.538566,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.528217,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.976412,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.872318,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.335318,0.308674,0.174669,0.295722,0.233775,0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.82573,0.308306,0.178904,0.278364,0.229674,0.114837,0.540403
6,MLPRegressor(),GridSearchCV,1052.335578,0.361469,0.188195,0.241424,0.25908,0.102066,0.591432
7,MLPRegressor(),RandomizedSearchCV,708.481184,0.369802,0.184784,0.255138,0.310682,0.127174,0.491017


In [35]:
df = df.rename(columns={'train_neg_mean_squared_error': 'train_mean_squared_error',
                        'train_neg_mean_absolute_error': 'train_mean_absolute_error',
                        'test_neg_mean_squared_error': 'test_mean_squared_error',
                        'test_neg_mean_absolute_error': 'test_mean_absolute_error'})
df

Unnamed: 0,Model,Type_of_Search,fit_time,test_mean_absolute_error,test_mean_squared_error,test_r2,train_mean_absolute_error,train_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.538566,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.528217,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.976412,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.872318,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.335318,0.308674,0.174669,0.295722,0.233775,0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.82573,0.308306,0.178904,0.278364,0.229674,0.114837,0.540403
6,MLPRegressor(),GridSearchCV,1052.335578,0.361469,0.188195,0.241424,0.25908,0.102066,0.591432
7,MLPRegressor(),RandomizedSearchCV,708.481184,0.369802,0.184784,0.255138,0.310682,0.127174,0.491017
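
The two cells above first remove the sign of the error columns and then rename them. scikit-learn's `neg_*` scorers report the negated error so that "greater is better" holds for every scorer; since MSE and MAE are never negative, negating the score gives the same result as `.abs()`. A minimal sketch of doing both steps at once on a single row follows. The helper name `to_positive_errors` is hypothetical, and the sample values are those of the DecisionTreeRegressor grid-search row.

```python
def to_positive_errors(row):
    """Return a copy of row with neg_* scores negated and the neg_ prefix dropped."""
    out = {}
    for key, value in row.items():
        if "_neg_" in key:
            out[key.replace("_neg_", "_")] = -value
        else:
            out[key] = value
    return out


row = {
    "test_neg_mean_squared_error": -0.174669,
    "test_neg_mean_absolute_error": -0.308674,
    "test_r2": 0.295722,
}
converted = to_positive_errors(row)
print(converted)
```

The `r2` entry is left untouched because it is a score, not a negated error.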


In [36]:
def highlight_max(col):
    # Styler.apply passes one column at a time; mark its largest value(s) in red.
    is_max = col == col.max()
    return ['background-color: red' if v else '' for v in is_max]

def highlight_min(col):
    # Mark the smallest value(s) of the column in yellow.
    is_min = col == col.min()
    return ['background-color: yellow' if v else '' for v in is_min]


df.style.apply(highlight_max).apply(highlight_min)

Unnamed: 0,Model,Type_of_Search,fit_time,test_mean_absolute_error,test_mean_squared_error,test_r2,train_mean_absolute_error,train_mean_squared_error,train_r2
0,KNeighborsRegressor(),GridSearchCv,57.5386,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
1,KNeighborsRegressor(),RandomizedSearchCV,51.5282,0.405359,0.21156,0.14655,0.342298,0.15938,0.362111
2,LogisticRegression(),GridSearchCV,1.97641,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
3,LogisticRegression(),RandomizedSearchCV,1.87232,0.219089,0.219089,0.123575,0.117268,0.117268,0.530893
4,DecisionTreeRegressor(),GridSearchCV,74.3353,0.308674,0.174669,0.295722,0.233775,0.116887,0.532182
5,DecisionTreeRegressor(),RandomizedSearchCV,25.8257,0.308306,0.178904,0.278364,0.229674,0.114837,0.540403
6,MLPRegressor(),GridSearchCV,1052.34,0.361469,0.188195,0.241424,0.25908,0.102066,0.591432
7,MLPRegressor(),RandomizedSearchCV,708.481,0.369802,0.184784,0.255138,0.310682,0.127174,0.491017
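
The highlighting above marks the maximum and minimum of every column, but which of the two is "best" depends on the metric: lower is better for MSE and MAE, higher is better for r2. A small sketch that picks the best configuration per metric is shown below. It uses plain tuples rather than the DataFrame, with the `test_mean_squared_error` and `test_r2` values copied from the table above.

```python
# (model, search, test_mean_squared_error, test_r2), copied from the table above.
rows = [
    ("KNeighborsRegressor()", "GridSearchCv", 0.21156, 0.14655),
    ("KNeighborsRegressor()", "RandomizedSearchCV", 0.21156, 0.14655),
    ("LogisticRegression()", "GridSearchCV", 0.219089, 0.123575),
    ("LogisticRegression()", "RandomizedSearchCV", 0.219089, 0.123575),
    ("DecisionTreeRegressor()", "GridSearchCV", 0.174669, 0.295722),
    ("DecisionTreeRegressor()", "RandomizedSearchCV", 0.178904, 0.278364),
    ("MLPRegressor()", "GridSearchCV", 0.188195, 0.241424),
    ("MLPRegressor()", "RandomizedSearchCV", 0.184784, 0.255138),
]

best_by_mse = min(rows, key=lambda r: r[2])  # lower error is better
best_by_r2 = max(rows, key=lambda r: r[3])   # higher r2 is better

print("lowest test MSE:", best_by_mse[0], best_by_mse[1], best_by_mse[2])
print("highest test r2:", best_by_r2[0], best_by_r2[1], best_by_r2[3])
```

On these numbers both criteria select the DecisionTreeRegressor tuned with GridSearchCV.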
