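The main problem with a plain train/test split is that when we compare different hyperparameters, every comparison is made against the same test set. An improvement only tells us the model got better on that particular test set, not that it generalizes better, and repeated tuning against it leaks information from the test set into our choices. So it is better to evaluate on more than one held-out set.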

Splitting into train, validation, and test sets is a better approach: we tune hyperparameters on the validation set and then run one final evaluation on the test set, which gives a fair estimate of the model's performance on unseen data. To do this with scikit-learn, call train_test_split twice: first a 70%-30% split, then split the 30% part 50%-50%, which yields 70% train, 15% validation, and 15% test.
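As a rough illustration of the double split described above, the sketch below uses synthetic stand-in data (the arrays `X_demo` and `y_demo` are assumptions, not the Advertising data) and only checks the resulting set sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; any feature matrix and target would do)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# First split: 70% train, 30% held back
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101)

# Second split: halve the held-back 30% into validation and test (15% each)
X_val, X_te, y_val, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=101)

print(len(X_tr), len(X_val), len(X_te))
```

Hyperparameters would then be compared on `X_val`, and `X_te` would be touched only once at the very end.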

# Next: how to do CV when the model library does not have it built in

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
df = pd.read_csv('inp_files/Advertising.csv')
df.head()

Unnamed: 0,TV,radio,newspaper,sales
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9


In [5]:
X = df.drop('sales', axis=1)
y = df['sales']

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
# the test set will be our hold-out set; it could be smaller, e.g. 15%
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)

In [8]:
from sklearn.preprocessing import StandardScaler

In [9]:
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [12]:
from sklearn.linear_model import Ridge

In [13]:
model = Ridge(alpha=100) # deliberately pick a high alpha to demonstrate parameter tuning

In [11]:
from sklearn.model_selection import cross_val_score

In [15]:
scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
scores # length 5, because cv=5 means 5 folds, i.e. 5 different splits into train and validation sets

array([ -9.32552967,  -4.9449624 , -11.39665242,  -7.0242106 ,
        -8.38562723])

In [16]:
abs(scores.mean()) # mean MSE across the folds; this is how we judge the model's performance

8.215396464543606
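To make concrete what `cross_val_score` does with `cv=5`, here is a minimal sketch that reproduces it with an explicit `KFold` loop. It assumes a regressor and an integer `cv`, in which case scikit-learn uses unshuffled `KFold`; the data is synthetic and the names `X_demo`, `y_demo`, and `model_demo` are illustrative only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the scaled training data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 3))
y_demo = X_demo @ np.array([4.0, 3.0, 0.1]) + rng.normal(scale=1.0, size=140)

model_demo = Ridge(alpha=100)

manual_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_demo):
    fold_model = clone(model_demo)                       # fresh, unfitted copy per fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx]) # train on 4 folds
    fold_pred = fold_model.predict(X_demo[val_idx])      # validate on the remaining fold
    manual_scores.append(-mean_squared_error(y_demo[val_idx], fold_pred))
manual_scores = np.array(manual_scores)

sk_scores = cross_val_score(model_demo, X_demo, y_demo,
                            scoring='neg_mean_squared_error', cv=5)
print(np.allclose(manual_scores, sk_scores))
```

The scores are negated because scikit-learn scorers follow a "higher is better" convention, which is why the notebook takes `abs()` of the mean.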

In [17]:
# adjust alpha parameter
model = Ridge(alpha=1)
scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
abs(scores.mean())

3.344839296530695
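Adjusting alpha by hand generalizes to a simple loop over candidate values. The sketch below is one possible way to do that, on synthetic data; the candidate list `alphas` is an arbitrary assumption, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the scaled training data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 3))
y_demo = X_demo @ np.array([4.0, 3.0, 0.1]) + rng.normal(scale=1.0, size=140)

alphas = [0.01, 0.1, 1, 10, 100]  # hypothetical candidates
cv_mse = {}
for a in alphas:
    fold_scores = cross_val_score(Ridge(alpha=a), X_demo, y_demo,
                                  scoring='neg_mean_squared_error', cv=5)
    cv_mse[a] = -fold_scores.mean()  # flip the sign back to a plain MSE

# Pick the alpha with the lowest mean cross-validated MSE
best_alpha = min(cv_mse, key=cv_mse.get)
print(best_alpha, cv_mse[best_alpha])
```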

In [18]:
# do final performance measure on hold-out set
model.fit(X_train, y_train)
y_final_test_pred = model.predict(X_test)

In [19]:
from sklearn.metrics import mean_squared_error

In [20]:
mean_squared_error(y_test, y_final_test_pred) # final measure of how well the model performs on unseen data

2.319021579428752
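One caveat with the cells above: the scaler was fit on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. A possible alternative, sketched here on synthetic data, is to wrap the scaler and model in a pipeline so that scaling is re-fit inside every fold. Using `make_pipeline` is a swapped-in technique, not what the notebook does.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, deliberately unscaled stand-in data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[100.0, 20.0, 30.0], scale=[80.0, 15.0, 20.0], size=(140, 3))
y_demo = X_demo @ np.array([0.05, 0.2, 0.0]) + rng.normal(scale=1.0, size=140)

# The scaler is fit only on the training part of each fold
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1))
pipe_scores = cross_val_score(pipe, X_demo, y_demo,
                              scoring='neg_mean_squared_error', cv=5)
print(-pipe_scores.mean())
```

On a small, well-behaved dataset the difference is usually minor, but the pipeline form keeps the validation folds fully unseen.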

# Using cross_validate instead of cross_val_score

In [21]:
from sklearn.model_selection import cross_validate
# cross_validate just gives more info, and you can use multiple error metrics at once

In [27]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
model = Ridge(alpha=100)

# available scoring names: scikit-learn.org/stable/modules/model_evaluation.html
scores = cross_validate(model, X_train, y_train, 
                        scoring=['neg_mean_squared_error',
                                'neg_mean_absolute_error'], cv=5, return_train_score=False)
scores = pd.DataFrame(scores) # for prettier output
scores

Unnamed: 0,fit_time,score_time,test_neg_mean_squared_error,test_neg_mean_absolute_error
0,0.001908,0.002094,-9.32553,-2.31243
1,0.001621,0.001034,-4.944962,-1.746534
2,0.002011,0.002369,-11.396652,-2.562117
3,0.001519,0.001022,-7.024211,-2.018732
4,0.002213,0.001297,-8.385627,-2.279519


In [28]:
scores.mean()

fit_time                        0.001854
score_time                      0.001563
test_neg_mean_squared_error    -8.215396
test_neg_mean_absolute_error   -2.183866
dtype: float64
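The negated metrics can be awkward to read. The following sketch, on synthetic data, flips the signs back and also requests training scores with `return_train_score=True` so that train and validation error can be compared side by side. The column renaming is just one convenient convention, not something scikit-learn does for you.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the scaled training data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 3))
y_demo = X_demo @ np.array([4.0, 3.0, 0.1]) + rng.normal(scale=1.0, size=140)

cv_out = cross_validate(Ridge(alpha=100), X_demo, y_demo,
                        scoring=['neg_mean_squared_error', 'neg_mean_absolute_error'],
                        cv=5, return_train_score=True)
summary = pd.DataFrame(cv_out)

# Flip the sign of every score column, then drop the 'neg_' prefix
score_cols = [c for c in summary.columns if c.startswith(('test_', 'train_'))]
summary[score_cols] = -summary[score_cols]
summary = summary.rename(columns=lambda c: c.replace('neg_', ''))

print(summary.mean())
```

A large gap between the train and test columns would hint at overfitting; with a heavily regularized model such as `alpha=100` both are more likely to be high.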

# Grid search

Instead of cross_val_score or cross_validate we can use GridSearchCV, which combines cross-validation and grid search in a general way that works for any scikit-learn estimator. Grid search trains and validates a model on every combination of the hyperparameter values listed in a grid. It also works with a single hyperparameter, but it is most useful for models that have more than one, e.g. elastic net (alpha and l1_ratio).
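To see what "every combination" means in practice, this short sketch enumerates a grid with scikit-learn's `ParameterGrid`. The grid values mirror the ones used in this notebook; the variable names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid_demo = {'alpha': [0.1, 1, 5, 10, 50, 100],
                   'l1_ratio': [.1, .5, .7, .95, .99, 1]}

# Every alpha paired with every l1_ratio: 6 x 6 candidates
combos = list(ParameterGrid(param_grid_demo))
n_fits = len(combos) * 5  # each candidate is fit once per fold with cv=5

print(len(combos), n_fits)
```

The number of fits grows multiplicatively with each added hyperparameter, which is the main cost of an exhaustive grid.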

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [31]:
from sklearn.linear_model import ElasticNet

In [35]:
base_elastic_net_model = ElasticNet() # we don't define any parameter yet
# we define them in parameter grid (keys of dictionary must match model lib parameter names)
param_grid = {'alpha': [0.1,1,5,10,50,100], 'l1_ratio': [.1,.5,.7,.95,.99,1]}

In [33]:
from sklearn.model_selection import GridSearchCV

In [40]:
grid_model = GridSearchCV(estimator=base_elastic_net_model,param_grid=param_grid,
                          scoring='neg_mean_squared_error', cv=5, verbose=2,return_train_score=False)
grid_model.fit(X_train,y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.0s
...
[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed:    0.7s finished


GridSearchCV(cv=5, error_score='raise',
       estimator=ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [0.1, 0.5, 0.7, 0.95, 0.99, 1]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
       scoring='neg_mean_squared_error', verbose=2)

In [37]:
grid_model.best_estimator_ # best combination is alpha=0.1 and l1_ratio=1

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=1,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [38]:
grid_model.best_params_

{'alpha': 0.1, 'l1_ratio': 1}

In [41]:
pd.DataFrame(grid_model.cv_results_) # info for each parameter combination

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.001996,0.000254,0.000654,3.9e-05,0.1,0.1,"{'alpha': 0.1, 'l1_ratio': 0.1}",-3.453021,-1.40519,-5.789125,-2.187302,-4.645576,-3.496043,1.591601,6
1,0.001926,0.000209,0.000743,0.000176,0.1,0.5,"{'alpha': 0.1, 'l1_ratio': 0.5}",-3.32544,-1.427522,-5.59561,-2.163089,-4.451679,-3.392668,1.506827,5
2,0.001853,0.000101,0.000624,2.7e-05,0.1,0.7,"{'alpha': 0.1, 'l1_ratio': 0.7}",-3.26988,-1.442432,-5.502437,-2.16395,-4.356738,-3.347088,1.462765,4
3,0.003233,0.002699,0.000595,0.0001,0.1,0.95,"{'alpha': 0.1, 'l1_ratio': 0.95}",-3.213052,-1.472417,-5.396258,-2.177452,-4.24108,-3.300052,1.406248,3
4,0.002735,0.001828,0.000763,0.000111,0.1,0.99,"{'alpha': 0.1, 'l1_ratio': 0.99}",-3.208124,-1.478489,-5.380242,-2.181097,-4.222968,-3.294184,1.396953,2
5,0.001603,0.000254,0.000538,6e-05,0.1,1.0,"{'alpha': 0.1, 'l1_ratio': 1}",-3.206943,-1.480065,-5.376257,-2.182076,-4.21846,-3.29276,1.394613,1
6,0.001502,0.000288,0.000617,0.000281,1.0,0.1,"{'alpha': 1, 'l1_ratio': 0.1}",-9.827475,-5.261525,-11.875347,-7.449195,-8.542329,-8.591174,2.222939,12
7,0.001544,9.9e-05,0.000405,6.6e-05,1.0,0.5,"{'alpha': 1, 'l1_ratio': 0.5}",-8.707071,-4.214228,-10.879261,-6.204545,-7.173031,-7.435627,2.255532,11
8,0.001334,0.00017,0.000357,7e-06,1.0,0.7,"{'alpha': 1, 'l1_ratio': 0.7}",-7.92087,-3.549562,-10.024877,-5.379553,-6.324836,-6.63994,2.206213,10
9,0.001653,0.00023,0.000499,9.9e-05,1.0,0.95,"{'alpha': 1, 'l1_ratio': 0.95}",-6.729435,-2.591285,-8.709842,-4.156317,-5.329916,-5.503359,2.102835,9
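The full `cv_results_` table is wide. A possible way to make it easier to scan, sketched here on synthetic data with a smaller grid, is to keep only the parameter and score columns and sort by `rank_test_score`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the scaled training data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 3))
y_demo = X_demo @ np.array([4.0, 3.0, 0.1]) + rng.normal(scale=1.0, size=140)

# Smaller grid than in the notebook, to keep the example quick
grid_demo = GridSearchCV(ElasticNet(),
                         {'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]},
                         scoring='neg_mean_squared_error', cv=5)
grid_demo.fit(X_demo, y_demo)

results = pd.DataFrame(grid_demo.cv_results_)
top = results[['param_alpha', 'param_l1_ratio',
               'mean_test_score', 'std_test_score',
               'rank_test_score']].sort_values('rank_test_score')
print(top.head())

# best_score_ is the mean_test_score of the rank-1 candidate
print(grid_demo.best_score_)
```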


In [42]:
# final performance measure on the hold-out set
y_pred = grid_model.predict(X_test) # uses the best parameter combination
mean_squared_error(y_test, y_pred)

2.3873426420874737
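Calling `predict` on the fitted grid works because `refit=True` (the default, visible in the GridSearchCV repr above) retrains the best parameter combination on the whole training set. The sketch below, on synthetic data with a reduced grid, checks that the grid object and its `best_estimator_` give the same predictions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([4.0, 3.0, 0.1]) + rng.normal(scale=1.0, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101)

grid_demo = GridSearchCV(ElasticNet(),
                         {'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]},
                         scoring='neg_mean_squared_error', cv=5)
grid_demo.fit(X_tr, y_tr)

# With refit=True, the grid delegates predict() to the refitted best estimator
pred_via_grid = grid_demo.predict(X_te)
pred_via_best = grid_demo.best_estimator_.predict(X_te)
print(np.allclose(pred_via_grid, pred_via_best))

holdout_mse = mean_squared_error(y_te, pred_via_grid)
print(holdout_mse)
```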