In this notebook I study several supervised learning models on the dataset I built in the previous exploratory analysis.
This dataset is a sample of the full 2016 data.

In [307]:
# Load the required libraries

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from scipy import sparse 
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.preprocessing import StandardScaler
from sklearn.dummy import DummyRegressor
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
from sklearn import model_selection
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn import metrics

In [289]:
# Load the dataset
data = pd.read_csv("2016_sample.csv", low_memory=False, sep=',')

# Feature engineering

Here I split my variables into two types: numerical variables and variables representing categories.

In [290]:
scalingDF = data[['DISTANCE']]  # Numerical features
categDF = data[['MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK',
                'CARRIER_CODE', 'ORIGIN_AIRPORT_ID', 'DEST_AIRPORT_ID',
                'DEP_TIME', 'ARR_TIME']]  # Categorical features


Next I need to encode my categorical variables so that the supervised models tested later can learn from them.

In [291]:
encoder = OneHotEncoder()  # Create encoder object

categDF_encoded = encoder.fit_transform(categDF)
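To make the effect of the encoding concrete, here is a minimal sketch on a toy frame. The values and the names `toy`, `toy_encoder` and `toy_encoded` are made up for illustration and are not part of the flight sample.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy frame with two integer-coded categorical columns (hypothetical values).
toy = pd.DataFrame({"MONTH": [1, 2, 1, 3], "DAY_OF_WEEK": [5, 5, 6, 7]})

toy_encoder = OneHotEncoder()
toy_encoded = toy_encoder.fit_transform(toy)  # returns a SciPy sparse matrix

# 3 distinct months + 3 distinct weekdays -> 6 indicator columns.
print(toy_encoded.shape)
print(toy_encoded.toarray())
```

Each row contains exactly one 1 per original column, which is why the result stays very sparse even when a column such as an airport ID has many distinct values.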


In [296]:
scalingDF_sparse = sparse.csr_matrix(scalingDF)

In [297]:
X_final = sparse.hstack((scalingDF_sparse, categDF_encoded))

In [298]:
y_final = data['ARR_DELAY'].values

In [299]:
X_train, X_test, y_train, y_test = train_test_split(X_final, y_final, test_size=0.2, random_state=0)  # 80/20 split

The numerical variables must now be extracted and standardized so that they are on the same scale as the other variables and do not dominate them during training.

In [300]:
X_train_numerical = X_train[:, 0:1].toarray() 
X_test_numerical = X_test[:, 0:1].toarray()

In [301]:
scaler = StandardScaler() # create scaler object
scaler.fit(X_train_numerical) # fit with the training data ONLY
X_train_numerical = sparse.csr_matrix(scaler.transform(X_train_numerical)) # Transform the data and convert to sparse
X_test_numerical = sparse.csr_matrix(scaler.transform(X_test_numerical))

In [302]:
X_train[:, 0:1] = X_train_numerical
X_test[:, 0:1] = X_test_numerical
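As an alternative to slicing the numerical column out of the sparse matrix and writing it back, the same preprocessing could be sketched with `ColumnTransformer` (available from scikit-learn 0.20). This is only a sketch on synthetic data: the frame `demo` and its values are assumptions, and unlike the cells above it fits the encoder on the training split only, with `handle_unknown="ignore"` so that categories unseen during training do not raise an error.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the flight sample (hypothetical values).
rng = np.random.RandomState(0)
demo = pd.DataFrame({
    "DISTANCE": rng.uniform(100, 3000, size=50),
    "MONTH": rng.randint(1, 13, size=50),
    "DAY_OF_WEEK": rng.randint(1, 8, size=50),
})
demo_train, demo_test = train_test_split(demo, test_size=0.2, random_state=0)

# Scale the numerical column and one-hot encode the categorical ones in one step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["DISTANCE"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["MONTH", "DAY_OF_WEEK"]),
])

Xd_train = preprocess.fit_transform(demo_train)  # statistics learned on train only
Xd_test = preprocess.transform(demo_test)        # reused unchanged on test
print(Xd_train.shape, Xd_test.shape)
```

The scaled column comes first in the output, followed by the indicator columns, which mirrors the layout of `X_final` above.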

# Testing the different models

I will now test several models, using as a baseline a naive regressor that always predicts the mean of the target computed on the training set.

## Baseline

In [306]:
dummy = DummyRegressor(strategy='mean')
dummy.fit(X_train, y_train)

DummyRegressor(constant=None, quantile=None, strategy='mean')

In [368]:
# Predict on the test set
y_pred_dum = dummy.predict(X_test)

# Evaluate
print("RMSE : %.2f" % np.sqrt(metrics.mean_squared_error(y_test, y_pred_dum)))

RMSE : 42.51
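To see what this baseline score measures, the sketch below checks on synthetic data that the RMSE of a mean-predicting `DummyRegressor` is simply the root mean squared deviation of the test target from the training mean. All arrays here are randomly generated assumptions, not the flight data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
y_tr = rng.normal(loc=5.0, scale=40.0, size=1000)  # synthetic "delays"
y_te = rng.normal(loc=5.0, scale=40.0, size=200)
X_tr = np.zeros((1000, 1))  # features are ignored by the dummy model
X_te = np.zeros((200, 1))

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
rmse_dummy = np.sqrt(mean_squared_error(y_te, baseline.predict(X_te)))

# Same quantity computed by hand: deviation of y_te from the training mean.
rmse_manual = np.sqrt(np.mean((y_te - y_tr.mean()) ** 2))
print(rmse_dummy, rmse_manual)
```

When the training and test means are close, this value is close to the standard deviation of the target, so any useful model has to score below it.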


## Linear regression

In [315]:
lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [369]:
# Predict on the test set
y_pred_lr = lr.predict(X_test)

# Evaluate
print("RMSE : %.2f" % np.sqrt(metrics.mean_squared_error(y_test, y_pred_lr)))

RMSE : 41.84


A plain linear regression appears slightly more accurate than the baseline: its test RMSE is 41.84 versus 42.51. I will try to do better with regularization.

## Ridge

To tune the Ridge regression, I first run a 5-fold cross-validated grid search to find the value of alpha that gives the lowest mean squared error.
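For reference, the role of alpha can be read from the objective that scikit-learn's `Ridge` minimizes. The symbols $X$, $y$ and $w$ (feature matrix, target vector, coefficient vector) are notation introduced here and do not appear elsewhere in the notebook.

```latex
\hat{w} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
```

With alpha = 0 this reduces to ordinary least squares; larger values shrink the coefficients more strongly toward zero.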

In [352]:
ridge = model_selection.GridSearchCV(Ridge(), cv=5,
                                     param_grid={"alpha": np.logspace(0, 3, 40)},
                                     verbose=10, n_jobs=-1,
                                     scoring='neg_mean_squared_error')

In [353]:
ridge.fit(X_train, y_train)

Fitting 5 folds for each of 40 candidates, totalling 200 fits
[CV] alpha=1.0 .......................................................
[CV] alpha=1.0 .......................................................
[CV] alpha=1.0 .......................................................
[CV] alpha=1.0 .......................................................
[CV] .............. alpha=1.0, score=-1746.188577282866, total=  12.9s
[CV] alpha=1.0 .......................................................
[CV] ............. alpha=1.0, score=-1761.9962556887713, total=  16.6s
[CV] alpha=1.1937766417144364 ........................................
[CV] ............. alpha=1.0, score=-1741.4571957816775, total=  19.9s
[CV] alpha=1.1937766417144364 ........................................
[CV] .............. alpha=1.0, score=-1660.091571398308, total=  22.1s
[CV] alpha=1.1937766417144364 ........................................
[CV] ............. alpha=1.0, score=-1772.5351573899452, total=  16.4s
[CV] alpha=1.19

[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:   30.2s


[CV]  alpha=1.1937766417144364, score=-1746.15777014089, total=  11.9s
[CV] alpha=1.1937766417144364 ........................................
[CV]  alpha=1.1937766417144364, score=-1741.4329607320449, total=  17.4s
[CV] alpha=1.425102670302998 .........................................
[CV]  alpha=1.1937766417144364, score=-1761.9641217348603, total=  14.1s
[CV] alpha=1.425102670302998 .........................................
[CV]  alpha=1.1937766417144364, score=-1772.4994410135737, total=  17.3s
[CV]  alpha=1.425102670302998, score=-1746.1217961869643, total=  12.5s
[CV] alpha=1.425102670302998 .........................................
[CV] alpha=1.425102670302998 .........................................


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   50.0s


[CV]  alpha=1.1937766417144364, score=-1660.061528674597, total=  21.5s
[CV] alpha=1.425102670302998 .........................................
[CV]  alpha=1.425102670302998, score=-1741.4049179277042, total=  17.4s
[CV] alpha=1.701254279852589 .........................................
[CV]  alpha=1.425102670302998, score=-1761.9271550275066, total=  14.0s
[CV] alpha=1.701254279852589 .........................................
[CV]  alpha=1.425102670302998, score=-1772.4581406694563, total=  15.0s
[CV] alpha=1.701254279852589 .........................................
[CV]  alpha=1.425102670302998, score=-1660.0265052118414, total=  17.7s
[CV] alpha=1.701254279852589 .........................................
[CV]  alpha=1.701254279852589, score=-1741.372633919039, total=  15.8s
[CV] alpha=1.701254279852589 .........................................
[CV]  alpha=1.701254279852589, score=-1746.081328001392, total=  15.3s
[CV] alpha=2.0309176209047357 ........................................


[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.3min


[CV]  alpha=1.701254279852589, score=-1761.885806068051, total=  17.3s
[CV] alpha=2.0309176209047357 ........................................
[CV]  alpha=1.701254279852589, score=-1772.4114932990033, total=  17.3s
[CV] alpha=2.0309176209047357 ........................................
[CV]  alpha=1.701254279852589, score=-1659.9861044552538, total=  19.2s
[CV] alpha=2.0309176209047357 ........................................
[CV]  alpha=2.0309176209047357, score=-1741.3352492190045, total=  15.0s
[CV] alpha=2.0309176209047357 ........................................
[CV]  alpha=2.0309176209047357, score=-1746.0334632533559, total=  11.4s
[CV] alpha=2.4244620170823286 ........................................
[CV]  alpha=2.0309176209047357, score=-1761.8391485780185, total=  14.7s
[CV] alpha=2.4244620170823286 ........................................
[CV]  alpha=2.0309176209047357, score=-1659.9397849934749, total=  18.2s
[CV] alpha=2.4244620170823286 .....................................

[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:  1.8min


[CV]  alpha=2.0309176209047357, score=-1772.3574198897672, total=  13.4s
[CV] alpha=2.4244620170823286 ........................................
[CV]  alpha=2.4244620170823286, score=-1741.2927057181328, total=  14.1s
[CV] alpha=2.4244620170823286 ........................................
[CV]  alpha=2.4244620170823286, score=-1745.980707756137, total=  10.5s
[CV] alpha=2.894266124716751 .........................................
[CV]  alpha=2.4244620170823286, score=-1761.7858032820625, total=  11.7s
[CV] alpha=2.894266124716751 .........................................
[CV]  alpha=2.4244620170823286, score=-1772.2962130050164, total=  12.5s
[CV] alpha=2.894266124716751 .........................................
[CV]  alpha=2.894266124716751, score=-1741.2441536393517, total=  11.9s
[CV] alpha=2.894266124716751 .........................................
[CV]  alpha=2.4244620170823286, score=-1659.8864798752163, total=  15.2s
[CV] alpha=2.894266124716751 ....................................

[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  2.2min


[CV]  alpha=2.894266124716751, score=-1772.2275922691547, total=  12.5s
[CV] alpha=3.45510729459222 ..........................................
[CV]  alpha=2.894266124716751, score=-1659.825662936309, total=  14.6s
[CV] alpha=3.45510729459222 ..........................................
[CV]  alpha=3.45510729459222, score=-1741.1893791031557, total=  12.9s
[CV] alpha=3.45510729459222 ..........................................
[CV]  alpha=3.45510729459222, score=-1745.8530579778171, total=  12.2s
[CV] alpha=4.1246263829013525 ........................................
[CV] . alpha=3.45510729459222, score=-1761.664623528303, total=  12.6s
[CV] alpha=4.1246263829013525 ........................................
[CV]  alpha=3.45510729459222, score=-1659.7571239475262, total=  14.9s
[CV] alpha=4.1246263829013525 ........................................
[CV]  alpha=3.45510729459222, score=-1772.1511417293816, total=  13.7s
[CV] alpha=4.1246263829013525 ........................................

[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:  2.7min


[CV]  alpha=4.1246263829013525, score=-1761.5955472440187, total=  13.5s
[CV] alpha=4.923882631706739 .........................................
[CV]  alpha=4.1246263829013525, score=-1659.6798895607444, total=  15.8s
[CV] alpha=4.923882631706739 .........................................
[CV]  alpha=4.923882631706739, score=-1741.0561822455259, total=  11.3s
[CV] alpha=4.923882631706739 .........................................
[CV]  alpha=4.1246263829013525, score=-1772.065830307393, total=  13.6s
[CV] alpha=4.923882631706739 .........................................
[CV]  alpha=4.923882631706739, score=-1745.6931288921123, total=   9.4s
[CV] alpha=5.878016072274914 .........................................
[CV]  alpha=4.923882631706739, score=-1761.520421288996, total=   9.9s
[CV] alpha=5.878016072274914 .........................................
[CV]  alpha=4.923882631706739, score=-1771.9720430978348, total=   9.1s
[CV] alpha=5.878016072274914 ........................................

[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed:  3.2min


[CV]  alpha=5.878016072274914, score=-1771.8732391921887, total=   8.4s
[CV] alpha=7.01703828670383 ..........................................
[CV]  alpha=5.878016072274914, score=-1659.4968228454886, total=   9.5s
[CV] alpha=7.01703828670383 ..........................................
[CV]  alpha=7.01703828670383, score=-1740.9058043377695, total=   6.9s
[CV] alpha=7.01703828670383 ..........................................
[CV]  alpha=7.01703828670383, score=-1745.5073523226197, total=   7.2s
[CV] alpha=8.37677640068292 ..........................................
[CV] . alpha=7.01703828670383, score=-1761.357089213321, total=   7.4s
[CV] alpha=8.37677640068292 ..........................................
[CV]  alpha=7.01703828670383, score=-1659.3935362366149, total=   7.9s
[CV] alpha=8.37677640068292 ..........................................
[CV] . alpha=7.01703828670383, score=-1771.755855026977, total=   7.6s
[CV] alpha=8.37677640068292 ..........................................

[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed:  3.5min


[CV] . alpha=8.37677640068292, score=-1771.648553705432, total=   7.9s
[CV] alpha=10.0 ......................................................
[CV] ............ alpha=10.0, score=-1740.7068835147768, total=   8.7s
[CV] alpha=10.0 ......................................................
[CV] ............ alpha=10.0, score=-1745.2795196733214, total=   7.8s
[CV] alpha=11.93776641714437 .........................................
[CV] ............ alpha=10.0, score=-1761.1748707363024, total=   7.7s
[CV] alpha=11.93776641714437 .........................................
[CV] ............ alpha=10.0, score=-1659.1490201447527, total=   7.4s
[CV] alpha=11.93776641714437 .........................................
[CV] ............ alpha=10.0, score=-1771.5222932846846, total=   6.8s
[CV] alpha=11.93776641714437 .........................................
[CV] . alpha=11.93776641714437, score=-1740.60341802983, total=   7.1s
[CV] alpha=11.93776641714437 .........................................

[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed:  4.0min


[CV]  alpha=14.251026703029986, score=-1658.8758847836825, total=   8.2s
[CV] alpha=17.012542798525892 ........................................
[CV]  alpha=14.251026703029986, score=-1760.9764879816275, total=   8.6s
[CV] alpha=17.012542798525892 ........................................
[CV]  alpha=14.251026703029986, score=-1771.2604741974665, total=   7.0s
[CV] alpha=17.012542798525892 ........................................
[CV]  alpha=17.012542798525892, score=-1740.3613292172995, total=   7.6s
[CV] alpha=17.012542798525892 ........................................
[CV]  alpha=17.012542798525892, score=-1744.897206072317, total=   8.5s
[CV] alpha=20.309176209047358 ........................................
[CV]  alpha=17.012542798525892, score=-1760.8889138533025, total=   9.1s
[CV] alpha=20.309176209047358 ........................................
[CV]  alpha=17.012542798525892, score=-1658.7330786487398, total=   9.6s
[CV] alpha=20.309176209047358 ..................................

[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed:  4.5min


[CV]  alpha=24.244620170823282, score=-1740.1108515297747, total=   8.1s
[CV] alpha=24.244620170823282 ........................................
[CV]  alpha=24.244620170823282, score=-1744.6344014546949, total=   8.0s
[CV] alpha=28.942661247167518 ........................................
[CV]  alpha=24.244620170823282, score=-1760.7025331073216, total=   7.0s
[CV] alpha=28.942661247167518 ........................................
[CV]  alpha=24.244620170823282, score=-1658.419464507047, total=   7.9s
[CV] alpha=28.942661247167518 ........................................
[CV]  alpha=24.244620170823282, score=-1770.8547815614852, total=   8.5s
[CV] alpha=28.942661247167518 ........................................
[CV]  alpha=28.942661247167518, score=-1739.9842989052997, total=   8.2s
[CV] alpha=28.942661247167518 ........................................
[CV]  alpha=28.942661247167518, score=-1744.4999690381208, total=   8.5s
[CV] alpha=34.5510729459222 ....................................

[Parallel(n_jobs=-1)]: Done 105 tasks      | elapsed:  5.0min


[CV]  alpha=41.24626382901352, score=-1739.7244650854116, total=   5.7s
[CV] alpha=41.24626382901352 .........................................
[CV]  alpha=41.24626382901352, score=-1744.2498389663813, total=   6.0s
[CV] alpha=49.238826317067414 ........................................
[CV]  alpha=41.24626382901352, score=-1760.4251109305794, total=   6.6s
[CV] alpha=49.238826317067414 ........................................
[CV]  alpha=41.24626382901352, score=-1657.9454237538148, total=   6.1s
[CV] alpha=49.238826317067414 ........................................
[CV]  alpha=41.24626382901352, score=-1770.4655709817027, total=   5.9s
[CV] alpha=49.238826317067414 ........................................
[CV]  alpha=49.238826317067414, score=-1739.602766379446, total=   5.7s
[CV] alpha=49.238826317067414 ........................................
[CV]  alpha=49.238826317067414, score=-1744.132268823593, total=   6.6s
[CV] alpha=58.78016072274915 .........................................

[Parallel(n_jobs=-1)]: Done 120 tasks      | elapsed:  5.3min


[CV]  alpha=70.1703828670383, score=-1739.3638714799526, total=   6.0s
[CV] alpha=70.1703828670383 ..........................................
[CV]  alpha=70.1703828670383, score=-1657.5088787661762, total=   6.5s
[CV] alpha=83.7677640068292 ..........................................
[CV] . alpha=70.1703828670383, score=-1743.933258972449, total=   8.1s
[CV] alpha=83.7677640068292 ..........................................
[CV]  alpha=70.1703828670383, score=-1770.1424995010736, total=   6.0s
[CV] alpha=83.7677640068292 ..........................................
[CV]  alpha=70.1703828670383, score=-1760.1985748562586, total=  10.6s
[CV] alpha=83.7677640068292 ..........................................
[CV]  alpha=83.7677640068292, score=-1739.2590776243376, total=   6.4s
[CV] alpha=83.7677640068292 ..........................................
[CV] . alpha=83.7677640068292, score=-1743.851610575381, total=   9.6s
[CV] alpha=100.0 .....................................................

[Parallel(n_jobs=-1)]: Done 137 tasks      | elapsed:  6.1min


[CV]  alpha=119.3776641714437, score=-1657.1557110454205, total=  10.0s
[CV] alpha=142.51026703029993 ........................................
[CV]  alpha=119.3776641714437, score=-1760.0313962078903, total=  12.7s
[CV] alpha=142.51026703029993 ........................................
[CV]  alpha=119.3776641714437, score=-1769.922703445882, total=   9.8s
[CV] alpha=142.51026703029993 ........................................
[CV]  alpha=142.51026703029993, score=-1738.9974335635234, total=   9.1s
[CV] alpha=142.51026703029993 ........................................
[CV]  alpha=142.51026703029993, score=-1743.6870900449298, total=  10.6s
[CV] alpha=170.1254279852589 .........................................
[CV]  alpha=142.51026703029993, score=-1759.9909331055123, total=  12.3s
[CV] alpha=170.1254279852589 .........................................
[CV]  alpha=142.51026703029993, score=-1657.0634797176194, total=  11.0s
[CV] alpha=170.1254279852589 ......................................

[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  6.8min


[CV]  alpha=203.09176209047368, score=-1769.8367180570492, total=  10.2s
[CV] alpha=242.44620170823282 ........................................
[CV]  alpha=242.44620170823282, score=-1738.8370377552299, total=   9.3s
[CV] alpha=242.44620170823282 ........................................
[CV]  alpha=242.44620170823282, score=-1743.6409506376474, total=   7.7s
[CV] alpha=289.4266124716752 .........................................
[CV]  alpha=242.44620170823282, score=-1759.921049879285, total=   8.0s
[CV] alpha=289.4266124716752 .........................................
[CV]  alpha=242.44620170823282, score=-1656.8757479195692, total=   7.5s
[CV] alpha=289.4266124716752 .........................................
[CV]  alpha=242.44620170823282, score=-1769.8406368495848, total=   7.5s
[CV] alpha=289.4266124716752 .........................................
[CV]  alpha=289.4266124716752, score=-1738.8089439228452, total=   8.6s
[CV] alpha=289.4266124716752 ....................................

[Parallel(n_jobs=-1)]: Done 173 tasks      | elapsed:  7.6min


[CV]  alpha=412.4626382901352, score=-1656.8323131545392, total=  11.0s
[CV] alpha=492.3882631706742 .........................................
[CV]  alpha=412.4626382901352, score=-1769.9559840445395, total=  10.7s
[CV] alpha=492.3882631706742 .........................................
[CV]  alpha=492.3882631706742, score=-1738.810682076626, total=   9.8s
[CV] alpha=492.3882631706742 .........................................
[CV]  alpha=492.3882631706742, score=-1743.735844241921, total=   8.4s
[CV] alpha=587.8016072274912 .........................................
[CV]  alpha=492.3882631706742, score=-1759.9697022739174, total=   7.7s
[CV] alpha=587.8016072274912 .........................................
[CV]  alpha=492.3882631706742, score=-1656.8520946416127, total=   7.1s
[CV] alpha=587.8016072274912 .........................................
[CV]  alpha=492.3882631706742, score=-1770.0311927616228, total=   7.0s
[CV] alpha=587.8016072274912 .........................................

[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  8.1min


[CV]  alpha=837.6776400682925, score=-1760.1559255485377, total=   6.9s
[CV] alpha=1000.0 ....................................................
[CV]  alpha=837.6776400682925, score=-1657.0277567019732, total=   6.9s
[CV] alpha=1000.0 ....................................................
[CV]  alpha=837.6776400682925, score=-1770.3863923830131, total=   5.5s
[CV] alpha=1000.0 ....................................................
[CV] ........... alpha=1000.0, score=-1739.070949846634, total=   5.2s
[CV] alpha=1000.0 ....................................................
[CV] .......... alpha=1000.0, score=-1744.0024554641438, total=   5.0s
[CV] .......... alpha=1000.0, score=-1760.2611977588517, total=   4.6s
[CV] .......... alpha=1000.0, score=-1770.5568402013191, total=   3.4s
[CV] .......... alpha=1000.0, score=-1657.1321726361016, total=   4.0s


[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed:  8.2min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid={'alpha': array([   1.     ,    1.19378,    1.4251 ,    1.70125,    2.03092,
          2.42446,    2.89427,    3.45511,    4.12463,    4.92388,
          5.87802,    7.01704,    8.37678,   10.     ,   11.93777,
         14.25103,   17.01254,   20.30918,   24.24462,   28.94266,
         34....42661,  345.51073,  412.46264,
        492.38826,  587.80161,  701.70383,  837.67764, 1000.     ])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring='neg_mean_squared_error', verbose=10)

In [354]:
print(ridge.best_params_)

{'alpha': 289.4266124716752}
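The scores printed in the log are negated mean squared errors, so they are not directly comparable to the RMSE values reported for the test set. As a small worked example, the five fold scores logged for alpha=1.0 can be converted to a cross-validated RMSE as follows (the variable names are illustrative):

```python
import numpy as np

# Fold scores for alpha=1.0, copied from the log above (negated MSE).
fold_scores_alpha_1 = np.array([
    -1746.188577282866,
    -1761.9962556887713,
    -1741.4571957816775,
    -1660.091571398308,
    -1772.5351573899452,
])

mean_mse = -fold_scores_alpha_1.mean()  # flip the sign back to a plain MSE
cv_rmse = np.sqrt(mean_mse)             # same unit as the target ARR_DELAY
print("CV RMSE for alpha=1.0: %.2f" % cv_rmse)
```

The same conversion applies to `ridge.best_score_`, which holds the mean negated MSE of the selected alpha.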


In [370]:
# Predict on the test set
y_pred_ridge = ridge.predict(X_test)

# Evaluate
print("RMSE : %.2f" % np.sqrt(metrics.mean_squared_error(y_test, y_pred_ridge)))

RMSE : 41.81


With Ridge regression (alpha ≈ 289.4), the test RMSE improves only marginally, to 41.81 against 41.84 for the unregularized linear regression. Let's now try Lasso regression.
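To put the three test scores obtained so far side by side, the short sketch below computes each model's relative gain over the baseline from the RMSE values printed above. The dictionary keys are labels chosen here for readability.

```python
# Test RMSE values reported in the cells above.
rmse_by_model = {
    "baseline (mean)": 42.51,
    "linear regression": 41.84,
    "ridge": 41.81,
}

baseline_rmse = rmse_by_model["baseline (mean)"]

# Relative reduction in RMSE with respect to the baseline, in percent.
relative_gain = {
    name: 100.0 * (baseline_rmse - rmse) / baseline_rmse
    for name, rmse in rmse_by_model.items()
}

for name, gain in relative_gain.items():
    print("%-18s RMSE %.2f  gain vs baseline %.2f%%" % (name, rmse_by_model[name], gain))
```

Both linear models improve on the baseline by less than 2 %, which suggests that the current features explain only a small share of the variance in arrival delay.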

## Lasso

In [356]:
lasso = model_selection.GridSearchCV(Lasso(), cv=5,
                                     param_grid={"alpha": np.logspace(-2.5, 0, 40)},
                                     verbose=10, n_jobs=-1,
                                     scoring='neg_mean_squared_error')


In [357]:
lasso.fit(X_train, y_train)

Fitting 5 folds for each of 40 candidates, totalling 200 fits
[CV] alpha=0.0031622776601683794 .....................................
[CV] alpha=0.0031622776601683794 .....................................
[CV] alpha=0.0031622776601683794 .....................................
[CV] alpha=0.0031622776601683794 .....................................
[CV]  alpha=0.0031622776601683794, score=-1738.909255212816, total= 5.1min
[CV] alpha=0.0031622776601683794 .....................................
[CV]  alpha=0.0031622776601683794, score=-1744.071871698372, total= 5.1min
[CV] alpha=0.003665241237079626 ......................................
[CV]  alpha=0.0031622776601683794, score=-1760.4118193370928, total= 5.4min
[CV] alpha=0.003665241237079626 ......................................
[CV]  alpha=0.0031622776601683794, score=-1657.066660697359, total= 5.5min
[CV] alpha=0.003665241237079626 ......................................
[CV]  alpha=0.003665241237079626, score=-1738.9266732470417, total= 3

[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:  9.0min


[CV]  alpha=0.003665241237079626, score=-1744.151953004442, total= 3.8min
[CV] alpha=0.003665241237079626 ......................................
[CV]  alpha=0.003665241237079626, score=-1760.4521831005793, total= 4.2min
[CV] alpha=0.004248201698162612 ......................................
[CV]  alpha=0.0031622776601683794, score=-1770.2594337224925, total= 5.1min
[CV] alpha=0.004248201698162612 ......................................
[CV]  alpha=0.004248201698162612, score=-1738.9505047136633, total= 3.6min
[CV] alpha=0.004248201698162612 ......................................
[CV]  alpha=0.003665241237079626, score=-1657.0892439864697, total= 4.5min
[CV] alpha=0.004248201698162612 ......................................


[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed: 13.5min


[CV]  alpha=0.004248201698162612, score=-1744.2817380730698, total= 3.7min
[CV] alpha=0.004248201698162612 ......................................
[CV]  alpha=0.003665241237079626, score=-1770.2958117537837, total= 4.7min
[CV] alpha=0.004923882631706742 ......................................
[CV]  alpha=0.004923882631706742, score=-1739.002410242683, total= 3.3min
[CV] alpha=0.004923882631706742 ......................................
[CV]  alpha=0.004248201698162612, score=-1760.4926543679446, total= 4.4min
[CV] alpha=0.004923882631706742 ......................................
[CV]  alpha=0.004248201698162612, score=-1657.1710444062426, total= 4.3min
[CV] alpha=0.004923882631706742 ......................................
[CV]  alpha=0.004248201698162612, score=-1770.3594082978134, total= 4.3min
[CV] alpha=0.004923882631706742 ......................................
[CV]  alpha=0.004923882631706742, score=-1744.3563118350844, total= 3.1min
[CV] alpha=0.005707031326057167 ..................

[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 20.5min


[CV]  alpha=0.004923882631706742, score=-1657.3056770447224, total= 3.4min
[CV] alpha=0.005707031326057167 ......................................
[CV]  alpha=0.004923882631706742, score=-1760.5403768208112, total= 3.5min
[CV] alpha=0.005707031326057167 ......................................
[CV]  alpha=0.004923882631706742, score=-1770.4813589044056, total= 3.5min
[CV] alpha=0.005707031326057167 ......................................
[CV]  alpha=0.005707031326057167, score=-1739.061865573281, total= 2.9min
[CV] alpha=0.005707031326057167 ......................................
[CV]  alpha=0.005707031326057167, score=-1744.4558627291683, total= 2.9min
[CV] alpha=0.006614740641230145 ......................................
[CV]  alpha=0.005707031326057167, score=-1760.6164103034005, total= 3.1min
[CV] alpha=0.006614740641230145 ......................................
[CV]  alpha=0.005707031326057167, score=-1657.4495937097918, total= 3.0min
[CV] alpha=0.006614740641230145 ..................

[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed: 24.7min


[CV]  alpha=0.005707031326057167, score=-1770.6326353369427, total= 3.1min
[CV] alpha=0.006614740641230145 ......................................
[CV]  alpha=0.006614740641230145, score=-1739.1549251493543, total= 2.5min
[CV] alpha=0.006614740641230145 ......................................
[CV]  alpha=0.006614740641230145, score=-1744.5903208423545, total= 2.5min
[CV] alpha=0.007666822074546214 ......................................
[CV]  alpha=0.006614740641230145, score=-1760.699532586115, total= 2.7min
[CV] alpha=0.007666822074546214 ......................................
[CV]  alpha=0.006614740641230145, score=-1657.5879848157188, total= 2.5min
[CV] alpha=0.007666822074546214 ......................................
[CV]  alpha=0.006614740641230145, score=-1770.8262670468348, total= 2.5min
[CV] alpha=0.007666822074546214 ......................................
[CV]  alpha=0.007666822074546214, score=-1739.2785881848151, total= 2.3min
[CV] alpha=0.007666822074546214 ..................

[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed: 31.1min


[CV]  alpha=0.007666822074546214, score=-1771.0127003492362, total= 2.0min
[CV] alpha=0.008886238162743407 ......................................
[CV]  alpha=0.007666822074546214, score=-1760.8029360027224, total= 2.3min
[CV] alpha=0.008886238162743407 ......................................
[CV]  alpha=0.008886238162743407, score=-1739.4424344187546, total= 2.4min
[CV] alpha=0.008886238162743407 ......................................
[CV]  alpha=0.008886238162743407, score=-1744.8543640319274, total= 1.7min
[CV] alpha=0.010299603658099898 ......................................
[CV]  alpha=0.008886238162743407, score=-1657.8908459831625, total= 1.8min
[CV] alpha=0.010299603658099898 ......................................
[CV]  alpha=0.008886238162743407, score=-1760.9386397784021, total= 2.2min
[CV] alpha=0.010299603658099898 ......................................
[CV]  alpha=0.008886238162743407, score=-1771.2326695870925, total= 1.8min
[CV] alpha=0.010299603658099898 .................

[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed: 35.1min


[CV]  alpha=0.010299603658099898, score=-1761.112119197692, total= 2.0min
[CV] alpha=0.011937766417144363 ......................................
[CV]  alpha=0.010299603658099898, score=-1739.6607216299337, total= 2.8min
[CV] alpha=0.011937766417144363 ......................................
[CV]  alpha=0.010299603658099898, score=-1771.461999855261, total= 1.5min
[CV] alpha=0.011937766417144363 ......................................
[CV]  alpha=0.011937766417144363, score=-1745.241714553726, total= 1.3min
[CV] alpha=0.011937766417144363 ......................................
[CV]  alpha=0.011937766417144363, score=-1739.907756632256, total= 1.8min
[CV] alpha=0.013836480680324595 ......................................
[CV]  alpha=0.011937766417144363, score=-1761.2847579483728, total= 2.0min
[CV] alpha=0.013836480680324595 ......................................
[CV]  alpha=0.011937766417144363, score=-1658.309487912043, total= 1.8min
[CV] alpha=0.013836480680324595 ......................

[Parallel(n_jobs=-1)]: Done  53 tasks      | elapsed: 39.4min


[CV]  alpha=0.013836480680324595, score=-1658.5542812869596, total= 1.2min
[CV] alpha=0.0160371874375133 ........................................
[CV]  alpha=0.013836480680324595, score=-1771.9865690162546, total= 1.2min
[CV] alpha=0.0160371874375133 ........................................
[CV]  alpha=0.0160371874375133, score=-1740.4753701490115, total= 1.5min
[CV] alpha=0.0160371874375133 ........................................
[CV]  alpha=0.0160371874375133, score=-1745.6890920187957, total=  58.4s
[CV] alpha=0.018587918911465634 ......................................
[CV]  alpha=0.0160371874375133, score=-1761.7407838859353, total= 1.3min
[CV] alpha=0.018587918911465634 ......................................
[CV]  alpha=0.0160371874375133, score=-1658.8269589506635, total=  57.6s
[CV] alpha=0.018587918911465634 ......................................
[CV]  alpha=0.0160371874375133, score=-1772.30125433871, total= 1.1min
[CV] alpha=0.018587918911465634 .............................

[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed: 42.0min


[CV]  alpha=0.018587918911465634, score=-1772.6844969163055, total=  55.6s
[CV] alpha=0.021544346900318832 ......................................
[CV]  alpha=0.021544346900318832, score=-1746.2516909181243, total=  47.3s
[CV] alpha=0.021544346900318832 ......................................
[CV]  alpha=0.021544346900318832, score=-1741.2415180475684, total= 1.3min
[CV] alpha=0.02497099786006541 .......................................
[CV]  alpha=0.021544346900318832, score=-1762.3858741408542, total= 1.1min
[CV] alpha=0.02497099786006541 .......................................
[CV]  alpha=0.021544346900318832, score=-1659.530001535019, total=  44.5s
[CV] alpha=0.02497099786006541 .......................................
[CV]  alpha=0.021544346900318832, score=-1773.1451800645323, total=  51.1s
[CV] alpha=0.02497099786006541 .......................................
[CV]  alpha=0.02497099786006541, score=-1746.5955626747038, total=  43.0s
[CV] alpha=0.02497099786006541 ....................

[Parallel(n_jobs=-1)]: Done  77 tasks      | elapsed: 44.6min


[CV]  alpha=0.028942661247167503, score=-1763.3202750935563, total=  38.2s
[CV] alpha=0.033546021859540434 ......................................
[CV]  alpha=0.028942661247167503, score=-1660.5623131846671, total=  34.6s
[CV] alpha=0.033546021859540434 ......................................
[CV]  alpha=0.033546021859540434, score=-1743.0464304162026, total=  33.4s
[CV] alpha=0.033546021859540434 ......................................
[CV]  alpha=0.028942661247167503, score=-1774.2433502017846, total=  37.2s
[CV] alpha=0.033546021859540434 ......................................
[CV]  alpha=0.033546021859540434, score=-1747.5408688243722, total=  32.1s
[CV] alpha=0.03888155180308087 .......................................
[CV]  alpha=0.033546021859540434, score=-1763.9752748478509, total=  37.1s
[CV] alpha=0.03888155180308087 .......................................
[CV]  alpha=0.033546021859540434, score=-1661.2643710582659, total=  35.4s
[CV] alpha=0.03888155180308087 ..................

[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 46.5min


[CV] alpha=0.04506570337745473 .......................................
[CV]  alpha=0.04506570337745473, score=-1744.9072013144257, total=  30.9s
[CV] alpha=0.04506570337745473 .......................................
[CV]  alpha=0.04506570337745473, score=-1748.9949078251723, total=  30.6s
[CV] alpha=0.05223345074266841 .......................................
[CV]  alpha=0.04506570337745473, score=-1765.8981516540714, total=  30.3s
[CV] alpha=0.05223345074266841 .......................................
[CV]  alpha=0.04506570337745473, score=-1663.0990333226237, total=  32.6s
[CV] alpha=0.05223345074266841 .......................................
[CV]  alpha=0.04506570337745473, score=-1777.1125220346053, total=  28.0s
[CV] alpha=0.05223345074266841 .......................................
[CV]  alpha=0.05223345074266841, score=-1749.99019685844, total=  28.3s
[CV] alpha=0.05223345074266841 .......................................
[CV]  alpha=0.05223345074266841, score=-1746.1785770293836, t

[Parallel(n_jobs=-1)]: Done 105 tasks      | elapsed: 48.4min


[CV]  alpha=0.07017038286703826, score=-1749.5525335815446, total=  31.7s
[CV] alpha=0.07017038286703826 .......................................
[CV]  alpha=0.07017038286703826, score=-1770.4601486537858, total=  27.7s
[CV] alpha=0.08133105582266928 .......................................
[CV]  alpha=0.07017038286703826, score=-1752.4591926131375, total=  29.6s
[CV] alpha=0.08133105582266928 .......................................
[CV]  alpha=0.07017038286703826, score=-1667.4333582898014, total=  29.8s
[CV] alpha=0.08133105582266928 .......................................
[CV]  alpha=0.07017038286703826, score=-1782.0869123647317, total=  25.4s
[CV] alpha=0.08133105582266928 .......................................
[CV]  alpha=0.08133105582266928, score=-1751.6556137683535, total=  25.9s
[CV] alpha=0.08133105582266928 .......................................
[CV]  alpha=0.08133105582266928, score=-1754.244596687828, total=  28.8s
[CV] alpha=0.09426684551178849 ..........................

[Parallel(n_jobs=-1)]: Done 120 tasks      | elapsed: 50.1min


[CV]  alpha=0.10926008611173779, score=-1757.1168537457277, total=  24.3s
[CV] alpha=0.10926008611173779 .......................................
[CV]  alpha=0.10926008611173779, score=-1758.8497134590204, total=  23.9s
[CV] alpha=0.1266380173467403 ........................................
[CV]  alpha=0.10926008611173779, score=-1778.043627240288, total=  22.4s
[CV] alpha=0.1266380173467403 ........................................
[CV]  alpha=0.10926008611173779, score=-1674.6520195913595, total=  24.1s
[CV] alpha=0.1266380173467403 ........................................
[CV]  alpha=0.10926008611173779, score=-1790.0056055457744, total=  22.4s
[CV] alpha=0.1266380173467403 ........................................
[CV]  alpha=0.1266380173467403, score=-1760.31008655325, total=  20.7s
[CV] alpha=0.1266380173467403 ........................................
[CV]  alpha=0.1266380173467403, score=-1761.6110018222644, total=  21.0s
[CV] alpha=0.1467799267622069 ...............................

[Parallel(n_jobs=-1)]: Done 137 tasks      | elapsed: 51.4min


[CV]  alpha=0.17012542798525884, score=-1789.7273275871744, total=  15.2s
[CV] alpha=0.19718405565126418 .......................................
[CV]  alpha=0.19718405565126418, score=-1771.8977232956152, total=  12.4s
[CV] alpha=0.19718405565126418 .......................................
[CV]  alpha=0.19718405565126418, score=-1772.090930047467, total=  12.7s
[CV] alpha=0.19718405565126418 .......................................
[CV]  alpha=0.17012542798525884, score=-1801.5659376316398, total=  15.8s
[CV] alpha=0.19718405565126418 .......................................
[CV]  alpha=0.17012542798525884, score=-1685.4676493344784, total=  17.7s
[CV] alpha=0.22854638641349895 .......................................
[CV]  alpha=0.19718405565126418, score=-1793.9431254926317, total=  10.7s
[CV] alpha=0.22854638641349895 .......................................
[CV]  alpha=0.22854638641349895, score=-1775.6845165580296, total=   9.0s
[CV] alpha=0.22854638641349895 ..........................

[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed: 52.2min


[CV]  alpha=0.2648969287610527, score=-1814.0314094469384, total=   6.7s
[CV] alpha=0.3070290629757848 ........................................
[CV]  alpha=0.3070290629757848, score=-1784.3079806911269, total=   6.1s
[CV] alpha=0.3070290629757848 ........................................
[CV]  alpha=0.3070290629757848, score=-1783.6663001875636, total=   6.5s
[CV]  alpha=0.3070290629757848, score=-1806.4571174572288, total=   6.4s
[CV] alpha=0.3558623573050966 ........................................
[CV] alpha=0.3558623573050966 ........................................
[CV]  alpha=0.3070290629757848, score=-1701.3506906570217, total=   6.4s
[CV] alpha=0.3558623573050966 ........................................
[CV]  alpha=0.3070290629757848, score=-1818.7140727891367, total=   6.4s
[CV] alpha=0.3558623573050966 ........................................
[CV]  alpha=0.3558623573050966, score=-1788.3982986187577, total=   5.1s
[CV]  alpha=0.3558623573050966, score=-1787.6169075862788, tota

[Parallel(n_jobs=-1)]: Done 173 tasks      | elapsed: 52.6min


[CV] alpha=0.5541020330009492 ........................................
[CV]  alpha=0.5541020330009492, score=-1798.7416900549003, total=   3.9s
[CV] alpha=0.5541020330009492 ........................................
[CV]  alpha=0.47806525330073807, score=-1830.6429191249542, total=   4.7s
[CV] alpha=0.5541020330009492 ........................................
[CV]  alpha=0.5541020330009492, score=-1796.8424549166245, total=   3.6s
[CV] alpha=0.6422325422229356 ........................................
[CV]  alpha=0.5541020330009492, score=-1821.8489929690359, total=   3.8s
[CV] alpha=0.6422325422229356 ........................................
[CV]  alpha=0.5541020330009492, score=-1715.3164107705443, total=   4.0s
[CV]  alpha=0.5541020330009492, score=-1833.8175728221017, total=   3.9s
[CV] alpha=0.6422325422229356 ........................................
[CV] alpha=0.6422325422229356 ........................................
[CV]  alpha=0.6422325422229356, score=-1801.1419017909789, total

[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed: 52.8min


[CV]  alpha=0.8627747685955867, score=-1824.9157948163459, total=   1.6s
[CV] alpha=1.0 .......................................................
[CV]  alpha=0.8627747685955867, score=-1718.5781530282738, total=   1.8s
[CV] alpha=1.0 .......................................................
[CV]  alpha=0.8627747685955867, score=-1836.9009433979772, total=   1.9s
[CV] .............. alpha=1.0, score=-1802.167830408348, total=   1.8s
[CV] alpha=1.0 .......................................................
[CV] alpha=1.0 .......................................................
[CV] ............. alpha=1.0, score=-1800.4635585651288, total=   1.6s
[CV] ............. alpha=1.0, score=-1824.9157948163459, total=   1.6s
[CV] ............. alpha=1.0, score=-1718.5781530282738, total=   1.6s
[CV] ............. alpha=1.0, score=-1836.9009433979772, total=   1.5s


[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 52.9min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid={'alpha': array([0.00316, 0.00367, 0.00425, 0.00492, 0.00571, 0.00661, 0.00767,
       0.00889, 0.0103 , 0.01194, 0.01384, 0.01604, 0.01859, 0.02154,
       0.02497, 0.02894, 0.03355, 0.03888, 0.04507, 0.05223, 0.06054,
       0.07017, 0.08133, 0.09427, 0.10926, 0.12664, 0.14678, 0.17013,
       0.19718, 0.22855, 0.2649 , 0.30703, 0.35586, 0.41246, 0.47807,
       0.5541 , 0.64223, 0.74438, 0.86277, 1.     ])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring='neg_mean_squared_error', verbose=10)

In [363]:
print(lasso.best_params_)

{'alpha': 0.0031622776601683794}
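
The selected `alpha` is the smallest value in `param_grid`, and the scores visible in the log improve steadily as `alpha` decreases, so the optimum may lie below the range that was searched. The sketch below checks whether a selected value sits on the edge of a grid. It assumes the grid was 40 log-spaced values from 10^-2.5 to 1, which matches the printed `param_grid`. The helpers `log_grid` and `on_grid_edge` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import math

def log_grid(start_exp, stop_exp, num):
    """Return `num` values evenly spaced on a log10 scale (similar to np.logspace)."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10 ** (start_exp + i * step) for i in range(num)]

def on_grid_edge(best, grid, rel_tol=1e-9):
    """True if `best` equals the smallest or the largest value of `grid`."""
    return (math.isclose(best, min(grid), rel_tol=rel_tol)
            or math.isclose(best, max(grid), rel_tol=rel_tol))

# Assumed grid: 40 values from 10**-2.5 to 1, consistent with the printed param_grid.
alphas = log_grid(-2.5, 0.0, 40)
best_alpha = 0.0031622776601683794  # value printed by lasso.best_params_

edge = on_grid_edge(best_alpha, alphas)
print("best alpha on grid edge:", edge)
```

If the check is true, one option is to rerun the search with a grid that extends to smaller values. Given the fit times in the log, which grow as `alpha` shrinks, that rerun would likely be slow.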


In [371]:
# Predict on the test set
y_pred_lasso = lasso.predict(X_test)

# Evaluate
print("RMSE : %.2f" % np.sqrt(metrics.mean_squared_error(y_test, y_pred_lasso)))

RMSE : 41.81
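
The grid search uses `scoring='neg_mean_squared_error'`, so the scores in the log are negated MSE values, while the test metric above is an RMSE. As a rough cross-check, the sketch below converts the five fold scores logged for `alpha=0.0160371874375133` into a cross-validated RMSE. The variable names are illustrative, and the choice of this particular `alpha` is arbitrary (it is one of the values whose five folds all appear in the log).

```python
import math

# Fold scores copied from the log for alpha=0.0160371874375133 (negated MSE).
neg_mse_scores = [
    -1740.4753701490115,
    -1745.6890920187957,
    -1761.7407838859353,
    -1658.8269589506635,
    -1772.30125433871,
]

# Flip the sign to recover the MSE of each fold, then average over folds.
mean_mse = sum(-s for s in neg_mse_scores) / len(neg_mse_scores)

# RMSE is the square root of the mean squared error.
cv_rmse = math.sqrt(mean_mse)
print("CV RMSE : %.2f" % cv_rmse)
```

The result lands between 41 and 42, the same order of magnitude as the test RMSE of 41.81, which suggests the model generalises about as well on the test set as it did during cross-validation.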


The RMSE is the same as for the Ridge regression, so the two models perform similarly on the test set.

## Final model: Ridge regression

The final model chosen is therefore the Ridge regression. On this sample, Ridge and Lasso share the lowest RMSE (41.81), and when the same tests were run on the full dataset, the Ridge regression performed better.