# Penalized linear regression

In this notebook, you will learn how to use linear regression with an L1 penalty (Lasso), an L2 penalty (Ridge), and a combined L1 + L2 penalty (ElasticNet).

These penalties are added to the cost function. They shrink the coefficients, which helps you train less complex models that are less prone to overfitting.

# Importing the packages

In [2]:
import pandas as pd
import seaborn as sns
# Loader for the Boston housing dataset used in the regression example.
# Note: load_boston was removed in scikit-learn 1.2, so this notebook
# needs an earlier version of scikit-learn.
from sklearn.datasets import load_boston

# StandardScaler standardizes the features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler

# train_test_split randomly splits the data into a training set and a test set
from sklearn.model_selection import train_test_split

# Linear regression and its penalized variants
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet

# Performance metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# matplotlib for creating plots
import matplotlib.pyplot as plt

# NumPy for working with vectors, matrices, and tensors
import numpy as np

# Loading the data

In [3]:
# Dataset for our regression example
boston = load_boston()
boston.keys()

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

In [4]:
boston['data'].shape

(506, 13)

In [5]:
boston['feature_names']


array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

In [6]:
BostonData = pd.DataFrame(boston['data'])
BostonData.head()

         0     1     2    3      4      5     6       7    8      9    10      11    12
0  0.00632  18.0  2.31  0.0  0.538  6.575  65.2    4.09  1.0  296.0  15.3   396.9  4.98
1  0.02731   0.0  7.07  0.0  0.469  6.421  78.9  4.9671  2.0  242.0  17.8   396.9  9.14
2  0.02729   0.0  7.07  0.0  0.469  7.185  61.1  4.9671  2.0  242.0  17.8  392.83  4.03
3  0.03237   0.0  2.18  0.0  0.458  6.998  45.8  6.0622  3.0  222.0  18.7  394.63  2.94
4  0.06905   0.0  2.18  0.0  0.458  7.147  54.2  6.0622  3.0  222.0  18.7   396.9  5.33


In [7]:
BostonData = pd.DataFrame(boston['data'],columns=boston['feature_names'])
BostonData.head(5)

      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2    4.09  1.0  296.0     15.3   396.9   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8   396.9   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7   396.9   5.33


In [8]:
BostonData.shape

(506, 13)

In [10]:
print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

In [None]:
BostonData['MEDV'] = boston.target

In [None]:
boston['target'].shape

In [None]:
BostonData.isnull().sum()

In [None]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
# histplot with kde=True replaces the deprecated distplot (requires seaborn >= 0.11)
sns.histplot(BostonData['MEDV'], bins=30, kde=True)
plt.show()

In [None]:
sns.scatterplot(x='LSTAT', y='MEDV', data=BostonData)
plt.show()

In [None]:

correlation_matrix = BostonData.corr()
# annot = True to print the values inside the square
sns.heatmap(data=correlation_matrix, annot=True)

In [None]:
# First option: use all 13 features
X_reg = boston.data
y_reg = boston.target
X_reg.shape

In [None]:

# Second option (used in the rest of the notebook): keep only LSTAT and RM
X_reg = pd.DataFrame(np.c_[BostonData['LSTAT'], BostonData['RM']], columns = ['LSTAT','RM'])
y_reg = BostonData['MEDV']
X_reg.shape

If you have questions about the *train_test_split* function, see the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).

In [None]:
# Use the function train_test_split to create your train and test set
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, 
                                                                    test_size=0.10, 
                                                                    random_state=123)
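As a minimal sketch of what the split does, the example below applies `train_test_split` to a small toy array. The names `X_toy`, `y_toy`, `X_tr`, `X_te`, `y_tr`, and `y_te` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 20 samples, 2 features, one target value per sample
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.arange(20)

# Hold out 10% of the rows; random_state makes the shuffle reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.10, random_state=123
)

print(X_tr.shape, X_te.shape)
```

Each row ends up in exactly one of the two sets, and rerunning the cell with the same `random_state` gives the same split.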

# Standardizing the data

If you have questions about the *StandardScaler* class, see the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).

In [None]:
# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the StandardScaler on the training set only,
# so that no information from the test set leaks into the model
scaler.fit(X_train_reg)

# Standardization of the training set
X_train_reg_norm = scaler.transform(X_train_reg)

# Standardization of the test set, using the training mean and standard deviation
X_test_reg_norm = scaler.transform(X_test_reg)
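The sketch below illustrates what `transform` computes: each feature is centered with the training mean and divided by the training standard deviation. It uses randomly generated toy data, and the names `X_train_toy`, `X_test_toy`, `scaler_toy`, `mu`, and `sigma` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): features with mean around 5 and standard deviation around 2
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=5.0, scale=2.0, size=(50, 2))
X_test_toy = rng.normal(loc=5.0, scale=2.0, size=(10, 2))

scaler_toy = StandardScaler().fit(X_train_toy)

# Manual z-score with the training statistics
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses
X_test_manual = (X_test_toy - mu) / sigma

# Both computations should agree up to floating-point error
print(np.allclose(scaler_toy.transform(X_test_toy), X_test_manual))
```

The test set is scaled with the training statistics, so its own mean and standard deviation are generally close to, but not exactly, 0 and 1.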

# Initializing the models

In [None]:
reg = LinearRegression()

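Cost function of Lasso regression, where $m$ is the number of samples, $n$ the number of features, $w_j$ the coefficients, and $\alpha$ the penalty strength:

$ J(w) =  \frac{1}{2m}\left[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha\sum^n_{j=1}|w_j|\right]$ 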



In [None]:
lasso = Lasso(alpha=0.2, random_state=123)
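The sketch below evaluates a Lasso cost with NumPy on toy data. It follows the scaling that scikit-learn documents for `Lasso`, namely $\frac{1}{2m}\|y - Xw\|_2^2 + \alpha\|w\|_1$, where the penalty sits outside the $\frac{1}{2m}$ factor; with that convention, the numeric value of `alpha` is not directly comparable with an $\alpha$ placed inside the brackets. The helper `lasso_cost` and the toy variables are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def lasso_cost(X, y, w, b, alpha):
    """Lasso objective in scikit-learn's documented scaling (hypothetical helper)."""
    m = X.shape[0]
    residuals = X @ w + b - y
    return (residuals @ residuals) / (2 * m) + alpha * np.abs(w).sum()

# Toy data (hypothetical): the second feature has no effect on the target
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([3.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=100)

lasso_toy = Lasso(alpha=0.2).fit(X_toy, y_toy)
ols_toy = LinearRegression().fit(X_toy, y_toy)

# The Lasso fit should reach a lower penalized cost than the unpenalized fit
cost_lasso = lasso_cost(X_toy, y_toy, lasso_toy.coef_, lasso_toy.intercept_, alpha=0.2)
cost_ols = lasso_cost(X_toy, y_toy, ols_toy.coef_, ols_toy.intercept_, alpha=0.2)
print(cost_lasso, cost_ols)
```

Ordinary least squares minimizes only the squared-error term, so once the L1 penalty is added to the cost, the Lasso coefficients score better.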

Cost function of Ridge regression:

$ J(w) =  \frac{1}{2m}\left[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha\sum^n_{j=1}w_j^2\right]$ 



In [None]:
ridge = Ridge(alpha=0.2, random_state=123)
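Unlike Lasso, Ridge has a closed-form solution. The sketch below compares it with the coefficients found by `Ridge` on toy data. It assumes the objective that scikit-learn documents for `Ridge`, $\|y - Xw\|_2^2 + \alpha\|w\|_2^2$, which has no $\frac{1}{2m}$ factor, so a given `alpha` does not have the same strength in `Ridge` as in `Lasso`. The toy variables and `w_closed` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (hypothetical) with a non-zero intercept
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([3.0, 0.0, -1.0]) + 2.0 + rng.normal(scale=0.5, size=100)
alpha = 0.2

# Center the data so that the unpenalized intercept drops out of the problem
X_c = X_toy - X_toy.mean(axis=0)
y_c = y_toy - y_toy.mean()

# Closed-form minimizer of ||y - Xw||^2 + alpha * ||w||^2
w_closed = np.linalg.solve(X_c.T @ X_c + alpha * np.eye(X_c.shape[1]), X_c.T @ y_c)

ridge_toy = Ridge(alpha=alpha).fit(X_toy, y_toy)
print(w_closed)
print(ridge_toy.coef_)
```

The two coefficient vectors should agree up to numerical precision.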

Cost function of ElasticNet regression, where l1\_ratio (between 0 and 1) sets the balance between the L1 and L2 penalties:

$ J(w) =  \frac{1}{2m}\left[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha\left[\frac{1-l1\_ratio}{2}\sum^n_{j=1}w_j^2 + l1\_ratio\sum^n_{j=1}|w_j|\right]\right]$ 



In [None]:
elasticnet = ElasticNet(alpha=0.2, l1_ratio=0.5, random_state=123)
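The sketch below shows how `l1_ratio` mixes the two penalties. It assumes the scaling that scikit-learn documents for `ElasticNet`, $\frac{1}{2m}\|y - Xw\|_2^2 + \alpha \cdot l1\_ratio \cdot \|w\|_1 + \frac{\alpha (1 - l1\_ratio)}{2}\|w\|_2^2$, and checks that `l1_ratio=1.0` gives the same coefficients as `Lasso`. The helper `elasticnet_cost` and the toy variables are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

def elasticnet_cost(X, y, w, b, alpha, l1_ratio):
    """ElasticNet objective in scikit-learn's documented scaling (hypothetical helper)."""
    m = X.shape[0]
    residuals = X @ w + b - y
    l1_term = alpha * l1_ratio * np.abs(w).sum()
    l2_term = 0.5 * alpha * (1 - l1_ratio) * (w ** 2).sum()
    return (residuals @ residuals) / (2 * m) + l1_term + l2_term

# Toy data (hypothetical)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([3.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=100)

# With l1_ratio=1.0 the L2 term vanishes and ElasticNet reduces to Lasso
enet_as_lasso = ElasticNet(alpha=0.2, l1_ratio=1.0).fit(X_toy, y_toy)
lasso_toy = Lasso(alpha=0.2).fit(X_toy, y_toy)
print(np.allclose(enet_as_lasso.coef_, lasso_toy.coef_))

# With l1_ratio=0.5 both penalties contribute
enet_toy = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X_toy, y_toy)
cost_enet = elasticnet_cost(X_toy, y_toy, enet_toy.coef_, enet_toy.intercept_, 0.2, 0.5)
cost_lasso_coefs = elasticnet_cost(X_toy, y_toy, lasso_toy.coef_, lasso_toy.intercept_, 0.2, 0.5)
print(cost_enet, cost_lasso_coefs)
```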

# Training the models

In [None]:
# Classic linear regression
reg.fit(X_train_reg_norm, y_train_reg)

In [None]:
# Lasso regression
lasso.fit(X_train_reg_norm, y_train_reg)

In [None]:
# Ridge regression
ridge.fit(X_train_reg_norm, y_train_reg)

In [None]:
# ElasticNet regression
elasticnet.fit(X_train_reg_norm, y_train_reg)

# Evaluating the models

In [None]:
# Classic linear regression
y_train_reg_prediction = reg.predict(X_train_reg_norm)

y_test_reg_prediction = reg.predict(X_test_reg_norm)

In [None]:
# Lasso regression
y_train_lasso_prediction = lasso.predict(X_train_reg_norm)

y_test_lasso_prediction = lasso.predict(X_test_reg_norm)

In [None]:
# Ridge regression
y_train_ridge_prediction = ridge.predict(X_train_reg_norm)

y_test_ridge_prediction = ridge.predict(X_test_reg_norm)

In [None]:
# ElasticNet regression
y_train_elasticnet_prediction = elasticnet.predict(X_train_reg_norm)

y_test_elasticnet_prediction = elasticnet.predict(X_test_reg_norm)

Compute the mean absolute error (MAE) for each model.

In [None]:
# Classic linear regression
# scikit-learn metrics expect the arguments in the order (y_true, y_pred)
mae_train_reg = mean_absolute_error(y_train_reg, y_train_reg_prediction)

mae_test_reg = mean_absolute_error(y_test_reg, y_test_reg_prediction)

print('MAE for the training set : '+str(mae_train_reg))

print('MAE for the testing set : '+str(mae_test_reg))


# Model evaluation for the training set
rmse = np.sqrt(mean_squared_error(y_train_reg, y_train_reg_prediction))
r2 = r2_score(y_train_reg, y_train_reg_prediction)

print("The model performance for training set")
print("--------------------------------------")
print('RMSE is {}'.format(rmse))
print('R2 score is {}'.format(r2))
print("\n")

# Model evaluation for the testing set

rmse = np.sqrt(mean_squared_error(y_test_reg, y_test_reg_prediction))
r2 = r2_score(y_test_reg, y_test_reg_prediction)

print("The model performance for testing set")
print("--------------------------------------")
print('RMSE is {}'.format(rmse))
print('R2 score is {}'.format(r2))
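To make the three metrics concrete, the sketch below recomputes MAE, RMSE, and R2 by hand on a tiny made-up example and compares them with the scikit-learn functions. The arrays `y_true` and `y_pred` are hypothetical values, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (hypothetical)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred

# MAE: average absolute error
mae_manual = np.mean(np.abs(errors))

# RMSE: square root of the average squared error
rmse_manual = np.sqrt(np.mean(errors ** 2))

# R2: 1 minus (residual sum of squares / total sum of squares)
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, mean_absolute_error(y_true, y_pred))
print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_pred)))
print(r2_manual, r2_score(y_true, y_pred))
```

MAE and RMSE are expressed in the units of the target, while R2 is unitless. RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.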



In [None]:
# Lasso regression
mae_train_lasso = mean_absolute_error(y_train_reg, y_train_lasso_prediction)

mae_test_lasso = mean_absolute_error(y_test_reg, y_test_lasso_prediction)

print('MAE for the training set : '+str(mae_train_lasso))

print('MAE for the testing set : '+str(mae_test_lasso))

In [None]:
# Ridge regression
mae_train_ridge = mean_absolute_error(y_train_reg, y_train_ridge_prediction)

mae_test_ridge = mean_absolute_error(y_test_reg, y_test_ridge_prediction)

print('MAE for the training set : '+str(mae_train_ridge))

print('MAE for the testing set : '+str(mae_test_ridge))

In [None]:
# ElasticNet regression
mae_train_elasticnet = mean_absolute_error(y_train_reg, y_train_elasticnet_prediction)

mae_test_elasticnet = mean_absolute_error(y_test_reg, y_test_elasticnet_prediction)

print('MAE for the training set : '+str(mae_train_elasticnet))

print('MAE for the testing set : '+str(mae_test_elasticnet))

In [None]:
# Effect of alpha on the Lasso coefficients: larger values shrink them more
for alpha_value in [0.1, 0.2, 0.5, 1, 10]:
    lasso = Lasso(alpha=alpha_value, random_state=123)
    lasso.fit(X_train_reg_norm, y_train_reg)
    print('Alpha = '+str(alpha_value))
    print(lasso.coef_)

In [None]:
# Effect of alpha on the Ridge coefficients: larger values shrink them more
for alpha_value in [0.1, 1, 10, 10_000_000_000]:
    ridge = Ridge(alpha=alpha_value, random_state=123)
    ridge.fit(X_train_reg_norm, y_train_reg)
    print('Alpha = '+str(alpha_value))
    print(ridge.coef_)
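The two loops above can also be run on synthetic data, which is useful when `load_boston` is not available. The sketch below counts how many coefficients each model sets exactly to zero: for a large enough alpha, Lasso zeroes every coefficient, whereas Ridge only shrinks them toward zero. The toy data and the `zero_counts` dictionary are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data (hypothetical): only the first two of five features affect the target
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = 4.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.5, size=200)

# Number of coefficients exactly equal to zero, per model and per alpha
zero_counts = {"lasso": {}, "ridge": {}}
for alpha_value in [0.1, 1, 10]:
    lasso_toy = Lasso(alpha=alpha_value).fit(X_toy, y_toy)
    ridge_toy = Ridge(alpha=alpha_value).fit(X_toy, y_toy)
    zero_counts["lasso"][alpha_value] = int(np.sum(lasso_toy.coef_ == 0))
    zero_counts["ridge"][alpha_value] = int(np.sum(ridge_toy.coef_ == 0))

print(zero_counts)
```

This is why Lasso is often used for feature selection, while Ridge keeps every feature in the model with a reduced weight.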