# Ridge Regression - L2 Regularization

Notebook to demonstrate Ridge Regression 

Dataaset source: https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ 
Metadata:  https://archive.ics.uci.edu/ml/datasets/Energy+efficiency

Both Y1 and Y2 can be used as target variables

In [11]:
import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

In [6]:
enb = pd.read_excel('ENB2012_data.xlsx')

In [7]:
enb.describe()

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,0.764167,671.708333,318.5,176.604167,5.25,3.5,0.234375,2.8125,22.307195,24.58776
std,0.105777,88.086116,43.626481,45.16595,1.75114,1.118763,0.133221,1.55096,10.090204,9.513306
min,0.62,514.5,245.0,110.25,3.5,2.0,0.0,0.0,6.01,10.9
25%,0.6825,606.375,294.0,140.875,3.5,2.75,0.1,1.75,12.9925,15.62
50%,0.75,673.75,318.5,183.75,5.25,3.5,0.25,3.0,18.95,22.08
75%,0.83,741.125,343.0,220.5,7.0,4.25,0.4,4.0,31.6675,33.1325
max,0.98,808.5,416.5,220.5,7.0,5.0,0.4,5.0,43.1,48.03


In [8]:
enb.isna().any()

X1    False
X2    False
X3    False
X4    False
X5    False
X6    False
X7    False
X8    False
Y1    False
Y2    False
dtype: bool

In [9]:
enb.dtypes

X1    float64
X2    float64
X3    float64
X4    float64
X5    float64
X6      int64
X7    float64
X8      int64
Y1    float64
Y2    float64
dtype: object

In [19]:
#Select df columns with regex:
X = enb.filter(regex=("X.*"))
y = enb.filter(regex=("Y.*"))

In [20]:
X.describe()

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,0.764167,671.708333,318.5,176.604167,5.25,3.5,0.234375,2.8125
std,0.105777,88.086116,43.626481,45.16595,1.75114,1.118763,0.133221,1.55096
min,0.62,514.5,245.0,110.25,3.5,2.0,0.0,0.0
25%,0.6825,606.375,294.0,140.875,3.5,2.75,0.1,1.75
50%,0.75,673.75,318.5,183.75,5.25,3.5,0.25,3.0
75%,0.83,741.125,343.0,220.5,7.0,4.25,0.4,4.0
max,0.98,808.5,416.5,220.5,7.0,5.0,0.4,5.0


Scale features before creating Ridge model. Unscaled features would otherwise get penalized differently by the regularization term

In [21]:
scaler = MinMaxScaler()

In [23]:
X_train, X_test, y_train, y_test = train_test_split(X,y)

In [27]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

Higher alpha implies higher degree of regularization and lower model complexity i.e. sensitivity to overfitting

In [25]:
ridge = Ridge(alpha = 0.01)

In [29]:
ridge.fit(X_train_scaled, y_train)

Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [31]:
print('ridge regression linear model intercept: {}'
     .format(ridge.intercept_))
print('ridge regression linear model coeff:\n{}'
     .format(ridge.coef_))

ridge regression linear model intercept: [32.56721491 37.26474479]
ridge regression linear model coeff:
[[-23.92232896  -9.78678138   0.65071393 -13.55515268  13.54365854
   -0.27271132   8.25410372   1.50208146]
 [-25.31429275 -10.2645866   -1.71793954 -12.34994026  14.44576301
   -0.02577569   5.97931894   0.59718254]]


In [32]:
print('R-squared score (training): {:.3f}'
     .format(ridge.score(X_train_scaled, y_train)))
print('R-squared score (test): {:.3f}'
     .format(ridge.score(X_test_scaled, y_test)))

R-squared score (training): 0.906
R-squared score (test): 0.890


In [33]:
print('Number of non-zero features: {}'
     .format(np.sum(ridge.coef_ != 0)))

Number of non-zero features: 16


Now let's rerun with a higher alpha:

In [35]:
ridge = Ridge(alpha = 10)
ridge.fit(X_train_scaled, y_train)

Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [36]:
print('ridge regression linear model intercept: {}'
     .format(ridge.intercept_))
print('ridge regression linear model coeff:\n{}'
     .format(ridge.coef_))

ridge regression linear model intercept: [13.14938012 17.23758588]
ridge regression linear model coeff:
[[-1.63547305 -0.58877748  6.59505395 -5.91452305 11.77957282 -0.28546952
   7.15597565  1.36284426]
 [-1.69388823 -0.7838309   5.1915944  -5.08301463 11.73614802 -0.06265203
   5.17339795  0.52664284]]


In [37]:
print('R-squared score (training): {:.3f}'
     .format(ridge.score(X_train_scaled, y_train)))
print('R-squared score (test): {:.3f}'
     .format(ridge.score(X_test_scaled, y_test)))

R-squared score (training): 0.891
R-squared score (test): 0.872


In [38]:
print('Number of non-zero features: {}'
     .format(np.sum(ridge.coef_ != 0)))

Number of non-zero features: 16


Coefficients are much smaller