The initial steps remain the same: we import the necessary libraries, read the dataset, separate the dependent and independent variables, and split the dataset into training and test sets.

L2 Regularization (Ridge or Tikhonov Regularization):
- Just like L1 regularization, L2 regularization addresses overfitting by shrinking the model's coefficients so that they remain small.
- Unlike L1, which can shrink some coefficients to exactly zero, L2 pushes the coefficients toward zero without making them exactly zero, so all the features stay in the model.
- With L2 regularization, we add the sum of the squares of all the coefficients to the loss function, weighted by the regularization parameter alpha.
- Added term: $\alpha \sum_{j=1}^{p} m_j^2$, where $m_j$ is the coefficient of the $j$-th feature; it penalizes the loss function for large coefficients.
- How does L2 work? L2 regularization addresses multicollinearity and reduces overfitting by shrinking coefficient magnitudes without excluding any features, i.e., no feature selection!
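To make the added penalty concrete, here is a minimal NumPy sketch on synthetic data (an assumption; the energy-efficiency CSV is not used). Without an intercept, minimizing $\lVert y - Xm \rVert^2 + \alpha \sum_j m_j^2$ has the closed-form solution $m = (X^\top X + \alpha I)^{-1} X^\top y$. The sketch checks this against scikit-learn's `Ridge(fit_intercept=False)` and shows the coefficient norm falling as alpha grows while no coefficient becomes exactly zero. The helper `ridge_closed_form` and the `_demo` variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 100 samples, 5 features, known true coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)


def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: minimise ||y - X m||^2 + alpha * sum(m_j^2), no intercept."""
    n_features = X.shape[1]
    # Setting the gradient to zero gives (X^T X + alpha * I) m = X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# The closed form agrees with scikit-learn's Ridge when no intercept is fitted
m_manual = ridge_closed_form(X_demo, y_demo, alpha=10.0)
m_sklearn = Ridge(alpha=10.0, fit_intercept=False).fit(X_demo, y_demo).coef_
print(np.allclose(m_manual, m_sklearn))

# Larger alpha -> smaller coefficient norm, yet every coefficient stays non-zero
for alpha in [0.0, 1.0, 10.0, 100.0]:
    m = ridge_closed_form(X_demo, y_demo, alpha)
    print(alpha, round(float(np.linalg.norm(m)), 3), np.count_nonzero(m))
```

The shrinkage is smooth rather than all-or-nothing, which is why L2 keeps every feature in the model.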

In [1]:
# import the necessary libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.linear_model import RidgeCV, Ridge

In [2]:
os.chdir('C:/Users/Admin/Downloads')

In [3]:
# read the dataset
df = pd.read_csv("Energy_Efficiency_Overfit_Dataset_Updated.csv")

In [4]:
# separate the dependent and independent variables
X = df.drop('Energy_Efficiency_Rating', axis = 1)
y = df['Energy_Efficiency_Rating']

In [5]:
# split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# L2 Regularization on the Original Dataset

In [6]:
# fit the linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

In [7]:
# Get the coefficients of the features before L2 regularization
coefficients_before_l2 = pd.Series(linear_model.coef_, index=X_train.columns)
coefficients_before_l2

Wall_Area              0.302253
Roof_Area              0.241509
Window_Area            0.259234
Overall_Height        -0.103891
Outdoor_Temperature   -0.080775
Humidity              -0.066741
Noise_Feature_1       -1.157413
Noise_Feature_2        0.456279
Noise_Feature_3       -1.231271
Noise_Feature_4       -0.009997
Noise_Feature_5        0.815266
Noise_Feature_6        1.790095
Noise_Feature_7       -0.289109
Noise_Feature_8        4.026383
Noise_Feature_9        0.855612
Noise_Feature_10      -0.977680
Orientation_East       0.919734
Orientation_North     -0.817762
Orientation_South      1.497093
Orientation_West      -1.599065
Glazing_Type_Type_A    0.264858
Glazing_Type_Type_B    0.139014
Glazing_Type_Type_C   -0.403872
dtype: float64

In [8]:
# Create an array of 20 equally spaced values from 0.1 to 10
alphas = np.linspace(0.1, 10, 20)
alphas

array([ 0.1       ,  0.62105263,  1.14210526,  1.66315789,  2.18421053,
        2.70526316,  3.22631579,  3.74736842,  4.26842105,  4.78947368,
        5.31052632,  5.83157895,  6.35263158,  6.87368421,  7.39473684,
        7.91578947,  8.43684211,  8.95789474,  9.47894737, 10.        ])
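`np.linspace` includes both endpoints, so the 20 candidates are spaced (10 − 0.1)/19 ≈ 0.521 apart, which matches the jump from 0.1 to 0.62105263 in the array above. Regularization strengths are also commonly searched on a log scale; the `np.logspace` line below is an optional alternative (an assumption on our part), not what this notebook uses.

```python
import numpy as np

alphas = np.linspace(0.1, 10, 20)

# np.linspace includes both endpoints, so the spacing is (stop - start) / (num - 1)
step = (10 - 0.1) / (20 - 1)
print(round(step, 6))                         # 0.521053
print(np.allclose(np.diff(alphas), step))     # True

# Optional alternative (not used in this notebook): log-spaced candidates
# spanning 0.01 to 100, i.e. several orders of magnitude
log_alphas = np.logspace(-2, 2, 20)
print(log_alphas[0], log_alphas[-1])
```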

In [9]:
# Initialize RidgeCV to find the best alpha for L2 regularization
ridge_cv = RidgeCV(alphas=alphas, cv=10, scoring='r2')
ridge_cv.fit(X_train, y_train)
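`RidgeCV` tries every alpha in the grid and keeps the one with the best cross-validated score. When `cv` is an integer it performs an ordinary k-fold grid search (with the default `cv=None` it uses an efficient leave-one-out scheme instead). The sketch below reproduces that selection by hand with `cross_val_score` on synthetic data (an assumption) and should agree with `RidgeCV`'s `alpha_`. Names ending in `_demo` are illustrative and chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_train / y_train (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=0)
demo_alphas = np.linspace(0.1, 10, 20)

# Manual version: mean R^2 over 10 unshuffled folds for every candidate alpha
mean_r2 = [
    cross_val_score(Ridge(alpha=a), X_demo, y_demo, cv=10, scoring="r2").mean()
    for a in demo_alphas
]
best_manual = demo_alphas[int(np.argmax(mean_r2))]

# RidgeCV with an integer cv runs the same search, then refits on all the data
ridge_cv_demo = RidgeCV(alphas=demo_alphas, cv=10, scoring="r2").fit(X_demo, y_demo)
print(best_manual, ridge_cv_demo.alpha_)
```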

In [10]:
# Find the best alpha value
best_alpha_ridge = ridge_cv.alpha_
best_alpha_ridge

10.0

In [11]:
# The best alpha (10.0) sits at the upper edge of the previous grid,
# so search again over 20 equally spaced values from 10 to 30
alphas = np.linspace(10, 30, 20)
# Initialize RidgeCV to find the best alpha for L2 regularization
ridge_cv = RidgeCV(alphas=alphas, cv=10, scoring='r2')
ridge_cv.fit(X_train, y_train)
ridge_cv.alpha_

24.736842105263158
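The first search returned 10.0, the largest value in its grid, which suggests the optimum may lie beyond 10; the widened search over 10 to 30 settled on about 24.7, safely inside the range. That boundary check can be automated. The sketch below uses a hypothetical helper, `search_alpha_grid`, on synthetic data (an assumption); tripling the upper bound on each retry is an arbitrary choice, and only the upper edge is checked.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV


def search_alpha_grid(X, y, low, high, num=20, max_rounds=5):
    """Hypothetical helper: widen the grid while the best alpha sits on its upper edge."""
    for _ in range(max_rounds):
        alphas = np.linspace(low, high, num)
        best = RidgeCV(alphas=alphas, cv=10, scoring="r2").fit(X, y).alpha_
        if best < alphas[-1]:
            return best, alphas          # optimum lies inside the grid
        low, high = high, 3 * high       # shift the window upward and retry
    return best, alphas                  # gave up; best may still be on the edge


# Synthetic stand-in (assumption): few informative features, heavy noise
X_demo, y_demo = make_regression(n_samples=150, n_features=20, n_informative=3,
                                 noise=50.0, random_state=1)
best, grid = search_alpha_grid(X_demo, y_demo, 0.1, 10)
print(best, grid[0], grid[-1])
```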

In [12]:
# Get the coefficients after L2 regularization
coefficients_after_l2 = pd.Series(ridge_cv.coef_, index=X_train.columns)

In [13]:
# Compare the coefficients before and after L2 regularization
coefficients_comparison = pd.DataFrame({
    'Before L2 Regularization': coefficients_before_l2,
    'After L2 Regularization': coefficients_after_l2
})

coefficients_comparison

                     Before L2 Regularization  After L2 Regularization
Wall_Area                            0.302253                 0.300379
Roof_Area                            0.241509                 0.244928
Window_Area                          0.259234                 0.259251
Overall_Height                      -0.103891                -0.059810
Outdoor_Temperature                 -0.080775                -0.090001
Humidity                            -0.066741                -0.066190
Noise_Feature_1                     -1.157413                -0.426444
Noise_Feature_2                      0.456279                 0.211002
Noise_Feature_3                     -1.231271                -0.358951
Noise_Feature_4                     -0.009997                 0.359567
...                                       ...                      ...
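Most coefficients in the comparison shrink, yet a few grow in magnitude (Noise_Feature_4 moves from about -0.01 to 0.36). Ridge does not promise that every individual coefficient shrinks; for alpha > 0 on the same training data, what it does guarantee is that the overall L2 norm of the coefficient vector is smaller than the unregularized one. A minimal sketch on synthetic data (an assumption) that adds a per-feature shrink ratio and compares the two norms:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in: 3 informative features plus 7 noise features (assumption)
X_arr, y_demo = make_regression(n_samples=60, n_features=10, n_informative=3,
                                noise=30.0, random_state=2)
X_demo = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(10)])

before = pd.Series(LinearRegression().fit(X_demo, y_demo).coef_, index=X_demo.columns)
after = pd.Series(Ridge(alpha=25.0).fit(X_demo, y_demo).coef_, index=X_demo.columns)

comparison = pd.DataFrame({"Before L2": before, "After L2": after})
# A ratio below 1 means the coefficient shrank in magnitude
comparison["Shrink ratio"] = comparison["After L2"].abs() / comparison["Before L2"].abs()
print(comparison.round(3))

# The overall L2 norm shrinks even if an individual coefficient grows
print(np.linalg.norm(before), np.linalg.norm(after))
```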


In [14]:
# R-squared scores for the Ridge model
r2_train_ridge = ridge_cv.score(X_train, y_train)
r2_test_ridge = ridge_cv.score(X_test, y_test)
r2_train_ridge, r2_test_ridge

(0.944735810754923, 0.8721256088178195)
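`ridge_cv.score` returns R², the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes it by hand on synthetic data (an assumption, since the CSV is not reproduced here), confirms it matches both `.score()` and the already imported `r2_score`, and reports the train/test gap used to judge overfitting; for the ridge model above that gap is 0.9447 − 0.8721 ≈ 0.073. The helper `r_squared` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the energy-efficiency data (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=25.0, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
model = Ridge(alpha=25.0).fit(X_tr, y_tr)


def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


r2_train = r_squared(y_tr, model.predict(X_tr))
r2_test = r_squared(y_te, model.predict(X_te))

# .score() on a regressor and r2_score compute the same quantity
print(np.isclose(r2_test, model.score(X_te, y_te)))
print(np.isclose(r2_test, r2_score(y_te, model.predict(X_te))))
print("train-test gap:", r2_train - r2_test)
```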

The ridge model achieved an R-squared value of about 0.94 on the training set, slightly higher than the 0.93 achieved with L1 regularization. On the test set, it reached an R-squared of approximately 0.87, similar to the L1 model. Since the gap between the training and test scores is slightly wider, this model is slightly more overfitted than the L1 model. While L1 helped with feature selection by zeroing out the less important features, L2 kept all the features but shrank the coefficients of most of the irrelevant ones; for example, Noise_Feature_1 went from about -1.16 to -0.43 and Noise_Feature_3 from about -1.23 to -0.36.
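To see the selection-versus-shrinkage difference in isolation, the sketch below fits scikit-learn's `Lasso` (L1) and `Ridge` (L2) to synthetic data with 3 informative and 7 pure-noise features (an assumption, not the energy-efficiency data) and counts the coefficients that are exactly zero. Note that `alpha` is scaled differently in the two estimators (Lasso divides the squared error by 2n), so the same number does not mean the same regularization strength.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 3 informative features, 7 pure-noise features (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3,
                                 noise=10.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X_demo, y_demo)
ridge = Ridge(alpha=5.0).fit(X_demo, y_demo)

# L1 drives some coefficients exactly to zero; L2 only shrinks them
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```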

What if you could apply both L1 and L2 regularization simultaneously in a model? In machine learning, this combination is known as Elastic Net regularization.