<a href="https://colab.research.google.com/github/kKawsarAlam/Regression-Analysis/blob/main/L1_%26_L2_Regularization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Lasso Regression (L1 Regularization)**  

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that adds an L1 penalty term to the cost function. The penalty can shrink the coefficients (weights) of less important features to exactly zero, which performs variable selection and helps prevent overfitting. The resulting models are simpler and sparser, which makes Lasso well suited to high-dimensional datasets and improves interpretability.  

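Penalty = $\lambda \sum_j |\beta_j|$  
Cost = $\sum_i (y_i - \hat{y}_i)^2$ + Penalty  
Here $\lambda$ is the regularization strength, $\beta_j$ are the weights, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.  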
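
As a small worked example, the cost above can be computed by hand. This is only a sketch: the function name `lasso_cost` and the toy numbers are made up for illustration. Note that scikit-learn's `Lasso` minimizes a rescaled version of this cost (the squared error is divided by `2 * n_samples`), so its `alpha` is not directly comparable to the λ used here.

```python
def lasso_cost(y_true, y_pred, weights, lam):
    """Sum of squared errors plus lam times the sum of absolute weights."""
    sse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    penalty = lam * sum(abs(w) for w in weights)
    return sse + penalty

# Toy values (hypothetical, not taken from the housing data)
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
weights = [1.5, -0.5, 0.0]

# SSE = 1 + 0 + 4 = 5, penalty = 2 * (1.5 + 0.5 + 0) = 4
cost = lasso_cost(y_true, y_pred, weights, lam=2.0)
print(cost)  # 9.0
```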

**Advantages**  
Feature Selection: It removes the need to manually select the most important features, so the resulting regression model is simpler and more explainable.  
Overfitting Control: Shrinks coefficients to prevent the model from memorizing noise.  
Regularization: It constrains large coefficients, trading a small increase in bias for lower variance, so the model is more robust and generalizes better.  
Interpretability: Sparse models are simpler to understand and explain, which is important in fields like healthcare and finance.  
Handles Large Feature Spaces: It is effective on high-dimensional data, such as features extracted from images and videos.
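
One way to see why the L1 penalty can set weights to exactly zero: in the simplified case of orthonormal features, the Lasso solution is the least-squares weight passed through a soft-thresholding step, with a threshold that grows with the penalty strength. The sketch below illustrates only that step. The name `soft_threshold`, the example weights and the threshold of 1.0 are assumptions chosen for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

# Hypothetical least-squares weights before the penalty is applied
ols_weights = [4.0, -0.3, 0.8, -2.5]

# Small weights fall inside the threshold band and are removed entirely
shrunk = [soft_threshold(w, 1.0) for w in ols_weights]
print(shrunk)  # [3.0, 0.0, 0.0, -1.5]
```

Weights smaller in magnitude than the threshold are dropped, while larger ones are kept but shrunk, which is the feature selection behavior described above.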

In [11]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn

In [12]:
# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')

In [13]:
# Load dataset
df = pd.read_csv('/content/melb_data.csv')
df.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.8079,144.9934,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.8093,144.9944,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.7969,144.9969,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.8072,144.9941,Northern Metropolitan,4019.0


In [14]:
# Count the unique values in each column
df.nunique()

Unnamed: 0,0
Suburb,314
Address,13378
Rooms,9
Type,3
Price,2204
Method,5
SellerG,268
Date,58
Distance,202
Postcode,198


In [15]:
# Check the number of rows and columns
df.shape

(13580, 21)

In [16]:
# Keep only the columns used for modeling
col_use = ['Suburb', 'Rooms', 'Type', 'Method', 'SellerG', 'Regionname', 'Propertycount', 'Distance', 'Bedroom2', 'Car', 'Landsize', 'CouncilArea', 'Bathroom', 'BuildingArea', 'Price']
df = df[col_use].copy()  # copy so later assignments do not act on a slice of the original frame
df.head()

Unnamed: 0,Suburb,Rooms,Type,Method,SellerG,Regionname,Propertycount,Distance,Bedroom2,Car,Landsize,CouncilArea,Bathroom,BuildingArea,Price
0,Abbotsford,2,h,S,Biggin,Northern Metropolitan,4019.0,2.5,2.0,1.0,202.0,Yarra,1.0,,1480000.0
1,Abbotsford,2,h,S,Biggin,Northern Metropolitan,4019.0,2.5,2.0,0.0,156.0,Yarra,1.0,79.0,1035000.0
2,Abbotsford,3,h,SP,Biggin,Northern Metropolitan,4019.0,2.5,3.0,0.0,134.0,Yarra,2.0,150.0,1465000.0
3,Abbotsford,3,h,PI,Biggin,Northern Metropolitan,4019.0,2.5,3.0,1.0,94.0,Yarra,2.0,,850000.0
4,Abbotsford,4,h,VB,Nelson,Northern Metropolitan,4019.0,2.5,3.0,2.0,120.0,Yarra,1.0,142.0,1600000.0


In [17]:
df.shape

(13580, 15)

In [18]:
# Count the missing values in each column
df.isnull().sum()

Unnamed: 0,0
Suburb,0
Rooms,0
Type,0
Method,0
SellerG,0
Regionname,0
Propertycount,0
Distance,0
Bedroom2,0
Car,62


In [19]:
# Fill missing values in the Car column with zero
col_fill_zero = ['Car']
df[col_fill_zero] = df[col_fill_zero].fillna(0)
df.isnull().sum()

Unnamed: 0,0
Suburb,0
Rooms,0
Type,0
Method,0
SellerG,0
Regionname,0
Propertycount,0
Distance,0
Bedroom2,0
Car,0


In [20]:
# Fill missing values in the BuildingArea column with the column mean
df['BuildingArea'] = df['BuildingArea'].fillna(df.BuildingArea.mean())
df.isnull().sum()

Unnamed: 0,0
Suburb,0
Rooms,0
Type,0
Method,0
SellerG,0
Regionname,0
Propertycount,0
Distance,0
Bedroom2,0
Car,0


In [21]:
# Drop any remaining rows that still contain missing values
df.dropna(inplace=True)
df.isnull().sum()

Unnamed: 0,0
Suburb,0
Rooms,0
Type,0
Method,0
SellerG,0
Regionname,0
Propertycount,0
Distance,0
Bedroom2,0
Car,0


In [22]:
# Convert categorical variables into one-hot encoded columns.
# Note: .astype(int) also truncates float columns such as Distance (2.5 becomes 2).
df = pd.get_dummies(df, drop_first=True).astype(int)
df.head()

Unnamed: 0,Rooms,Propertycount,Distance,Bedroom2,Car,Landsize,Bathroom,BuildingArea,Price,Suburb_Aberfeldie,...,CouncilArea_Moreland,CouncilArea_Nillumbik,CouncilArea_Port Phillip,CouncilArea_Stonnington,CouncilArea_Unavailable,CouncilArea_Whitehorse,CouncilArea_Whittlesea,CouncilArea_Wyndham,CouncilArea_Yarra,CouncilArea_Yarra Ranges
0,2,4019,2,2,1,202,1,151,1480000,0,...,0,0,0,0,0,0,0,0,1,0
1,2,4019,2,2,0,156,1,79,1035000,0,...,0,0,0,0,0,0,0,0,1,0
2,3,4019,2,3,0,134,2,150,1465000,0,...,0,0,0,0,0,0,0,0,1,0
3,3,4019,2,3,1,94,2,151,850000,0,...,0,0,0,0,0,0,0,0,1,0
4,4,4019,2,3,2,120,1,142,1600000,0,...,0,0,0,0,0,0,0,0,1,0
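
The encoding step can be tried on a tiny frame to see what `drop_first=True` does. This is a sketch on made-up data, not the housing dataset. It also passes `dtype=int` to `pd.get_dummies`, which is an alternative to calling `.astype(int)` on the whole frame: only the dummy columns become integers, and float columns keep their decimals.

```python
import pandas as pd

# Toy frame (hypothetical values) with one categorical and two numeric columns
toy = pd.DataFrame({
    "Type": ["h", "u", "t", "h"],
    "Rooms": [2, 1, 3, 4],
    "Distance": [2.5, 3.0, 1.2, 4.8],
})

# drop_first=True drops the first category ("h"), which becomes the baseline;
# dtype=int applies to the new dummy columns only
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded)
```

A row with `Type_t = 0` and `Type_u = 0` is of type `h`, so no information is lost by dropping that column.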


In [23]:
# Separate the features (X) from the target (y)
X = df.drop('Price', axis=1)
y = df['Price']

In [24]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=2)

In [25]:
# Fit the LinearRegression model
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(train_X, train_y)

In [26]:
# Training score
model.score(train_X, train_y)

0.7119887034735544

In [27]:
# Testing score
model.score(test_X, test_y)

0.6569196517967659

The model scores higher on the training dataset (R² ≈ 0.71) than on the testing dataset (R² ≈ 0.66). This gap suggests some overfitting.

In [81]:
# Lasso regression (alpha is the regularization strength, i.e. the lambda in the penalty)

from sklearn import linear_model
lasso_reg = linear_model.Lasso(alpha=50, max_iter=100, tol=0.1)
lasso_reg.fit(train_X, train_y)

In [85]:
# Training score
lasso_reg.score(train_X, train_y)

0.7074447115090474

In [83]:
# Testing score
lasso_reg.score(test_X, test_y)

0.6635483386303518
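
To check how many features Lasso actually removes, the fitted `coef_` array can be inspected for zeros. The sketch below does this on synthetic data rather than the housing data, so the feature counts, the true weights and `alpha=0.5` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 3 informative features followed by 7 pure-noise features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([5.0, -3.0, 2.0] + [0.0] * 7)
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Count the coefficients that were shrunk to exactly zero
n_zero = int(np.sum(lasso_demo.coef_ == 0))
print("zeroed coefficients:", n_zero, "of", lasso_demo.coef_.size)
```

The same check, `np.sum(lasso_reg.coef_ == 0)`, could be applied to the model fitted above to see how many one-hot columns were dropped.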

**Ridge Regression (L2 Regularization)**  

Ridge regression is a regularization technique used to improve linear regression models by reducing overfitting and handling multicollinearity (high correlation between features). It works by adding an L2 penalty (the sum of squared coefficients) to the cost function, which shrinks coefficients towards zero without eliminating them entirely.     

Penalty = $\lambda \sum_j \beta_j^2$  
Cost = $\sum_i (y_i - \hat{y}_i)^2$ + Penalty  
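
For this cost, the minimizing weights have a closed form when no intercept is fitted: $(X^T X + \lambda I)^{-1} X^T y$. The sketch below computes it with NumPy on synthetic data and compares it with scikit-learn's `Ridge`. The data, the true weights and the value of `lam` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem (hypothetical weights)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
lam = 50.0

# Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y
n_features = X_demo.shape[1]
closed_form = np.linalg.solve(X_demo.T @ X_demo + lam * np.eye(n_features), X_demo.T @ y_demo)

# Ridge without an intercept minimizes the same cost
sk_ridge = Ridge(alpha=lam, fit_intercept=False).fit(X_demo, y_demo)

# Ordinary least squares for comparison: ridge weights have a smaller norm
ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
print(np.linalg.norm(closed_form), "<", np.linalg.norm(ols))
```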

**Advantages**  
Overfitting Control: Shrinks coefficients to prevent the model from memorizing noise.  
Correlation Support: Handles correlated predictors more effectively than ordinary linear regression.  
Better Generalization: Produces stable predictions on new, unseen data.  
Feature Retention: Keeps all features in the model instead of dropping any, unlike Lasso.  

**Limitations**  
No Feature Selection: Coefficients are shrunk but never reduced to exact zero.  
Hyperparameter Sensitivity: Requires careful λ (alpha) tuning for best performance.  
Irrelevant Feature Impact: May still be affected when many inputs add no useful information.  
Reduced Interpretability: Heavy shrinkage can obscure the true effect of predictors.  
Poor Fit for Sparse Models: Not ideal when only a few predictors truly matter.
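
Two of these limitations can be illustrated together. The sketch below uses `RidgeCV` to pick alpha by cross-validation from a small grid, then confirms that every coefficient stays nonzero even for features with no real effect. The synthetic data and the grid of alphas are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data: features 2, 3 and 5 have no effect on the target
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 6))
y_demo = X_demo @ np.array([4.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=1.0, size=150)

# Candidate regularization strengths (hypothetical grid)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
ridge_cv = RidgeCV(alphas=alphas).fit(X_demo, y_demo)

best_alpha = ridge_cv.alpha_
n_nonzero = int(np.sum(ridge_cv.coef_ != 0))
print("selected alpha:", best_alpha)
print("nonzero coefficients:", n_nonzero, "of", ridge_cv.coef_.size)
```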

In [72]:
# Ridge regression
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=50, max_iter=100, tol=0.1)  # use the imported class directly
ridge_reg.fit(train_X, train_y)

In [73]:
# Training score (R², not classification accuracy)
ridge_reg.score(train_X, train_y)

0.6764768915313042

In [74]:
# Testing score (R², not classification accuracy)
ridge_reg.score(test_X, test_y)

0.6642582722337234

The training (R² ≈ 0.68) and testing (R² ≈ 0.66) scores are now close, which indicates less overfitting than the unregularized linear regression.
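
Both penalties act on the size of the coefficients, so the result depends on the scale of each feature. The notebook fits the models on unscaled columns. A possible refinement, sketched here on synthetic data, is to standardize the features inside a pipeline before applying the penalty. The data, the feature scales and `alpha=0.1` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three informative features on very different scales (synthetic)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])
y_demo = (2.0 * X_demo[:, 0] + 0.05 * X_demo[:, 1] + 300.0 * X_demo[:, 2]
          + rng.normal(scale=0.1, size=200))

# Without scaling, the small-scale feature needs a large weight and is penalized away
raw_lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)

# With scaling, the penalty treats all three features on equal footing
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X_demo, y_demo)
scaled_coef = scaled.named_steps["lasso"].coef_

print("raw coefficients:   ", raw_lasso.coef_)
print("scaled coefficients:", scaled_coef)
```

In this sketch the third feature is informative, yet the unscaled Lasso drops it because of its small scale, while the scaled pipeline keeps it.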