In [46]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

data = fetch_california_housing(as_frame=True)
df = data.frame
df.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


In [47]:
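# Features: the eight numeric columns; target: median house value
X = df[['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']]
y = df['MedHouseVal']

# Hold out 20% of the rows for testing; the fixed seed keeps the split reproducible.
# The features are left unscaled; penalties depend on feature scale, so standardizing first is common.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)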

### Regularization
Regularization is a core machine learning technique that reduces overfitting and helps models generalize to new data. It works by adding a penalty on the model's complexity to the loss function.
- Without regularization → model learns noise and performs poorly on new data.
- With regularization → model stays simpler, learns general patterns, and performs better on unseen data.
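
The contrast above can be sketched on a small synthetic problem. Everything in this block is an assumption made for illustration: the generated data, the names `X_small`, `X_new`, and `demo_scores`, and the choice of `Ridge(alpha=5.0)` as the regularized model. It is not the housing data used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

# Synthetic data (an assumption for illustration, not the housing data):
# 25 features, only the first 3 carry signal, and just 30 training rows.
rng = np.random.default_rng(0)
n_features = 25
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]

X_small = rng.normal(size=(30, n_features))
X_new = rng.normal(size=(1000, n_features))
y_small = X_small @ true_coef + rng.normal(scale=1.0, size=30)
y_new = X_new @ true_coef + rng.normal(scale=1.0, size=1000)

# Fit one unpenalized and one penalized model, then score each on
# the training rows and on rows it has never seen.
demo_scores = {}
for name, model in [('no penalty', LinearRegression()), ('ridge', Ridge(alpha=5.0))]:
    model.fit(X_small, y_small)
    train_r2 = r2_score(y_small, model.predict(X_small))
    new_r2 = r2_score(y_new, model.predict(X_new))
    demo_scores[name] = (train_r2, new_r2)
    print(f'{name:>10}: train R2 = {train_r2:.3f}, unseen-data R2 = {new_r2:.3f}')
```

On data like this, the unpenalized model typically scores far higher on its training rows than on unseen rows. How much a penalty narrows that gap depends on `alpha` and on the data.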

Why overfitting happens:
- Too many features
- A model that is too complex
- Too few training examples
- Noise in the data

Regularization helps by penalizing large weights, which makes it harder for the model to fit noise.
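
As a rough sketch of what "adding a penalty" means, the regularized objective for a linear model with weights $w$ can be written as follows. The symbols $J$, $\Omega$, $\alpha$, and $\rho$ are notation chosen here for illustration. scikit-learn's estimators follow the same idea but scale the individual terms slightly differently.

$$
J(w) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \alpha\,\Omega(w)
$$

$$
\Omega_{\text{L1}}(w) = \sum_{j}|w_j|, \qquad
\Omega_{\text{L2}}(w) = \sum_{j} w_j^2, \qquad
\Omega_{\text{EN}}(w) = \rho\sum_{j}|w_j| + (1-\rho)\sum_{j} w_j^2
$$

Here $\alpha \ge 0$ sets the penalty strength ($\alpha = 0$ recovers the unpenalized fit) and $\rho \in [0, 1]$ sets the L1/L2 mix for Elastic Net.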

The main types of regularization for linear models are described below.


#### Lasso Regression (L1)
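L1 regularization adds a penalty proportional to the sum of the absolute values of the weights to the loss function. The `alpha` parameter sets the strength of the penalty.

Purpose:
- Reduces overfitting
- Performs automatic feature selection
- Creates a sparse model
- Handles high-dimensional datasets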

L1 can drive some weights to exactly zero, which effectively removes those features from the model.
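
The zeroing behavior is easiest to see on data where the irrelevant features are known in advance. The sketch below assumes a synthetic dataset in which only the first 3 of 10 features affect the target. The names `X_demo`, `y_demo`, `true_coef`, and `lasso_demo` are hypothetical and separate from the housing variables.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features,
# only the first 3 have a nonzero true coefficient.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

lasso_demo = Lasso(alpha=0.1)
lasso_demo.fit(X_demo, y_demo)

# Count coefficients that the L1 penalty set to exactly zero.
n_zero = int(np.sum(lasso_demo.coef_ == 0))
print('Coefficients:', np.round(lasso_demo.coef_, 3))
print('Features removed (coefficient exactly 0):', n_zero)
```

The informative features keep nonzero (slightly shrunken) coefficients, while most or all of the uninformative ones should end up at exactly zero.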

In [48]:
# alpha sets the strength of the L1 penalty
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Evaluate on the held-out test set
prediction = lasso.predict(X_test)
lassoR2Score = r2_score(y_test, prediction)
print('R2 Score for Lasso (L1) Regression', lassoR2Score)

R2 Score for Lasso (L1) Regression 0.5318167610318159


#### Ridge Regression (L2)
L2 regularization adds a penalty proportional to the sum of the squared weights to the loss function.

Purpose:
- Shrinks large weights
- Reduces overfitting
- Keeps all features but makes their effect smaller

L2 (Ridge) shrinks weights toward zero but does not set them exactly to zero, so every feature stays in the model.
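
A minimal sketch of this shrink-but-keep behavior, assuming the same kind of synthetic data as an illustration (10 features, 3 informative). The names `X_demo`, `y_demo`, `coef_norms`, and `ridge_zero_counts` are hypothetical, and the `alpha` values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration): 10 features,
# only the first 3 have a nonzero true coefficient.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

coef_norms = []
ridge_zero_counts = []
for alpha in [0.1, 10.0, 1000.0]:
    ridge_demo = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # Overall size of the weight vector, and how many weights are exactly zero
    coef_norms.append(float(np.linalg.norm(ridge_demo.coef_)))
    ridge_zero_counts.append(int(np.sum(ridge_demo.coef_ == 0)))
    print(f'alpha={alpha:>7}: ||coef|| = {coef_norms[-1]:.3f}, '
          f'exact zeros = {ridge_zero_counts[-1]}')
```

The weight vector gets smaller as `alpha` grows, but no coefficient is driven to exactly zero.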

In [49]:
# alpha sets the strength of the L2 penalty
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)

# Evaluate on the held-out test set
prediction = ridge.predict(X_test)
ridgeR2Score = r2_score(y_test, prediction)
print('R2 Score for Ridge (L2) Regression', ridgeR2Score)

R2 Score for Ridge (L2) Regression 0.5757944553633934


#### Elastic Net (L1 + L2)
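Elastic Net adds a penalty to the loss function that is a weighted mix of the Lasso (L1) and Ridge (L2) penalties. In scikit-learn, the `l1_ratio` parameter sets the mix (default 0.5).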

Purpose of Elastic Net:
- Keeps groups of correlated features together (Ridge behavior)
- Performs feature selection by shrinking some weights to exactly zero (Lasso behavior)

When to use Elastic Net?
- Your dataset has many features
- Features are correlated
- You want better stability than Lasso
- You want feature selection together with Ridge-style shrinkage
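
How the L1/L2 mix changes the result can be sketched by varying `l1_ratio` on synthetic data. The data, the names `X_demo`, `y_demo`, and `zero_counts`, and the values `alpha=0.5` and `l1_ratio` in `[0.1, 0.5, 0.9]` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data (an assumption for illustration): 10 features,
# only the first 3 have a nonzero true coefficient.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

zero_counts = {}
for l1_ratio in [0.1, 0.5, 0.9]:
    # l1_ratio close to 0 behaves more like Ridge, close to 1 more like Lasso
    enet_demo = ElasticNet(alpha=0.5, l1_ratio=l1_ratio).fit(X_demo, y_demo)
    zero_counts[l1_ratio] = int(np.sum(enet_demo.coef_ == 0))
    print(f'l1_ratio={l1_ratio}: exact zeros = {zero_counts[l1_ratio]}, '
          f'coef = {np.round(enet_demo.coef_, 2)}')
```

A larger `l1_ratio` puts more weight on the L1 term, so it tends to zero out more coefficients. A smaller one leans toward Ridge-style shrinkage of all coefficients.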

In [50]:
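# alpha sets the overall penalty strength; l1_ratio keeps its default of 0.5 (equal L1/L2 mix)
elasticNet = ElasticNet(alpha=0.1)
elasticNet.fit(X_train, y_train)

# Evaluate on the held-out test set
prediction = elasticNet.predict(X_test)
elasticNetR2Score = r2_score(y_test, prediction)
print('R2 Score for elasticNet (L1 + L2) Regression', elasticNetR2Score)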

R2 Score for elasticNet (L1 + L2) Regression 0.5626560643897962


### Summary
| Method          | What it does                | Feature Removal? | Best for                 |
| --------------- | --------------------------- | ---------------- | ------------------------ |
| **Ridge (L2)**  | Shrinks weights             | ❌ No             | Correlated features      |
| **Lasso (L1)**  | Shrinks & sets weights to 0 | ✅ Yes            | Feature selection        |
| **Elastic Net** | Mix of L1 + L2              | ⚠️ Yes (some)    | Many correlated features |
