In [None]:
Regularization in machine learning is a set of techniques
used to prevent overfitting by adding a penalty for model complexity.

In [None]:
It keeps the model simple enough to generalize well on new data.

In [None]:
Why Regularization is Needed

In [None]:
When a model is too complex, it can:
Memorize training data
Learn noise instead of the underlying pattern
Perform poorly on new data (overfitting)
Regularization controls this by penalizing large weights.

**Core Idea**

In [None]:
Normally, ML models minimize:
$$\text{Loss} = \text{Error}(y, \hat{y})$$

In [None]:
With regularization:
$$\text{Loss} = \text{Error} + \text{Regularization Penalty}$$

So the model tries to:
Fit data well
Keep weights small/simple
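
The two-part loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: it assumes mean squared error as the error term, and the function name `regularized_loss` and its `penalty` argument are made up for this example.

```python
import numpy as np

def regularized_loss(w, X, y, lam, penalty="l2"):
    """Mean squared error plus a weight penalty scaled by lam (illustrative helper)."""
    error = np.mean((X @ w - y) ** 2)   # data-fit term
    if penalty == "l1":
        reg = np.sum(np.abs(w))         # sum of absolute weights
    else:
        reg = np.sum(w ** 2)            # sum of squared weights
    return error + lam * reg

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
w = np.array([1.0, 2.0])                # these weights fit the two rows exactly

plain = regularized_loss(w, X, y, lam=0.0)                    # error only
with_l2 = regularized_loss(w, X, y, lam=0.1, penalty="l2")    # adds 0.1 * (1 + 4)
with_l1 = regularized_loss(w, X, y, lam=0.1, penalty="l1")    # adds 0.1 * (1 + 2)
print(plain, with_l2, with_l1)
```

Even with a perfect fit, the regularized loss is non-zero, so the optimizer is pushed toward smaller weights whenever that costs little in error.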

In [None]:
Intuition
Imagine fitting a curve:
Without regularization → very wiggly curve (overfit)
With regularization → smoother curve (generalizes better)
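
The curve-fitting intuition can be checked with a small experiment. The sketch below assumes a noisy sine curve as synthetic data and a degree-9 polynomial as the "too complex" model; both choices, and the helper name `fit_poly`, are illustrative. It compares the size of the fitted coefficients with and without an L2 penalty, since large coefficients are what produce the wiggles.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): 15 noisy samples from a sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 15)).reshape(-1, 1)
y_curve = np.sin(3 * x).ravel() + rng.normal(scale=0.2, size=15)

def fit_poly(model):
    """Fit a degree-9 polynomial with the given linear model (illustrative helper)."""
    pipe = make_pipeline(
        PolynomialFeatures(degree=9, include_bias=False),
        StandardScaler(),
        model,
    )
    return pipe.fit(x, y_curve)

plain = fit_poly(LinearRegression())   # no penalty
smooth = fit_poly(Ridge(alpha=1.0))    # L2 penalty on the coefficients

# The last pipeline step is the fitted linear model
plain_norm = np.linalg.norm(plain[-1].coef_)
smooth_norm = np.linalg.norm(smooth[-1].coef_)
print("Coefficient norm without penalty:", plain_norm)
print("Coefficient norm with penalty:   ", smooth_norm)
```

Plotting the two fitted curves on a dense grid of `x` values is a natural follow-up for seeing the difference in smoothness directly.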

In [None]:
Types of Regularization

In [None]:
L1 Regularization (Lasso)
$$\text{Loss} = \text{Error} + \lambda \sum_j |w_j|$$

In [None]:
Key Properties
Can shrink some weights to exactly zero
Performs feature selection
Creates sparse models

In [None]:
Use When
Many irrelevant features
Need feature selection
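
One way to see why L1 produces exact zeros: in the special case of orthonormal features, the Lasso solution is known to be the least-squares coefficient passed through a soft-thresholding step. The sketch below applies that step to a made-up coefficient vector; the function name `soft_threshold` and the numbers are illustrative, and real datasets rarely have orthonormal features.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink each weight toward zero by lam; weights smaller than lam become zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Hypothetical least-squares coefficients: two strong, two weak
w_ols = np.array([3.0, -0.4, 0.2, -2.0])
w_l1 = soft_threshold(w_ols, lam=0.5)

print(w_l1)
print("Features kept:", np.count_nonzero(w_l1))
```

The two weak coefficients fall below the threshold and are removed entirely, which is the feature-selection behaviour listed above.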


In [None]:
L2 Regularization (Ridge)
$$\text{Loss} = \text{Error} + \lambda \sum_j w_j^2$$

In [None]:
Key Properties
Shrinks weights toward zero, but rarely to exactly zero
Keeps all features
Gives stable solutions, even when features are correlated

In [None]:
Use When
All features useful
Multicollinearity present
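
The multicollinearity point can be illustrated with the closed-form Ridge solution, w = (XᵀX + λI)⁻¹Xᵀy. The sketch below assumes the extreme case of two identical features and no intercept; the data and the helper name `ridge_closed_form` are made up for this example.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y (no intercept, illustrative helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X_dup = np.column_stack([x1, x1])                 # two perfectly collinear features
y_dup = 3.0 * x1 + rng.normal(scale=0.1, size=50)

# X^T X has rank 1, so plain least squares has no unique solution
rank = np.linalg.matrix_rank(X_dup.T @ X_dup)

# Adding lam * I makes the system solvable and splits the weight evenly
w_ridge = ridge_closed_form(X_dup, y_dup, lam=1.0)
print("Rank of X^T X:", rank)
print("Ridge weights:", w_ridge)
```

The two weights come out equal and sum to roughly the true slope of 3, rather than taking arbitrary large values that cancel each other.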

In [None]:
Elastic Net
$$\text{Loss} = \text{Error} + \lambda_1 \sum_j |w_j| + \lambda_2 \sum_j w_j^2$$

In [None]:
Combines the L1 and L2 penalties in a single model
Common in real-world problems
It is used to reduce overfitting, handle multicollinearity,
and sometimes select features.

In [None]:
Why Elastic Net?
Each method alone has limits:

Lasso (L1)
Feature selection
Struggles when features are highly correlated

Ridge (L2)
Handles multicollinearity well
Does not remove features

Elastic Net
Shrinks coefficients (like Ridge)
Can set some to zero (like Lasso)
Works well with correlated features

In [None]:
$$\text{Loss} = \text{Error} + \lambda_1 \sum_j |w_j| + \lambda_2 \sum_j w_j^2$$

In [None]:
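Or, more commonly, with a single strength λ and a mixing weight α:
$$\text{Loss} = \text{Error} + \lambda \left( \alpha \sum_j |w_j| + (1 - \alpha) \sum_j w_j^2 \right)$$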

In [None]:
Key Parameters
λ (lambda)
Controls regularization strength
Large λ → simpler model
Small λ → less regularization


In [None]:
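α (alpha), called l1_ratio in scikit-learn
Controls the L1 vs L2 mix
Note: scikit-learn's own alpha argument is the overall strength λ, not this mixing weight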


| α value | Meaning    |
| ------- | ---------- |
| 0       | Pure Ridge |
| 1       | Pure Lasso |
| 0.5     | Equal mix  |
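
The table can be verified directly from the combined penalty λ(α∑|w| + (1−α)∑w²). The function name `elastic_net_penalty` and the weight vector below are illustrative. Note that scikit-learn's `ElasticNet` scales its terms slightly differently (for example, it halves the L2 term), so a λ from this formula is not directly interchangeable with its `alpha` argument.

```python
import numpy as np

def elastic_net_penalty(w, lam, alpha):
    """lam * (alpha * sum|w| + (1 - alpha) * sum w^2), following the formula in the text."""
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

w = np.array([1.0, -2.0, 0.5])   # sum|w| = 3.5, sum w^2 = 5.25

pure_ridge = elastic_net_penalty(w, lam=0.1, alpha=0.0)   # only the L2 term remains
pure_lasso = elastic_net_penalty(w, lam=0.1, alpha=1.0)   # only the L1 term remains
equal_mix = elastic_net_penalty(w, lam=0.1, alpha=0.5)    # half of each
print(pure_ridge, pure_lasso, equal_mix)
```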


In [None]:
Role of Lambda
Lambda controls regularization strength:


| λ value | Effect                         |
| ------- | ------------------------------ |
| 0       | No regularization              |
| Small   | Slight penalty                 |
| Large   | Strong penalty → simpler model |


In [None]:
Too large λ → underfitting
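
Underfitting from an oversized penalty is easy to reproduce. The sketch below assumes synthetic data with three informative features and compares a weak and a very strong Lasso penalty; the specific `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): a clear linear signal plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=40)

weak = Lasso(alpha=0.01).fit(X_demo, y_demo)     # light penalty
strong = Lasso(alpha=100.0).fit(X_demo, y_demo)  # penalty far larger than the signal

weak_r2 = weak.score(X_demo, y_demo)
strong_r2 = strong.score(X_demo, y_demo)

print("Weak penalty coefficients:  ", weak.coef_, "R^2:", weak_r2)
print("Strong penalty coefficients:", strong.coef_, "R^2:", strong_r2)
```

With the strong penalty every coefficient is driven to zero, so the model can only predict the mean of the target, even on its own training data.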

In [None]:
Bias–Variance Tradeoff
Regularization typically:
Increases bias slightly
Reduces variance, often by more than the bias it adds

Result → better generalization, provided λ is well chosen
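
The tradeoff can be observed in a small simulation. The sketch below is one possible setup, not a general proof: it keeps a made-up feature matrix fixed, redraws the noise 500 times, and compares closed-form least squares (λ = 0) with Ridge (λ = 5). All names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 30, 5, 5.0
X_fixed = rng.normal(size=(n, p))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

def fit(X, y, lam):
    """Closed-form solution; lam = 0 gives ordinary least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols_fits, ridge_fits = [], []
for _ in range(500):
    # Same features each time, fresh noise in the target
    y_noisy = X_fixed @ w_true + rng.normal(scale=1.0, size=n)
    ols_fits.append(fit(X_fixed, y_noisy, 0.0))
    ridge_fits.append(fit(X_fixed, y_noisy, lam))

ols_fits, ridge_fits = np.array(ols_fits), np.array(ridge_fits)

# Variance: how much the fitted weights move between noise draws
ols_var = ols_fits.var(axis=0).sum()
ridge_var = ridge_fits.var(axis=0).sum()

# Bias: how far the average fitted weights sit from the true weights
ols_bias = np.linalg.norm(ols_fits.mean(axis=0) - w_true)
ridge_bias = np.linalg.norm(ridge_fits.mean(axis=0) - w_true)

print("Variance  OLS:", ols_var, " Ridge:", ridge_var)
print("Bias      OLS:", ols_bias, " Ridge:", ridge_bias)
```

How much variance is saved relative to the bias added depends on the data and on λ, which is why λ is usually tuned rather than fixed in advance.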

In [7]:
# Libraries for data handling, splitting, scaling, and the three regularized models
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Toy housing data: size, bedrooms, and age are the features; price is the target
data = {
    "size": [800, 900, 1000, 1100, 1200, 1500],
    "bedrooms": [2, 2, 3, 3, 3, 4],
    "age": [20, 18, 15, 12, 10, 5],
    "price": [200, 220, 260, 280, 300, 380]
}

df = pd.DataFrame(data)


In [8]:
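# Separate the features from the target
X = df.drop("price", axis=1)
y = df["price"]

# Hold out 20% of the rows for testing (2 of the 6 rows here)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)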


In [9]:
# Fit the scaler on the training rows only, then apply the same scaling to the test rows
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [10]:
elnet = ElasticNet(
    alpha=0.5,      # overall strength
    l1_ratio=0.5    # mix of L1 & L2
)

elnet.fit(X_train, y_train)


ElasticNet(alpha=0.5, l1_ratio=0.5)


In [11]:


# Ridge (L2)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Lasso (L1)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)


Lasso(alpha=0.1)


In [12]:
predel = elnet.predict(X_test)
print(predel)


[196.97862388 212.79112338]


In [14]:
print("Coefficients:", elnet.coef_)


Coefficients: [ 14.72949209  13.44658142 -14.44966185]


In [15]:
predlas = lasso.predict(X_test)
print(predlas)

[200.27234491 220.25601468]


In [16]:
print("Coefficients:", lasso.coef_)

Coefficients: [37.38602281  8.58853882 -0.        ]


In [17]:
predrid = ridge.predict(X_test)
print(predrid)

[196.35057604 212.24286309]


In [18]:
print("Coefficients:", ridge.coef_)

Coefficients: [ 14.79489383  13.54193983 -14.53125159]
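
A natural next step is to compare each model's predictions with the actual test prices. The sketch below rebuilds the same data and split so it can run on its own. It wraps scaling and the model in a scikit-learn pipeline, which is a design choice for this example rather than what the cells above do, and it uses mean absolute error as one possible metric. With only two test rows the numbers are illustrative at best.

```python
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data and split as in the cells above
df = pd.DataFrame({
    "size": [800, 900, 1000, 1100, 1200, 1500],
    "bedrooms": [2, 2, 3, 3, 3, 4],
    "age": [20, 18, 15, 12, 10, 5],
    "price": [200, 220, 260, 280, 300, 380],
})
X = df.drop("price", axis=1)
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same hyperparameters as the models fitted above
models = {
    "ElasticNet": ElasticNet(alpha=0.5, l1_ratio=0.5),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

errors = {}
for name, model in models.items():
    # The pipeline fits the scaler on the training rows and reuses it for the test rows
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    errors[name] = mean_absolute_error(y_test, pipe.predict(X_test))
    print(f"{name}: MAE = {errors[name]:.2f}")
```

On a dataset this small the ranking of the three models should not be read as evidence about which penalty is better in general.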
