# Elastic Net Regression

Elastic Net regression is a **regularized linear regression technique** that combines both
**L1 (Lasso)** and **L2 (Ridge)** penalties in a single model.

It is designed to overcome the limitations of using Lasso or Ridge alone, especially when
features are **highly correlated** and feature selection is still desired.

---

## Why Elastic Net is Needed

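- **Lasso (L1)** performs feature selection, but within a group of correlated features it tends to keep one somewhat arbitrarily and drop the rest
- **Ridge (L2)** handles correlated features well, but it only shrinks coefficients and never sets them exactly to zero, so irrelevant features stay in the model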

Elastic Net combines the strengths of both approaches:
- Feature selection (from L1)  
- Stability with correlated features (from L2)
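
The contrast can be seen on a small synthetic example. The sketch below is illustrative only: the data, the seed, and the penalty settings (`alpha=1.0`, `l1_ratio=0.5`) are assumptions chosen for the demonstration, not values used elsewhere in this notebook. Two features are near-duplicates of each other. Elastic Net's L2 term pulls their coefficients toward similar values, whereas Lasso has no such pull and may concentrate the weight on one of them.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200

# Two almost identical features (correlation close to 1)
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X_corr = np.column_stack([x1, x2])

# Both features contribute equally to the target
y_corr = 3 * x1 + 3 * x2 + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=1.0).fit(X_corr, y_corr)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_corr, y_corr)

print("Lasso coefficients:      ", lasso.coef_)
print("Elastic Net coefficients:", enet.coef_)
```

Comparing the two printed coefficient vectors shows how each penalty distributes weight across the correlated pair.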

---

## Elastic Net Cost Function

The Elastic Net objective function is:

$$
J(\theta) =
\frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2
+ \lambda \left[
\alpha \sum_{j=1}^{p} |\theta_j|
+ (1 - \alpha) \sum_{j=1}^{p} \theta_j^2
\right]
$$

Where:
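- $(\lambda)$ = overall regularization strength (the `alpha` argument of scikit-learn's `ElasticNet`)  
- $(\alpha)$ = mixing parameter between L1 and L2, with $0 \le \alpha \le 1$ (the `l1_ratio` argument of scikit-learn's `ElasticNet`)  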
- $(\theta_j)$ = model coefficients (bias term excluded)  
- $(p)$ = number of features  
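
To make the formula concrete, here is a minimal NumPy sketch that evaluates $J(\theta)$ term by term. The function name `elastic_net_cost`, the optional `bias` argument, and the demo values are hypothetical and exist only for illustration. The code follows the formula as written in this section; scikit-learn's `ElasticNet` scales the terms differently (a factor of $\frac{1}{2m}$ on the squared error and $\frac{1}{2}$ on the L2 term), so numeric values of the penalty settings are not directly interchangeable between the two.

```python
import numpy as np

def elastic_net_cost(theta, X, y, lam, alpha, bias=0.0):
    """Evaluate J(theta) as written in the formula above."""
    residuals = X @ theta + bias - y           # y_hat - y for each sample
    mse = np.mean(residuals ** 2)              # (1/m) * sum of squared errors
    l1 = np.sum(np.abs(theta))                 # sum_j |theta_j|
    l2 = np.sum(theta ** 2)                    # sum_j theta_j^2
    return mse + lam * (alpha * l1 + (1 - alpha) * l2)

# Tiny hand-checkable example (values are made up for illustration)
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
y_demo = np.array([1.0, 2.0])
theta_demo = np.array([1.0, -1.0])

cost = elastic_net_cost(theta_demo, X_demo, y_demo, lam=0.1, alpha=0.5)
# Approximately 4.7: MSE of 4.5 plus penalty 0.1 * (0.5 * 2 + 0.5 * 2) = 0.2
print(cost)
```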

---

## Role of the Mixing Parameter $(\alpha)$

| Value of $(\alpha)$ | Behavior |
|--------------------|----------|
| $(\alpha = 1)$ | Equivalent to Lasso (L1) |
| $(\alpha = 0)$ | Equivalent to Ridge (L2) |
| $(0 < \alpha < 1)$ | Combination of L1 and L2 |

This flexibility allows Elastic Net to adapt to different data characteristics.
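
As a rough illustration of the table, the sketch below varies the mixing parameter on the diabetes dataset and counts the non-zero coefficients. It assumes scikit-learn's naming, where `alpha` plays the role of $\lambda$ and `l1_ratio` plays the role of $\alpha$; the strength value `0.01` and the variable names are arbitrary choices for the demo. The pure-Ridge end ($\alpha = 0$) is left out here, since the `Ridge` class is the usual tool for that case.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso

X_all, y_all = load_diabetes(return_X_y=True)

strength = 0.01  # illustrative value for scikit-learn's `alpha` (lambda in the formula)
nonzero_counts = {}
for mix in [0.1, 0.5, 0.9, 1.0]:  # scikit-learn's `l1_ratio` (alpha in the formula)
    model = ElasticNet(alpha=strength, l1_ratio=mix, max_iter=10_000)
    model.fit(X_all, y_all)
    nonzero_counts[mix] = int(np.sum(model.coef_ != 0))
    print(f"l1_ratio={mix}: {nonzero_counts[mix]} of {X_all.shape[1]} coefficients non-zero")

# With l1_ratio=1.0 the L2 term vanishes, so the fit should match Lasso
pure_l1 = ElasticNet(alpha=strength, l1_ratio=1.0, max_iter=10_000).fit(X_all, y_all)
lasso_ref = Lasso(alpha=strength, max_iter=10_000).fit(X_all, y_all)
print(np.allclose(pure_l1.coef_, lasso_ref.coef_))
```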

---

## Key Characteristics

- Combines **L1 and L2 regularization**  
- Performs feature selection  
- Handles multicollinearity effectively  
- Produces sparse yet stable models  
- More robust than Lasso when features are correlated  

---

## Importance of Feature Scaling

Feature scaling is essential for Elastic Net because:
- Both penalties depend on coefficient magnitude  
- Unscaled features can dominate regularization  

Standardize features (zero mean, unit variance) before fitting Elastic Net, using scaling statistics computed on the training data only.
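
The effect of feature units on the penalty can be sketched with synthetic data. Everything below (the data, the seed, the penalty settings, and the variable names) is an assumption made for illustration. Two features carry the same amount of signal, but one is recorded in units a thousand times larger. A `StandardScaler` placed inside a pipeline puts both on the same footing before the penalty is applied.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Two equally informative features measured in very different units
a = rng.normal(size=n)             # scale around 1
b = 1000.0 * rng.normal(size=n)    # scale around 1000
X_units = np.column_stack([a, b])
y_units = 2.0 * a + 2.0 * (b / 1000.0) + rng.normal(scale=0.1, size=n)

# Without scaling: the large-unit feature needs only a tiny coefficient,
# so the penalty barely touches it while the other coefficient is shrunk.
raw = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_units, y_units)

# With scaling inside a pipeline: the scaler is fit on the data passed to
# fit(), and both features are penalized on a comparable scale.
scaled = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
scaled.fit(X_units, y_units)
scaled_coef = scaled.named_steps["elasticnet"].coef_

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled_coef)
```

Wrapping the scaler and the model in one pipeline also means that, when the pipeline is fit on a training split, the test split is transformed with the training statistics rather than its own.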

---

## Elastic Net vs Ridge vs Lasso

| Aspect | Ridge | Lasso | Elastic Net |
|------|-------|-------|-------------|
| L1 penalty | No | Yes | Yes |
| L2 penalty | Yes | No | Yes |
| Feature selection | No | Yes | Yes |
| Handles correlated features | Good | Poor (keeps one, drops the rest) | Good |

---

## When to Use Elastic Net

Elastic Net is preferred when:
- Features are highly correlated  
- Feature selection is required  
- Dataset has many predictors  
- Both bias and variance need control  

---

## Summary

Elastic Net Regression combines the strengths of Ridge and Lasso by applying both L1 and L2 penalties.
It provides a balanced approach to regularization, offering feature selection, stability, and improved
generalization for complex, high-dimensional datasets.


In [1]:
%%capture
!pip install scikit-learn


In [2]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score


In [3]:
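# load_diabetes returns features that are already mean-centered and scaled,
# so no separate standardization step is applied in this notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)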

In [4]:
# Linear Regression: unregularized baseline
reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
r2_score(y_test, y_pred)

0.4399338661568968

In [5]:
# Ridge: L2 penalty only; alpha is the regularization strength
reg = Ridge(alpha=0.1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
r2_score(y_test, y_pred)

0.45199494197195456

In [6]:
# Lasso: L1 penalty only; alpha is the regularization strength
reg = Lasso(alpha=0.01)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
r2_score(y_test, y_pred)

0.44111855963110613

In [7]:
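# ElasticNet: alpha is the overall strength (lambda in the formula above),
# l1_ratio is the L1/L2 mix (alpha in the formula above)
reg = ElasticNet(alpha=0.005, l1_ratio=0.9)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
r2_score(y_test, y_pred)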

0.4531474541554823
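
The four cells above can also be run side by side. The sketch below is one possible way to do that: it reuses the same estimators, hyperparameters, and split, and additionally counts how many coefficients each model sets exactly to zero. The `models` and `results` dictionaries are names introduced here for illustration. Differences in R² on a single train/test split are small and should be read with caution.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

# Same estimators and hyperparameters as the cells above
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=0.1),
    "Lasso": Lasso(alpha=0.01),
    "ElasticNet": ElasticNet(alpha=0.005, l1_ratio=0.9),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    n_zero = int(np.sum(model.coef_ == 0))  # coefficients driven exactly to zero
    results[name] = (score, n_zero)
    print(f"{name:>16}: R2={score:.4f}, zero coefficients={n_zero}")
```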