# Chapter 2.8 and 2.9: Elastic Net

Goal: Understand how Elastic Net combines the benefits of Ridge and Lasso, and when to prefer it over either one.

### Topics:
- Understanding l1_ratio: blending L1 and L2 penalties
- Comparing Elastic Net to Ridge and Lasso
- Using ElasticNetCV for hyperparameter tuning
- Choosing the right regularization method

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, ElasticNetCV
from sklearn.metrics import r2_score

## Quick Recap

- **Elastic Net** = L1 (Lasso) + L2 (Ridge) combined
- **l1_ratio** controls the blend:
  - l1_ratio = 0 → Pure Ridge
  - l1_ratio = 1 → Pure Lasso
  - l1_ratio = 0.5 → Equal mix
- When to use: Features are correlated AND you want feature selection
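
For reference, scikit-learn documents the objective that `ElasticNet` minimizes. Writing $n$ for the number of training samples, $w$ for the coefficients, $\alpha$ for `alpha` and $\rho$ for `l1_ratio`, it is:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

With $\rho = 1$ this is exactly the objective of scikit-learn's `Lasso`. With $\rho = 0$ only the L2 term remains. However, `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha\lVert w \rVert_2^2$ without the $\frac{1}{2n}$ factor, so `ElasticNet(alpha=a, l1_ratio=0)` corresponds to `Ridge(alpha=n * a)`, not `Ridge(alpha=a)`. With $\rho = 0.5$, the L1 term gets weight $\alpha/2$ and the L2 term gets weight $\alpha/4$.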

In [None]:
# Load and prepare Auto MPG data (same as previous activity)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'car_name']
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas
df = pd.read_csv(url, sep=r'\s+', names=columns, na_values='?')
df = df.dropna().drop('car_name', axis=1)

X = df.drop('mpg', axis=1)
y = df['mpg']

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Features: {list(X.columns)}")
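
Why scale before fitting? Both penalties act on the raw size of each coefficient. A feature recorded in large units (and therefore small numbers) needs a large coefficient, so it is penalized harder. The sketch below uses synthetic data, not the Auto MPG set, with two equally useful features. It shrinks one feature by a factor of 1000 to mimic a unit change, then fits `Lasso` with an arbitrary `alpha=0.5` before and after `StandardScaler`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                         # two equally informative features
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=n)

# Mimic a unit change: feature 1 now takes values 1000x smaller
X_rescaled = X.copy()
X_rescaled[:, 1] *= 1e-3

raw = Lasso(alpha=0.5).fit(X_rescaled, y)
scaled = Lasso(alpha=0.5).fit(StandardScaler().fit_transform(X_rescaled), y)

print("Unscaled coefficients:    ", raw.coef_)     # feature 1 is driven to 0
print("Standardized coefficients:", scaled.coef_)  # both features survive
```

The only difference between the two fits is the units of one column, yet that alone decides whether the feature is selected. The notebook applies `StandardScaler` (fit on the training split only) for this reason.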

## Practice

### 1. Fit ElasticNet with l1_ratio=0.5, alpha=1

In [None]:
# Step 1: Create ElasticNet with l1_ratio=0.5 and alpha=1
enet = ElasticNet(alpha=1, l1_ratio=0.5, random_state=42)

# Step 2: Fit on scaled training data
enet.fit(X_train_scaled, y_train)

# Step 3: Display coefficients
coef_df = pd.DataFrame({
    'Feature': X.columns,
    'Elastic Net (0.5)': enet.coef_
})

print(f"Elastic Net R² on test: {enet.score(X_test_scaled, y_test):.4f}")
coef_df

### 2. How many coefficients are zero? Compare to Lasso

In [None]:
# Fit Lasso for comparison
lasso = Lasso(alpha=1, random_state=42)
lasso.fit(X_train_scaled, y_train)

# Count zeros
enet_zeros = (enet.coef_ == 0).sum()
lasso_zeros = (lasso.coef_ == 0).sum()

print(f"Elastic Net (l1_ratio=0.5): {enet_zeros} zero coefficients")
print(f"Lasso: {lasso_zeros} zero coefficients")

**Your observation:** Does Elastic Net eliminate as many features as Lasso?

(Write your answer here)
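
Under the objective scikit-learn documents for `ElasticNet`, setting `l1_ratio=1` leaves only the L1 term, which is exactly the `Lasso` objective. The quick check below runs on synthetic data from `make_regression`, a self-contained stand-in for the Auto MPG features with arbitrary settings. It confirms that the two estimators agree at the same `alpha`. Any difference in the zero counts above therefore comes from `l1_ratio < 1` (a smaller L1 weight plus an L2 term), not from a different kind of model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

enet_l1 = ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X, y)  # L2 term switched off
lasso = Lasso(alpha=1.0).fit(X, y)

print("ElasticNet(l1_ratio=1):", np.round(enet_l1.coef_, 3))
print("Lasso:                 ", np.round(lasso.coef_, 3))
print("Same coefficients:", np.allclose(enet_l1.coef_, lasso.coef_))
```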

### 3. Fit with l1_ratio=0.1 (more Ridge-like) - fewer zeros?

In [None]:
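# Step 1: Create ElasticNet with l1_ratio=0.1 (keep alpha=1 so only the L1/L2 mix changes)
enet_01 = ElasticNet(alpha=1, l1_ratio=0.1, random_state=42)

# Step 2: Fit on scaled training data
enet_01.fit(X_train_scaled, y_train)

# Step 3: Print test R² and count zero coefficients
print(f"Elastic Net (l1_ratio=0.1) R²: {enet_01.score(X_test_scaled, y_test):.4f}")
print(f"Zero coefficients: {(enet_01.coef_ == 0).sum()}")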

### 4. Fit with l1_ratio=0.9 (more Lasso-like) - more zeros?

In [None]:
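# Step 1: Create ElasticNet with l1_ratio=0.9 (keep alpha=1 so only the L1/L2 mix changes)
enet_09 = ElasticNet(alpha=1, l1_ratio=0.9, random_state=42)

# Step 2: Fit on scaled training data
enet_09.fit(X_train_scaled, y_train)

# Step 3: Print test R² and count zero coefficients
print(f"Elastic Net (l1_ratio=0.9) R²: {enet_09.score(X_test_scaled, y_test):.4f}")
print(f"Zero coefficients: {(enet_09.coef_ == 0).sum()}")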

**Your observation:** Summarize the relationship between l1_ratio and the number of zero coefficients:

| l1_ratio | Zero Coefficients | Test R² |
|----------|-------------------|----|
| 0.1 (Ridge-like) | | |
| 0.5 (balanced) | | |
| 0.9 (Lasso-like) | | |

(Fill in the table with your results)
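
To see the same trend outside this dataset, the sketch below sweeps `l1_ratio` at a fixed `alpha` on synthetic data from `make_regression`. This is a stand-in; the sizes, the noise level and `alpha=1.0` are arbitrary choices. For each setting it records the zero count and the test R². More zeros typically appear as `l1_ratio` grows toward 1. The reason is that a coefficient stays at zero whenever its feature's correlation with the residual does not exceed the L1 share of the penalty, `alpha * l1_ratio`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 10 features, only 4 of which actually drive y
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)               # fit on training data only
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

rows = []
for ratio in [0.1, 0.5, 0.9, 1.0]:
    model = ElasticNet(alpha=1.0, l1_ratio=ratio).fit(X_tr, y_tr)
    rows.append((ratio, int((model.coef_ == 0).sum()), model.score(X_te, y_te)))

for ratio, zeros, r2 in rows:
    print(f"l1_ratio={ratio:<4}  zeros={zeros}  test R²={r2:.3f}")
```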

### 5. Use ElasticNetCV to find optimal parameters

In [None]:
# ElasticNetCV tunes both alpha and l1_ratio
l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99]

enet_cv = ElasticNetCV(l1_ratio=l1_ratios, cv=5, random_state=42)
enet_cv.fit(X_train_scaled, y_train)

print(f"Best alpha: {enet_cv.alpha_:.4f}")
print(f"Best l1_ratio: {enet_cv.l1_ratio_}")
print(f"Test R²: {enet_cv.score(X_test_scaled, y_test):.4f}")

In [None]:
# What features remain?
print("\nFeatures selected by ElasticNetCV:")
for feat, coef in zip(X.columns, enet_cv.coef_):
    status = "KEPT" if coef != 0 else "eliminated"
    print(f"  {feat}: {coef:.4f} ({status})")
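
`ElasticNetCV` scores every (`l1_ratio`, `alpha`) pair on every fold. Its fitted `mse_path_` attribute has shape `(n_l1_ratio, n_alphas, n_folds)`, and `alphas_` holds one alpha grid per `l1_ratio`. The winner can therefore be recovered by averaging over folds and taking the minimum. The sketch below does this on synthetic data, chosen so the block runs on its own. It also makes the cost visible: every extra `l1_ratio` value adds a full alpha path per fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99]
enet_cv = ElasticNetCV(l1_ratio=l1_ratios, cv=5).fit(X, y)

print("mse_path_ shape:", enet_cv.mse_path_.shape)  # (n_l1_ratio, n_alphas, n_folds)

# Average validation MSE over folds, then locate the best (l1_ratio, alpha) cell
mean_mse = enet_cv.mse_path_.mean(axis=2)
i, j = np.unravel_index(np.argmin(mean_mse), mean_mse.shape)

print("Best by hand:  l1_ratio =", l1_ratios[i], " alpha =", enet_cv.alphas_[i, j])
print("ElasticNetCV:  l1_ratio =", enet_cv.l1_ratio_, " alpha =", enet_cv.alpha_)
```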

### 6. Compare R² scores: OLS vs Ridge vs Lasso vs Elastic Net

In [None]:
# Fit all models with CV-tuned parameters
from sklearn.linear_model import RidgeCV, LassoCV

# OLS
ols = LinearRegression()
ols.fit(X_train_scaled, y_train)

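# Ridge CV: the default alpha grid is only (0.1, 1.0, 10.0), so search a wider range
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 50), cv=5)
ridge_cv.fit(X_train_scaled, y_train)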

# Lasso CV
lasso_cv = LassoCV(cv=5, random_state=42)
lasso_cv.fit(X_train_scaled, y_train)

# Final comparison
results = pd.DataFrame({
    'Model': ['OLS', 'Ridge', 'Lasso', 'Elastic Net'],
    'Test R²': [
        ols.score(X_test_scaled, y_test),
        ridge_cv.score(X_test_scaled, y_test),
        lasso_cv.score(X_test_scaled, y_test),
        enet_cv.score(X_test_scaled, y_test)
    ],
    'Non-zero Features': [
        len(X.columns),
        len(X.columns),  # Ridge never sets to 0
        (lasso_cv.coef_ != 0).sum(),
        (enet_cv.coef_ != 0).sum()
    ]
})

results

**Your recommendation:** Based on this comparison, which model would you choose for this dataset? Consider:
- Performance (R²)
- Simplicity (number of features)
- Interpretability

(Write your recommendation here)

## Summary: When to Use Each Method

| Method | Best When... | Key Characteristic |
|--------|-------------|--------------------|
| **OLS** | Many more samples than features, little correlation between features, no overfitting concern | No regularization |
| **Ridge** | Features are correlated, want to keep all features | Shrinks but keeps all |
| **Lasso** | Want automatic feature selection, sparse model | Sets some to zero |
| **Elastic Net** | Features are correlated AND want selection | Sparse like Lasso, but tends to keep or drop correlated features together |

## Discussion Question

A colleague says "Just use Elastic Net for everything since it combines Ridge and Lasso." Do you agree? What are the trade-offs?

(Discuss with a neighbor)