# Chapter 2.8 and 2.9: Elastic Net

Goal: Understand how Elastic Net combines the benefits of Ridge and Lasso, and when to use it.

### Topics:
- Understanding l1_ratio: blending L1 and L2 penalties
- Comparing Elastic Net to Ridge and Lasso
- Using ElasticNetCV for hyperparameter tuning
- Choosing the right regularization method

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, ElasticNetCV
from sklearn.metrics import r2_score

## Quick Recap

- **Elastic Net** = L1 (Lasso) + L2 (Ridge) combined
- **l1_ratio** controls the blend:
  - l1_ratio = 0 → Pure Ridge
  - l1_ratio = 1 → Pure Lasso
  - l1_ratio = 0.5 → Equal mix
- When to use: Features are correlated AND you want feature selection
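
The blend above can be sketched in code. This is a minimal illustration, assuming scikit-learn's documented `ElasticNet` penalty, `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The helper `elastic_net_penalty` and the `*_demo` variables are hypothetical names introduced here, and the data are synthetic (`make_regression`), not the Auto MPG set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso


def elastic_net_penalty(coef, alpha, l1_ratio):
    """Penalty term of scikit-learn's ElasticNet objective (hypothetical helper)."""
    l1 = np.sum(np.abs(coef))   # L1 norm of the coefficients
    l2 = np.sum(coef ** 2)      # squared L2 norm of the coefficients
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2


# A small coefficient vector to see how l1_ratio shifts the weight between penalties
w_demo = np.array([1.0, -2.0, 0.0])
for ratio in (0.0, 0.5, 1.0):
    print(f"l1_ratio={ratio}: penalty={elastic_net_penalty(w_demo, alpha=1.0, l1_ratio=ratio)}")

# Synthetic data (an assumption for illustration only)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=10.0, random_state=0
)

# With l1_ratio=1 the L2 part vanishes, so ElasticNet should reproduce Lasso
enet_as_lasso = ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=1.0).fit(X_demo, y_demo)
same_as_lasso = np.allclose(enet_as_lasso.coef_, lasso_demo.coef_)
print("l1_ratio=1 matches Lasso:", same_as_lasso)
```

Note that scikit-learn's `Ridge` scales its loss differently from `ElasticNet`, so `ElasticNet(l1_ratio=0)` and `Ridge` with the same `alpha` are not expected to give identical coefficients, even though both use a pure L2 penalty.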

In [None]:
# Load and prepare Auto MPG data (same as previous activity)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'car_name']
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas versions)
df = pd.read_csv(url, sep=r'\s+', names=columns, na_values='?')
df = df.dropna().drop('car_name', axis=1)

# Set X to all columns except the target ("mpg"), and y equal to just the target column
...

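# Train/test split the data
...

# Scale the features: fit StandardScaler on the training data only, then transform both train and test
...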


## Practice

### 1. Fit ElasticNet with l1_ratio=0.5, alpha=1

In [None]:
# Step 1: Create ElasticNet with l1_ratio=0.5 and alpha=1
...

# Step 2: Fit on scaled training data
...

# Step 3: Display coefficients
...

### 2. How many coefficients are zero? Compare to Lasso

In [None]:
# Fit Lasso for comparison
...

# How many coefficients did ElasticNet shrink to zero, and how many did Lasso shrink to zero?
...

**Your observation:** Does Elastic Net eliminate as many features as Lasso?

(Write your answer here)

### 3. Fit with l1_ratio=0.1 (more Ridge-like) - fewer zeros?

In [None]:
# Step 1: Create ElasticNet with l1_ratio=0.1
...

# Step 2: Fit and count zeros
...

# Step 3: Print R² results
...

### 4. Fit with l1_ratio=0.9 (more Lasso-like) - more zeros?

In [None]:
# Step 1: Create ElasticNet with l1_ratio=0.9
...

# Step 2: Fit and count zeros
...

# Step 3: Print R² results
...

**Your observation:** Summarize the relationship between l1_ratio and the number of zero coefficients:

| l1_ratio | Zero Coefficients | R² |
|----------|-------------------|----|
| 0.1 (Ridge-like) | | |
| 0.5 (balanced) | | |
| 0.9 (Lasso-like) | | |

(Fill in the table with your results)
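
To cross-check the pattern on different data, the same sweep can be sketched on a synthetic problem. This is only an illustration under stated assumptions: the data come from `make_regression` rather than Auto MPG, `alpha=1.0` is an arbitrary choice, and every variable name (`X_syn`, `sweep_results`, and so on) is introduced here. The numbers it prints will not match your table.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 4 of which carry signal (assumption for illustration)
X_syn, y_syn = make_regression(
    n_samples=300, n_features=10, n_informative=4, noise=15.0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler_syn = StandardScaler().fit(X_tr)
X_tr_s = scaler_syn.transform(X_tr)
X_te_s = scaler_syn.transform(X_te)

sweep_results = []
for ratio in (0.1, 0.5, 0.9):
    model = ElasticNet(alpha=1.0, l1_ratio=ratio, max_iter=10_000).fit(X_tr_s, y_tr)
    n_zero = int(np.sum(model.coef_ == 0))   # coefficients shrunk exactly to zero
    r2 = model.score(X_te_s, y_te)           # R² on the held-out split
    sweep_results.append((ratio, n_zero, r2))
    print(f"l1_ratio={ratio:.1f}  zero coefficients={n_zero}  test R²={r2:.3f}")
```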

### 5. Use ElasticNetCV to find optimal parameters
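
As a reference for the API, here is a minimal sketch of `ElasticNetCV` on synthetic data, not on Auto MPG. The candidate ratios, `cv=5`, and all variable names are assumptions chosen for illustration. After fitting, the selected values are exposed as the `alpha_` and `l1_ratio_` attributes.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration only)
X_cv, y_cv = make_regression(
    n_samples=300, n_features=10, n_informative=4, noise=15.0, random_state=0
)
X_cv_tr, X_cv_te, y_cv_tr, y_cv_te = train_test_split(
    X_cv, y_cv, test_size=0.2, random_state=0
)
scaler_cv = StandardScaler().fit(X_cv_tr)

# Passing a list as l1_ratio makes the search cover each ratio;
# the alpha grid is generated automatically for each one
candidate_ratios = [0.1, 0.5, 0.7, 0.9, 1.0]
enet_cv_demo = ElasticNetCV(l1_ratio=candidate_ratios, cv=5, max_iter=10_000)
enet_cv_demo.fit(scaler_cv.transform(X_cv_tr), y_cv_tr)

print("best alpha:", enet_cv_demo.alpha_)
print("best l1_ratio:", enet_cv_demo.l1_ratio_)
print("test R²:", enet_cv_demo.score(scaler_cv.transform(X_cv_te), y_cv_te))
```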

In [None]:
# ElasticNetCV tunes both alpha and l1_ratio. Pick a few l1_ratio values greater than 0 and up to 1
# (automatic alpha grid generation is not supported for l1_ratio=0).
...

# Print out which alpha and l1 ratio were best, and the best overall test R²
...

In [None]:
# What features still remain (coefficients didn't shrink to zero)?
...

### 6. Compare R² scores: OLS vs Ridge vs Lasso vs Elastic Net

In [None]:
# Fit all models with CV-tuned parameters
from sklearn.linear_model import RidgeCV, LassoCV

# Fit a linear regression model (no regularization)
...

# Fit a Ridge CV model
...

# Fit a Lasso CV model
...

# Compare all models. Which one is best?
...

**Your recommendation:** Based on this comparison, which model would you choose for this dataset? Consider:
- Performance (R²)
- Simplicity (number of features)
- Interpretability

(Write your recommendation here)

## Summary: When to Use Each Method

| Method | Best When... | Key Characteristic |
|--------|-------------|--------------------|
| **OLS** | Features are independent, no overfitting concern | No regularization |
| **Ridge** | Features are correlated, want to keep all features | Shrinks but keeps all |
| **Lasso** | Want automatic feature selection, sparse model | Sets some to zero |
| **Elastic Net** | Features are correlated AND want selection | Sets some to zero while handling correlated features more stably than Lasso |
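
The table can be explored with a side-by-side sketch on synthetic data with deliberately correlated features. Everything here is an assumption for illustration: the data-generating process, the `alphas` grid for `RidgeCV`, the candidate `l1_ratio` values, and all variable names. It is not a result for the Auto MPG data, and the ranking of models can differ from dataset to dataset.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_rows = 300
base = rng.normal(size=(n_rows, 3))
# Six features: each base signal plus a noisy near-copy, so pairs are strongly correlated
X_corr = np.hstack([base, base + 0.1 * rng.normal(size=(n_rows, 3))])
# Only the first two base signals drive the target
y_corr = 3.0 * base[:, 0] - 2.0 * base[:, 1] + rng.normal(scale=1.0, size=n_rows)

X_c_tr, X_c_te, y_c_tr, y_c_te = train_test_split(
    X_corr, y_corr, test_size=0.2, random_state=0
)
scaler_c = StandardScaler().fit(X_c_tr)
X_c_tr_s, X_c_te_s = scaler_c.transform(X_c_tr), scaler_c.transform(X_c_te)

models_demo = {
    "OLS": LinearRegression(),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    "Lasso": LassoCV(cv=5, max_iter=10_000),
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=10_000),
}

comparison = {}
for name, model in models_demo.items():
    model.fit(X_c_tr_s, y_c_tr)
    comparison[name] = {
        "test_r2": model.score(X_c_te_s, y_c_te),       # R² on the held-out split
        "n_zero": int(np.sum(model.coef_ == 0)),        # features dropped by the model
    }
    print(f"{name:12s} test R²={comparison[name]['test_r2']:.3f}  "
          f"zero coefficients={comparison[name]['n_zero']}")
```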