# Chapter 2.7: Ridge and Lasso Regularization

Goal: Compare Ridge and Lasso, and understand how each handles coefficient shrinkage and feature selection.

### Topics:
- Scaling features before regularization
- Fitting Ridge and Lasso with different alpha values
- Understanding coefficient shrinkage (Ridge) vs elimination (Lasso)
- Using cross-validation to find optimal alpha

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, LassoCV
from sklearn.metrics import r2_score

## Quick Recap

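- **Ridge (L2)**: Shrinks all coefficients toward zero, but almost never to exactly zero
- **Lasso (L1)**: Can shrink coefficients to exactly zero → automatic feature selection
- **Alpha**: Controls regularization strength (higher = more shrinkage; as alpha approaches 0, both methods approach ordinary least squares)
- **Important**: Always scale features before regularization! The penalty acts on coefficient size, and coefficient size depends on each feature's units.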
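
The recap above can be illustrated with a minimal sketch on synthetic data (not the Auto MPG data used below). The names `X_demo`, `y_demo`, and `true_coef_demo`, the seed, and the alpha values are all assumptions chosen for illustration; only two of the five synthetic features actually influence the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (hypothetical): 5 standard-normal features, only the first 2 matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef_demo = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_coef_demo + rng.normal(scale=0.5, size=200)

# Fit the three models on the same data
ols_demo = LinearRegression().fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=10.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Compare coefficients side by side
print("OLS:  ", ols_demo.coef_.round(3))
print("Ridge:", ridge_demo.coef_.round(3))
print("Lasso:", lasso_demo.coef_.round(3))
print("Exact zeros in Lasso:", int(np.sum(lasso_demo.coef_ == 0)))
```

In a run like this, Ridge should pull every coefficient slightly toward zero while keeping all of them nonzero, and Lasso should set the irrelevant features to exactly zero.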

In [None]:
# Load Auto MPG dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'car_name']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(url, sep=r'\s+', names=columns, na_values='?')

# Drop rows with missing values and car_name (not numeric)
df = df.dropna()
df = df.drop('car_name', axis=1)

df.head()

In [None]:
# Set X to all columns except the target ("mpg"), and y equal to just the target column
...

# Train/test split the data
...


## Practice

### 1. Scale features using StandardScaler (fit on train, transform both)
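
Before attempting this on the Auto MPG data, here is a rough sketch of the fit-on-train, transform-both pattern using made-up data. The array `X_scale_demo` and the names `scaler_demo`, `X_tr_demo`, and `X_te_demo` are hypothetical; the two synthetic columns are given very different scales on purpose.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): two features on very different scales
rng = np.random.default_rng(1)
X_scale_demo = rng.normal(loc=[50.0, 3000.0], scale=[5.0, 800.0], size=(100, 2))

X_tr_demo, X_te_demo = train_test_split(X_scale_demo, test_size=0.25, random_state=0)

# Learn the mean and std from the training rows only
scaler_demo = StandardScaler()
X_tr_scaled_demo = scaler_demo.fit_transform(X_tr_demo)

# Reuse the training statistics on the test rows (no refitting)
X_te_scaled_demo = scaler_demo.transform(X_te_demo)

print("train mean:", X_tr_scaled_demo.mean(axis=0).round(3))
print("train std: ", X_tr_scaled_demo.std(axis=0).round(3))
print("test mean: ", X_te_scaled_demo.mean(axis=0).round(3))
print("test std:  ", X_te_scaled_demo.std(axis=0).round(3))
```

The test set's mean and std will usually be close to 0 and 1 but not exact, because the scaler never saw those rows. That small mismatch is expected and is what keeps test information from leaking into training.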

In [None]:
# Step 1: Create StandardScaler
...

# Step 2: Fit on training data ONLY, then transform both train and test
...

# Verify scaling worked (mean should be ~0, std should be ~1 for train)
...

### 2. Fit LinearRegression, note the coefficients

In [None]:
# Step 1: Fit LinearRegression on scaled data
...

# Step 2: Display coefficients
...


### 3. Fit Ridge(alpha=1), compare coefficients - are they smaller?

In [None]:
# Step 1: Create and fit Ridge with alpha=1
...

# Step 2: Add Ridge coefficients to the comparison DataFrame
...

**Your observation:** Are the Ridge coefficients smaller (closer to 0) than OLS? Which features shrank the most?

(Write your answer here)

### 4. Fit Lasso(alpha=1), which coefficients are exactly 0?

In [None]:
# Step 1: Create and fit Lasso with alpha=1
...

# Step 2: Add Lasso coefficients to the comparison DataFrame
...

### 5. Fit Lasso with alpha=0.1, 0.5, 1, 2 - how do zero coefficients change?
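
As a warm-up, the loop below sketches one way to sweep alpha and count exact zeros, again on synthetic data rather than Auto MPG. The names `X_sweep_demo`, `y_sweep_demo`, and `zero_counts_demo`, and the particular alpha grid, are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (hypothetical): coefficients of decreasing importance
rng = np.random.default_rng(2)
X_sweep_demo = rng.normal(size=(300, 6))
coef_sweep_demo = np.array([4.0, 2.0, 1.0, 0.5, 0.0, 0.0])
y_sweep_demo = X_sweep_demo @ coef_sweep_demo + rng.normal(scale=1.0, size=300)

# Fit one Lasso per alpha and record how many coefficients are exactly zero
zero_counts_demo = {}
for alpha in [0.01, 0.1, 0.5, 1.0, 2.0]:
    model = Lasso(alpha=alpha).fit(X_sweep_demo, y_sweep_demo)
    zero_counts_demo[alpha] = int(np.sum(model.coef_ == 0))
    r2 = model.score(X_sweep_demo, y_sweep_demo)  # R² on the same data it was fit on
    print(f"alpha={alpha:<5} zeros={zero_counts_demo[alpha]}  R2={r2:.3f}  coef={model.coef_.round(2)}")
```

Note that the R² printed here is computed on the fitting data, so it can only show the cost of shrinkage, not generalization; for the exercise, score on the held-out test set instead.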

In [None]:
# Compare Lasso with different alpha values
...

**Your observation:** As alpha increases, what happens to the number of zero coefficients? What happens to R²?

(Write your answer here)

### 6. Use `RidgeCV` to find optimal alpha
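
For orientation, a minimal `RidgeCV` sketch on synthetic data follows. The candidate grid `alphas_demo` (log-spaced from 0.001 to 1000) and the names `X_rcv_demo` and `y_rcv_demo` are assumptions; with the default `cv=None`, `RidgeCV` uses an efficient form of leave-one-out cross-validation.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (hypothetical), already on a standard scale
rng = np.random.default_rng(3)
X_rcv_demo = rng.normal(size=(150, 4))
y_rcv_demo = X_rcv_demo @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=150)

# Candidate alphas spaced evenly on a log scale
alphas_demo = np.logspace(-3, 3, 13)

# RidgeCV tries every candidate and keeps the one with the best CV score
ridge_cv_demo = RidgeCV(alphas=alphas_demo).fit(X_rcv_demo, y_rcv_demo)

print("Best alpha:", ridge_cv_demo.alpha_)
print("Coefficients:", ridge_cv_demo.coef_.round(3))
```

If the selected alpha sits at either end of the grid, it is usually worth widening the grid and refitting.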

In [None]:
# Step 1: Create RidgeCV with a range of alphas
...

# Step 2: Fit on training data
...

# Step 3: Get the best alpha
...

### 7. Use `LassoCV` to find optimal alpha
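
A comparable `LassoCV` sketch, again on synthetic data with hypothetical names (`X_lcv_demo`, `y_lcv_demo`, `feature_names_demo`). Here the alpha path is left to the library's defaults and `cv=5` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data (hypothetical): the last two features are pure noise
rng = np.random.default_rng(4)
feature_names_demo = ["f0", "f1", "f2", "f3", "f4"]
X_lcv_demo = rng.normal(size=(200, 5))
y_lcv_demo = X_lcv_demo @ np.array([3.0, -2.0, 1.0, 0.0, 0.0]) + rng.normal(scale=1.0, size=200)

# LassoCV builds its own alpha path and picks the best value by 5-fold CV
lasso_cv_demo = LassoCV(cv=5, random_state=0).fit(X_lcv_demo, y_lcv_demo)

# Features whose coefficient was not driven to exactly zero
kept_demo = [name for name, c in zip(feature_names_demo, lasso_cv_demo.coef_) if c != 0]

print("Best alpha:", lasso_cv_demo.alpha_)
print("Features kept:", kept_demo)
```

The cross-validated alpha is chosen for predictive error, not sparsity, so it may keep some weak or noisy features with small coefficients.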

In [None]:
# Step 1: Create LassoCV (it automatically chooses alphas)
...

# Step 2: Fit on training data
...

# Step 3: Get the best alpha and see which features remain
...

### 8. Which regularization method would you choose for this data and why?

In [None]:
# Compare all models to see which is best
...

**Your recommendation:** Based on the results, which method would you choose? Consider:
- Performance (R² scores)
- Interpretability (how many features?)
- Simplicity

(Write your recommendation and reasoning here)