<a href="https://colab.research.google.com/github/plthomps/CIS-3902-Data-Mining/blob/main/Elastic_Net_Simple.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Elastic Net for Linear Regression
Prepared by Dr. Pamela Thompson

This notebook answers:
**What is Elastic Net used for in linear regression?**

Elastic Net is used when you want a regression model that:
- **doesn't overfit as easily** (keeps coefficients from getting too large), and
- can **set some coefficients to 0** (drops less useful features), and
- works well when some features are **correlated** (similar to each other).

Elastic Net combines two regularization ideas:
- **Ridge (L2)**: shrinks coefficients
- **Lasso (L1)**: can shrink some coefficients all the way to **0**
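
To make the difference between the two penalties concrete, here is a minimal sketch. It assumes an idealized case (orthonormal features, penalty strength `lam`) in which Ridge simply rescales each unpenalized coefficient and Lasso applies soft-thresholding. The function names `ridge_shrink` and `lasso_shrink` and the example numbers are made up for this illustration; they are not scikit-learn functions.

```python
import numpy as np

def ridge_shrink(ols_coefs, lam):
    """Idealized Ridge: every coefficient is scaled toward 0, none reaches 0."""
    return ols_coefs / (1.0 + lam)

def lasso_shrink(ols_coefs, lam):
    """Idealized Lasso (soft-thresholding): subtract lam, clip at exactly 0."""
    return np.sign(ols_coefs) * np.maximum(np.abs(ols_coefs) - lam, 0.0)

# Made-up unpenalized coefficients: one large, one medium, one small
ols = np.array([2.0, -0.5, 0.1])

ridge_out = ridge_shrink(ols, lam=0.25)
lasso_out = lasso_shrink(ols, lam=0.25)

print("Ridge:", ridge_out)
print("Lasso:", lasso_out)
```

In this toy case the smallest coefficient survives Ridge (it just gets smaller), while Lasso sets it to exactly 0.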

We'll compare four models:
1) Linear Regression (no regularization)
2) Ridge
3) Lasso
4) Elastic Net


## 1) Imports

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, ElasticNetCV


## 2) Load a real dataset with real feature names
We'll try to load the **California Housing** dataset (real feature names like `MedInc`, `HouseAge`, etc.).

If that download isn't available in your environment, we fall back to the **Diabetes** dataset.

> Either way: you'll get a real dataset with named features.


In [5]:
from sklearn.datasets import fetch_california_housing, load_diabetes

try:
    housing = fetch_california_housing(as_frame=True)
    X = housing.data
    y = housing.target  # target is the median house value, in units of $100,000
    dataset_name = "California Housing"
except Exception as e:
    diabetes = load_diabetes(as_frame=True)
    X = diabetes.data
    y = diabetes.target
    dataset_name = "Diabetes (fallback)"
    print("California Housing couldn't be fetched, using Diabetes dataset instead.")
    print("Reason:", e)

print("Dataset:", dataset_name)
print("X shape:", X.shape)
print("y shape:", y.shape)

X.head()


Dataset: California Housing
X shape: (20640, 8)
y shape: (20640,)


| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [6]:
y.head()

| | MedHouseVal |
|---|---|
| 0 | 4.526 |
| 1 | 3.585 |
| 2 | 3.521 |
| 3 | 3.413 |
| 4 | 3.422 |


## 3) Train/test split
We'll use 75% of the data for training and 25% for testing.


In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

X_train.shape, X_test.shape


((15480, 8), (5160, 8))
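
The shapes above can be checked by hand. As a small sketch, assuming `train_test_split` rounds the test size up and gives the remaining rows to training (which matches the output shown above):

```python
import math

n_rows = 20640      # rows in California Housing, from the X shape printed earlier
test_size = 0.25

n_test = math.ceil(test_size * n_rows)   # rows held out for testing
n_train = n_rows - n_test                # everything else is used for training

print("train rows:", n_train)
print("test rows:", n_test)
```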

## 4) Standardize the features (IMPORTANT)
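Regularization methods (Ridge/Lasso/Elastic Net) penalize the size of the coefficients, so they work best when features are on similar scales. Otherwise, a feature measured in large units gets a small coefficient and is penalized less than a feature measured in small units.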

- We **fit** the scaler on the training data
- Then **transform** both train and test using the same scaler


`StandardScaler()` transforms each feature so that, on the training data:

- mean = 0
- standard deviation = 1

This is called z-score standardization (sometimes loosely called normalization).
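
As a rough sketch of what the scaler does under the hood, the same z-scores can be computed by hand with NumPy. The tiny arrays below are made-up numbers for illustration. The key point is that the mean and standard deviation come from the training rows only and are then reused on the test rows.

```python
import numpy as np

# Made-up example data: 4 training rows and 2 test rows, 2 features
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0],
                  [4.0, 400.0]])
test = np.array([[2.5, 250.0],
                 [5.0, 500.0]])

# Learn the statistics from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population standard deviation (ddof=0), as StandardScaler uses

# Apply the same statistics to both sets
train_z = (train - mean) / std
test_z = (test - mean) / std

print(train_z.mean(axis=0))  # approximately 0 for every feature
print(train_z.std(axis=0))   # approximately 1 for every feature
print(test_z)                # test rows are generally not exactly mean 0 / std 1
```

After scaling, the two features are on the same footing even though the second one was originally 100 times larger.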

In [8]:
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

X_train_scaled[:3]


array([[ 0.17648852,  0.66640687, -0.06085431, -0.2811182 , -0.49654414,
        -0.04828325, -0.8607415 ,  0.73099911],
       [ 0.77137436,  1.06288858,  0.38352076, -0.02037756, -0.41167187,
        -0.05872947,  0.70916822, -1.19670983],
       [ 0.02142423,  0.58711053,  0.27852339, -0.08462537, -0.62691494,
        -0.07283798,  1.31370062, -1.55128842]])

## 5) Fit each model (no pipelines, no helper functions - simplified code for our class example)
I'll write out the steps each time so it's easy to follow.

We'll also compute:
- **R²** (higher is better)
- **RMSE** (lower is better)
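
Both metrics are simple to compute by hand. The sketch below uses four made-up values (not the housing data) and hypothetical helper names `r2_by_hand` and `rmse_by_hand` to show the formulas behind `r2_score` and the RMSE calculation.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R² = 1 - (sum of squared errors) / (total variation around the mean)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse_by_hand(y_true, y_pred):
    """RMSE = square root of the average squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Made-up true values and predictions
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

r2_demo = r2_by_hand(y_true_demo, y_pred_demo)
rmse_demo = rmse_by_hand(y_true_demo, y_pred_demo)

print("R2:", round(r2_demo, 4))
print("RMSE:", round(rmse_demo, 4))
```

RMSE is in the same units as the target, so for the housing data an RMSE of 0.73 means a typical error of about $73,000.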


In [9]:
# --- 1) Linear Regression (no regularization) ---
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
lr_pred = lr.predict(X_test_scaled)

lr_r2 = r2_score(y_test, lr_pred)
lr_rmse = np.sqrt(mean_squared_error(y_test, lr_pred))


# --- 2) Ridge Regression (L2) ---
ridge = Ridge(alpha=1.0)
ridge.fit(X_train_scaled, y_train)
ridge_pred = ridge.predict(X_test_scaled)

ridge_r2 = r2_score(y_test, ridge_pred)
ridge_rmse = np.sqrt(mean_squared_error(y_test, ridge_pred))


# --- 3) Lasso Regression (L1) ---
lasso = Lasso(alpha=0.01, max_iter=10000)
lasso.fit(X_train_scaled, y_train)
lasso_pred = lasso.predict(X_test_scaled)

lasso_r2 = r2_score(y_test, lasso_pred)
lasso_rmse = np.sqrt(mean_squared_error(y_test, lasso_pred))


# --- 4) Elastic Net (L1 + L2) ---
enet = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000)
enet.fit(X_train_scaled, y_train)
enet_pred = enet.predict(X_test_scaled)

enet_r2 = r2_score(y_test, enet_pred)
enet_rmse = np.sqrt(mean_squared_error(y_test, enet_pred))


# Show results
results = pd.DataFrame({
    "Model": ["Linear Regression", "Ridge (L2)", "Lasso (L1)", "Elastic Net"],
    "R2": [lr_r2, ridge_r2, lasso_r2, enet_r2],
    "RMSE": [lr_rmse, ridge_rmse, lasso_rmse, enet_rmse]
}).sort_values("RMSE")

results


| | Model | R2 | RMSE |
|---|---|---|---|
| 2 | Lasso (L1) | 0.595146 | 0.731922 |
| 3 | Elastic Net | 0.594301 | 0.732685 |
| 1 | Ridge (L2) | 0.591075 | 0.735593 |
| 0 | Linear Regression | 0.591051 | 0.735615 |


## 6) Compare coefficients (this is where Elastic Net is interesting)
Coefficients show what each model learned.

- Ridge usually keeps **all** features but shrinks them.
- Lasso can push some coefficients to **exactly 0** (drops features).
- Elastic Net can do a bit of both: shrink and drop.


In [10]:
feature_names = list(X.columns)

coef_df = pd.DataFrame({
    "Feature": feature_names,
    "LinearRegression": lr.coef_,
    "Ridge": ridge.coef_,
    "Lasso": lasso.coef_,
    "ElasticNet": enet.coef_
})

# Count how many are exactly zero (feature selection)
zero_counts = {
    "LinearRegression": int((coef_df["LinearRegression"] == 0).sum()),
    "Ridge": int((coef_df["Ridge"] == 0).sum()),
    "Lasso": int((coef_df["Lasso"] == 0).sum()),
    "ElasticNet": int((coef_df["ElasticNet"] == 0).sum()),
}

print("How many coefficients are exactly 0?")
for k, v in zero_counts.items():
    print(f"{k}: {v}")

# Show biggest coefficients (by Elastic Net magnitude)
coef_df["ElasticNet_abs"] = coef_df["ElasticNet"].abs()
coef_df.sort_values("ElasticNet_abs", ascending=False).head(15)


How many coefficients are exactly 0?
LinearRegression: 0
Ridge: 0
Lasso: 1
ElasticNet: 1


| | Feature | LinearRegression | Ridge | Lasso | ElasticNet | ElasticNet_abs |
|---|---|---|---|---|---|---|
| 0 | MedInc | 0.852108 | 0.852043 | 0.79794 | 0.821016 | 0.821016 |
| 6 | Latitude | -0.893147 | -0.892347 | -0.788125 | -0.786344 | 0.786344 |
| 7 | Longitude | -0.86784 | -0.867029 | -0.755196 | -0.756509 | 0.756509 |
| 3 | AveBedrms | 0.348606 | 0.348323 | 0.2119 | 0.263052 | 0.263052 |
| 2 | AveRooms | -0.302106 | -0.301876 | -0.166669 | -0.221033 | 0.221033 |
| 1 | HouseAge | 0.120655 | 0.120738 | 0.125013 | 0.127903 | 0.127903 |
| 5 | AveOccup | -0.041164 | -0.041168 | -0.030794 | -0.036172 | 0.036172 |
| 4 | Population | -0.001645 | -0.001617 | -0.0 | 0.0 | 0.0 |
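
To see which features a model dropped (rather than just how many), the names can be paired with the coefficients. This is a minimal sketch that uses the Lasso column copied from the table above instead of the fitted `lasso` object; the variable names `names`, `dropped`, and `kept` are made up for this illustration.

```python
# Feature names in their original column order
names = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
         "Population", "AveOccup", "Latitude", "Longitude"]

# Lasso coefficients copied (rounded) from the table above, in the same order
lasso_coefs = [0.79794, 0.125013, -0.166669, 0.2119,
               -0.0, -0.030794, -0.788125, -0.755196]

# A coefficient of exactly 0 means the feature has no effect on predictions
dropped = [name for name, c in zip(names, lasso_coefs) if c == 0]
kept = [name for name, c in zip(names, lasso_coefs) if c != 0]

print("Dropped:", dropped)
print("Kept:", kept)
```

With the fitted model from the notebook, `lasso.coef_` would take the place of the hard-coded list.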


## 7) The two Elastic Net parameters (some call them knobs): `alpha` and `l1_ratio`
Elastic Net has two main settings:

### `alpha`
- how strong the regularization is overall
- bigger alpha = more shrinkage = simpler model
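
A quick way to see this effect is to fit the same model at several `alpha` values. The sketch below uses small synthetic data (made up for this illustration, not the housing data) in which only the first two of five features matter.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: y depends on features 0 and 1 only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

alpha_results = {}
for a in [0.001, 0.1, 1.0, 20.0]:
    m = ElasticNet(alpha=a, l1_ratio=0.5, max_iter=10000)
    m.fit(X_demo, y_demo)
    alpha_results[a] = {
        "sum_abs_coef": float(np.abs(m.coef_).sum()),  # overall coefficient size
        "num_zero": int((m.coef_ == 0).sum()),         # features dropped
    }
    print(a, alpha_results[a])
```

As `alpha` grows, the coefficients shrink, and with a large enough `alpha` every coefficient is 0 and the model just predicts the average.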

### `l1_ratio`
- mixes Lasso and Ridge
- `l1_ratio = 1.0`: pure Lasso (L1 penalty only)
- `l1_ratio = 0.0`: L2 penalty only, like Ridge (although `alpha` is scaled differently than in `Ridge`)
- `l1_ratio = 0.5`: a mix of both
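
For reference, the penalty that scikit-learn documents for `ElasticNet` can be written as a small function. This is a sketch of the penalty term only (not the full training objective), and `enet_penalty` is a hypothetical helper name, not part of scikit-learn. The coefficient values are made up.

```python
import numpy as np

def enet_penalty(coefs, alpha, l1_ratio):
    """Penalty = alpha*l1_ratio*sum(|w|) + 0.5*alpha*(1 - l1_ratio)*sum(w^2)."""
    coefs = np.asarray(coefs, dtype=float)
    l1_part = alpha * l1_ratio * np.abs(coefs).sum()
    l2_part = 0.5 * alpha * (1.0 - l1_ratio) * np.sum(coefs ** 2)
    return l1_part + l2_part

# Made-up coefficients: sum(|w|) = 3 and sum(w^2) = 5
w = [2.0, -1.0, 0.0]

for r in [0.0, 0.5, 1.0]:
    print("l1_ratio =", r, "penalty =", enet_penalty(w, alpha=0.01, l1_ratio=r))
```

With `l1_ratio = 1.0` only the L1 part remains, and with `l1_ratio = 0.0` only the L2 part remains.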

Let's change **only** `l1_ratio` and see what happens.


In [11]:
l1_ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
rows = []

for r in l1_ratios:
    model = ElasticNet(alpha=0.01, l1_ratio=r, max_iter=10000)
    model.fit(X_train_scaled, y_train)
    pred = model.predict(X_test_scaled)

    rows.append({
        "l1_ratio": r,
        "R2": r2_score(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
        "Num_Zero_Coefs": int((model.coef_ == 0).sum())
    })

pd.DataFrame(rows).sort_values("RMSE")



| | l1_ratio | R2 | RMSE | Num_Zero_Coefs |
|---|---|---|---|---|
| 4 | 1.0 | 0.595146 | 0.731922 | 1 |
| 3 | 0.75 | 0.594817 | 0.73222 | 1 |
| 2 | 0.5 | 0.594301 | 0.732685 | 1 |
| 1 | 0.25 | 0.593636 | 0.733285 | 1 |
| 0 | 0.0 | 0.592821 | 0.734021 | 0 |


## 8) (Optional) Best practice: let the computer choose with cross-validation
Instead of guessing `alpha` and `l1_ratio`, we can search for good values using cross-validation.

This is often what people do in real projects.


Cross-validation is a way to test your model many times so you don’t get lucky (or unlucky) with just one train/test split.

**Simple analogy**

Imagine you are studying for a big exam.

If you only take one practice test, you might get questions that are unusually easy or unusually hard.

If you take several practice tests, you get a much better sense of how prepared you really are.

Cross-validation works the same way for a machine learning model.
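
In code, the "several practice tests" idea looks roughly like this. The sketch uses synthetic data and scikit-learn's `KFold` to split it into 5 folds; each fold takes one turn as the practice test while the model trains on the other four. The data and the `_demo` variable names are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data: y depends on features 0 and 1, plus some noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.3, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_rmse = []

for train_idx, val_idx in kf.split(X_demo):
    # Train on 4 folds, then score on the 1 fold that was held out
    fold_model = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    fold_pred = fold_model.predict(X_demo[val_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(y_demo[val_idx], fold_pred)))

print("RMSE per fold:", np.round(fold_rmse, 3))
print("Average RMSE:", np.mean(fold_rmse))
```

The per-fold scores differ a little from one another, which is exactly why the average is a more trustworthy estimate than any single split.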

In [12]:
enet_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9],
    cv=5,
    max_iter=20000,
    random_state=42
)

enet_cv.fit(X_train_scaled, y_train)
pred = enet_cv.predict(X_test_scaled)

print("Best alpha:", enet_cv.alpha_)
print("Best l1_ratio:", enet_cv.l1_ratio_)
print("Test R2:", r2_score(y_test, pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, pred)))

best_coefs = pd.Series(enet_cv.coef_, index=feature_names).sort_values(key=lambda s: s.abs(), ascending=False)
best_coefs.head(15)


Best alpha: 0.0028944501141756913
Best l1_ratio: 0.9
Test R2: 0.5931719507185191
Test RMSE: 0.7337044639333614


| Feature | Coefficient |
|---|---|
| Latitude | -0.861781 |
| MedInc | 0.83789 |
| Longitude | -0.83455 |
| AveBedrms | 0.312008 |
| AveRooms | -0.266019 |
| HouseAge | 0.122522 |
| AveOccup | -0.038573 |
| Population | -0.0 |
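
To see how `ElasticNetCV` arrives at its choice, its fitted attributes can be inspected. The sketch below uses synthetic data (made up here) and relies on the documented `mse_path_` and `alphas_` attributes. It averages the validation error across folds and looks up the combination with the lowest average, which is expected to match `alpha_` and `l1_ratio_`.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data: y depends on features 0 and 2, plus some noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4))
y_demo = 1.5 * X_demo[:, 0] - 1.0 * X_demo[:, 2] + rng.normal(scale=0.5, size=150)

ratios = [0.1, 0.5, 0.9]
cv_model = ElasticNetCV(l1_ratio=ratios, cv=5, max_iter=20000)
cv_model.fit(X_demo, y_demo)

# mse_path_ holds one validation MSE per (l1_ratio, alpha, fold); average over the folds
mean_mse = cv_model.mse_path_.mean(axis=-1)
i, j = np.unravel_index(np.argmin(mean_mse), mean_mse.shape)

print("Lowest average CV MSE:", mean_mse[i, j])
print("l1_ratio at that point:", ratios[i], "| chosen:", cv_model.l1_ratio_)
print("alpha at that point:", cv_model.alphas_[i, j], "| chosen:", cv_model.alpha_)
```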


## 9) Summary (what to remember)
- **Elastic Net** is used when you want a regularized linear regression that can also do **feature selection**.
- It is especially helpful when features are **correlated**.
- Ridge is stable but usually keeps all features.
- Lasso can drop features, but with a group of correlated features it tends to keep one and drop the others, and which one it keeps can be unstable.
- Elastic Net is often a good "middle ground".
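
The point about correlated features can be illustrated with an extreme, constructed case: two columns that are exact copies of each other. This is a sketch on synthetic data (made up for this illustration), so the exact numbers will not carry over to real datasets.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data: columns 0 and 1 are identical, column 2 is unrelated noise
rng = np.random.default_rng(0)
x = rng.normal(size=300)
noise_feature = rng.normal(size=300)
X_dup = np.column_stack([x, x, noise_feature])
y_dup = 4.0 * x + rng.normal(scale=0.5, size=300)

lasso_dup = Lasso(alpha=0.1, max_iter=100000)
lasso_dup.fit(X_dup, y_dup)

# A tight tolerance so the two duplicate coefficients settle to the same value
enet_dup = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=100000, tol=1e-8)
enet_dup.fit(X_dup, y_dup)

print("Lasso coefficients:      ", np.round(lasso_dup.coef_, 3))
print("Elastic Net coefficients:", np.round(enet_dup.coef_, 3))
```

In this constructed example, Lasso is expected to put essentially all of the weight on one copy, while Elastic Net splits it about evenly between the two.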
