Your name:

“Linear regression tries to use every feature, even random ones.”

“Ridge says: use them, but don’t go crazy (shrinks weights).”

“Lasso says: some features aren’t needed (sets some weights to zero).”

Show the coefficient table and point at the random_* rows:

Ridge: usually small but not zero

Lasso: many become exactly 0
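
The three talking points above can be written as optimization objectives. This is a sketch using the scaling that scikit-learn documents for `Ridge` and `Lasso`; the symbols (n samples, p features, weights w, intercept b, penalty strength alpha) are notation introduced here for illustration, not taken from the notebook.

```latex
\begin{aligned}
\text{Linear regression:}\quad & \min_{w,\,b}\ \sum_{i=1}^{n} \left(y_i - x_i^\top w - b\right)^2 \\
\text{Ridge (L2):}\quad & \min_{w,\,b}\ \sum_{i=1}^{n} \left(y_i - x_i^\top w - b\right)^2 + \alpha \sum_{j=1}^{p} w_j^2 \\
\text{Lasso (L1):}\quad & \min_{w,\,b}\ \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - x_i^\top w - b\right)^2 + \alpha \sum_{j=1}^{p} \lvert w_j \rvert
\end{aligned}
```

Because the Lasso loss carries the extra 1/(2n) factor, the same numeric alpha is not equally strong in the two models, so alpha=10 for Ridge and alpha=0.15 for Lasso should not be compared directly.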

In [1]:
# Regularization demo: Linear Regression vs Ridge vs Lasso
# Paste into a Jupyter notebook cell

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error, r2_score

np.random.seed(42)


Create Dataset


In [2]:
# Story: predicting "final exam score" from lots of signals:
# - 3 real signals (matter)
# - 2 copycat signals (highly correlated with real ones)
# - 10 random signals (noise; don't matter)

n = 250

# real signals
hours_studied = np.random.normal(10, 3, n)
sleep_hours   = np.random.normal(7, 1, n)
attendance    = np.random.normal(0.9, 0.05, n)

# copycat signals (very similar to real ones)
hours_studied_copy = hours_studied + np.random.normal(0, 0.5, n)
attendance_copy    = attendance + np.random.normal(0, 0.02, n)

# noise signals (random, not truly related)
noise = np.random.normal(0, 1, (n, 10))
noise_cols = [f"random_{i}" for i in range(noise.shape[1])]

# target: final score (a linear-ish truth + noise)
y = (
    5.0 * hours_studied
    + 3.0 * sleep_hours
    + 60.0 * attendance
    + np.random.normal(0, 8, n)   # real-world randomness
)

X = pd.DataFrame({
    "hours_studied": hours_studied,
    "sleep_hours": sleep_hours,
    "attendance": attendance,
    "hours_studied_copy": hours_studied_copy,
    "attendance_copy": attendance_copy,
})

# add noise columns
for i, col in enumerate(noise_cols):
    X[col] = noise[:, i]

X.head()


,hours_studied,sleep_hours,attendance,hours_studied_copy,attendance_copy,random_0,random_1,random_2,random_3,random_4,random_5,random_6,random_7,random_8,random_9
0,11.490142,5.739116,0.946309,11.028526,0.974296,0.33088,0.833529,-1.993736,0.374057,1.227669,-1.209641,1.672572,0.419019,-0.705012,-0.055769
1,9.585207,7.917862,0.995471,8.909365,1.013964,0.558327,0.076005,0.538756,-0.920674,0.169361,-1.413714,-0.111226,-0.903908,-0.73553,1.236093
2,11.943066,9.122156,0.830072,11.455129,0.831264,1.09131,0.609138,-1.092313,-0.316408,1.213098,0.141717,2.31933,0.393318,0.192049,-0.309116
3,14.56909,8.032465,0.928148,15.09591,0.91521,0.133541,-0.15247,0.708109,0.956702,-0.785989,-1.331233,-1.836205,0.507991,-1.103367,-2.152891
4,9.29754,5.48063,0.867468,8.82284,0.881432,0.388579,2.493,-0.006071,0.838491,0.081829,-0.09889,0.919076,-0.290275,0.267392,0.321698
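
To see why the copycat columns count as "highly correlated", a quick check is to compute correlations directly. The sketch below regenerates a few columns with the same distributions but with its own generator (`rng`, an assumption made here so the notebook's global seed is not disturbed), so the numbers will not match the table above exactly.

```python
import numpy as np

# Separate generator: values differ from the notebook's np.random.seed(42) data.
rng = np.random.default_rng(0)
n = 250

# Same distributions as the dataset cell
hours_studied = rng.normal(10, 3, n)
attendance = rng.normal(0.9, 0.05, n)
hours_studied_copy = hours_studied + rng.normal(0, 0.5, n)
attendance_copy = attendance + rng.normal(0, 0.02, n)
random_0 = rng.normal(0, 1, n)

# Pearson correlation between each real signal and its copycat
corr_hours = np.corrcoef(hours_studied, hours_studied_copy)[0, 1]
corr_attendance = np.corrcoef(attendance, attendance_copy)[0, 1]
# A pure-noise column should be close to uncorrelated with a real signal
corr_noise = np.corrcoef(hours_studied, random_0)[0, 1]

print(f"hours_studied vs hours_studied_copy: {corr_hours:.3f}")
print(f"attendance vs attendance_copy:       {corr_attendance:.3f}")
print(f"hours_studied vs random_0:           {corr_noise:.3f}")
```

Near-duplicate columns like these are what make plain linear regression coefficients unstable: the model can trade weight between a signal and its copy with almost no change in fit.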


Create train/test split for modeling

In [3]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

X_train.shape, X_test.shape


((187, 15), (63, 15))

Model evaluation function

In [4]:
def evaluate_model(name, model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    r2 = r2_score(y_test, preds)
    return {"model": name, "RMSE": rmse, "R2": r2}

results = []
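
For reference, the two metrics returned by `evaluate_model` can be computed by hand. This is a small sketch on made-up numbers (`y_true` and `y_pred` are illustrative values, not notebook data) that also checks the hand calculation against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 10.0])

residuals = y_true - y_pred

# RMSE: square root of the mean squared residual (same units as the target)
rmse = np.sqrt(np.mean(residuals ** 2))

# R2: 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# A model that always predicts the mean of y_true scores R2 = 0 on that data
baseline_pred = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, baseline_pred)

print(f"RMSE = {rmse:.4f}, R2 = {r2:.4f}, baseline R2 = {r2_baseline:.4f}")
print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))))
print(np.isclose(r2, r2_score(y_true, y_pred)))
```

An R2 below 0 therefore means the model does worse than always predicting the mean, which can happen when a penalty is strong enough to wipe out the useful coefficients.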


Basic Linear Regression

In [5]:
# 2) Plain Linear Regression (no regularization)
lr = Pipeline([
    ("scaler", StandardScaler()),       # scaling helps Ridge/Lasso comparisons; OK for LR too
    ("model", LinearRegression())
])

results.append(evaluate_model("Linear Regression", lr, X_train, X_test, y_train, y_test))
pd.DataFrame(results)


,model,RMSE,R2
0,Linear Regression,7.738933,0.785373
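
The `StandardScaler` step in the pipeline rescales each column to mean 0 and standard deviation 1. The sketch below, on a small made-up array (`demo` is an illustrative name, not a notebook variable), shows that this matches subtracting the column mean and dividing by the column standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two columns on very different scales, loosely mimicking hours and attendance
demo = np.column_stack([
    rng.normal(10, 3, 100),
    rng.normal(0.9, 0.05, 100),
])

scaled = StandardScaler().fit_transform(demo)

# Manual version: StandardScaler uses the population standard deviation (ddof=0)
manual = (demo - demo.mean(axis=0)) / demo.std(axis=0)

print("matches manual formula:", np.allclose(scaled, manual))
print("column means after scaling:", scaled.mean(axis=0).round(6))
print("column stds after scaling: ", scaled.std(axis=0).round(6))
```

Scaling matters for Ridge and Lasso because the penalty acts on coefficient size, and coefficient size depends on the units of each feature.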


Ridge Regression (L2 regularization)

In [6]:
# 3) Ridge Regression (L2 regularization)
# alpha controls how strongly the weights are shrunk toward 0 (the "press softly" rule)
ridge = Pipeline([
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=10.0))
])

results.append(evaluate_model("Ridge (alpha=10)", ridge, X_train, X_test, y_train, y_test))
pd.DataFrame(results)


,model,RMSE,R2
0,Linear Regression,7.738933,0.785373
1,Ridge (alpha=10),8.180044,0.760209


Lasso Regression (L1 regularization)

In [7]:
# 4) Lasso Regression (L1 regularization)
# alpha controls how readily weights are set to exactly 0 (the "put some crayons away" rule)
lasso = Pipeline([
    ("scaler", StandardScaler()),
    ("model", Lasso(alpha=0.15, max_iter=20000))
])

results.append(evaluate_model("Lasso (alpha=0.15)", lasso, X_train, X_test, y_train, y_test))
pd.DataFrame(results).sort_values("RMSE")


,model,RMSE,R2
0,Linear Regression,7.738933,0.785373
2,Lasso (alpha=0.15),7.856153,0.778822
1,Ridge (alpha=10),8.180044,0.760209


Coefficients

✅ Plain Linear Regression

Gives non-zero coefficients to many random features

It is “overthinking” the noise

✅ Ridge

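Keeps all the random features

But shrinks the weights toward 0 overall (an individual coefficient can still end up slightly larger than in plain linear regression, as random_0 does in the table below)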

✅ Lasso

Often makes many of these exactly 0

Basically says: “These random features are useless — I’m deleting them.”
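
One way to see why Ridge only shrinks while Lasso produces exact zeros is the textbook special case of orthonormal features, where both solutions have closed forms in terms of the plain least-squares weights. The sketch below assumes that simplified setting; the function names and the penalty value `lam` are illustrative, and the scaling is not meant to match scikit-learn's `alpha`.

```python
import numpy as np

def ridge_shrink(w_ols, lam):
    # Ridge (orthonormal case): every weight is scaled by the same factor,
    # so a non-zero weight stays non-zero.
    return w_ols / (1.0 + lam)

def lasso_soft_threshold(w_ols, lam):
    # Lasso (orthonormal case): every weight moves toward 0 by lam,
    # and anything smaller than lam in absolute value becomes exactly 0.
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

# Hypothetical least-squares weights: one strong signal and three weak ones
w_ols = np.array([5.0, 0.8, -0.3, 0.05])
lam = 0.5

ridge_w = ridge_shrink(w_ols, lam)
lasso_w = lasso_soft_threshold(w_ols, lam)

print("OLS:  ", w_ols)
print("Ridge:", ridge_w)
print("Lasso:", lasso_w)
```

Multiplying a negative sign by a thresholded value of 0 yields `-0.0` in floating point; this is one way a signed zero can arise, and a `-0.0` entry in a Lasso coefficient column is numerically just zero.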

In [8]:
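# 5) Look at coefficients to SEE the difference
# (Ridge shrinks them; Lasso often makes some exactly 0)
# Note: features were standardized, so each coefficient is per one standard deviation of its feature

def get_coefs(pipeline, feature_names):
    # pipeline: scaler -> model
    model = pipeline.named_steps["model"]
    # Sorted by absolute size here; combining the three Series into one DataFrame below
    # re-aligns rows by feature name, so the final table is ordered by name instead
    return pd.Series(model.coef_, index=feature_names).sort_values(key=np.abs, ascending=False)

# Refit all models (evaluate_model already fit them; this keeps the cell self-contained)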
lr.fit(X_train, y_train)
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)

coefs = pd.DataFrame({
    "LinearRegression": get_coefs(lr, X.columns),
    "Ridge": get_coefs(ridge, X.columns),
    "Lasso": get_coefs(lasso, X.columns),
})

coefs


feature,LinearRegression,Ridge,Lasso
attendance,2.974969,2.323002,1.676079
attendance_copy,-1.22518,-0.550141,-0.0
hours_studied,19.815792,9.910076,16.177032
hours_studied_copy,-3.557469,5.927831,0.0
random_0,0.891462,1.030587,0.771444
random_1,0.000503,0.081726,-0.0
random_2,0.358576,0.131595,0.124575
random_3,0.769473,0.885073,0.655195
random_4,-0.758828,-0.69775,-0.603857
random_5,0.000504,-0.001139,-0.0


Analysis

In [9]:
# 6) How many features did Lasso "turn off"?
num_zero = (coefs["Lasso"].abs() < 1e-8).sum()
num_total = coefs.shape[0]
print(f"Lasso set {num_zero} out of {num_total} coefficients to (near) zero.")


Lasso set 5 out of 15 coefficients to (near) zero.


Experiment with alphas

In [10]:
# 7) Optional: Try different alphas quickly
# Bigger alpha -> more shrinkage for Ridge, more zeros for Lasso
# (the same alpha value is not equally strong in Ridge and Lasso, so compare within each model)

alphas = [0.01, 0.1, 1.0, 10.0, 50.0]

grid_results = []
for a in alphas:
    ridge_a = Pipeline([("scaler", StandardScaler()), ("model", Ridge(alpha=a))])
    lasso_a = Pipeline([("scaler", StandardScaler()), ("model", Lasso(alpha=a, max_iter=20000))])
    grid_results.append(evaluate_model(f"Ridge alpha={a}", ridge_a, X_train, X_test, y_train, y_test))
    grid_results.append(evaluate_model(f"Lasso alpha={a}", lasso_a, X_train, X_test, y_train, y_test))

pd.DataFrame(grid_results).sort_values("RMSE").reset_index(drop=True)


,model,RMSE,R2
0,Ridge alpha=0.01,7.740578,0.785282
1,Ridge alpha=0.1,7.755163,0.784472
2,Lasso alpha=0.01,7.758988,0.78426
3,Lasso alpha=0.1,7.846979,0.779339
4,Ridge alpha=1.0,7.874694,0.777777
5,Ridge alpha=10.0,8.180044,0.760209
6,Lasso alpha=1.0,8.274399,0.754645
7,Ridge alpha=50.0,8.504865,0.740787
8,Lasso alpha=10.0,12.771176,0.415502
9,Lasso alpha=50.0,16.859616,-0.01863
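
The grid above ranks alphas by test-set RMSE, which is fine for a demo but reuses the test set for tuning. A common alternative is to pick alpha by cross-validation on the training data only. The sketch below uses scikit-learn's `RidgeCV` and `LassoCV` on a small synthetic dataset (`X_demo` and `y_demo` are made up here, not the notebook's data), so the chosen alphas are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 2 useful features and 6 noise features
rng = np.random.default_rng(0)
n = 200
X_demo = rng.normal(size=(n, 8))
y_demo = 4.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=1.0, size=n)

alphas = [0.01, 0.1, 1.0, 10.0, 50.0]

# Each *CV estimator tries every alpha with internal cross-validation
ridge_cv = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RidgeCV(alphas=alphas)),
])
lasso_cv = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LassoCV(alphas=alphas, cv=5, max_iter=20000)),
])

ridge_cv.fit(X_demo, y_demo)
lasso_cv.fit(X_demo, y_demo)

# The selected penalty strength is stored on the fitted model as alpha_
best_ridge_alpha = ridge_cv.named_steps["model"].alpha_
best_lasso_alpha = lasso_cv.named_steps["model"].alpha_

print("Ridge alpha chosen by CV:", best_ridge_alpha)
print("Lasso alpha chosen by CV:", best_lasso_alpha)
```

With this approach the held-out test set is touched only once, to report the final RMSE and R2 for the chosen alpha.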


Answer these questions:

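1) In a few sentences, explain what regularization does when applied to linear regression.
2) How does Lasso (L1) regularization work, and what does it do to the coefficients?
3) How does Ridge (L2) regularization work, and how does it differ from Lasso?
4) What does it mean for features to be standardized, and why does it matter here?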

Submit: the notebook with your answers, added to your GitHub repository.