# Linear Regression & GLMs — Student Lab

Complete all TODOs. Avoid sklearn for core parts.

In [6]:
import numpy as np

def check(name: str, cond: bool):
    if not cond:
        raise AssertionError(f'Failed: {name}')
    print(f'OK: {name}')

rng = np.random.default_rng(0)

## Section 0 — Synthetic Dataset (with collinearity)
We generate synthetic data in which two features are almost perfectly correlated (collinear), which is the setting that motivates ridge regression.

In [7]:
# Collinearity: some features are (nearly) linear combinations of others, so they are highly correlated.
def make_regression(n=400, d=5, noise=0.5, collinear=True):
    X = rng.standard_normal((n, d))
    if collinear and d >= 2:
        X[:, 1] = X[:, 0] * 0.95 + 0.05 * rng.standard_normal(n)
    w_true = rng.standard_normal(d)
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y, w_true

X, y, w_true = make_regression()
n, d = X.shape
check('shapes', y.shape == (n,))
print('corr(x0,x1)=', np.corrcoef(X[:,0], X[:,1])[0,1])

OK: shapes
corr(x0,x1)= 0.9985704465455699


## Section 1 — OLS Closed Form

### Task 1.1: Closed-form w_hat using solve

# TODO: compute w_hat using solve on (X^T X) w = X^T y
# HINT: `XtX = X.T@X`, `Xty = X.T@y`, `np.linalg.solve(XtX, Xty)`

**Checkpoint:** Why is an explicit inverse discouraged?
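One way to make the checkpoint concrete: the sketch below regenerates data the same way as Section 0 (so it runs on its own), prints the condition numbers of $X$ and $X^\top X$, and compares `np.linalg.solve`, an explicit `np.linalg.inv`, and the SVD-based `np.linalg.lstsq`. Forming $X^\top X$ squares the condition number of $X$. On this moderately conditioned data all three answers agree closely; the explicit inverse costs extra work and typically loses accuracy first as conditioning worsens.

```python
import numpy as np

# Regenerate data the same way as Section 0 (seed 0, collinear columns 0 and 1).
rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.standard_normal((n, d))
X[:, 1] = 0.95 * X[:, 0] + 0.05 * rng.standard_normal(n)
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

XtX, Xty = X.T @ X, X.T @ y
print('cond(X)   =', np.linalg.cond(X))
print('cond(XtX) =', np.linalg.cond(XtX))   # equals cond(X)**2 up to rounding

w_solve = np.linalg.solve(XtX, Xty)              # factorize and solve; no inverse formed
w_inv = np.linalg.inv(XtX) @ Xty                 # forms the full inverse first
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based; never forms X^T X

print('||w_solve - w_lstsq|| =', np.linalg.norm(w_solve - w_lstsq))
print('||w_inv   - w_lstsq|| =', np.linalg.norm(w_inv - w_lstsq))
```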

In [8]:
# OLS (ordinary least squares) picks the w that minimizes the sum of squared errors,
# i.e. the mean squared error between predicted and actual values.
# TODO
XtX = X.T @ X  # Gram matrix: inner products between feature columns (how much features overlap)
Xty = X.T @ y  # inner product of each feature column with the target
w_hat = np.linalg.solve(XtX, Xty)  # solve the normal equations without forming an inverse
check('w_shape', w_hat.shape == (d,))

OK: w_shape


### Task 1.2: Evaluate fit + residuals
Compute:
- predictions y_pred
- MSE
- residual mean and std

**Interview Angle:** What does a structured residual pattern imply (e.g., nonlinearity)?
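A minimal illustration of the interview angle, on hypothetical data whose true relationship is quadratic: a straight-line fit with an intercept leaves residuals (here defined as $y - \hat y$) that average to zero and are uncorrelated with $x$, yet correlate strongly with the omitted $x^2$ term. A plot of residuals against $x$ would show the same U-shape.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + 0.3 * rng.standard_normal(n)  # true relation is quadratic

X = np.column_stack([np.ones(n), x])   # linear model: intercept + slope only
w = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ w                       # actual minus predicted

# With an intercept, OLS residuals have (numerically) zero mean and are
# orthogonal to x, yet they still track the omitted x**2 term.
print('resid mean        :', resid.mean())
print('corr(resid, x)    :', np.corrcoef(resid, x)[0, 1])
print('corr(resid, x**2) :', np.corrcoef(resid, x**2)[0, 1])
```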

In [9]:
# TODO
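y_pred = X @ w_hat  # predicted values
mse = float(np.mean((y_pred - y) ** 2))  # mean squared error
resid = y_pred - y  # residuals, here predicted minus actual (y - y_pred is the more common convention)

# Lower MSE means a better fit to these training points; it says nothing yet about unseen data.
# The model has no intercept column, so the residual mean is not forced to be exactly zero.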

print('mse', mse, 'resid_mean', resid.mean(), 'resid_std', resid.std())
check('finite', np.isfinite(mse))

mse 0.22892431689020792 resid_mean 0.02930128510763712 resid_std 0.47756230125633753
OK: finite


## Section 2 — Gradient Descent

### Task 2.1: Implement MSE loss + gradient

Loss: $L(w) = \frac{1}{n}\lVert Xw - y\rVert_2^2$ (the mean of $(Xw - y)^2$); gradient: $\nabla_w L = \frac{2}{n}X^\top(Xw - y)$.

# TODO: implement `mse_loss_and_grad`

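**FAANG gotcha:** shapes and constants (the gradient must have shape `(d,)`, and with the mean loss the factor is $2/n$, not $1/n$ or $2$).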
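A standard way to catch shape and constant mistakes is a finite-difference gradient check. The sketch below copies `mse_loss_and_grad` so it runs standalone; `numerical_grad` is a hypothetical helper using central differences, and the random data are made up purely for the check.

```python
import numpy as np

def mse_loss_and_grad(X, y, w):
    r = X @ w - y
    return float(np.mean(r * r)), (2.0 / X.shape[0]) * (X.T @ r)

def numerical_grad(f, w, eps=1e-6):
    """Central differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps) per coordinate."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = rng.standard_normal(50)
w = rng.standard_normal(4)

_, g_analytic = mse_loss_and_grad(X, y, w)
g_numeric = numerical_grad(lambda v: mse_loss_and_grad(X, y, v)[0], w)
rel_err = np.linalg.norm(g_analytic - g_numeric) / np.linalg.norm(g_analytic)
print('relative error:', rel_err)
```

For a quadratic loss, central differences are exact up to rounding, so the relative error should be tiny. Dropping the factor 2 from the analytic gradient would instead push this relative error to about 1.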

In [11]:
# Mean squared error: how wrong the predictions are on average.
# Squaring penalizes large errors more than small ones.
# The gradient of the MSE points in the direction of steepest increase of the loss,
# so stepping against it reduces the error.
def mse_loss_and_grad(X, y, w):
    # TODO
    r = X @ w - y  # residuals (predicted minus actual), shape (n,)
    loss = float(np.mean(r * r))  # mean squared error
    grad = (2.0 / X.shape[0]) * (X.T @ r)  # (2/n) X^T r, shape (d,)
    return loss, grad

w0 = np.zeros(d) # filled with zeros
loss0, g0 = mse_loss_and_grad(X, y, w0) 
check('grad_shape', g0.shape == (d,))
check('finite_loss', np.isfinite(loss0))

OK: grad_shape
OK: finite_loss


### Task 2.2: Train with GD + compare to closed-form

# TODO: implement a simple GD loop, track loss, and compare final weights to w_hat.

**Checkpoint:** How does feature scaling affect GD?
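A small sketch of the checkpoint, on hypothetical data where one feature is recorded in units 100 times larger than the others. For the MSE, GD's largest stable step size is $2/\lambda_{\max}$ of the Hessian $\tfrac{2}{n}X^\top X$, and the number of steps needed along the flattest direction grows roughly with the condition number $\lambda_{\max}/\lambda_{\min}$. `hessian_cond` and `max_stable_lr` are helpers defined here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 3))
X[:, 2] *= 100.0   # hypothetical feature recorded in much larger units

def hessian_cond(X):
    """Condition number of the MSE Hessian (2/n) X^T X."""
    H = (2.0 / X.shape[0]) * (X.T @ X)
    return np.linalg.cond(H)

def max_stable_lr(X):
    """GD on the MSE can diverge for step sizes above 2 / lambda_max of the Hessian."""
    H = (2.0 / X.shape[0]) * (X.T @ X)
    return 2.0 / np.linalg.eigvalsh(H).max()

Xs = X / X.std(axis=0)   # rescale every column to unit standard deviation
for name, M in [('raw', X), ('rescaled', Xs)]:
    print(f'{name:9s} cond={hessian_cond(M):10.1f}  max stable lr={max_stable_lr(M):.5f}')
```

Rescaling brings the condition number close to 1 here, so one step size suits every direction. It does not cure collinearity: two near-duplicate columns stay near-duplicates after rescaling. (Centering is skipped because the lab's model has no intercept.)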

In [12]:
def train_gd(X, y, lr=0.05, steps=500):
    # TODO
    w = np.zeros(X.shape[1])  # one weight per feature; X.shape[1] == d
    losses = []
    for _ in range(steps):
        loss, grad = mse_loss_and_grad(X, y, w)
        losses.append(loss)  # loss recorded before this step's update
        w = w - lr * grad  # step in the direction of the negative gradient
    return w, losses


w_gd, losses = train_gd(X, y, lr=0.05, steps=500)
print('final_loss', losses[-1])
print('||w_gd-w_hat||', np.linalg.norm(w_gd - w_hat))
check('loss_decreases', losses[-1] <= losses[0])

final_loss 0.23136626716013964
||w_gd-w_hat|| 1.3769754271702541
OK: loss_decreases
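The large `||w_gd-w_hat||` despite a near-optimal loss is what the collinearity predicts. For least squares the GD error evolves exactly as $e_k = (I - \eta H)^k e_0$ with $H = \tfrac{2}{n}X^\top X$, so each eigendirection of $H$ shrinks by $|1 - \eta\lambda_i|$ per step. The sketch below regenerates data the same way as Section 0, prints those factors, and checks the closed-form prediction against an actual GD run; the near-zero eigenvalue from the collinear pair gives a factor barely below 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.standard_normal((n, d))
X[:, 1] = 0.95 * X[:, 0] + 0.05 * rng.standard_normal(n)   # collinear pair
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

w_star = np.linalg.solve(X.T @ X, X.T @ y)   # OLS solution
H = (2.0 / n) * X.T @ X                       # Hessian of the MSE loss
eigvals, V = np.linalg.eigh(H)                # eigenvalues in ascending order

lr, steps = 0.05, 500
print('eigenvalues of H          :', eigvals)
print('per-step factor |1-lr*lam|:', np.abs(1 - lr * eigvals))
print('largest stable lr         :', 2 / eigvals[-1])

# For a quadratic loss, the GD error evolves exactly as e_k = (I - lr*H)^k e_0.
e0 = -w_star                                  # starting from w = 0
coeffs = V.T @ e0                             # initial error in the eigenbasis
e_pred = V @ ((1 - lr * eigvals) ** steps * coeffs)

w = np.zeros(d)
for _ in range(steps):
    w -= lr * (2.0 / n) * X.T @ (X @ w - y)
print('predicted ||w_k - w*||:', np.linalg.norm(e_pred))
print('actual    ||w_k - w*||:', np.linalg.norm(w - w_star))
```

Because that flat direction barely changes the predictions, the loss looks converged while the weights are not. Adding an L2 penalty raises every eigenvalue of the Hessian, which speeds up convergence (to the ridge solution rather than the OLS one).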


## Section 3 — Ridge Regression (L2)

### Task 3.1: Ridge closed-form
$\hat{w}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$
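The closed form follows from setting the gradient of the penalized objective to zero, assuming the data term is the *sum* of squared errors:

$$
J(w) = \lVert Xw - y\rVert_2^2 + \lambda\lVert w\rVert_2^2, \qquad
\nabla_w J = 2X^\top(Xw - y) + 2\lambda w = 0
\;\Longrightarrow\; (X^\top X + \lambda I)\,w = X^\top y .
$$

For $\lambda > 0$, $X^\top X + \lambda I$ is positive definite, so the system has a unique solution even when $X^\top X$ is singular. With the mean loss $\frac{1}{n}\lVert Xw - y\rVert_2^2$ from Section 2, the same algebra gives $(X^\top X + n\lambda I)\,w = X^\top y$, so $\lambda$ values are not interchangeable between the two conventions.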

# TODO: implement ridge_solve

**Interview Angle:** Why does ridge help under collinearity?
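One way to answer the interview angle is through the thin SVD $X = U\,\mathrm{diag}(s)\,V^\top$. OLS divides the component of $U^\top y$ along each direction by $s_i$; ridge additionally multiplies it by the filter factor $s_i^2/(s_i^2 + \lambda)$, so the tiny singular value created by the collinear pair, which would otherwise amplify noise, is damped the most. The sketch below generates data as in Section 0; `ridge_svd` is a hypothetical helper name, and the check confirms it matches the normal-equation solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.standard_normal((n, d))
X[:, 1] = 0.95 * X[:, 0] + 0.05 * rng.standard_normal(n)   # collinear pair
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

def ridge_solve(X, y, lam):
    """Ridge via the normal equations: (X^T X + lam*I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def ridge_svd(X, y, lam):
    """Same estimator via the thin SVD: w = V diag(s / (s^2 + lam)) U^T y."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # s is sorted in descending order
lam = 10.0
factors = s**2 / (s**2 + lam)   # 1 = left untouched, 0 = fully shrunk
print('singular values  :', s)
print('filter factors   :', factors)
print('max |solve - svd|:', np.max(np.abs(ridge_solve(X, y, lam) - ridge_svd(X, y, lam))))
```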

In [None]:
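# Ridge regression: a regularized version of linear regression.
# It adds an L2 penalty lam * ||w||^2 to the squared-error loss, which shrinks the weights.
def ridge_solve(X, y, lam):
    # Solve (X^T X + lam*I) w = X^T y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)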

w_ridge = ridge_solve(X, y, lam=1.0)
check('ridge_shape', w_ridge.shape == (d,))

### Task 3.2: Bias/variance demo with train/test split

# TODO: split into train/test and compare MSE for multiple lambdas.

**Checkpoint:** Why can test error improve even when train error worsens?
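A Monte Carlo sketch of the trade-off behind this checkpoint: keep one collinear design fixed (hypothetical sizes, with a smaller $n$ than the lab so the effect is visible), redraw the noise many times, refit, and measure the squared bias and total variance of $\hat w$ for a few values of $\lambda$. `estimator_stats` is a hypothetical helper. Train error can only rise as $\lambda$ moves away from 0, because OLS already minimizes it, but expected test error combines bias and variance, so a $\lambda$ that adds a little bias can still lower test error if it removes more variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 120, 5
X = rng.standard_normal((n, d))
X[:, 1] = 0.95 * X[:, 0] + 0.05 * rng.standard_normal(n)   # collinear pair, fixed design
w_true = rng.standard_normal(d)

def ridge_solve(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def estimator_stats(lam, trials=2000, noise=0.5):
    """Refit on fresh noise each trial; return (squared bias, total variance) of w_hat."""
    W = np.array([ridge_solve(X, X @ w_true + noise * rng.standard_normal(n), lam)
                  for _ in range(trials)])
    bias2 = float(np.sum((W.mean(axis=0) - w_true) ** 2))
    var = float(np.sum(W.var(axis=0)))
    return bias2, var

stats = {lam: estimator_stats(lam) for lam in [0.0, 1.0, 10.0]}
for lam, (b2, v) in stats.items():
    print(f'lam={lam:5.1f}  bias^2={b2:.4f}  variance={v:.4f}')
```

Whether ridge wins overall depends on the data; the sketch only shows the two components moving in opposite directions as $\lambda$ grows.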

In [None]:
# TODO
idx = rng.permutation(n)
train = idx[: int(0.7*n)]
test = idx[int(0.7*n):]
Xtr, ytr = X[train], y[train]
Xte, yte = X[test], y[test]

lams = [0.0, 0.1, 1.0, 10.0]
results = []
for lam in lams:
    # lam = 0 reduces ridge to plain OLS on the training split
    w = ridge_solve(Xtr, ytr, lam=lam) if lam > 0 else np.linalg.solve(Xtr.T @ Xtr, Xtr.T @ ytr)
    tr_mse = float(np.mean((Xtr @ w - ytr) ** 2))
    te_mse = float(np.mean((Xte @ w - yte) ** 2))
    results.append((lam, tr_mse, te_mse))
print('lam, train_mse, test_mse')
for lam, tr_mse, te_mse in results:
    print(f'{lam:5.1f}  {tr_mse:.4f}  {te_mse:.4f}')

## Section 4 — GLM Intuition

### Task 4.1: Match tasks to (distribution, link)
Fill in a table for:
- regression
- binary classification
- count prediction

**Explain:** what changes when you go from OLS to a GLM?

| Problem | Target type | Distribution | Link | Loss |
|---|---|---|---|---|
| House price | continuous | ? | ? | ? |
| Fraud | binary | ? | ? | ? |
| Clicks per user | count | ? | ? | ? |
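As a sketch of one way a GLM differs from OLS, here is Poisson regression with a log link (a common choice for counts such as clicks), fit by plain gradient descent on the mean negative log-likelihood. The data, true weights, step size, and step count are made up for illustration, and `poisson_nll_and_grad` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # intercept + 2 features
w_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ w_true))   # counts with log E[y | x] = x^T w

def poisson_nll_and_grad(X, y, w):
    """Mean Poisson negative log-likelihood (dropping the constant log y!) and its gradient."""
    eta = X @ w            # linear predictor
    mu = np.exp(eta)       # inverse of the log link gives the predicted mean
    loss = float(np.mean(mu - y * eta))
    grad = X.T @ (mu - y) / X.shape[0]
    return loss, grad

w = np.zeros(X.shape[1])
for _ in range(2000):
    loss, grad = poisson_nll_and_grad(X, y, w)
    w -= 0.1 * grad
print('estimated w:', w)
print('true w     :', w_true)
```

Compared with Section 2, two things change: the prediction is $\exp(Xw)$ instead of $Xw$, and the loss is the Poisson negative log-likelihood instead of squared error. The gradient keeps the same residual form, $X^\top(\text{prediction} - y)$ up to a constant factor.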


---
## Submission Checklist
- All TODOs completed
- Train/test results shown for ridge
- Short answers to checkpoint questions
