In [10]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


train_df = pd.read_csv("/Users/haderie/Downloads/housing/train.csv")
train_df = train_df.drop(columns=["zipcode"])

X_train = train_df.drop(columns=["price"]) # features
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
y_train = train_df["price"] / 1000 # target


test_df = pd.read_csv("/Users/haderie/Downloads/housing/test.csv")
test_df = test_df.drop(columns=["id", "date", "zipcode"])

X_test = test_df.drop(columns=["price"])  # features

y_test = test_df["price"] / 1000  # target
X_test_scaled = scaler.transform(X_test)



In this problem, you will derive the optimal parameters for ridge regression and train ridge regression models with different regularization levels. In ridge regression, the loss function includes a regularization term:

$J(\theta) = \sum_{i=1}^N(h_{\theta}(x_i)-y_i)^2 + \lambda \sum_{j=1}^d \theta_j^2$

1. **[A]** Write the derivation of the closed form solution for parameter $\theta$ that minimizes the loss function $J(\theta)$ in ridge regression.

In PDF
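
For reference, a sketch of the standard matrix-form derivation. The notation is an assumption and may differ from the PDF: the inputs are stacked as rows of $X \in \mathbb{R}^{N \times d}$, the hypothesis is $h_{\theta}(x_i) = x_i^\top \theta$, and every component of $\theta$ is penalized.

```latex
\begin{align*}
J(\theta) &= (X\theta - y)^\top (X\theta - y) + \lambda\, \theta^\top \theta \\
\nabla_\theta J(\theta) &= 2 X^\top (X\theta - y) + 2\lambda\,\theta = 0 \\
(X^\top X + \lambda I)\,\theta &= X^\top y \\
\hat{\theta} &= (X^\top X + \lambda I)^{-1} X^\top y
\end{align*}
```

For $\lambda > 0$ the matrix $X^\top X + \lambda I$ is positive definite, so the inverse exists and the stationary point is the unique minimizer. If the intercept is left unpenalized, $I$ is replaced by the identity with a zero in the intercept's diagonal entry.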

2. **[C]** Modify your implementation from Problem 5 to implement ridge regression with gradient descent.


In [11]:
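def ridge_gradient_descent(X, y, alpha, lam, num_iters):
    """
    Gradient descent for ridge regression.

    X: (N, d) design matrix with intercept column
    y: (N,) target
    alpha: learning rate (step size)
    lam: regularization parameter λ
    num_iters: number of gradient steps
    Returns theta: (d,) fitted parameters.
    """
    N, d = X.shape
    theta = np.zeros(d)

    for _ in range(num_iters):
        # The data term is averaged over N while the penalty is not, so this
        # descends (1/N) * ||X @ theta - y||^2 + lam * ||theta||^2; lam here
        # acts like N * lam in the sum-form J(theta). theta[0] is penalized too.
        gradient = (2 / N) * X.T @ (X @ theta - y) + 2 * lam * theta
        theta = theta - alpha * gradient

    return theta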
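
For comparison with the gradient-descent version, the closed-form solution can be sketched directly. The function name `ridge_closed_form` and the `penalize_intercept` flag are illustrative assumptions, not part of the assignment. By default the intercept column (assumed to be column 0) is left unpenalized, which is one common reading of the sum over $j = 1, \dots, d$ in $J(\theta)$. The sketch solves the sum-form loss as written, so its λ is not on the same scale as an implementation that averages the data term over $N$.

```python
import numpy as np


def ridge_closed_form(X, y, lam, penalize_intercept=False):
    """Solve (X^T X + lam * P) theta = X^T y for theta.

    P is the identity, except that P[0, 0] = 0 when the intercept
    (assumed to be column 0 of X) is left unpenalized.
    """
    d = X.shape[1]
    P = np.eye(d)
    if not penalize_intercept:
        P[0, 0] = 0.0
    # np.linalg.solve avoids forming an explicit matrix inverse
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)


# Small synthetic check (fresh data, not the notebook's simulation)
rng = np.random.default_rng(0)
x_demo = rng.uniform(-2, 2, 200)
y_demo = 1 + 2 * x_demo + rng.normal(0, np.sqrt(2), 200)
X_demo = np.column_stack((np.ones_like(x_demo), x_demo))

theta_ols = ridge_closed_form(X_demo, y_demo, lam=0.0)   # plain least squares
theta_big = ridge_closed_form(X_demo, y_demo, lam=1e4)   # heavily regularized
print("lam=0   :", theta_ols)
print("lam=1e4 :", theta_big)
```

With the intercept unpenalized, a large λ shrinks only the slope, so the fit tends toward predicting the mean of `y_demo` rather than zero.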


3. **[C]** Simulate $N=1000$ values of random variable $X_i$, distributed uniformly on the interval $[-2,2]$. Simulate the values of random variable $Y_i = 1 + 2X_i + e_i$, where $e_i$ is drawn from a Gaussian distribution $N(0, 2)$.

Fit this data with linear regression, and also with ridge regression for different values of $\lambda \in \{1,10,100,1000,10000\}$. 

Print the slope, the MSE values, and the $R^2$ statistic for each case and write down some observations. What happens as the regularization parameter $\lambda$ increases?

In [12]:
np.random.seed(0)

# simulate data
N = 1000
X = np.random.uniform(-2, 2, N)
# N(0, 2) is read as variance 2, so the standard deviation is sqrt(2)
e = np.random.normal(0, np.sqrt(2), N)
y = 1 + 2*X + e

# matrix with intercept
X_matrix = np.column_stack((np.ones(N), X))

# linear regression
theta = np.linalg.pinv(X_matrix) @ y
y_pred_lr = X_matrix @ theta

print("Linear Regression")
print("Intercept:", theta[0])
print("Slope:", theta[1])
print("MSE:", mean_squared_error(y, y_pred_lr))
print("R^2:", r2_score(y, y_pred_lr))
print()

# ridge regression (gradient descent) for each lambda
lambdas = [1, 10, 100, 1000, 10000]

for lam in lambdas:
    theta_ridge = ridge_gradient_descent(
        X_matrix, y,
        alpha=0.0001,
        lam=lam,
        num_iters=5000
    )

    y_pred_ridge = X_matrix @ theta_ridge

    print(f"Ridge (lambda={lam})")
    print("Slope:", theta_ridge[1])
    print("MSE:", mean_squared_error(y, y_pred_ridge))
    print("R^2:", r2_score(y, y_pred_ridge))
    print()


Linear Regression
Intercept: 1.0405149813978518
Slope: 1.9656920054163693
MSE: 1.8653740489434376
R^2: 0.7367593726211263

Ridge (lambda=1)
Slope: 1.017926323013737
MSE: 3.419195822005353
R^2: 0.5174848423426281

Ridge (lambda=10)
Slope: 0.23265695792026644
MSE: 6.7702796662790465
R^2: 0.04458161198758659

Ridge (lambda=100)
Slope: 0.026044838209081718
MSE: 7.9465287822285235
R^2: -0.12141005891175194

Ridge (lambda=1000)
Slope: 0.0026359727231450724
MSE: 8.087221790538303
R^2: -0.1412645839579565

Ridge (lambda=10000)
Slope: -0.0007501558600695277
MSE: 8.107440944031175
R^2: -0.1441179005104991



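As the regularization parameter λ increases, the slope of the ridge model shrinks toward zero, the MSE rises from 1.87 (linear regression) to about 8.1, and R² falls from 0.74 to below zero. The effect is already large at λ = 1, which cuts the slope roughly in half (1.97 to 1.02). By λ = 100 the model essentially ignores the input feature and underfits.

Three details of the implementation shape these numbers. The gradient averages the data term over N but not the penalty, so λ here acts like Nλ = 1000λ in the sum-form $J(\theta)$, which is why even λ = 1 regularizes heavily. The intercept is penalized along with the slope, so predictions shrink toward 0 rather than toward the mean of y, which is what pushes R² negative. Finally, with α = 0.0001 the product 2αλ reaches 2 at λ = 10000, the edge of gradient-descent stability, so the small negative slope there is most likely a numerical artifact rather than a real sign change.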
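
To separate the effect of the penalty from the effect of stopping gradient descent early, the iterates can be compared with the exact minimizer of the objective the loop descends. The sketch below assumes that objective is $(1/N)\lVert X\theta - y\rVert^2 + \lambda\lVert\theta\rVert^2$, as implied by the gradient in `ridge_gradient_descent`. The helper names `gd_as_written` and `exact_minimizer` are illustrative, and the data are a fresh simulation rather than the run above.

```python
import numpy as np


def gd_as_written(X, y, alpha, lam, num_iters):
    """Same update rule as ridge_gradient_descent, with X^T X / N and
    X^T y / N precomputed so that long runs stay fast."""
    N, d = X.shape
    A = X.T @ X / N
    b = X.T @ y / N
    theta = np.zeros(d)
    for _ in range(num_iters):
        gradient = 2 * (A @ theta - b) + 2 * lam * theta
        theta = theta - alpha * gradient
    return theta


def exact_minimizer(X, y, lam):
    """Minimizer of (1/N) * ||X theta - y||^2 + lam * ||theta||^2."""
    N, d = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(d), X.T @ y)


# Fresh simulation with the same distributions as the exercise
rng = np.random.default_rng(0)
n_sim = 1000
x_sim = rng.uniform(-2, 2, n_sim)
y_sim = 1 + 2 * x_sim + rng.normal(0, np.sqrt(2), n_sim)
X_sim = np.column_stack((np.ones(n_sim), x_sim))

results = {}
for lam in [1, 10, 100]:
    short = gd_as_written(X_sim, y_sim, alpha=1e-4, lam=lam, num_iters=5000)
    long_run = gd_as_written(X_sim, y_sim, alpha=1e-4, lam=lam, num_iters=50000)
    exact = exact_minimizer(X_sim, y_sim, lam)
    results[lam] = (short, long_run, exact)
    print(f"lambda={lam}: slope after 5000 iters = {short[1]:.4f}, "
          f"after 50000 iters = {long_run[1]:.4f}, exact = {exact[1]:.4f}")
```

If the 5000-iteration slope for a small λ differs noticeably from the exact one, part of the apparent shrinkage comes from early stopping rather than from the penalty itself.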