# Question 1

Question 1. (20 points)
1) Calculate by hand the empirical variance for the following datasets:
(a) [4, 7, 9, 11, 15]
(b) [18, 22, 25, 30, 35, 40]
2) Once you have manually calculated the variance, confirm your result with Numpy.
• Note that the documentation for numpy variance is at:
https://numpy.org/doc/stable/reference/generated/numpy.var.html
• Read the ddof option carefully: it determines whether you are computing the empirical or the population variance.
3) Calculate the population variance for the following datasets:
(a) [12, 14, 16, 18, 20]
(b) [28, 32, 36, 40, 44, 48]
4) Once you have manually calculated the population variance, confirm your result with Numpy.
5) Comparing Variances
(a) Explain the key differences between empirical and population variance calculations.
(b) Provide an example where you should use population variance over empirical variance, and vice versa.
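The by-hand calculations for parts 1 and 3 follow directly from the two definitions. The empirical variance divides the sum of squared deviations by n - 1, and the population variance divides it by n. For dataset 1b the mean is not an integer, so the shortcut sum of squares, Σx² - (Σx)²/n, keeps the arithmetic exact. The worked values below agree with the NumPy output in the cell that follows.

$$
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \qquad \text{(empirical)}\\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad \text{(population)}\\[6pt]
\text{(1a)}\quad \bar{x} &= \tfrac{46}{5} = 9.2,\quad s^2 = \tfrac{(-5.2)^2+(-2.2)^2+(-0.2)^2+1.8^2+5.8^2}{4} = \tfrac{68.8}{4} = 17.2\\
\text{(1b)}\quad \bar{x} &= \tfrac{170}{6} \approx 28.33,\quad s^2 = \tfrac{\sum x_i^2 - (\sum x_i)^2/n}{n-1} = \tfrac{5158 - 28900/6}{5} = \tfrac{1024/3}{5} \approx 68.27\\
\text{(3a)}\quad \mu &= \tfrac{80}{5} = 16,\quad \sigma^2 = \tfrac{(-4)^2+(-2)^2+0^2+2^2+4^2}{5} = \tfrac{40}{5} = 8\\
\text{(3b)}\quad \mu &= \tfrac{228}{6} = 38,\quad \sigma^2 = \tfrac{(-10)^2+(-6)^2+(-2)^2+2^2+6^2+10^2}{6} = \tfrac{280}{6} \approx 46.67
\end{aligned}
$$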

In [2]:
import numpy as np

data_1a = np.array([4, 7, 9, 11, 15])
data_1b = np.array([18, 22, 25, 30, 35, 40])
data_3a = np.array([12, 14, 16, 18, 20])
data_3b = np.array([28, 32, 36, 40, 44, 48])

print("Empirical Variance Calculations:")
print("Dataset 1a:", np.var(data_1a, ddof=1))
print("Dataset 1b:", np.var(data_1b, ddof=1))

print("\nPopulation Variance Calculations:")
print("Dataset 3a:", np.var(data_3a, ddof=0))
print("Dataset 3b:", np.var(data_3b, ddof=0))

Empirical Variance Calculations:
Dataset 1a: 17.2
Dataset 1b: 68.26666666666668

Population Variance Calculations:
Dataset 3a: 8.0
Dataset 3b: 46.666666666666664
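As a cross-check beyond `np.var`, here is a minimal sketch that recomputes both estimators from their definitions. It also confirms Bessel's correction: the `ddof=1` result equals the `ddof=0` result multiplied by n/(n - 1), and that factor is the key computational difference asked about in part 5(a). The helper names `sample_variance` and `population_variance` are illustrative, not part of the assignment.

```python
import numpy as np


def sample_variance(x):
    """Empirical (sample) variance: sum of squared deviations divided by n - 1."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    return np.sum(deviations ** 2) / (len(x) - 1)


def population_variance(x):
    """Population variance: sum of squared deviations divided by n."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    return np.sum(deviations ** 2) / len(x)


datasets = {
    "1a": [4, 7, 9, 11, 15],
    "1b": [18, 22, 25, 30, 35, 40],
    "3a": [12, 14, 16, 18, 20],
    "3b": [28, 32, 36, 40, 44, 48],
}

for name, values in datasets.items():
    n = len(values)
    s2 = sample_variance(values)
    sigma2 = population_variance(values)
    # The manual results agree with NumPy for the matching ddof setting.
    assert np.isclose(s2, np.var(values, ddof=1))
    assert np.isclose(sigma2, np.var(values, ddof=0))
    # Bessel's correction: the two estimates differ only by the factor n / (n - 1).
    assert np.isclose(s2, sigma2 * n / (n - 1))
    print(f"{name}: empirical = {s2:.4f}, population = {sigma2:.4f}")
```

The factor n/(n - 1) approaches 1 as n grows. The two estimates therefore differ most on small datasets like these.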


# Question 3

Question 3. (40 points) In the data folder, find and load the 2 files
stock_prediction_data.csv
stock_price.csv

The task is to predict tomorrow’s stock price difference from the previous day’s data.
1) Split the data into Train, validation, test
2) Preprocess the data, remove mean and scale it.
3) Perform 2nd order polynomial regression with:
(a) Lasso constraint
(i) Solve it with your own gradient descent code
(ii) Solve it with sklearn library (compare your results)
(iii) Use the validation data to identify an ideal lambda value
(b) Repeat the above steps with Ridge constraint
(c) Repeat the above steps with Elastic net
(d) How does Elastic Net compare with using only one constraint?

In [13]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
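# Aliased imports: MyLasso, MyRidge and MyElasticNet are scikit-learn's estimators, not custom code
from sklearn.linear_model import Lasso as MyLasso, Ridge as MyRidge, ElasticNet as MyElasticNet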
from matplotlib import pyplot as plt

# Load the data from CSV files
X = np.genfromtxt('/Users/shreyas/Desktop/ML/HW/hw4/stock_prediction_data.csv', delimiter=',')
y = np.genfromtxt('/Users/shreyas/Desktop/ML/HW/hw4/stock_price.csv', delimiter=',')
y = y.reshape(-1, 1)

# Step 1: Split data into training, validation, and test sets
X_train, X_remain, y_train, y_remain = train_test_split(X, y, test_size=0.2, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_remain, y_remain, test_size=0.5, random_state=42)

# Step 2: Apply scaling to the features
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# Step 3: Transform the data into second-order polynomial features
poly = PolynomialFeatures(degree=2, include_bias=True)
X_train_poly = poly.fit_transform(X_train_scaled)
X_val_poly = poly.transform(X_val_scaled)
X_test_poly = poly.transform(X_test_scaled)

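# Custom (sub)gradient descent for Lasso regression.
# Minimizes (1/n)||y - Xw||^2 + lambda_val * ||w||_1; np.sign(weights) is a subgradient
# of the L1 term. The bias column added by PolynomialFeatures is penalized as well.
def lasso_grad_desc(X, y, lambda_val, learning_rate=0.01, max_iter=10000):
    n, m = X.shape
    weights = np.zeros((m, 1))
    for i in range(max_iter):
        y_pred = X @ weights
        gradient = (2/n) * X.T @ (y_pred - y) + lambda_val * np.sign(weights)
        
        # Stop early once the (sub)gradient is numerically zero. Weights that hover around
        # zero keep a nonzero sign, so this may never trigger and the loop runs max_iter times.
        if np.all(np.abs(gradient) < 1e-5):
            break
        
        weights -= learning_rate * gradient
    return weights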

# Linear prediction X @ weights, shared by the Lasso, Ridge, and Elastic Net models below
def predict_lasso(X, weights):
    return X @ weights

# Mean squared error (MSE) calculation
def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)

# Custom Lasso regression training and validation
weights_custom_lasso = lasso_grad_desc(X_train_poly, y_train, lambda_val=1)
pred_train_custom_lasso = predict_lasso(X_train_poly, weights_custom_lasso)
print(f"Custom Lasso Train MSE: {mse(y_train, pred_train_custom_lasso)}")
pred_val_custom_lasso = predict_lasso(X_val_poly, weights_custom_lasso)
print(f"Custom Lasso Validation MSE: {mse(y_val, pred_val_custom_lasso)}")

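# Lasso using scikit-learn. sklearn minimizes (1/(2n))||y - Xw||^2 + alpha * ||w||_1,
# so alpha=1 here corresponds to lambda_val=2 in lasso_grad_desc, not lambda_val=1.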
lasso_model = MyLasso(alpha=1)
lasso_model.fit(X_train_poly, y_train.flatten())
pred_train_lasso_sklearn = lasso_model.predict(X_train_poly).reshape(-1, 1)
print(f"Scikit-learn Lasso Train MSE: {mse(y_train, pred_train_lasso_sklearn)}")
pred_val_lasso_sklearn = lasso_model.predict(X_val_poly).reshape(-1, 1)
print(f"Scikit-learn Lasso Validation MSE: {mse(y_val, pred_val_lasso_sklearn)}")

# Finding the best lambda for Lasso
lambda_candidates = [0.01, 0.1, 1, 10]
best_lambda = None
best_mse_lasso = float('inf')

for lambda_val in lambda_candidates:
    weights = lasso_grad_desc(X_train_poly, y_train, lambda_val)
    pred_val = predict_lasso(X_val_poly, weights)
    current_mse = mse(y_val, pred_val)
    print(f"Lasso Validation MSE: {current_mse} for lambda: {lambda_val}")
    
    if current_mse < best_mse_lasso:
        best_mse_lasso = current_mse
        best_lambda = lambda_val

print(f"Optimal lambda for Lasso: {best_lambda} with MSE: {best_mse_lasso}")

# Custom Ridge gradient descent: minimizes (1/n)||y - Xw||^2 + lambda_val * ||w||^2
def ridge_grad_desc(X, y, lambda_val, learning_rate=0.01, max_iter=10000):
    n, m = X.shape
    weights = np.zeros((m, 1))
    for i in range(max_iter):
        y_pred = X @ weights
        gradient = (2/n) * X.T @ (y_pred - y) + 2 * lambda_val * weights
        
        if np.all(np.abs(gradient) < 1e-5):
            break
        
        weights -= learning_rate * gradient
    return weights

# Custom Ridge training and validation
weights_custom_ridge = ridge_grad_desc(X_train_poly, y_train, lambda_val=1)
pred_train_custom_ridge = predict_lasso(X_train_poly, weights_custom_ridge)
print(f"Custom Ridge Train MSE: {mse(y_train, pred_train_custom_ridge)}")
pred_val_custom_ridge = predict_lasso(X_val_poly, weights_custom_ridge)
print(f"Custom Ridge Validation MSE: {mse(y_val, pred_val_custom_ridge)}")

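# Ridge using scikit-learn. sklearn minimizes ||y - Xw||^2 + alpha * ||w||^2 (no 1/n factor),
# so alpha=1 here corresponds to lambda_val = 1/n_train in ridge_grad_desc, a much weaker penalty.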
ridge_model = MyRidge(alpha=1)
ridge_model.fit(X_train_poly, y_train.flatten())
pred_train_ridge_sklearn = ridge_model.predict(X_train_poly).reshape(-1, 1)
print(f"Scikit-learn Ridge Train MSE: {mse(y_train, pred_train_ridge_sklearn)}")
pred_val_ridge_sklearn = ridge_model.predict(X_val_poly).reshape(-1, 1)
print(f"Scikit-learn Ridge Validation MSE: {mse(y_val, pred_val_ridge_sklearn)}")

# Finding the best lambda for Ridge
best_lambda_ridge = None
best_mse_ridge = float('inf')

for lambda_val in lambda_candidates:
    weights = ridge_grad_desc(X_train_poly, y_train, lambda_val)
    pred_val = predict_lasso(X_val_poly, weights)
    current_mse = mse(y_val, pred_val)
    print(f"Ridge Validation MSE: {current_mse} for lambda: {lambda_val}")
    
    if current_mse < best_mse_ridge:
        best_mse_ridge = current_mse
        best_lambda_ridge = lambda_val

print(f"Optimal lambda for Ridge: {best_lambda_ridge} with MSE: {best_mse_ridge}")

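# Elastic Net with gradient descent.
# Minimizes (1/n)||y - Xw||^2 + lambda_val * (l1_ratio * ||w||_1 + (1 - l1_ratio) * ||w||^2),
# so l1_ratio=1 reduces to lasso_grad_desc and l1_ratio=0 reduces to ridge_grad_desc.
def elastic_net_grad_desc(X, y, l1_ratio, lambda_val, learning_rate=0.01, max_iter=10000):
    n, m = X.shape
    weights = np.zeros((m, 1))
    for i in range(max_iter):
        y_pred = X @ weights
        # Both the L1 and the L2 penalty gradients are scaled by lambda_val
        gradient = (2/n) * X.T @ (y_pred - y) + lambda_val * (l1_ratio * np.sign(weights) + (1 - l1_ratio) * 2 * weights)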
        
        if np.all(np.abs(gradient) < 1e-5):
            break
        
        weights -= learning_rate * gradient
    return weights

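# Elastic Net using scikit-learn. sklearn minimizes
# (1/(2n))||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2,
# so its alpha is not on the same scale as lambda_val in elastic_net_grad_desc.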
elastic_net_model = MyElasticNet(alpha=1, l1_ratio=0.5)
elastic_net_model.fit(X_train_poly, y_train.flatten())
pred_train_elastic_net_sklearn = elastic_net_model.predict(X_train_poly).reshape(-1, 1)
print(f"Scikit-learn Elastic Net Train MSE: {mse(y_train, pred_train_elastic_net_sklearn)}")
pred_val_elastic_net_sklearn = elastic_net_model.predict(X_val_poly).reshape(-1, 1)
print(f"Scikit-learn Elastic Net Validation MSE: {mse(y_val, pred_val_elastic_net_sklearn)}")

# Finding best hyperparameters for Elastic Net
l1_ratios = [0.1, 0.5, 0.9]
best_mse_elastic_net = float('inf')
best_l1_ratio = None
best_lambda_elastic_net = None

for l1_ratio in l1_ratios:
    for lambda_val in lambda_candidates:
        weights = elastic_net_grad_desc(X_train_poly, y_train, l1_ratio, lambda_val)
        pred_val = predict_lasso(X_val_poly, weights)
        current_mse = mse(y_val, pred_val)
        print(f"Elastic Net Validation MSE: {current_mse} for l1_ratio: {l1_ratio}, lambda: {lambda_val}")
        
        if current_mse < best_mse_elastic_net:
            best_mse_elastic_net = current_mse
            best_l1_ratio = l1_ratio
            best_lambda_elastic_net = lambda_val

print(f"Best Elastic Net l1_ratio: {best_l1_ratio}, lambda: {best_lambda_elastic_net} with MSE: {best_mse_elastic_net}")


Custom Lasso Train MSE: 2.122974823830285
Custom Lasso Validation MSE: 2.3665208202191628
Scikit-learn Lasso Train MSE: 6.831045121140056
Scikit-learn Lasso Validation MSE: 7.629002980184826
Lasso Validation MSE: 0.08125352635324681 for lambda: 0.01
Lasso Validation MSE: 0.09554581350427213 for lambda: 0.1
Lasso Validation MSE: 2.3665208202191628 for lambda: 1
Lasso Validation MSE: 46.70576127094301 for lambda: 10
Optimal lambda for Lasso: 0.01 with MSE: 0.08125352635324681
Custom Ridge Train MSE: 12.63957670786306
Custom Ridge Validation MSE: 16.01545282823887
Scikit-learn Ridge Train MSE: 0.03284528040883185
Scikit-learn Ridge Validation MSE: 0.1038999941733034
Ridge Validation MSE: 0.13532333433232058 for lambda: 0.01
Ridge Validation MSE: 1.1402329224528376 for lambda: 0.1
Ridge Validation MSE: 16.01545282823887 for lambda: 1
Ridge Validation MSE: 42.66613532773383 for lambda: 10
Optimal lambda for Ridge: 0.01 with MSE: 0.13532333433232058
Scikit-learn Elastic Net Train MSE: 10.594

# Solve it with sklearn library (compare your results)

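At the same nominal penalty of 1, the custom Lasso had lower MSEs (2.12 train, 2.37 validation) than scikit-learn's Lasso (6.83 train, 7.63 validation). This is not a like-for-like comparison. Scikit-learn's Lasso minimizes (1/(2n))||y - Xw||^2 + alpha * ||w||_1, so alpha=1 corresponds to lambda_val=2 in lasso_grad_desc, which is a stronger penalty than lambda_val=1. The custom code also penalizes the bias column, whereas scikit-learn fits the intercept separately. For Ridge, the scaling runs the other way. Scikit-learn's Ridge objective has no 1/n factor, so alpha=1 is a much weaker penalty than lambda_val=1 in ridge_grad_desc. This is consistent with scikit-learn's far lower Ridge MSEs (0.03 train, 0.10 validation, versus 12.64 and 16.02 for the custom fit). In the validation search, the optimal lambda was 0.01 for both custom models, giving validation MSEs of 0.08 for Lasso and 0.14 for Ridge. Because 0.01 is the smallest value on the grid, a smaller lambda might do better still. Elastic Net, with an optimal l1_ratio of 0.1 and lambda of 0.01, had a validation MSE of 0.12.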
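Scikit-learn's Lasso and Ridge scale their objectives differently from the custom gradient-descent code. The correspondence is λ = 2α for Lasso and λ = α/n for Ridge, with n the number of training samples. This minimal sketch checks that mapping numerically on synthetic, already-centered data (an assumption, which makes `fit_intercept=False` appropriate). Ridge is solved in closed form. For Lasso it swaps in proximal gradient descent (ISTA, i.e. soft-thresholding) in place of the notebook's subgradient method, because ISTA converges to the exact Lasso solution instead of oscillating around zero weights. `soft_threshold` and `lasso_ista` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic, already-centered data (an assumption of this sketch), so fit_intercept=False
# is appropriate and both formulations solve exactly the same problem.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_w + 0.5 * rng.standard_normal(n)
y -= y.mean()

# Ridge, custom scaling: (1/n)||y - Xw||^2 + lam * ||w||^2.
# Setting the gradient (2/n) X^T (Xw - y) + 2 * lam * w to zero gives
# (X^T X + n * lam * I) w = X^T y, which is sklearn's Ridge system when lam = alpha / n.
alpha_ridge = 10.0
lam_ridge = alpha_ridge / n
w_ridge_custom = np.linalg.solve(X.T @ X + n * lam_ridge * np.eye(p), X.T @ y)
w_ridge_sklearn = Ridge(alpha=alpha_ridge, fit_intercept=False, solver="cholesky").fit(X, y).coef_


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=5000):
    """Hypothetical helper: ISTA for (1/n)||y - Xw||^2 + lam * ||w||_1."""
    n = X.shape[0]
    # Lipschitz constant of the smooth part's gradient: (2/n) * (largest singular value)^2
    L = 2.0 / n * np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w


# Lasso: doubling sklearn's (1/(2n))||y - Xw||^2 + alpha * ||w||_1 gives lam = 2 * alpha.
alpha_lasso = 0.1
w_lasso_custom = lasso_ista(X, y, lam=2 * alpha_lasso)
w_lasso_sklearn = Lasso(alpha=alpha_lasso, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y).coef_

print("Ridge max |difference|:", np.max(np.abs(w_ridge_custom - w_ridge_sklearn)))
print("Lasso max |difference|:", np.max(np.abs(w_lasso_custom - w_lasso_sklearn)))
```

Both printed differences should be close to zero. Rerunning the Lasso check with `lam=alpha_lasso` instead of `2 * alpha_lasso` should give a visibly different solution. That mismatch is the one behind the Lasso comparison above.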

# How does Elastic Net compare with using only one constraint?

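Elastic Net combines the Lasso (L1) and Ridge (L2) penalties. Among the tuned gradient-descent models, its best validation MSE (0.12) fell between Lasso's (0.08) and Ridge's (0.14). It was better than Ridge alone but worse than Lasso alone. The scikit-learn Ridge fit at alpha=1 reached 0.10, but its penalty is scaled differently, so it is not a like-for-like comparison. Here, Elastic Net balanced the two constraints without outperforming the better single constraint, Lasso.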
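Elastic Net nests both single-penalty models. Its penalty gradient reduces to the Lasso term at l1_ratio = 1 and to the Ridge term at l1_ratio = 0. This minimal sketch checks that identity with standalone helpers (hypothetical names, not from the notebook) that use the same penalty scaling as `lasso_grad_desc` and `ridge_grad_desc`. One consequence follows if `elastic_net_grad_desc` reduces exactly to the Lasso and Ridge updates at those endpoints and the same lambda grid is used. A search whose l1_ratio grid includes 0 and 1 then cannot select a worse validation MSE than the better single-penalty fit. The grid `[0.1, 0.5, 0.9]` used above excludes both endpoints.

```python
import numpy as np


# Hypothetical helpers that isolate the penalty part of each gradient, using the same
# scaling as lasso_grad_desc (lam * sign(w)) and ridge_grad_desc (2 * lam * w).
def lasso_penalty_grad(w, lam):
    return lam * np.sign(w)


def ridge_penalty_grad(w, lam):
    return 2 * lam * w


def elastic_net_penalty_grad(w, lam, l1_ratio):
    # Weighted mix of the two penalty gradients; both are scaled by lam.
    return l1_ratio * lasso_penalty_grad(w, lam) + (1 - l1_ratio) * ridge_penalty_grad(w, lam)


w = np.array([[0.8], [-0.3], [0.0], [1.5]])  # column vector, like the notebook's weights
lam = 0.1

# The endpoints of l1_ratio recover the single-penalty gradients.
print(np.allclose(elastic_net_penalty_grad(w, lam, 1.0), lasso_penalty_grad(w, lam)))  # True
print(np.allclose(elastic_net_penalty_grad(w, lam, 0.0), ridge_penalty_grad(w, lam)))  # True
```

In the notebook's search, one option is to add the endpoints to the grid, e.g. `l1_ratios = [0.0, 0.1, 0.5, 0.9, 1.0]`. The Elastic Net search then covers both single-constraint models as special cases.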