# Question 1

Question 1. (20 points)
1) Calculate by hand the empirical variance for the following datasets:
(a) [4, 7, 9, 11, 15]
(b) [18, 22, 25, 30, 35, 40]
2) Once you have manually calculated the variance, confirm your result with NumPy.
• Note that the documentation for NumPy variance is at:
https://numpy.org/doc/stable/reference/generated/numpy.var.html
• Read the ddof option carefully: it determines whether you are using empirical or population variance.
3) Calculate the population variance for the following datasets:
(a) [12, 14, 16, 18, 20]
(b) [28, 32, 36, 40, 44, 48]
4) Once you have manually calculated the population variance, confirm your result with NumPy.
5) Comparing Variances
(a) Explain the key differences between empirical and population variance calculations.
(b) Provide an example where you should use population variance over empirical variance, and vice versa.

In [2]:
import numpy as np

data_1a = np.array([4, 7, 9, 11, 15])
data_1b = np.array([18, 22, 25, 30, 35, 40])
data_3a = np.array([12, 14, 16, 18, 20])
data_3b = np.array([28, 32, 36, 40, 44, 48])

print("Empirical Variance Calculations:")
print("Dataset 1a:", np.var(data_1a, ddof=1))
print("Dataset 1b:", np.var(data_1b, ddof=1))

print("\nPopulation Variance Calculations:")
print("Dataset 3a:", np.var(data_3a, ddof=0))
print("Dataset 3b:", np.var(data_3b, ddof=0))

Empirical Variance Calculations:
Dataset 1a: 17.2
Dataset 1b: 68.26666666666668

Population Variance Calculations:
Dataset 3a: 8.0
Dataset 3b: 46.666666666666664
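
The cell above only shows the NumPy confirmation, so the hand calculation it confirms can be sketched as follows. This is a minimal illustration in plain Python; the helper name `variance` is introduced here and is not part of the assignment. The only difference between the two variances is the divisor: n - 1 (ddof=1) for the empirical variance and n (ddof=0) for the population variance.

```python
import statistics

def variance(values, ddof=0):
    """Variance computed step by step; ddof=1 gives the empirical (sample) variance."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - ddof)

data_1a = [4, 7, 9, 11, 15]
data_3a = [12, 14, 16, 18, 20]

# Dataset 1a by hand: mean = 46 / 5 = 9.2
# squared deviations: 27.04, 4.84, 0.04, 3.24, 33.64 (sum = 68.8)
# empirical variance = 68.8 / (5 - 1) = 17.2
empirical_1a = variance(data_1a, ddof=1)

# Dataset 3a by hand: mean = 16
# squared deviations: 16, 4, 0, 4, 16 (sum = 40)
# population variance = 40 / 5 = 8.0
population_3a = variance(data_3a, ddof=0)

# Cross-check against the standard library:
# statistics.variance divides by n - 1, statistics.pvariance divides by n.
sample_check = statistics.variance(data_1a)
population_check = statistics.pvariance(data_3a)

print(empirical_1a, sample_check)
print(population_3a, population_check)
```

As a rule of thumb, the population variance fits when the data covers every member of the group of interest (for example, the scores of all students in one class), while the empirical variance fits when the data is a sample used to estimate the variance of a larger population.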


# Question 3

Question 3. (40 points) In the data folder, find and load the 2 files
stock_prediction_data.csv
stock_price.csv

The data is used to predict tomorrow’s stock price difference from the previous day’s data.
1) Split the data into train, validation, and test sets.
2) Preprocess the data: remove the mean and scale it.
3) Perform 2nd order polynomial regression with:
(a) Lasso constraint
(i) Solve it with your own gradient descent code
(ii) Solve it with sklearn library (compare your results)
(iii) Use the validation data to identify an ideal lambda value
(b) Repeat the above steps with Ridge constraint
(c) Repeat the above steps with Elastic net
(d) How does Elastic Net compare with using only one constraint?

In [12]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Lasso, Ridge, ElasticNet

stock_prediction_data = pd.read_csv('/Users/shreyas/Desktop/ML/HW/hw4/stock_prediction_data.csv')
stock_price = pd.read_csv('/Users/shreyas/Desktop/ML/HW/hw4/stock_price.csv')

X = stock_prediction_data.values
y = stock_price.values

# 60% train, 20% validation, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Fit the scaler on the training set only, then apply it to validation and test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# Degree-2 expansion; PolynomialFeatures also adds a constant bias column
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_scaled)
X_val_poly = poly.transform(X_val_scaled)
X_test_poly = poly.transform(X_test_scaled)

lasso_model_sklearn = Lasso(alpha=1.0, max_iter=10000)
lasso_model_sklearn.fit(X_train_poly, y_train.ravel())
y_val_pred_sklearn_lasso = lasso_model_sklearn.predict(X_val_poly)
mse_val_sklearn_lasso = np.mean((y_val_pred_sklearn_lasso - y_val.ravel()) ** 2)

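def lasso_gradient_descent(X, y, alpha, learning_rate=0.001, max_iter=10000):
    """Subgradient descent on (1/(2m)) * ||X @ theta - y||^2 + alpha * ||theta||_1."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(max_iter):
        y_pred = X @ theta
        # Squared-error gradient plus the L1 subgradient (np.sign(0) is 0).
        # The bias column is penalized here too, unlike sklearn's intercept.
        gradient = (1/m) * (X.T @ (y_pred - y)) + alpha * np.sign(theta)
        theta -= learning_rate * gradient
    return theta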

theta_custom_lasso = lasso_gradient_descent(X_train_poly, y_train.ravel(), alpha=1.0)
y_val_pred_custom_lasso = X_val_poly @ theta_custom_lasso
mse_val_custom_lasso = np.mean((y_val_pred_custom_lasso - y_val.ravel()) ** 2)

ridge_model_sklearn = Ridge(alpha=1.0, max_iter=10000)
ridge_model_sklearn.fit(X_train_poly, y_train.ravel())
y_val_pred_sklearn_ridge = ridge_model_sklearn.predict(X_val_poly)
mse_val_sklearn_ridge = np.mean((y_val_pred_sklearn_ridge - y_val.ravel()) ** 2)

elastic_net_model_sklearn = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10000)
elastic_net_model_sklearn.fit(X_train_poly, y_train.ravel())
y_val_pred_sklearn_elastic = elastic_net_model_sklearn.predict(X_val_poly)
mse_val_sklearn_elastic = np.mean((y_val_pred_sklearn_elastic - y_val.ravel()) ** 2)

print(f'Lasso (sklearn) Validation MSE: {mse_val_sklearn_lasso}')
print(f'Lasso (custom) Validation MSE: {mse_val_custom_lasso}')
print(f'Ridge (sklearn) Validation MSE: {mse_val_sklearn_ridge}')
print(f'Elastic Net (sklearn) Validation MSE: {mse_val_sklearn_elastic}')


Lasso (sklearn) Validation MSE: 6.964057757126562
Lasso (custom) Validation MSE: 7.783251445060104
Ridge (sklearn) Validation MSE: 0.07902035873867781
Elastic Net (sklearn) Validation MSE: 11.630834838017531
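
The custom solver above adds `alpha * np.sign(theta)` to the gradient, which is a subgradient step and rarely drives a coefficient to exactly zero. A common alternative is proximal gradient descent (ISTA), which applies a soft-threshold after each gradient step. The sketch below is an illustration under stated assumptions, not the method used in this notebook: the names `soft_threshold` and `lasso_ista` and the synthetic data are introduced here, and the model has no intercept.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, alpha, max_iter=5000):
    """Minimize (1/(2m)) * ||X @ theta - y||^2 + alpha * ||theta||_1 by proximal gradient."""
    m, n = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / m bounds the curvature of the smooth term.
    step = m / (np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(n)
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y) / m
        theta = soft_threshold(theta - step * grad, step * alpha)
    return theta

def lasso_objective(X, y, theta, alpha):
    """Value of the Lasso objective minimized by lasso_ista."""
    m = X.shape[0]
    return np.sum((X @ theta - y) ** 2) / (2 * m) + alpha * np.sum(np.abs(theta))

# Synthetic data (an assumption for illustration, not the homework files):
# only the first three of ten features influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_theta = np.zeros(10)
true_theta[:3] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_theta + 0.1 * rng.normal(size=200)

theta_demo = lasso_ista(X_demo, y_demo, alpha=0.1)
print("nonzero coefficients at indices:", np.flatnonzero(theta_demo))
```

To try this on the homework data, one option is to build the features with `PolynomialFeatures(degree=2, include_bias=False)` and center `y_train`, so that no intercept term needs to be penalized.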


# Solve it with sklearn library (compare your results)

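Using sklearn with alpha=1.0 for every model, Lasso reached a validation MSE of 6.96, Ridge 0.079, and Elastic Net (l1_ratio=0.5) 11.63. The custom gradient-descent Lasso reached 7.78, close to sklearn's 6.96 but not identical. The gap is plausibly due to the solvers: the custom code runs a fixed number of subgradient steps and also penalizes the bias column from PolynomialFeatures, while sklearn uses coordinate descent and fits an unpenalized intercept. Ridge performed best at this setting, which suggests that its penalty (shrinking coefficients but keeping all features) suits this dataset. Because alpha was not tuned on the validation set, this ranking holds only for alpha=1.0.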
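
For reference when reading these numbers, the objectives documented for the three sklearn estimators are listed below. The symbols are notation chosen here rather than taken from the assignment: $w$ is the coefficient vector, $n$ the number of training samples, and $\rho$ stands for `l1_ratio`.

$$
\begin{aligned}
\text{Lasso:}\quad & \min_w \; \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1 \\
\text{Ridge:}\quad & \min_w \; \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2 \\
\text{Elastic Net:}\quad & \min_w \; \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2
\end{aligned}
$$

Ridge's data term is not divided by $2n$, so alpha=1.0 is a much weaker penalty for Ridge than for Lasso or Elastic Net on the same data. This may account for part of the gap in validation MSE, though only a sweep over alpha on the validation set would confirm it.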

# How does Elastic Net compare with using only one constraint?

At alpha=1.0 and l1_ratio=0.5, Elastic Net, which combines the Lasso and Ridge penalties, performed worse than either method alone: its validation MSE was 11.63, against 6.96 for Lasso and 0.079 for Ridge. For this data and these settings, a single Ridge constraint worked better than the combined penalty.
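
Step (iii) of the question asks for a lambda chosen on the validation data, whereas the code cell fixes alpha=1.0 for every model. A minimal sketch of such a sweep is shown below. It uses a closed-form ridge solution in NumPy rather than sklearn's solvers, has no intercept, and runs on synthetic stand-in data. The names `ridge_closed_form` and `sweep_alpha` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve min ||X @ w - y||^2 + alpha * ||w||^2 via the normal equations."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def sweep_alpha(X_train, y_train, X_val, y_val, alphas):
    """Fit on the training set for each alpha and score on the validation set."""
    val_mse = {}
    for alpha in alphas:
        w = ridge_closed_form(X_train, y_train, alpha)
        val_mse[alpha] = float(np.mean((X_val @ w - y_val) ** 2))
    best_alpha = min(val_mse, key=val_mse.get)
    return best_alpha, val_mse

# Synthetic stand-in data (hypothetical; replace with the scaled polynomial features).
rng = np.random.default_rng(42)
X_all = rng.normal(size=(300, 8))
y_all = X_all @ rng.normal(size=8) + 0.5 * rng.normal(size=300)
X_tr, y_tr = X_all[:200], y_all[:200]
X_va, y_va = X_all[200:], y_all[200:]

alphas = [10.0 ** k for k in range(-4, 3)]  # log-spaced grid from 1e-4 to 1e2
best_alpha, val_mse = sweep_alpha(X_tr, y_tr, X_va, y_va, alphas)
print("best alpha:", best_alpha)
```

The same loop applies to `Lasso(alpha=...)` and `ElasticNet(alpha=..., l1_ratio=...)` by swapping the fit and predict calls. For Elastic Net, `l1_ratio` can be swept alongside alpha. The test set should be scored only once, after the best setting has been chosen on the validation set.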