Stochastic Gradient Descent (SGD): Different Parameter Configurations
For SGD optimization, we consider two commonly used variants of the algorithm, one with a constant step size and one with a decaying step size, together with a modification of the constant-step variant that averages the iterates. Our goals are to: (1) explore different configurations of SGD (obtained by varying its hyperparameters) for fitting a regression model, (2) examine how SGD performance depends on these hyperparameter choices, and (3) examine how violations of SGD's theoretical assumptions affect model performance, as measured by the mean squared error between the fitted values and the true parameters.
The code base was provided in the sgd_robust_regression(1).py file; modifications made to fit the purpose of the project are reflected in main.py.
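To make the three configurations concrete, the following is a minimal sketch (not the provided sgd_robust_regression(1).py or main.py code) of SGD on a synthetic linear regression problem. It compares a constant step size, a decaying step size (here 1/sqrt(t), one common choice), and Polyak iterate averaging on top of the constant step, reporting the mean squared error between the fitted and true parameters. All function names, step-size values, and problem dimensions are illustrative assumptions, not taken from the project code.

```python
import numpy as np

def sgd_linear_regression(X, y, step_fn, n_epochs=20, average=False, seed=0):
    """Plain SGD on the squared loss; step_fn(t) returns the step size at iteration t.

    If average=True, return the running average of all iterates (Polyak averaging)
    instead of the final iterate. Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # one pass over shuffled data per epoch
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i . w - y_i)^2
            w = w - step_fn(t) * grad
            w_sum += w
            t += 1
    return w_sum / t if average else w

# Synthetic regression problem with known true parameters (illustrative sizes).
rng = np.random.default_rng(42)
n, d = 2000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

configs = {
    "constant step":      dict(step_fn=lambda t: 0.01, average=False),
    "decaying (1/sqrt t)": dict(step_fn=lambda t: 0.1 / np.sqrt(1 + t), average=False),
    "constant + averaging": dict(step_fn=lambda t: 0.01, average=True),
}
for name, cfg in configs.items():
    w_hat = sgd_linear_regression(X, y, **cfg)
    mse = np.mean((w_hat - w_true) ** 2)  # MSE between fitted and true parameters
    print(f"{name:>20s}: parameter MSE = {mse:.2e}")
```

A sketch like this makes the hyperparameter dependence easy to probe: changing the constant step size trades convergence speed against the size of the noise floor around the optimum, while the averaged variant smooths out that noise at the cost of retaining some bias from the early iterates.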