# Problem 3, Parts F-H: Stochastic Gradient Descent with a Larger Dataset

Use this notebook to write your code for problem 3 parts F-H by filling in the sections marked `# TODO` and running all cells.

In [None]:
# Setup.

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## Problem 3F: Perform SGD with the new dataset

For the functions below, you may re-use your code from parts 3C-E. Note that you can now modify your SGD function to return the final weight vector instead of the weights after every epoch.

In [None]:
def loss(X, Y, w):
    '''
    Calculate the squared loss function.
    
    Inputs:
        X: A (N, D) shaped numpy array containing the data points.
        Y: A (N, ) shaped numpy array containing the (float) labels of the data points.
        w: A (D, ) shaped numpy array containing the weight vector.
    
    Outputs:
        The loss evaluated with respect to X, Y, and w.
    '''
    # Sum of squared residuals over all points.
    # np.ravel accepts labels shaped (N, ) or (N, 1).
    residuals = np.ravel(Y) - np.matmul(X, w)
    return np.sum(residuals ** 2)

def gradient(x, y, w):
    '''
    Calculate the gradient of the loss function with respect to
    a single point (x, y), and using weight vector w.
    
    Inputs:
        x: A (D, ) shaped numpy array containing a single data point.
        y: The float label for the data point.
        w: A (D, ) shaped numpy array containing the weight vector.
        
    Output:
        The gradient of the loss with respect to x, y, and w. 
    '''
    # d/dw (y - w^T x)^2 = -2 * (y - w^T x) * x
    return -2 * (y - np.dot(w, x)) * np.asarray(x)

def SGD(X, Y, w_start, eta, N_epochs):
    '''
    Perform SGD using dataset (X, Y), initial weight vector w_start,
    learning rate eta, and N_epochs epochs.
    
    Inputs:
        X: A (N, D) shaped numpy array containing the data points.
        Y: A (N, ) shaped numpy array containing the (float) labels of the data points.
        w_start:  A (D, ) shaped numpy array containing the weight vector initialization.
        eta: The step size.
        N_epochs: The number of epochs (full passes over the data) to run SGD.
        
    Outputs:
        w: A (D, ) shaped array containing the final weight vector.
        losses: A (N_epochs, ) shaped array containing the loss after each epoch.
    '''
    losses = []
    # Copy as floats so the in-place update below never modifies w_start
    # and never fails on an integer initialization.
    w = np.array(w_start, dtype=float)
    for epoch in range(N_epochs):
        # Visit the points in a fresh random order every epoch.
        for i in np.random.permutation(len(X)):
            w -= eta * gradient(X[i], Y[i], w)
        losses.append(loss(X, Y, w))
    return w, np.array(losses)
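
Before a long SGD run, it can help to check the analytical gradient against a finite-difference estimate. The following is a minimal, self-contained sketch: `point_loss`, `point_gradient`, and `numerical_gradient` are hypothetical helper names, and the data point is randomly generated rather than taken from the assignment.

```python
import numpy as np

def point_loss(x, y, w):
    # Squared loss for a single point: (y - w^T x)^2.
    return (y - np.dot(w, x)) ** 2

def point_gradient(x, y, w):
    # Analytical gradient of the single-point squared loss with respect to w.
    return -2 * (y - np.dot(w, x)) * x

def numerical_gradient(x, y, w, h=1e-6):
    # Central finite differences, one coordinate of w at a time.
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (point_loss(x, y, w + step) - point_loss(x, y, w - step)) / (2 * h)
    return grad

# Made-up point and weights, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = 1.5
w = rng.normal(size=5)

analytic = point_gradient(x, y, w)
numeric = numerical_gradient(x, y, w)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

If the two gradients disagree by more than rounding error, the sign or the factor of 2 in the analytical expression is the usual place to look.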

Next, we need to load the dataset. In doing so, the following function may be helpful:

In [None]:
def load_data(filename):
    """
    Function loads data stored in the file filename and returns it as a numpy ndarray.
    
    Inputs:
        filename: The file's name, given as a string.
    
    Outputs:
        Data contained in the file, returned as a numpy ndarray
    """
    return np.loadtxt(filename, skiprows=1, delimiter=',')
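
To see what `skiprows=1` and `delimiter=','` do, here is a small sketch that parses an in-memory CSV instead of a file. The column names and values in `csv_text` are made up for illustration and are not the contents of `sgd_data.csv`.

```python
import io
import numpy as np

# Hypothetical stand-in for a small CSV file with one header row.
csv_text = "x1,x2,y\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# skiprows=1 drops the header line; delimiter=',' splits each row on commas.
demo = np.loadtxt(io.StringIO(csv_text), skiprows=1, delimiter=',')
print(demo.shape)
```

The result is a 2-D float array with one row per data line and one column per field.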

Now, load the dataset in `sgd_data.csv` and run SGD using the given parameters; print out the final weights.

In [None]:
#==============================================
# TODO:
# (1) load the dataset
# (2) run SGD using the given parameters
# (3) print out the final weights.
#==============================================

# The following should help you get started:
data = load_data('https://raw.githubusercontent.com/charlesincharge/Caltech-CS155-2022/main/sets/set1/data/sgd_data.csv')
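# Prepend a column of ones so that w[0] acts as the bias term.
data_with_bias = np.insert(data, 0, 1, axis=1)
X = data_with_bias[:, :5]
Y = data_with_bias[:, 5]  # shape (N, ), matching the docstrings above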
w_start = [0.001,0.001,0.001,0.001,0.001]
eta = np.exp(-15)
N_epochs = 800
(w_end, losses) = SGD(X, Y, w_start, eta, N_epochs)
print(w_end)

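
The bias-column step can be illustrated on a tiny array. This is a sketch with made-up numbers; `data_demo`, `X_demo`, and `Y_demo` are hypothetical names. Selecting the label column with an integer index yields a 1-D array of shape (N, ), which is the label shape the docstrings above describe, whereas a range slice such as `[:, -1:]` would keep a trailing axis of length 1.

```python
import numpy as np

# Hypothetical data: two rows, two features, label in the last column.
data_demo = np.array([[2.0, 3.0, 10.0],
                      [4.0, 5.0, 20.0]])

# Prepend a column of ones so that the first weight acts as the bias term.
augmented = np.insert(data_demo, 0, 1, axis=1)

X_demo = augmented[:, :-1]  # bias column plus the two features
Y_demo = augmented[:, -1]   # one float label per row, as a 1-D array
print(X_demo)
print(Y_demo.shape)
```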

## Problem 3G: Convergence of SGD

This problem examines the convergence of SGD for different learning rates. Please implement your code in the cell below:

In [None]:
#==============================================
# TODO: create a plot showing the convergence
# of SGD for the different learning rates.
#==============================================
exponents = [-10, -11, -12, -13, -14, -15]
eta_vals = [np.exp(k) for k in exponents]

# Run SGD once per learning rate and keep the per-epoch training loss.
training_error = []
for eta in eta_vals:
    (weights, losses) = SGD(X, Y, w_start, eta, N_epochs)
    training_error.append(losses)

plt.figure()

# One series per learning rate, drawn as unconnected markers.
for k, errors in zip(exponents, training_error):
    plt.plot(range(N_epochs), errors, marker='o', linewidth=0, label='eta = e^%d' % k)

plt.legend(loc='best')
plt.xlabel('Number of Epochs')
plt.ylabel('Training Error')
plt.title('SGD Convergence by eta')

plt.show()
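
The effect of the step size can also be seen on a small synthetic problem that runs in well under a second. The sketch below is not the assignment's dataset or learning rates: `run_sgd`, the generated data, and the two step sizes are assumptions chosen only to show that a smaller step size lowers the loss more slowly over the same number of epochs.

```python
import numpy as np

def run_sgd(X, Y, eta, n_epochs, rng):
    # Plain SGD on squared loss; returns the total loss after each epoch.
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            w -= eta * (-2 * (Y[i] - np.dot(w, X[i])) * X[i])
        history.append(np.sum((Y - X @ w) ** 2))
    return np.array(history)

# Synthetic linear data with a bias column and a little noise.
rng = np.random.default_rng(0)
X_syn = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([1.0, -2.0, 0.5])
Y_syn = X_syn @ true_w + 0.1 * rng.normal(size=200)

loss_large_step = run_sgd(X_syn, Y_syn, eta=1e-2, n_epochs=20, rng=rng)
loss_small_step = run_sgd(X_syn, Y_syn, eta=1e-4, n_epochs=20, rng=rng)
print(loss_large_step[-1], loss_small_step[-1])
```

On this toy problem the larger step size reaches a much lower loss within 20 epochs, while the smaller one is still descending.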

## Problem 3H

Provide your code for computing the least-squares analytical solution below.

In [None]:
#==============================================
# TODO: implement the least-squares
# analytical solution.
#==============================================
# Solve the normal equations (X^T X) w = X^T Y without forming an explicit inverse.
w = np.linalg.solve(np.matmul(np.transpose(X), X), np.matmul(np.transpose(X), Y))
print(w)
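
As a cross-check on the closed-form solution, the normal equations can be compared with NumPy's built-in least-squares solver. This is a sketch on synthetic data; `X_syn`, `Y_syn`, and the generating weights are made up for illustration.

```python
import numpy as np

# Synthetic linear data with a bias column and a little noise.
rng = np.random.default_rng(0)
X_syn = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
Y_syn = X_syn @ np.array([0.5, 1.0, -1.0, 2.0]) + 0.01 * rng.normal(size=100)

# Normal equations: solve (X^T X) w = X^T Y.
w_normal = np.linalg.solve(X_syn.T @ X_syn, X_syn.T @ Y_syn)

# Least-squares solver; rcond=None selects NumPy's default cutoff.
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_syn, Y_syn, rcond=None)

print(np.allclose(w_normal, w_lstsq))
```

The two approaches should agree on a well-conditioned problem; `np.linalg.lstsq` is typically the more robust choice when the columns of `X` are nearly collinear.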