# Backprop for linear regression and beyond

In this assignment you'll write backprop code for a few simple neural nets. The first implements linear regression; the later ones add a hidden layer of neurons.

You'll also tweak the learning rate and number of training steps to try to get your nets to perform well.

The understanding you gain of how neural nets work will help a lot in the coming weeks.

vf22.6

#### Instructions:

- This homework is to be done completely on your own.  Don't refer to the web for solutions and don't discuss the homework with others.
- Start by reading the notebook.
- There are 7 problems.  For each problem, add code to the indicated notebook cell.
- Do not modify any other cells, and do not add imports.
- It is not enough to duplicate my output -- your code must solve the problem as stated.
- Be sure to "restart and run all" before submitting your notebook.

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

In [None]:
def plot_loss(zs, ymax=10000):
    """ Plot loss values over time.  zs is a 1D array or list of loss values. """

    zs = np.array(zs)
    n = zs.size
    plt.figure(figsize=(15,8))
    plt.plot(zs)
    plt.title('loss over time')
    plt.ylim(0, ymax)
    plt.xlabel('training step')
    plt.ylabel('loss')
    plt.grid()

Feel free to use the functions below in your code.

In [None]:
def mse(y, y_pred):
    return 0.5 * (y - y_pred)**2

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def neuron(w, x):
    """ Summation followed by a sigmoid activation function.
        w is a 1D NumPy array
        x is a 1D or 2D NumPy array
    """
    return sigmoid(x.dot(w))
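
As a quick sanity check of the helpers above, the sketch below calls them on small hand-picked inputs. The helper definitions are repeated so the snippet runs on its own; the sample arrays are made up for illustration and are not from the heart data. It also checks numerically the standard identity that the derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)), which tends to be handy when working out partial derivatives through a neuron.

```python
import numpy as np

def mse(y, y_pred):
    return 0.5 * (y - y_pred)**2

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def neuron(w, x):
    return sigmoid(x.dot(w))

# made-up weights and inputs (illustration only); x[0] = 1 plays the role of a bias input
w = np.array([0.0, 1.0, -1.0])
x_single = np.array([1.0, 2.0, 2.0])          # 1D: one example
x_batch = np.array([[1.0, 2.0, 2.0],
                    [1.0, 0.5, 0.0]])         # 2D: one example per row

out_single = neuron(w, x_single)   # scalar: sigmoid(0 + 2 - 2) = sigmoid(0) = 0.5
out_batch = neuron(w, x_batch)     # 1D array with one output per row

loss = mse(3.0, 1.0)               # 0.5 * (3 - 1)**2 = 2.0

# central-difference check of d sigmoid / dx = sigmoid(x) * (1 - sigmoid(x))
x0, eps = 0.3, 1e-6
numeric = (sigmoid(x0 + eps) - sigmoid(x0 - eps)) / (2 * eps)
analytic = sigmoid(x0) * (1 - sigmoid(x0))
print(out_single, out_batch, loss, numeric, analytic)
```

Note that `mse` returns the loss for each example separately, so passing arrays gives back an array rather than a single number.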

#### Heart disease data

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/grbruns/cst383/master/heart.csv')
df['output'] = df['output'] - 1

predictors = ['restbp', 'age']
target = 'maxhr'

X = df[predictors].values
y = df[target].values

scaler = StandardScaler()
X = scaler.fit_transform(X)

## Linear regression, 2 inputs

The figure below shows linear regression written as a tiny neural net.  There are two inputs, x1 and x2, along with a bias input.  The true target value is y, and the predicted target value is v.  The value of the loss function is z.

<img src="https://raw.githubusercontent.com/grbruns/cst495/master/lin-reg-net.png">

Writing the network as equations:

In [None]:
# v = b0 + b1*x1 + b2*x2  (y predicted)
# z = mse(y, v)

### Problem 1

This problem is like a lab problem we worked on in class.

Add your code at the marked locations.  You can use whatever variable names you like, but you will need to use b0, b1, b2 for the weights.  Use no loops or dictionaries.  Do not remove my comments.

I recommend using the names in the diagram above.  For the partial derivatives, I used names like dv_b0 for the partial derivative of v with respect to b0 and dz_b2 for the partial derivative of z with respect to b2.
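
One way to gain confidence in hand-derived partial derivatives is to compare them with finite-difference estimates. The sketch below is a generic check and not part of the assignment: `loss_at` and `numeric_partial` are hypothetical helper names, and the parameter and input values are made up. It estimates the partial derivative of z with respect to each of b0, b1, b2 by nudging one parameter at a time.

```python
def loss_at(b0, b1, b2, x1, x2, y):
    """Loss z for one training example, following the equations above."""
    v = b0 + b1*x1 + b2*x2
    return 0.5 * (y - v)**2

def numeric_partial(f, params, k, eps=1e-6):
    """Central-difference estimate of the partial derivative of f with respect to params[k]."""
    up = list(params)
    down = list(params)
    up[k] += eps
    down[k] -= eps
    return (f(*up) - f(*down)) / (2 * eps)

# made-up values for illustration only
x1, x2, y = 0.5, -1.0, 2.0
params = [0.1, 0.2, -0.3]

f = lambda b0, b1, b2: loss_at(b0, b1, b2, x1, x2, y)
dz_b0_num, dz_b1_num, dz_b2_num = (numeric_partial(f, params, k) for k in range(3))
print(dz_b0_num, dz_b1_num, dz_b2_num)
```

If your backprop code is evaluated at the same parameter and input values, its dz_b0, dz_b1, and dz_b2 should agree with these estimates to several decimal places.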

In [None]:
def train(X, y, alpha, num_iterations):

    m, n = X.shape

    # initialize parameters
    b0, b1, b2 = np.random.rand(3) - 0.5

    z_history = []
    for _ in range(num_iterations):

        # stochastic gradient descent; get a random training example
        i = np.random.choice(m)
        x1 = X[i,0]
        x2 = X[i,1]

        # forward prop (be sure to include variable z)
        # YOUR CODE HERE

        # backprop
        # YOUR CODE HERE

        # update parameters
        # YOUR CODE HERE

        z_history.append(z)

    return np.array([b0,b1,b2]), z_history

### Problem 2

Set the values of alpha (learning rate) and num_iterations below such that you get a good result when training.  You can see what a good result is by comparing to LinearRegression below.  

Because stochastic gradient descent is being used, there is a lot of noise, and the result you get will depend on the run.  Don't waste time "chasing the noise".

In [None]:
alpha = None            # replace None with your value
num_iterations = None   # replace None with your value
b, z_history = train(X, y, alpha, num_iterations)
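
For intuition about how alpha affects training, the sketch below runs plain gradient descent on a toy one-parameter loss, 0.5 * (b - 3)**2, rather than on the heart data. The function name `descend` and the values tried are illustrative assumptions; suitable values for the assignment will differ.

```python
def descend(alpha, num_iterations, b=0.0, target=3.0):
    """Gradient descent on the toy loss 0.5 * (b - target)**2."""
    for _ in range(num_iterations):
        grad = b - target          # derivative of the toy loss with respect to b
        b = b - alpha * grad
    return b

small = descend(alpha=0.01, num_iterations=20)   # too small: still far from 3 after 20 steps
good = descend(alpha=0.5, num_iterations=20)     # converges close to 3
large = descend(alpha=2.5, num_iterations=20)    # too large: overshoots and diverges
print(small, good, large)
```

With stochastic gradient descent the picture is noisier, but the same trade-off applies: a smaller alpha generally needs more iterations, and an overly large alpha can make the loss blow up.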

Print the results

In [None]:
print(f'training result: {np.round(b, 3)}')

#### Plot loss over time

Looking at the change in loss over time is helpful when setting the learning rate and number of iterations.

In [None]:
plot_loss(z_history)
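
Because each loss value comes from a single random training example, the raw curve can be hard to read. One option, sketched below, is to smooth the history with a moving average before looking at the trend. The helper name `smooth` and the window size are illustrative choices, and the loss values here are synthetic stand-ins for `z_history`.

```python
import numpy as np

def smooth(zs, window=50):
    """Moving average of a 1D sequence of loss values."""
    zs = np.asarray(zs, dtype=float)
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the window fits entirely inside zs
    return np.convolve(zs, kernel, mode='valid')

# synthetic noisy, decaying losses (illustration only)
rng = np.random.default_rng(0)
steps = np.arange(1000)
fake_history = 100 * np.exp(-steps / 200) + rng.normal(0, 5, size=steps.size)**2

smoothed = smooth(fake_history, window=50)
print(fake_history.size, smoothed.size)
```

The smoothed array is shorter than the input by `window - 1` values. It can be passed to `plot_loss` in place of the raw history, for example `plot_loss(smooth(z_history))`.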

Compute the training MSE

In [None]:
y_pred = b[0] + b[1]*X[:,0] + b[2]*X[:,1]
net_mse = ((y_pred - y)**2).mean()

print(f'Training MSE for neural net: {net_mse:.3f}')

Compare to the training MSE obtained with linear regression

In [None]:
regr = LinearRegression()
regr.fit(X, y)
y_pred = regr.predict(X)

print(f'Training MSE for linear regression: {((y - y_pred)**2).mean():.3f}')

Coefficients from linear regression.

In [None]:
print(f'coefficients: {np.array([ regr.intercept_ ] + list(regr.coef_)).round(3) }')

## Linear regression, any number of inputs

It's pretty easy to tweak your result for problem 1 so that it will work with any number of inputs.

Add your code in the cell below at the marked locations.  Use no loops or dictionaries.

Note that the number of inputs can be determined from the 2D input array X.

Make sure your code can work with 3 or more inputs.

### Problem 3

In [None]:
def train2(X, y, alpha, num_iterations):

    # augment X
    m,n = X.shape
    Xa = np.c_[np.ones(m), X]

    # initialize parameters
    b = np.random.rand(n+1) - 0.5

    z_history = []
    for _ in range(num_iterations):

        # stochastic gradient descent; get a random training example
        i = np.random.choice(m)
        x = Xa[i]

        # forward prop
        # YOUR CODE HERE

        # backprop
        # YOUR CODE HERE

        # update parameters
        # YOUR CODE HERE

        z_history.append(z)

    return b, z_history
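
The augmentation step in `train2` prepends a column of ones to X so that the intercept can be treated like any other weight. The sketch below, with made-up numbers, shows that a dot product with the augmented rows gives the same values as the explicit intercept-plus-weighted-sum form, for any number of inputs.

```python
import numpy as np

# made-up data with 3 inputs (illustration only)
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
b_demo = np.array([0.5, 1.0, -1.0, 2.0])     # b_demo[0] is the intercept

m = X_demo.shape[0]
Xa_demo = np.c_[np.ones(m), X_demo]          # same augmentation as in train2

explicit = b_demo[0] + X_demo.dot(b_demo[1:])   # intercept + weighted sum
augmented = Xa_demo.dot(b_demo)                 # single dot product

print(explicit, augmented)
```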

### Problem 4

Set the values of alpha (learning rate) and num_iterations below to get a good result when training.

In [None]:
alpha = None              # replace None with your value
num_iterations = None     # replace None with your value
b, z_history = train2(X, y, alpha, num_iterations)

In [None]:
print(f'training result: {np.round(b, 3)}')

In [None]:
plot_loss(z_history)

In [None]:
y_pred = b[0] + b[1]*X[:,0] + b[2]*X[:,1]
net_mse = ((y_pred - y)**2).mean()

print(f'Training MSE for neural net: {net_mse:.3f}')

## Linear regression plus a hidden layer of 2 neurons, each with sigmoid activation

The figure below is a tiny neural net for regression.  It has a single hidden layer with two neurons.  The neurons use a sigmoid activation function (which isn't a common choice for modern neural nets).

In writing backprop code it's important to have a clear scheme for naming the weights and partial derivatives in the network.

The figure shows the names I used.  Array x holds the inputs to both neuron 1 (the top neuron) and neuron 2.  Note that x[0], the first input to neurons 1 and 2, is a bias input -- it always has value 1.

Similarly, xs is the array of inputs to the summation node.

For the partial derivatives, I use px1[0] for the partial derivative of the loss z with respect to x1[0].  In other words, I just put a p in front of the input name.  We can name the partial derivatives after the inputs because every partial derivative is a derivative of the loss z.

Your code should handle the case where there are any number of inputs to the network, not just 2.

<img src="https://raw.githubusercontent.com/grbruns/cst495/master/tiny-net.png">

In [None]:
# Summary of variable names
#
# variables associated with neuron 1:
# inputs for neuron 1:             x1[0],  x1[1],  x1[2]
# weights for neuron 1:            w1[0],  w1[1],  w1[2]
# partials for neuron 1 inputs:   px1[0], px1[1], px1[2]
# partials for neuron 1 weights:  pw1[0], pw1[1], pw1[2]
#
# variables associated with the summation node:
# inputs:  xs[0], xs[1], xs[2]
# weights: ws[0], ws[1], ws[2]
# partials for the inputs:  pxs[0], pxs[1], pxs[2]
# partials for the weights: pws[0], pws[1], pws[2]

### Problem 5

Add your code in the cell below at the marked locations.

What makes backprop different here is that both the summation node and the neurons need to have their weights updated.  You should update the weights of the summation node right after you compute the partial derivatives for that node.

Use no loops or dictionaries in the code you add.

In [None]:
def train3(X, y, alpha, num_iterations):

    num_neurons = 2
    m,n = X.shape
    # augmented version of X
    Xa = np.c_[np.ones(m), X]

    # initialize parameters of the neural net
    w1 = np.random.rand(n+1) - 0.5
    w2 = np.random.rand(n+1) - 0.5
    ws = np.random.rand(num_neurons + 1) - 0.5

    z_history = []
    for _ in range(num_iterations):

        # stochastic gradient descent; get a random training example
        i = np.random.choice(m)
        x = Xa[i]

        #
        # forward prop
        #

        # YOUR CODE HERE
        # be sure to set loss variable z

        #
        # backprop
        #

        # MSE node
        # YOUR CODE HERE

        # summation node: compute partials and update ws
        # YOUR CODE HERE

        # node 1: compute partials and update w1
        # YOUR CODE HERE

        # node 2: compute partials and update w2
        # YOUR CODE HERE

        z_history.append(z)

    return w1, w2, ws, z_history

### Problem 6

Making predictions is a little more complicated now that multiple nodes have weights.  Therefore, we will create a prediction function that takes all the network weights, plus input array X, and produces predictions.

Add your code in the cell below at the marked locations.

In [None]:
def predict3(X, w1, w2, ws):
    """ Compute predictions for the network with weight arrays w1, w2, and ws. """

    # YOUR CODE HERE
    # It is fine to copy/paste/adapt code from the previous problem.

    return y_pred

### Problem 7

Set the values of alpha (learning rate) and num_iterations below such that you get a good result when training.

In [None]:
alpha = None              # replace None with your value
num_iterations = None     # replace None with your value
w1, w2, ws, z_history = train3(X, y, alpha, num_iterations)

In [None]:
plot_loss(z_history)

#### Compute MSE on the training set

Don't use the mse() function here: it returns the loss for each example rather than the mean over all examples, and it includes the 0.5 factor.

In [None]:
y_pred = predict3(X, w1, w2, ws)
print(f'Training MSE with neural net: {((y - y_pred)**2).mean():.3f}')
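
To make the difference concrete, the sketch below compares the `mse()` helper with the mean squared error computed above, using a small made-up pair of arrays. The helper is repeated so the snippet runs on its own.

```python
import numpy as np

def mse(y, y_pred):
    return 0.5 * (y - y_pred)**2

# made-up targets and predictions (illustration only)
y_demo = np.array([1.0, 2.0, 3.0])
y_pred_demo = np.array([1.0, 4.0, 0.0])

per_example = mse(y_demo, y_pred_demo)               # array of 0.5 * squared errors
mean_sq_err = ((y_demo - y_pred_demo)**2).mean()     # single number, no 0.5 factor

print(per_example, mean_sq_err)
```

Averaging the output of `mse()` would give half of the mean squared error, so the two numbers are not directly comparable.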