
**Homework 13**

For this assignment, we'll bring back some of the helper functions that we built throughout the semester:

In [None]:
import numpy as np

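class Scaler():
  '''Min-max scaler: maps each column of the fitting data z to roughly [0,1]'''
  def __init__(self,z):
    self.min=np.min(z,axis=0) #column-wise minimum
    self.max=np.max(z,axis=0) #column-wise maximum

  def scale(self,x):
    #the small constant avoids division by zero for constant columns
    return (x-self.min)/(self.max-self.min+0.000001)

  def unscale(self,x):
    #inverse of scale
    return x*(self.max-self.min+0.000001)+self.min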
    
def OneHot(y):
  '''Assumes y is a 1D array of integer class labels 0,1,...,k-1'''
  classes=np.max(y)+1
  Y=np.zeros((len(y),classes))
  i=np.arange(len(y))
  Y[i,y[i]]=1 #put a 1 in row i, column y[i]
  return Y

def MSE(pred,y):
  return np.mean((pred-y)**2)

def Accuracy(pred,y):
  '''Assumes pred is an array of probabilities, and y is a one-hot encoded target column'''
  class_preds=np.argmax(pred,axis=1) #predicted classes from probabilities
  class_target=np.argmax(y,axis=1) #target classes from OneHot encoding
  return np.mean(class_preds==class_target)

Our task now is to add `backprop` and `update` methods to the three classes you worked on in homework 12: `Linear()`, `Softmax()`, and `Model()`. The `Softmax` layer has no parameters to update and simply passes the gradient through unchanged, so that class is done for you:

In [None]:
class Softmax():
  '''Implement Softmax as final layer for prediction only'''
  def predict(self,input):
    return np.exp(input)/np.sum(np.exp(input),axis=1)[:,np.newaxis]
    #the end part [:,np.newaxis] was added just to get the shape right for later use

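  def backprop(self,grad):
    #We ignore this layer in backpropagation: for classification, the loss gradient
    #pred-y used in training is already taken with respect to this layer's input
    return grad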

  def update(self,lr):
    #Nothing to update
    pass
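
The `predict` formula above can overflow when the inputs are large, since `np.exp` of a large number is `inf`. A common remedy, sketched here under the assumption that a drop-in replacement is wanted, is to subtract each row's maximum before exponentiating. The name `stable_softmax` is hypothetical and not part of the assignment.

```python
import numpy as np

def stable_softmax(z):
    # hypothetical helper: subtracting each row's max leaves the result
    # mathematically unchanged but keeps np.exp from overflowing
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

# the second row is the first row shifted by 999, so both rows
# should map to the same probabilities
z = np.array([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])
p = stable_softmax(z)
print(p)
print(p.sum(axis=1))  # each row sums to 1
```

Using `keepdims=True` plays the same role as the `[:,np.newaxis]` trick: it keeps the row sums as a column so the division broadcasts row by row.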


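For the `Linear` class, the `backprop` method receives the gradient of the loss with respect to the layer's output, passed back from the *next* layer, and stores what is needed to compute the gradient with respect to each parameter in that layer. The method should return the gradient with respect to the layer's input, for use in the *previous* layer.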

The `update` method of the `Linear` class should change the weights and biases, based on previously computed gradients and some learning rate. 
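
In symbols, as a sketch: assume the layer computes $Z = XW + b$ on a batch $X$, and write $G = \partial L/\partial Z$ for the gradient arriving from the next layer. The symbols $Z$, $G$, and $L$ are notation introduced here for illustration, not taken from the assignment.

$$
\frac{\partial L}{\partial W} = X^{\top} G, \qquad
\frac{\partial L}{\partial b} = \sum_{i} G_{i,:}, \qquad
\frac{\partial L}{\partial X} = G\, W^{\top}
$$

The first two are the quantities `update` needs for the weights and biases; the third is what `backprop` hands to the previous layer.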

In [None]:
class Linear():
  '''Fully connected linear layer class'''
  def __init__(self, input_size, output_size):
    np.random.seed(input_size) #control randomness! Remove for real use
    self.weights = np.random.randn(input_size, output_size) * np.sqrt(2.0 / input_size)
    self.biases = np.zeros(output_size)

  def predict(self,input):
    self.input=input
    return self.input@self.weights+self.biases

  def backprop(self,grad):
    self.grad=grad 
    return self.grad@self.weights.T

  def update(self,lr):
    wt_grad = (self.input.T @ self.grad)
    bias_grad = np.sum(self.grad, axis = 0)
    self.weights -= lr * wt_grad / len(self.input)
    self.biases -= lr * bias_grad / len(self.input)
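
One way to gain confidence in the `wt_grad` and `bias_grad` expressions is to compare them with finite differences. The sketch below is standalone and makes several assumptions: the data are random, and the loss is half the summed squared error, chosen so that the gradient at the layer's output has the same `pred - y` form used in training. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))      # hypothetical batch: 5 samples, 4 features
W = rng.normal(size=(4, 3))      # weights of a 4 -> 3 linear layer
b = np.zeros(3)
T = rng.normal(size=(5, 3))      # hypothetical targets

def loss(W, b):
    # half squared error, so that dL/dZ = Z - T
    Z = X @ W + b
    return 0.5 * np.sum((Z - T) ** 2)

G = (X @ W + b) - T              # gradient arriving at the layer's output
wt_grad = X.T @ G                # analytic weight gradient
bias_grad = np.sum(G, axis=0)    # analytic bias gradient

eps = 1e-6
num_wt_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        # central difference approximation of dL/dW[i, j]
        num_wt_grad[i, j] = (loss(Wp, b) - loss(Wm, b)) / (2 * eps)

num_bias_grad = np.zeros_like(b)
for j in range(len(b)):
    bp = b.copy(); bp[j] += eps
    bm = b.copy(); bm[j] -= eps
    num_bias_grad[j] = (loss(W, bp) - loss(W, bm)) / (2 * eps)

print(np.allclose(wt_grad, num_wt_grad, atol=1e-5))
print(np.allclose(bias_grad, num_bias_grad, atol=1e-5))
```

Note that `update` additionally divides both gradients by `len(self.input)`, which averages the step over the batch rather than summing it.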


Finally, add `backprop` and `update` methods to the `Model` class. The `backprop` method should pass the gradient of each layer (starting from the last) to the input of the `backprop` method for the previous layer. The `update` method should just call the `update` methods for each layer. 

Note that I have also included a `train` method that is very similar to what we have seen before, to implement mini-batch gradient descent for the network. Make sure you read and understand that code!
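
To make the batching logic in `train` concrete, here is a minimal standalone sketch of how the shuffled indices are split into full batches plus one smaller leftover batch. The values `n=32` and `batch_size=5` are illustrative choices, and the `batches` list is a hypothetical helper structure that `train` itself does not build.

```python
import numpy as np

n, batch_size = 32, 5                  # illustrative sizes
indices = np.arange(n)
np.random.seed(0)
np.random.shuffle(indices)             # shuffles in place

num_batches = n // batch_size          # number of full batches
batches = [indices[j*batch_size:(j+1)*batch_size] for j in range(num_batches)]
if n % batch_size != 0:                # smaller leftover batch
    batches.append(indices[num_batches*batch_size:])

print([len(b) for b in batches])       # [5, 5, 5, 5, 5, 5, 2]
```

Every row index appears in exactly one batch, so each epoch visits the whole dataset once.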

In [None]:
class Model():
  def __init__(self,layerlist):
    self.layerlist=layerlist

  def add(self,layer):
    self.layerlist+=[layer]

  def predict(self,input):
    for layer in self.layerlist:
      input=layer.predict(input)
    return input
 
  def backprop(self,grad):
    #gradient of each layer starting from last
    for layer in self.layerlist[::-1]:
      grad=layer.backprop(grad)
    return grad

  def update(self,lr):
    for layer in self.layerlist:
      layer.update(lr)

  def train(self,X,y,epochs,batch_size,lr,loss_fn):
    n=len(X)  
    indices=np.arange(n)
    for i in range(epochs):
      np.random.seed(i)
      np.random.shuffle(indices)
      X_shuffle=X[indices] 
      y_shuffle=y[indices] 
      num_batches=n//batch_size
      for j in range(num_batches):
        X_batch=X_shuffle[j*batch_size:(j+1)*batch_size]
        y_batch=y_shuffle[j*batch_size:(j+1)*batch_size]
        pred=self.predict(X_batch)
        lossgrad=pred-y_batch 
        #for regression, make sure shape of y_batch is (n,1)
        #for Softmax classification, make sure y_batch is OneHot encoded
        self.backprop(lossgrad)
        self.update(lr)
      if n%batch_size!=0: #Check if there is a smaller leftover batch
        X_batch=X_shuffle[num_batches*batch_size:] 
        y_batch=y_shuffle[num_batches*batch_size:] 
        pred=self.predict(X_batch)
        lossgrad=pred-y_batch 
        self.backprop(lossgrad)
        self.update(lr)
      if i%50==0: #Change this line to update reporting more/less frequently
        print("epoch: ",i,", loss: ",loss_fn(self.predict(X),y))
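
The line `lossgrad=pred-y_batch` deserves a closer look for the classification case. Assuming the loss being minimized is cross-entropy (the notebook does not name it), the gradient of that loss with respect to the input of the softmax is exactly `pred - y`, which is why the `Softmax` layer can pass gradients through untouched. The sketch below checks this numerically on random, hypothetical data.

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy(z, y):
    # summed cross-entropy between one-hot targets y and softmax(z)
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(1)
z = rng.normal(size=(4, 3))            # hypothetical softmax inputs
y = np.eye(3)[[0, 2, 1, 2]]            # hypothetical one-hot targets

analytic = softmax(z) - y              # the pred - y form used in train

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        zp = z.copy(); zp[i, j] += eps
        zm = z.copy(); zm[i, j] -= eps
        # central difference approximation of dL/dz[i, j]
        numeric[i, j] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

For regression with no final softmax, `pred - y` is instead the gradient of half the squared error, so the same line serves both tasks.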

Let's test your code on the good old iris dataset!

Run this code block to import the dataset and define the feature matrix:

In [None]:
import pandas as pd
iris=pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv',index_col=0)

X=np.array(iris.iloc[:,:4]) #All four flower trait features

Next, we convert the target column (Species) to numerical values, and do a one-hot encoding:

In [None]:
flowerdict={'setosa':0,'versicolor':1,'virginica':2}
target=iris['Species'].apply(lambda x:flowerdict[x])
target=np.array(target)
y=OneHot(target)
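
As a small illustration of what the one-hot encoding produces, the sketch below applies the same indexing trick as `OneHot` to a short, hypothetical label array and compares it with an equivalent `np.eye` one-liner.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])            # hypothetical integer class labels
Y = np.zeros((len(labels), labels.max() + 1))
Y[np.arange(len(labels)), labels] = 1      # a 1 in row i, column labels[i]
print(Y)

# indexing the identity matrix by the labels gives the same array
print(np.array_equal(Y, np.eye(labels.max() + 1)[labels]))

# argmax along each row recovers the labels, which is what Accuracy relies on
print(np.argmax(Y, axis=1))
```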

Define the model:

In [None]:
NN=Model([])
NN.add(Linear(4,4))
NN.add(Linear(4,3))
NN.add(Softmax())

Now, if your code works, we can train the model, and report the accuracy as it improves:

In [None]:
NN.train(X,y,500,50,0.01,Accuracy)

epoch:  0 , loss:  0.29333333333333333
epoch:  50 , loss:  0.8533333333333334
epoch:  100 , loss:  0.9
epoch:  150 , loss:  0.9333333333333333
epoch:  200 , loss:  0.94
epoch:  250 , loss:  0.9466666666666667
epoch:  300 , loss:  0.9466666666666667
epoch:  350 , loss:  0.9533333333333334
epoch:  400 , loss:  0.9533333333333334
epoch:  450 , loss:  0.9533333333333334


Now try a regression task. First we'll import a toy dataset:

In [None]:
mtcars=pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv',index_col=0)
mtcars

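| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |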


1. Make the first column (mpg) the target array, `y`. Make sure `y` is a numpy array of shape (32,1). 
2. Make the feature matrix `X` the remaining columns. Make sure `X` is a numpy array of shape (32,10).
3. Scale both `X` and `y` to obtain `X_scaled` and `y_scaled`.
4. Define a neural network called `mtNN` with two linear layers. The first layer should have 10 inputs and 10 outputs. The second layer should have 10 inputs and 1 output. 
5. Train your neural network on X_scaled and y_scaled. Use 500 epochs, a batch size of 5, a learning rate of 0.01, and the `MSE` function to report loss during training.
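
As a sketch of what the scaling in step 3 does, the block below repeats the `Scaler` helper from the top of the notebook and applies it to a tiny, hypothetical array. It shows that each column is mapped to roughly [0,1] and that `unscale` undoes `scale`.

```python
import numpy as np

class Scaler():
  # copy of the helper defined at the top of the notebook
  def __init__(self,z):
    self.min=np.min(z,axis=0)
    self.max=np.max(z,axis=0)

  def scale(self,x):
    return (x-self.min)/(self.max-self.min+0.000001)

  def unscale(self,x):
    return x*(self.max-self.min+0.000001)+self.min

data = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])  # hypothetical values
s = Scaler(data)
scaled = s.scale(data)
print(scaled.min(axis=0), scaled.max(axis=0))   # minima are 0, maxima just below 1
print(np.allclose(s.unscale(scaled), data))     # round trip recovers the data
```

The same round trip is how predictions made on the scaled target could be converted back to miles per gallon, for example with `y_scaler.unscale(mtNN.predict(X_scaled))` once `y_scaler` and `mtNN` have been defined.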

In [None]:
#1
y = np.array(mtcars.iloc[:, 0]).reshape(32,1)

#2
X = np.array(mtcars.iloc[:, 1:]).reshape(32,10)

#3
y_scaler = Scaler(y)
X_scaler = Scaler(X)


X_scaled = X_scaler.scale(X)
y_scaled = y_scaler.scale(y)

#4
mtNN = Model([])
mtNN.add(Linear(10,10))
mtNN.add(Linear(10,1))
#No Softmax layer for regression: softmax over a single output column always
#returns 1, so the loss would never improve and would eventually become nan

#5
mtNN.train(X_scaled,y_scaled,500,5,.01,MSE)
