In this notebook we will learn the basic building blocks of PyTorch and use them to implement a small, simple neural network in a flexible, extensible framework.

As you did with the decision tree, the recommended process is to first work through the notebook, and then copy/paste the methods into the accompanying tinyNeuralNet.py, which you can then test using GitHub Classroom. Once you have passed all the tests, you can create the figures asked for in the homework writeup.



In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio

SMALLNUMBER = 0.00000001
BIGNUMBER = 100000000

Since we will need it many times, first code up the sigmoid function. You may want to test this against GitHub Classroom first, to avoid any early errors.

In [2]:
def sigmoid_fn(s):
    return 1/(1+np.exp(-s))
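
For very negative inputs, `np.exp(-s)` in the definition above overflows and NumPy emits a runtime warning, even though the result still rounds to 0. One possible way around this is sketched below; it only ever exponentiates non-positive numbers. The name `stable_sigmoid` is hypothetical and not part of the assignment.

```python
import numpy as np

def stable_sigmoid(s):
    """Sigmoid that never calls exp on a positive argument."""
    s = np.asarray(s, dtype=float)
    e = np.exp(-np.abs(s))  # always in (0, 1], so it cannot overflow
    # s >= 0: 1 / (1 + exp(-s));  s < 0: exp(s) / (1 + exp(s))
    return np.where(s >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```

For moderate inputs this should agree with `sigmoid_fn` to floating-point precision.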

To help you understand the structure of this code, I have written up the main parts of the class Model() for you, so you do not need to modify it. Study the format, and use it to guide the building of the remaining components.


In [6]:

class Model():
    def __init__(self):
        self.states = [] # contains state variables s_k
        self.states_grad = [] # contains derivatives d loss / d s_k. Note that d Loss/ ds_0 is not needed.
        self.layers = [] #contains network modules
        
    def forward(self,X,y):
        s = X
        self.states = [s]
        for l in self.layers:
            s = l.forward(s)
            self.states.append(s)
        loss = self.loss_fn.forward(s,y)
        self.loss = loss
        
        yhat = s
        return np.mean(loss), yhat
        
    
    def backward(self,X,y):
        sgrad = self.loss_fn.backward(y,self.states[-1])
        self.states_grad = [sgrad]
        for i in range(len(self.states)-2,0,-1):
            sgrad = self.layers[i].backward(sgrad,self.states[i])
            self.states_grad.insert(0,sgrad)
            
            
    def update_params(self, stepsize):
        for i in range(len(self.layers)):
            layer = self.layers[i]
            if layer.num_params > 0:
                layer.update_var(self.states_grad[i],self.states[i], stepsize)
                
                
                


Next, we will build the two activation functions (ReLU and Sigmoid). You do not need to touch the Module class, but fill in the forward and backward functions in ReLU and Sigmoid.

Remember, a forward function performs s_next = f(s), and a backward function computes d loss / d s, given d loss / d s_next and s.

In [1]:

class Module():
    def __init__(self):
        self.params = None
    
    
class ReLU(Module):
    def __init__(self):
        self.num_params = 0.
        
    def forward(self,s):
        return np.maximum(0,s)
    
    
    def backward(self,dLdsn,s):
        dL_ds_in = dLdsn.copy()
        dL_ds_in[s <= 0] = 0
        return dL_ds_in
       
    
    
class Sigmoid(Module):
    def __init__(self):
        self.num_params = 0.
        
    def forward(self,s):
        return 1/(1+np.exp(-s))
    
    
    def backward(self,dLdsn,s):
        sig = self.forward(s)
        dsig = sig * (1 - sig)  
        return dLdsn * dsig
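
One way to test a backward function before submitting it is to compare it against finite differences. The sketch below assumes an elementwise module (output shape equals input shape); `check_backward`, `sig_forward` and `sig_backward` are hypothetical stand-ins written only for this illustration, with the same signatures as the methods above.

```python
import numpy as np

def check_backward(forward, backward, s, eps=1e-6):
    """Return the largest gap between backward() and central finite differences.

    Uses the scalar test loss L = sum(R * forward(s)) for a fixed random R,
    so that d L / d s_next = R.
    """
    rng = np.random.default_rng(0)
    R = rng.standard_normal(s.shape)
    analytic = backward(R, s)
    numeric = np.zeros_like(s)
    for idx in np.ndindex(s.shape):
        s_plus, s_minus = s.copy(), s.copy()
        s_plus[idx] += eps
        s_minus[idx] -= eps
        numeric[idx] = np.sum(R * (forward(s_plus) - forward(s_minus))) / (2 * eps)
    return np.max(np.abs(analytic - numeric))

# Stand-in sigmoid with the same forward/backward signatures as the modules
def sig_forward(s):
    return 1 / (1 + np.exp(-s))

def sig_backward(dLdsn, s):
    sig = sig_forward(s)
    return dLdsn * sig * (1 - sig)

s = np.random.default_rng(1).standard_normal((3, 4))
max_err = check_backward(sig_forward, sig_backward, s)
print(max_err)  # should be very small if backward matches forward
```

With the classes above, the same check could be called as `check_backward(Sigmoid().forward, Sigmoid().backward, s)`. For ReLU, keep the entries of `s` away from 0, where the function is not differentiable.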
    
    
    

The linear module is similar to the two activation functions in that it requires a forward and a backward step. In addition, you need a function that computes d loss / d W and d loss / d b in order to do the back propagation, and then takes a gradient step using the given step size. Test these functions carefully before moving on.

In [3]:

class Linear(Module):
    def __init__(self, num_outparam, num_inparam, weight_std=1):
        # W has shape (num_outparam, num_inparam); b has shape (num_outparam,)
        self.W = np.random.randn(num_outparam, num_inparam)*weight_std
        self.b = np.random.randn(num_outparam)*weight_std
        self.num_params = num_inparam*num_outparam + num_outparam

    def forward(self, s):
        # s has shape (num_inparam, batchsize); b is repeated across the batch
        self.output = np.dot(self.W, s) + np.outer(self.b, np.ones(s.shape[1]))
        return self.output

    def backward(self, dLdsn, s):
        # Gradient with respect to the weights (same shape as self.W)
        self.dW = np.dot(dLdsn, s.T)

        # Gradient with respect to the biases, summed over the batch
        self.db = np.sum(dLdsn, axis=1)

        # Gradient with respect to the layer input s
        grad_input = np.dot(self.W.T, dLdsn)
        return grad_input

    def update_var(self, dLdsn, s, stepsize):
        # Gradients are summed (not averaged) over the batch
        dW = np.dot(dLdsn, s.T)
        db = np.sum(dLdsn, axis=1)

        # Gradient step
        self.W = self.W - dW*stepsize
        self.b = self.b - db*stepsize

        return self.W, self.b
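
The expressions used in backward and update_var above follow from the chain rule. As a sketch, write S for the layer input of shape (num_inparam, batchsize), G for d loss / d s_next, and 1 for the all-ones vector of length batchsize. This notation is introduced here only for illustration.

```latex
\begin{aligned}
S_{\text{next}} &= W S + b\,\mathbf{1}^{\top}, \\
\frac{\partial\,\text{loss}}{\partial W} &= G\,S^{\top}, \qquad
\frac{\partial\,\text{loss}}{\partial b} = G\,\mathbf{1}, \qquad
\frac{\partial\,\text{loss}}{\partial S} = W^{\top} G .
\end{aligned}
```

These correspond to `np.dot(dLdsn, s.T)`, `np.sum(dLdsn, axis=1)` and `np.dot(self.W.T, dLdsn)` in the code.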
    

In the next section, we will focus on making a simple neural network using a ReLU activation function and a BCELoss (which is PyTorch's name for logistic loss). The loss function is much like the modules, except that its forward and backward steps act on y and yhat. Code up this loss.

Note that to deal with numerical issues, we will use a SMALLNUMBER inside the log function to avoid getting infs. Since the output of the loss function's forward step is not used in the back propagation, this trick only affects our reported metrics and does not affect the guts of the algorithm.

In [4]:

class Loss():
    pass

class BCELoss(Loss):
    def __init__(self):
        pass

    # f(y,yhat) = -log(max(SMALLNUMBER, sigmoid(y*yhat)))
    def forward(self,y,yhat):
        # Per-sample logistic loss for labels y in {-1,+1} and raw scores yhat.
        # The expression is symmetric in y and yhat, so it also works with the
        # (yhat, y) argument order used in Model.forward.
        return -np.log(np.maximum(SMALLNUMBER, sigmoid_fn(y*yhat)))

    def backward(self,y,yhat):
        # d f / d yhat = -y * (1 - sigmoid(y*yhat))
        return -y * (1 - sigmoid_fn(y*yhat))
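
As an alternative to clamping with SMALLNUMBER, the logistic loss -log(sigmoid(y*yhat)) can be rewritten as log(1 + exp(-y*yhat)) and evaluated with `np.logaddexp`. The sketch below assumes labels in {-1,+1} and raw (pre-sigmoid) scores, matching the formula in the comment above; `logistic_loss` and `logistic_loss_grad` are hypothetical names, and this is a swapped-in technique rather than the one the assignment prescribes.

```python
import numpy as np

def logistic_loss(y, yhat):
    """Per-sample -log(sigmoid(y*yhat)), computed as log(1 + exp(-y*yhat))."""
    return np.logaddexp(0.0, -y * yhat)

def logistic_loss_grad(y, yhat):
    """d loss / d yhat = -y * sigmoid(-y*yhat)."""
    z = y * yhat
    # sigmoid(-z) = exp(-log(1 + exp(z))), which avoids overflow in exp
    return -y * np.exp(-np.logaddexp(0.0, z))

y = np.array([1.0, -1.0, 1.0])          # labels, shape (n,)
yhat = np.array([[2.0, 0.5, -800.0]])   # raw scores, shape (1, n)
loss = logistic_loss(y, yhat)
grad = logistic_loss_grad(y, yhat)
print(loss, grad)
```

Even for the badly misclassified third sample, the loss stays finite and the gradient saturates at magnitude 1 instead of producing an inf.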
    
    
    

Finally, build the neural network. You will do this by filling self.layers with the necessary units. The rest of the work is now handled by the Model parent class!

In [7]:

    
class SimpleReluClassNN(Model):
    def __init__(self, num_layers, hidden_width, num_inparam):
        super().__init__()
        # Model.forward and Model.backward rely on self.loss_fn
        self.loss_fn = BCELoss()

        # Create layers
        for i in range(num_layers):
            input_dim = num_inparam if i == 0 else hidden_width
            output_dim = 1 if i == num_layers - 1 else hidden_width

            # Add a fully connected layer; Linear takes (num_outparam, num_inparam)
            self.layers.append(Linear(output_dim, input_dim))

            # Add a ReLU layer, except for the output layer
            if i < num_layers - 1:
                self.layers.append(ReLU())
    
        
    def inference(self,X,y):
        loss, yhat = self.forward(X,y)
        yhat = np.sign(yhat)
        return loss, yhat

Test your overall neural network. Plot its train/test loss and train/test accuracy on the 0-vs-1 digit disambiguation task over a range of step sizes, widths, and numbers of layers. You pick what to sweep, but pick at least 2 interesting hyperparameters.

In [7]:
data = sio.loadmat('mnist.mat')
Xtrain = data['trainX'].T
Xtest = data['testX'].T
ytrain = data['trainY'][0,:]
ytest = data['testY'][0,:]


idx = np.less(ytrain,2)
Xtrain = Xtrain[:,idx]
ytrain = ytrain[idx].astype(int)
ytrain[ytrain==0] = -1

idx = np.less(ytest,2)
Xtest = Xtest[:,idx]
ytest = ytest[idx].astype(int)
ytest[ytest==0] = -1
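
A training loop for the experiments above might look like the following sketch. It assumes the model exposes forward, backward, update_params and inference with the signatures of the Model class, and it uses full-batch gradient descent; the helper name `train_and_record` is hypothetical.

```python
import numpy as np

def train_and_record(model, Xtrain, ytrain, Xtest, ytest, stepsize, num_iters):
    """Full-batch gradient descent; returns per-iteration loss and accuracy."""
    history = {"train_loss": [], "test_loss": [], "train_acc": [], "test_acc": []}
    for _ in range(num_iters):
        # forward must run before backward so the stored states are current
        model.forward(Xtrain, ytrain)
        model.backward(Xtrain, ytrain)
        model.update_params(stepsize)

        # Record metrics on both splits after the update
        for name, X, y in (("train", Xtrain, ytrain), ("test", Xtest, ytest)):
            loss, pred = model.inference(X, y)
            history[name + "_loss"].append(float(loss))
            history[name + "_acc"].append(float(np.mean(np.ravel(pred) == y)))
    return history
```

Each entry of the returned dictionary is a list with one value per iteration, so it can be passed directly to `plt.plot`. Calling the function once per hyperparameter setting gives one curve per setting.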



Well, that was fun, wasn't it? And you can see how this whole construct is flexible and extendable? 

To show you really understand the concept, now build a simple neural network, but instead of ReLU, use a sigmoid function as the activation, and instead of BCELoss, use MSELoss (i.e. regression loss). Use the numeric labels as the target value. Again, report the train/test loss over a couple of interesting hyperparameters.

In [8]:



class MSELoss(Loss):
    def __init__(self):
        self.num_params = 0.
        
    def forward(self,y,yhat):
        return None
        
    def backward(self,y,yhat):
        return None
    
    
class SimpleSigmoidRegressNN(Model):
    def __init__(self, num_layers, hidden_width, num_inparam):
        pass
        
    def inference(self,X,y):
        loss, yhat = self.forward(X,y)        
        return loss, yhat
        
        
        
###################
        
        


data = sio.loadmat('mnist.mat')
Xtrain = data['trainX'].T
Xtest = data['testX'].T
ytrain = data['trainY'][0,:].astype(float)
ytest = data['testY'][0,:].astype(float)
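
For reference, the forward and backward steps of a squared-error loss can be sketched as below. This assumes the per-sample definition 0.5*(yhat - y)^2; the factor 0.5 is a convention chosen here, not something the assignment specifies, and `mse_forward` and `mse_backward` are hypothetical names.

```python
import numpy as np

def mse_forward(y, yhat):
    """Per-sample squared error 0.5 * (yhat - y)^2."""
    return 0.5 * (yhat - y) ** 2

def mse_backward(y, yhat):
    """d loss / d yhat for the definition above."""
    return yhat - y

y = np.array([1.0, 2.0])         # targets, shape (n,)
yhat = np.array([[1.5, 0.0]])    # predictions, shape (1, n)
print(mse_forward(y, yhat), mse_backward(y, yhat))
```

The forward value is symmetric in y and yhat, so it is unaffected by the argument order used in Model.forward. The backward value is not symmetric, so there the order matters.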




Finally, show that you can extend this concept to *multiclass* classification as well. Note that here, yhat is not a 1-D variable, but rather a k-D variable, where k is the number of classes.

Hint: Start by coding up CrossEntropyLoss and testing that correctness first. 

Report the train/test loss, and train/test accuracy over iterations, across 2 interesting hyperparameters.


In [9]:
## multiclass classification

    

class CrossEntropyLoss(Loss):
    def __init__(self):
        self.num_params = 0.
        
    def forward(self,y,yhat):
        return None
        
    def backward(self,y,yhat):
        return None
    
    
class SimpleReluMulticlassNN(Model):
    def __init__(self, num_layers, hidden_width, num_inparam, num_classes):
        pass
    
    def inference(self,X,y):
        loss, yhat = self.forward(X,y)
        yhat = np.argmax(yhat,axis=0)
        
        return loss, yhat

##################

    
import scipy.sparse
    


data = sio.loadmat('mnist.mat')
Xtrain = data['trainX'].T
Xtest = data['testX'].T
ytrain = data['trainY'][0,:]
ytest = data['testY'][0,:]

ytrain_mat = scipy.sparse.coo_matrix((np.ones(len(ytrain)),(ytrain,range(len(ytrain))))).toarray()
ytest_mat = scipy.sparse.coo_matrix((np.ones(len(ytest)),(ytest,range(len(ytest))))).toarray()
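
A common form of the cross-entropy loss applies a softmax to the raw network outputs and compares the result with the one-hot labels. The sketch below assumes that form, with both arguments of shape (k, n) as produced by the one-hot encoding above; whether the assignment expects the softmax inside the loss is an assumption, and the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(Y, S):
    """Per-sample cross-entropy between one-hot Y and softmax(S), both (k, n)."""
    # Subtracting the column max leaves softmax unchanged but avoids overflow
    S_shift = S - np.max(S, axis=0, keepdims=True)
    log_probs = S_shift - np.log(np.sum(np.exp(S_shift), axis=0, keepdims=True))
    return -np.sum(Y * log_probs, axis=0)

def softmax_cross_entropy_grad(Y, S):
    """d loss / d S = softmax(S) - Y."""
    S_shift = S - np.max(S, axis=0, keepdims=True)
    P = np.exp(S_shift)
    P /= np.sum(P, axis=0, keepdims=True)
    return P - Y

Y = np.eye(3)[:, [0, 2]]      # two samples with classes 0 and 2, shape (3, 2)
S = np.zeros((3, 2))          # uniform scores
print(softmax_cross_entropy(Y, S))  # each entry equals log(3) for uniform scores
```

Unlike the logistic and squared-error losses, this forward step is not symmetric in its two arguments, so pay attention to the order in which Model.forward passes yhat and y.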
 


