# Assignment #3
## P556: Applied Machine Learning

More often than not, we will use a deep learning library (Tensorflow, Pytorch, or the wrapper known as Keras) to implement our models. However, the abstraction afforded by those libraries can make it hard to troubleshoot issues if we don't understand what is going on under the hood. In this assignment you will implement a fully-connected and a convolutional neural network from scratch. To simplify the implementation, we are asking you to implement static architectures, but you are free to support variable number of layers/neurons/activations/optimizers/etc. We recommend that you make use of private methods so you can easily troubleshoot small parts of your model as you develop them, instead of trying to figure out which parts are not working correctly after implementing everything. Also, keep in mind that there is code from your fully-connected neural network that can be re-used on the CNN. 

Problem #1.1 (40 points): Implement a fully-connected neural network from scratch. The neural network will have the following architecture:

- Input layer
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Output layer, using softmax as the activation function

The model will use categorical crossentropy as its loss function. 
We will optimize the gradient descent using RMSProp, with a learning rate of 0.001 and a rho value of 0.9.
We will evaluate the model using accuracy.

Why this architecture? We are trying to reproduce from scratch the following [example from the Keras documentation](https://keras.io/examples/mnist_mlp/). This means that you can compare your results by running the Keras code provided above to see if you are on the right track.

In [1]:
class NeuralNetwork(object):
    
    def __init__(self,neurons,epochs,learning_rate,rho,batch_size):
        self.neurons = neurons
        self.learning_rate = learning_rate
        self.rho = rho
        self.epochs = epochs
        
    
    #Activation function
    def ReLu(self,x):
        return np.maximum(0,x)
    
    #Dropout probability
    #https://towardsdatascience.com/coding-neural-network-dropout-3095632d25ce
    def dropout(self,x, drop_probab):
        keep_probab = 1 - drop_probab
        k = np.random.uniform(0, 1.0, x.shape)
        mask = k < keep_probab
        if keep_probab > 0:
            scale = (1.0/keep_probab)
        else:
            scale = 0
        return mask * x * scale
    
    #Softmax function
    #https://deepnotes.io/softmax-crossentropy
    def softmax(self,x):
        e_x = np.exp(x-np.max(x, axis=1, keepdims=True))
        e_x_summ = np.sum(e_x,axis=1, keepdims=True)
        return e_x / e_x_summ

    #Categorical Cross entropy loss function
    def cross_entropy(self,y, a3):
        N = y.shape[0]
        #loss = -np.sum((y*np.log(a3+1e-9))+((1-y)*np.log(1-a3+1e-9)))/N
        loss = -np.sum((y*np.log(a3+1e-12)))/N
        loss  = np.squeeze(loss)
        return loss

    #Relu backward derivative
    def relu_backward(self,x):
        x[x<=0] = 0
        x[x>0] = 1
        return x
    
    #Softmax derivative
    def dsoftmax(self,x):
        return np.dot(x.T,(1-x))
    
    def parameters(self,neurons,x,y):
        self.neurons = neurons
        w1=np.random.randn(x.shape[1],self.neurons) 
        w2=np.random.randn(self.neurons,self.neurons) 
        w3=np.random.randn(self.neurons,y.shape[1]) 
        
        b1= np.random.randn(1,self.neurons) 
        b2= np.random.randn(1,self.neurons) 
        b3= np.random.randn(1,y.shape[1]) 
        
            
        wb = [w1,b1,w2,b2,w3,b3]
        
        return wb
    
    #Referred to for forwar&backward: https://towardsdatascience.com/neural-networks-from-scratch-easy-vs-hard-b26ddc2e89c7
    #Forward propagation calculation
    def feed_forward(self,x,wb):
        #1st layer
        z1 = np.dot(x,wb[0])+wb[1]
        a1 = NeuralNetwork.ReLu(self,z1)
        a1 = NeuralNetwork.dropout(self,a1, 0.2)#drop_out
        #print("a1",a1)
        
        #2nd layer
        z2 = np.dot(a1,wb[2])+wb[3]
        a2 = NeuralNetwork.ReLu(self,z2)
        a2 = NeuralNetwork.dropout(self,a2, 0.2)#drop_out
        #print("a2",a2)
        
        #3rd layer
        z3 = np.dot(a2,wb[4])+wb[5]
        a3 = NeuralNetwork.softmax(self,z3)
        
        #print("A3",a3)
        return a1,a2,a3,z1,z2
    
    #https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
    #Backward propagation calculation
    def backprop(self,a1,a2,a3,wb,x,y,z1,z2):
        #partial derivatives calculated 
        #https://stackoverflow.com/questions/44130892/in-neural-networks-accuracy-improvement-after-each-epoch-is-greater-than-accura
        dcost = (a3-y)#output error 
        #Layer 1 derivatives
        #a3_doh = np.dot((dcost),NeuralNetwork.dsoftmax(self,a3))
        a3_doh = dcost
        dw3 = np.dot(a2.T, a3_doh)
        db3 = np.sum(a3_doh, axis=0)
        
        #Layer 2 derivatives
        z2_doh = np.dot(a3_doh,wb[4].T)
        a2_doh = z2_doh * NeuralNetwork.relu_backward(self,z2)#*NeuralNetwork.dropout(self,z2,0.2)
        dw2 = np.dot(a1.T, a2_doh)
        db2 = np.sum(a2_doh, axis=0)
        
        #Layer 3 derivatives
        z1_doh = np.dot(a2_doh,wb[2].T)
        a1_doh = z1_doh * NeuralNetwork.relu_backward(self,z1)#*NeuralNetwork.dropout(self,z1,0.2)
        dw1 = np.dot(x.T, a1_doh)
        db1 = np.sum(a1_doh, axis=0)

        grad_parameters = [dw1,db1,dw2,db2,dw3,db3]
        return grad_parameters
    
    #Fits parameters and calculates gradients
    def fit(self,wb,x,y):
        a1,a2,a3,z1,z2 = self.feed_forward(x,wb)
        grad_parameters = self.backprop(a1,a2,a3,wb,x,y,z1,z2)

        return a3,grad_parameters
    
    #RMSProp parameter optimizer
    def parameter_update(self,S_grad,lr,rho,wb,grad_parameters,batch_size):
        lr = 0.001
        rho = 0.9
        for i in range(len(S_grad)):
            S_grad[i] = rho * S_grad[i] + (1 - rho) * ((grad_parameters[i])**2)
            rt = (np.sqrt(S_grad[i])+1e-12)
            wb[i] = wb[i] - (lr * grad_parameters[i]/(rt))

            return wb

    #Evaluation function for Accuracy
    def evaluate(self,a3,y):
        y_actual = np.argmax(y,axis=1)
        y_pred = np.argmax(a3,axis=1)
        #correct_pred = 0
        #for i in range(y.shape[0]):
            #correct_pred += 1
            #return float((correct_pred/len(y))*100) 
        accuracy = np.mean(y_actual == y_pred)
        return accuracy * 100       

Problem #1.2 (10 points): Train your fully-connected neural network on the Fashion-MNIST dataset using 5-fold cross validation. Report accuracy on the folds, as well as on the test set.

In [2]:
# To simplify the usage of our dataset, we will be importing it from the Keras 
# library. Keras can be installed using pip: python -m pip install keras

# Original source for the dataset:
# https://github.com/zalandoresearch/fashion-mnist

# Reference to the Fashion-MNIST's Keras function: 
# https://keras.io/datasets/#fashion-mnist-database-of-fashion-articles

import keras.utils
from keras.datasets import fashion_mnist

import numpy as np

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

#print(x_train.shape)
#print(x_test.shape)

# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Using TensorFlow backend.


60000 train samples
10000 test samples


In [3]:
from sklearn.model_selection import KFold

#total X and Y data
X_total = np.concatenate((x_train,x_test),axis=0)
Y_total = np.concatenate((y_train,y_test),axis=0)

neurons = 512
epochs = 20
batch_size = 128
rho = 0.9
learning_rate = 0.001
neural_net = NeuralNetwork(neurons,epochs,learning_rate,rho,batch_size)
wb = neural_net.parameters(neurons,x_train,y_train)

#Referred to:https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
#https://machinelearningmastery.com/k-fold-cross-validation/
k_splits = KFold(n_splits = 5)#5-fold
fold_count = 1
for train,test in k_splits.split(X_total):
    print("\n","Train Index: ", train)
    print("Test Index: ", test)
    x, x_test = X_total[train], X_total[test]
    y, y_test = Y_total[train], Y_total[test]
    print("\n","Fold: ",fold_count)
    #print(x_test.shape)
    fold_count += 1#fold count increments for range set
    wb = neural_net.parameters(neurons,x,y)

    for epoch in range(epochs):
        itr = int(x.shape[0]/batch_size)
        S_grad = [0,0,0,0,0,0]
        grad_params = [0,0,0,0,0,0]
        print("Epoch no: ",epoch+1)
        
        loss_hist = []
        
        for i in range(itr):
        
            X=x[i*batch_size:(i+1)*batch_size,:]
            Y=y[i*batch_size:(i+1)*batch_size,:]
            X_test=x_test[i*batch_size:(i+1)*batch_size,:]
            Y_test=y_test[i*batch_size:(i+1)*batch_size,:]
            
            m3,grad_params= neural_net.fit(wb,X,Y)
            loss = neural_net.cross_entropy(Y, m3)
            loss_hist.append(loss)
            wb = neural_net.parameter_update(S_grad,learning_rate,rho,wb,grad_params,batch_size)
    
    #print(loss_hist)
    a1,a2,train_pred,k1,k2 = neural_net.feed_forward(x,wb)
    acc = neural_net.evaluate(train_pred,y)
    print("Accuracy of train data: ",acc) 
    
    b1,b2,test_pred,l1,l2 = neural_net.feed_forward(x_test,wb)
    Acc = neural_net.evaluate(test_pred,y_test)
    print("Accuracy of test data: ",Acc) 


 Train Index:  [14000 14001 14002 ... 69997 69998 69999]
Test Index:  [    0     1     2 ... 13997 13998 13999]

 Fold:  1
Epoch no:  1
Epoch no:  2
Epoch no:  3
Epoch no:  4
Epoch no:  5
Epoch no:  6
Epoch no:  7
Epoch no:  8
Epoch no:  9
Epoch no:  10
Epoch no:  11
Epoch no:  12
Epoch no:  13
Epoch no:  14
Epoch no:  15
Epoch no:  16
Epoch no:  17
Epoch no:  18
Epoch no:  19
Epoch no:  20
Accuracy of train data:  77.78571428571428
Accuracy of test data:  77.29285714285714

 Train Index:  [    0     1     2 ... 69997 69998 69999]
Test Index:  [14000 14001 14002 ... 27997 27998 27999]

 Fold:  2
Epoch no:  1
Epoch no:  2
Epoch no:  3
Epoch no:  4
Epoch no:  5
Epoch no:  6
Epoch no:  7
Epoch no:  8
Epoch no:  9
Epoch no:  10
Epoch no:  11
Epoch no:  12
Epoch no:  13
Epoch no:  14
Epoch no:  15
Epoch no:  16
Epoch no:  17
Epoch no:  18
Epoch no:  19
Epoch no:  20
Accuracy of train data:  77.5
Accuracy of test data:  77.07857142857144

 Train Index:  [    0     1     2 ... 69997 69998 69

Problem #2.1 (40 points): Implement a Convolutional Neural Network from scratch. Similarly to problem 1.1, we will be implementing the same architecture as the one shown in [Keras' CNN documentation](https://keras.io/examples/mnist_cnn/). That is:

- Input layer
- Convolutional hidden layer with 32 neurons, a kernel size of (3,3), and relu activation function
- Convolutional hidden layer with 64 neurons, a kernel size of (3,3), and relu activation function
- Maxpooling with a pool size of (2,2)
- Dropout with a value of 0.25
- Flatten layer
- Dense hidden layer, with 128 neurons, and relu activation function
- Dropout with a value of 0.5
- Output layer, using softmax as the activation function

Our loss function is categorical crossentropy and the evaluation will be done using accuracy, as in Problem 1.1. However, we will not be using the gradient optimizer known as Adadelta.

In [4]:
class ConvolutionalNeuralNetwork(object):
  def __init__(epochs, learning_rate):
    pass
  
  def fit(self):
    pass
  
  def evaluate(self):
    pass

Problem #2.2 (10 points): Train your convolutional neural network on the Fashion-MNIST dataset using 5-fold cross validation. Report accuracy on the folds, as well as on the test set.

In [5]:
import keras
from keras.datasets import fashion_mnist

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

60000 train samples
10000 test samples
