# Assignment #3
## P556: Applied Machine Learning

More often than not, we will use a deep learning library (Tensorflow, Pytorch, or the wrapper known as Keras) to implement our models. However, the abstraction afforded by those libraries can make it hard to troubleshoot issues if we don't understand what is going on under the hood. In this assignment you will implement a fully-connected and a convolutional neural network from scratch. To simplify the implementation, we are asking you to implement static architectures, but you are free to support variable number of layers/neurons/activations/optimizers/etc. We recommend that you make use of private methods so you can easily troubleshoot small parts of your model as you develop them, instead of trying to figure out which parts are not working correctly after implementing everything. Also, keep in mind that there is code from your fully-connected neural network that can be re-used on the CNN. 

Problem #1.1 (40 points): Implement a fully-connected neural network from scratch. The neural network will have the following architecture:

- Input layer
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Dense hidden layer with 512 neurons, using relu as the activation function
- Dropout with a value of 0.2
- Output layer, using softmax as the activation function

The model will use categorical crossentropy as its loss function. 
We will optimize the gradient descent using RMSProp, with a learning rate of 0.001 and a rho value of 0.9.
We will evaluate the model using accuracy.

Why this architecture? We are trying to reproduce from scratch the following [example from the Keras documentation](https://keras.io/examples/mnist_mlp/). This means that you can compare your results by running the Keras code provided above to see if you are on the right track.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import cross_val_score
import copy 

In [3]:
np.seterr(divide='ignore', invalid='ignore')

{'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}

In [4]:
def relu(x):
    return np.maximum(0,x)

In [5]:
def softmax(x):
    expo = []
    for val in x:
        v = np.exp(val - np.max(val))
        expo.append(v / v.sum())
    return np.array(expo)

In [6]:
def cross_entropy(y_preds, y_train):
    
    y_preds = softmax(y_preds)
    
    # number of rows
    len_out = y_preds.shape[0]
    
    # find actuals
    y_train = y_train.astype(int)
    y_train = np.argmax(y_train, axis = 1)
    
    # find the minimum non zero value
    m = np.min(y_preds[np.nonzero(y_preds)])
    
    # replace all zeros with min non zero val
    y_preds[y_preds == 0] = m
    
    nll = -np.log(y_preds[range(len_out), y_train])
    nll = np.mean(nll)
    return nll

In [7]:
 def accuracy(y_preds, y_train):
    y_train = np.argmax(y_train, axis = 1)
    y_preds = np.argmax(y_preds, axis = 1)
    return (np.sum(np.equal(y_preds, y_train)) / len(y_train)) * 100

In [8]:
def calc_gradients(nn, x_train, y_train):
    y_train = np.argmax(y_train, axis = 1)
    m = y_train.shape[0]
    grad = softmax(nn.layer3)
    grad[range(m),y_train] -= 1
    grad = grad/m


    dout = grad
    dw3 = np.dot(dout.T, nn.relu2)
    db3 = np.sum(dout) / m
    d_relu22 = dout.dot(nn.w3.T)
    d_relu2 = d_relu22.copy()
    d_relu2[d_relu2 < 0] = 0

    dw2 = nn.u2.T * (np.dot(d_relu22.T, nn.relu1))
    db2 = np.sum(d_relu22) / m
    d_relu11 = d_relu2.dot(nn.w2)
    d_relu1 = d_relu11.copy()
    d_relu1[d_relu1 < 0] = 0
    
    dw1 = nn.u1.T * (np.dot(d_relu11.T, x_train))
    db1 = np.sum(d_relu11) / m
    return [dw1, dw2, dw3, db1, db2, db3]

In [9]:
def update_weights(model, gr, lr):
    model.w1 -= gr[0].T * lr
    model.w2 -= gr[1].T * lr
    model.w3 -= gr[2].T * lr

In [10]:
class NeuralNetwork(object):
    def __init__(self, epochs, learning_rate):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.n_in = 784
        self.n_h = 512
        self.n_out = 10
        self.p = 0.8
        self.w1 = np.random.randn(self.n_in,self.n_h)
        self.b1 = np.zeros(self.n_h)
        self.w2 = np.random.randn(self.n_h,self.n_h)
        self.b2 = np.zeros(self.n_h)
        self.w3 = np.random.randn(self.n_h,self.n_out)
        self.b3 = np.zeros(self.n_out)
        self.u1 = np.random.binomial(1, self.p, size=self.w1.shape)
        self.u2 = np.random.binomial(1, self.p, size=self.w2.shape)
    
    def fit(self, x):
        self.layer1 = np.dot(x, self.u1 * self.w1) + self.b1
        self.relu1 = relu(self.layer1)
        self.layer2 = np.dot(self.relu1, self.u2 * self.w2) + self.b2
        self.relu2 = relu(self.layer2)
        self.layer3 = np.dot(self.relu2,self.w3) + self.b3
        return self.layer3
        
    def evaluate(self):
        pass

In [11]:
def rms(model,rho,g):
    grad_squared1 = 0
    grad_squared2 = 0
    grad_squared3 = 0
    grad_bias1 = 0
    grad_bias2 = 0
    grad_bias3 = 0
    grad_squared1 = rho * grad_squared1 + (1-rho) * g[0] * g[0]
    grad_bias1 = rho * grad_bias1 + (1-rho) * g[3] * g[3]
    eps=1e-4
    sq1 = np.sqrt(grad_squared1)
    sq_1 = np.sqrt(grad_bias1)
  
    s1 = np.min(sq1[np.nonzero(sq1)])
    sq1[sq1 == 0] = s1
    temp1 = (model.learning_rate / sq1 + eps ) * g[0]
    temp_1 = (model.learning_rate / sq_1 + eps) * g[3]
    model.w1 -= temp1.T
    model.b1 -= temp_1.T
    
    grad_squared2 = rho * grad_squared2 + (1-rho) * g[1] * g[1]
    grad_bias2 = rho * grad_bias2 + (1-rho) * g[4] * g[4]
    sq2 = np.sqrt(grad_squared2)
    sq_2 = np.sqrt(grad_bias2)
    s2 = np.min(sq2[np.nonzero(sq2)])
    sq2[sq2 == 0] = s2
    temp2 = (model.learning_rate / sq2 + eps) * g[1]
    temp_2 = (model.learning_rate / sq_2 + eps) * g[4]
    model.w2 -= temp2.T
    model.b2 -= temp_2.T
    
    grad_squared3 = rho * grad_squared3 + (1-rho) * g[2] * g[2]
    grad_bias3 = rho * grad_bias3 + (1-rho) * g[5] * g[5]
    sq3 = np.sqrt(grad_squared3)
    sq_3 = np.sqrt(grad_bias3)
    s3 = np.min(sq3[np.nonzero(sq3)])
    sq3[sq3 == 0] = s3
    temp3 = (model.learning_rate / sq3 + eps) * g[2]
    temp_3 = (model.learning_rate / sq_3 + eps) * g[5]
    model.w3 -= temp3.T
    model.b3 -= temp_3.T


In [12]:
def save_model(model):
    params = {}
    params['w1'] = model.w1
    params['w2'] = model.w2
    params['w3'] = model.w3
    params['b1'] = model.b1
    params['b2'] = model.b2
    params['b3'] = model.b3
    return params
    

Problem #1.2 (10 points): Train your fully-connected neural network on the Fashion-MNIST dataset using 5-fold cross validation. Report accuracy on the folds, as well as on the test set.

In [15]:
# To simplify the usage of our dataset, we will be importing it from the Keras 
# library. Keras can be installed using pip: python -m pip install keras

# Original source for the dataset:
# https://github.com/zalandoresearch/fashion-mnist

# Reference to the Fashion-MNIST's Keras function: 
# https://keras.io/datasets/#fashion-mnist-database-of-fashion-articles

import keras
from keras.datasets import fashion_mnist

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float64')
x_test = x_test.astype('float64')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Using TensorFlow backend.


60000 train samples
10000 test samples


In [18]:
bs = 128
n = 60000
def train(x_train, y_train):
    model = NeuralNetwork(10,0.001)
    loss = []
    acc = []
    m = 0
    for i in range(50):
        preds = model.fit(x_train)

        # calculate the loss
        loss.append(cross_entropy(preds, y_train))
        acc.append(accuracy(preds, y_train))
#         print("Loss at iteration ", i,"is", loss[-1])
#         print("Accuracy ", acc[-1])
        if acc[-1] > m:
            m = acc[-1]
            p = save_model(model)
            md = copy.deepcopy(model)
            
#             print("Better model found at epoch", i , "with accuracy ", acc[-1])

        # calculate the gradients
        g = calc_gradients(model, x_train, y_train)

        # update weights
        rms(model,0.9,g)
#     print("Finished training for fold")
#     test_preds = model.fit(x_test)
#     print("Test loss: ",cross_entropy(test_preds, y_test))
#     print(accuracy(test_preds, y_test))
#     test_preds = model()
    return p, m, md


In [19]:
def k_fold(folds = 5):
    num_folds = 5
    saved = {}
    subset_size = int(len(x_train)/num_folds)
#     print(subset_size)
    for i in range(num_folds):
        start_idx = i*subset_size
        end_idx = start_idx + subset_size
        test_idx = np.array([i for i in range(start_idx, end_idx)])
#         print(test_idx[:5])
        train_idx = np.array([i for i in range(len(x_train)) if i not in test_idx])
        bm, ba, b_model = train(x_train[train_idx], y_train[train_idx])
        saved[i] = [bm,ba]
        print("Best training accuracy for fold ", i+1, "is" ,ba)
        print("Finished training for fold", i+1)
        
        
        test_preds = b_model.fit(x_train[test_idx])
        print("Fold Test loss: ",cross_entropy(test_preds, y_train[test_idx]))
        print("Fold Test accuracy: ", accuracy(test_preds, y_train[test_idx]))
        
        test_preds1 = b_model.fit(x_test)
        print("Actual Test data loss: ",cross_entropy(test_preds1, y_test))
        print("Actual Test data accuracy: ",accuracy(test_preds1, y_test))
        
        print("=" * 40)

In [20]:
k_fold()

Best training accuracy for fold  1 is 50.15625
Finished training for fold 1
Fold Test loss:  304.26719673638655
Fold Test accuracy:  50.375
Actual Test data loss:  312.958774858521
Actual Test data accuracy:  49.57
Best training accuracy for fold  2 is 51.35833333333333
Finished training for fold 2
Fold Test loss:  304.0014301127699
Fold Test accuracy:  50.83333333333333
Actual Test data loss:  303.0186569970457
Actual Test data accuracy:  51.129999999999995
Best training accuracy for fold  3 is 55.2875
Finished training for fold 3
Fold Test loss:  289.02220027450284
Fold Test accuracy:  55.41666666666667
Actual Test data loss:  294.73723514939365
Actual Test data accuracy:  54.900000000000006
Best training accuracy for fold  4 is 51.260416666666664
Finished training for fold 4
Fold Test loss:  300.39982127812044
Fold Test accuracy:  51.366666666666674
Actual Test data loss:  305.24779693774093
Actual Test data accuracy:  50.760000000000005
Best training accuracy for fold  5 is 46.1479