<a href="https://colab.research.google.com/github/poornaprag/deep-learning/blob/master/Dig_MNSIT_From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Graded Assignment 1  
Implementing MNIST Digit Classifier From Scratch  
Deep Learning  
Spring 2020  
Author: Poornapragna Vadiraj  



MNIST Classification has been done many times before but what makes this assignment challening is that it should be done from scratch without using higher level libraries or frameworks (like Keras, TF)

I will enumerate the high level requirements for the assignment:

1. The notebook needs to have a mini batch gradient descent along with an acceptable learning rate.

2. The application should perform drop out -- try multiple dropout levels and select the one that fits best. 

3. The code should correctly configure the random weights of the network.

4. The code should allow simple image improvements to supplement the training data 

5. The code should use 3 or more layers for training.

6. The application must continue to use the relu activation layer in the right place, such as python application

In [1]:
# Load all the dependencies
import numpy as np
import keras
# Please note that I am only using Keras for importing the initial dataset only
# Its functions/classes have not been used for the actual classification
import matplotlib.pyplot as plt
from tqdm import trange
from keras.preprocessing.image import ImageDataGenerator


Using TensorFlow backend.


In [0]:
# Python class to emulate a single neural network layer
class NN_Layer:
  def __init__(self):
    self.wgts = np.zeros(shape=(input.shape[1], 10))
    bs = np.zeros(shape=(10,))

  def forward(self, ip):
        op = np.matmul(ip, self.wgts) + bs
        return op

In [0]:
# ReLU dense layer with learning rate inputs
class ReLuLayer(NN_Layer):
    def __init__(self):
        pass

    def backward(self, ip, grad_op):
        relu_g = ip > 0
        return grad_op*relu_g 
    
    def forward(self, ip):
        return np.maximum(0,ip)

class DenseLayer(NN_Layer):
    def __init__(self, ips, ops, learning_rate=0.1):
        self.wgts = np.random.randn(ips, ops)*0.01  # Normal dist randomization
        self.bs = np.zeros(ops)
        self.learning_rate = learning_rate

    def forward(self,input):
        return np.matmul(input, self.wgts) + self.bs

    def backward(self,ip,grad_op):
        grad_ip = np.dot(grad_op,np.transpose(self.wgts))
        grad_wgts = np.transpose(np.dot(np.transpose(grad_op),ip))
        grad_bs = np.sum(grad_op, axis = 0)
        
        # SGD
        self.bs = self.bs - self.learning_rate * grad_bs
        self.wgts = self.wgts - self.learning_rate * grad_wgts
        return grad_ip

In [0]:
# Loss functions
def softcentropy(lgs,ref_ans):
    log_ans = lgs[np.arange(len(lgs)),ref_ans]
    finalEnt = - log_ans + np.log(np.sum(np.exp(lgs),axis=-1))
    return finalEnt

In [0]:
def gradient_version_soft(lgs,ref_ans):
    oneans = np.zeros_like(lgs)
    oneans[np.arange(len(lgs)),ref_ans] = 1 
    smax = np.exp(lgs) / np.exp(lgs).sum(axis=-1,keepdims=True) 
    return (- oneans + smax) / lgs.shape[0]

In [0]:
def dataLoad(flatten=False):
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data() # This is where the Keras dependency is used
    # Normalizing X 
    x_train = x_train.astype(float) / 255.
    x_test = x_test.astype(float) / 255.
    # Validation dataset
    x_train, x_val = x_train[:-10000], x_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]
    if flatten:
        x_train = x_train.reshape([x_train.shape[0], -1])
        x_val = x_val.reshape([x_val.shape[0], -1])
        x_test = x_test.reshape([x_test.shape[0], -1])
    return x_train, y_train, x_val, y_val, x_test, y_test

In [0]:
x_train, y_train, x_val, y_val, x_test, y_test = dataLoad(flatten=True)    

Data Augmentation using Keras

In [8]:
(X_Train, Y_train), (X_test, Y_test) = keras.datasets.cifar10.load_data()
ImageDataGenerator(featurewise_std_normalization=True,rotation_range=20,horizontal_flip=True).fit(X_Train)



Creating an empty list to represent the neural network, and appending layers with different learning rates

In [0]:
neuralnet = []

In [0]:
# Input
neuralnet.append(DenseLayer(x_train.shape[1],100))
neuralnet.append(ReLuLayer())

In [0]:
# Hidden
neuralnet.append(DenseLayer(100,200,learning_rate=0.2))
neuralnet.append(ReLuLayer())

In [0]:
# Hidden
neuralnet.append(DenseLayer(200,200,learning_rate=0.1))
neuralnet.append(ReLuLayer())

In [0]:
# Output
neuralnet.append(DenseLayer(200,10))

In [0]:
# Passing through the neural network we generated previously as input and iterating over all layers and calling forward feature on each layer.
def forward(neuralnet, x):
    ip = x
    acts = []
    for i in range(len(neuralnet)):
        acts.append(neuralnet[i].forward(x))
        x = neuralnet[i].forward(x)
    return acts

In [0]:
def predict(neuralnet,x):
    lgs = forward(neuralnet,x)[-1]
    return lgs.argmax(axis=-1)

Batchwise training

In [0]:
def train(nn,x,y):
    # The gradient and loss before training
    layer_acts = forward(nn,x)
    lgs = layer_acts[-1]
    
    loss = softcentropy(lgs,y)
    l_grad = gradient_version_soft(lgs,y)
    for i in range(1, len(nn)):
        l_grad = nn[len(nn) - i].backward(layer_acts[len(nn) - i - 1], l_grad)
    return np.mean(loss)

Mini Batch Gradient Descent

In [0]:
def split_into_batches(ips, goals, sizeBatch, randomize_each_time=False):
    if randomize_each_time:
        indexes = np.random.permutation(len(ips))
    for first_index in trange(0, len(ips) - sizeBatch + 1, sizeBatch):
        if randomize_each_time:
            ext = indexes[first_index:first_index + sizeBatch]
        else:
            ext = slice(first_index, first_index + sizeBatch)
        yield ips[ext], goals[ext]

Final training


In [18]:
training_data_logger = []
validation_data_logger = []
# For each cycle, we split the training data into batches (mini batch) and use that for further processing
for iteration in range(5):
    for batchX,batchY in split_into_batches(x_train,y_train,sizeBatch=32,randomize_each_time=True):
        train(neuralnet,batchX,batchY)
    # Log training and validation results to the log lists
    training_data_logger.append(np.mean(predict(neuralnet,x_train)==y_train))
    validation_data_logger.append(np.mean(predict(neuralnet,x_val)==y_val))
    
    print("\n Round",iteration)
    print("\n Train accuracy:",training_data_logger[-1])
    print("\n Validation accuracy:",validation_data_logger[-1])

100%|██████████| 1562/1562 [00:04<00:00, 341.94it/s]
  2%|▏         | 33/1562 [00:00<00:04, 311.84it/s]


 Round 0

 Train accuracy: 0.11356

 Validation accuracy: 0.1064


100%|██████████| 1562/1562 [00:04<00:00, 344.88it/s]
  2%|▏         | 35/1562 [00:00<00:04, 346.32it/s]


 Round 1

 Train accuracy: 0.4613

 Validation accuracy: 0.4845


100%|██████████| 1562/1562 [00:04<00:00, 345.31it/s]
  2%|▏         | 34/1562 [00:00<00:04, 333.92it/s]


 Round 2

 Train accuracy: 0.737

 Validation accuracy: 0.7584


100%|██████████| 1562/1562 [00:04<00:00, 342.57it/s]
  2%|▏         | 37/1562 [00:00<00:04, 368.50it/s]


 Round 3

 Train accuracy: 0.79716

 Validation accuracy: 0.8114


100%|██████████| 1562/1562 [00:04<00:00, 347.80it/s]



 Round 4

 Train accuracy: 0.84438

 Validation accuracy: 0.8588


**Results**  
On average getting about 90% accuracy after 5 training sessions.
Further training does not seem to improve accuracy.

In [0]:
y_pred = predict(neuralnet,x_test)

Let's look at the confusion matrix

In [20]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test,y_pred)

array([[ 912,    0,    4,    6,    2,   24,   21,    4,    5,    2],
       [   0, 1114,    2,    5,    2,    2,    2,    2,    6,    0],
       [  13,   16,  799,   69,   22,    8,   25,   16,   57,    7],
       [   2,    6,   20,  887,    0,   47,    4,   22,   20,    2],
       [   1,   10,    9,    2,  877,    3,   15,    7,    5,   53],
       [  31,    5,   11,   77,    5,  675,   22,   20,   39,    7],
       [  19,    6,    9,    2,   20,   31,  865,    2,    4,    0],
       [   4,   24,   15,   14,   17,    4,    2,  911,    4,   33],
       [  10,   20,   14,   48,   18,   67,    9,   10,  757,   21],
       [  11,   11,    6,   16,  127,   13,    1,   60,   24,  740]])

Let us compute the accuracy

In [21]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.8537

MNIST 