## Requirements 
* (done) Apply **mini-batch gradient descent** with appropriate batch size
* (done) Use appropriate **learning rate** (can be adaptive per epoch)
* (do per layer) Apply **dropout** - find appropriate dropout rate at each layer
* (done - try more) Initialize random **weights** properly before training
* Do basic image **augmentation** of training data using Keras
* (todo -- currently only 2 layers after input) Use **3 or more layers** with appropriate **number of neurons** per layer
* (pre-done) Use **relu activation layer** in the right places (given in example code)
* (done) **Normalize and scale** the input before training with Keras
* Include **metrics**: testing, training accuracy and a confusion matrix
* Display top common errors

In [314]:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline

np.random.seed(2)

In [315]:
from keras.datasets import mnist
from keras.utils import np_utils

In [316]:
# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [317]:
x_train.shape

(60000, 28, 28)

In [318]:
x_test.shape

(10000, 28, 28)

The training data is 1 3D array of (digit_label, pixel width, pixel height). For our MLP to run gradient descent, the width and height must be converted into a vector of 784 pixels.

This is accomplished using numpy's reshape() function.

In [319]:
num_pixels = x_train.shape[1] * x_train.shape[2]
# Reshape (examples, width, height) --> (examples, width*height)
x_train = x_train.reshape((x_train.shape[0], num_pixels)).astype('float32')
x_test = x_test.reshape((x_test.shape[0], num_pixels))

x_train = x_train[0:1000]
y_train = y_train[0:1000]

In [320]:
# Normalize pixel values from 0-255 to 0-1
x_train = x_train / 255
x_test = x_test / 255

In [321]:
# todo: x_train augmentation

One hot encoding essentially transforms the categorical values into a matrix where their existence or absence is marked by 1 or 0, respectively.

In [322]:
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

In [323]:
def relu(x):
  return (x >= 0) * x

def relu2deriv(output):
  return output >= 0

In [328]:
'''
Runs a 3 layer mini batch gradient descent 

@ params: images
@ retval: void
''' 
def train(images, labels, test_images, test_labels):
    batch_size = 100
    # Select learning rate and number of iterations
    alpha, iterations = (0.001, 250)
    
    # MNIST dataset specific settings and hidden layer neuron size
    pixels_per_image, num_labels, hidden_size = (784, 10, 100)

    # Weight initialization for various layers -- # of neurons based on the tuple passed to np.random
    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    #weights_1_2 = 0.2 * np.random.random((hidden_size, hidden_size/2))
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1
    
    for j in range(iterations):
        error, correct_cnt = (0.0, 0)
      
        for i in range(int(len(images) / batch_size)):
            batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))

            layer_0 = images[batch_start:batch_end]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            dropout_mask = np.random.randint(2, size=layer_1.shape)
            layer_1 *= dropout_mask * 2
            layer_2 = np.dot(layer_1, weights_1_2)

            error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
            for k in range(batch_size):
                correct_cnt += int(np.argmax(layer_2[k:k+1])) == np.argmax(labels[batch_start+k:batch_start+k+1])

                layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
                layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
                layer_1_delta *= dropout_mask

                weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
                weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

        '''sys.stdout.write("\r" + \
                      " I:" + str(j) +\
                      " Error:" + str(error/float(len(images)))[0:5] +\
                      " Correct:" + str(correct_cnt/float(len(images))))'''
        if(j%10 == 0):
            test_error = 0.0
            test_correct_cnt = 0

            for i in range(len(test_images)):
                layer_0 = test_images[i:i+1]
                layer_1 = relu(np.dot(layer_0,weights_0_1))
                layer_2 = np.dot(layer_1, weights_1_2)

                test_error += np.sum((test_labels[i:i+1] - layer_2) ** 2)
                test_correct_cnt += int(np.argmax(layer_2) == \
                                         np.argmax(test_labels[i:i+1]))

            sys.stdout.write("\n" + \
            "I:" + str(j) + \
            " Test-Err:" + str(test_error/ float(len(test_images)))[0:5] +\
            " Test-Acc:" + str(test_correct_cnt/ float(len(test_images)))+\
            " Train-Err:" + str(error/ float(len(images)))[0:5] +\
            " Train-Acc:" + str(correct_cnt/ float(len(images))))


In [329]:
train(x_train, y_train, x_test, y_test)


I:0 Test-Err:0.809 Test-Acc:0.4125 Train-Err:1.263 Train-Acc:0.148
I:10 Test-Err:0.575 Test-Acc:0.7147 Train-Err:0.626 Train-Acc:0.643
I:20 Test-Err:0.516 Test-Acc:0.7521 Train-Err:0.547 Train-Acc:0.73
I:30 Test-Err:0.482 Test-Acc:0.777 Train-Err:0.511 Train-Acc:0.743
I:40 Test-Err:0.464 Test-Acc:0.7924 Train-Err:0.483 Train-Acc:0.753
I:50 Test-Err:0.452 Test-Acc:0.7925 Train-Err:0.458 Train-Acc:0.782
I:60 Test-Err:0.449 Test-Acc:0.7912 Train-Err:0.467 Train-Acc:0.782
I:70 Test-Err:0.445 Test-Acc:0.7974 Train-Err:0.451 Train-Acc:0.799
I:80 Test-Err:0.442 Test-Acc:0.7878 Train-Err:0.449 Train-Acc:0.807
I:90 Test-Err:0.446 Test-Acc:0.7849 Train-Err:0.436 Train-Acc:0.795
I:100 Test-Err:0.442 Test-Acc:0.7883 Train-Err:0.440 Train-Acc:0.793
I:110 Test-Err:0.447 Test-Acc:0.789 Train-Err:0.434 Train-Acc:0.811
I:120 Test-Err:0.445 Test-Acc:0.7877 Train-Err:0.428 Train-Acc:0.827
I:130 Test-Err:0.443 Test-Acc:0.791 Train-Err:0.434 Train-Acc:0.826
I:140 Test-Err:0.444 Test-Acc:0.7943 Train-Err:0