#### Neural Network Model Results

My best neural network model achieved 1.58% test error. Here are the details of how this model was trained.

784-1024-1024-1024-10 Logistic-Softmax Model
* Minibatch Stochastic Training - The training dataset is split into batches of 100, and the neural network is trained on one batch at a time. This speeds up training significantly.
* Learning rate - 0.28%
* Momentum - 95% of the change in synapse from the previous step is added to the change for the next step. This adds another hyperparameter to be tuned.
* Max-Norm Regularization - With a softmax activation on the output, the synapse values can grow large enough that inf values appear in the output. To prevent this, the magnitude of any output node's synapse is restricted to 10. This should not affect the actual output, since softmax normalizes the output by nature.
* Dropout - A method that, in theory, prevents overfitting on the training set by randomly turning off input and hidden nodes during forward propagation and backpropagation. When training this network, 50% of hidden nodes and 20% of input pixels are dropped at a time.
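
The techniques listed above can be sketched as four small NumPy helpers. This is a minimal illustration, not the code in `classifier.py`: the function names, the fixed random seed, and the rescaling of surviving units in `dropout` (so-called inverted dropout) are assumptions made for the example. The batch size of 100, momentum of 0.95, and norm limit of 10 come from the list.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility

def minibatches(X, y, batch_size=100):
    """Yield shuffled (inputs, targets) batches of at most batch_size rows."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

def dropout(activations, drop_prob):
    """Zero each unit with probability drop_prob and rescale the survivors.

    The rescaling (inverted dropout) is an assumption of this sketch.
    """
    mask = rng.random_sample(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)

def momentum_step(weights, velocity, gradient, learning_rate, momentum=0.95):
    """Carry `momentum` times the previous change into the new update."""
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity

def max_norm(weights, limit=10.0):
    """Shrink each column (one node's incoming weights) to an L2 norm <= limit."""
    norms = np.linalg.norm(weights, axis=0, keepdims=True)
    scale = np.minimum(1.0, limit / np.maximum(norms, 1e-12))
    return weights * scale
```

In a training loop, each batch from `minibatches` would pass through `dropout` on the input (`drop_prob=0.2`) and hidden layers (`drop_prob=0.5`), and every weight update would go through `momentum_step` followed by `max_norm` on the output layer.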

The following link is the classifier object in the final application, where a neural network model can be trained/used.
Link to [Classifier](classifier.py)

The rest of the notebook is the testing of a neural network model.

In [1]:
import pandas as pd
import numpy as np
import random
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

filepath = "ClassifierData/"

# softmax function, applied row-wise to a batch of scores
def softmax(x, deriv=False):
    if deriv:
        # Returns a constant 1, so the output-layer error is passed back unscaled
        return 1
    exp_scores = np.exp(x)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# sigmoid/logistic function
def sigmoid(x, deriv=False):
    if deriv:
        # Here x is expected to already be a sigmoid output s, giving s*(1-s)
        return x*(1-x)
    return 1/(1+np.exp(-x))

# Load the labeled Kaggle training data from the filesystem
data = pd.read_csv("Kaggle Competition MINST train.csv")
target = data['label']
data = data.drop('label', axis=1)
# Scale pixel values from [0, 255] to [0, 1]
data = data.div(255)

# Hold out 25% of the labeled data to measure testing error
# (same split and random_state as used during training)
train_data, test_data, train_target, test_target = train_test_split(
    data, target, test_size=0.25, random_state=0)
num_attributes = len(data.columns)
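
The `softmax` above exponentiates the raw scores directly, and `np.exp` overflows to inf for large float64 inputs, which is the failure mode the max-norm bullet describes. A common alternative, shown here only as a sketch, is to subtract each row's maximum before exponentiating; the result is mathematically unchanged because softmax is invariant to adding a constant to every score in a row. The name `stable_softmax` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def stable_softmax(x):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    shifted = x - np.max(x, axis=1, keepdims=True)  # largest entry per row becomes 0
    exp_scores = np.exp(shifted)                    # every value is now in (0, 1]
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Scores this large would overflow np.exp if used directly
scores = np.array([[1000.0, 1001.0, 1002.0],
                   [1.0, 2.0, 3.0]])
probs = stable_softmax(scores)
```

Both rows of `scores` differ only by a constant offset, so they produce the same probabilities.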

In [2]:
# Neural network architecture parameters
hidden_layer_funct = sigmoid
output_layer_funct = softmax

input_layer_size = num_attributes
num_hidden_layers = 3
hidden_layer_size = 1024
output_layer_size = 10

# Load the previously saved synapses (weights) and biases
synapse = []
biases = []
name_prefix = filepath + "%d-Layer %s-%s %d-%d-%d nodes" % (
    num_hidden_layers, hidden_layer_funct.__name__, output_layer_funct.__name__,
    input_layer_size, hidden_layer_size, output_layer_size)
for index in range(num_hidden_layers + 1):
    syn_filename = "%s syn%d.csv" % (name_prefix, index)
    bias_filename = "%s bias%d.csv" % (name_prefix, index)
    synapse.append(np.array(pd.read_csv(syn_filename).drop("Unnamed: 0", axis=1)))
    biases.append(np.array(pd.read_csv(bias_filename).drop("Unnamed: 0", axis=1)))

# Forward-propagate a batch and return the predicted digit for each row
def pred(inputs):
    current_layer = np.asarray(inputs)
    for layer in range(num_hidden_layers):
        current_layer = hidden_layer_funct(np.dot(current_layer, synapse[layer])
                                            + biases[layer].T)

    output = output_layer_funct(np.dot(current_layer, synapse[num_hidden_layers])
                                + biases[num_hidden_layers].T)

    # The predicted class is the index of the largest softmax output in each row
    return output.argmax(axis=1)
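
To make the array shapes in `pred` easier to follow, here is a toy version of the same forward pass on random weights. It is a sketch under assumptions: the layer sizes are shrunk to 4-5-5-3 (two hidden layers instead of three), the biases are taken to be column vectors of shape `(n, 1)` as the `.T` in `pred` suggests, and all `toy_*` names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
layer_sizes = [4, 5, 5, 3]  # toy stand-in for 784-1024-1024-1024-10

# One weight matrix of shape (n_in, n_out) and one bias column (n_out, 1) per layer
toy_synapse = [rng.randn(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
toy_biases = [rng.randn(b, 1) for b in layer_sizes[1:]]

def toy_pred(inputs):
    layer = inputs
    # Hidden layers: (batch, n_in) dot (n_in, n_out) plus a broadcast (1, n_out) bias
    for w, b in zip(toy_synapse[:-1], toy_biases[:-1]):
        layer = sigmoid(layer.dot(w) + b.T)
    # Output layer: softmax probabilities, then the most likely class per row
    probs = softmax(layer.dot(toy_synapse[-1]) + toy_biases[-1].T)
    return probs.argmax(axis=1)

batch = rng.rand(6, 4)    # six fake samples with four features each
labels = toy_pred(batch)  # one predicted class index per sample
```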

In [3]:
# Evaluate the loaded model on the training and held-out test splits
prediction = pred(train_data)
train_error = 1 - accuracy_score(train_target, prediction)
prediction = pred(test_data)
test_error = 1 - accuracy_score(test_target, prediction)

print("Training Error:  %s" % train_error)
print("Testing Error:  %s" % test_error)

Training Error:  0.0
Testing Error:  0.0158095238095


#### Sources of Inspiration

* https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf - Tuning hyperparameters and structure of the final network, dropout tips
* https://www.tensorflow.org/versions/master/tutorials/mnist/pros/index.html#deep-mnist-for-experts - 10 nodes as output rather than 1
* http://iamtrask.github.io/2015/07/28/dropout/ - Implementing dropout in the network
* http://iamtrask.github.io/2015/07/12/basic-python-network/ - Original structure of the network
* https://en.wikipedia.org/ - Regarding sigmoid/softmax/tanh/rectifier functions