# Playing with MNIST's Digits dataset: Neural Networks

#### This notebook trains a simple neural network to classify digits in the MNIST digits dataset. The predictions were submitted to Kaggle: with 10 epochs, the model's predictions had an accuracy of 0.96671; with 100 epochs, they had an accuracy of 0.97386.

In [2]:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

Using Theano backend.


Couldn't import dot_parser, loading of dot files will not be possible.


Since I already know what the data looks like, I'll go ahead and skip the header row. In train.csv the first column is the response variable, while the remaining columns are pixel values corresponding to 28x28 pixel images of handwritten digits.

In [3]:
train = np.genfromtxt("train.csv", delimiter=',', skip_header=1)
test = np.genfromtxt("test.csv", delimiter=',', skip_header=1)
# Pull out the target from the train set
target = train[:,0]
train = train[:,1:]
# One hot encode the target
target = np_utils.to_categorical(target)
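
For readers unfamiliar with one-hot encoding, here is a minimal NumPy sketch of the idea. The helper `one_hot` is a hypothetical name for illustration and is not the Keras implementation; it assumes the labels are integers from 0 to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Return a (n_samples, num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Assumption: classes are numbered 0..max(labels)
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.shape[0], num_classes))
    # Set column `label` to 1 in each row
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example.shape)
print(example[0])
```

Each row has exactly one nonzero entry, which is why the encoded target for the 42000 training digits has 10 columns.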

In [4]:
print(train.shape)
print(test.shape)
print(target.shape)

(42000, 784)
(28000, 784)
(42000, 10)


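Standardize the pixel values. The response column was already split off above, so the scaler is fit on the training pixels only and then applied to both the train and test sets.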

In [5]:
scaler = preprocessing.StandardScaler().fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
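
As a rough illustration of what standardization does, the sketch below subtracts each column's training mean and divides by its training standard deviation. The function names and toy data are hypothetical. Leaving constant columns unscaled (such as always-blank border pixels) is an assumption made here to avoid division by zero.

```python
import numpy as np

def fit_standardizer(X):
    """Compute per-column mean and standard deviation from the training data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: constant columns are left unscaled to avoid dividing by zero
    std[std == 0] = 1.0
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any data set using statistics learned from the training data."""
    return (X - mean) / std

rng = np.random.RandomState(0)
toy_train = rng.randint(0, 256, size=(100, 4)).astype(float)
toy_train[:, 0] = 0.0  # a constant column, like a border pixel
toy_test = rng.randint(0, 256, size=(20, 4)).astype(float)

mean, std = fit_standardizer(toy_train)
toy_train_scaled = apply_standardizer(toy_train, mean, std)
toy_test_scaled = apply_standardizer(toy_test, mean, std)
```

The test data is scaled with the training statistics rather than its own, which mirrors fitting the scaler on `train` and reusing it for `test`.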

Keras is a great library for building a neural network classifier. The code below adds layers to the classifier: Dense is a standard fully connected layer, Activation specifies the activation applied to that layer's output, and Dropout randomly sets a fraction of the layer's outputs to zero during training to reduce overfitting. This model uses only one hidden layer and an output layer.

Choosing the right activations and optimization method was mostly trial and error. Luckily at 10 epochs this runs fairly quickly, and it was interesting to see how the accuracy on the training set changed between models (for example, one model had accuracies as low as 8% after the 10th epoch, whereas this current one reaches nearly 99%).

In [6]:
model = Sequential()
# 784 is a lot of inputs. Bring it down to 256
model.add(Dense(256, input_dim=train.shape[1]))
# Use rectifier activation
model.add(Activation('relu'))
# Drop 10% of neurons to prevent overfitting
model.add(Dropout(0.1))
# Output layer with softmax activation
model.add(Dense(target.shape[1]))
model.add(Activation('softmax'))
# Compile the model with mean squared error loss and the RMSprop optimizer
model.compile(loss='mse', optimizer='rmsprop')
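
To make the layer stack above more concrete, here is a minimal NumPy sketch of a forward pass through a network of the same shape (784 inputs, 256 hidden units, 10 outputs). The weights are random placeholders, not the trained model. The dropout shown is the common "inverted dropout" formulation, which is an assumption and may differ in detail from what Keras does internally.

```python
import numpy as np

def relu(x):
    # Rectifier: negative values become zero
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row maximum for numerical stability
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def inverted_dropout(x, rate, rng):
    # Zero out roughly `rate` of the values and rescale the survivors
    keep = 1.0 - rate
    mask = rng.binomial(1, keep, size=x.shape)
    return x * mask / keep

def forward(X, W1, b1, W2, b2, rate=0.1, rng=None):
    hidden = relu(X.dot(W1) + b1)          # Dense(256) + relu
    if rng is not None:                     # dropout only while training
        hidden = inverted_dropout(hidden, rate, rng)
    return softmax(hidden.dot(W2) + b2)     # Dense(10) + softmax

rng = np.random.RandomState(0)
X = rng.randn(5, 784)                       # five fake "images"
W1 = rng.randn(784, 256) * 0.01
b1 = np.zeros(256)
W2 = rng.randn(256, 10) * 0.01
b2 = np.zeros(10)

probs = forward(X, W1, b1, W2, b2, rng=rng)
print(probs.shape)
```

Each row of `probs` is a probability distribution over the ten digit classes, which is what the mean squared error loss compares against the one-hot target.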

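Fit the model. Settings here include the number of examples per batch, the number of epochs, the fraction of the training data held out for validation, and how much output to print. The run shown below was stopped manually partway through, hence the KeyboardInterrupt.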

In [13]:
model.fit(train, target, batch_size=32, nb_epoch=1000, validation_split=0.1, show_accuracy=True, verbose=2)

Train on 37800 samples, validate on 4200 samples
Epoch 1/1000
5s - loss: 2.6488e-04 - acc: 0.9987 - val_loss: 0.0041 - val_acc: 0.9760
Epoch 2/1000
5s - loss: 2.6800e-04 - acc: 0.9986 - val_loss: 0.0040 - val_acc: 0.9762
Epoch 3/1000
5s - loss: 2.6639e-04 - acc: 0.9986 - val_loss: 0.0040 - val_acc: 0.9762
Epoch 4/1000
5s - loss: 2.7040e-04 - acc: 0.9986 - val_loss: 0.0040 - val_acc: 0.9762
Epoch 5/1000
5s - loss: 2.8249e-04 - acc: 0.9984 - val_loss: 0.0041 - val_acc: 0.9764
Epoch 6/1000
5s - loss: 2.5812e-04 - acc: 0.9986 - val_loss: 0.0041 - val_acc: 0.9745
Epoch 7/1000
5s - loss: 2.6529e-04 - acc: 0.9985 - val_loss: 0.0040 - val_acc: 0.9745
Epoch 8/1000
5s - loss: 2.6499e-04 - acc: 0.9986 - val_loss: 0.0041 - val_acc: 0.9748
Epoch 9/1000
5s - loss: 2.7804e-04 - acc: 0.9985 - val_loss: 0.0042 - val_acc: 0.9755
Epoch 10/1000
5s - loss: 2.6435e-04 - acc: 0.9986 - val_loss: 0.0042 - val_acc: 0.9745
Epoch 11/1000
5s - loss: 2.6237e-04 - acc: 0.9986 - val_loss: 0.0041 - val_acc: 0.9755
Epo

KeyboardInterrupt: 
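
The "Train on 37800 samples, validate on 4200 samples" line follows from `validation_split=0.1` applied to 42000 rows. Below is a minimal sketch of such a hold-out split, assuming the validation rows are taken from the end of the arrays. The helper name and the exact rounding are assumptions, not Keras internals.

```python
import numpy as np

def split_last_fraction(X, y, validation_split):
    """Hold out the last `validation_split` fraction of the rows."""
    n_val = int(round(len(X) * validation_split))
    split_at = len(X) - n_val
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

# Placeholder arrays with the same number of rows as the training data
X = np.zeros((42000, 1))
y = np.zeros((42000, 10))

train_part, val_part = split_last_fraction(X, y, 0.1)
print(len(train_part[0]), len(val_part[0]))
```

This is a single hold-out set rather than k-fold cross validation, so the `val_acc` column is measured on the same 4200 examples every epoch.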

Predict class labels for the test data

In [9]:
predictions = model.predict_classes(test)
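
Conceptually, turning softmax outputs into class labels amounts to picking the most probable class in each row. The sketch below shows that step on made-up probabilities. It illustrates the idea and is not the Keras source.

```python
import numpy as np

# Made-up softmax outputs for three samples and three classes
probabilities = np.array([
    [0.05, 0.90, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
])

# The predicted class is the column index with the highest probability
predicted = probabilities.argmax(axis=1)
print(predicted)  # [1 0 2]
```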



Write to file for submission

In [10]:
index = np.arange(1,len(predictions)+1)
pd.DataFrame({"ImageId": index, "Label": predictions}).to_csv("NN_submission2.csv", index=False, header=True)
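
Before uploading, it can be worth sanity-checking the file. The sketch below uses only the standard library and assumes the format written above: an `ImageId,Label` header, ids counting up from 1, and labels between 0 and 9. The function `check_submission` is a hypothetical helper.

```python
import csv
import io

def check_submission(lines, expected_rows):
    """Return True if the CSV lines look like a valid submission."""
    reader = csv.reader(lines)
    header = next(reader)
    if header != ["ImageId", "Label"]:
        return False
    rows = list(reader)
    if len(rows) != expected_rows:
        return False
    for position, row in enumerate(rows, start=1):
        if len(row) != 2:
            return False
        image_id, label = row
        # Ids must count up from 1 and labels must be digits 0-9
        if int(image_id) != position or not 0 <= int(label) <= 9:
            return False
    return True

# A tiny in-memory example standing in for the real file
sample = io.StringIO(u"ImageId,Label\n1,2\n2,0\n3,9\n")
print(check_submission(sample, expected_rows=3))
```

For the real file, the same function could be called on an open file handle for NN_submission2.csv with `expected_rows=28000`, the number of test images.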