# Starting point
In my previous notebooks I experimented with extending the MNIST data set by shifting the digits, with dimensionality reduction using PCA, and with classification using a simple KNN, some ensemble methods and a dense deep neural network.

My best result on the Kaggle test set has so far been 98.48% accuracy using a dense deep neural network.

In this notebook I will tackle MNIST using a Convolutional Neural Network.

# The Data
All pixels take values in the range [0, 255]; I will normalise them to [0, 1].
I might opt to extend the data set by shifting the digits like I did in previous notebooks, but first I want to get a feeling for what the training time is like on the standard training set.
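The shifting code from the earlier notebooks is not reproduced here. As a rough sketch of the idea, assuming one-pixel shifts in the four directions with zero fill, augmentation could look like this. `shift_image` and `augment_with_shifts` are hypothetical helper names, and the images are expected as an `(n, 28, 28)` array.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Shift a 2-D image dx pixels right and dy pixels down, filling vacated pixels with 0."""
    shifted = np.zeros_like(image)
    h, w = image.shape
    # Source and destination slices of the region that stays inside the frame
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

def augment_with_shifts(images, labels):
    """Return the originals plus copies shifted one pixel right, left, down and up."""
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    augmented = [images] + [
        np.array([shift_image(img, dx, dy) for img in images]) for dx, dy in shifts
    ]
    # np.tile repeats the whole label vector once per block, matching the concatenation order
    return np.concatenate(augmented), np.tile(labels, len(shifts) + 1)
```

This multiplies the training set by five, which is exactly why it is worth knowing the training time on the plain data first.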

Also, as in `4. Handwritten Digits - Ensemble Learning`, I will set aside a validation set of 8 000 images.

I will not be doing any PCA, because a convolutional layer relies on the spatial order of the features: the first layer looks at small square patches of neighbouring pixels. If I transformed all features with PCA, these local patterns in the image would disappear.
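A quick way to see why ordering matters is to scramble the pixels. PCA does more than reorder them (each principal component is a weighted mix of all 784 pixels), but even a plain permutation is enough to destroy the neighbourhoods a small kernel relies on. This sketch uses a synthetic bar in place of a real digit; `adjacent_on_pairs` is a hypothetical helper that counts pairs of neighbouring pixels that are both switched on.

```python
import numpy as np

def adjacent_on_pairs(image):
    """Count horizontally and vertically adjacent pixel pairs that are both non-zero."""
    on = image > 0
    horizontal = np.sum(on[:, :-1] & on[:, 1:])
    vertical = np.sum(on[:-1, :] & on[1:, :])
    return int(horizontal + vertical)

# Synthetic 28x28 "stroke": a vertical bar 20 pixels tall and 4 pixels wide
image = np.zeros((28, 28))
image[4:24, 10:14] = 1.0

# Reorder the 784 features with a fixed random permutation, then view them as 28x28 again
rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)
scrambled = image.reshape(-1)[perm].reshape(28, 28)

print(adjacent_on_pairs(image))      # 136 (60 horizontal + 76 vertical pairs)
print(adjacent_on_pairs(scrambled))  # typically only a handful
```

The pixel values are identical in both images; only their arrangement changes, and with it the local structure a convolution would pick up.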

In [24]:
import pandas as pd
import numpy as np
import keras

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

## Training set

In [30]:
data_df = pd.read_csv('train.csv')
digits = data_df.iloc[:, 1:].values
labels = data_df['label'].values

## Test set

In [31]:
test_df = pd.read_csv('test.csv')
digits_test = test_df.values

## Modify the data

One-hot encode the labels.

In [32]:
from sklearn.preprocessing import OneHotEncoder

In [33]:
labels_one_hot = OneHotEncoder().fit_transform(labels.reshape(-1,1))
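For reference, this is what the encoding produces on a small made-up label vector. `OneHotEncoder` returns a SciPy sparse matrix by default, so it is densified here with `.toarray()`. Plain NumPy indexing into an identity matrix gives the same result, provided all ten classes occur in the data (the encoder learns its categories from the data); Keras' `keras.utils.to_categorical` is another option.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = np.array([3, 0, 9, 1, 2, 4, 5, 6, 7, 8, 3])  # made-up labels, every digit present
# Sparse by default; densify for printing and comparison
encoded = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()

# Row i of the 10x10 identity matrix is the one-hot vector for class i
manual = np.eye(10)[labels]

print(encoded[0])                       # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.array_equal(encoded, manual))  # True
```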

Reshape the data.

In [34]:
img_dimensions = (28, 28, 1)

digits = digits.reshape(-1, *img_dimensions)
digits_test = digits_test.reshape(-1, *img_dimensions)
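The reshape relies on NumPy's default row-major order: pixel column `k` of a CSV row ends up at row `k // 28`, column `k % 28`. This assumes, as the Kaggle data description states, that the pixel columns are stored row by row. A small check on fake data:

```python
import numpy as np

flat = np.arange(2 * 784).reshape(2, 784)  # two fake "images", 784 pixel values each
images = flat.reshape(-1, 28, 28, 1)       # (n_images, height, width, channels)

k = 100
print(images.shape)                                 # (2, 28, 28, 1)
print(images[0, k // 28, k % 28, 0] == flat[0, k])  # True: pixel 100 sits at row 3, column 16
```

The trailing channel axis of size 1 is what `Conv2D` expects for greyscale input with Keras' default `channels_last` layout.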

Scale the pixels.

In [35]:
digits_scaled = digits / 255
digits_test_scaled = digits_test / 255
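Dividing integer pixels by 255 promotes them to float64. Keras computes in float32 by default, so casting first would halve the memory of the scaled arrays. A sketch on random stand-in data (not the real images), extrapolating the per-image size to the 42 000 training images:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for raw pixels: integers in [0, 255]
digits = rng.integers(0, 256, size=(1000, 28, 28, 1), dtype=np.uint8)

scaled64 = digits / 255                     # true division promotes to float64
scaled32 = digits.astype(np.float32) / 255  # explicit float32 stays float32

n_train = 42000  # 34 000 training + 8 000 validation images
print(scaled64.dtype, scaled64[0].nbytes * n_train)  # float64 263424000
print(scaled32.dtype, scaled32[0].nbytes * n_train)  # float32 131712000
```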

## Validation set

In [36]:
from sklearn.model_selection import train_test_split
X, X_val, y, y_val = train_test_split(digits_scaled, labels_one_hot, test_size = 8000, stratify = labels_one_hot.toarray(), random_state = 0)
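Stratifying keeps the class proportions of the validation set close to those of the full data. As far as I can tell, when `stratify` is a 2-D one-hot array scikit-learn treats each distinct row as its own class, so the effect matches stratifying on the integer labels. A sketch with fake, slightly imbalanced labels of the same total size:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Fake label counts per digit, summing to 42 000
counts = [4000, 4700, 4200, 4300, 4100, 3800, 4100, 4400, 4000, 4400]
labels = np.repeat(np.arange(10), counts)
indices = np.arange(len(labels))  # stand-in for the images themselves

idx_train, idx_val, y_train, y_val = train_test_split(
    indices, labels, test_size=8000, stratify=labels, random_state=0)

overall_share = np.bincount(labels) / len(labels)
val_share = np.bincount(y_val, minlength=10) / len(y_val)
print(np.round(overall_share, 4))
print(np.round(val_share, 4))
```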

# Convolutional Neural Network

## Setting up my first model
My first model will be heavily inspired by the Keras team's example CNN for MNIST ([GitHub code](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py)).

In [60]:
from keras.layers import Dropout, BatchNormalization, Conv2D, MaxPooling2D, Dense, Flatten, Input
from keras.models import Model

model_input = Input(shape=img_dimensions)
x = Conv2D(32, kernel_size=(4,4), strides = 2, activation='relu')(model_input)
x = Conv2D(64, kernel_size=(2,2), strides = 1, activation='relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)
x = Dropout(rate=.25)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(rate=.5)(x)
model_output = Dense(10, activation = 'softmax')(x)

model = Model(model_input, model_output)
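To see where the layer shapes and parameter counts come from, here is a hand calculation in plain Python (no Keras needed). It assumes Keras' defaults of `padding='valid'` for `Conv2D` and `MaxPooling2D` and pooling strides equal to the pool size; `model.summary()` should report the same numbers. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

h = w = 28
c = 1
params = 0

# Conv2D(32, 4x4, stride 2): each filter has 4*4*c weights plus one bias
h = w = conv_out(h, 4, 2)
params += (4 * 4 * c + 1) * 32
c = 32
print("conv1", (h, w, c))  # conv1 (13, 13, 32)

# Conv2D(64, 2x2, stride 1)
h = w = conv_out(h, 2, 1)
params += (2 * 2 * c + 1) * 64
c = 64
print("conv2", (h, w, c))  # conv2 (12, 12, 64)

# MaxPooling2D(2x2) and Dropout have no trainable parameters
h = w = conv_out(h, 2, 2)
print("pool", (h, w, c))   # pool (6, 6, 64)

flat = h * w * c           # Flatten
params += (flat + 1) * 128  # Dense(128)
params += (128 + 1) * 10    # Dense(10)
print("flatten", flat, "total params", params)  # flatten 2304 total params 305130
```

About 97% of the roughly 305 000 parameters sit in the `Dense(128)` layer after `Flatten`, which is the main reason for the heavier dropout rate there.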

Compile the model: use categorical cross-entropy as the loss function, the Adam optimizer to update the weights, and accuracy as the performance metric.

In [61]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
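Categorical cross-entropy averages $-\sum_k y_k \log p_k$ over the samples, where $y$ is the one-hot target and $p$ the softmax output; only the predicted probability of the true class contributes. A minimal NumPy version for illustration (the clipping constant is my own choice to avoid `log(0)`, not necessarily what Keras uses internally):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 0, 1, 0]])
y_pred = np.array([[0.1, 0.2, 0.5, 0.2]])
print(categorical_crossentropy(y_true, y_pred))  # 0.6931... (= -log 0.5)
```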

Let's start off with a very short training run.

In [62]:
%time model.fit(X, y, validation_data=(X_val, y_val), epochs=3, batch_size=128)

Train on 34000 samples, validate on 8000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Wall time: 2min 9s

In [63]:
loss_and_metrics_train = model.evaluate(X, y, batch_size=64)
loss_and_metrics_val = model.evaluate(X_val, y_val, batch_size=64)

print("{} accuracy on training set".format(loss_and_metrics_train[1]))
print("{} accuracy on validation set".format(loss_and_metrics_val[1]))
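`model.evaluate` returns the loss followed by the compiled metrics, so index `[1]` is the accuracy. Accuracy here means the fraction of samples whose highest softmax probability falls on the true class, which can be sketched with `argmax` on made-up outputs:

```python
import numpy as np

# Made-up softmax outputs for three samples and their one-hot targets
probs = np.array([[0.05, 0.90, 0.05],
                  [0.60, 0.30, 0.10],
                  [0.20, 0.30, 0.50]])
y_true = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

predicted = probs.argmax(axis=1)  # most probable class per row: [1, 0, 2]
actual = y_true.argmax(axis=1)    # position of the 1 in each one-hot row: [1, 1, 2]
accuracy = np.mean(predicted == actual)
print(accuracy)  # 2 of 3 correct
```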



In [65]:
%time model.fit(X, y, validation_data=(X_val, y_val), epochs=3, batch_size=128)

Train on 34000 samples, validate on 8000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Wall time: 2min 16s

In [66]:
loss_and_metrics_train = model.evaluate(X, y, batch_size=64)
loss_and_metrics_val = model.evaluate(X_val, y_val, batch_size=64)

print("{} accuracy on training set".format(loss_and_metrics_train[1]))
print("{} accuracy on validation set".format(loss_and_metrics_val[1]))

0.9931764705882353 accuracy on training set
0.987625 accuracy on validation set


In [67]:
%time model.fit(X, y, validation_data=(X_val, y_val), epochs=6, batch_size=128)

Train on 34000 samples, validate on 8000 samples
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
Wall time: 4min 36s

In [68]:
loss_and_metrics_train = model.evaluate(X, y, batch_size=64)
loss_and_metrics_val = model.evaluate(X_val, y_val, batch_size=64)

print("{} accuracy on training set".format(loss_and_metrics_train[1]))
print("{} accuracy on validation set".format(loss_and_metrics_val[1]))

0.9967941176470588 accuracy on training set
0.99 accuracy on validation set


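After 12 training epochs in total (3 + 3 + 6 over the three `fit` calls, each continuing from the current weights), the model sits at 99.7% accuracy on the training set and 99.0% on the validation set. This clearly beats all my previous models.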
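In terms of weight updates, a quick calculation, assuming Keras keeps the final partial batch of each epoch (it does not drop it by default):

```python
import math

n_train, batch_size = 34000, 128
epochs = [3, 3, 6]  # epochs per fit() call

steps_per_epoch = math.ceil(n_train / batch_size)  # 265 full batches + 1 partial batch
total_epochs = sum(epochs)
print(steps_per_epoch, total_epochs, steps_per_epoch * total_epochs)  # 266 12 3192
```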

In [77]:
predictions = model.predict(digits_test_scaled)

In [79]:
predictions = predictions.argmax(1)

In [80]:
submission_df = pd.DataFrame(list(zip(np.arange(1, 28001), predictions)), columns = ['ImageID', 'Label'])
submission_df.set_index('ImageID').to_csv('Submissions/submission11.csv')
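The cell above hard-codes `28001` (28 000 test images, with ids starting at 1). A variant that derives the id range from the predictions themselves, shown on three stand-in predictions and writing to a string instead of a file:

```python
import numpy as np
import pandas as pd

predictions = np.array([2, 0, 9])  # stand-in for the argmax'd model predictions
submission_df = pd.DataFrame({
    'ImageID': np.arange(1, len(predictions) + 1),  # ids start at 1, one per test image
    'Label': predictions,
})
csv_text = submission_df.to_csv(index=False)  # no path: returns the CSV as a string
print(csv_text.splitlines())  # ['ImageID,Label', '1,2', '2,0', '3,9']
```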

This submission scored 98.7% accuracy on the Kaggle test set, up from 98.48% with my best dense network. Cool!