# Handwritten numbers with a Convolutional Neural Net

Using Keras, TensorFlow, and the MNIST dataset to recognize handwritten digits.

Import data and graphing libraries:

In [None]:
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

Download the MNIST dataset: 70,000 labeled images of handwritten digits (60,000 for training and 10,000 for testing), derived from handwriting samples collected by NIST.

In [None]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data('/tmp/mnist.npz')

Here's an example of the data, which is stored as an array of greyscale values, 28 by 28 pixels in size.

In [None]:
samples = X_train[:3]
for i, img in enumerate(samples):
    plt.subplot(1, 3, i + 1)
    plt.imshow(img, cmap='gray')

## Fully connected model

First, we'll create a fully connected neural net without any convolution layers to see how it performs.

Let's reshape the data by converting each image into a vector (array) and scale the greyscale values to 0-1 to make training the model easier.

In [None]:
# Reshape to vectors
X_train = X_train.reshape(-1, 28*28)
X_test = X_test.reshape(-1, 28*28)

# Scale to the 0-1 range
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0 # Maximum greyscale value is 255, so dividing by 255 maps every pixel to 0-1
X_test /= 255.0

X_train[0][140:170]
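
To see what the reshape and scaling do on something small enough to read, here's a minimal sketch on a made-up 2 x 3 "image" (the array `toy` and its values are invented for illustration, not taken from MNIST):

```python
import numpy as np

# Hypothetical batch holding one 2x3 greyscale "image" with uint8 pixel values
toy = np.array([[[0, 51, 102],
                 [153, 204, 255]]], dtype=np.uint8)

# Flatten each image into a vector: shape (1, 2, 3) -> (1, 6)
toy_flat = toy.reshape(-1, 2 * 3)

# Convert to float before dividing, then scale to the 0-1 range
toy_scaled = toy_flat.astype('float32') / 255.0

print(toy_flat.shape)
print(toy_scaled)
```

Converting to float first matters for the in-place `/=` used above: current NumPy versions raise an error rather than write float results into an integer array.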

Now we have an array of 60,000 samples, each with 784 greyscale values between 0 and 1. We also have to convert the labels to one-hot (binary categorical) vectors for the model to work. There are 10 possible categories, the digits 0-9, so each label becomes a vector of 10 values with a 1 at the index of its digit. So a 5 in categorical data would be:

In [None]:
from keras.utils import to_categorical  # the np_utils submodule is gone in recent Keras releases

y_train_cat = to_categorical(y_train)
y_test_cat = to_categorical(y_test)

print(y_train[0], ' = ', y_train_cat[0])
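
For intuition, one-hot encoding can be reproduced in plain NumPy. This is a rough equivalent for illustration, not how `to_categorical` is implemented; the `one_hot` helper and the sample labels are made up here:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1.0 at the label's index and 0.0 elsewhere."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([5, 0, 9])
print(encoded[0])  # the vector for the digit 5
```

Taking `np.argmax` along each row recovers the original digit, which is also how a predicted probability vector gets turned back into a label.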

Now let's build the model. It will have 4 fully connected ReLU layers. ReLU replaces negative activations with 0, so each layer's outputs can't be negative (the weights themselves still can be). The last softmax layer turns the outputs into 0-1 probabilities, one per category, and hence has 10 units.

In [None]:
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense

K.clear_session()
model = Sequential()
model.add(Dense(512, input_dim=28*28, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
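
As an aside, the two activations are simple enough to write out in NumPy. This sketch is for intuition only and isn't Keras's implementation; the function names and the sample values are made up:

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and leave positive values unchanged."""
    return np.maximum(x, 0.0)

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical pre-activation values for one layer
z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

activations = relu(z)
probabilities = softmax(z)

print(activations)          # negatives become 0
print(probabilities.sum())  # sums to 1 (up to floating point error)
```

Note that ReLU acts on the layer's outputs. The weights that produce `z` are free to be negative.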

Now train the model and see the accuracy on our test data:

In [None]:
# Train model
fc = model.fit(X_train, y_train_cat, batch_size=128, epochs=3,
               verbose=1, validation_split=0.3)

# Test accuracy
test_accuracy = model.evaluate(X_test, y_test_cat)[1]
print('Fully Connected NN test accuracy = ', test_accuracy)

Looks like we're getting accuracy of around 96% on the test data, which is pretty good, but we can do better. A fully connected network treats every pixel as an independent input and has no notion of which pixels are neighbors, so it has a limited ability to find spatial patterns in the data.
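
One way to see that a fully connected layer ignores where pixels sit: shuffle the pixels of an image with a fixed permutation, shuffle the rows of the weight matrix the same way, and the layer's output is unchanged. The sketch below checks this in NumPy with random stand-in data (the shapes, seed, and variable names are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a flattened image and one dense layer (784 inputs -> 16 units)
image = rng.random(784)
weights = rng.normal(size=(784, 16))
bias = rng.normal(size=16)

def dense_relu(x, w, b):
    """One fully connected ReLU layer: relu(x @ w + b)."""
    return np.maximum(x @ w + b, 0.0)

# A fixed shuffle of pixel positions that destroys the 2D layout
perm = rng.permutation(784)

original = dense_relu(image, weights, bias)
shuffled = dense_relu(image[perm], weights[perm], bias)

# Same output: the layer never relied on which pixels were neighbors
print(np.allclose(original, shuffled))
```

A convolution layer, by contrast, slides a small window over neighboring pixels, so scrambling the layout would change what it sees.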

## Convolutional Neural Net

Now let's reshape the data again, this time into a 28 x 28 grid of greyscale values with a single channel, so that the convolution layer can look for patterns in groups of neighboring pixels, much as people do when they recognize shapes.

In [None]:
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

Each sample is now a 28 x 28 x 1 array: 28 rows, 28 columns, and one greyscale channel. We can now build the model. It ends with 2 fully connected layers that determine the final probabilities.

In [None]:
from keras.layers import Conv2D
from keras.layers import MaxPool2D, AvgPool2D
from keras.layers import Flatten, Activation

K.clear_session() # Reset Keras state so the previous model is discarded
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten()) # Flatten to feed into last layers
# Add fully connected layers
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.summary()
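
To make the convolution and pooling layers less of a black box, here's a naive NumPy sketch of what one filter followed by 2 x 2 max pooling computes. It is a slow loop-based illustration, not how Keras computes it, and the hand-picked vertical-edge kernel and toy image are assumptions for the example (a real `Conv2D` layer learns its 32 kernels during training):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding ("valid" mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Weighted sum of the patch under the kernel
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    blocks = feature_map[:h * 2, :w * 2].reshape(h, 2, w, 2)
    return blocks.max(axis=(1, 3))

# Toy 28x28 image: dark left half, bright right half
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
pooled = max_pool_2x2(feature_map)

print(feature_map.shape)  # (26, 26)
print(pooled.shape)       # (13, 13)
```

The feature map is nonzero only where the window straddles the edge between the two halves, which is the sense in which a filter "detects" a pattern in a group of pixels.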

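In the summary of the model above, notice how the output shapes and parameter counts work. The first 3 layers are the convolutional part of the network. The 3 x 3 convolution shrinks each 28 x 28 image to 26 x 26 across 32 filters, and the 2 x 2 max pooling halves that to 13 x 13. The flatten layer turns this into 5,408 values (13 * 13 * 32), and connecting them to 128 ReLU nodes takes 692,352 parameters (5408 * 128 weights plus 128 biases).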
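
The shape and parameter arithmetic can be checked by hand. The sketch below recomputes it in plain Python from the layer settings used in this notebook, assuming the Keras defaults of no padding and a stride of 1 for the convolution (the variable names are only for illustration):

```python
# Layer settings from the CNN in this notebook
input_size, channels = 28, 1
filters, kernel_size = 32, 3
pool_size = 2
dense_units, num_classes = 128, 10

# "Valid" convolution with stride 1 trims kernel_size - 1 pixels per dimension
conv_size = input_size - kernel_size + 1            # 26
# One weight per kernel cell per input channel per filter, plus one bias per filter
conv_params = kernel_size * kernel_size * channels * filters + filters

# 2x2 max pooling halves each dimension and has no parameters
pooled_size = conv_size // pool_size                # 13

# Flatten: every pooled value from every filter becomes one input
flat_size = pooled_size * pooled_size * filters     # 5408

# Dense layers: one weight per input per unit, plus one bias per unit
dense_params = flat_size * dense_units + dense_units
output_params = dense_units * num_classes + num_classes

total_params = conv_params + dense_params + output_params
print(conv_params, dense_params, output_params, total_params)
```

Almost all of the parameters sit in the first dense layer; the convolution itself contributes only a few hundred.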

Now we can train the CNN and test its accuracy. (Training will take 3-5x longer, since the convolution adds computation and the large flattened layer adds parameters that have to be trained.)

In [None]:
cnn = model.fit(X_train, y_train_cat, batch_size=128,
          epochs=3, verbose=1, validation_split=0.3)

# Test accuracy
test_accuracy = model.evaluate(X_test, y_test_cat)[1]
print('CNN test accuracy = ', test_accuracy)

The CNN should reach better test accuracy than the fully connected model. If training runs a little longer it will likely be even more accurate (try raising the number of epochs to 5-7 to see how much accuracy goes up).
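
If you do train for more epochs, the object returned by `model.fit` (stored in `fc` and `cnn` above) keeps per-epoch metrics in its `history` dictionary. Here's a small sketch of a helper that finds the epoch with the best validation accuracy. The helper name is made up, the numbers in `example_history` are placeholders rather than real results, and the key is `val_accuracy` in recent Keras versions but `val_acc` in some older ones, so the helper checks for both:

```python
def best_epoch(history_dict):
    """Return (epoch_number, accuracy) for the highest validation accuracy."""
    # The metric key depends on the Keras version
    key = 'val_accuracy' if 'val_accuracy' in history_dict else 'val_acc'
    scores = history_dict[key]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]  # epochs are counted from 1

# Placeholder values for illustration only, not measured results
example_history = {'val_accuracy': [0.90, 0.95, 0.93]}

epoch, accuracy = best_epoch(example_history)
print(epoch, accuracy)
```

With a trained model you would call `best_epoch(cnn.history)`. If the best epoch is not the last one, extra epochs have stopped helping on the validation split.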

## Conclusion

The fully connected, non-convolutional network did surprisingly well. This is likely because the MNIST dataset is pretty simple: small greyscale images with only basic features to detect, like lines and curves. On more complex data with "compounded" features (lines that form shapes, shapes that form larger objects), the fully connected network would likely have struggled.

Although the CNN only did slightly better, it's usually the right way to go for image data, especially if time-to-train isn't an issue.