# Classifying MNIST Data Using a Neural Network

MNIST is a dataset of 60,000 training images of handwritten digits (0 through 9), each a 28x28 grayscale image with pixel values from 0 to 255, along with a test set of 10,000 images.


In [80]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.datasets import mnist

In [81]:
# Load MNIST data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)

(60000L, 28L, 28L)


Here, we double-check the training set size (60000) and the image size (28 x 28).

In [82]:
nb_training_samples = X_train.shape[0]
nb_testing_samples = X_test.shape[0]
image_length = image_width = X_train.shape[1]
# Call .format() on the string, inside print(), so this runs on Python 2 and 3
print("{} training samples".format(nb_training_samples))
print("{} testing samples".format(nb_testing_samples))
print("image width and image length is {} pixels".format(image_length))

60000 training samples
10000 testing samples
image width and image length is 28 pixels


In [83]:
input_dim = image_length*image_width
# Reshape each image from 2D (28 x 28) to 1D (784) as NN input
X_train = X_train.reshape(nb_training_samples, input_dim )
X_test = X_test.reshape(nb_testing_samples, input_dim )
# Cast pixel data type to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Convert each pixel from 0-255 to 0.0-1.0
X_train /= 255
X_test /= 255
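
To make the reshaping and scaling step concrete, here is a minimal pure-Python sketch on a toy 2x2 "image". The data and the helper name `flatten_and_scale` are made up for illustration; the notebook itself does the same thing with NumPy's `reshape`, `astype`, and in-place division.

```python
# Toy 2x2 "image" with 8-bit pixel values (illustrative data, not from MNIST).
image = [[0, 255],
         [51, 102]]

def flatten_and_scale(img, max_value=255.0):
    """Flatten a 2D list of pixels row by row and scale each value to [0, 1]."""
    return [pixel / max_value for row in img for pixel in row]

vector = flatten_and_scale(image)
print(len(vector))  # one value per pixel: 2 * 2 = 4
print(vector)
```

For a real MNIST image the same idea turns a 28 x 28 grid into a vector of 784 floats between 0.0 and 1.0.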

In [107]:
nb_classes = len(set(y_train))
print("There are total of {} classes".format(nb_classes))

There are total of 10 classes


We checked that there are indeed 10 classes. 

In [109]:
# In multiclass classification tasks with cross-entropy loss,
# we need to convert labels to a one-hot vector representation,
# e.g. 0-->[1,0,0,0,0,0,0,0,0,0]
#      1-->[0,1,0,0,0,0,0,0,0,0]...
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
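
As a rough illustration of what the one-hot conversion produces, here is a minimal pure-Python sketch. The helper `one_hot` is hypothetical and is not how Keras implements `to_categorical`; it only mirrors the mapping described in the comments above.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with 1.0 at index `label`, 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot(0, 10))  # 1.0 in the first position
print(one_hot(3, 10))  # 1.0 in the fourth position
```

Each label becomes a vector with as many entries as there are classes, which matches the 10-unit softmax output layer used later.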

In [115]:
# NN model 
model = Sequential()

# First hidden layer (input dim: 784, output dim: 800)
model.add(Dense(800, input_shape = (input_dim,), init = 'uniform'))
# Add activation function
model.add(Activation('relu'))
# Add dropout: randomly drop 50% of the hidden units during training
model.add(Dropout(0.5))

# Output layer (input dim: 800 (auto-induced), output dim: nb_classes = 10)
model.add(Dense(nb_classes, init='uniform'))
model.add(Activation('softmax'))

# Prints a summary of your model
model.summary()

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
dense_34 (Dense)                   (None, 800)         628000      dense_input_15[0][0]             
____________________________________________________________________________________________________
activation_31 (Activation)         (None, 800)         0           dense_34[0][0]                   
____________________________________________________________________________________________________
dropout_20 (Dropout)               (None, 800)         0           activation_31[0][0]              
____________________________________________________________________________________________________
dense_35 (Dense)                   (None, 10)          8010        dropout_20[0][0]                 
____________________________________________________________________________________________________
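
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output unit. The helper name `dense_params` below is made up for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 800)  # 784 inputs -> 800 hidden units
output = dense_params(800, 10)       # 800 hidden units -> 10 classes

print(hidden)           # 628000
print(output)           # 8010
print(hidden + output)  # 636010
```

The activation and dropout layers add no trainable parameters, which is why they show 0 in the summary.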

In [116]:
# Compile model before training
model.compile(loss='categorical_crossentropy',
              optimizer='SGD',
              metrics=['accuracy'])
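
To show what the softmax output and the `categorical_crossentropy` loss compute for a single sample, here is a minimal pure-Python sketch. The function names and the `eps` clipping value are choices made for this illustration and are not taken from the Keras source.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs, eps=1e-7):
    """Negative log-probability assigned to the true class (eps avoids log(0))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))

target = [0.0] * 10
target[3] = 1.0                                 # true class is digit 3

uniform_probs = softmax([0.0] * 10)             # equal scores -> probability 0.1 per class
uniform_loss = categorical_crossentropy(target, uniform_probs)
print("{:.4f}".format(uniform_loss))            # 2.3026, i.e. ln(10)
```

A network that guesses uniformly over 10 classes therefore has a loss of about 2.30, which is a useful baseline when reading the training log.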

In [117]:
# Train on training samples and validate on testing samples. 
history = model.fit(X_train, Y_train,
                    batch_size=256, nb_epoch=10,
                    verbose=2, validation_data=(X_test, Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
31s - loss: 1.7539 - acc: 0.5799 - val_loss: 1.2128 - val_acc: 0.8038
Epoch 2/10
31s - loss: 0.9985 - acc: 0.7860 - val_loss: 0.7474 - val_acc: 0.8514
Epoch 3/10
29s - loss: 0.7219 - acc: 0.8231 - val_loss: 0.5786 - val_acc: 0.8704
Epoch 4/10
30s - loss: 0.6055 - acc: 0.8434 - val_loss: 0.4968 - val_acc: 0.8806
Epoch 5/10
30s - loss: 0.5416 - acc: 0.8536 - val_loss: 0.4487 - val_acc: 0.8879
Epoch 6/10
31s - loss: 0.4987 - acc: 0.8615 - val_loss: 0.4154 - val_acc: 0.8938
Epoch 7/10
30s - loss: 0.4698 - acc: 0.8699 - val_loss: 0.3919 - val_acc: 0.8977
Epoch 8/10
31s - loss: 0.4464 - acc: 0.8743 - val_loss: 0.3738 - val_acc: 0.9009
Epoch 9/10
28s - loss: 0.4291 - acc: 0.8796 - val_loss: 0.3595 - val_acc: 0.9041
Epoch 10/10
33s - loss: 0.4121 - acc: 0.8838 - val_loss: 0.3476 - val_acc: 0.9060
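
For a sense of how much optimization happens per epoch, the number of mini-batch updates follows from the training set size and `batch_size`. This is simple arithmetic under the assumption that the final, smaller batch is also used; the variable names are illustrative.

```python
import math

num_samples = 60000
batch_size = 256
num_epochs = 10

# Round up: the last batch of each epoch holds the leftover samples.
updates_per_epoch = int(math.ceil(num_samples / float(batch_size)))
last_batch_size = num_samples - (updates_per_epoch - 1) * batch_size

print(updates_per_epoch)               # 235
print(last_batch_size)                 # 96
print(updates_per_epoch * num_epochs)  # 2350
```

A smaller batch size would give more weight updates per epoch, which is one reason batch size is worth tuning.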


In [119]:
score = model.evaluate(X_test, Y_test, verbose=2)
# score[0] is the test loss, score[1] is the test accuracy
print('Test score: {}'.format(score[0]))
print('Test accuracy: {}'.format(score[1]))

Test score: 0.34759791137
Test accuracy: 0.906
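
The accuracy reported here is the fraction of samples whose most probable predicted class matches the true label. A minimal sketch with made-up probabilities for a 3-class problem (the helper names `argmax` and `accuracy` are hypothetical):

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(prob_rows, true_labels):
    """Fraction of rows whose most probable class equals the true label."""
    correct = sum(1 for probs, label in zip(prob_rows, true_labels)
                  if argmax(probs) == label)
    return correct / float(len(true_labels))

# Illustrative predictions for four samples; the last one is misclassified.
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.2, 0.6],
               [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 1]

print(accuracy(predictions, labels))  # 0.75
```

On the 10,000 test images, an accuracy of 0.906 corresponds to 9,060 correctly classified digits.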


This single-hidden-layer network reaches just over 90% accuracy on the MNIST test set (90.6% after 10 epochs). The accuracy can likely be improved considerably by tuning the number of layers, the number of units in each layer, the optimizer, the batch size, and the number of epochs.

Reference: https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py