# ConvNet for MNIST Classification

This example builds a Convolutional Neural Network (ConvNet) to classify the handwritten digits in the MNIST dataset. ConvNets are a powerful architecture for learning features directly from images. Below, we define several network architectures and examine their performance on this well-known dataset. The deep ConvNet reached a test error below 0.50%.
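To make "learning features from images" concrete, here is a minimal NumPy sketch of what a single convolutional filter, followed by ReLU and 2x2 max pooling, computes. It is only an illustration and not how Keras implements these layers. The helper names `conv2d_valid`, `relu`, and `max_pool` are hypothetical, and the loops are deliberately naive. Like most deep-learning libraries, it computes a cross-correlation, meaning the kernel is not flipped. The 5x5 kernel is a hand-made vertical-edge detector, whereas a trained ConvNet learns its kernel weights.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding ("valid" mode).

    Computes a cross-correlation (the kernel is not flipped), as most
    deep-learning libraries do.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, p=2):
    """Non-overlapping p x p max pooling (stride == p); leftover rows/cols are dropped."""
    h, w = x.shape[0] // p, x.shape[1] // p
    return x[:h * p, :w * p].reshape(h, p, w, p).max(axis=(1, 3))


# Toy 28x28 "image": dark left half, bright right half (a vertical edge at column 14).
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# Hand-made 5x5 vertical-edge detector; a ConvNet would learn these weights instead.
kernel = np.tile([-1.0, -1.0, 0.0, 1.0, 1.0], (5, 1))

feature = relu(conv2d_valid(image, kernel))  # nonzero only near the edge
pooled = max_pool(feature)
print(feature.shape, pooled.shape)  # (24, 24) (12, 12)
```

A 28x28 input with a 5x5 kernel yields a 24x24 feature map, and 2x2 pooling halves that to 12x12. The same arithmetic determines the layer sizes of the networks below.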

This exercise is based on a [Machine Learning Mastery tutorial](http://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras). A good resource for ConvNets is the [CS231n pages on GitHub](http://cs231n.github.io/convolutional-networks).
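The comments in the code below quote several feature-map sizes. As a quick cross-check, the following pure-Python sketch walks each architecture layer by layer. It computes output shapes and trainable-parameter counts under these assumptions:

- channels-first input of 1x28x28;
- "valid" (unpadded) convolutions;
- pooling whose stride equals its pool size.

The layer encoding and the `summarize` helper are hypothetical. Dropout layers are omitted because they have no weights and do not change shapes. Calling Keras's own `model.summary()` on the real models should report matching numbers.

```python
from math import prod

# Hypothetical layer encoding mirroring the Keras models below (dropout omitted):
#   ("conv", filters, kernel_size), ("pool", pool_size), ("flatten",), ("dense", units)
BASELINE = [("conv", 32, 5), ("pool", 2), ("flatten",), ("dense", 128), ("dense", 10)]
LARGER = [("conv", 50, 5), ("pool", 2), ("conv", 15, 3), ("pool", 2), ("flatten",),
          ("dense", 128), ("dense", 50), ("dense", 10)]
DEEP = [("conv", 40, 3), ("conv", 40, 3), ("pool", 2), ("conv", 20, 3), ("conv", 20, 3),
        ("pool", 2), ("flatten",), ("dense", 256), ("dense", 50), ("dense", 10)]


def summarize(layers, input_shape=(1, 28, 28)):
    """Return (per-layer rows, total trainable params) for channels-first input."""
    shape, total, rows = input_shape, 0, []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            _, filters, k = layer
            c, h, w = shape
            params = filters * c * k * k + filters  # kernel weights + one bias per filter
            shape = (filters, h - k + 1, w - k + 1)  # "valid": no padding
        elif kind == "pool":
            _, p = layer
            c, h, w = shape
            params = 0
            shape = (c, h // p, w // p)  # stride == pool size
        elif kind == "flatten":
            params = 0
            shape = (prod(shape),)
        elif kind == "dense":
            _, units = layer
            params = shape[0] * units + units  # weight matrix + biases
            shape = (units,)
        else:
            raise ValueError(f"unknown layer type: {kind}")
        total += params
        rows.append((kind, shape, params))
    return rows, total


for name, layers in [("baseline", BASELINE), ("larger", LARGER), ("deep", DEEP)]:
    rows, total = summarize(layers)
    print(f"{name}: {total:,} parameters")
    for kind, shape, params in rows:
        print(f"  {kind:<8}{str(shape):<16}{params:>8,}")
```

Under these assumptions, the baseline has 592,074 parameters. Almost all of them sit in the Dense layer fed by the 4,608-value flattened feature map. The deep network has 121,216 parameters because its extra convolutions and pooling shrink the flattened vector to 320 values.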

```python
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

# use Theano-style "channels first" ordering: (channels, height, width)
K.set_image_dim_ordering('th')

# fix random seed for reproducibility
seed = 43
numpy.random.seed(seed)

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape to be [samples][channels][height][width]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

def baseline_model():
    model = Sequential()
    # 32 feature maps from a 5x5 kernel: 1x28x28 -> 32x24x24
    model.add(Convolution2D(32, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))  # 32x24x24 -> 32x12x12
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

def larger_model():
    model = Sequential()
    # create 50 feature maps with a 5x5 kernel: 1x28x28 -> 50x24x24
    model.add(Convolution2D(50, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    # downsample the first conv layer's output, reducing the number of weights
    # downstream and thereby helping to limit overfitting;
    # generally, strides larger than 2 are too destructive
    model.add(MaxPooling2D(pool_size=(2, 2)))  # default stride equals the pool size (2): 50x24x24 -> 50x12x12
    model.add(Convolution2D(15, 3, 3, activation='relu'))  # -> 15x10x10
    model.add(MaxPooling2D(pool_size=(2, 2)))  # -> 15x5x5
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

def deep_model():
    # loosely follows the CS231n pattern
    #   INPUT -> [[CONV -> RELU]*N -> POOL]*M -> [FC -> RELU]*K -> FC
    # here with N=2, M=2, K=2, plus dropout between some layers
    model = Sequential()
    model.add(Convolution2D(40, 3, 3, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))  # -> 40x26x26
    model.add(Dropout(0.2))
    model.add(Convolution2D(40, 3, 3, activation='relu'))  # -> 40x24x24
    model.add(MaxPooling2D(pool_size=(2, 2)))  # -> 40x12x12
    model.add(Convolution2D(20, 3, 3, activation='relu'))  # -> 20x10x10
    model.add(Convolution2D(20, 3, 3, activation='relu'))  # -> 20x8x8
    model.add(MaxPooling2D(pool_size=(2, 2)))  # -> 20x4x4
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(50, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# build the model
model = deep_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=16, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Deep Error: %.2f%%" % (100-scores[1]*100))
```

Using TensorFlow backend.


Train on 60000 samples, validate on 10000 samples
Epoch 1/16
30s - loss: 0.4409 - acc: 0.8561 - val_loss: 0.0730 - val_acc: 0.9773
Epoch 2/16
28s - loss: 0.1253 - acc: 0.9624 - val_loss: 0.0489 - val_acc: 0.9852
Epoch 3/16
28s - loss: 0.0905 - acc: 0.9729 - val_loss: 0.0347 - val_acc: 0.9886
Epoch 4/16
28s - loss: 0.0715 - acc: 0.9785 - val_loss: 0.0278 - val_acc: 0.9909
Epoch 5/16
28s - loss: 0.0608 - acc: 0.9824 - val_loss: 0.0248 - val_acc: 0.9924
Epoch 6/16
28s - loss: 0.0559 - acc: 0.9833 - val_loss: 0.0218 - val_acc: 0.9932
Epoch 7/16
28s - loss: 0.0510 - acc: 0.9847 - val_loss: 0.0227 - val_acc: 0.9922
Epoch 8/16
28s - loss: 0.0460 - acc: 0.9865 - val_loss: 0.0198 - val_acc: 0.9942
Epoch 9/16
28s - loss: 0.0443 - acc: 0.9867 - val_loss: 0.0213 - val_acc: 0.9939
Epoch 10/16
28s - loss: 0.0393 - acc: 0.9883 - val_loss: 0.0211 - val_acc: 0.9938
Epoch 11/16
28s - loss: 0.0393 - acc: 0.9882 - val_loss: 0.0177 - val_acc: 0.9951
Epoch 12/16
28s - loss: 0.0376 - acc: 0.9887 - val_loss: 