# Convolutional Neural Network - MNIST

For those of you unfamiliar with Jupyter notebooks, here is a brief explanation.
Each "cell" of the notebook (a rectangular area) contains either explanatory text ("Markdown," like this cell) or Python/Keras commands ("Code"). Clicking on a cell selects it (its border is highlighted). If you press "__Run__" in the menu, Jupyter processes the contents of the selected cell and moves on to the next one. Scroll to the next cell, read the command and press __Run__ again. The result of the command (if any) will appear below the cell. Proceed through the notebook in this fashion, and return to previous cells whenever necessary (either to re-read an explanation or command, or to change parameters). Please note that if you want to restart the entire notebook, you have to start again at the top.

Let's start by importing the required libraries.

In [None]:
import os
# Hide TensorFlow's C++ log messages; this must be set before Keras is imported.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import keras
keras.__version__

Convolutional Neural Networks (CNNs) are a type of neural network that is particularly effective at classification and pattern recognition in images and signals. One of the first CNN architectures was used to recognize digital images of handwritten digits and characters. A digital image can be represented as a matrix of values, with each cell in the matrix holding a single pixel value. When provided with this input, a CNN performs convolution and subsampling operations, which are explained below. Eventually, the CNN assigns a probability to each class; for handwritten digit recognition, the classes are the digits 0 to 9.

A CNN architecture differs from other types of neural network architectures in that it contains convolution layers. The goal of these convolutions is to extract useful visual characteristics, commonly referred to as __features__, from the data. A convolution layer extracts task-relevant features by means of a __filter__ whose values are learned during training. Convolution preserves the spatial relationship between pixels because it learns image features from small squares of input data. We will not go into the mathematical details of convolution here, but will try to understand how it works on images.

As we discussed above, every image can be considered as a matrix of pixel values. Consider a 5 x 5 image whose pixel values are only 0 and 1. (For a grayscale image, pixel values normally range from 0 to 255; the green matrix below is a special case in which the only values are 0 and 1.)

![](images/cnn1.png)

Also, consider another 3 x 3 matrix as shown below:

![](images/cnn2.png)

Then, the convolution of the 5 x 5 image and the 3 x 3 matrix can be computed as shown in the animation below:

![](images/cnn3.gif)
>_The Convolution operation. The output matrix is called Convolved Feature or __Feature Map__._
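
To make the sliding-window computation concrete, here is a minimal NumPy sketch of the same operation. The pixel and filter values are illustrative assumptions chosen for this sketch (they are not read from the figures above), and `convolve2d_valid` is a hypothetical helper name, not a Keras function.

```python
import numpy as np

# Illustrative values (an assumption of this sketch, not taken from the figures).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product of patch and kernel, summed to one number.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

A 3 x 3 filter fits into a 5 x 5 image at 3 x 3 positions, so the feature map is 3 x 3. Note that the sketch does not flip the filter before multiplying, which is also how deep learning libraries usually implement "convolution".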

Let's take a practical look at a very simple CNN example. We will use our CNN to classify MNIST digits, a task that you have already worked through in the previous notebook. Even though our CNN will be very basic, it will still be considerably more accurate than the MLP model from the previous notebook.

The code below shows you what a basic convnet looks like. It is a stack of `Conv2D` and `MaxPooling2D` layers, followed by a `Flatten` layer and two `Dense` layers that perform the actual classification. Importantly, a CNN takes as input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension).

In our case, we will configure our convnet to process inputs of size `(28, 28, 1)`, which is the format of MNIST images. We do this by passing the argument `input_shape=(28, 28, 1)` to our first layer.
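
Before building the model, it may help to see what a `MaxPooling2D((2, 2))` layer does to a single feature map. The following is a rough NumPy sketch, assuming non-overlapping 2 x 2 windows with any incomplete trailing row or column dropped; `max_pool_2x2` and the example values are hypothetical and only serve as illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd.
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split into (h2, 2, w2, 2) blocks, then take the max inside each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# Hypothetical 4 x 4 feature map.
example = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
])
pooled = max_pool_2x2(example)
print(pooled)
```

Each 2 x 2 window is reduced to a single number, so the height and width of the feature map are halved while the strongest responses are kept.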

In [None]:
from keras import layers
from keras import models

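model = models.Sequential()
# Convolutional base: alternating convolution and max-pooling layers.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Classifier: flatten the feature maps and feed them to two dense layers.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# The softmax layer outputs one probability per digit class (0 to 9).
model.add(layers.Dense(10, activation='softmax'))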

print('Model configured.')

Let's display the architecture of our CNN:

In [None]:
model.summary()
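
The output shapes and parameter counts reported by the summary can be cross-checked by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults used in the model above (no padding, stride 1 for convolutions, non-overlapping pooling windows); the helper function names are made up for this example.

```python
def conv_output(size, kernel=3):
    # No padding, stride 1: the kernel fits in (size - kernel + 1) positions.
    return size - kernel + 1

def pool_output(size, window=2):
    # Non-overlapping windows; an incomplete trailing window is dropped.
    return size // window

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units

size = 28
size = conv_output(size)   # after Conv2D(32): 26
size = pool_output(size)   # after MaxPooling2D: 13
size = conv_output(size)   # after Conv2D(64): 11
size = pool_output(size)   # after MaxPooling2D: 5
size = conv_output(size)   # after Conv2D(64): 3
flat = size * size * 64    # after Flatten: 576

total_params = (
    conv_params(3, 1, 32)      # 320
    + conv_params(3, 32, 64)   # 18496
    + conv_params(3, 64, 64)   # 36928
    + dense_params(flat, 64)   # 36928
    + dense_params(64, 10)     # 650
)
print(size, flat, total_params)
```

The spatial size shrinks from 28 x 28 to 3 x 3 while the number of channels grows from 1 to 64, so the `Flatten` layer hands a vector of 576 values to the dense layers.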

Now, let's train our CNN on the MNIST digits. We will reuse a lot of the code we have already covered in the MNIST example from the previous notebook.

In [None]:
from keras.datasets import mnist
from keras.utils import to_categorical

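# 60,000 training images and 10,000 test images of 28 x 28 pixels each.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Add a channel axis and rescale pixel values from [0, 255] to [0, 1].
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the integer labels (e.g. 3 becomes a vector with a 1 at index 3).
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)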
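
To see what this preprocessing does without downloading MNIST, here is a small NumPy-only sketch. The arrays `fake_images` and `fake_labels` are hypothetical stand-ins for a tiny batch of data, and `np.eye` is used here as a simple substitute for `to_categorical`.

```python
import numpy as np

# Hypothetical stand-ins for a tiny batch of MNIST data.
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([5, 0, 4, 9])

# Add a channel axis and rescale pixels from [0, 255] to [0, 1].
x = fake_images.reshape((-1, 28, 28, 1)).astype('float32') / 255

# One-hot encode: row i of the identity matrix is the encoding of class i.
y = np.eye(10, dtype='float32')[fake_labels]

print(x.shape, x.min() >= 0.0, x.max() <= 1.0)
print(y[0])
```

Each label becomes a vector of length 10 with a single 1 at the index of the digit, which is the target format expected by the softmax output layer.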

In [None]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)
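
The `categorical_crossentropy` loss compares the softmax probabilities produced by the last layer with the one-hot label. The following is a simplified NumPy sketch for a single image, not the Keras implementation; the scores in `logits` are hypothetical values chosen for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def categorical_crossentropy(one_hot_label, probabilities, eps=1e-7):
    # Only the probability assigned to the true class contributes to the sum.
    clipped = np.clip(probabilities, eps, 1.0)
    return -np.sum(one_hot_label * np.log(clipped))

# Hypothetical raw scores for the ten digit classes.
logits = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 0.0, 0.3, 0.4, 0.2, 0.1])
probs = softmax(logits)

# One-hot label for the digit 3.
label = np.eye(10)[3]

loss = categorical_crossentropy(label, probs)
print(probs.argmax(), loss)
```

The loss is small when the network assigns a high probability to the correct digit and grows as that probability approaches zero, which is what training tries to minimize.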

Let's evaluate the model on the test data:

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)

In [None]:
test_acc

While our MLP network from the previous notebook had a test accuracy of approximately 92%, our basic CNN reaches a test accuracy of approximately 99%!

This interactive Python Notebook is based on https://github.com/fchollet/deep-learning-with-python-notebooks and https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/.