# Introduction to CNN

The problem is about **classifying grayscale images of handwritten digits** (28 pixels by 28 pixels) into their 10 categories (0 to 9), exactly like the one in chapter 2. This time we will try to solve the problem with a convolutional neural network (CNN) and see whether performance improves.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import models, layers, optimizers, losses, metrics
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

## Build the network

A basic convnet will be used: a stack of `Conv2D` and `MaxPooling2D` layers.
Importantly, a CNN takes as input tensors of shape (`image_height, image_width, image_channels`), not including the batch dimension. In this case, we configure the CNN to process images matching the MNIST format, which are of shape (28, 28, 1). This is the input shape passed to the first layer.
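As a small illustration of this shape convention, the sketch below builds a dummy batch with NumPy. The array contents and the batch size of 4 are placeholders chosen for the example, not real MNIST data.

```python
import numpy as np

# Hypothetical batch of 4 blank grayscale images, 28 x 28 pixels each.
batch = np.zeros((4, 28, 28), dtype="float32")

# Add the trailing channel axis: (batch, height, width, channels).
batch = batch[..., np.newaxis]

print(batch.shape)      # (4, 28, 28, 1)
print(batch.shape[1:])  # (28, 28, 1), the per-image shape used as input_shape
```

The first axis is the batch dimension, which is why it does not appear in `input_shape`.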

In [2]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


The output of every `Conv2D` and `MaxPooling2D` layer is also a tensor of shape (`height, width, channels`). Note how the height and width shrink as we go deeper into the network. The number of channels is controlled by the first argument passed to each `Conv2D` layer (32 or 64 here).
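The shrinking height and width can be checked by hand. The sketch below assumes the Keras defaults used in this model: `'valid'` padding with stride 1 for the convolutions, and a pooling stride equal to the pool size. The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool


size = 28
trace = [size]
# Same layer order as the model: conv, pool, conv, pool, conv.
for step in (conv_out, pool_out, conv_out, pool_out, conv_out):
    size = step(size)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5, 3]
```

These values match the height and width columns in the summary above.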

While the 3rd dimension of the input shape indicates the color channels, in the output of a layer it indicates the number of filters (features) computed over its input. So every entry along the depth axis is a feature (filter), and the corresponding 2D tensor is a spatial map of that filter's response over the input.
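The link between filters and parameters can be made concrete with a little arithmetic. In this sketch, each filter is assumed to hold one weight per kernel position and input channel, plus a single bias. The function name `conv2d_params` is hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """(weights per filter + 1 bias) multiplied by the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


counts = [
    conv2d_params(3, 3, 1, 32),   # conv2d: 1 input channel, 32 filters
    conv2d_params(3, 3, 32, 64),  # conv2d_1: 32 input channels, 64 filters
    conv2d_params(3, 3, 64, 64),  # conv2d_2: 64 input channels, 64 filters
]

print(counts)       # [320, 18496, 36928]
print(sum(counts))  # 55744
```

The results agree with the `Param #` column and the total reported by `model.summary()`.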

### Insert a classifier on top of the CNN

This is a classification problem, so we feed the last output tensor of the CNN into a densely connected network, similar to the one in chapter 2. There are 10 categories, so this is a 10-way classification: the final layer has 10 outputs and a `softmax` activation.
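To show what the `softmax` activation does with the 10 outputs, here is a minimal pure-Python sketch. It is an illustration of the formula only, not the Keras implementation, and the raw scores are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the 10 digit classes; index 7 is the largest.
scores = [0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -0.8, 4.0, 0.4, -0.1]
probs = softmax(scores)

predicted_digit = probs.index(max(probs))
print(predicted_digit)  # 7
```

The predicted digit is simply the index of the largest probability.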

In [3]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                3

Note that the `Flatten` layer turns the output of the last `Conv2D` layer, of shape `(3, 3, 64)`, into a vector of length `3 * 3 * 64 = 576`. This step is needed so that the 2 dense layers receive one flat feature vector per image.
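The same kind of arithmetic applies to the classifier. The sketch below computes the flattened length and the parameter count of each dense layer, assuming one weight per input-unit pair plus one bias per unit. The values are computed from that formula rather than copied from the summary, and `dense_params` is a hypothetical helper name.

```python
height, width, channels = 3, 3, 64     # output shape of the last Conv2D layer
flat_size = height * width * channels  # length of the vector produced by Flatten
print(flat_size)  # 576


def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units


print(dense_params(flat_size, 64))  # 36928
print(dense_params(64, 10))         # 650
```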

## Load dataset and preprocess data

`train_images` and `train_labels` form the "training set", the data that the model will learn from. The model will then be tested on the "test set", `test_images` and `test_labels`. The images are encoded as NumPy arrays, and the labels are simply an array of digits, ranging from 0 to 9. There is a one-to-one correspondence between the images and the labels.

Before training, we preprocess the data by reshaping it into the shape that the CNN expects and scaling the pixel values to the [0, 1] interval. We also need to categorically encode the labels (one-hot encoding).

In [10]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [11]:
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

In [12]:
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

In [13]:
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
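To see what these preprocessing steps do, the sketch below applies them to a tiny made-up dataset. The 2 x 2 "images" and the three labels are placeholders, not MNIST data. The one-hot step uses `np.eye` as a stand-in that should give an equivalent result to `to_categorical` for integer labels 0 to 9.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 tiny images with pixel values in 0..255.
images = np.array([[[0, 128], [255, 64]]] * 3, dtype="uint8")  # shape (3, 2, 2)
labels = np.array([3, 0, 9])

# Add a channel axis and rescale pixels to the [0, 1] range.
x = images.reshape((3, 2, 2, 1)).astype("float32") / 255

# One-hot encode: row i has a 1 in column labels[i] and 0 elsewhere.
y = np.eye(10, dtype="float32")[labels]

print(x.shape)  # (3, 2, 2, 1)
print(y.shape)  # (3, 10)
print(y[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```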

## Compile and train model

In [14]:
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1499b76a0>

## Evaluation

In [15]:
test_loss, test_acc = model.evaluate(test_images, test_labels)



In [16]:
print('test_acc:', test_acc)

test_acc: 0.9909


The test set accuracy is *99.1%*, a clear improvement over the fully connected network in chapter 2, which reached *97.8%*.
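One way to put the gain in perspective is to compare error rates rather than accuracies. The short calculation below uses only the two accuracy figures quoted above; the variable names are chosen for this illustration.

```python
dense_acc = 97.8  # fully connected network from chapter 2, in percent
cnn_acc = 99.1    # convolutional network from this section, in percent

dense_err = 100 - dense_acc  # about 2.2% of test images misclassified
cnn_err = 100 - cnn_acc      # about 0.9% of test images misclassified

# Fraction of the dense network's errors that the CNN eliminates.
relative_reduction = (dense_err - cnn_err) / dense_err
print(f"{relative_reduction:.0%}")  # 59%
```

Measured this way, the CNN removes more than half of the errors made by the fully connected network.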