# Convolutional Neural Networks (CNNs)

The figure below shows the CNN model for MNIST digit classification. Instead of a flat input vector as in the MLP, the input tensor now has three dimensions, (`height`, `width`, `channels`), or `(image_size, image_size, 1) = (28, 28, 1)` for the grayscale MNIST images. The train and test images must be reshaped to conform to this input shape.

In [1]:
%tensorflow_version 2.x

# import libraries
import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.datasets import mnist

# load mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# compute the number of labels
num_labels = len(np.unique(y_train))

# convert to one-hot vector
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# input image dimensions
image_size = X_train.shape[1]

# reshape to (samples, height, width, channels) and normalize to [0, 1]
X_train = np.reshape(X_train,[-1, image_size, image_size, 1])
X_test = np.reshape(X_test,[-1, image_size, image_size, 1])
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# network parameters
# image is processed as is (square grayscale)
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
pool_size = 2
filters = 64
dropout = 0.2

# model is a stack of CNN-ReLU-MaxPooling
model = Sequential()
model.add(Conv2D(filters=filters,
                 kernel_size=kernel_size,
                 activation='relu',
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size))
model.add(Conv2D(filters=filters,
                 kernel_size=kernel_size,
                 activation='relu'))
model.add(MaxPooling2D(pool_size))
model.add(Conv2D(filters=filters,
                 kernel_size=kernel_size,
                 activation='relu'))
model.add(Flatten())

# dropout added as regularizer
model.add(Dropout(dropout))

# output layer is a 10-dim softmax vector, matching the one-hot labels
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.summary()
plot_model(model, to_file='cnn-mnist.png', show_shapes=True)

# loss function for one-hot vector
# use of adam optimizer
# accuracy is good metric for classification tasks
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# train the network
model.fit(X_train, y_train, epochs=10, batch_size=batch_size)
loss, acc = model.evaluate(X_test, y_test, batch_size=batch_size)

print("\nTest accuracy: %.1f%%" % (100.0 * acc))

TensorFlow 2.x selected.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_____

The major change here is the use of `Conv2D` layers. The `relu` activation function is passed as an argument of `Conv2D`. It can instead be written as a separate `Activation` layer when a __batch normalization__ layer is included in the model, so that the normalization can sit between the convolution and the activation. Batch normalization is used in deep CNNs so that large learning rates can be used without causing instability during training.
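As a minimal sketch of that arrangement, the block below builds a single convolution stage with `relu` pulled out into its own `Activation` layer after `BatchNormalization`. The name `bn_model` and the choice to place normalization before the activation are assumptions made for illustration; the model in this notebook does not use batch normalization.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Input

# Hypothetical single-stage model: convolution without a built-in activation,
# then batch normalization, then relu as a separate layer.
bn_model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(filters=64, kernel_size=3),   # no activation argument here
    BatchNormalization(),
    Activation('relu'),
])

print(bn_model.output_shape)
```

The output shape is unchanged by the two extra layers; only the order of operations inside the stage differs.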

## Convolution

Where the number of units characterizes the `Dense` layers in the MLP model, the kernel characterizes the CNN operations. As shown in the figure below, the kernel can be visualized as a rectangular patch or window that slides through the whole image from left to right and top to bottom. This operation is called __convolution__. It transforms the input image into __feature maps__, which are a representation of what the kernel has _learned_ from the input image. The feature maps are then transformed into another set of feature maps in the succeeding layer, and so on. The number of feature maps generated per `Conv2D` is controlled by the `filters` argument.
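The sliding-window operation can be sketched in plain NumPy. This is an illustration only, not how Keras implements `Conv2D`: the helper names `conv2d_valid` and `conv2d_params` are hypothetical, the kernels are random rather than learned, and only 4 filters are used instead of the 64 in the model. It assumes no padding and a stride of 1, which is what the output shapes in the model summary correspond to.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):          # top to bottom
        for col in range(out_w):      # left to right
            patch = image[row:row + kh, col:col + kw]
            feature_map[row, col] = np.sum(patch * kernel)
    return feature_map

def conv2d_params(kernel_size, in_channels, filters):
    """Weights of all kernels plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

rng = np.random.default_rng(0)
image = rng.random((28, 28), dtype=np.float32)               # stand-in for one MNIST image
kernels = rng.standard_normal((4, 3, 3)).astype(np.float32)  # 4 random 3 x 3 kernels

# One feature map per kernel, stacked along the channel axis.
feature_maps = np.stack([conv2d_valid(image, k) for k in kernels], axis=-1)
print(feature_maps.shape)        # (26, 26, 4)

print(conv2d_params(3, 1, 64))   # 640, first Conv2D in the summary
print(conv2d_params(3, 64, 64))  # 36928, second and third Conv2D
```

A 3 × 3 kernel fits into 26 positions along each 28-pixel axis, which is why the first `Conv2D` output in the summary is `(None, 26, 26, 64)`.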

## Pooling Operations

The last change is the addition of `MaxPooling2D` layers with the argument `pool_size=2`. `MaxPooling2D` compresses each feature map: every patch of size `pool_size × pool_size` is reduced to one pixel whose value is the maximum pixel value within the patch. `MaxPooling2D` is shown in the following figure for two patches:
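The patch-wise maximum can be reproduced with a short NumPy sketch. The helper `max_pool2d` is a hypothetical name, and it assumes non-overlapping patches with leftover rows and columns dropped, which is consistent with the 11 × 11 to 5 × 5 step in the model summary.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2):
    """Reduce every pool_size x pool_size patch to its maximum value."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Drop any leftover rows/columns that do not fill a complete patch.
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    patches = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    return patches.max(axis=(1, 3))

feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 1, 2, 3],
                        [0, 5, 4, 1]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[4 8]
#  [9 4]]
```

Each of the four 2 × 2 patches contributes exactly one value to the output, so the height and width are both halved.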

The significance of `MaxPooling2D` is the reduction in feature map size, which translates to increased kernel coverage. For example, after `MaxPooling2D(2)`, a 2 × 2 kernel approximately convolves with a 4 × 4 patch of the feature map before pooling. The CNN thus learns a new set of feature maps at a different coverage.

There are other means of pooling and compression. For example, to achieve the same 50% reduction in height and width as `MaxPooling2D(2)`, `AveragePooling2D(2)` takes the average of a patch instead of finding the maximum. Strided convolution, `Conv2D(strides=2, ...)`, moves the kernel two pixels at a time, skipping every other position, and has the same 50% size reduction effect. There are subtle differences in the effectiveness of each reduction technique.
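Both alternatives can be sketched in NumPy to compare the resulting sizes. The helpers `avg_pool2d` and `strided_conv2d` are hypothetical names, the averaging kernel is an arbitrary choice, and the strided convolution assumes no padding; none of this is taken from the Keras implementation.

```python
import numpy as np

def avg_pool2d(feature_map, pool_size=2):
    """Reduce every pool_size x pool_size patch to its mean value."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    return trimmed.reshape(out_h, pool_size, out_w, pool_size).mean(axis=(1, 3))

def strided_conv2d(image, kernel, stride=2):
    """Convolve without padding, moving the kernel `stride` pixels per step."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            out[row, col] = np.sum(image[top:top + kh, left:left + kw] * kernel)
    return out

image = np.arange(28 * 28, dtype=np.float64).reshape(28, 28)
kernel = np.ones((3, 3)) / 9.0        # arbitrary kernel, for shape comparison only

# A 26 x 26 crop stands in for the output of the first Conv2D layer.
print(avg_pool2d(image[:26, :26]).shape)    # (13, 13)
# A strided 3 x 3 convolution applied directly to the 28 x 28 input.
print(strided_conv2d(image, kernel).shape)  # (13, 13)
```

Under these assumptions, one strided convolution reaches the same 13 × 13 size as a stride-1 convolution followed by pooling, although the values it produces are not the same.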

In `Conv2D` and `MaxPooling2D`, both `kernel_size` and `pool_size` can be non-square. In these cases, both the row and column sizes must be indicated. For example, `pool_size=(1, 2)` and `kernel_size=(3, 5)`.
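The effect of non-square sizes on the output shape can be worked out per axis. The sketch below is plain arithmetic rather than a Keras call; the helper names are hypothetical, and it assumes no padding, a convolution stride of 1, and a pooling stride equal to the pool size.

```python
def valid_output_size(input_size, window, stride=1):
    """Number of positions a window fits along one axis without padding."""
    return (input_size - window) // stride + 1

def output_shape_2d(input_hw, window_hw, stride_hw):
    """Apply the per-axis size rule to (rows, columns) independently."""
    return tuple(valid_output_size(n, k, s)
                 for n, k, s in zip(input_hw, window_hw, stride_hw))

# kernel_size=(3, 5) on a 28 x 28 image, stride 1 in both directions
conv_shape = output_shape_2d((28, 28), (3, 5), (1, 1))
print(conv_shape)   # (26, 24)

# pool_size=(1, 2): rows are kept as is, columns are halved
pool_shape = output_shape_2d(conv_shape, (1, 2), (1, 2))
print(pool_shape)   # (26, 12)
```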

In this model, the output of the last `Conv2D` is a stack of feature maps. The role of `Flatten` is to convert the stack of feature maps into a vector format that is suitable for either `Dropout` or `Dense` layers, similar to the MLP model output layer.
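The sizes in the model summary can be traced by hand, ending with the length of the flattened vector. This sketch assumes no padding, a stride of 1 for the convolutions, and a stride of 2 for the pooling layers; the variable names and the zero-filled batch are illustrative only.

```python
import numpy as np

# (layer name, window size, stride) in the order the layers appear in the model
layers = [("conv2d", 3, 1), ("max_pooling2d", 2, 2),
          ("conv2d_1", 3, 1), ("max_pooling2d_1", 2, 2),
          ("conv2d_2", 3, 1)]

size = 28
sizes = []
for name, window, stride in layers:
    size = (size - window) // stride + 1
    sizes.append(size)
    print(name, (size, size, 64))

# Flatten turns each (3, 3, 64) stack into a single vector per sample.
batch = np.zeros((128, size, size, 64), dtype=np.float32)
flat = batch.reshape(len(batch), -1)
print(flat.shape)   # (128, 576)
```

The 576 entries per sample match the `flatten` row of the summary, since 3 × 3 × 64 = 576.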