# 07 - Deep Learning

## 03 - CNN Introduction

![](https://images.unsplash.com/photo-1542378993-3aa1366b0090?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1050&q=80)
Picture by [Sergey Pesterev](https://unsplash.com/photos/PChvdQjvTO4)

___

Today we will learn about one of the most famous kinds of Deep Learning methods: Convolutional Neural Networks (also called ConvNets or CNNs). But first, we need to talk a bit about image processing in order to introduce the necessary tools.

# I. Computer Vision introduction

Images are just numbers! So everything we have learnt about Machine Learning and Deep Learning can be applied to images, for both classification and regression!
![](images/computer_vision_task.png)

But unlike other kinds of data, images have **spatial structure**! Indeed, the same object can be seen from various viewpoints, deformed, scaled, occluded, etc.

![](images/Computer_Vision_limits.png)

That's why, in order to take full advantage of this spatial structure, we need to **add spatial information to our models**. This is where Convolutional Neural Networks (CNN) come in! So first, let's talk a bit about image processing.

# II. Image Processing

## II.1. Padding

Padding an image means adding borders to it, which increases the size of the image. Most of the time, images are padded with zeros.

Below is an example of a 4x4 image before padding, and after padding:

![](images/padding_image.png)

Padding can also add more than one row (and column) of zeros on each side.
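In NumPy, zero padding can be sketched with `np.pad`. The 4x4 array below is made up for illustration. `pad_width` is the number of rows/columns of zeros added on each side.

```python
import numpy as np

# A made-up 4x4 "image" (values are arbitrary)
image = np.arange(1, 17).reshape(4, 4)

# One row/column of zeros on each side: 4x4 -> 6x6
padded_1 = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Two rows/columns of zeros on each side: 4x4 -> 8x8
padded_2 = np.pad(image, pad_width=2, mode="constant", constant_values=0)

print(padded_1.shape)  # (6, 6)
print(padded_2.shape)  # (8, 8)
print(padded_1)
```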

## II.2. Kernel

A kernel in image processing is nothing more than a small image (or a small matrix) that we use to compute a convolution.

A kernel is therefore smaller than the image, and usually much smaller. Kernels usually have odd dimensions, so that they have a central pixel. Typical kernel sizes are:
* 3x3
* 5x5
* 7x7
* 9x9

Depending on the values of the kernel, the convolution will provide different results.

## II.3 Convolution

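A convolution is a mathematical operation that you may have already encountered. The kernel slides over the image. At each position, the pixels under the kernel are multiplied element-wise by the kernel values and summed, giving one pixel of the output image. For example, below is the result of the convolution of an image with a 3x3 kernel of ones: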

![](images/convolution_animated.gif)

Of course, the kernel does not have to be uniform. Below is an example of the convolution of an image I with a non-uniform kernel K:

![](images/convolution_kernel.png)
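Here is a minimal sketch of this sliding-window computation in NumPy. It assumes a single-channel image, no padding and a stride of 1. The function name `convolve2d_valid` and the 5x5 image are made up for illustration. Like deep learning libraries, this sketch does not flip the kernel, so strictly speaking it computes a cross-correlation. For a symmetric kernel such as the kernel of ones, both operations give the same result.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Naive 2D convolution: no padding, stride 1, kernel not flipped.

    The output has shape (H - kh + 1, W - kw + 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Made-up 5x5 binary image and a 3x3 kernel of ones
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.ones((3, 3))

result = convolve2d_valid(image, kernel)
print(result)  # 3x3 output: [[6, 7, 6], [4, 7, 7], [4, 6, 6]] (as floats)
```

Each output pixel is the number of ones in the corresponding 3x3 window of the input, which is why a 5x5 image gives a 3x3 result.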

A well-chosen kernel can be very useful for detecting patterns in images. Here is a list of commonly used kernels and what they do to a given image (from [Wikipedia](https://en.wikipedia.org/wiki/Kernel_(image_processing))):

| Kernel | Usage | Example |
|:--:|:--:|:--:|
| $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ | Identity | ![](images/Vd-Orig.png) |
| $\begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}$ | Edge detection | ![](images/Vd-Edge1.png) |
| $\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ | Edge detection | ![](images/Vd-Edge2.png) |
| $\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$ | Edge detection | ![](images/Vd-Edge3.png) |
| $\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}$ | Sharpening | ![](images/Vd-Sharp.png) |
| $\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$ | Gaussian blur (3x3 approximation) | ![](images/Vd-Blur1.png) |


So, to summarize, a convolution is a mathematical operation on an image with a given kernel, resulting in a new, processed image.
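The sketch below shows what these kernels do numerically. It applies each of them to a made-up 6x6 image containing a single vertical edge. The sliding-window idea is written here with `numpy.lib.stride_tricks.sliding_window_view`, which requires NumPy 1.20 or later. The helper name and the image are hypothetical. All six kernels are unchanged by a 180° rotation, so whether the kernel is flipped makes no difference here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve2d_valid(image, kernel):
    """Convolution with no padding and stride 1, kernel not flipped (as in CNN libraries)."""
    # windows[i, j] is the kh x kw patch of `image` whose top-left corner is (i, j)
    windows = sliding_window_view(image, kernel.shape)
    # Multiply each patch element-wise by the kernel and sum
    return np.einsum("ijkl,kl->ij", windows, kernel)


# The kernels of the table above
kernels = {
    "identity": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]]),
    "edge_1": np.array([[1, 0, -1], [0, 0, 0], [-1, 0, 1]]),
    "edge_2": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]),
    "edge_3": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]),
    "gaussian_blur": np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16,
}

# Made-up 6x6 image with a vertical edge: dark (0) on the left, bright (1) on the right
image = np.zeros((6, 6))
image[:, 3:] = 1.0

outputs = {name: convolve2d_valid(image, k) for name, k in kernels.items()}
for name, k in kernels.items():
    # Every row of the 4x4 output is identical, since the image only varies along columns
    print(f"{name:14s} kernel sum = {float(k.sum()):5.2f}  output row = {outputs[name][0]}")
```

The edge detection kernels sum to 0. They therefore output 0 on flat regions and respond only around the edge. The first one combines diagonal differences, so it also gives 0 on a purely vertical edge. The identity, sharpening and (normalized) blur kernels sum to 1, so they leave flat regions unchanged. This is why the blur kernel is divided by 16.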

## II.4. Strides

So far, we know how to pad an image and how to compute a convolution with a given kernel. The last tool we need is the stride.

The stride is the step, in pixels, by which the kernel moves between two consecutive positions. Up to now, all our examples used a stride of 1, as in the example below:

![](images/Stride1.png)

As we can see in this example, a 7x7 image convolved with a 3x3 kernel results in a 5x5 image.

What if we use a stride of 2, meaning that our kernel moves by 2 pixels at each step?

![](images/Stride2.png)

With a stride of 2, a 7x7 input image convolved with a 3x3 kernel results in a 3x3 image.
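Take an $n \times n$ input, a $k \times k$ kernel, a padding of $p$ pixels on each side and a stride $s$. The output size is then $\lfloor (n + 2p - k)/s \rfloor + 1$. With $n=7$, $k=3$ and $p=0$, this gives 5 for $s=1$ and 3 for $s=2$, as in the figures above. Here is a minimal NumPy sketch of a strided convolution. The helper names are hypothetical, and the sketch assumes no padding and does not flip the kernel.

```python
import numpy as np


def output_size(n, k, stride=1, pad=0):
    """Output length along one axis: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1


def convolve2d_strided(image, kernel, stride=1):
    """Naive convolution without padding; the kernel jumps `stride` pixels at each step."""
    kh, kw = kernel.shape
    oh = output_size(image.shape[0], kh, stride)
    ow = output_size(image.shape[1], kw, stride)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


image = np.random.default_rng(0).random((7, 7))  # any 7x7 image
kernel = np.ones((3, 3))

print(convolve2d_strided(image, kernel, stride=1).shape)  # (5, 5)
print(convolve2d_strided(image, kernel, stride=2).shape)  # (3, 3)
print(output_size(7, 3, stride=1, pad=1))                 # 7: a padding of 1 keeps the size
```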

# III. Convolutional Neural Networks

## III.1. Introduction

Why would we use convolutions in neural networks? Isn't an MLP complicated enough?

Well, MLPs are indeed complex enough, but they do not take *spatial* correlations into account. And in an image, there are spatial correlations everywhere:
* In a face, the eyes are always close to the nose, and the nose is always in the middle of the face.
* A bike always has two wheels.
* All numbers and letters have a particular shape.

A classical MLP receives the image as a flat vector of pixels. In all of those examples, it therefore does not see the spatial correlations and has to learn them by itself. With convolutions, it is much easier for the neural network to capture the structure of the data.

Historically, convolutional neural networks were not always popular. Yann LeCun, widely regarded as one of the creators of this technique, is now a rock star in AI, but he went through some rough years in academic research.

CNNs took off with [AlexNet](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) in 2012. This CNN achieved a breakthrough in Computer Vision by winning the ImageNet (ILSVRC) 2012 challenge by a large margin. Since then, CNNs have been very popular and widely used in Computer Vision.

![](images/CNN_features_levels.png)

## III.2. Convolutional Layer

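To build a CNN, we first need a convolutional layer. A convolutional layer simply applies convolutions to its input data, with one key difference from classical image processing: the kernel values are not chosen by hand but learned during training. We still have several options to choose: the number of kernels, their size, the padding and the stride.

Here is the signature of the convolutional layer in TensorFlow (Keras):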

`tensorflow.keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)`

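The most useful parameters are:
* `filters`: the number of kernels to use (i.e. the number of output channels)
* `kernel_size`: the size of the kernels (e.g. `(3, 3)` for a 3x3 kernel)
* `strides`: the stride value
* `padding`: `'valid'` for no padding, `'same'` to pad the input so that the output has the same height and width as the input (with a stride of 1)
* `activation`: the activation function applied to the output (e.g. `'relu'`)
* `kernel_regularizer`: a regularizer applied to the kernel weights (e.g. L2 regularization)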

We will see how to use it in an example in the last section.
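Here is a rough sketch of the shape and parameter arithmetic behind `Conv2D`, assuming TensorFlow's usual conventions. With `padding='valid'`, the output size along each axis is $\lceil (n - k + 1)/s \rceil$. With `padding='same'`, it is $\lceil n/s \rceil$. Each filter has $k_h \times k_w \times c_{in}$ weights, plus one bias when `use_bias=True`. The helper functions below are hypothetical and do not call TensorFlow. Their results should agree with the output shapes and parameter counts reported by `model.summary()`.

```python
import math


def conv2d_output_size(n, k, stride=1, padding="valid"):
    """Output length along one axis, assuming TensorFlow's 'valid'/'same' conventions."""
    if padding == "valid":
        return math.ceil((n - k + 1) / stride)
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")


def conv2d_params(filters, kernel_size, in_channels, use_bias=True):
    """One kh x kw x in_channels kernel per filter, plus one bias per filter."""
    kh, kw = kernel_size
    return filters * kh * kw * in_channels + (filters if use_bias else 0)


# e.g. Conv2D(filters=6, kernel_size=(3, 3)) on a 28x28 grayscale image
print(conv2d_output_size(28, 3, padding="valid"))            # 26
print(conv2d_output_size(28, 3, padding="same"))             # 28
print(conv2d_output_size(28, 3, stride=2, padding="same"))   # 14
print(conv2d_params(filters=6, kernel_size=(3, 3), in_channels=1))  # 60
```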

## III.3. Pooling Layer

A pooling layer reduces the size of the images by replacing each small window of pixels with its maximum value (max pooling) or its average value (average pooling).

Below is an example of a max pooling layer with a 2x2 kernel and a stride of 2, applied to a 4x4 image:

![](images/max_pooling.jpeg)

A pooling layer is usually used right after a convolutional layer.

Pooling layers are defined in Keras as well, with the following signature:

`tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)`

Where:
* `pool_size` is the size of the pooling window (the kernel)
* `strides` is the stride value; if `None`, it defaults to `pool_size`
* `padding` can be `'valid'` for no padding or `'same'` to pad the input evenly, so that the output size is the input size divided by the stride, rounded up
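Here is a minimal NumPy sketch of pooling without padding, supporting both max and average pooling. `pool2d` and the 4x4 input are made up for illustration. As in Keras, the stride defaults to the pool size.

```python
import numpy as np


def pool2d(image, pool_size=2, stride=None, mode="max"):
    """Pooling without padding; as in Keras, the stride defaults to pool_size."""
    if stride is None:
        stride = pool_size
    reducer = {"max": np.max, "average": np.mean}[mode]
    oh = (image.shape[0] - pool_size) // stride + 1
    ow = (image.shape[1] - pool_size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = reducer(image[r:r + pool_size, c:c + pool_size])
    return out


# Made-up 4x4 image
image = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
print(pool2d(image, mode="max"))      # [[6, 8], [3, 4]]
print(pool2d(image, mode="average"))  # [[3.25, 5.25], [2.0, 2.0]]
```

Note that a pooling layer has nothing to learn: it only reduces each window with a fixed function.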

# IV. LeNet-5 on MNIST digits

## IV.1. Architecture diagram

We will first look at the architecture of LeNet-5, the convolutional network proposed by [LeCun et al.](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf) for digit classification.

![](images/Lenet5.png)

We will now decompose and explain the layers step by step:
* Input (32x32): the input image of a digit
* C1 (6@28x28): a convolutional layer with 6 filters of 5x5
* S2 (6@14x14): a 2x2 subsampling (pooling) layer
* C3 (16@10x10): a convolutional layer with 16 filters of 5x5
* S4 (16@5x5): a 2x2 subsampling (pooling) layer
* C5 (120): a layer of 120 units fully connected to S4 (in the paper it is a 5x5 convolution, which amounts to a full connection since S4 is 5x5)
* F6 (84): a fully connected layer (MLP) of 84 units
* Output (10): the output layer, with one unit per digit (the paper uses RBF units; modern implementations use a softmax layer of 10 units)
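The feature map sizes above follow from two formulas. A convolution without padding and with stride 1 gives $n - k + 1$. A 2x2 pooling with stride 2 gives $\lfloor n/2 \rfloor$. The hypothetical helper below tracks the shapes for two cases. The first is the original 32x32 input with the 5x5 kernels of the paper. The second is the 28x28 MNIST input with the 3x3 kernels used in the Keras implementation below. Both end with 16@5x5 feature maps, i.e. 400 values before C5.

```python
def lenet_shapes(input_size, kernel_size):
    """Feature map shapes (channels, height, width) of the LeNet-5 feature extractor,
    assuming 'valid' convolutions with stride 1 and 2x2 pooling with stride 2."""
    n = input_size
    shapes = {}
    n = n - kernel_size + 1           # C1: convolution without padding
    shapes["C1"] = (6, n, n)
    n = n // 2                        # S2: 2x2 pooling, stride 2
    shapes["S2"] = (6, n, n)
    n = n - kernel_size + 1           # C3
    shapes["C3"] = (16, n, n)
    n = n // 2                        # S4
    shapes["S4"] = (16, n, n)
    shapes["flattened"] = 16 * n * n  # number of inputs to C5
    return shapes


print(lenet_shapes(32, 5))  # original paper: 32x32 input, 5x5 kernels
print(lenet_shapes(28, 3))  # Keras implementation below: 28x28 MNIST input, 3x3 kernels
```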

## IV.2. Implementation

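Here is an implementation of LeNet-5 in Keras, adapted to the 28x28 MNIST images. It uses 3x3 kernels without padding, ReLU activations and max pooling. As a result, the intermediate feature maps are 26x26, 13x13 and 11x11 instead of 28x28, 14x14 and 10x10, while S4 still outputs 16@5x5: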

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling2D, Conv2D, Flatten, Dense


def lenet5():
    model = Sequential()

    # Layer C1: 6 filters of 3x3 on 28x28 grayscale images (1 channel)
    model.add(Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
    # Layer S2
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Layer C3: 16 filters of 3x3
    model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
    # Layer S4
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Before going into layer C5, we flatten the 16@5x5 feature maps into a vector
    model.add(Flatten())
    # Layer C5
    model.add(Dense(units=120, activation='relu'))
    # Layer F6
    model.add(Dense(units=84, activation='relu'))
    # Output layer: one unit per digit
    model.add(Dense(units=10, activation='softmax'))

    return model
```
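As a cross-check of the model above, here is a hand count of its trainable parameters. This sketch assumes the 28x28x1 input and one bias per filter or unit, which is the default for `Conv2D` and `Dense`. The helper names are hypothetical. If TensorFlow is available, `lenet5().summary()` should report the same total, 60,074.

```python
def conv_params(filters, k, in_channels):
    """k x k x in_channels weights per filter, plus one bias per filter."""
    return filters * k * k * in_channels + filters


def dense_params(units, inputs):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return units * inputs + units


params = {
    "C1 Conv2D(6, 3x3)": conv_params(6, 3, 1),
    "S2 MaxPooling2D": 0,                            # pooling has no trainable weights
    "C3 Conv2D(16, 3x3)": conv_params(16, 3, 6),     # 6 input channels coming from S2
    "S4 MaxPooling2D": 0,
    "C5 Dense(120)": dense_params(120, 5 * 5 * 16),  # 400 inputs after Flatten
    "F6 Dense(84)": dense_params(84, 120),
    "Output Dense(10)": dense_params(10, 84),
}
for name, count in params.items():
    print(f"{name:20s} {count:>7,d}")
total = sum(params.values())
print("Total:", total)  # 60074
```

Most of the parameters (48,120) sit in the first dense layer, C5, while the two convolutional layers only have 940 between them.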

## IV.3. Application to MNIST Digits

We will now apply this model to the MNIST digits classification problem.

```python
from tensorflow.keras.datasets import mnist
# Import the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
```

```python
# Rescale the pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.
X_test = X_test / 255.
```

```python
# Reshape to (n_samples, 28, 28, 1): Conv2D expects a channel axis
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)
```
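Here are the two preprocessing steps above, applied to a dummy array standing in for MNIST. The real images are 28x28 `uint8` arrays with values from 0 to 255. With the default `data_format='channels_last'`, `Conv2D` expects inputs of shape `(batch, height, width, channels)`. Grayscale images therefore need an extra axis of size 1.

```python
import numpy as np

# Dummy stand-in for MNIST: 5 grayscale 28x28 images stored as uint8 values in [0, 255]
X = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

X = X / 255.                # rescale: uint8 in [0, 255] -> float64 in [0, 1]
X = X.reshape(*X.shape, 1)  # add a channel axis: (5, 28, 28) -> (5, 28, 28, 1)
# Equivalent alternatives for the last step: X[..., np.newaxis] or np.expand_dims(X, -1)

print(X.shape, X.dtype, X.min(), X.max())
```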

```python
from tensorflow.keras.utils import to_categorical
# Transform the targets into categorical (one-hot) vectors
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```
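`to_categorical` turns each integer label into a one-hot vector. This is the target format expected by the `categorical_crossentropy` loss used when compiling the model (the `sparse_categorical_crossentropy` loss would accept the integer labels directly). The same encoding can be sketched in plain NumPy. This is an equivalent for illustration, not Keras's implementation, and the labels are made up.

```python
import numpy as np

y = np.array([5, 0, 4, 1])  # made-up integer labels
one_hot = np.eye(10)[y]     # row y of the 10x10 identity matrix for each label

print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # 1.0 at index 5, 0.0 elsewhere
```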

```python
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

# Instantiate the model
model = lenet5()

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Define the callbacks
callbacks = [EarlyStopping(monitor='val_loss', patience=5),
             TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)]

# Finally fit the model
model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), epochs=10, batch_size=64, callbacks=callbacks)
```

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```
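Here is a simplified sketch of the patience rule behind `EarlyStopping(monitor='val_loss', patience=5)`: training stops once the validation loss has not improved for `patience` consecutive epochs. The helper and the loss values are hypothetical. The real Keras callback has more options (such as `min_delta` and `restore_best_weights`), so its exact bookkeeping may differ. With `epochs=10` and `patience=5`, early stopping can only trigger if the best validation loss is reached within the first five epochs.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Simplified patience rule: return the epoch at which training stops, i.e. once
    val_loss has not improved for `patience` consecutive epochs (or the last epoch)."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses over 10 epochs (not the ones of the run above)
losses = [0.10, 0.07, 0.06, 0.065, 0.07, 0.068, 0.072, 0.075, 0.08, 0.09]
print(early_stopping_epoch(losses, patience=5))  # 8
```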

```python
# Compute the accuracy (evaluate returns [loss, accuracy] here, hence the [1])
print('accuracy on train with NN:', model.evaluate(X_train, y_train)[1])
print('accuracy on test with NN:', model.evaluate(X_test, y_test)[1])
```

```
accuracy on train with NN: 0.9963167
accuracy on test with NN: 0.9872
```


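Our model reaches about 99.6% accuracy on the training set and 98.7% on the test set. Amazing, right?! Keep in mind, though, that the test set was also used as validation data (for early stopping), so this test accuracy may be slightly optimistic.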