# Deep Learning week - Day 3 - Exercise 1

Let's imagine for a moment that you work for the post office (and that it's the 1970s or 1980s). Every day you deal with an enormous amount of letters, and you want to automate reading the numbers that have been handwritten on them. This task, called _Handwriting Recognition_, was a very complex problem. It was tackled by Bell Labs (among others), where Yann Le Cun used to work and where systems like the one below were developed:

![Number recognition](recognition.gif)


The idea is that you take an image as input (not a video: the animation only shows what happens with different images) and try to predict the digit in the image. This is a classification task, where the output is the class (= digit) the image belongs to, from 0 to 9.

This task used to be quite complex back in the day, and it is still a benchmark that many people work on. For this reason, the MNIST (for *Modified National Institute of Standards and Technology*) dataset was created: it contains images of handwritten digits, from 0 to 9.

Your goal in this notebook is to build your first Convolutional Neural Network that works on such images and predicts the class of each digit image. Keep in mind that this first simple CNN will classify handwritten digits, which was a very complex task until the 1990s.

## The data

Keras ships several datasets within its Python package. You can load MNIST with the following commands:

In [None]:
from tensorflow.keras import datasets

(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data(path="mnist.npz")

❓ Question ❓ Let's look at some of the data. 

Select a few images from the training set and plot them with Matplotlib's `imshow` function, setting `cmap` to `gray` (otherwise Matplotlib applies its default colormap, and the displayed colors do not exist in the actual images).
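A minimal sketch of such a plot follows. `plot_digits` is a hypothetical helper, not part of the exercise, and the usage lines feed it random noise shaped like MNIST images so it runs without downloading anything; in the notebook you would pass `X_train` and `y_train` instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_digits(images, labels, n=5):
    """Plot the first `n` images side by side in grayscale, titled with their label."""
    # squeeze=False keeps `axes` 2-D even when n == 1
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2), squeeze=False)
    for ax, image, label in zip(axes[0], images[:n], labels[:n]):
        ax.imshow(image, cmap="gray")  # gray colormap: low values dark, high values bright
        ax.set_title(f"Label: {label}")
        ax.axis("off")
    return fig


# Stand-in data with MNIST's shape and dtype (random noise, not real digits)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=5)
fig = plot_digits(fake_images, fake_labels)
```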

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# YOUR PLOT HERE

Remember that neural networks converge faster when the input data are normalized? The same goes for input images.

❓ Question ❓ As a first preprocessing step, normalize your data. For images, this simply means dividing the pixel values by their maximal value, i.e. 255. Don't forget to do it on both your train and test data.

(N.B.: you can also center your data by subtracting 0.5, but it is not mandatory.)
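The arithmetic can be sketched with NumPy on a small stand-in array (random `uint8` noise in place of the real MNIST pixels, which are also `uint8` values in [0, 255]):

```python
import numpy as np

# Stand-in for MNIST pixels: uint8 values in [0, 255] (random noise, not real digits)
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Divide by the maximal pixel value; the result is a float64 array in [0, 1]
X_scaled = X_fake / 255.0

# Optional centering: shifts the range to [-0.5, 0.5]
X_centered = X_scaled - 0.5

print(X_scaled.min() >= 0.0, X_scaled.max() <= 1.0)
```

Writing `X = X / 255.` (rather than the in-place `X /= 255`) matters here: NumPy refuses to store the float result of an in-place division back into a `uint8` array and raises a casting error.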

In [None]:
# YOUR CODE HERE

❓ Question ❓ What is the shape of your images?

In [None]:
# YOUR CODE HERE

You can see that there are 60,000 training images, each of size (28, 28). However, Keras convolutional layers expect images whose last dimension is the number of channels (1 for grayscale images), and this dimension is missing here.

❓ Question ❓ Use `expand_dims` to add one dimension at the end of the training and test data. Then, print the shapes of X_train and X_test, which should respectively be (60000, 28, 28, 1) and (10000, 28, 28, 1).
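The sketch below uses NumPy's `np.expand_dims` on small zero-filled stand-in arrays rather than the Keras backend function imported in the next cell; the Keras version does the same reshaping but returns a TensorFlow tensor instead of a NumPy array.

```python
import numpy as np

# Stand-in arrays with MNIST-like image size, but only a few images
X_fake_train = np.zeros((6, 28, 28), dtype=np.float32)
X_fake_test = np.zeros((2, 28, 28), dtype=np.float32)

# axis=-1 appends a new axis of length 1 at the end: the channel dimension
X_fake_train = np.expand_dims(X_fake_train, axis=-1)
X_fake_test = np.expand_dims(X_fake_test, axis=-1)

print(X_fake_train.shape, X_fake_test.shape)  # (6, 28, 28, 1) (2, 28, 28, 1)
```

`X[..., np.newaxis]` is an equivalent NumPy spelling.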

In [None]:
from tensorflow.keras.backend import expand_dims

# YOUR CODE HERE

The last step in preparing your data is to convert your labels to one-hot encoded categories.

❓ Question ❓ Use `to_categorical` to transform your labels. Store the results in `y_train_cat` and `y_test_cat`.
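To see what one-hot encoding produces, here is a NumPy sketch built on the identity matrix. `one_hot` is a hypothetical helper that, as far as we know, matches the default `float32` output of `to_categorical` for integer labels from 0 to 9; use `to_categorical` itself in the exercise.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a float32 matrix where row i has a single 1 in column labels[i]."""
    # Indexing the identity matrix with the labels picks one basis row per label
    return np.eye(num_classes, dtype=np.float32)[labels]


y_fake = np.array([5, 0, 4, 1])
y_fake_cat = one_hot(y_fake)
print(y_fake_cat[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```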

In [None]:
from tensorflow.keras.utils import to_categorical

# YOUR CODE HERE

The data are now ready to be used.

## The Convolutional Neural Network _aka_ CNN

Now, build a Convolutional Neural Network. 

❓ Question ❓ Based on the course, build a neural network that has:
- a Conv2D layer with 8 filters, each of size (4, 4), with an input shape of (28, 28, 1), and the relu activation function
- a MaxPool2D layer with a pool_size of (2, 2)
- a Flatten layer
- a first Dense layer with 10 neurons and the relu activation function
- a last layer that is suited for your task

Inside the function, do not forget to compile the model so that it minimizes the `categorical_crossentropy` loss with the adam optimizer, and include accuracy among the metrics.
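Before writing the Keras code, it helps to work out the tensor shapes and parameter counts of this architecture by hand. The sketch below assumes Keras defaults (`padding="valid"`, stride 1 for `Conv2D`, pooling stride equal to `pool_size`) and assumes the last layer is a 10-neuron softmax `Dense` layer; once your model is built, `model.summary()` should report the same numbers.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Output length of max-pooling with stride equal to the pool size, no padding."""
    return size // pool


# Conv2D: 8 filters of size (4, 4) on a (28, 28, 1) input
h = conv_output_size(28, 4)          # 25
conv_params = (4 * 4 * 1 + 1) * 8    # 16 weights + 1 bias per filter -> 136

# MaxPool2D with pool_size (2, 2)
h = pool_output_size(h, 2)           # 12 (the 25th row and column are dropped)

# Flatten: 12 * 12 positions * 8 channels
flat = h * h * 8                     # 1152

# Dense(10, relu), then the assumed Dense(10, softmax) output layer
dense1_params = flat * 10 + 10       # 11530
dense2_params = 10 * 10 + 10         # 110

total = conv_params + dense1_params + dense2_params
print(h, flat, total)  # 12 1152 11776
```

Most of the parameters sit in the first `Dense` layer, not in the convolution: the 8 filters are shared across all image positions.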

In [None]:
from tensorflow.keras import layers
from tensorflow.keras import models


def initialize_model():
    model = models.Sequential()

    ### First convolution & max-pooling
    # YOUR CODE HERE

    ### Flattening
    # YOUR CODE HERE

    ### One fully connected
    # YOUR CODE HERE

    ### Last layer (a classification with 10 outputs, one per digit)
    # YOUR CODE HERE
    
    ### Model compilation
    # YOUR CODE HERE

    return model

❓ Question ❓ Initialize your model and fit it on the train data. Do not forget to use a validation set and an early stopping criterion (with a patience of 2).
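The rule behind `patience=2` can be sketched in plain Python. `last_trained_epoch` is a hypothetical, simplified model of Keras's `EarlyStopping` (monitoring `val_loss`, the default, with `min_delta=0`), and the loss values are made up for illustration, not real training results.

```python
def last_trained_epoch(val_losses, patience=2):
    """Return the 0-based index of the last epoch run before early stopping triggers.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: every epoch runs


# Hypothetical validation losses, one per epoch
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.10]
print(last_trained_epoch(losses))  # 4: epochs 3 and 4 do not beat 0.15
```

Note that training stops even though epoch 5 would have improved: patience trades a little accuracy for not wasting epochs. Passing `restore_best_weights=True` to `EarlyStopping` makes Keras keep the weights of the best epoch rather than those of the last one.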

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# YOUR CODE HERE

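You probably notice that the model converges within a few epochs. The reason is that the weights are updated once per batch, so each epoch contains many updates. For instance, with a batch_size of 32 on the 60,000 training images, you get 60,000 / 32 = 1875 updates per epoch (slightly fewer if you hold out part of the training set for validation).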
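The count is simply the number of batches, rounded up because the last batch may be partial. The 42,000 figure below assumes, for illustration, that 30% of the 60,000 training images are held out for validation.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates per epoch: one per batch, the last one possibly partial."""
    return math.ceil(n_samples / batch_size)


print(updates_per_epoch(60_000, 32))  # 1875
print(updates_per_epoch(42_000, 32))  # 1313 (1312 full batches + 1 partial batch)
```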


❓ Question ❓ What is your accuracy on the test set?
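In Keras, `model.evaluate(X_test, y_test_cat)` returns the loss followed by the compiled metrics, so the accuracy comes second. The NumPy sketch below shows what that accuracy measures, using a hypothetical helper and made-up 3-class probabilities rather than real model outputs.

```python
import numpy as np


def accuracy_from_probs(probs, y_true_cat):
    """Fraction of rows where the most probable class matches the one-hot label."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(y_true_cat, axis=1)
    return float(np.mean(predicted == actual))


# Made-up predicted probabilities for 4 images and 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],  # predicts class 0
    [0.1, 0.8, 0.1],  # predicts class 1
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.6, 0.3, 0.1],  # predicts class 0
])
y_true_cat = np.eye(3)[[0, 1, 2, 2]]  # true classes 0, 1, 2, 2

print(accuracy_from_probs(probs, y_true_cat))  # 0.75
```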

In [None]:
# YOUR CODE HERE

### You should already be impressed by your skills! With your first CNN, you solved what was a very hard problem 30 years ago. Let's move on to a second problem.