# Deep Learning week - Day 3 - CIFAR Classification

### Exercise objectives
- Implement a CNN for a 10-class classification problem
- Enhance the CNN performance with data augmentation techniques
- Experience the limitations of training a network on large images on your own computer (especially on its CPU)

<hr>
<hr>

You should now have a better feel for how a CNN works, and especially for how the convolutions transform the image to detect specific features. Let's now work with slightly more complex images. The CIFAR-10 dataset contains images from 10 different classes:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck

This dataset is emblematic in the research community: many improvements on image problems were demonstrated on it, and later on the CIFAR-100 dataset once performance on CIFAR-10 became too high. You can check the [wikipedia](https://en.wikipedia.org/wiki/CIFAR-10) page of the dataset if you want to know more about it.

In this notebook, we propose to implement a CNN that distinguishes the 10 categories of the CIFAR-10 dataset. Remember that until about 10 years ago, this problem was very challenging for the entire research community; it is now yours to tackle.


⚠️ **Warning** ⚠️ In this exercise, computations are done on your computer, most probably on your CPU. Bear in mind that training a model on the entire dataset will take ~10 minutes. You will experience here that these computations are heavy and require a lot of computational power; we will see in the next exercise how to overcome this problem. Until then, here is a trick you can use in any ML problem when you are still in the experimental / design phase but face long waiting times: select a subset of your data (for instance `X_train = X_train[:100]`, together with the matching labels) to write all the different parts of the code without worrying about performance. Once the code is ready, train on the entire dataset to aim for performance.
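A minimal sketch of the subset trick, assuming arrays shaped like the ones loaded below. The `DEBUG` flag and `N_DEBUG` size are hypothetical names used for illustration, and random arrays stand in for CIFAR-10 so the snippet runs on its own. The key point is to slice the images and their labels with the same bound so they stay aligned.

```python
import numpy as np

# Stand-in arrays with CIFAR-10-like shapes (random data, replaced by the real arrays in practice)
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
labels_train = rng.integers(0, 10, size=(1000, 1))

DEBUG = True    # hypothetical switch: set to False for the final, full-data run
N_DEBUG = 100   # hypothetical subset size

if DEBUG:
    # Slice images AND labels with the same bound so each image keeps its label
    X_train = X_train[:N_DEBUG]
    labels_train = labels_train[:N_DEBUG]

print(X_train.shape, labels_train.shape)  # (100, 32, 32, 3) (100, 1)
```

Keeping the switch in a single place makes it easy to flip back to the full dataset once the pipeline works.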



# Data

❓ **Question** ❓ To load the CIFAR-10 dataset, you can use the `keras` package directly (see [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/cifar10)). What is the shape of the images? How many images do you have per class?

In [None]:
from tensorflow.keras.datasets import cifar10
import numpy as np

(X_train, labels_train), (X_test, labels_test) = cifar10.load_data()

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# YOUR CODE HERE
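One way to answer both questions is to use `X_train.shape` for the image shape and `np.unique(..., return_counts=True)` for the class counts. The helper `images_per_class` is a hypothetical name, and a small synthetic label array stands in for `labels_train` so the sketch runs without downloading the data. On the real training set you should find images of shape `(32, 32, 3)` and 5000 images per class (1000 per class in the test set).

```python
import numpy as np

def images_per_class(label_array, class_names):
    """Map each class name to its number of images.

    label_array is expected to have shape (N, 1), as returned by cifar10.load_data().
    """
    values, counts = np.unique(label_array.ravel(), return_counts=True)
    return {class_names[v]: int(c) for v, c in zip(values, counts)}

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

# Synthetic stand-in: 3 images per class, shaped (N, 1) like labels_train
fake_labels = np.repeat(np.arange(10), 3).reshape(-1, 1)
print(images_per_class(fake_labels, labels))

# With the real data:
# print(X_train.shape[1:])                       # shape of one image
# print(images_per_class(labels_train, labels))  # images per class
```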

❓ **Question** ❓ To ease the convergence of the algorithm, it is useful to normalize the data. Check the maximum and minimum values of your data, and normalize it accordingly (the resulting image intensities should be between 0 and 1).

In [None]:
# YOUR CODE HERE
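CIFAR-10 images are stored as `uint8` integers, so their intensities lie in [0, 255]. A minimal sketch of the scaling, assuming that range: check the extremes, then divide by 255 after casting to float. `normalize_images` is a hypothetical helper name, and a random array stands in for `X_train`.

```python
import numpy as np

def normalize_images(X):
    """Scale uint8 intensities from [0, 255] to float32 values in [0, 1]."""
    # Cast explicitly to float32 (the dtype Keras uses by default) so the
    # result dtype does not depend on NumPy's type-promotion rules
    return X.astype("float32") / 255.0

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
print(X_demo.min(), X_demo.max())            # extremes before scaling
X_demo_norm = normalize_images(X_demo)
print(X_demo_norm.min(), X_demo_norm.max())  # now within [0, 1]

# On the real data:
# X_train = normalize_images(X_train)
# X_test = normalize_images(X_test)
```

Dividing by the fixed bound 255 (rather than by each set's own maximum) guarantees that the training and test images are scaled identically.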

❓ **Question** ❓ Display some of the images and their corresponding class. You can use the `imshow` function from matplotlib.

In [None]:
# YOUR CODE HERE

In [None]:
##############
### Answer ###
##############

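import matplotlib.pyplot as plt
%matplotlib inline

# Show the first 5 training images, each titled with its class name
for i in range(5):
    img = X_train[i]
    label = labels_train[i][0]  # labels are stored with shape (N, 1)
    
    plt.figure(figsize=(2,2))
    plt.imshow(img)
    plt.title(labels[label])
    plt.show()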

❓ **Question** ❓ The labels (`labels_train` and `labels_test`) are stored as arrays of integers (one class index per image). Convert them to one-hot encoded labels so that they can be used to train a classification neural network. You can use the `to_categorical` function from Keras. Store the results in `y_train` and `y_test`.

In [None]:
# YOUR CODE HERE
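`to_categorical` (from `tensorflow.keras.utils`) is the intended tool, for instance `y_train = to_categorical(labels_train, num_classes=10)`. To show what it computes, here is a NumPy equivalent; `one_hot` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def one_hot(label_array, num_classes=10):
    """Turn integer labels of shape (N,) or (N, 1) into an (N, num_classes) one-hot matrix."""
    flat = np.asarray(label_array).ravel()
    encoded = np.zeros((flat.size, num_classes), dtype="float32")
    encoded[np.arange(flat.size), flat] = 1.0  # a single 1 per row, at the label's index
    return encoded

demo_labels = np.array([[6], [9], [0]])  # shaped like labels_train
print(one_hot(demo_labels))
```

Each row contains a single 1, at the index of the class, which is the target format expected by a softmax output trained with categorical cross-entropy.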

# Convolutional Neural Network

Now, let's define the Convolutional Neural Network. 

❓ **Question** ❓ Define a CNN that is composed of:
- a Conv2D layer with 32 filters, a kernel size of (3, 3), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (2, 2)
- a Conv2D layer with 64 filters, a kernel size of (3, 3), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (2, 2)
- a Conv2D layer with 128 filters, a kernel size of (3, 3), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (3, 3)
- a Flatten layer
- a Dense layer with 120 neurons and the `relu` activation function
- a Dense layer with 60 neurons and the `relu` activation function
- a Dropout layer (with a rate of 0.5), to regularize the network
- a Dense output layer suited to your task (choose its number of neurons and its activation function accordingly)
 
⚠️ **Warning** ⚠️ Do not include the compilation in the function.

⚠️ **Warning** ⚠️ Do not forget to pass the input shape of your data to the first layer, and remember that the images have three color channels ;)

In [None]:
def initialize_model():
    # YOUR CODE HERE
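Before writing the Keras code, it can help to trace how the spatial size evolves through the layers, since it determines the size of the Flatten output. This sketch assumes Keras' defaults for `MaxPooling2D` (strides equal to the pool size, `padding='valid'`) and that stride-1 convolutions with `padding='same'` keep the spatial size; `conv_same` and `max_pool` are hypothetical helper names.

```python
def conv_same(size):
    """A stride-1 convolution with padding='same' keeps the spatial size."""
    return size

def max_pool(size, pool):
    """Output size of MaxPooling2D with strides=pool_size and padding='valid'."""
    return (size - pool) // pool + 1

size = 32                 # CIFAR-10 images are 32x32 (x3 channels)
size = conv_same(size)    # Conv2D(32)    -> 32x32x32
size = max_pool(size, 2)  # MaxPool (2,2) -> 16x16x32
size = conv_same(size)    # Conv2D(64)    -> 16x16x64
size = max_pool(size, 2)  # MaxPool (2,2) -> 8x8x64
size = conv_same(size)    # Conv2D(128)   -> 8x8x128
size = max_pool(size, 3)  # MaxPool (3,3) -> 2x2x128 (floor division drops the last rows/columns)
flatten_units = size * size * 128
print(flatten_units)      # 512
```

In Keras, the first Conv2D layer would accordingly receive `input_shape=(32, 32, 3)`.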

❓ **Question** ❓ What is the number of parameters of your model? 

<details>
   <summary>If you don't remember how to check the number of parameters, click >>here<<</summary>
    `model.summary()`
</details>




In [None]:
# YOUR CODE HERE
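You can cross-check the total reported by `model.summary()` by hand: a Conv2D layer has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter, a Dense layer has `inputs * units` weights plus one bias per unit, and the pooling, Flatten and Dropout layers have no parameters. The sketch assumes the architecture listed above with a 10-unit output layer and a Flatten output of 2 * 2 * 128 = 512 values (pooling with Keras' default `valid` padding); the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights of a square kernel for every (input channel, filter) pair, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

layer_params = {
    "conv_32": conv2d_params(3, 3, 32),     # RGB input: 896
    "conv_64": conv2d_params(3, 32, 64),    # 18496
    "conv_128": conv2d_params(3, 64, 128),  # 73856
    "dense_120": dense_params(512, 120),    # Flatten gives 2*2*128 = 512 inputs: 61560
    "dense_60": dense_params(120, 60),      # 7260
    "output_10": dense_params(60, 10),      # 610
}
total = sum(layer_params.values())
print(total)  # 162678
```

Note that the first Dense layer holds more parameters than the first two convolutions combined, although it only sees a 2x2 spatial map.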

❓ **Question** ❓ Write a function that compiles your model and returns it.

In [None]:
def compile_model(model):
    # YOUR CODE HERE
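A sketch of one reasonable choice, assuming one-hot targets (hence `categorical_crossentropy`) and the Adam optimizer, which is an assumption rather than a requirement. Two details matter for the rest of the notebook: the `'accuracy'` metric must be tracked, since `plot_history` reads `history.history['accuracy']`, and the function must return the model, since it is later used as `model_2 = compile_model(model_2)`.

```python
def compile_model(model):
    """Compile a Keras classifier for one-hot encoded, 10-class targets and return it."""
    model.compile(
        loss="categorical_crossentropy",  # matches one-hot labels from to_categorical
        optimizer="adam",                 # assumption: any standard optimizer would work
        metrics=["accuracy"],             # produces the 'accuracy' / 'val_accuracy' history keys
    )
    return model  # returned so that `model = compile_model(model)` works
```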

❓ **Question** ❓ Compile your model and fit it on your training data with early stopping (use a patience of 5 to keep computations short).

Store the output of the fit in a `history` variable.

In [None]:
# YOUR CODE HERE
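In Keras, early stopping is `EarlyStopping(patience=5)` from `tensorflow.keras.callbacks`, passed as `callbacks=[es]` to `fit` together with a `validation_split`. To make the role of `patience` concrete, here is a simplified pure-Python version of the rule applied to the monitored `val_loss` (with Keras' default `min_delta=0`): training stops once the validation loss has not improved for `patience` consecutive epochs. `last_epoch_before_stop` is a hypothetical helper, and the loss values are made up for illustration.

```python
def last_epoch_before_stop(val_losses, patience=5, min_delta=0.0):
    """Return the 0-based index of the last epoch run under a simplified early-stopping rule."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training would stop after this epoch
    return len(val_losses) - 1  # never triggered: all epochs are run

# Illustrative (made-up) validation losses: the best value is reached at epoch 2
val_losses = [1.5, 1.2, 1.0, 1.05, 1.1, 1.02, 1.08, 1.03, 0.9]
print(last_epoch_before_stop(val_losses, patience=5))  # 7
```

Training stops at epoch 7, so the later improvement at epoch 8 is never seen: this is the trade-off a small patience makes for speed. Also note that, by default, Keras keeps the weights of the last epoch; `EarlyStopping(patience=5, restore_best_weights=True)` restores those of the best epoch instead.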

❓ **Question** ❓ Run the following function on the previous history (keep the default arguments; they are intended for later plots in the notebook).

In [None]:
def plot_history(history, title='', axs=None, exp_name=""):
    if axs is not None:
        ax1, ax2 = axs
    else:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    
    if len(exp_name) > 0 and exp_name[0] != '_':
        exp_name = '_' + exp_name
    ax1.plot(history.history['loss'], label='train' + exp_name)
    ax1.plot(history.history['val_loss'], label='val' + exp_name)
    ax1.set_ylim(0., 2.2)
    ax1.set_title('loss')
    ax1.legend()

    ax2.plot(history.history['accuracy'], label='train accuracy'  + exp_name)
    ax2.plot(history.history['val_accuracy'], label='val accuracy'  + exp_name)
    ax2.set_ylim(0.25, 1.)
    ax2.set_title('Accuracy')
    ax2.legend()
    return (ax1, ax2)

# YOUR CODE HERE

❓ **Question** ❓ Evaluate your model on the test data. Are you satisfied with this performance? What is the chance level on this task?

In [None]:
# YOUR CODE HERE
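`model.evaluate(X_test, y_test)` returns the test loss and accuracy. The accuracy is the fraction of images whose highest predicted probability falls on the true class, and the chance level is what uniform random guessing would reach, i.e. 1/10 with 10 classes (the CIFAR-10 test set is balanced, with 1000 images per class). The helpers below are hypothetical names illustrating both computations with NumPy on a toy example.

```python
import numpy as np

def accuracy_from_probs(y_prob, y_onehot):
    """Fraction of samples whose arg-max prediction matches the arg-max of the one-hot target."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_onehot, axis=1)))

def chance_level(num_classes):
    """Accuracy of uniform random guessing on a balanced dataset."""
    return 1.0 / num_classes

# Toy example: 4 samples, 3 classes; the third prediction is wrong
y_onehot = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3]])
print(accuracy_from_probs(y_prob, y_onehot))  # 0.75
print(chance_level(10))                       # 0.1
```

With the real model, `accuracy_from_probs(model.predict(X_test), y_test)` should be close to the accuracy reported by `evaluate`.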

# Data augmentation

A simple way to improve a model's accuracy without much work is to generate new data: this is called _data augmentation_. This widely used technique consists in applying small transformations to the input images that do not change their label, such as mirroring, cropping or intensity changes. The performance gain comes from training the neural network on more varied data.

The natural way to generate these new images is to apply some transformations and train the model on both the original and the new images. However, this procedure requires keeping all these images in memory, which can quickly exceed what your computer's RAM can hold (your computer might even crash).
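A back-of-the-envelope calculation shows why: the 50,000 CIFAR-10 training images, once normalized to `float32` (4 bytes per value), already take about 0.61 GB, and every stored augmented copy of the dataset adds the same amount. The number of copies below is an arbitrary illustrative choice.

```python
n_images, height, width, channels = 50_000, 32, 32, 3
bytes_per_value = 4  # float32

dataset_bytes = n_images * height * width * channels * bytes_per_value
print(dataset_bytes / 1e9)  # ~0.61 GB for one copy of the training set

n_augmented_copies = 10  # illustrative choice
print((1 + n_augmented_copies) * dataset_bytes / 1e9)  # ~6.8 GB for the original + 10 copies
```

The cost grows linearly with both the number of copies and the number of pixels, so it becomes prohibitive even faster with larger images.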

For this reason, we will augment the data _on the fly_: we create new data, use them to fit the model, then discard them. Keras provides the utilities to do this job for us. Look at the following code: the syntax may seem odd at first, but don't panic. Just look at the function arguments, which define the augmentation techniques we will use and which you can check in the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator).
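To make the idea of on-the-fly augmentation concrete, here is a minimal NumPy generator (not Keras' implementation) that, for each batch, randomly mirrors images horizontally and yields the result. Each augmented batch is discarded once consumed, so memory never holds more than one batch of transformed images. `augmented_batches` is a hypothetical name; in Keras, `ImageDataGenerator.flow` plays this role with many more transformations.

```python
import numpy as np

def augmented_batches(X, y, batch_size=32, seed=0):
    """Yield (images, labels) batches in which about half of the images are horizontally flipped."""
    rng = np.random.default_rng(seed)
    while True:  # like Keras generators, loop over the data indefinitely
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            batch = X[idx].copy()  # copy: the original X is never modified
            flip = rng.random(len(idx)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1, :]  # mirror along the width axis (N, H, W, C)
            yield batch, y[idx]  # labels are unchanged by a horizontal flip

# Usage with random stand-in data shaped like CIFAR-10
X_demo = np.random.default_rng(1).random((100, 32, 32, 3)).astype("float32")
y_demo = np.eye(10)[np.arange(100) % 10]
gen = augmented_batches(X_demo, y_demo, batch_size=32)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (32, 32, 32, 3) (32, 10)
```

Because the flips are drawn anew at every pass, each epoch sees a different version of the dataset without ever storing it.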

In [None]:
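from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=False,             # no dataset-wide mean subtraction
    featurewise_std_normalization=False,  # no dataset-wide std scaling
    rotation_range=10,                    # random rotations of up to 10 degrees
    width_shift_range=0.2,                # horizontal shifts of up to 20% of the width
    height_shift_range=0.2,               # vertical shifts of up to 20% of the height
    horizontal_flip=True,                 # random left-right mirroring
    zoom_range=(0.8, 1.2))                # random zoom factor between 0.8 and 1.2

# fit() computes dataset statistics; it is only required when featurewise_center,
# featurewise_std_normalization or zca_whitening is enabled
datagen.fit(X_train)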

You can visualize the input images and their transformed versions with the following code:

In [None]:
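import numpy as np

# flow() yields batches of augmented images; shuffle=False keeps them in the order of X_train
X_augmented = datagen.flow(X_train, shuffle=False, batch_size=1)

# Show the first 12 raw images (left) next to an augmented version (right)
for i, (raw_image, augmented_image) in enumerate(zip(X_train, X_augmented)):
    _, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 2))
    ax1.imshow(raw_image)
    ax2.imshow(augmented_image[0])  # each batch holds a single image
    plt.show()
    
    if i > 10:
        break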

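❗ **Remark** ❗ In this example, there is one augmented image per initial image. In fact, since the transformations are drawn at random every time an image is requested, each epoch sees a different augmented version of every image, so many augmented images can be generated per initial image.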

❓ **Question** ❓ Previously, we used the `validation_split` argument to let the model separate a validation set from the training set. This is not possible here: `validation_split` is not supported when fitting on a generator, and having an image in the training set and its transformed version in the validation set would be a data leakage. Therefore, we have to define the `validation_data` manually, with the following commands:

In [None]:
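from tensorflow.keras.callbacks import EarlyStopping

# The model (compile_model must return the compiled model)
model_2 = initialize_model()
model_2 = compile_model(model_2)

# Hold out the last 10,000 training images for validation; only the first 40,000 are augmented
X_tr = X_train[:40000]
y_tr = y_train[:40000]
X_val = X_train[40000:]
y_val = y_train[40000:]
train_flow = datagen.flow(X_tr, y_tr, batch_size=32)

# The early stopping criterion
es = EarlyStopping(patience=5)

# The fit: the generator produces new augmented batches at every epoch,
# while the validation images are left untouched
history_2 = model_2.fit(train_flow, 
                        epochs=100, 
                        callbacks=[es], 
                        validation_data=(X_val, y_val))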


### Remark: The training can be quite long here. You can move on to the next exercise and come back once in a while to finish the last questions

❓ **Question** ❓ Now, let's plot the previous and current run histories. What do you think of the data augmentation?

In [None]:
axs = plot_history(history_2, exp_name='data_augmentation')
plot_history(history, axs=axs, exp_name='baseline')
plt.show()

❓ **Question** ❓ Evaluate the model on the test data. Do you see an improvement?

In [None]:
# YOUR CODE HERE

## Remark

One thing you have probably noticed in this notebook is that training is quite long. This is why we stopped training early, to keep experiments reasonably fast. In practice, however, training should be allowed to last longer, with a stopping criterion that has a lower delta and a higher patience!

How can we do that? When you run the notebook on your computer, you train the neural network on your CPU. However, the computations on the different images of a batch are independent of each other and can be parallelized, which is exactly what GPUs do well.
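Why does this parallelize well? Within a batch, every image goes through the same weights independently, so a layer can process the whole batch as one large matrix operation instead of a loop over images, and such operations are what GPUs accelerate. A small NumPy illustration with a dense layer (random weights and inputs, sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((32, 512))  # 32 flattened inputs
W = rng.random((512, 120))     # dense layer weights
b = rng.random(120)            # dense layer biases

# One image at a time: 32 small, independent matrix-vector products
looped = np.stack([x @ W + b for x in batch])

# Whole batch at once: a single matrix-matrix product (the kind of operation a GPU parallelizes)
batched = batch @ W + b

print(np.allclose(looped, batched))  # True
```

Both give the same result; the batched form simply exposes all the independent work at once, which lets the hardware spread it over many cores.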

First, you might not have a GPU on your computer. But more importantly, setting up training on a GPU can be hard, as it requires specific hardware, software (drivers and libraries) and sometimes a specific OS. Therefore, we will look at another way to train our CNN on a GPU for free: Google Colab!