# Deep Learning week - Day 3 - Exercise 2

Let's move on to a slightly more complex problem! The goal of this exercise is again to classify images of the CIFAR-10 dataset. This dataset contains images of 10 different classes:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck

In this notebook, we propose to define a simple baseline CNN to distinguish the 10 categories from the CIFAR-10 dataset.

⚠️ **Warning** ⚠️ For now, computations run on your CPU: bear in mind that training a model takes about 10 minutes in this notebook, so don't waste your training runs!

## The data

❓ **Question** ❓ Load the data and the associated labels. To load the CIFAR-10 dataset, you can use the `keras` package ([documentation](https://keras.io/api/datasets/)). What is the shape of the images? How many images do you have per class?

In [None]:
from tensorflow.keras.datasets import cifar10
import numpy as np

(X_train, labels_train), (X_test, labels_test) = cifar10.load_data()

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# To complete
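One possible way to answer both questions, sketched with NumPy only. `describe_dataset` is a hypothetical helper name; it assumes the labels are integer class indices, possibly with a trailing axis of size 1 as returned by `cifar10.load_data()`. On the CIFAR-10 training set it should report images of shape `(32, 32, 3)` and 5,000 images per class.

```python
import numpy as np


def describe_dataset(X, y, class_names):
    """Return the shape of one image and the number of images per class.

    Hypothetical helper: assumes y holds non-negative integer class indices,
    e.g. with shape (N, 1) as returned by cifar10.load_data().
    """
    flat_labels = np.asarray(y).ravel()  # (N, 1) -> (N,)
    counts = np.bincount(flat_labels, minlength=len(class_names))
    return X.shape[1:], dict(zip(class_names, counts.tolist()))
```

In the notebook, this would be called as `describe_dataset(X_train, labels_train, labels)`.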

❓ **Question** ❓ Normalize your data by dividing all the intensities by the maximum value.

In [None]:
# YOUR CODE HERE
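A minimal sketch, assuming the images are stored as `uint8` (as returned by `cifar10.load_data()`), so the maximum possible intensity is 255. Using the same constant for the training and the test set keeps both on the same scale; `normalize` is a hypothetical helper name.

```python
import numpy as np


def normalize(X, max_value=255.0):
    """Scale pixel intensities to [0, 1] by dividing by the maximum value.

    Assumes uint8 images, whose intensities lie in [0, 255]; the result is float32.
    """
    return X.astype("float32") / max_value
```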

❓ **Question** ❓ Display some of the images using the `imshow` function from matplotlib, and print the corresponding class.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# YOUR PLOT HERE

❓ **Question** ❓ Convert the current labels into one-hot encoded labels, stored in `y_train` and `y_test`.

In [None]:
# YOUR CODE HERE
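A NumPy sketch of one-hot encoding, assuming integer labels in `[0, num_classes)`. Keras also provides `tensorflow.keras.utils.to_categorical(labels, num_classes)`, which produces the same kind of matrix; `one_hot` below is a hypothetical name used only for illustration.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (N,) or (N, 1) into an (N, num_classes) matrix."""
    flat = np.asarray(labels).ravel()
    encoded = np.zeros((flat.size, num_classes), dtype="float32")
    encoded[np.arange(flat.size), flat] = 1.0  # a single 1 per row, at the label's index
    return encoded
```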

## The Convolutional Neural Network

Now, let's define the Convolutional Neural Network (CNN).

❓ **Question** ❓ Define a CNN that is composed of:
- a Conv2D layer with 16 filters, a kernel size of (4, 4), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (2, 2)
- a Conv2D layer with 32 filters, a kernel size of (3, 3), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (3, 3)
- a Conv2D layer with 64 filters, a kernel size of (3, 3), the `relu` activation function, and `padding='same'`
- a MaxPooling2D layer with a pool size of (3, 3)
- a Flatten layer
- a Dense layer with 75 neurons and the `relu` activation function
- a Dense output layer suited to your task
 
PS: Do not include the compilation in the function.

⚠️ **Warning** ⚠️ Do not forget to add the input shape of your data to the first layer. And do not forget that the images have three color channels ;)

In [None]:
def initialize_model():
    # YOUR CODE HERE

❓ **Question** ❓ What is the number of parameters of your model? 

<details>
   <summary>If you don't remember how to check the number of parameters, click here</summary>
    `model.summary()`
</details>




In [None]:
# YOUR CODE HERE
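You can check the value reported by `model.summary()` by hand. The sketch below tracks the feature-map shape through the architecture described above, assuming 32x32x3 inputs, 10 output classes, and Keras' defaults for `MaxPooling2D` (strides equal to the pool size, `padding='valid'`). A Conv2D layer has `kh * kw * in_channels * filters` weights plus one bias per filter; a Dense layer has `inputs * units` weights plus one bias per unit. The helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + filters  # weights + biases


def dense_params(inputs, units):
    return inputs * units + units  # weights + biases


def pooled_size(size, pool):
    # MaxPooling2D with strides = pool_size and padding="valid"
    return (size - pool) // pool + 1


def count_cnn_params(input_shape=(32, 32, 3), num_classes=10):
    h, w, channels = input_shape
    total = 0
    # (filters, kernel size, pool size) for each Conv2D + MaxPooling2D pair;
    # padding="same" keeps the spatial size unchanged through each Conv2D
    for filters, kernel, pool in [(16, (4, 4), 2), (32, (3, 3), 3), (64, (3, 3), 3)]:
        total += conv2d_params(kernel, channels, filters)
        channels = filters
        h, w = pooled_size(h, pool), pooled_size(w, pool)
    flat = h * w * channels  # size of the Flatten output
    total += dense_params(flat, 75)
    total += dense_params(75, num_classes)
    return total, (h, w, channels)


print(count_cnn_params())  # (29555, (1, 1, 64))
```

Under these assumptions, the feature maps shrink from 32x32 to 16x16, 5x5 and finally 1x1 before Flatten, and the third convolution (18,496 parameters) holds most of the model's weights.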

❓ **Question** ❓ Write a function that compiles your model (tracking the `accuracy` metric) and returns it.

[Advanced] This is optional, but you can try the `adam` optimizer with a learning rate of 0.005.

In [None]:
# Import

def compile_model(model):
    # YOUR CODE HERE

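❓ **Question** ❓ After compiling your model, fit it on your training data with early stopping: set `patience=5` to keep computations fast, `restore_best_weights=True`, and `min_delta=1e-2` (you can check what these do in the documentation if you are interested). Keep part of the training data for validation (for instance with the `validation_split` argument of `fit`), since early stopping monitors the validation loss by default and the plotting function below needs the validation metrics.

Store the output of the fit in a `history` variable.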

In [None]:
# YOUR CODE HERE
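To build some intuition for `patience` and `min_delta`, here is a simplified, framework-free re-implementation of the stopping rule. It is a sketch, not Keras' source: it assumes the monitored quantity is the validation loss (the default `monitor='val_loss'` of `EarlyStopping`), counts an epoch as an improvement only if the loss drops more than `min_delta` below the best value so far, and stops once `patience` epochs in a row bring no improvement. With `restore_best_weights=True`, Keras would then restore the weights of the best epoch. `simulate_early_stopping` and the loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=1e-2):
    """Return (epoch at which training stops, best epoch) for a loss curve."""
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last significant improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improvement larger than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping


val_losses = [1.50, 1.20, 1.10, 1.097, 1.095, 1.093, 1.094, 1.092, 1.05]
print(simulate_early_stopping(val_losses))                 # (7, 2)
print(simulate_early_stopping(val_losses, min_delta=0.0))  # (8, 8)
```

With `min_delta=1e-2`, the small gains after epoch 2 do not count, so training stops at epoch 7 and the weights of epoch 2 are kept; with `min_delta=0`, the same curve keeps training.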

❓ **Question** ❓ Run the following function on the previous history.

In [None]:
def plot_history(history, title='', axs=None, exp_name=""):
    if axs is not None:
        ax1, ax2 = axs
    else:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    
    if len(exp_name) > 0 and exp_name[0] != '_':
        exp_name = '_' + exp_name
    ax1.plot(history.history['loss'], label='train' + exp_name)
    ax1.plot(history.history['val_loss'], label='val' + exp_name)
    ax1.set_ylim(0., 2.2)
    ax1.set_title('loss')
    ax1.legend()

    ax2.plot(history.history['accuracy'], label='train accuracy'  + exp_name)
    ax2.plot(history.history['val_accuracy'], label='val accuracy'  + exp_name)
    ax2.set_ylim(0.25, 1.)
    ax2.set_title('Accuracy')
    ax2.legend()
    return (ax1, ax2)

In [None]:
# YOUR PLOT HERE

❓ **Question** ❓ Evaluate your model on the test data. Are you satisfied with this performance? What is the chance level on this task?

In [None]:
# YOUR CODE HERE
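The chance level is the accuracy of a classifier that ignores the image. A small sketch, assuming integer labels: guessing uniformly at random among the classes gives `1 / n_classes` on average, and always predicting the most frequent class gives that class's frequency. The CIFAR-10 test set is balanced (1,000 images per class), so both are 10%. `chance_levels` is a hypothetical helper name.

```python
import numpy as np


def chance_levels(labels):
    """Return (expected accuracy of uniform random guessing, majority-class accuracy)."""
    flat = np.asarray(labels).ravel()
    _, counts = np.unique(flat, return_counts=True)
    uniform_guess = 1.0 / len(counts)
    majority_guess = counts.max() / counts.sum()
    return uniform_guess, majority_guess
```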

## Data augmentation

To improve the accuracy of a model without much extra work, we can generate new training data: this is called _data augmentation_. It is a widely used technique for images, because many simple transformations of an input image leave its label unchanged: mirroring, cropping, intensity changes, etc. The goal is to improve the generalization of the model (and thus its performance on unseen data) by feeding the neural network more, and more varied, images.

The first option to generate new data is to take the initial images, apply some transformations, and concatenate the new images with the original ones. However, such a procedure can be very memory-intensive, and at some point you won't be able to store additional data in your computer's memory.
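A minimal sketch of this offline approach, assuming images stored as an `(N, height, width, channels)` NumPy array: mirror each image horizontally and concatenate the copies with the originals. `augment_offline` is a hypothetical helper. Every transformation added this way grows the dataset by a full copy: the 50,000 CIFAR-10 training images take about 614 MB as `float32` (50,000 x 32 x 32 x 3 x 4 bytes), so a single mirrored copy already doubles that.

```python
import numpy as np


def augment_offline(X, y):
    """Return the original images plus their horizontal mirrors, with matching labels."""
    mirrored = X[:, :, ::-1, :]  # reverse the width axis
    X_aug = np.concatenate([X, mirrored], axis=0)
    y_aug = np.concatenate([y, y], axis=0)  # mirroring does not change the label
    return X_aug, y_aug
```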

For this reason, we will augment the data _on the fly_, meaning that we will create new data, use it to fit the model, and then discard it. To do that, we will directly use Keras utilities, which do all that work for us. The following functions and objects might therefore seem a bit confusing, but don't be put off: just look at the arguments that define the augmentation techniques we use, which you can check in the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator).
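To make the idea concrete, here is a minimal on-the-fly generator in plain NumPy: a simplified stand-in for what `ImageDataGenerator.flow` does, not its actual implementation. It assumes float images scaled to [0, 1] with shape `(N, height, width, channels)`, and applies only two kinds of transformation: a random horizontal flip, and a random brightness factor in [0.8, 1.0] implemented as a plain multiplication (a simplification of `brightness_range`). Each augmented batch is created when requested, used, and then discarded, so only one batch lives in memory at a time. `augmented_batches` is a hypothetical name.

```python
import numpy as np


def augmented_batches(X, y, batch_size=16, seed=0):
    """Yield (images, labels) batches forever, augmenting each batch on the fly."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:  # one pass over a fresh permutation = one epoch
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = X[idx]  # fancy indexing returns a copy, so X is never modified
            flip = rng.random(len(idx)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1, :]  # mirror about half of the images
            brightness = rng.uniform(0.8, 1.0, size=(len(idx), 1, 1, 1))
            yield batch * brightness, y[idx]
```

In the cells below, `datagen.flow(X_tr, y_tr, batch_size=16)` plays the same role, with many more transformations.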

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=10,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    brightness_range=(0.8, 1.),
    zoom_range=(0.8, 1.2),
    rescale=1./255.) 

datagen.fit(X_train)

Let's now visualize the input data and what has been generated by the `ImageDataGenerator`.

In [None]:
import numpy as np

viz_flow = datagen.flow(X_train, shuffle=False, batch_size=1)

for i, (raw_image, augmented_image) in enumerate(zip(X_train, viz_flow)):
    _, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 2))
    ax1.imshow(raw_image)
    ax2.imshow(augmented_image[0])
    plt.show()
    
    if i > 10:
        break

Now, train the model, and look at how the `datagen` object is used in the fitting procedure.

❓ **Question** ❓ Intuitively, do you think the training will be faster in terms of number of epochs? In terms of computation time?

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# The model
model_2 = initialize_model()
model_2 = compile_model(model_2)

# The data generator
X_tr = X_train[:30000]
y_tr = y_train[:30000]
X_val = X_train[30000:]
y_val = y_train[30000:]
train_flow = datagen.flow(X_tr, y_tr, batch_size=16)

# The early stopping criterion
es = EarlyStopping(patience=5, verbose=1, restore_best_weights=True, min_delta=1e-2)

# The fit
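# Model.fit accepts the generator directly (fit_generator is deprecated);
# the validation images get the same deterministic preprocessing as the
# training flow (here, the rescaling) but no random augmentation
history_2 = model_2.fit(train_flow,
                        epochs=100,
                        callbacks=[es],
                        validation_data=(datagen.standardize(X_val.astype("float32")), y_val))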
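One way to reason about computation time: Keras performs one gradient update per batch, so an epoch over the 30,000 images in `X_tr` with `batch_size=16` takes 1,875 steps, and each step also has to generate its augmented images on the CPU. The sketch below is plain arithmetic; `steps_per_epoch` is a hypothetical helper, and for comparison, `fit` uses `batch_size=32` by default when given arrays.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (gradient updates) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(30_000, 16))  # 1875
print(steps_per_epoch(30_000, 32))  # 938
```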


In [None]:
# YOUR ANSWER HERE

❓ **Question** ❓ Now, let's plot the previous and current run histories. What do you think of the data augmentation?

In [None]:
axs = plot_history(history_2, exp_name='data_augmentation')
plot_history(history, axs=axs, exp_name='baseline')
plt.show()

In [None]:
# YOUR ANSWER HERE

❓ **Question** ❓ Evaluate the model on the test data. Do you see an improvement?

In [None]:
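# Apply the same deterministic preprocessing as during training (the rescaling),
# but no random augmentation: test images must not be transformed at random
X_test_prep = datagen.standardize(X_test.astype("float32"))
model_2.evaluate(X_test_prep, y_test, verbose=0)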

In [None]:
model.evaluate(X_test, y_test, verbose=0)

In [None]:
# YOUR ANSWER HERE

## Remark

One thing you have probably noticed in this notebook is that training is quite long. This is why we stopped training early, to keep experiments reasonably fast. In practice, however, training should be allowed to last longer, with a stopping criterion that has a lower `min_delta` and a higher `patience`!

How can we do that? When you run the notebook on your computer, you train the neural network on your CPU. However, training a neural network on images can be parallelized (for instance, the computations for the different images of a batch are independent of each other), and this parallel computation can be carried out on a GPU.

First, you might not have a GPU on your computer. But more importantly, setting up training on a GPU can be hard, as it requires specific hardware, software, and sometimes a specific OS. Therefore, we will look at another way to train our CNN on a GPU (for free): Google Colab!