In [12]:
from __future__ import print_function  # must be the first statement of the cell

import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping  # needed by the early-stopping exercises
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

## Plan and objectives:

In this notebook, we define a simple baseline CNN, then train it and evaluate it on the CIFAR-10 dataset. This dataset consists of 32x32 color images from 10 different categories.

We will then introduce data augmentation: a method that artificially enlarges the training set by applying perturbations to the training images (random crops, intensity changes, etc.), and show how it improves generalization.

For now, we stick to computations on CPU. Bear in mind that training a model takes ~10 minutes on CPU in this notebook, so don't waste your training runs!

When you reach the end of the data augmentation section, please go to exercises 2 and 3 before coming back to dropout and batch normalization.


## 1. Loading the data

In the following cell, we load the data using the Keras dataset utility.

In [None]:
# Loading the data
from tensorflow.keras.datasets import cifar10
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = 10
(x_train, labels_train), (x_test, labels_test) = cifar10.load_data()

#### Compute the maximum of x_train and x_test.

#### Divide the image intensities by 255. Why are we doing that?

#### Display a few images (using `plt.imshow(img_array)`) and the corresponding classes.

#### How many images are there? What are the sizes of the images? What are the sizes of the labels? How do you deal with these categorical labels? Convert the labels appropriately into `y_train` and `y_test` arrays.

#### Split the training data `x_train` and `y_train` into `x_val`, `x_train`, `y_val`, `y_train`: the validation set should contain the last 25000 images and the training set should contain the first 25000 images of the original training set returned by `cifar10.load_data()`. Print the shapes of all your data arrays.
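
As a hint for the label conversion, here is a minimal NumPy sketch of one-hot encoding. It assumes the labels arrive as an `(N, 1)` integer array; `labels_demo` and `one_hot` are hypothetical names used only for illustration, and the demo values are synthetic. `tensorflow.keras.utils.to_categorical` performs the same conversion.

```python
import numpy as np

num_classes = 10
# Synthetic stand-in for labels_train (assumption: an (N, 1) array of class indices).
labels_demo = np.array([[3], [0], [9], [3]])


def one_hot(labels, num_classes):
    """Turn an (N, 1) or (N,) array of class indices into an (N, num_classes) array."""
    # Row i of the identity matrix is the one-hot vector of class i.
    return np.eye(num_classes, dtype="float32")[labels.ravel()]


y_demo = one_hot(labels_demo, num_classes)
print(y_demo.shape)
```

Each row contains a single 1 at the index of the class, which is the target format expected by a softmax output trained with categorical cross-entropy.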

#### Define a convolution network model to classify these images, using 3 convolution layers. Try to keep a low number of parameters (say < 50000 to keep fast experiments). Use `model.summary()` to see details about the different layers of your model.
See https://keras.io/layers/convolutional/
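
To stay under the parameter budget, it can help to count parameters by hand before building anything. The sketch below uses the standard formulas for convolution and dense layers; the architecture it evaluates (three 3x3 convolutions with 16, 32 and 32 filters, `same` padding, a 2x2 max-pooling after each, then a 10-unit dense layer) is only a hypothetical example, not the expected solution.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Number of trainable parameters of a 2D convolution layer."""
    kh, kw = kernel_size
    # One (kh x kw x in_channels) kernel plus one optional bias per filter.
    return (kh * kw * in_channels + int(use_bias)) * filters


def dense_params(in_features, units, use_bias=True):
    """Number of trainable parameters of a fully connected layer."""
    return (in_features + int(use_bias)) * units


# Hypothetical architecture: the spatial size goes 32 -> 16 -> 8 -> 4 through
# the three poolings, so the flattened feature vector has 4 * 4 * 32 entries.
total = (
    conv2d_params((3, 3), 3, 16)
    + conv2d_params((3, 3), 16, 32)
    + conv2d_params((3, 3), 32, 32)
    + dense_params(4 * 4 * 32, 10)
)
print(total)  # 19466
```

The result should match the total reported by `model.summary()` for the same architecture.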

In [None]:
def build_model():
    model = Sequential()
    # model.add...
    
    return model

model = build_model()
model.summary()

#### Compile your model. What is the appropriate loss function? What other metrics can you use to monitor training? Use Adam with a learning rate of 1e-3 as the optimizer.

#### Define an EarlyStopping criterion on the validation loss. Pass `restore_best_weights=True` to `EarlyStopping` and use a patience of at most `5` to keep computations fast. In principle, do you think it is more advisable to define the stopping criterion on the loss or on the accuracy metric?
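
To build intuition about the loss values you will see during training, here is a small NumPy sketch of the categorical cross-entropy for one-hot targets. The function name and the clipping constant `eps` are assumptions made for this illustration; Keras computes the same quantity internally, with its own numerical safeguards.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


# Two synthetic samples with true classes 2 and 7.
y_true_demo = np.eye(10)[[2, 7]]
# A model that predicts a uniform distribution over the 10 classes.
uniform_demo = np.full((2, 10), 0.1)

loss_uniform = categorical_crossentropy(y_true_demo, uniform_demo)
print(loss_uniform, np.log(10))
```

A model that outputs uniform probabilities has a loss of ln(10), about 2.30, so a loss that stays close to this value means the network has not learned anything yet.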
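
The logic of a patience-based stopping rule can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: it ignores options such as `min_delta`, and the function name and the validation losses are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a patience-based stopping rule.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # The criterion never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses, one per epoch.
val_losses_demo = [1.9, 1.5, 1.3, 1.35, 1.32, 1.31, 1.4, 1.2]
stop_epoch, best_epoch = early_stopping_epoch(val_losses_demo, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

With `restore_best_weights=True`, the weights kept at the end are those of `best_epoch`, not those of the epoch where training stopped. Note that in this example a larger patience would have reached the lower loss at epoch 7, which is the trade-off behind the choice of patience.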

#### Fit the model, specify the validation data using the `validation_data` keyword (so that everyone has the same).

In [None]:
history = model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[es], epochs=100, batch_size=32)

#### Below we provide a utility to plot histories from the estimation of the model.

In [None]:
def plot_history(history, title='', axs=None, exp_name=""):
    if axs is not None:
        ax1, ax2 = axs
    else:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    
    if len(exp_name) > 0 and exp_name[0] != '_':
        exp_name = '_' + exp_name
    ax1.plot(history.history['loss'], label='train' + exp_name)
    ax1.plot(history.history['val_loss'], label='val' + exp_name)
    ax1.set_ylim(0., 2.2)
    ax1.set_title('Cross-entropy loss')
    ax1.legend()

    ax2.plot(history.history['accuracy'], label='train accuracy'  + exp_name)
    ax2.plot(history.history['val_accuracy'], label='val accuracy'  + exp_name)
    ax2.set_ylim(0.25, 1.)
    ax2.set_title('Accuracy')
    ax2.legend()
    return (ax1, ax2)

#### Use `plot_history` to plot the previous history.

#### Evaluate your model on the test data. Are you satisfied with this performance? What is the chance level on this task?

#### In the test set, find the images for which the true label was not among the 3 top labels predicted by the model. Looking at the largest errors of a model is a good way to understand what is happening and to obtain ideas for improvement. (hard question, ask for help)
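
One possible way to approach this question is sketched below with NumPy, on synthetic values. It assumes the predictions form an `(N, num_classes)` array of probabilities (as `model.predict` would return) and that the true labels are integer class indices; `top_k_misses`, `probs_demo` and `true_demo` are hypothetical names.

```python
import numpy as np


def top_k_misses(probs, true_labels, k=3):
    """Indices of the samples whose true label is not among the k most probable classes."""
    # argsort is ascending, so the last k columns hold the k most probable classes.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hit = (top_k == true_labels[:, None]).any(axis=1)
    return np.flatnonzero(~hit)


# Synthetic predictions for 2 samples and 4 classes.
probs_demo = np.array([
    [0.1, 0.2, 0.3, 0.4],  # top 3 classes: 1, 2, 3
    [0.4, 0.3, 0.2, 0.1],  # top 3 classes: 0, 1, 2
])
true_demo = np.array([0, 0])

missed = top_k_misses(probs_demo, true_demo, k=3)
print(missed)  # [0]
```

The returned indices can then be used to display the corresponding test images with `plt.imshow`. If your labels are one-hot encoded, recover the class indices first with `y_test.argmax(axis=1)`.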

## 2. Adding data augmentation
A common idea in deep learning is data augmentation. To make the network more robust to new, unseen data, we enrich the training set at each epoch, on the fly, with modified versions of the input images. The possible augmentations are numerous and problem-dependent: Gaussian noise addition, contrast change, random translation, random crop, horizontal and vertical flip, elastic transform... All these operations can be achieved using an ImageDataGenerator object from the Keras package: https://keras.io/preprocessing/image/
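
To make the idea concrete, here is a minimal NumPy sketch of two such augmentations (random horizontal flip and random translation) applied to a single image. It is only an illustration of the principle, not how ImageDataGenerator is implemented; the function name, the `max_shift` parameter and the edge-padding choice are assumptions.

```python
import numpy as np


def random_flip_and_shift(img, rng, max_shift=4):
    """Randomly flip an (H, W, C) image horizontally and translate it by a few pixels."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # reverse the width axis
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Pad by repeating the border pixels, then crop a window shifted by (dy, dx).
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    top, left = max_shift + dy, max_shift + dx
    return padded[top:top + h, left:left + w, :]


rng = np.random.default_rng(0)
image_demo = rng.random((32, 32, 3))  # synthetic stand-in for a CIFAR-10 image
augmented_demo = random_flip_and_shift(image_demo, rng)
print(augmented_demo.shape)
```

Because a new random transformation is drawn every time an image is requested, the network rarely sees exactly the same input twice.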

#### Below is an image data generator. Call its `fit` method on the appropriate data.
See https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=10,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    brightness_range=(0.8, 1.),
    zoom_range=(0.8, 1.2),
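    # rescale multiplies every pixel by 1/255, so it assumes inputs in [0, 255];
    # drop it if the arrays fed to the generator were already divided by 255.
    rescale=1./255,
)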

datagen.fit(??????)

#### For the first 10 images of the train set, plot the original image and a modification of this image using the data generator. 

In [None]:
# declaring the generator flow
viz_flow = datagen.flow(x_train, shuffle=False, batch_size=1)

# plot images and their augmented versions by the generator:

#### Fit the model using these augmented data and store the history in `history_data_aug`. Do you think training will be faster in terms of number of epochs? In terms of computational time?

In [None]:
# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)

model_data_aug = build_model()

model_data_aug.compile(????)

# Defining the iterators from the generator:
train_flow = datagen.flow(x_train, y_train, batch_size=16)
val_flow = datagen.flow(x_val, y_val, batch_size=128, shuffle=False)

# Stopping criterion
es = EarlyStopping(????)

# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
history_data_aug = model_data_aug.fit(????)

#### Plot the training history and compare it with the first run without data augmentation (on the same plot). Is the training faster or slower (in terms of number of epochs)?

#### Evaluate the model on the test data. Do you see an improvement?

## 3. Dropout (GO TO NOTEBOOK 2 and 3 before coming back here!)
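Dropout is a technique to regularize a neural network and prevent overfitting. During training, a Dropout layer with parameter $0 < p < 1$ sets each activation to zero independently with probability $p$ and rescales the remaining ones by $1/(1-p)$; at test time it leaves the activations unchanged.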
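
The behaviour of a dropout layer can be sketched in a few lines of NumPy. This is an illustrative version of "inverted" dropout, which rescales the kept activations during training so that their expected value is unchanged; the function and variable names are hypothetical.

```python
import numpy as np


def dropout_forward(x, p, rng, training=True):
    """Zero each activation with probability p and rescale the others by 1 / (1 - p)."""
    if not training:
        return x  # dropout is inactive at test time
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


rng = np.random.default_rng(0)
x_demo = np.ones((1000, 100))  # synthetic activations
out_demo = dropout_forward(x_demo, p=0.1, rng=rng)

print((out_demo == 0).mean())  # fraction of dropped activations, close to p
print(out_demo.mean())         # close to 1: the expected activation is preserved
```

Because the rescaling happens during training, no correction is needed when the model is evaluated.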
#### Add dropout layers to your model, with `p=0.1`, after each Conv2D layer.

In [None]:
def build_model_with_dropout():
    model = Sequential()
    
    ###
    ### TO COMPLETE
    ###
    
    return model

In [None]:
model_dropout = build_model_with_dropout()
model_dropout.compile(????)
model_dropout.summary()

#### Train your model on the data (without augmentation), with the same EarlyStopping criterion as before.

In [None]:
es = EarlyStopping(????)
history_dropout = model_dropout.fit(????)

#### Plot the history of the accuracies and losses, and compare with the baseline run. Any comments?

#### Evaluate your trained model on the test data.

## 4. Batch normalization
#### As before, code a model with batch normalization after each Conv2D (don't forget to specify the right axis!), train it, plot the history and evaluate the performances.
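
As a reminder of what the layer computes and why the axis matters, here is a NumPy sketch of the training-time forward pass for images stored as `(batch, height, width, channels)`. It is a simplified illustration: it leaves out the moving averages used at test time, and the function and variable names are assumptions. With this channels-last layout, the channel axis is the last one (`axis=-1`), and the statistics are computed over all the other axes.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalize each channel of a (batch, height, width, channels) array."""
    # One mean and one variance per channel, computed over batch and spatial axes.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are the learned per-channel scale and shift.
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x_bn_demo = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 3))  # synthetic feature maps
out_bn_demo = batch_norm_forward(x_bn_demo, gamma=np.ones(3), beta=np.zeros(3))

print(out_bn_demo.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(out_bn_demo.std(axis=(0, 1, 2)))   # close to 1 for every channel
```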

In [None]:
from tensorflow.keras.layers import BatchNormalization

def build_model_with_batch_norm():
    model = Sequential()
    
    ####
    #### TO COMPLETE
    ####
    
    return model

In [6]:
### declare optimizer, compile

In [7]:
### fit and store history

In [None]:
### plot history

In [8]:
### evaluate on test

## 5. Remarks
 1. In these experiments, we stopped training early to keep the experiments fast. In practice, training must be allowed to last longer, with a stricter stopping criterion (lower delta and higher patience).
 2. The use of a specific data augmentation, batch norm or dropout should be motivated by the data and by properly estimated results. In practice, the best method is to code a proper cross-validation (e.g. 10-fold) coupled with a grid search over the hyperparameters (dropout probabilities, amplitudes of data augmentation, learning rate...) and to keep the best-performing configuration.
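
The cross-validation and grid search mentioned in the second remark can be organised as in the sketch below. It only builds the list of configurations and the train/validation index splits; the grid values and names are hypothetical, and the training and scoring of each model are left out.

```python
import itertools

import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# Hypothetical grid: every combination of these values is one configuration.
grid = {"dropout": [0.1, 0.3], "learning_rate": [1e-3, 1e-4]}
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

splits = list(k_fold_indices(n_samples=50, k=10))
print(len(configs), len(splits))  # 4 10
```

Each configuration would then be trained once per fold and scored by its average validation metric, which costs `len(configs) * k` training runs. This is why the approach is rarely affordable without a GPU.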