# Deep Learning week - Day 3 - Transfer Learning

### Exercise objectives
- Get familiar with Google Colab
- Use a pretrained neural network: transfer learning

<hr>


# Google Colab

Once again, use Google Colab to run the following notebook. Do not forget to set the runtime type to GPU.


# The exercise


This notebook is dedicated to **transfer learning**. 

We have seen that convolutions are mathematical operations that detect specific patterns in input images, and that these patterns are used to classify the image. One could imagine that these patterns are not 100% specific to the task, but rather to the input images. Therefore, why not reuse convolutions that have been learnt on another task, with the expectation that they will also work in other scenarios? This has two advantages: training takes less time, and we benefit from complex architectures that have been trained for state-of-the-art challenges. Here, we _transfer_ a CNN from one task to another, hence the name _transfer learning_.


⚠️ The convolutions may not be task-specific, but the last layer is, by design, specific to the problem it was trained on! Therefore, this last layer is usually removed and replaced by a layer designed for the new task. As this new last layer has random weights, it has to be trained. This is called _fine-tuning_.


In this exercise, we will use the [VGG-16 Neural Network](https://neurohive.io/en/popular-networks/vgg16/), a well-known architecture that has been trained on ImageNet, a very large database of images of different categories. In a nutshell, this architecture has already learnt kernels that are supposed to be good not only for the task it was trained on, but possibly for other tasks as well.

The idea is that the first layers are not specialized for the particular task the network was trained on; only the last ones are. Therefore, we will load the existing VGG-16 network, remove the last fully connected layers, replace them with new fully connected layers (whose weights are randomly initialized), and train these last layers on a specific classification task: here, separating types of flowers. The underlying idea is that the first convolutional layers of VGG-16, which have already been trained, correspond to filters that are able to extract meaningful features from images. You will only learn the last layers for your particular problem.


# Data loading & Preprocessing

Here, we will load the same data as in the previous exercise and try to improve our previous performance.

❓ **Question** ❓ As in the previous exercise, load the flower picture data. You can go back to the previous exercise to get the useful links and functions.

⚠️ **Warning** ⚠️ DO NOT NORMALIZE THE DATA! You will see later why.

In [2]:
!wget https://wagon-public-datasets.s3.amazonaws.com/flowers-dataset.zip

--2020-11-18 15:28:29--  https://wagon-public-datasets.s3.amazonaws.com/flowers-dataset.zip
Resolving wagon-public-datasets.s3.amazonaws.com (wagon-public-datasets.s3.amazonaws.com)... 52.218.97.42
Connecting to wagon-public-datasets.s3.amazonaws.com (wagon-public-datasets.s3.amazonaws.com)|52.218.97.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104983809 (100M) [application/zip]
Saving to: ‘flowers-dataset.zip’


2020-11-18 15:28:33 (28.6 MB/s) - ‘flowers-dataset.zip’ saved [104983809/104983809]



In [None]:
!unzip flowers-dataset.zip

In [8]:
from tensorflow.keras.utils import to_categorical
from tqdm import tqdm
import numpy as np
import os
from PIL import Image

def load_flowers_data(loading_method):
    if loading_method == 'colab':
        data_path = '/content/drive/My Drive/Deep_learning_data/flowers'
    elif loading_method == 'direct':
        data_path = 'flowers/'
    else:
        raise ValueError("loading_method must be 'colab' or 'direct'")
    classes = {'daisy':0, 'dandelion':1, 'rose':2}
    imgs = []
    labels = []
    for (cl, i) in classes.items():
        # Keep only the JPEG files of the current class folder
        images_path = [elt for elt in os.listdir(os.path.join(data_path, cl)) if elt.endswith('.jpg')]
        for img in tqdm(images_path[:300]):
            path = os.path.join(data_path, cl, img)
            if os.path.exists(path):
                image = Image.open(path)
                image = image.resize((256, 256))
                imgs.append(np.array(image))
                labels.append(i)

    X = np.array(imgs)
    num_classes = len(set(labels))
    y = to_categorical(labels, num_classes)

    # Finally we shuffle:
    p = np.random.permutation(len(X))
    X, y = X[p], y[p]

    first_split = int(len(imgs) /6.)
    second_split = first_split + int(len(imgs) * 0.2)
    X_test, X_val, X_train = X[:first_split], X[first_split:second_split], X[second_split:]
    y_test, y_val, y_train = y[:first_split], y[first_split:second_split], y[second_split:]
    
    return X_train, y_train, X_val, y_val, X_test, y_test, num_classes

In [9]:
X_train, y_train, X_val, y_val, X_test, y_test, num_classes = load_flowers_data('direct')

100%|██████████| 300/300 [00:00<00:00, 301.07it/s]
100%|██████████| 300/300 [00:01<00:00, 282.37it/s]
100%|██████████| 299/299 [00:01<00:00, 263.95it/s]
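
The sizes of the three sets follow directly from the two lines that compute `first_split` and `second_split`. As a quick sanity check, here is a small sketch of that arithmetic, assuming the 899 images reported by the progress bars above (`split_sizes` is a helper name introduced here for illustration only):

```python
def split_sizes(n_images):
    """Reproduce the train/validation/test sizes used in load_flowers_data."""
    first_split = int(n_images / 6.)                   # test set: about 1/6 of the data
    second_split = first_split + int(n_images * 0.2)   # validation set: 20% of the data
    n_test = first_split
    n_val = second_split - first_split
    n_train = n_images - second_split                  # everything that is left
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(300 + 300 + 299)
print(n_train, n_val, n_test)
```

Because the data are shuffled before the split, the class proportions in each set can vary slightly from one run to the next.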


# Transfer learning: VGG16 model

Let's now build our model. 

❓ **Question** ❓ Write a first function `load_model()` that loads the pretrained VGG-16 model from `tensorflow.keras.applications.vgg16`. In particular, look at the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16) to load the model where:
- the `weights` have been learnt on `imagenet`
- the `input_shape` corresponds to the input shape of any of your images (you have to resize them if they do not all have the same size)
- the `include_top` argument is set to `False`, so that the fully-connected layers at the top of VGG-16, which were specifically trained on `imagenet`, are not loaded

❗ **Remark** ❗ Do not change the default value of the other arguments

In [10]:
X_train.shape

(571, 256, 256, 3)

In [12]:
from tensorflow.keras.applications.vgg16 import VGG16

def load_model():
    # Convolutional base of VGG-16 pretrained on ImageNet, without the dense top
    model = VGG16(
        weights="imagenet", include_top=False, input_shape=(256, 256, 3)
    )
    return model

model = load_model()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5


❓ **Question** ❓ Look at the architecture of the model using the `summary` method

In [13]:
model.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0     

Impressive, right? Two things to notice:
- It ends with a convolutional block (more precisely, with the max-pooling layer that follows the last convolution). The flattening of the output and the fully connected layers are not here yet! We need to add them!
- There are more than 14,000,000 parameters, which is a lot. We could fine-tune them, meaning update them as we update the weights of the last layers, but it would take a lot of time. For that reason, we will tell the model that all the layers before the flattening are non-trainable.

❓ **Question** ❓ Write a function that takes the previous model as input and sets its layers to be non-trainable by applying `model.trainable = False`. Then check the summary of the model to see that the parameters are now `non-trainable`.



In [14]:
def set_nontrainable_layers(model):
    model.trainable = False
    return model
    
model = set_nontrainable_layers(model)
model.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0     
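
To see the effect of `trainable = False` without scrolling through a long summary, the parameters can also be counted programmatically. The sketch below uses a tiny stand-in convolutional base instead of VGG-16 (an assumption made so that nothing has to be downloaded); `count_params` is a helper name introduced here for illustration.

```python
import numpy as np
from tensorflow.keras import layers, models

def count_params(weights):
    """Total number of scalar parameters in a list of weight tensors."""
    return sum(int(np.prod(tuple(w.shape))) for w in weights)

# Tiny stand-in for a pretrained convolutional base (NOT VGG-16)
base = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(4, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
])
base.trainable = False  # freeze every layer of the base

# New head added on top of the frozen base
toy_model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    base,
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),
])

n_trainable = count_params(toy_model.trainable_weights)
n_frozen = count_params(toy_model.non_trainable_weights)
print("trainable:", n_trainable, "- non-trainable:", n_frozen)
```

Only the weights of the new dense layer are counted as trainable; the convolution kernel and bias of the frozen base end up in the non-trainable count.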

❓ **Question** ❓ We will write a function that adds flattening and dense layers after the convolutional layers. To do so, we cannot directly use the classic `layers.Sequential()` instantiation.

For that reason, we will see another one here. The idea is that we define each layer (or group of layers) separately. Then, we concatenate them. See this example : 


```
base_model = load_model()
base_model = set_nontrainable_layers(base_model)
flattening_layer = layers.Flatten()
dense_layer = layers.Dense(SOME_NUMBER_1, activation='relu')
prediction_layer = layers.Dense(SOME_NUMBER_2, activation='APPROPRIATE_ACTIVATION')

model = tf.keras.Sequential([
  base_model,
  flattening_layer,
  dense_layer,
  prediction_layer
])

```

The first line loads a group of layers, which is the previous VGG-16 model. Then, we set these layers to be non-trainable. After that, we can instantiate as many layers as we want.

Finally, we use the `Sequential` with the sequence of layers that will correspond to our overall neural network. 

Replicate the previous steps by adding a flattening layer and two dense layers (the first with 500 neurons) to the previous VGG-16 model (do not forget to set the layers to be non-trainable).

In [24]:
from tensorflow.keras import layers
from tensorflow.keras import models

def add_last_layers():
    base_model = load_model()
    base_model = set_nontrainable_layers(base_model)
    flattening_layer = layers.Flatten()
    dense_layer = layers.Dense(500, activation='relu')
    dense_layer2 = layers.Dense(100, activation='relu')
    prediction_layer = layers.Dense(3, activation='softmax')
    
    model = models.Sequential([
      base_model,
      flattening_layer,
      dense_layer,
      dense_layer2,
      prediction_layer
    ])
    return model

In [25]:

model = add_last_layers()

❓ **Question** ❓ Now look at the layers and parameters of your model. Note that there is a distinction, at the end, between the trainable and non-trainable parameters

In [26]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 8, 8, 512)         14714688  
_________________________________________________________________
flatten_2 (Flatten)          (None, 32768)             0         
_________________________________________________________________
dense_6 (Dense)              (None, 500)               16384500  
_________________________________________________________________
dense_7 (Dense)              (None, 100)               50100     
_________________________________________________________________
dense_8 (Dense)              (None, 3)                 303       
Total params: 31,149,591
Trainable params: 16,434,903
Non-trainable params: 14,714,688
_________________________________________________________________
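
The parameter counts in this summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases. A short sketch of that arithmetic, using only the shapes printed above:

```python
flattened = 8 * 8 * 512              # output of the VGG-16 base once flattened
dense_500 = flattened * 500 + 500    # weights + biases of the first dense layer
dense_100 = 500 * 100 + 100
dense_3 = 100 * 3 + 3

trainable = dense_500 + dense_100 + dense_3
frozen = 14_714_688                  # VGG-16 base, as reported in the summary

print(flattened, dense_500, dense_100, dense_3)
print(trainable, trainable + frozen)
```

Note that more than half of all parameters sit in the first dense layer alone, because the flattened VGG-16 output is very wide.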


❓ **Question** ❓ Write a function to compile your model - we advise Adam with `learning_rate=1e-4`. 

In [31]:
from tensorflow.keras.optimizers import Adam

adam = Adam(learning_rate=1e-4)

In [None]:
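def compile_model(model):
    # Named `compile_model` to avoid shadowing Python's built-in `compile`.
    # A fresh optimizer per model avoids sharing optimizer state between models.
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=1e-4), metrics=['accuracy'])
    return model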

❓ **Question** ❓ Write an overall function that :
- loads the model
- updates the layers
- compiles it

In [None]:

def build_model():
    # YOUR CODE HERE
    return model

model = build_model()

# Back to the data

The VGG16 model was trained on images that were preprocessed in a specific way. This is the reason why we did not normalize them earlier.

❓ **Question** ❓ Apply this processing to the images here using the method `preprocess_input` that you can import from `tensorflow.keras.applications.vgg16`. See [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications/vgg16/preprocess_input).
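
To get an intuition of what this preprocessing does, the Keras documentation describes it for VGG-16 as a conversion from RGB to BGR followed by zero-centering each color channel with respect to ImageNet, without scaling. The NumPy sketch below mimics that description for illustration only; the function name and the mean values are assumptions made here, and in the exercise you should call `preprocess_input` itself.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (values assumed for this illustration)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg16_style_preprocess(images_rgb):
    """Illustration only: RGB -> BGR, then subtract the per-channel mean."""
    images = np.asarray(images_rgb, dtype="float32")
    images_bgr = images[..., ::-1]           # reverse the channel axis
    return images_bgr - IMAGENET_BGR_MEAN    # no rescaling to [0, 1]

toy_batch = np.full((2, 4, 4, 3), 255, dtype="uint8")  # two all-white toy "images"
toy_out = vgg16_style_preprocess(toy_batch)
print(toy_out.shape, toy_out.min(), toy_out.max())
```

This also explains the earlier warning: dividing the pixels by 255 beforehand would make the subtracted means far too large relative to the data.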

In [None]:
from tensorflow.keras.applications.vgg16 import preprocess_input

# YOUR CODE HERE

# Run the model

❓ **Question** ❓ Now fit the model, with an early stopping criterion on the validation accuracy. Here, the validation data are provided, so use `validation_data` instead of `validation_split`.

❗ **Remark** ❗ Store the results in a `history` variable
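
As a reminder of the mechanics, here is a minimal sketch of early stopping on the validation accuracy with explicit validation data. It runs on a toy dense model and random data (both are assumptions made so that the cell runs in a few seconds); the patience, batch size and number of epochs are arbitrary choices, not recommended values.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(60, 8)).astype("float32")   # random toy features
y_tr = np.eye(3)[rng.integers(0, 3, size=60)]       # random one-hot labels
X_va = rng.normal(size=(30, 8)).astype("float32")
y_va = np.eye(3)[rng.integers(0, 3, size=30)]

toy_model = models.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
toy_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Stop when the validation accuracy has not improved for 3 epochs
es = EarlyStopping(monitor="val_accuracy", mode="max", patience=3, restore_best_weights=True)

toy_history = toy_model.fit(
    X_tr, y_tr,
    validation_data=(X_va, y_va),   # explicit validation set, no validation_split
    epochs=20,
    batch_size=16,
    callbacks=[es],
    verbose=0,
)
print(len(toy_history.history["val_accuracy"]), "epochs were run")
```

The same pattern applies to the VGG-16 model, with the preprocessed flower arrays in place of the toy data.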

In [None]:
# YOUR CODE HERE

❓ **Question** ❓ Plot the accuracy for the train and validation sets.
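
One possible way to draw these curves is sketched below. It assumes a dictionary with the keys `accuracy` and `val_accuracy`, which is what `history.history` contains when the model is compiled with `metrics=['accuracy']` and fitted with validation data. The helper name `plot_accuracy` and the values in `placeholder_history` are made up for illustration.

```python
import matplotlib.pyplot as plt

def plot_accuracy(history_dict, title="Accuracy"):
    """Plot training and validation accuracy per epoch on a single axis."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(history_dict["accuracy"], label="train")
    ax.plot(history_dict["val_accuracy"], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.set_title(title)
    ax.legend()
    return ax

# Placeholder values, only there to make the sketch runnable
placeholder_history = {"accuracy": [0.5, 0.7, 0.8], "val_accuracy": [0.45, 0.6, 0.65]}
ax = plot_accuracy(placeholder_history)
```

In the notebook, you would call `plot_accuracy(history.history)` after fitting the model.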

In [None]:
# YOUR CODE HERE

❓ **Question** ❓ Evaluate the model accuracy on the test set. What is the chance level on this classification task (i.e. the accuracy of a random classifier)?
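
For the chance level, a short worked computation may help. It assumes the class counts shown by the progress bars at loading time (300, 300 and 299 images); the exact counts in your test set depend on the random shuffle.

```python
counts = {"daisy": 300, "dandelion": 300, "rose": 299}
n_total = sum(counts.values())

# Guessing uniformly at random among the classes
uniform_guess = 1 / len(counts)

# Always predicting the most frequent class
majority_guess = max(counts.values()) / n_total

print(round(uniform_guess, 3), round(majority_guess, 3))
```

Both baselines are close to one third here because the classes are almost perfectly balanced; any useful model should be well above that.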

In [None]:
# YOUR CODE HERE

# Data augmentation

The next questions are a bit less guided, as they directly derive from what you have done in the previous exercise. Don't hesitate to go back to what you have done.

❓ **Question** ❓ Use some data augmentation techniques for this task. You can store the result of the fit in a `history_data_aug` variable that you can plot. Do you see an improvement? Don't forget to evaluate the model on the test set.
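
If you want a reminder of what an augmentation does at the array level, here is a library-agnostic NumPy sketch of a random horizontal flip. It is only a stand-in for the tools used in the previous exercise; the helper name `random_horizontal_flip` and the probability `p` are assumptions made for illustration.

```python
import numpy as np

def random_horizontal_flip(images, rng, p=0.5):
    """Flip each image of a (N, H, W, C) batch left-right with probability p."""
    images = np.asarray(images)
    flip_mask = rng.random(len(images)) < p          # one random decision per image
    out = images.copy()
    out[flip_mask] = out[flip_mask][:, :, ::-1, :]   # reverse the width axis
    return out

rng = np.random.default_rng(42)
toy_images = np.arange(2 * 2 * 3 * 3).reshape(2, 2, 3, 3)  # tiny fake batch
augmented = random_horizontal_flip(toy_images, rng)
print(augmented.shape)
```

Augmentations such as flips should only be applied to the training set; the validation and test sets stay untouched so that the evaluation remains comparable.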

In [None]:
# YOUR CODE HERE

# Improve the model

Here you can try to improve the test accuracy of the model. To do that, here are some options you can consider:

1) Is my model overfitting? If yes, you can try more data augmentation. If not, try a more complex model (unlikely to be needed here).

2) Perform a precise grid search on all the hyper-parameters: learning_rate, batch_size, data augmentation, etc.

3) Change the base model to a more modern one (ResNet, EfficientNet) available in the Keras library.

4) Curate the data: maintaining a sane data set is one of the keys to success.

5) Obtain more data


❗ **Remark** ❗ Note also that it is good practice to perform a real cross-validation. You can also try to do that here to be sure of your results.