# Deep Learning - Exercise 4

This lecture covers advanced CNN topics such as transfer learning and 1D convolutions for time-series processing.

We use the CIFAR-10 dataset again, and the [FordA](https://www.timeseriesclassification.com/description.php?Dataset=FordA) dataset for the time-series (TS) classification task.

[Open in Google Colab](https://colab.research.google.com/github/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/dl_04.ipynb)
[Download from GitHub](https://github.com/rasvob/VSB-FEI-Deep-Learning-Exercises/blob/main/dl_04.ipynb)

##### Remember to set **GPU** runtime in Colab!

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib.pyplot as plt # plotting
import matplotlib.image as mpimg # images
import numpy as np #numpy
import tensorflow as tf
import tensorflow.keras as keras
import requests

from tensorflow.keras.layers import Activation
from tensorflow.keras.utils import get_custom_objects
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications import VGG19

from sklearn.model_selection import train_test_split # split for validation sets
from sklearn.preprocessing import normalize # normalization of the matrix
from scipy.signal import convolve2d # convolution of 2D signals
from sklearn.metrics import confusion_matrix

from PIL import Image
from io import BytesIO


tf.version.VERSION

In [None]:
def show_history(history):
    plt.figure()
    for key in history.history.keys():
        plt.plot(history.epoch, history.history[key], label=key)
    plt.legend()
    plt.tight_layout()

def show_example(train_x, train_y, class_names):
    plt.figure(figsize=(10,10))
    for i in range(25):
        plt.subplot(5,5,i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_x[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_y[i][0]])
    plt.show()
    
def display_activation(activations, col_size, row_size, act_index):
    activation = activations[act_index]
    activation_index=0
    fig, ax = plt.subplots(row_size, col_size, figsize=(row_size*2.5,col_size*1.5))
    for row in range(0,row_size):
        for col in range(0,col_size):
            ax[row][col].imshow(activation[0, :, :, activation_index], cmap='gray')
            if activation_index < activation.shape[3]-1:
                activation_index += 1

## 🔎 What is transfer learning? 🔎

* Transfer learning consists of taking features learned on one problem, and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify cars may be useful to kick-start a model meant to identify trucks.
    * 🔎 Do you know any famous CNN models?

* Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
    * 🔎 Why do we benefit from it?
    
### Usual pipeline

1) Take layers from a previously trained model.

2) Freeze them, so as to avoid destroying any of the information they contain during future training rounds.

3) Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.

4) Train the new layers on your dataset.

* Optional: Fine-tuning = unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data.
    * 🔎 Why do we use a small learning rate?


# Let's start

### Import dataset **CIFAR10** again
* The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. 
* The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. 
* There are 6,000 images of each class.

The dataset is:
* downloaded
* split into a train and a test set
* converted from the range <0, 255> into <0, 1>
* *train* is split into a *train* and a *validation* set
* class names are defined

### We will resize the images to (224, 224), because we will use ResNet50 later, and one-hot encode our labels
* If you do not encode the labels, you will run into a shape mismatch error which is hard to debug - trust me, I've been there 🙂
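
To make the shape issue concrete, here is a small NumPy-only sketch of one-hot encoding. It assumes integer labels of shape (n, 1), which is how `load_data()` returns the CIFAR-10 labels. The toy `labels` array is made up for illustration, and indexing into `np.eye` stands in for `to_categorical`.

```python
import numpy as np

class_count = 10
# Toy labels shaped like the CIFAR-10 labels: one integer class id per row
labels = np.array([[3], [0], [9]])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(class_count, dtype=np.float32)[labels.ravel()]

print(labels.shape)   # (3, 1)
print(one_hot.shape)  # (3, 10)

# argmax recovers the original class ids
recovered = np.argmax(one_hot, axis=1)
print(recovered)      # [3 0 9]
```

A softmax output of shape (batch, 10) trained with `CategoricalCrossentropy` expects targets in the (n, 10) form. Integer labels would pair with `SparseCategoricalCrossentropy` instead.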

In [None]:
# CIFAR-10 is a basic dataset for image classification
dataset = tf.keras.datasets.cifar10
img_size = 224

# class names must be defined before the labels are one-hot encoded
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
class_count = len(class_names)

# data from any Keras dataset are loaded using the load_data function
(train_x, train_y), (test_x, test_y) = dataset.load_data()

# one-hot encode the labels: shape (n, 1) -> (n, class_count)
train_y = tf.keras.utils.to_categorical(train_y, class_count)
test_y = tf.keras.utils.to_categorical(test_y, class_count)

# resize the images from 32x32 to img_size x img_size (the result is float32)
rs_train_x = tf.image.resize(train_x, size=[img_size, img_size]).numpy()
rs_test_x = tf.image.resize(test_x, size=[img_size, img_size]).numpy()

# scale the pixel values from <0, 255> to <0, 1>
rs_train_x = rs_train_x/255.0
rs_test_x = rs_test_x/255.0

train_x, valid_x, train_y, valid_y = train_test_split(rs_train_x, train_y, test_size=0.2, random_state=42)
test_x = rs_test_x

# the data are now in the form of 224x224 pixels with values in <0, 1>
print('Train data shape: ', train_x.shape, train_y.shape)
print('Validation data shape: ', valid_x.shape, valid_y.shape)
print('Test data shape:  ', test_x.shape, test_y.shape)

print('Class count:', class_count, class_names)
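
The cell above holds every resized image in memory at once: 50,000 training images × 224 × 224 × 3 float32 values is roughly 30 GB, which may not fit in a Colab runtime. One possible alternative, not used in this notebook, is to resize each batch on the fly with `tf.data`. The sketch below uses a small random array in place of CIFAR-10; `preprocess` and `ds` are names chosen for illustration only.

```python
import numpy as np
import tensorflow as tf

img_size = 224

# Small random stand-in for CIFAR-10: 8 images of 32x32x3 with values 0-255
x = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, size=(8,)), 10)

def preprocess(image, label):
    # tf.image.resize returns float32; then scale to <0, 1>
    image = tf.image.resize(image, [img_size, img_size])
    return image / 255.0, label

# Images are resized per element, so only the current batches live in memory
ds = (tf.data.Dataset.from_tensor_slices((x, y))
      .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(4)
      .prefetch(tf.data.AUTOTUNE))

batch_x, batch_y = next(iter(ds))
print(batch_x.shape, batch_y.shape)  # (4, 224, 224, 3) (4, 10)
```

`model.fit` accepts such a dataset directly. In that case `batch_size` is not passed to `fit`, because the dataset is already batched.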

In [None]:
show_example(train_x, np.argmax(train_y, axis=1).reshape(-1, 1), class_names)

## First, instantiate a base model with pre-trained weights.
* What does **include_top** do?
* What does the **weights='imagenet'** parameter mean? Do we need it? What happens if we use random weights?

In [None]:
base_model = ResNet50(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(img_size, img_size, 3),
    include_top=False)  # Do not include the ImageNet classifier at the top.

## ⚠ IMPORTANT: Freeze the base model ⚠

In [None]:
base_model.trainable = False

## Create the model's input and output layers and connect all the parts together

#### We make sure that the base_model is running in inference mode here, by passing `training=False`.

* Important notes about BatchNormalization layer ⚠
    * Many image models contain **BatchNormalization** layers. That layer is a special case on every imaginable count. Here are a few things to keep in mind.
    * BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables **tracking the mean and variance of the inputs**.
    * When you **unfreeze** a model that contains BatchNormalization layers in order to do **fine-tuning**, you should **keep the BatchNormalization layers in inference mode by passing training=False** when calling the base model. 
        * **Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.**


* What does the **GlobalAveragePooling2D** layer do?
    * After convolutional operations, *tf.keras.layers.Flatten* will reshape a tensor into (n_samples, height*width*channels), for example turning (16, 28, 28, 3) into (16, 2352)
    * A *GlobalAveragePooling2D* layer averages all the values over the spatial axes (height and width), separately for each channel. This means that the resulting shape will be (n_samples, channels). For instance, if your last convolutional layer had 64 filters, it would turn (16, 7, 7, 64) into (16, 64)
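
The shape difference between the two layers can be reproduced with plain NumPy. This is only a sketch: `features` is a random stand-in for the output of a last convolutional layer, `reshape` plays the role of *Flatten*, and a mean over the height and width axes plays the role of *GlobalAveragePooling2D*.

```python
import numpy as np

# Fake feature maps: (n_samples, height, width, channels)
features = np.random.rand(16, 7, 7, 64).astype(np.float32)

# Flatten keeps every value: 7 * 7 * 64 = 3136 features per sample
flattened = features.reshape(features.shape[0], -1)

# Global average pooling keeps one mean value per channel
pooled = features.mean(axis=(1, 2))

print(flattened.shape)  # (16, 3136)
print(pooled.shape)     # (16, 64)
```

The pooled representation is 49 times smaller here, which is why the Dense layer placed on top of it needs far fewer weights.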

In [None]:
inputs = keras.Input(shape=(img_size, img_size, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(class_count, activation='softmax')(x)
model = keras.Model(inputs, outputs)
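
A minimal NumPy sketch of why the mode matters for BatchNormalization. It mimics, in simplified form, how the layer tracks a moving mean (the variance is handled the same way). `momentum = 0.99` is the Keras default; the function `bn_step` and all the numbers are invented for illustration.

```python
import numpy as np

momentum = 0.99     # Keras default for BatchNormalization
moving_mean = 0.0   # statistic learned during pre-training (toy value)

# A batch from the new dataset with a very different distribution
batch = np.array([4.0, 5.0, 6.0])

def bn_step(moving_mean, batch, training):
    """Return the mean used for normalization and the updated moving mean."""
    if training:
        batch_mean = batch.mean()
        # Training mode: normalize with batch statistics and update the moving mean
        new_moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
        return batch_mean, new_moving_mean
    # Inference mode: use the stored statistic and leave it untouched
    return moving_mean, moving_mean

used_inf, mean_after_inf = bn_step(moving_mean, batch, training=False)
used_train, mean_after_train = bn_step(moving_mean, batch, training=True)

print(used_inf, mean_after_inf)      # stored statistic is used and stays at 0.0
print(used_train, mean_after_train)  # batch mean is used, moving mean drifts toward it
```

With `training=False` the pre-trained statistics stay exactly as they were, so the features fed to the new Dense layer remain consistent from batch to batch.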

## Compile the model and check the number of parameters
* Why do we have only 20,490 trainable parameters?
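
As a hint, the count can be checked by hand. ResNet50 without its top produces 2048 channels, so after global average pooling the Dense layer sees a 2048-dimensional vector, and everything inside the frozen base model is non-trainable. The variable names below are chosen for illustration.

```python
features = 2048   # channels produced by ResNet50 with include_top=False
class_count = 10  # CIFAR-10 classes

dense_weights = features * class_count  # one weight per (feature, class) pair
dense_biases = class_count              # one bias per class
trainable_params = dense_weights + dense_biases

print(trainable_params)  # 20490
```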

In [None]:
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=[keras.metrics.CategoricalAccuracy()])

model.summary()

## Fit the model

In [None]:
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath='weights.best.hdf5',
    save_weights_only=True,
    monitor='val_loss',
    mode='auto',
    save_best_only=True)
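
What `save_best_only=True` with `monitor='val_loss'` amounts to can be illustrated in plain Python. This is a simplified sketch of the behavior, not the Keras implementation, and the loss values are made up.

```python
# Hypothetical validation losses for three epochs
val_losses = [0.92, 0.71, 0.78]

best = float("inf")
saved_epochs = []
for epoch, val_loss in enumerate(val_losses):
    # For a loss, mode='auto' resolves to "lower is better"
    if val_loss < best:
        best = val_loss
        saved_epochs.append(epoch)  # the checkpoint file would be overwritten here

print(saved_epochs)  # [0, 1]
print(best)          # 0.71
```

After training, the file therefore holds the weights from the epoch with the lowest validation loss, which is not necessarily the last epoch.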

## 💡 Always check that all the shapes match what the model expects!
* Otherwise you will run into a shape mismatch issue in the training loop, and it is harder to debug than C++ templates 😅

In [None]:
train_x.shape

In [None]:
train_y.shape

In [None]:
valid_x.shape

In [None]:
valid_y.shape

In [None]:
test_x.shape

In [None]:
test_y.shape

In [None]:
history = model.fit(train_x, train_y, validation_data=(valid_x, valid_y), batch_size=32, epochs=3, callbacks=[model_checkpoint_callback])

show_history(history)

# Load best setup
model.load_weights("weights.best.hdf5")
test_loss, test_acc = model.evaluate(test_x, test_y)
print('Test accuracy: ', test_acc)

# Fine-tuning
* Once your model has converged on the new data, you can try to unfreeze all or part of the base model and retrain the whole model end-to-end with a very low learning rate.
* It could also potentially lead to quick overfitting -- keep that in mind.
* It is critical to only do this step **after the model with frozen layers has been trained to convergence**. 
    * If you mix randomly-initialized trainable layers with trainable layers that hold pre-trained features, the randomly-initialized layers will cause very large gradient updates during training, which will **destroy your pre-trained features**.
    
### It's also critical to use a very low learning rate at this stage, because you are training a much larger model than in the first round of training, on a dataset that is typically very small. 
* As a result, you are at **risk of overfitting** very quickly if you apply large weight updates.

## Unfreeze the base model

In [None]:
base_model.trainable = True
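
The cell above unfreezes the whole base model. Unfreezing only part of it is also possible; one common pattern is sketched below on a tiny stand-in network, so that it runs without downloading ResNet50. The names `backbone` and `n_unfrozen` are hypothetical, and the choice to keep BatchNormalization layers frozen is one option that follows the notes above, not the only way to fine-tune.

```python
from tensorflow import keras

# Tiny stand-in for a pretrained backbone (the notebook itself uses ResNet50)
backbone = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.ReLU(),
    keras.layers.Conv2D(16, 3, padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.ReLU(),
])

backbone.trainable = True
n_unfrozen = 3  # hypothetical: fine-tune only the last three layers

# Freeze everything except the last n_unfrozen layers
for layer in backbone.layers[:-n_unfrozen]:
    layer.trainable = False

# Keep BatchNormalization layers frozen even inside the unfrozen part
for layer in backbone.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False

trainable_types = [type(layer).__name__ for layer in backbone.layers if layer.trainable]
print(trainable_types)                  # ['Conv2D', 'ReLU']
print(len(backbone.trainable_weights))  # 2 (kernel and bias of the last Conv2D)
```

Any model built on top of such a backbone still has to be compiled again after these changes.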

## Recompile your model after you make any changes
* This ensures that the `trainable` attribute of every inner layer is taken into account.

### Calling compile() on a model is meant to "freeze" the behavior of that model. 
* This implies that the trainable attribute values at the time the model is compiled should be preserved throughout the lifetime of that model, until compile is called again. 
* Hence, if you change any trainable value, make sure to call compile() again on your model for your changes to be taken into account.

In [None]:
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=[keras.metrics.CategoricalAccuracy()])

model.summary()

## Tasks for the lecture (2p)

1) XYZ - **(1p)**