
## Fine-tuning

Another widely used transfer learning technique is _fine-tuning_. 
Fine-tuning involves unfreezing a few of the top layers 
of a frozen model base used for feature extraction, and jointly training both the newly added part of the model (in our case, the 
fully-connected classifier) and these unfrozen top layers. This is called "fine-tuning" because it slightly adjusts the more abstract 
representations of the model being reused, in order to make them more relevant for the problem at hand.

![fine-tuning VGG16](https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/resources/vgg16_fine_tuning.png)
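In Keras, "freezing" is controlled by each layer's `trainable` attribute. Setting it to `False` moves the layer's weights from `trainable_weights` to `non_trainable_weights`, so training no longer updates them. The minimal sketch below assumes TensorFlow 2.x with Keras. It uses a small standalone `Dense` layer as a stand-in for a pre-trained layer; its weights are random, not pre-trained.

```python
from tensorflow.keras import layers

# A small stand-in for a pre-trained layer: Dense(4) on 3 inputs has
# two weight tensors, a 3x4 kernel and a bias of length 4.
dense = layers.Dense(4)
dense.build((None, 3))
print(len(dense.trainable_weights), len(dense.non_trainable_weights))  # 2 0

# Freezing: both tensors move to non_trainable_weights, so training
# will no longer update them.
dense.trainable = False
print(len(dense.trainable_weights), len(dense.non_trainable_weights))  # 0 2

# Unfreezing (what fine-tuning does to the top layers): trainable again.
dense.trainable = True
print(len(dense.trainable_weights), len(dense.non_trainable_weights))  # 2 0
```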

In [None]:
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

It was necessary to freeze the convolutional base of VGG16 in order to train a randomly initialized
classifier on top. For the same reason, the top layers of the convolutional base can only be fine-tuned once the classifier on
top has already been trained. If the classifier were not already trained, the error signal propagating through the network during
training would be too large, and the representations previously learned by the layers being fine-tuned would be destroyed. Thus the steps
for fine-tuning a network are as follows:

1. Add your custom network on top of an already trained base network.
2. Freeze the base network.
3. Train the part you added.
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part you added.
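The five steps above can be sketched end to end on a toy problem. This sketch is not the notebook's model. It assumes TensorFlow 2.x with Keras, and `toy_base` is a hypothetical two-layer stand-in for a pre-trained base that is actually randomly initialized. The data is random noise, so the training learns nothing meaningful. The point is the order of operations: freeze, train the head, unfreeze the top of the base, recompile with a lower learning rate, then train jointly.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for a pre-trained base (randomly initialized here).
toy_base = keras.Sequential(
    [layers.Dense(16, activation="relu", name="base_lower"),
     layers.Dense(16, activation="relu", name="base_top")],
    name="toy_base",
)

# Step 1: add a custom head on top of the base.
inputs = keras.Input(shape=(8,))
x = toy_base(inputs, training=False)
outputs = layers.Dense(1, activation="sigmoid", name="head")(x)
toy_model = keras.Model(inputs, outputs)

# Made-up random data, only so that the sketch runs.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(64, 8)).astype("float32")
y_data = rng.integers(0, 2, size=(64, 1)).astype("float32")

# Step 2: freeze the base network.
toy_base.trainable = False

# Step 3: train only the head.
toy_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy")
toy_model.fit(x_data, y_data, epochs=2, verbose=0)

# Step 4: unfreeze the base, then re-freeze everything except its top layer.
toy_base.trainable = True
for layer in toy_base.layers[:-1]:
    layer.trainable = False

# Step 5: recompile (so the trainable change takes effect) with a lower
# learning rate, then train base_top and the head jointly.
toy_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy")
toy_model.fit(x_data, y_data, epochs=2, verbose=0)

# Only base_top and head should now hold trainable weights (2 tensors each).
print(len(toy_model.trainable_weights))
```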


In [None]:
img_height, img_width = 128, 128

# Load the pre-trained VGG16 convolutional base (without its classifier)
base_model = keras.applications.VGG16(input_shape=(img_height, img_width, 3),
                                      include_top=False,
                                      weights='imagenet')

preprocess_input_fn = keras.applications.vgg16.preprocess_input

# Freeze the convolutional base
base_model.trainable = False

# Add input layer
inputs = layers.Input(shape=(img_height, img_width, 3))
# Add VGG16's preprocessing (RGB to BGR, subtract the ImageNet channel means)
x = preprocess_input_fn(inputs)
# Add the base; training=False runs it in inference mode
# (freezing itself is done by base_model.trainable = False above)
x = base_model(x, training=False)
# Add our classification head
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(rate=0.5)(x)
x = layers.Dense(units=512, activation="relu")(x)
x = layers.Dropout(rate=0.5)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)

model = keras.models.Model(inputs=[inputs], outputs=[outputs])

base_learning_rate = 0.001

model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=base_learning_rate),
              metrics=["accuracy"])


Let's confirm that all the layers of the convolutional base are frozen.

In [None]:
for layer in base_model.layers:
    print(layer.name, layer.trainable)

Let's print the model summary and check how many weights are trainable. Only 263,169 weights (parameters) are trainable, all of them in the classification head we put on top of the convolutional base. (For comparison, the VGG16 convolutional base has a total of 14,714,688 weights.)

In [None]:
model.summary()
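The 263,169 figure can be checked by hand. Only the two `Dense` layers in the head have weights (pooling and dropout have none), and the head receives a 512-dimensional vector because VGG16's last convolutional block outputs 512 feature maps:

```python
# GlobalAveragePooling2D turns VGG16's 512 feature maps into a 512-vector.
features = 512

# Dense(512): a 512x512 kernel plus 512 biases.
dense_512_params = features * 512 + 512   # 262,656

# Dense(1): a 512x1 kernel plus 1 bias.
dense_1_params = 512 * 1 + 1              # 513

# Pooling and dropout layers contribute no weights.
head_params = dense_512_params + dense_1_params
print(f"{head_params:,}")  # 263,169
```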

## Creating Datasets

We will set up our training and validation datasets as we did in the earlier exercise.

In [None]:
dataset_URL = 'https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs_subset.tar.gz', origin=dataset_URL, extract=True, cache_dir='.')
dataset_dir = os.path.join(os.path.dirname(path_to_zip), "cats_and_dogs_subset")

In [None]:
batch_size = 32
image_size = (img_height, img_width)

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)

## Train the classification head 

We will go ahead and train our classification head.

In [None]:
# create model checkpoint callback to save the best model checkpoint
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_checkpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

model.fit(train_ds, validation_data=val_ds, 
          epochs=30, callbacks=[model_checkpoint_callback])
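With `save_best_only=True`, `monitor='val_accuracy'` and `mode='max'`, a checkpoint is written only in epochs where validation accuracy beats the best value seen so far. The plain-Python function below is a simplified model of that rule. `checkpoint_epochs` is a hypothetical helper, not the Keras source, and the accuracy values are made up. The fine-tuning cell later in this notebook reuses the same callback object. As far as we know, the callback keeps its `best` value between `fit()` calls, so fine-tuning epochs would only overwrite `best_checkpoint` if they beat the best head-training epoch. The `best` argument models that carry-over.

```python
import math

def checkpoint_epochs(val_accuracies, best=-math.inf):
    """Simplified model of save_best_only with mode='max': save only when
    this epoch's val_accuracy is strictly greater than the best so far.

    Returns the 1-indexed epochs that would be saved and the final best.
    """
    saved = []
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc
            saved.append(epoch)
    return saved, best

# Head training (made-up accuracies).
head_saved, head_best = checkpoint_epochs([0.80, 0.85, 0.84, 0.90])
print(head_saved, head_best)  # [1, 2, 4] 0.9

# Fine-tuning with the best value carried over from head training.
ft_saved, ft_best = checkpoint_epochs([0.88, 0.91, 0.89], best=head_best)
print(ft_saved, ft_best)  # [2] 0.91
```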

Now that the classification head is trained, let's unfreeze some of the top layers of the convolutional base to fine-tune their weights.
We will fine-tune the last 3 convolutional layers: all layers up to and including `block4_pool` stay frozen, while the layers
`block5_conv1`, `block5_conv2` and `block5_conv3` become trainable.

Why not fine-tune more layers? Why not fine-tune the entire convolutional base? We could. However, we need to consider that:

* Earlier layers in the convolutional base encode more generic, reusable features, while layers higher up encode more specialized features. It is 
more useful to fine-tune the more specialized features, as these are the ones that need to be repurposed on our new problem. There would 
be fast-decreasing returns in fine-tuning lower layers.
* The more parameters we are training, the more we are at risk of overfitting. The convolutional base has 15M parameters, so it would be 
risky to attempt to train it on our small dataset.

Thus, in our situation, it is a good strategy to only fine-tune the top 2 to 3 layers in the convolutional base.
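To put numbers on the second point, the sketch below sums the parameters in each VGG16 block. It assumes TensorFlow 2.x with Keras and builds the architecture with `weights=None`, which skips the ImageNet download. Parameter counts depend only on the architecture, not on the weight values. The top block alone (`block5`, the three convolutional layers we are about to unfreeze) holds 7,079,424 of the 14,714,688 parameters, nearly half.

```python
from collections import defaultdict

from tensorflow import keras

# Same architecture as base_model; weights=None skips the ImageNet download.
vgg_base = keras.applications.VGG16(
    input_shape=(128, 128, 3), include_top=False, weights=None
)

# Sum parameters per block using the "blockN_" prefix of each layer name
# (the input layer does not start with "block" and is skipped).
params_per_block = defaultdict(int)
for layer in vgg_base.layers:
    if layer.name.startswith("block"):
        block = layer.name.split("_")[0]  # e.g. "block5"
        params_per_block[block] += layer.count_params()

total = sum(params_per_block.values())
for block, count in params_per_block.items():
    print(f"{block}: {count:>10,} ({count / total:.1%})")
print(f"total:  {total:>10,}")
```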

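Let's set this up: we unfreeze `base_model`, and then re-freeze every layer inside it except the last 4 (`block5_conv1`, `block5_conv2`, `block5_conv3` and `block5_pool`). The pooling layer has no weights, so only the three convolutional layers are actually trained.

If you run `base_model.summary()` after the next cell, you will see that the convolutional base now has 7,079,424 trainable weights (around 7 million), fewer than half of its 14,714,688 weights, because every layer except the three `block5` convolutional layers is frozen.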

In [None]:
base_model.trainable = True
for layer in base_model.layers[:-4]:
    layer.trainable = False
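Slicing with `base_model.layers[:-4]` depends on how many layers follow `block5_conv1`. An alternative is to select the layers to unfreeze by name. The sketch below assumes VGG16's `blockN_` layer naming and TensorFlow 2.x with Keras. It builds a separate `vgg_base` with `weights=None` (no download) purely for illustration; in the notebook you would run the loop on `base_model`.

```python
from tensorflow import keras

# Same architecture as base_model; weights=None skips the ImageNet download.
vgg_base = keras.applications.VGG16(
    input_shape=(128, 128, 3), include_top=False, weights=None
)

# Unfreeze the base, then keep only layers whose name starts with "block5".
vgg_base.trainable = True
for layer in vgg_base.layers:
    layer.trainable = layer.name.startswith("block5")

trainable_names = [layer.name for layer in vgg_base.layers if layer.trainable]
print(trainable_names)

# block5_pool is trainable too, but it has no weights, so only the three
# block5 convolutional layers contribute trainable parameters.
trainable_params = sum(
    layer.count_params() for layer in vgg_base.layers if layer.trainable
)
print(f"{trainable_params:,}")  # 7,079,424
```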

Let us examine the model summary again. The model now has 7,342,593 trainable weights (7,079,424 in the three `block5` convolutional layers plus 263,169 in the classification head), compared to 263,169 before.

In [None]:
model.summary()

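Because we are now training a much larger part of the model and want to adapt the pre-trained weights rather than overwrite them, it is important to use a lower learning rate at this stage: we do not want to make drastic changes to the weights of the convolutional layers being fine-tuned. We also need to call `compile()` again so that the change to the layers' `trainable` attributes takes effect.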

In [None]:
finetune_learning_rate = base_learning_rate / 10.

model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=finetune_learning_rate),
              metrics=["accuracy"])

model.fit(
    train_ds,
    epochs=15,
    validation_data=val_ds,
    callbacks=[model_checkpoint_callback])

In [None]:
model.load_weights('best_checkpoint')
model.evaluate(val_ds)

**Exercise:**

1. Does the fine-tuned model perform better or worse than the model with only the classification head trained?
2. Try unfreezing fewer or more layers and see whether the model performance improves.
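For the second exercise, a small helper that unfreezes the top `n` convolutional layers makes it easy to try different depths. `unfreeze_top_conv_layers` is a hypothetical name, and the sketch assumes TensorFlow 2.x with Keras; the usage lines build a separate VGG16 with `weights=None` only to show the effect. In the notebook you would call it on `base_model` and then recompile `model` with the low fine-tuning learning rate before training again.

```python
from tensorflow import keras
from tensorflow.keras import layers


def unfreeze_top_conv_layers(base, n):
    """Hypothetical helper: freeze every layer of `base` except its last
    `n` Conv2D layers. Returns the names of the layers left trainable."""
    base.trainable = True
    conv_layers = [l for l in base.layers if isinstance(l, layers.Conv2D)]
    # conv_layers[-0:] would return the whole list, so handle n == 0 separately.
    to_train = conv_layers[-n:] if n > 0 else []
    train_ids = {id(l) for l in to_train}
    for layer in base.layers:
        layer.trainable = id(layer) in train_ids
    return [l.name for l in to_train]


# Usage on a fresh VGG16 base (weights=None skips the ImageNet download).
vgg_base = keras.applications.VGG16(
    input_shape=(128, 128, 3), include_top=False, weights=None
)
top3 = unfreeze_top_conv_layers(vgg_base, 3)
print(top3)  # ['block5_conv1', 'block5_conv2', 'block5_conv3']
```

With `n=5` the helper reaches down into `block4` (`block4_conv2` and `block4_conv3`), which is one way to test whether unfreezing more layers helps or starts to overfit.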
