# Transfer Learning

- In this notebook, we will work with the CIFAR-10 dataset.
- This is a well-known dataset for image classification, which includes 60000 32x32 color images in 10 classes, with 6000 images per class.
- There are 50000 training images and 10000 test images.
- The ten classes are: [airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck].
- To illustrate the power and concept of transfer learning, we will first train a CNN model on the following classes only (airplane, automobile, bird, cat, deer).
- Then we will train only the last layer(s) of the network on the classes (dog, frog, horse, ship, truck) and see how well the features learned on (airplane, automobile, bird, cat, deer) help with classifying (dog, frog, horse, ship, truck).

In [1]:
# Use the Keras API bundled with TensorFlow throughout, so that layers,
# models, and optimizers all come from the same package.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical
from time import time

In [2]:
# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np.squeeze(y_train)
y_test = np.squeeze(y_test)

# Create two datasets: one with classes below 5 and one with 5 and above
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

# Re-index the labels for classes [5, 6, 7, 8, 9] to [0, 1, 2, 3, 4]
x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5

# Define the input shape based on the backend
img_rows, img_cols = 32, 32
if tf.keras.backend.image_data_format() == 'channels_first':
    input_shape = (3, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 3)

print(f"x_train: {x_train_lt5.shape}")
print(f"y_train: {y_train_lt5.shape}")

x_train: (25000, 32, 32, 3)
y_train: (25000,)
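
Subtracting 5 from the second group of labels matters because one-hot encoding with five classes expects labels in the range 0 to 4. The following is a minimal pure-Python sketch of that idea. The helper `one_hot` and the sample labels are illustrative assumptions that stand in for what `to_categorical` does on the real label arrays.

```python
def one_hot(labels, num_classes):
    """Return one one-hot row per label; labels must lie in [0, num_classes)."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        rows.append([1 if k == label else 0 for k in range(num_classes)])
    return rows


original_labels = [5, 7, 9]  # dog, horse, truck in CIFAR-10 numbering
shifted_labels = [label - 5 for label in original_labels]

encoded = one_hot(shifted_labels, num_classes=5)
print(shifted_labels)  # [0, 2, 4]
print(encoded)         # [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]
```

Passing the unshifted labels with `num_classes=5` raises an error in this sketch, which is the situation the re-indexing avoids.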


## Part-1: Train a CNN model on (airplane, automobile, bird, cat, deer)

In [3]:
# Define a function to train CNN models
def train_model(
    model, x_train, y_train, x_test, y_test, 
    num_classes, batch_size=64, epochs=10, 
    learning_rate=1e-3, verbose=1,
):

    # Convert class vectors to binary class matrices (one-hot encoding)
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)

    # Compile the model
    optimizer = RMSprop(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )

    # Train the model
    start_time = time()
    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=verbose,
        validation_data=(x_test, y_test),
    )
    training_time = time() - start_time
    print(f'Training time (min): {(training_time/60):.2f}')

    # Evaluate the model
    score = model.evaluate(x_test, y_test, verbose=0)
    print(f"Test loss: {score[0]:.4f}, Test accuracy: {score[1]:.4f}")

    return history, score[0], score[1]

In [4]:
# Train the model on the dataset with labels for (airplane, automobile, bird, cat, deer)
selected_classes = [0, 1, 2, 3, 4]
num_classes = len(selected_classes)

# Define a CNN model
model_1 = Sequential()
model_1.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model_1.add(Conv2D(64, (3, 3), activation='relu'))
model_1.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model_1.add(Dropout(0.2))
model_1.add(Flatten())
model_1.add(Dense(128, activation='relu'))
model_1.add(Dropout(0.2))
model_1.add(Dense(64, activation='relu'))
model_1.add(Dropout(0.2))
model_1.add(Dense(num_classes, activation='softmax'))

model_1.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 30, 30, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 28, 28, 64)        18496     
                                                                 
 max_pooling2d (MaxPooling2  (None, 14, 14, 64)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 14, 14, 64)        0         
                                                                 
 flatten (Flatten)           (None, 12544)             0         
                                                                 
 dense (Dense)               (None, 128)               1605760   
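
The output shapes in this summary can be checked by hand. The sketch below assumes unpadded ("valid") convolutions and pooling, which is the Keras default when no `padding` argument is given. The helper `conv_output_size` is a hypothetical name introduced for illustration.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


after_conv1 = conv_output_size(32, kernel=3)                    # conv2d
after_conv2 = conv_output_size(after_conv1, kernel=3)           # conv2d_1
after_pool = conv_output_size(after_conv2, kernel=2, stride=2)  # max_pooling2d
flattened = after_pool * after_pool * 64                        # flatten (64 channels)

print(after_conv1, after_conv2, after_pool, flattened)  # 30 28 14 12544
```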
                                                        

In [5]:
# Train and evaluate the model
history, loss, accuracy = train_model(
    model_1,
    x_train_lt5, y_train_lt5,
    x_test_lt5, y_test_lt5,
    num_classes=num_classes,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training time (min): 8.37
Test loss: 0.7621, Test accuracy: 0.7662


In [6]:
# Number of trainable parameters in the baseline model (model_1)
trainable_params_model_1 = np.sum([np.prod(var.shape) for var in model_1.trainable_weights])
print(f"Total trainable parameters in model_1: {trainable_params_model_1}")

Total trainable parameters in model_1: 1633733
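
The total of 1,633,733 can be reproduced from the layer definitions. The sketch below counts one weight per input connection plus one bias per output unit or filter. The helper names are hypothetical, and the layer names `dense_1` and `dense_2` assume Keras default naming, since the printed summary is cut off after `dense`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return (in_features + 1) * units


layer_params = {
    "conv2d": conv2d_params(3, 3, 3, 32),      # 896
    "conv2d_1": conv2d_params(3, 3, 32, 64),   # 18496
    "dense": dense_params(14 * 14 * 64, 128),  # 1605760
    "dense_1": dense_params(128, 64),          # 8256
    "dense_2": dense_params(64, 5),            # 325 (output layer)
}

total_params = sum(layer_params.values())
print(total_params)  # 1633733
```

Pooling, dropout, and flatten layers have no weights, so they do not appear in the count.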


## Part-2: Transfer Learning
Keras allows layers to be "frozen" during training: the weights of frozen layers stay fixed, while the weights of the remaining layers are updated. This is a core part of transfer learning, since it lets us retrain only the last layer or the last few layers of a pre-trained network.

Let us now fine-tune our model in two different ways and compare the overall results:

- Freeze all layers except the output layer, and train the model on the classes (dog, frog, horse, ship, truck).
- Freeze all layers except the last fully connected layer and the output layer, and train the model on the classes (dog, frog, horse, ship, truck).
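
As a conceptual illustration of what freezing means, the toy sketch below applies one gradient-descent step that skips any parameter marked as not trainable. It is not how Keras is implemented internally. The parameter names, values, and gradients are made up for the example.

```python
def sgd_step(params, grads, learning_rate):
    """Update only the parameters flagged as trainable; frozen ones are untouched."""
    for name, param in params.items():
        if param["trainable"]:
            param["value"] -= learning_rate * grads[name]


# Hypothetical two-parameter "network": a frozen hidden weight and a trainable output weight.
params = {
    "hidden_w": {"value": 0.5, "trainable": False},
    "output_w": {"value": -0.3, "trainable": True},
}
grads = {"hidden_w": 0.2, "output_w": 0.1}

sgd_step(params, grads, learning_rate=0.1)

print(params["hidden_w"]["value"])            # 0.5 (unchanged because it is frozen)
print(round(params["output_w"]["value"], 4))  # -0.31
```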

In [7]:
# Define a function for fine-tuning
def fine_tune_model(
        model, trainable_layers, x_train, y_train, x_test, y_test, 
        num_classes, batch_size=64, epochs=10, learning_rate=1e-3,
):
    
    # Convert class vectors to binary class matrices (one-hot encoding)
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    
    # Freeze all layers except the specified layers
    for layer in model.layers:
        layer.trainable = False

    # Unfreeze the specified layers
    for i in trainable_layers:
        model.layers[i].trainable = True

    # Compile the model (recompiling is needed for the `trainable` changes to take effect)
    optimizer = RMSprop(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

    # Train the model
    start_time = time()
    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=1,
        validation_data=(x_test, y_test),
    )

    training_time = time() - start_time
    print(f'Training time (min): {(training_time/60):.2f}')

    # Evaluate the model
    score = model.evaluate(x_test, y_test, verbose=0)
    print(f"Test loss: {score[0]:.4f}, Test accuracy: {score[1]:.4f}")

    return history, score

### Freeze all layers except the output layer (model_2)

In [8]:
# Copy weights from the pre-trained model: model_1
model_2 = tf.keras.models.clone_model(model_1)
model_2.set_weights(model_1.get_weights())

# Freeze all layers except the output layer
trainable_layers_model_2 = [len(model_2.layers) - 1]  # Only output layer is trainable

In [9]:
# Fine-tune model_2 on the dataset with labels for (dog, frog, horse, ship, truck)
selected_classes = [5, 6, 7, 8, 9]
num_classes = len(selected_classes)

history_exp1, score_exp1 = fine_tune_model(
    model_2,
    trainable_layers=trainable_layers_model_2,
    x_train=x_train_gte5,
    y_train=y_train_gte5,
    x_test=x_test_gte5,
    y_test=y_test_gte5,
    num_classes=num_classes,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training time (min): 4.23
Test loss: 1.0001, Test accuracy: 0.6060
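
For context, a classifier that assigns equal probability to each of the five classes gives a reference point for these numbers. The sketch below computes that reference accuracy and categorical cross-entropy. It assumes balanced classes, which holds for CIFAR-10.

```python
import math

num_classes = 5

# A uniform guess is right one time in five on balanced data.
chance_accuracy = 1 / num_classes

# Categorical cross-entropy is -log of the probability given to the true class.
chance_loss = -math.log(1 / num_classes)

print(f"{chance_accuracy:.2f}")  # 0.20
print(f"{chance_loss:.4f}")      # 1.6094
```

The reported test accuracy of 0.6060 and test loss of 1.0001 are both clearly better than this reference, even though only the output layer was retrained.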


In [10]:
# Check the model summary after fine-tuning
print("After fine-tuning:")
model_2.summary()

After fine-tuning:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 30, 30, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 28, 28, 64)        18496     
                                                                 
 max_pooling2d (MaxPooling2  (None, 14, 14, 64)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 14, 14, 64)        0         
                                                                 
 flatten (Flatten)           (None, 12544)             0         
                                                                 
 dense (Dense)               (None, 128)               1605760   
                                     

In [11]:
# Number of trainable parameters in model_2 after fine-tuning
trainable_params_model_2 = np.sum([np.prod(var.shape) for var in model_2.trainable_weights])
print(f"Total trainable parameters after fine-tuning: {trainable_params_model_2}")

Total trainable parameters after fine-tuning: 325


### Freeze all layers except the last fully connected layer and the output layer (model_3)

In [12]:
# Copy weights from the pre-trained model: model_1
model_3 = tf.keras.models.clone_model(model_1)
model_3.set_weights(model_1.get_weights())  

# Freeze all layers except the fully connected and output layers
trainable_layers_model_3 = [-3, -1]  # Last fully connected layer and output layer
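
To see which layers the negative indices select, the sketch below resolves them against the ten layers of the network in the order they were added. The layer names are an assumption based on Keras default naming. The parameter counts follow from the layer sizes (64 units fed by 128 inputs, and 5 outputs fed by 64 inputs).

```python
# Layers of model_1 in the order they were added (names assume Keras defaults).
layer_names = [
    "conv2d", "conv2d_1", "max_pooling2d", "dropout", "flatten",
    "dense", "dropout_1", "dense_1", "dropout_2", "dense_2",
]

trainable_indices = [-3, -1]
selected = [layer_names[i] for i in trainable_indices]
print(selected)  # ['dense_1', 'dense_2']

# Trainable parameters: weights plus biases of the two selected Dense layers.
trainable_params = (128 + 1) * 64 + (64 + 1) * 5
print(trainable_params)  # 8581
```

Index `-2` is skipped because it refers to a dropout layer, which has no weights to train.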

In [13]:
# Fine-tune model_3 on the dataset with labels for (dog, frog, horse, ship, truck)
selected_classes = [5, 6, 7, 8, 9]
num_classes = len(selected_classes)

history_exp2, score_exp2 = fine_tune_model(
    model_3,
    trainable_layers=trainable_layers_model_3,
    x_train=x_train_gte5,
    y_train=y_train_gte5,
    x_test=x_test_gte5,
    y_test=y_test_gte5,
    num_classes=num_classes,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training time (min): 4.32
Test loss: 0.8294, Test accuracy: 0.6904


In [14]:
# Check the model summary after fine-tuning
print("After fine-tuning:")
model_3.summary()

After fine-tuning:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 30, 30, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 28, 28, 64)        18496     
                                                                 
 max_pooling2d (MaxPooling2  (None, 14, 14, 64)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 14, 14, 64)        0         
                                                                 
 flatten (Flatten)           (None, 12544)             0         
                                                                 
 dense (Dense)               (None, 128)               1605760   
                                     

In [15]:
# Number of trainable parameters in model_3 after fine-tuning
trainable_params_model_3 = np.sum([np.prod(var.shape) for var in model_3.trainable_weights])
print(f"Total trainable parameters after fine-tuning: {trainable_params_model_3}")

Total trainable parameters after fine-tuning: 8581


### Compare the results of the baseline and fine-tuned models: model_1, model_2, model_3

In [17]:
# Store train and test results into a summary df
summary = pd.DataFrame({
    "Model Name": [
        "model_1",
        "model_2",
        "model_3"
    ],
    "Number of Trainable Parameters": [
        trainable_params_model_1,
        trainable_params_model_2, 
        trainable_params_model_3  
    ],
    "Training Time (min)": [
        8.37,
        4.23,
        4.32,
    ],
    "Train Accuracy": [
        history.history['accuracy'][-1],     
        history_exp1.history['accuracy'][-1],
        history_exp2.history['accuracy'][-1] 
    ],
    "Test Accuracy": [
        accuracy,     
        score_exp1[1],
        score_exp2[1] 
    ],
})

summary

| | Model Name | Number of Trainable Parameters | Training Time (min) | Train Accuracy | Test Accuracy |
|---|---|---|---|---|---|
| 0 | model_1 | 1633733 | 8.37 | 0.8472 | 0.7662 |
| 1 | model_2 | 325 | 4.23 | 0.57032 | 0.606 |
| 2 | model_3 | 8581 | 4.32 | 0.66004 | 0.6904 |


**Discussion:**

1. How many trainable parameters are there in each case?
    - model_1 (baseline): 1,633,733
    - model_2 (fine-tuned): 325
    - model_3 (fine-tuned): 8,581

2. Which fine-tuning performs better in terms of classification accuracy and why?
\
model_3 reached a higher test accuracy (69.0%) than model_2 (60.6%). Both models start from the weights of the baseline model, model_1. However, model_2 can only fit a new output layer on top of frozen features (325 trainable parameters), whereas model_3 can also adapt the last fully connected layer (8,581 trainable parameters). This extra capacity lets model_3 adjust the learned features to the new classes.


3. Why is fine-tuning much faster than the initial training of the network?
\
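The fine-tuned models have far fewer trainable parameters (weights) than the baseline model. Gradients are computed and applied only for the trainable layers, and backpropagation can stop at the earliest trainable layer, so each training step is cheaper. The forward pass still runs through every layer, frozen or not, which is consistent with training time dropping by roughly half (from 8.37 min to about 4.3 min) rather than in proportion to the parameter count.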
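
The figures from the summary table can be put side by side with a little arithmetic. The sketch below uses only the parameter counts and training times reported above. The variable names are introduced for illustration.

```python
total_params = 1_633_733
trainable = {"model_2": 325, "model_3": 8_581}
training_minutes = {"model_1": 8.37, "model_2": 4.23, "model_3": 4.32}

for name, count in trainable.items():
    fraction = count / total_params
    speedup = training_minutes["model_1"] / training_minutes[name]
    print(f"{name}: {fraction:.4%} of weights trainable, {speedup:.2f}x faster")

# model_2: 0.0199% of weights trainable, 1.98x faster
# model_3: 0.5252% of weights trainable, 1.94x faster
```

The speedup is close to 2x in both cases even though the trainable fraction differs by a factor of about 26, which suggests the remaining cost is dominated by work that does not depend on the number of trainable weights.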
