# Activity 2.2 - Transfer Learning

#### Objective(s):

This activity introduces how to apply transfer learning to a neural network.

#### Intended Learning Outcomes (ILOs):
* Demonstrate how to build and train a neural network
* Demonstrate how to apply transfer learning to a neural network


#### Resources:
* Jupyter Notebook
* MNIST dataset

#### Procedures
Load the necessary libraries

In [None]:
from __future__ import print_function

import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

Set the parameters

In [None]:
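now = datetime.datetime.now  # shorthand for timing the training runs
batch_size = 128
num_classes = 5              # each subset (digits 0-4 or 5-9) has five classes
epochs = 5
img_rows, img_cols = 28, 28  # MNIST image size
filters = 32                 # number of convolutional filters
pool_size = 2                # size of the max-pooling window
kernel_size = 3              # size of the convolution kernel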

Set the input shape according to the backend's image data format (channels first or channels last)

In [None]:

if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

* Write a function that performs all the training steps: preprocessing the data, compiling, training, and evaluating the model.
* Use the model, training set, test set, and number of classes as the function parameters.


In [None]:
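def train_model(model, train, test, num_classes):
    """Preprocess the data, then compile, train and evaluate the model."""
    # Add the channel axis and scale pixel values from [0, 255] to [0, 1]
    x_train = train[0].reshape((train[0].shape[0],) + input_shape)
    x_test = test[0].reshape((test[0].shape[0],) + input_shape)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(train[1], num_classes)
    y_test = keras.utils.to_categorical(test[1], num_classes)

    # Compiling on every call makes any change to layer.trainable take effect.
    # Note: in recent Keras versions, 'adadelta' defaults to a learning rate of
    # 0.001, which may learn slowly within only 5 epochs.
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    # Train and time the run; the test set doubles as validation data here
    t = now()
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    print('Training time: %s' % (now() - t))

    # score[0] is the test loss, score[1] the test accuracy
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])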
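The preprocessing steps at the start of train_model can be checked in isolation. A minimal NumPy sketch, assuming the channels_last format and using random stand-in images rather than MNIST; `np.eye(num_classes)[labels]` is used as a NumPy equivalent of keras.utils.to_categorical so the snippet runs without Keras:

```python
import numpy as np

num_classes = 5
input_shape = (28, 28, 1)  # assumes the 'channels_last' image data format

# Stand-in data: 4 random 28x28 uint8 "images" and integer labels in 0..4
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([0, 3, 4, 1])

# Add the channel axis, cast to float32 and scale pixels to [0, 1]
x = images.reshape((images.shape[0],) + input_shape).astype('float32') / 255

# One-hot encode the labels (NumPy equivalent of keras.utils.to_categorical)
y = np.eye(num_classes, dtype='float32')[labels]

print(x.shape)  # (4, 28, 28, 1)
print(y[1])     # [0. 0. 0. 1. 0.]
```

Each row of y has a single 1 at the position of its label, which is the target format that categorical_crossentropy expects.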

Load the MNIST data, which mnist.load_data() already returns split into training and test sets

In [None]:

(x_train, y_train), (x_test, y_test) = mnist.load_data()



Create two datasets:
* one with the digits below 5
* one with the digits 5 and above, with labels shifted down by 5 so they also range from 0 to 4

In [None]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5
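The boolean masks and the `- 5` shift can be illustrated on a toy label array (made-up values, not MNIST data). Shifting digits 5-9 down to 0-4 lets both subsets use the same five-unit softmax output and `to_categorical(..., num_classes=5)`:

```python
import numpy as np

# Toy labels standing in for y_train (digits 0-9)
y = np.array([7, 2, 5, 0, 9, 4, 6])
x = np.arange(len(y))  # stand-in "images": just the sample indices

# Digits 0-4 keep their labels
x_lt5, y_lt5 = x[y < 5], y[y < 5]

# Digits 5-9 are shifted down by 5 so their labels also lie in 0..4
x_gte5, y_gte5 = x[y >= 5], y[y >= 5] - 5

print(y_lt5)   # [2 0 4]
print(y_gte5)  # [2 0 4 1]
```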

* Define the feature layers that will be used for transfer learning
* Freeze these layers during the fine-tuning process

In [None]:


feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]
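With padding='valid' and stride 1, each 3 × 3 convolution shrinks each spatial dimension by kernel_size - 1 pixels, and 2 × 2 max pooling halves it, rounding down. A short sketch of this arithmetic, which gives the output shapes reported by model.summary() and the length of the flattened vector passed to the classification layers:

```python
# Spatial-size arithmetic for the feature layers (stride 1, 'valid' padding)
img_rows, filters, kernel_size, pool_size = 28, 32, 3, 2

def conv_valid(size, k):
    # 'valid' convolution with stride 1: no padding, output shrinks by k - 1
    return size - k + 1

def max_pool(size, p):
    # Non-overlapping pooling (stride = pool size) floors the division
    return size // p

after_conv1 = conv_valid(img_rows, kernel_size)     # 26
after_conv2 = conv_valid(after_conv1, kernel_size)  # 24
after_pool = max_pool(after_conv2, pool_size)       # 12
flat_len = after_pool * after_pool * filters        # 12 * 12 * 32

print(after_conv1, after_conv2, after_pool, flat_len)  # 26 24 12 4608
```

Activation and Dropout layers do not change the shape, so Flatten receives a (12, 12, 32) tensor.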

Define the classification layers

In [None]:


classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax')
]

Create a model by combining the feature layers and classification layers

In [None]:

model = Sequential(feature_layers + classification_layers)

Check the model summary

In [None]:

model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_12 (Activation)  (None, 26, 26, 32)        0         
                                                                 
 conv2d_7 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_13 (Activation)  (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 12, 12, 32)        0         
 g2D)                                                            
                                                                 
 dropout_6 (Dropout)         (None, 12, 12, 32)        0         
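The printed summary stops after the Dropout layer. The parameter counts of all layers, including those cut off, follow from the layer definitions: a Conv2D layer has kernel_size × kernel_size × input channels × filters weights plus one bias per filter, and a Dense layer has inputs × units weights plus one bias per unit. A worked sketch of that arithmetic (the Flatten output of 12 × 12 × 32 = 4608 values feeds the first Dense layer):

```python
# Parameter counts for the model built from feature_layers + classification_layers
kernel_size, filters, num_classes = 3, 32, 5

def conv2d_params(k, in_ch, out_ch):
    # k*k*in_ch weights per filter, plus one bias per filter
    return k * k * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per unit
    return n_in * n_out + n_out

conv1 = conv2d_params(kernel_size, 1, filters)        # 320
conv2 = conv2d_params(kernel_size, filters, filters)  # 9248
dense1 = dense_params(12 * 12 * filters, 128)         # 589952
dense2 = dense_params(128, num_classes)               # 645

feature_params = conv1 + conv2       # parameters in the feature layers
classifier_params = dense1 + dense2  # parameters in the classification layers
print(feature_params, classifier_params, feature_params + classifier_params)
# 9568 590597 600165
```

The first two values match the 320 and 9248 shown in the summary; almost all of the parameters sit in the first Dense layer.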
                                                      

Train the model on the digits 5, 6, 7, 8, and 9

In [None]:
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:04:23.099686
Test score: 1.4176816940307617
Test accuracy: 0.7331824898719788


Freeze only the feature layers

In [None]:

for l in feature_layers:
    l.trainable = False

Check the summary again and compare its parameters with those of the previous summary

In [None]:
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_12 (Activation)  (None, 26, 26, 32)        0         
                                                                 
 conv2d_7 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_13 (Activation)  (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 12, 12, 32)        0         
 g2D)                                                            
                                                                 
 dropout_6 (Dropout)         (None, 12, 12, 32)        0         
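The summary output above is cut off before its parameter totals, so the effect of freezing is not visible in it. A sketch that counts trainable and non-trainable parameters directly, assuming Keras is installed; it rebuilds the same architecture with new layers (with keras.Input in place of the input_shape argument) rather than touching the notebook's model:

```python
import numpy as np
import keras
from keras import layers

# Same architecture as feature_layers + classification_layers (channels_last)
feature = [
    layers.Conv2D(32, 3, padding='valid'), layers.Activation('relu'),
    layers.Conv2D(32, 3), layers.Activation('relu'),
    layers.MaxPooling2D(pool_size=2), layers.Dropout(0.25), layers.Flatten(),
]
classifier = [
    layers.Dense(128), layers.Activation('relu'), layers.Dropout(0.5),
    layers.Dense(5), layers.Activation('softmax'),
]
net = keras.Sequential([keras.Input(shape=(28, 28, 1))] + feature + classifier)

def count(weights):
    # Total number of scalar values across a list of weight variables
    return int(sum(np.prod(tuple(w.shape)) for w in weights))

print(count(net.trainable_weights), count(net.non_trainable_weights))  # 600165 0

for layer in feature:
    layer.trainable = False  # freeze the feature extractor

print(count(net.trainable_weights), count(net.non_trainable_weights))  # 590597 9568
```

The 9,568 non-trainable parameters are exactly those of the two Conv2D layers (320 + 9,248); the Dense layers stay trainable.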
                                                      

Train the model again on the digits 0 to 4; this time only the classification layers are updated

In [None]:
train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:01:19.823630
Test score: 1.306450366973877
Test accuracy: 0.7851722240447998


The training and evaluation of this model are divided into two phases. In the first phase, the entire model was trained on digits 5-9 and reached a test accuracy of 0.7332 (test loss 1.4177) after about 4 min 23 s of training. In the second phase, the feature layers were frozen and only the classification layers were trained on digits 0-4, which reached a higher test accuracy of 0.7852 (test loss 1.3065) in about 1 min 20 s. The shorter training time is likely because no gradients need to be computed or applied for the frozen convolutional layers. The higher accuracy suggests that the features learned from digits 5-9 transfer reasonably well to digits 0-4, although the two phases are evaluated on different digit subsets, so their accuracies are not directly comparable.

#### Supplementary Activity
Now write code to reverse this training process. That is, you will train on the digits 0-4 and then fine-tune only the last layers on the digits 5-9.

In [None]:
model_sup = Sequential(feature_layers + classification_layers)
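To run the reversed experiment on a network that shares no layers, and therefore no weights or trainable flags, with the first model, each model needs its own layer objects. A minimal sketch, assuming Keras is installed; build_layers and model_fresh are hypothetical names that do not appear in the original notebook, and keras.Input replaces the input_shape argument used above:

```python
import keras
from keras import layers

def build_layers(num_classes=5):
    # Hypothetical helper: returns NEW (feature, classifier) layer lists on every
    # call, so models built from different calls never share weights
    feature = [
        layers.Conv2D(32, 3, padding='valid'), layers.Activation('relu'),
        layers.Conv2D(32, 3), layers.Activation('relu'),
        layers.MaxPooling2D(pool_size=2), layers.Dropout(0.25), layers.Flatten(),
    ]
    classifier = [
        layers.Dense(128), layers.Activation('relu'), layers.Dropout(0.5),
        layers.Dense(num_classes), layers.Activation('softmax'),
    ]
    return feature, classifier

feature_sup, classifier_sup = build_layers()
model_fresh = keras.Sequential(
    [keras.Input(shape=(28, 28, 1))] + feature_sup + classifier_sup)

# Phase 1 trains every layer on digits 0-4, e.g.:
#   train_model(model_fresh, (x_train_lt5, y_train_lt5), (x_test_lt5, y_test_lt5), 5)
# Phase 2 then freezes only this model's feature layers before fine-tuning on 5-9:
#   for layer in feature_sup:
#       layer.trainable = False
```

Calling build_layers() again returns new, independently initialized layers, so freezing one model's feature layers has no effect on any other model.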

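We first train the model on digits 0-4 (x_train_lt5, y_train_lt5). Note that model_sup is built from the same layer objects as model, so it starts from the weights model has already learned, and the feature layers are still frozen from the earlier trainable = False step. The run below therefore updates only the classification layers instead of training a new network from scratch.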

In [None]:
train_model(model_sup,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:01:22.918334
Test score: 0.8666096925735474
Test accuracy: 0.8865538239479065


In [None]:
for l in feature_layers:
    l.trainable = False

Setting trainable to False for the layers in feature_layers freezes them: their weights are no longer updated during training, so the features they have learned stay intact. The change takes effect because train_model recompiles the model on every call.
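One way to confirm that a frozen layer keeps what it learned is to compare its weights before and after training. A small sketch on random data, assuming Keras is installed; the two-layer network is illustrative only and not the notebook's model. The flag is set before compile(), because Keras only picks up changes to trainable when the model is compiled:

```python
import numpy as np
import keras
from keras import layers

# Random stand-in data: 64 samples with 8 features and 3 classes
rng = np.random.default_rng(0)
x = rng.random((64, 8)).astype('float32')
y = keras.utils.to_categorical(rng.integers(0, 3, size=64), 3)

frozen = layers.Dense(16, activation='relu')
head = layers.Dense(3, activation='softmax')
net = keras.Sequential([keras.Input(shape=(8,)), frozen, head])

frozen.trainable = False  # freeze BEFORE compiling
net.compile(loss='categorical_crossentropy', optimizer='adam')

frozen_before = [w.copy() for w in frozen.get_weights()]
head_before = [w.copy() for w in head.get_weights()]
net.fit(x, y, epochs=2, batch_size=16, verbose=0)

# The frozen layer's weights are unchanged; the trainable head has been updated
print(all(np.array_equal(a, b) for a, b in zip(frozen_before, frozen.get_weights())))
print(any(not np.array_equal(a, b) for a, b in zip(head_before, head.get_weights())))
```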

In [None]:
model_sup.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_12 (Activation)  (None, 26, 26, 32)        0         
                                                                 
 conv2d_7 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_13 (Activation)  (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 12, 12, 32)        0         
 g2D)                                                            
                                                                 
 dropout_6 (Dropout)         (None, 12, 12, 32)        0         
                                                      

We now fine-tune model_sup on the subset of the MNIST dataset containing images of the digits 5-9.

In [None]:
train_model(model_sup,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:01:18.572260
Test score: 1.0288006067276
Test accuracy: 0.7494342923164368


For the subset 0-4:
* It achieved the higher test accuracy of the two runs.
* Its training time was slightly longer than for the subset 5-9 (about 1 min 23 s versus 1 min 19 s).
* Test accuracy reached 0.8866.


As for the subset 5-9:
* Its test accuracy was lower than for the subset 0-4.
* Its training time was slightly shorter.
* Test accuracy reached 0.7494.

## Conclusion


In this activity, we used Keras to build a convolutional neural network (CNN) that classifies handwritten digits from the MNIST dataset. We divided the dataset into two groups, one with the digits below 5 and one with the digits 5 and above. We trained the whole model on one group, then froze the feature (convolutional) layers, which stops their weights from being updated, and trained only the classification layers on the other group. Freezing lets the early layers, which are responsible for recognizing basic features, keep what they learned from the first group. After each training run, we evaluated the model's accuracy on the held-out MNIST test images of that group.