#### **Castillo, Maria Antonette O.**
#### **CPE32S8**

# Activity 2.2 - Transfer Learning 

#### Objective(s):

This activity aims to introduce how to apply transfer learning.

#### Intended Learning Outcomes (ILOs):
* Demonstrate how to build and train a neural network
* Demonstrate how to apply transfer learning in a neural network


#### Resources:
* Jupyter Notebook
* MNIST dataset

#### Procedures
Load the necessary libraries

In [1]:
from __future__ import print_function

import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K




Set the parameters

In [2]:
now = datetime.datetime.now
batch_size = 128
num_classes = 5
epochs = 5
img_rows, img_cols = 28, 28
filters = 32
pool_size = 2
kernel_size = 3

Set the input shape according to the backend's image data format (channels first or channels last)

In [3]:
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)
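To make the two branches concrete, here is a small NumPy sketch on a dummy batch of zeros rather than real MNIST images. `shape_for` is a hypothetical helper that mirrors the `if`/`else` above. Since MNIST images have a single grey channel, adding the channel axis is a plain reshape and does not reorder any pixels.

```python
import numpy as np

img_rows, img_cols = 28, 28

def shape_for(data_format, rows=img_rows, cols=img_cols):
    # Hypothetical helper mirroring the cell above: the single grey channel goes first or last
    if data_format == 'channels_first':
        return (1, rows, cols)
    return (rows, cols, 1)

# Dummy batch of 4 greyscale images, shaped like mnist.load_data() output: (N, 28, 28)
batch = np.zeros((4, img_rows, img_cols), dtype=np.uint8)

last = batch.reshape((batch.shape[0],) + shape_for('channels_last'))
first = batch.reshape((batch.shape[0],) + shape_for('channels_first'))

print(last.shape)   # (4, 28, 28, 1)
print(first.shape)  # (4, 1, 28, 28)
```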

* Write a function to include all the training steps. 
* Use the model, training set, test set and number of classes as function parameters


In [4]:
def train_model(model, train, test, num_classes):
    x_train = train[0].reshape((train[0].shape[0],) + input_shape)
    x_test = test[0].reshape((test[0].shape[0],) + input_shape)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(train[1], num_classes)
    y_test = keras.utils.to_categorical(test[1], num_classes)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    t = now()
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    print('Training time: %s' % (now() - t))

    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

**The `train_model` function reshapes the images to the input shape expected by the model, converts them to `float32`, and scales the pixel values from 0-255 to 0-1. The labels `y_train` and `y_test` are converted into one-hot encoded vectors. The model is then compiled with the categorical cross-entropy loss, the Adadelta optimizer, and the accuracy metric, and trained with the given batch size and number of epochs, using the test set as validation data. Finally, the function prints the training time and the test loss and accuracy.**
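The preprocessing inside `train_model` can be illustrated on toy data. This NumPy sketch uses two made-up 2x2 "images" instead of 28x28 digits, and builds the one-hot matrix by indexing an identity matrix, which should give the same result as `keras.utils.to_categorical` for integer labels in range.

```python
import numpy as np

num_classes = 5

# Two made-up 2x2 "images" with uint8 pixel values, standing in for MNIST digits
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([3, 0])

# Add a trailing channel axis (channels_last), convert to float32, scale to [0, 1]
x = images.reshape((images.shape[0],) + images.shape[1:] + (1,)).astype('float32')
x /= 255

# One-hot encoding: row i of the identity matrix encodes class i
y = np.eye(num_classes, dtype='float32')[labels]

print(x.shape)  # (2, 2, 2, 1)
print(y)
# [[0. 0. 0. 1. 0.]
#  [1. 0. 0. 0. 0.]]
```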

Load the MNIST data, which comes already split into training and test sets

In [5]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Create two datasets 
* one with digits below 5
* one with 5 and above

In [6]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5
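The boolean indexing and the `- 5` shift can be checked on a toy label array (the values below are made up for illustration). Subtracting 5 maps the digits 5-9 onto the labels 0-4, so both subsets can share the same 5-unit output layer.

```python
import numpy as np

# Tiny stand-in for the MNIST label vector
y = np.array([7, 2, 5, 0, 9, 4])
x = np.arange(len(y)) * 10  # fake "images" so we can see which rows are kept

x_lt5, y_lt5 = x[y < 5], y[y < 5]
x_gte5, y_gte5 = x[y >= 5], y[y >= 5] - 5  # shift digits 5..9 down to labels 0..4

print(y_lt5)   # [2 0 4]
print(y_gte5)  # [2 0 4]  (originally 7, 5, 9)
print(x_gte5)  # [ 0 20 40]
```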

* Define the feature layers that will be used for transfer learning
* Freeze these layers during the fine-tuning process

In [7]:
feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]
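The spatial sizes reported later by `model.summary()` follow directly from these layer settings. A short sketch of the arithmetic, assuming the Keras defaults of stride 1 for `Conv2D` and a stride equal to the pool size for `MaxPooling2D` (`conv_valid` and `max_pool` are illustrative helpers, not Keras functions). `Activation` and `Dropout` leave the shape unchanged.

```python
def conv_valid(size, kernel):
    # 'valid' padding with stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def max_pool(size, pool):
    # Non-overlapping pooling (stride = pool size); any remainder is dropped
    return size // pool

img_rows, filters, kernel_size, pool_size = 28, 32, 3, 2

after_conv1 = conv_valid(img_rows, kernel_size)     # 26
after_conv2 = conv_valid(after_conv1, kernel_size)  # 24
after_pool = max_pool(after_conv2, pool_size)       # 12
flat_features = after_pool * after_pool * filters   # length of the Flatten output

print(after_conv1, after_conv2, after_pool, flat_features)  # 26 24 12 4608
```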





Define the classification layers

In [8]:
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax')
]
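The last two classification layers turn the 128 hidden features into 5 scores and then into class probabilities. A plain-Python sketch of what `Activation('softmax')` computes for a single sample, using made-up scores:

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing and does not change the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up outputs of the final Dense(num_classes) layer for one image
logits = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(probs.index(max(probs)))  # predicted class: 0
```

The probabilities are all positive and sum to 1, which is what the categorical cross-entropy loss in `train_model` expects.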

Create a model by combining the feature layers and classification layers

In [9]:

model = Sequential(feature_layers + classification_layers)

Check the model summary

In [10]:

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 activation (Activation)     (None, 26, 26, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_1 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 12, 12, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
                                                        

Train the model on the digits 5, 6, 7, 8, 9

In [11]:
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples

Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:00:57.564397
Test score: 1.4633499383926392
Test accuracy: 0.6749640107154846


Freeze only the feature layers

In [12]:

for l in feature_layers:
    l.trainable = False

Check the summary again and compare its parameter counts with those of the previous summary

In [13]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 activation (Activation)     (None, 26, 26, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_1 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 12, 12, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
                                                        

Train the model again on the digits 0 to 4

In [14]:
train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:00:30.458924
Test score: 1.3706990480422974
Test accuracy: 0.8133878111839294


### **When the model was trained on the digits 5 and above (5-9), it reached an accuracy of approximately 67% with a loss of 1.46. Then, when the model's knowledge was transferred to the digits below 5 (0-4), with the feature layers frozen and only the classification layers trained, the accuracy rose to about 81% and the loss dropped to 1.37. This suggests that the features learned on the first set of digits provided a useful foundation for learning the new digits.**

#### Supplementary Activity
Now write code to reverse this training process. That is, you will train on the digits 0-4, and then fine-tune only the last layers on the digits 5-9.

In [15]:
model2 = Sequential(feature_layers + classification_layers)
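Note that `feature_layers` and `classification_layers` hold layer objects, not layer definitions, so `model2` is built around the same objects as the first model: it shares their trained weights and the `trainable = False` flags set in In [12] (the summary below shows the same `conv2d` and `conv2d_1` names). A plain-Python analogy illustrates the difference between reusing those objects and creating fresh ones; the `Layer` class and `make_feature_layers` are hypothetical stand-ins, not Keras APIs. In Keras, the equivalent would be to wrap the `Conv2D`/`Dense` definitions in a function and call it again when building `model2`.

```python
class Layer:
    """Hypothetical stand-in for a Keras layer: a name, some weights and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.weights = [0.0]
        self.trainable = True

def make_feature_layers():
    # Building the list inside a function creates brand-new layer objects on every call
    return [Layer('conv_a'), Layer('conv_b')]

feature_layers = make_feature_layers()
for l in feature_layers:
    l.trainable = False           # frozen for the first model

reused = feature_layers + []      # list concatenation copies the list, not the layers
fresh = make_feature_layers()     # independent layers for a second model

print(reused[0] is feature_layers[0], reused[0].trainable)  # True False
print(fresh[0] is feature_layers[0], fresh[0].trainable)    # False True
```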

In [16]:
model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 activation (Activation)     (None, 26, 26, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_1 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 12, 12, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
                                                      

### **The model starts with two convolutional layers, each with 32 filters of size 3x3 and each followed by a ReLU activation. A max pooling layer then halves the spatial dimensions from 24x24 to 12x12. Dropout layers (0.25 after pooling and 0.5 after the 128-unit dense layer) randomly drop units during training to reduce overfitting. The pooled feature maps are flattened into a one-dimensional vector of 4,608 values (12 x 12 x 32), which feeds a dense layer of 128 neurons and an output layer with one neuron per class (5). Softmax is used in the output layer for multi-class classification. Overall, the model has a total of 600,165 parameters.**
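The 600,165 total can be reproduced from the layer settings. In this sketch (helper names are illustrative), only the convolutional and dense layers contribute parameters, since activation, pooling, dropout and flatten layers have no weights. The two convolutional terms are the feature layers, which become non-trainable once frozen.

```python
def conv2d_params(kernel, in_channels, filters):
    # kernel x kernel x in_channels weights per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

layer_params = {
    'conv 1 (3x3, 32)': conv2d_params(3, 1, 32),      # 320
    'conv 2 (3x3, 32)': conv2d_params(3, 32, 32),     # 9248
    'dense 128': dense_params(12 * 12 * 32, 128),     # 589952
    'dense 5': dense_params(128, 5),                  # 645
}
total = sum(layer_params.values())
feature = layer_params['conv 1 (3x3, 32)'] + layer_params['conv 2 (3x3, 32)']

print(total)                     # 600165
print(feature, total - feature)  # 9568 590597
```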

In [17]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5  
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5  

In [18]:
train_model(model2,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:00:28.426810
Test score: 1.1683063507080078
Test accuracy: 0.9087371230125427


In [19]:
for l in feature_layers:
    l.trainable = False

**Setting `trainable` to `False` tells Keras not to update the parameters of these layers during training. Freezing the feature layers lets the remaining classification layers be fine-tuned on the new part of the dataset. Because `train_model` recompiles the model before fitting, the change takes effect in the next training run.**
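Conceptually, freezing means the optimizer skips those weights when it applies an update. A framework-free sketch with scalar "weights" and plain gradient descent (a simplification: the activity itself uses Adadelta, and `sgd_step` is an illustrative helper, not a Keras function):

```python
def sgd_step(params, grads, trainable, lr=0.5):
    # Update only parameters whose layer is trainable; frozen ones are kept unchanged
    return {
        name: value - lr * grads[name] if trainable[name] else value
        for name, value in params.items()
    }

params = {'conv': 1.0, 'dense': 1.0}
grads = {'conv': 1.0, 'dense': 1.0}
trainable = {'conv': False, 'dense': True}  # feature layer frozen, classifier fine-tuned

new_params = sgd_step(params, grads, trainable)
print(new_params)  # {'conv': 1.0, 'dense': 0.5}
```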

In [20]:
train_model(model2,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:00:26.489872
Test score: 1.2551113367080688
Test accuracy: 0.7599259614944458


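### **When the training process was reversed, the pattern of accuracies was also reversed. The initial training on the digits 0-4 reached the higher accuracy, approximately 91% with a loss of 1.17, while fine-tuning the last layers on the digits 5-9 gave a lower accuracy of approximately 76% with a loss of 1.26. This suggests that the frozen features transferred less well to the digits 5-9. Note, however, that `model2` was built from the same layer objects as the first model, so its training on 0-4 did not start from scratch: the feature layers had already been trained on 5-9 and frozen, and the classification layers had already been trained on 0-4, which likely contributes to the 91%.**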

-----------

### **Conclusion:**

### **This activity helped me explore transfer learning in a neural network. When the model was first trained on the digits 5 and above, it reached about 67% accuracy. When it was then transferred to the digits below 5, its accuracy improved to about 81%. However, when the training order was reversed, the results after fine-tuning on the digits 5-9 were weaker, with accuracy dropping to around 76%. This suggests that the order of training might affect the model's performance in transfer learning.**