<a href="https://colab.research.google.com/github/nyp-sit/sdaai-iti107/blob/main/session-3/improved_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" align="left"/></a>

# Improved model using Transfer Learning

Welcome to this week's programming exercise. In this exercise, we use transfer learning to improve our baseline model. We take a model already trained on ImageNet (EfficientNetB0 in the code below, with VGG16 left as a commented-out alternative), use its convolutional base as a feature extractor, and train a classifier specifically for our emotion classification task.

At the end of this exercise, you will be able to: 
- understand how to load a pretrained model with and without the classification layer  
- extract training features using the pre-trained model as feature extractor
- train a classifier using the extracted features 


Transfer learning involves taking the knowledge learnt by a network trained on a large dataset and transferring it to a new, similar task. There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. Let's start with the feature extraction approach.

## Feature extraction

In this approach, we take only the convolutional base of a pretrained model, use it to extract features from the images, and then use those features as the inputs for training a classifier. 
<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/resources/swapping_fc_classifier.png" width="500" />

In [1]:
import os
import tensorflow as tf
import tensorflow.keras as keras 
import numpy as np
import tensorflow.keras.layers as layers

from sklearn.metrics import classification_report

### Using pre-trained Model as Feature Extractor

We will be using EfficientNetB0 as our pretrained model (you can choose any other pretrained model, such as VGG16 or ResNet). Keras comes with a set of [pretrained models](https://www.tensorflow.org/api_docs/python/tf/keras/applications) you can choose from. In the following call, we load the model without its classification layers (`include_top=False`). With `weights='imagenet'`, we specify that we want to download the weights that were trained on the ImageNet dataset.

In [2]:
from tensorflow.keras.applications import mobilenet_v2
from tensorflow.keras.applications import vgg16
from tensorflow.keras.applications import EfficientNetB0

In [3]:
# adjust this to larger or smaller size
img_height, img_width = 224, 224

In [4]:
# base_model = tf.keras.applications.VGG16(input_shape=(img_height, img_width) + (3,),
#                                                include_top=False,
#                                                weights='imagenet')
base_model = EfficientNetB0(input_shape=(img_height, img_width) + (3,), include_top=False, weights='imagenet')
base_model.summary()


Model: "efficientnetb0"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
rescaling (Rescaling)           (None, 224, 224, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
normalization (Normalization)   (None, 224, 224, 3)  7           rescaling[0][0]                  
__________________________________________________________________________________________________
stem_conv_pad (ZeroPadding2D)   (None, 225, 225, 3)  0           normalization[0][0]              
_____________________________________________________________________________________

**Question:**

Examine the printout from `base_model.summary()`.
- What is the last layer in the pretrained model, and what is its output shape? Are there any fully connected layers?

<details><summary>Click here for answer</summary>

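With EfficientNetB0 as loaded above, the last layer is an activation layer (`top_activation`), and the output is 1280 feature maps of size 7x7 for 224x224 input images. If you load VGG16 instead, the last layer is a MaxPooling2D layer with 512 feature maps (7x7 for 224x224 input, 4x4 for 150x150 input). In both cases there are no fully connected (Dense) layers: the network is a purely convolutional base.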

</details>
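The spatial size of the final feature maps follows from how much the convolutional base downsamples its input. The sketch below assumes a total downsampling factor of 32, which holds for both VGG16 and EfficientNetB0; `feature_map_size` is a hypothetical helper for illustration, not a Keras function. Floor division matches VGG-style pooling, while architectures that pad their strided convolutions may round up instead.

```python
def feature_map_size(img_height, img_width, total_stride=32):
    """Approximate spatial size of the last feature map (floor division)."""
    return img_height // total_stride, img_width // total_stride


# 224x224 input, as used in this notebook
print(feature_map_size(224, 224))  # (7, 7)

# 150x150 input, a smaller size you might try
print(feature_map_size(150, 150))  # (4, 4)
```

Changing `img_height` and `img_width` therefore changes the shape of the extracted features, and the classifier's input shape has to follow.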

## Creating Datasets

In [5]:
dataset_URL = 'https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/datasets/intel_emotions_dataset.zip'
path_to_zip = tf.keras.utils.get_file('intel_emotions_dataset.zip', origin=dataset_URL, extract=True, cache_dir='.')
dataset_dir = os.path.dirname(path_to_zip)

We will set up our training and validation datasets as before. 

In [6]:
batch_size = 16
image_size = (img_height, img_width)

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)

Found 1631 files belonging to 2 classes.
Using 1305 files for training.
Found 1631 files belonging to 2 classes.
Using 326 files for validation.
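The file counts reported above are consistent with simple arithmetic on the split fraction and batch size. The sketch below is only a sanity check under the assumption that the validation count is the truncated product of `validation_split` and the number of files; it does not call Keras.

```python
import math

total_files = 1631
validation_split = 0.2
batch_size = 16

# Assumption: the validation subset size is truncated to an integer
num_val = int(validation_split * total_files)
num_train = total_files - num_val
print(num_train, num_val)  # 1305 326

# Number of batches per pass over each dataset; the last batch is smaller
train_batches = math.ceil(num_train / batch_size)
val_batches = math.ceil(num_val / batch_size)
last_train_batch = num_train - (train_batches - 1) * batch_size
print(train_batches, val_batches, last_train_batch)  # 82 21 9
```

Because the last batch is smaller than `batch_size`, any code that collects per-batch outputs should concatenate them rather than stack them.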


### Extracting features on the train set 

We use `predict()` to loop through all the train images (and also the validation images). We can also pass the images directly to the keras model, e.g. `model(images)`. The output will be the features spit out by the convolutional base. We will then use these features as our training samples instead of the original images.

However, before we pass the images through the convolutional base, it is IMPORTANT to pre-process the images using the model-specific preprocessing function. Many people *FORGET* this step. 
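To make "model-specific preprocessing" concrete, here is a NumPy sketch of what VGG-style preprocessing does: it reorders the channels from RGB to BGR and subtracts the ImageNet channel means. This is an illustrative re-implementation, not the Keras code, and `vgg_style_preprocess` is a hypothetical name. EfficientNet behaves differently: its summary above shows `rescaling` and `normalization` layers inside the model itself, and the Keras documentation describes its `preprocess_input` as a pass-through.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by VGG-style preprocessing
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_style_preprocess(images_rgb):
    """Illustrative sketch: RGB -> BGR, then subtract the channel means."""
    images = np.asarray(images_rgb, dtype=np.float32)
    images_bgr = images[..., ::-1]          # reverse the channel axis
    return images_bgr - IMAGENET_MEAN_BGR   # broadcast over batch, height, width


# Hypothetical batch of two all-white 4x4 images with pixel values in [0, 255]
batch = np.full((2, 4, 4, 3), 255.0, dtype=np.float32)
out = vgg_style_preprocess(batch)
print(out.shape)      # (2, 4, 4, 3)
print(out[0, 0, 0])   # roughly [151.061 138.221 131.32]
```

Feeding raw pixels to a model that expects this kind of input shifts every activation, which is why the wrong preprocessing function usually costs accuracy without raising any error.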

In [7]:
# # retrieve the preprocess_input function of vgg16 for use later 
# preprocess_input_fn = vgg16.preprocess_input
import tensorflow.keras.applications.efficientnet as efficientnet
preprocess_input_fn = efficientnet.preprocess_input

In [8]:
base_model.trainable = False

def get_features_labels(dataset): 

    all_features = []
    all_labels = []

    for images, labels in dataset:   # each iteration yields a batch of images
        # pre-process the features
        preprocessed_images = preprocess_input_fn(images)
        features = base_model(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)

    # concatenate the features from all the batches
    all_features, all_labels = np.concatenate(all_features), np.concatenate(all_labels)
    
    return all_features, all_labels


Now we will call the extract function for both training dataset and validation dataset.

In [None]:
X_train, y_train = get_features_labels(train_ds)
X_val, y_val = get_features_labels(val_ds)

In [None]:
# Check the shape of the features
print(X_train.shape)
print(X_val.shape)
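The batch-wise extraction and concatenation can be checked on toy data. The sketch below uses a stand-in function in place of the frozen base model and small, hypothetical array sizes so that it runs quickly. With the real data one would expect `X_train` to have shape (1305, 7, 7, 1280) and `X_val` (326, 7, 7, 1280), given the file counts and the base model output shape reported elsewhere in this notebook.

```python
import numpy as np


def fake_conv_base(images):
    """Stand-in for the frozen base model: maps (n, H, W, 3) to (n, 2, 2, 5)."""
    n = images.shape[0]
    return np.zeros((n, 2, 2, 5), dtype=np.float32)


num_images, batch_size = 37, 16   # hypothetical numbers
images = np.zeros((num_images, 8, 8, 3), dtype=np.float32)

all_features = []
for start in range(0, num_images, batch_size):
    batch = images[start:start + batch_size]   # the last batch has only 5 images
    all_features.append(fake_conv_base(batch))

# Concatenating along the first axis handles the smaller last batch
features = np.concatenate(all_features)
print(features.shape)  # (37, 2, 2, 5)
```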

We will now save the features to local storage, as numpy arrays. We will load these features later on to be used for training our classifiers.

In [None]:
np.save("X_train.npy", X_train)
np.save("y_train.npy", y_train)
np.save("X_val.npy", X_val)
np.save("y_val.npy", y_val)
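As an alternative to four separate `.npy` files, related arrays can be kept together in one compressed `.npz` archive. The sketch below is a minimal round trip on small, hypothetical arrays written to a temporary directory; the file name and array keys are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Small hypothetical arrays standing in for the extracted features and labels
X_demo = np.random.rand(6, 2, 2, 5).astype(np.float32)
y_demo = np.array([[0.], [1.], [1.], [0.], [1.], [0.]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features_demo.npz")

    # Store both arrays under named keys in a single compressed file
    np.savez_compressed(path, X=X_demo, y=y_demo)

    # Read the arrays back by key
    with np.load(path) as data:
        X_loaded, y_loaded = data["X"], data["y"]

print(X_loaded.shape, y_loaded.shape)    # (6, 2, 2, 5) (6, 1)
print(np.array_equal(X_demo, X_loaded))  # True
```

Keeping features and labels in one file makes it harder to pair features from one extraction run with labels from another.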

## Classification model

Now we will build a new model that takes the extracted features as input. Instead of the usual Flatten layer followed by Dense layers, let us use a GAP layer, followed by a Dense layer, a Dropout layer and another Dense layer that outputs the prediction. 

**Questions:**

1. What should be the input shape of our model? 
2. What is the output shape of the Global Average Pooling (GAP) layer? 
3. How many units do we need for the output, and what should we use as the activation function? 

Complete the code below. 

<details><summary>Click here for answer</summary>
    
1. The input shape should be (7, 7, 1280), which is the output shape of our EfficientNetB0 convolutional base for 224x224 images (with VGG16 the channel dimension would be 512). Using `X_train.shape[1:]` picks this up automatically.
2. The output shape of GAP is (1280), since the last layer of the convolutional base has 1280 feature maps (channels). 
3. We need only 1 output unit as we are doing binary classification (0 or 1), and we should use 'sigmoid' as the activation function for binary classification. 

Code: 

```python
inputs = layers.Input(shape=X_train.shape[1:])
x = layers.GlobalAveragePooling2D()(inputs)
x = layers.Dropout(rate=0.5)(x)
x = layers.Dense(units=512, activation="relu")(x)
x = layers.Dropout(rate=0.5)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="binary_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

``` 

</details>
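Global Average Pooling simply averages each feature map over its spatial positions, so it can be reproduced in NumPy. The sketch below assumes features of shape (7, 7, 1280), the EfficientNetB0 output shape reported by `model_final.summary()` later in this notebook, and uses random values as a stand-in for real features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 4 feature volumes with the EfficientNetB0 output shape
features = rng.random((4, 7, 7, 1280), dtype=np.float32)

# GAP: average over the height and width axes, keeping batch and channels
pooled = features.mean(axis=(1, 2))
print(pooled.shape)     # (4, 1280)

# Flatten, for comparison, keeps every spatial position as a separate input
flattened = features.reshape(len(features), -1)
print(flattened.shape)  # (4, 62720)
```

GAP hands the following Dense layer 1280 inputs instead of 62720, which is the main reason the classifier stays small.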


In [None]:
# Build the model here, you can use either Keras Sequential or functional API to build your model

### START YOUR CODE HERE ###

## TODO: build your layers here, include the input and output layer

inputs = layers.Input(shape=X_train.shape[1:])
x = layers.GlobalAveragePooling2D()(inputs)
x = layers.Dropout(rate=0.5)(x)
x = layers.Dense(units=512, activation="relu")(x)
x = layers.Dropout(rate=0.5)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)

# define the inputs and outputs of the model 

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="binary_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

### END YOUR CODE HERE ###    


In [None]:
model_top.summary()

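Now we train our classifier with the extracted features (`X_train`) for 100 epochs. The training will be fast, as we only have a small number of parameters to train (656,385 for the head shown in the answer above, compared with about 4 million in the frozen convolutional base).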
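The size of the classifier can be checked by hand. The sketch below counts weights and biases for the two Dense layers, assuming 1280 input channels after GAP, 512 hidden units and 1 output unit; GAP and Dropout have no trainable parameters. `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


channels = 1280                           # channels produced by the conv base
hidden = dense_params(channels, 512)      # Dense(512) after GAP
output = dense_params(512, 1)             # Dense(1) with sigmoid
print(hidden, output, hidden + output)    # 655872 513 656385

# For comparison: Dense(512) applied to a flattened 7x7x1280 volume
flatten_hidden = dense_params(7 * 7 * channels, 512)
print(flatten_hidden)                     # 32113152
```

The total of 656,385 matches the parameter count listed for the `top` model in `model_final.summary()`.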

In [14]:
# load the extracted features from the files we saved earlier
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
X_val = np.load('X_val.npy')
y_val = np.load('y_val.npy')

In [15]:
# keep only the weights from the epoch with the best validation accuracy


model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_checkpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

hist_top = model_top.fit(X_train, y_train, 
                         epochs=100, 
                         validation_data=(X_val, y_val), 
                         callbacks=[model_checkpoint_callback])


[Training log: Epoch 1/100 through Epoch 78; per-epoch metrics not captured]
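The checkpoint callback above is configured with `monitor='val_accuracy'`, `mode='max'` and `save_best_only=True`, so the weights on disk are overwritten only when validation accuracy improves. The plain-Python sketch below mimics that decision rule; it is not the Keras implementation, `checkpoint_epochs` is a hypothetical helper, and the accuracy values are invented for illustration.

```python
def checkpoint_epochs(val_accuracies):
    """Return the 1-based epochs at which a 'save best only' checkpoint is written."""
    best = float("-inf")
    saved = []
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best:          # strict improvement, as with mode='max'
            best = value
            saved.append(epoch)   # the real callback would overwrite the file here
    return saved, best


# Hypothetical validation accuracies, not taken from the training run above
history = [0.70, 0.74, 0.73, 0.78, 0.78, 0.76]
saved_epochs, best_value = checkpoint_epochs(history)
print(saved_epochs, best_value)  # [1, 2, 4] 0.78
```

Loading the checkpoint afterwards therefore restores the weights from the best epoch rather than from the last one.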

In [16]:
model_top.load_weights('best_checkpoint')
y_preds = model_top.predict(X_val)

In [17]:
print(classification_report(y_val, y_preds.flatten() >= 0.5))

              precision    recall  f1-score   support

         0.0       0.73      0.88      0.80       163
         1.0       0.85      0.68      0.76       163

    accuracy                           0.78       326
   macro avg       0.79      0.78      0.78       326
weighted avg       0.79      0.78      0.78       326



In [18]:
print(y_preds[:10])

[[0.0668769 ]
 [0.551925  ]
 [0.56629497]
 [0.1865359 ]
 [0.935955  ]
 [0.10884209]
 [0.9848113 ]
 [0.25866514]
 [0.9392409 ]
 [0.9731554 ]]


In [19]:
print(y_val[:10])

[[0.]
 [1.]
 [1.]
 [0.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]
 [1.]]
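The ten probabilities and labels printed above are enough to walk through the thresholding and the metrics by hand. The sketch below copies those values and computes accuracy, precision and recall for class 1 with NumPy only.

```python
import numpy as np

# First ten predicted probabilities and true labels, copied from the outputs above
probs = np.array([0.0668769, 0.551925, 0.56629497, 0.1865359, 0.935955,
                  0.10884209, 0.9848113, 0.25866514, 0.9392409, 0.9731554])
labels = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1])

# Convert probabilities into class predictions with a 0.5 threshold
preds = (probs >= 0.5).astype(int)

tp = int(np.sum((preds == 1) & (labels == 1)))   # true positives
fp = int(np.sum((preds == 1) & (labels == 0)))   # false positives
fn = int(np.sum((preds == 0) & (labels == 1)))   # false negatives

accuracy = float(np.mean(preds == labels))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn)                                       # 5 1 1
print(accuracy, round(precision, 3), round(recall, 3))  # 0.8 0.833 0.833
```

On these ten samples the model makes one false positive (the seventh sample) and one false negative (the sixth sample).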


You should see a good improvement over the baseline model (around 30%). The feature-extraction model also takes much less time to train. 

## Prepare the model for deployment

We cannot use `model_top` on its own for image classification, as it takes extracted features as input, not images. We need to reattach the convolutional base and add an input layer of the appropriate shape. This is what we do below.

In [20]:
inputs = layers.Input(shape=(img_height, img_width, 3))
x = preprocess_input_fn(inputs)
x = base_model(x)
outputs = model_top(x)

model_final = keras.models.Model(inputs=[inputs], outputs=[outputs])
model_final.compile(loss="binary_crossentropy", 
                    optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                    metrics=["accuracy"])

# inputs = layers.Input(shape=(150, 150, 3))
# x = preprocess_input_fn(inputs)
# x = conv_base(x)
# top_outputs = model_top(x)
# model_final = Model(inputs=[inputs], outputs=[top_outputs])
# model_final.compile(loss="binary_crossentropy", optimizer=optimizers.RMSprop(lr=2e-5), metrics=['acc'])
# model_final.summary()
# model_final.save("final_model")

In [21]:
model_final.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, 7, 7, 1280)        4049571   
_________________________________________________________________
top (Functional)             (None, 1)                 656385    
Total params: 4,705,956
Trainable params: 656,385
Non-trainable params: 4,049,571
_________________________________________________________________


In [22]:
model_final.save("full_model")


INFO:tensorflow:Assets written to: full_model/assets




In [23]:
model_final.evaluate(val_ds)



[0.5611798167228699, 0.7791411280632019]
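The two numbers returned by `evaluate()` are the loss followed by the metrics given to `compile()`, here binary cross-entropy and accuracy. An accuracy of 0.7791 on 326 validation images corresponds to 254 correct predictions. The NumPy sketch below shows how both quantities are computed, using small hypothetical values rather than the model's outputs; the clipping constant is an assumption chosen to avoid `log(0)`.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))


# Hypothetical labels and predicted probabilities
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print([round(loss, 4), acc])  # [0.4389, 0.75]
```

A confident wrong prediction is penalised far more heavily by the loss than by accuracy, so the two numbers can move in different directions during training.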

Now let us test our full model on the images from the validation set.

In [24]:
all_images = []
all_labels = []
iterator1 = val_ds.as_numpy_iterator()
for images, labels in iterator1:
    all_labels.append(labels)
    all_images.append(images)

all_labels = np.concatenate(all_labels, axis=0)
all_images = np.concatenate(all_images, axis=0)

In [25]:
y_pred_probs = model_final.predict(all_images)
# convert probabilities into classification label based on threshold of 0.5 
y_preds = y_pred_probs > 0.5
print(len(y_preds))

326


In [26]:
print(classification_report(all_labels, y_preds))

              precision    recall  f1-score   support

         0.0       0.73      0.88      0.80       163
         1.0       0.85      0.68      0.76       163

    accuracy                           0.78       326
   macro avg       0.79      0.78      0.78       326
weighted avg       0.79      0.78      0.78       326



### Extra exercises

Try another pre-trained model such as MobileNetV2 or VGG16 (both are already imported above) and compare the results. 
