<a href="https://colab.research.google.com/github/nyp-sit/it3103/blob/main/week4/2.feature_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer learning - Feature Extraction

Welcome to this week's programming exercise. In this exercise, we use transfer learning to improve our baseline model. We will use a pre-trained model (VGG16) as a feature extractor and use the extracted features to train a classifier for our cats-vs-dogs image classification task.

At the end of this exercise, you will be able to:
- load a pretrained model with and without the classification layers
- extract features using the pre-trained model as a feature extractor
- train a classifier using the extracted features


Transfer learning involves taking the "knowledge" learnt from one task (e.g. image classification on a large dataset such as ImageNet) and transferring it to a new, related task (e.g. image classification on different types of objects than the original ones, or object detection). There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. Let's start with the feature extraction approach.

## Feature extraction

In this approach, we take only the convolutional base of a pretrained model, use it to extract features from the images, and use those features as input to train a separate classifier.

<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/resources/swapping_fc_classifier.png" width="500" />

### Using pre-trained Model as Feature Extractor

We will be using VGG16 as our pretrained model (you can choose any other pretrained model, such as ResNet). Keras comes with a set of [pretrained models](https://www.tensorflow.org/api_docs/python/tf/keras/applications) you can choose from. In the following call, we load VGG16 without the classification layers (`include_top=False`). With the `weights` argument, we specify that we want to download the weights that were trained on the ImageNet dataset.

In [1]:
import os
import tensorflow as tf
import tensorflow.keras as keras 
import numpy as np
import tensorflow.keras.layers as layers
from tensorflow.keras.applications import vgg16
from sklearn.metrics import classification_report

In [2]:
# Specify the intended image size we want
img_height, img_width = 128, 128
base_model = tf.keras.applications.VGG16(input_shape=(img_height, img_width) + (3,),
                                         include_top=False,
                                         weights='imagenet')
base_model.summary()


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 128, 128, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 128, 128, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 128, 128, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 64, 64, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 64, 64, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 64, 64, 128)      

**Exercise:**

Examine the printout from `base_model.summary()`.
- What is the last layer in the pretrained model and what is its output shape? Are there any fully connected layers?

<details><summary>Click here for answer</summary>

The last layer is the MaxPooling2D layer. Its output is 512 feature maps of size 4x4. There are no fully connected (Dense) layers. The network is a purely convolutional base.

</details>
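
To see where the 4x4x512 output and the size of the convolutional base come from, the following sketch recomputes them by hand. It assumes the standard VGG16 layout (3x3 convolutions with "same" padding, and a 2x2 max-pool at the end of each of the five blocks); the constant and function names are made up for illustration and are not part of Keras.

```python
# Sketch: recompute VGG16's convolutional-base output shape and parameter count.
# Assumes the standard VGG16 layout; names here are illustrative, not Keras API.

# (number of conv layers, output channels) for each of the five blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_base_output_shape(height, width):
    """Each block ends with a 2x2 max-pool that halves height and width."""
    for _ in VGG16_BLOCKS:
        height, width = height // 2, width // 2
    return height, width, VGG16_BLOCKS[-1][1]


def conv_base_param_count(in_channels=3, kernel=3):
    """Sum weights (k*k*in*out) and biases (out) over every conv layer."""
    total = 0
    for n_convs, out_channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += kernel * kernel * in_channels * out_channels + out_channels
            in_channels = out_channels
    return total


print(conv_base_output_shape(128, 128))  # (4, 4, 512)
print(conv_base_param_count())           # 14714688
```

Five halvings of a 128x128 input give 128 / 2^5 = 4, which is why a different input size (for example 224x224) would yield a different spatial size but the same 512 channels.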

## Creating Datasets

We will set up our training and validation datasets as we did in the earlier exercise.

In [3]:
dataset_URL = 'https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs_subset.tar.gz', origin=dataset_URL, extract=True, cache_dir='.')
dataset_dir = os.path.join(os.path.dirname(path_to_zip), "cats_and_dogs_subset")

Downloading data from https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz


In [4]:
batch_size = 32

# resize all the images to the same size as expected by VGG model we downloaded above
image_size = (img_height, img_width)

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_dir,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='binary'
)

Found 3000 files belonging to 2 classes.
Using 2400 files for training.
Found 3000 files belonging to 2 classes.
Using 600 files for validation.
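
The split sizes reported above follow directly from `validation_split=0.2`, and the batch size determines how many batches each dataset yields when we iterate over it. A small arithmetic sketch, using only the numbers printed above:

```python
import math

# Sketch: reproduce the split sizes reported above and derive batch counts.
total_files = 3000
validation_split = 0.2
batch_size = 32

n_val = round(total_files * validation_split)
n_train = total_files - n_val

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
last_val_batch = n_val - (val_batches - 1) * batch_size

print(n_train, n_val)              # 2400 600
print(train_batches, val_batches)  # 75 19
print(last_val_batch)              # 24
```

Note that the last validation batch is smaller than 32, so any code that loops over batches should not assume a fixed batch size.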


### Extracting features on the train set 

We will first define a function to perform feature extraction, given an image dataset. 

We can either call the model's `predict()` on the train images (and likewise the validation images) or pass a batch of images directly to the Keras model, e.g. `model(images)`. The output is the set of features produced by the convolutional base. We will then use these features as our training samples instead of the original images.

However, before we pass the images through the convolutional base, it is IMPORTANT to pre-process them using the model-specific preprocessing function. Many people *forget* this step. Different models expect pixel values in a specific range (e.g. some models expect values between 0 and 1, others between -1 and 1) and a specific channel ordering (e.g. VGGNet expects the channels in BGR order). So we need to make sure our images are pre-processed according to what the model expects.
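
To make this concrete, here is a NumPy sketch of what VGG-style preprocessing amounts to: reorder the channels from RGB to BGR, then subtract the per-channel ImageNet mean, with no rescaling. The helper name `vgg_style_preprocess` is hypothetical and is shown only for illustration; in the notebook, keep using `vgg16.preprocess_input`.

```python
import numpy as np

# Sketch of VGG-style ("caffe" mode) preprocessing, for illustration only.
# In the notebook, use vgg16.preprocess_input rather than this helper.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_style_preprocess(rgb_batch):
    """Convert RGB -> BGR, then subtract the per-channel ImageNet mean."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN


# A dummy batch: one 2x2 image where every pixel is pure red (R=255, G=0, B=0)
batch = np.zeros((1, 2, 2, 3), dtype=np.float32)
batch[..., 0] = 255.0

out = vgg_style_preprocess(batch)
print(out[0, 0, 0])  # channels are now B, G, R with the means removed
```

After preprocessing, the red value ends up in the last channel, and the values are no longer in the 0 to 255 range. Feeding raw 0 to 255 RGB images to the model would therefore give it inputs unlike those it was trained on.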

In [5]:
# retrieve the preprocess_input function of vgg16 for use later 
preprocess_input_fn = vgg16.preprocess_input

In [6]:
base_model.trainable = False

def get_features_labels(dataset): 

    all_features = []
    all_labels = []

    for images, labels in dataset:   # each iteration yields a batch of images
        # pre-process the images the way the VGG16 model expects
        preprocessed_images = preprocess_input_fn(images)
        features = base_model(preprocessed_images)
        
        # append the batch of features to all_features and all_labels
        all_features.append(features)
        all_labels.append(labels)

    # concatenate the features from all the batches
    all_features, all_labels = np.concatenate(all_features), np.concatenate(all_labels)
    
    return all_features, all_labels


Now we will call the feature extraction function on both the training and validation datasets.

In [7]:
# Extract features and labels for train set
X_train, y_train = get_features_labels(train_ds)

# Extract features and labels for validation set
X_val, y_val = get_features_labels(val_ds)


In [8]:
# Check the shape of the features
print(X_train.shape)
print(X_val.shape)

(2400, 4, 4, 512)
(600, 4, 4, 512)


We will now save the features to local storage, as numpy arrays. We will load these features later on to be used for training our classifiers.

In [9]:
np.save("X_train.npy", X_train)
np.save("y_train.npy", y_train)
np.save("X_val.npy", X_val)
np.save("y_val.npy", y_val)
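
As a quick sanity check on this save-and-reload approach, the sketch below writes an array to a temporary directory and reads it back, confirming that the shape, dtype and values survive the round trip. The array is random stand-in data and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np

# Sketch: save a feature array and load it back, checking nothing changed.
# The array below is random stand-in data, not real VGG16 features.
features = np.random.rand(8, 4, 4, 512).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "X_demo.npy")  # hypothetical file name
    np.save(path, features)
    restored = np.load(path)

print(restored.shape, restored.dtype)
print(np.array_equal(features, restored))
```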

## Classification model

Now we will build a new classification model that takes the extracted features as input. Instead of the usual Flatten layer followed by Dense layers, let us use a Global Average Pooling (GAP) layer, followed by a Dense layer with 512 units, a Dropout layer with a rate of 50%, and a final Dense layer that outputs the prediction. (The reference solution also applies a 50% Dropout right after the GAP layer.) Compile your model using Adam with a learning rate of 0.001.

**Exercise:**

1. What should the input shape of our model be?
2. What is the output shape of the Global Average Pooling (GAP) layer?
3. How many units do we need for the output, and what should we use as the activation function?

Complete the code below. 

<details><summary>Click here for answer</summary>
    
1. The input shape should be (4, 4, 512), which is the output shape of our convolutional base.
2. The output shape of GAP is (512,), since the max-pooling layer (the last layer) of the convolutional base has 512 feature maps (channels).
3. We need only 1 output unit as we are doing binary classification (0 or 1), and we should use 'sigmoid' as the activation function for binary classification.

Code:

```python
inputs = layers.Input(shape=X_train.shape[1:])
x = layers.GlobalAveragePooling2D()(inputs)
x = layers.Dropout(rate=0.5)(x)
x = layers.Dense(units=512, activation="relu")(x)
x = layers.Dropout(rate=0.5)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="binary_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

``` 

</details>
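
If the GAP output shape is not obvious, the following NumPy sketch may help. It treats Global Average Pooling as a plain mean over the height and width axes and compares the result with what flattening would give. The array is random stand-in data shaped like one batch of extracted features.

```python
import numpy as np

# Sketch: Global Average Pooling as a mean over the height and width axes.
# Random stand-in data shaped like a batch of extracted features.
features = np.random.rand(32, 4, 4, 512).astype(np.float32)

# Average each 4x4 feature map down to a single number per channel
pooled = features.mean(axis=(1, 2))

# Flattening instead keeps every value: 4 * 4 * 512 numbers per sample
flattened = features.reshape(len(features), -1)

print(pooled.shape)     # (32, 512)
print(flattened.shape)  # (32, 8192)
```

GAP hands the next Dense layer 512 inputs per sample rather than 8192, which keeps the classifier small.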


In [10]:
## TODO: build your classification model here, try to use functional API to do so.

inputs = layers.Input(shape=X_train.shape[1:])
x = layers.GlobalAveragePooling2D()(inputs)
x = layers.Dropout(rate=0.5)(x)
x = layers.Dense(units=512, activation="relu")(x)
x = layers.Dropout(rate=0.5)(x)
outputs = layers.Dense(units=1, activation="sigmoid")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="binary_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

In [11]:
model_top.summary()

Model: "top"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 4, 4, 512)]       0         
                                                                 
 global_average_pooling2d (G  (None, 512)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dropout (Dropout)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 512)               262656    
                                                                 
 dropout_1 (Dropout)         (None, 512)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 513       
                                                               

Now we train our classifier with the extracted features (X_train) for 100 epochs. The training will be fast, as we have very few parameters (around 263k) to train.
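
The parameter counts in the summary above can be reproduced by hand. The sketch below uses a small helper, `dense_params` (a made-up name, not a Keras function), and also shows how large the first Dense layer would have been with a Flatten layer in place of GAP.

```python
# Sketch: reproduce the parameter counts shown in model_top.summary().
def dense_params(n_inputs, n_units):
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(512, 512)  # Dense(512) fed by the 512 GAP outputs
output = dense_params(512, 1)    # single sigmoid output unit

print(hidden, output, hidden + output)  # 262656 513 263169

# For comparison: Dense(512) fed by a flattened 4x4x512 tensor instead of GAP
print(dense_params(4 * 4 * 512, 512))   # 4194816
```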

In [12]:
# we will now load the extracted features from the files we saved earlier
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
X_val = np.load('X_val.npy')
y_val = np.load('y_val.npy')

In [13]:
# create the tensorboard callback
import os
import time

root_logdir = os.path.join(os.curdir, "tb_logs")

def get_run_logdir():    # use a new directory for each run
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tb_callback = tf.keras.callbacks.TensorBoard(run_logdir)

# create model checkpoint callback to save the best model checkpoint
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_checkpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)
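
The combination of `monitor='val_accuracy'`, `mode='max'` and `save_best_only=True` means a checkpoint is written only when validation accuracy improves on the best value seen so far. The plain-Python sketch below mimics that idea with made-up accuracy values; it is a conceptual illustration, not how Keras implements the callback.

```python
# Conceptual sketch of "save_best_only" with monitor='val_accuracy', mode='max'.
# This mimics the idea only; it is not how Keras implements the callback.
val_accuracy_per_epoch = [0.90, 0.93, 0.92, 0.95, 0.94]  # made-up values

best_value = float("-inf")
saved_epochs = []

for epoch, value in enumerate(val_accuracy_per_epoch, start=1):
    if value > best_value:          # mode='max': higher is better
        best_value = value
        saved_epochs.append(epoch)  # the checkpoint would be overwritten here

print(saved_epochs)  # [1, 2, 4]
print(best_value)    # 0.95
```

Because the same `filepath` is reused, only the weights from the best epoch remain on disk at the end of training.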

In [14]:
model_top.fit(X_train, y_train, 
              epochs=100, 
              validation_data=(X_val, y_val), 
              callbacks=[tb_callback, model_checkpoint_callback])



<keras.callbacks.History at 0x7f88c403d070>

In [15]:
%load_ext tensorboard
%tensorboard --logdir tb_logs

Let's load the best-performing model checkpoint and use it to compute the classification report.

In [16]:
model_top.load_weights('best_checkpoint')
y_preds = model_top.predict(X_val)

# flatten the 2D array into 1D and threshold at 0.5 to get boolean class predictions
y_preds = y_preds.flatten() >= 0.5

print(classification_report(y_val, y_preds))

              precision    recall  f1-score   support

         0.0       0.95      0.97      0.96       292
         1.0       0.97      0.95      0.96       308

    accuracy                           0.96       600
   macro avg       0.96      0.96      0.96       600
weighted avg       0.96      0.96      0.96       600
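
To see how the figures in a report like this are derived, the sketch below thresholds a handful of sigmoid outputs and computes precision, recall and F1 for the positive class by hand. The labels and probabilities are made up for illustration and are unrelated to the results above.

```python
import numpy as np

# Sketch: threshold sigmoid outputs and compute metrics by hand.
# Labels and probabilities below are made up for illustration.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.7, 0.8, 0.4, 0.9, 0.2])

y_pred = (y_prob >= 0.5).astype(int)  # [0, 1, 1, 0, 1, 0]

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)  # 2 1 1
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.67 0.67
```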



You should see a good improvement in the model compared to your previous result. The model also takes much less time to train.

## Prepare the model for deployment

We cannot use `model_top` directly for image classification, as it takes extracted features as input, not images. We need to reattach the convolutional base so that the combined model can take in images directly. This is what we do below. It is also important to include the model-specific pre-processing function as one of the layers.

In [17]:
# specify the input layer with appropriate image shape
inputs = layers.Input(shape=(img_height, img_width, 3))

# important: include the model-specific preprocessing function
x = preprocess_input_fn(inputs)

x = base_model(x)
outputs = model_top(x)

model_full = keras.models.Model(inputs=[inputs], outputs=[outputs])
model_full.compile(loss="binary_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

model_full.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 128, 128, 3)]     0         
                                                                 
 tf.__operators__.getitem (S  (None, 128, 128, 3)      0         
 licingOpLambda)                                                 
                                                                 
 tf.nn.bias_add (TFOpLambda)  (None, 128, 128, 3)      0         
                                                                 
 vgg16 (Functional)          (None, 4, 4, 512)         14714688  
                                                                 
 top (Functional)            (None, 1)                 263169    
                                                                 
Total params: 14,977,857
Trainable params: 263,169
Non-trainable params: 14,714,688
___________________________________________

In [18]:
model_full.save("full_model")



INFO:tensorflow:Assets written to: full_model/assets


Let's make sure our full model works on the validation dataset (which consists of images) and gives the same accuracy as before.

In [19]:
model_full.evaluate(val_ds)



[0.10947905480861664, 0.9583333134651184]

### See your classifier in action

OK, now we are ready to put our classifier to the test. Upload your favourite cat and dog images and see your model in action.

In [None]:
# we use google colab api to upload the file
# If you are running locally, you may need to change this to use FileUpload ipywidget
from google.colab import files

uploaded = files.upload()
# take only the first file
filename = list(uploaded.keys())[0]

In [None]:
img = tf.keras.preprocessing.image.load_img(
    filename, target_size=(img_height, img_width)
)
# we convert the image to numpy array
img_array = tf.keras.preprocessing.image.img_to_array(img)

# Although we have only a single image, our model expects data in batches,
# so we need to add the batch axis too
img_array = tf.expand_dims(img_array, 0) # Create a batch

# we load the model saved earlier and do the inference 
model = tf.keras.models.load_model('full_model')
predictions = model(img_array)
print(predictions)
if predictions[0] > 0.5: 
    print('It is a dog')
else:
    print('It is a cat')
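
The two key steps in the cell above, adding a batch axis and mapping the sigmoid score to a class, can be tried without a model. In this sketch the image is random stand-in data, `score` is a made-up model output, and the class names are an assumption (the dataset's sub-folder names in alphabetical order, so that label 1 corresponds to dogs).

```python
import numpy as np

# Sketch: add a batch axis to one image and interpret a sigmoid score.
# The image is random stand-in data, and `score` is a made-up model output.
img_array = np.random.rand(128, 128, 3).astype(np.float32)

batch = np.expand_dims(img_array, axis=0)
print(img_array.shape, "->", batch.shape)  # (128, 128, 3) -> (1, 128, 128, 3)

class_names = ["cats", "dogs"]  # assumed: sub-folder names, in alphabetical order
score = 0.83
label = class_names[int(score > 0.5)]
print(label)  # dogs
```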