<a href="https://colab.research.google.com/github/nyp-sit/iti107/blob/main/session-2/2.feature_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer learning - Feature Extraction

In this exercise, we use transfer learning to improve our baseline model. We will use a pre-trained CNN model as a feature extractor and use the extracted features to train a classifier for our flower classification task.

At the end of this exercise, you will be able to:
- understand how to load a pretrained model with and without the classification layer  
- extract features using the pre-trained model as feature extractor
- train a classifier using the extracted features


Transfer learning involves taking the "knowledge" learnt from one task (e.g. image classification on a large dataset such as ImageNet) and transferring it to a new, related task (e.g. image classification on different types of objects than the original ones, or object detection). There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. Let's start with the feature extraction approach.

## Feature extraction

In this approach, we only take the convolutional base of a pretrained model and use it to extract features from the images, and use the extracted features as input features to train a separate classifier.

<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/resources/swapping_fc_classifier.png" width="500" />

In [None]:
import os
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np

## Create train and validation dataset
Let's go ahead and prepare our train and validation dataset as before.

In [None]:
dataset_url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
path_to_zip = tf.keras.utils.get_file(origin=dataset_url, extract=True, cache_dir='.')
dataset_folder = os.path.dirname(path_to_zip)
dataset_folder = os.path.join(dataset_folder, 'flower_photos')

In [None]:
batch_size = 32
image_size = (128,128)

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_folder,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_folder,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)

In [None]:
num_classes = len(val_ds.class_names)

## Using pre-trained Model as Feature Extractor

Let's use EfficientNetB0 as our pretrained model (you can choose any other pretrained model, such as VGG19 or ResNet). In the following code, we load EfficientNetB0 without its classification layers (`include_top=False`). With `weights='imagenet'`, we specify that we want to download the weights that were trained on the ImageNet dataset.

In [None]:
# Specify the intended image size we want
base_model = keras.applications.efficientnet.EfficientNetB0(input_shape=image_size + (3,),
                                      include_top=False,
                                      weights='imagenet')
base_model.summary()

**Exercise:**

Examine the printout from `base_model.summary()`.
- What is the last layer in the pretrained model and what is its output shape? Are there any fully connected layers?

<details><summary>Click here for answer</summary>

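The last layer is `top_activation`, an Activation layer applied after the final Conv2D (`top_conv`) and BatchNormalization (`top_bn`) layers. The output consists of 1280 feature maps, each of size 4x4. There are no fully connected (Dense) layers. The network is a purely convolutional base.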

</details>

### Extracting features on the train set

We will first define a function to perform feature extraction, given an image dataset.

We can use the model's `predict()` method to loop through all the train images (and also the validation images), or just pass the images directly to the Keras model, e.g. `model(images)`. The output will be the features produced by the convolutional base. We will then use these features as our training samples instead of the original images.

However, before we pass the images through the convolutional base, it is IMPORTANT to pre-process the images using the model-specific preprocessing function. Many people *FORGET* this step. Different models expect the pixel values to lie in a specific range (e.g. some models expect values between 0 and 1, others between -1 and 1) and the channels to follow a specific ordering (e.g. VGGNet expects BGR). So we need to make sure our images are pre-processed according to what the model expects.

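**NOTE**: For EfficientNet, the pre-processing is part of the model itself, so its `preprocess_input` function is just a pass-through and is not strictly necessary. We still call it here so that the same code works if you swap in another pretrained model.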
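To make the idea concrete, here is a minimal NumPy sketch of the kinds of transformations a model-specific pre-processing function may apply. The helper names (`scale_to_unit`, `scale_to_symmetric`, `rgb_to_bgr`) are hypothetical and only illustrate the conventions mentioned above. They are not the actual Keras `preprocess_input` implementations, which may do more (for example, subtracting per-channel means).

```python
import numpy as np

def scale_to_unit(images):
    """Map pixel values from [0, 255] to [0, 1]."""
    return images.astype(np.float32) / 255.0

def scale_to_symmetric(images):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0

def rgb_to_bgr(images):
    """Reverse the last (channel) axis, turning RGB into BGR."""
    return images[..., ::-1]

# a made-up batch of 2 RGB images of size 4x4 with values in [0, 255]
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 4, 4, 3)).astype(np.float32)

unit = scale_to_unit(batch)
symmetric = scale_to_symmetric(batch)
bgr = rgb_to_bgr(batch)

print(unit.min(), unit.max())            # stays within [0, 1]
print(symmetric.min(), symmetric.max())  # stays within [-1, 1]
print(np.array_equal(bgr[..., 0], batch[..., 2]))  # red and blue swapped
```

Feeding a model images in the wrong range or channel order usually does not raise an error; it just silently degrades the extracted features, which is why this step is easy to miss.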

In [None]:
# retrieve the preprocess_input function of convolutional model for use later
# NOTE: For EfficientNet, the pre-processing is part of the model, so the preprocess_input function is just pass-thru
preprocess_input_fn = keras.applications.efficientnet.preprocess_input

In [None]:
base_model.trainable = False

def get_features_labels(dataset):

    all_features = []
    all_labels = []

    for images, labels in dataset:   # each iteration yields a batch of images
        # pre-process the features
        preprocessed_images = preprocess_input_fn(images)
        features = base_model(preprocessed_images)

        # append the batch of features to all_features and all_labels
        all_features.append(features)
        all_labels.append(labels)

    # concatenate the features from all the batches
    all_features, all_labels = np.concatenate(all_features), np.concatenate(all_labels)

    return all_features, all_labels


Now we will call `get_features_labels()` on both the training dataset and the validation dataset.

In [None]:
# Extract features and labels for train set
X_train, y_train = get_features_labels(train_ds)

# Extract features and labels for validation set
X_val, y_val = get_features_labels(val_ds)

In [None]:
# Check the shape of the features
print(X_train.shape)
print(X_val.shape)
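As a sanity check on the printed shapes, the expected feature shape can be worked out by hand. The sketch below assumes that EfficientNetB0 downsamples its input by an overall factor of 32 and ends with 1280 channels, and that the image height and width are multiples of 32. The function name `expected_feature_shape` and the image count of 100 are hypothetical, chosen only for illustration.

```python
def expected_feature_shape(n_images, image_size, total_stride=32, n_channels=1280):
    """Return the (N, H, W, C) shape expected from the convolutional base.

    Assumes height and width are exact multiples of total_stride.
    """
    height, width = image_size
    if height % total_stride or width % total_stride:
        raise ValueError("this sketch only handles sizes divisible by the stride")
    return (n_images, height // total_stride, width // total_stride, n_channels)

# 100 is an arbitrary image count used only for this example
print(expected_feature_shape(100, (128, 128)))
print(expected_feature_shape(100, (224, 224)))
```

The first dimension of `X_train.shape` and `X_val.shape` should equal the number of images in each split, and the remaining three dimensions should match the output shape shown by `base_model.summary()`.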

We will now save the features to local storage, as numpy arrays. We will load these features later on to be used for training our classifiers.

In [None]:
np.save("X_train.npy", X_train)
np.save("y_train.npy", y_train)
np.save("X_val.npy", X_val)
np.save("y_val.npy", y_val)

## Classification model

Now we will build a new classification model that takes the extracted features as input. Instead of the usual Flatten layer followed by Dense layers, let us use a Global Average Pooling (GAP) layer, followed by a Dense layer (with 256 units), a Dropout layer (with a rate of 50%) and another Dense layer that outputs the prediction. Compile your model using Adam with a learning rate of 0.001.

**Exercise:**

1. What should the input shape of our model be?
2. What is the output shape of the Global Average Pooling (GAP) layer?
3. How many units do we need for the output, and what should we use as the activation function?

Complete the code below.

<details><summary>Click here for answer</summary>
    
1. The input shape should be (4, 4, 1280), which is the output shape of our convolutional base.
2. The output shape of GAP is (1280,), since the last layer of the convolutional base outputs 1280 feature maps (channels) and GAP averages each feature map down to a single value.
3. We need 5 output units as we are classifying 5 different flowers, and we should use 'softmax' as the activation function for multi-class classification.

Code:

```python
inputs = keras.layers.Input(shape=X_train.shape[1:])
x = keras.layers.GlobalAveragePooling2D()(inputs)
x = keras.layers.Dropout(rate=0.5)(x)
x = keras.layers.Dense(units=256, activation="relu")(x)
x = keras.layers.Dropout(rate=0.5)(x)
outputs = keras.layers.Dense(units=5, activation="softmax")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="sparse_categorical_crossentropy",
                  optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  metrics=["accuracy"])

```
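To see how the shapes flow through this head, here is a minimal NumPy sketch of its forward pass at inference time. It uses randomly initialised weights and made-up features, so the predicted probabilities are meaningless; only the shapes matter. The Dropout layers are left out because they are inactive at inference. All helper names here are hypothetical, not Keras APIs.

```python
import numpy as np

def global_average_pool(features):
    """Average each feature map over its spatial dimensions (H, W)."""
    return features.mean(axis=(1, 2))

def dense(x, weights, bias):
    """Fully connected layer without activation."""
    return x @ weights + bias

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(42)
features = rng.normal(size=(8, 4, 4, 1280))    # a made-up batch of 8 samples
w1, b1 = rng.normal(scale=0.01, size=(1280, 256)), np.zeros(256)
w2, b2 = rng.normal(scale=0.01, size=(256, 5)), np.zeros(5)

pooled = global_average_pool(features)         # (8, 1280)
hidden = relu(dense(pooled, w1, b1))           # (8, 256)
probs = softmax(dense(hidden, w2, b2))         # (8, 5)

print(pooled.shape, hidden.shape, probs.shape)
print(probs.sum(axis=1))                       # each row sums to 1
```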

</details>


In [None]:
# TODO: build your classification model here, try to use functional API to do so.

inputs = ??

## any other layers

outputs = ??

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss=??,
                  optimizer=??,
                  metrics=["accuracy"])


In [None]:
model_top.summary()

Now we train our classifier with the extracted features (`X_train`) for 50 epochs. The training will be fast, as we have relatively few parameters (around 330k) to train.
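The parameter count of the classifier head can be checked by hand. The sketch below assumes the head described above (GAP, a 256-unit Dense layer, and a 5-unit output layer); GAP and Dropout layers have no trainable parameters. The helper `dense_params` is a hypothetical name used only for this calculation.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a Dense layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 1280    # channels produced by the convolutional base, after GAP
hidden_units = 256
n_classes = 5

hidden_layer = dense_params(n_features, hidden_units)
output_layer = dense_params(hidden_units, n_classes)
total = hidden_layer + output_layer

print(hidden_layer, output_layer, total)
```

You can compare the total against the "Trainable params" line printed by `model_top.summary()`.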

In [None]:
# load the extracted features from the files we saved earlier
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
X_val = np.load('X_val.npy')
y_val = np.load('y_val.npy')

In [None]:
# create the tensorboard callback
import os
import time

root_logdir = os.path.join(os.curdir, "tb_logs")

def get_run_logdir():    # use a new directory for each run
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tb_callback = keras.callbacks.TensorBoard(run_logdir)

# create model checkpoint callback to save the best model checkpoint
model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath="best_checkpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)
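The effect of `save_best_only=True` with `monitor='val_accuracy'` and `mode='max'` can be sketched in plain Python: a checkpoint is written only when the monitored value strictly improves on the best value seen so far. This is a simplified illustration with made-up accuracy values, not the Keras implementation, and `best_epoch_tracker` is a hypothetical helper.

```python
def best_epoch_tracker(val_accuracies):
    """Return the best value and the (1-based) epochs at which a save would occur."""
    best_value = float("-inf")
    saved_epochs = []
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best_value:          # 'max' mode: higher is better
            best_value = value
            saved_epochs.append(epoch)  # the checkpoint would be overwritten here
    return best_value, saved_epochs

# made-up validation accuracies for 5 epochs
history = [0.71, 0.78, 0.76, 0.81, 0.80]
best, saved = best_epoch_tracker(history)
print(best, saved)
```

Because the same `filepath` is reused, only the weights from the last saving epoch remain on disk, which is why loading the checkpoint afterwards gives the best-performing weights rather than those of the final epoch.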

In [None]:
model_top.fit(X_train, y_train,
              epochs=50,
              batch_size=16,
              validation_data=(X_val, y_val),
              callbacks=[tb_callback, model_checkpoint_callback])


In [None]:
%load_ext tensorboard
%tensorboard --logdir tb_logs

Let's load the best-performing model checkpoint and evaluate it on the validation features.

In [None]:
model_top.load_weights('best_checkpoint')
model_top.evaluate(X_val, y_val)
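The accuracy metric used here is the fraction of samples whose highest-probability class matches the integer label. The sketch below reproduces that calculation, plus a per-class recall, in NumPy with made-up probabilities. The helper names are hypothetical; in practice the probabilities would come from something like `model_top.predict(X_val)`.

```python
import numpy as np

def accuracy_from_probs(probs, labels):
    """Fraction of samples whose argmax prediction equals the label."""
    predicted = probs.argmax(axis=1)
    return float((predicted == labels).mean())

def per_class_recall(probs, labels, n_classes):
    """For each class, the fraction of its samples that were predicted correctly."""
    predicted = probs.argmax(axis=1)
    recalls = []
    for c in range(n_classes):
        mask = labels == c
        # a class with no samples has undefined recall
        recalls.append(float((predicted[mask] == c).mean()) if mask.any() else float("nan"))
    return recalls

# made-up softmax outputs for 4 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
labels = np.array([0, 1, 2, 2])

print(accuracy_from_probs(probs, labels))
print(per_class_recall(probs, labels, n_classes=3))
```

Per-class figures like these are useful when the overall accuracy hides a class that the model consistently gets wrong.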


You should see a good improvement over the previous model. The model also takes much less time to train.

## Prepare the model for deployment

We cannot use our `model_top` directly for image classification, as it takes extracted features as input, not images. We need to reattach our convolutional base so that the model can take in images directly. This is what we are going to do below. It is also important to include the model-specific pre-processing function as one of the layers.

In [None]:
# specify the input layer with appropriate image shape
inputs = keras.layers.Input(shape=image_size+(3,))

# important to include model-specific preprocess function
x = preprocess_input_fn(inputs)

x = base_model(x)
outputs = model_top(x)

model_full = keras.models.Model(inputs=[inputs], outputs=[outputs])
model_full.compile(loss="sparse_categorical_crossentropy",
                  optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  metrics=["accuracy"])

model_full.summary()

In [None]:
model_full.save("full_model")

Let's make sure our full model works on the validation dataset (which contains images rather than extracted features) and gives the same accuracy as before.

In [None]:
restored_model = tf.keras.models.load_model('full_model')
restored_model.evaluate(val_ds)
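The reason the stitched model should match the earlier accuracy is that it applies exactly the same computations, just in a single pass. The NumPy sketch below illustrates this with stand-in functions: `fake_preprocess`, `fake_base` and `fake_top` are hypothetical placeholders for the pre-processing function, the frozen convolutional base and the trained classifier, not real models.

```python
import numpy as np

rng = np.random.default_rng(0)
base_weights = rng.normal(size=(3, 8))   # stand-in for the frozen base
head_weights = rng.normal(size=(8, 5))   # stand-in for the trained head

def fake_preprocess(images):
    return images / 255.0

def fake_base(images):
    # pool over the spatial dimensions, then apply a fixed projection
    return images.mean(axis=(1, 2)) @ base_weights

def fake_top(features):
    # predicted class index for each sample
    return (features @ head_weights).argmax(axis=1)

def full_model(images):
    """Pre-processing, base and head chained into one function."""
    return fake_top(fake_base(fake_preprocess(images)))

images = rng.integers(0, 256, size=(6, 16, 16, 3)).astype(np.float64)

# two-stage route: extract features first, then classify them
cached_features = fake_base(fake_preprocess(images))
two_stage = fake_top(cached_features)

# single-pass route: images in, predictions out
end_to_end = full_model(images)

print(np.array_equal(two_stage, end_to_end))
```

If the real full model gives a different accuracy from `model_top`, a likely cause is a mismatch between the two routes, such as a missing pre-processing step or a base model that is no longer frozen.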

## Extra exercises

Try another pre-trained model such as VGG19 or ResNet50 and see if the extracted features give you better classification results. Remember to switch to that model's own `preprocess_input` function.
