<a href="https://colab.research.google.com/github/nyp-sit/iti107/blob/main/session-3/2.feature_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer learning - Feature Extraction

In this exercise, we use transfer learning to improve our baseline model. We will use a pre-trained model (EfficientNetB0) as a feature extractor and use the extracted features to train a classifier for our image classification task (the five-class flower photos dataset).

At the end of this exercise, you will be able to: 
- understand how to load a pretrained model with and without the classification layer  
- extract features using the pre-trained model as feature extractor
- train a classifier using the extracted features 


Transfer learning involves taking the "knowledge" learnt from one task (e.g. image classification on a large dataset such as ImageNet) and transferring it to a new, related task (e.g. image classification on different types of objects than the original ones, or object detection). There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. Let's start with the feature extraction approach.

## Feature extraction

In this approach, we only take the convolutional base of a pretrained model and use it to extract features from the images, and use the extracted features as input features to train a separate classifier. 

<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/resources/swapping_fc_classifier.png" width="500" />

### Using pre-trained Model as Feature Extractor

We will be using EfficientNetB0 as our pretrained model (you can choose any other pretrained model, such as VGG19 or ResNet). In the code cells below, after loading the dataset, we load EfficientNetB0 without its classification layers (`include_top=False`). With `weights='imagenet'`, we specify that we want to download the weights that were trained on the ImageNet dataset.

In [1]:
import os
import tensorflow as tf
import tensorflow.keras as keras 
import numpy as np



In [2]:
dataset_url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
path_to_zip = tf.keras.utils.get_file(origin=dataset_url, extract=True, cache_dir='.')
dataset_folder = os.path.dirname(path_to_zip)
dataset_folder = os.path.join(dataset_folder, 'flower_photos')

In [3]:
batch_size = 24
image_size = (128,128)

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_folder,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    dataset_folder,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='int'
)

Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
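
As a quick sanity check on the split sizes reported above, the arithmetic can be sketched as follows. This assumes the validation count is the fraction times the total, truncated to an integer, with the remainder used for training; the variable names are illustrative only.

```python
# Reproduce the split sizes reported by image_dataset_from_directory.
total_files = 3670
validation_split = 0.2

# Assumption: the validation count is truncated to an integer.
num_val = int(validation_split * total_files)
num_train = total_files - num_val

print(num_train, num_val)
```

Because both calls use the same `seed` and `validation_split`, the two subsets are complementary and do not overlap.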


In [None]:
# Specify the intended image size we want
image_size = (128, 128)
base_model = keras.applications.efficientnet.EfficientNetB0(input_shape=image_size + (3,),
                                      include_top=False,
                                      weights='imagenet')
base_model.summary()

In [5]:
val_ds.class_names

['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

**Exercise:**

Examine the printout from `base_model.summary()`.
- What is the last layer in the pretrained model, and what is its output shape? Are there any fully connected layers?

<details><summary>Click here for answer</summary>

The last layer is an Activation layer (`top_activation`). Its output is 1280 feature maps of size 4x4. There are no fully connected (Dense) layers: the network is a purely convolutional base.

</details>

## Creating Datasets

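The training and validation datasets were already created at the start of this notebook, in the same way as in the earlier exercise. The cell below only keeps the alternative dataset downloads (commented out) for reference.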

In [6]:
# dataset_URL = 'https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/cats_and_dogs_subset.tar.gz'
# tf.keras.utils.get_file(origin=dataset_URL, extract=True, cache_dir='.')
# dataset_folder = os.path.join('datasets', 'cats_and_dogs_subset')
import os 


# dataset_URL = 'https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/datasets/emotions_dataset_jpg.zip'
# path_to_zip= keras.utils.get_file('emotions_dataset_jpg.zip', origin=dataset_URL, extract=True, cache_dir='.')
# print(path_to_zip)
# dataset_folder = os.path.dirname(path_to_zip)

### Extracting features on the train set 

We will first define a function to perform feature extraction, given an image dataset. 

We can either call the model's `predict()` method on the train images (and the validation images), or pass each batch of images directly to the Keras model, e.g. `model(images)`. The output is the set of features produced by the convolutional base. We will then use these features as our training samples instead of the original images.

However, before we pass the images through the convolutional base, it is IMPORTANT to pre-process them using the model-specific preprocessing function. Many people *FORGET* this step. Different models expect pixel values in a specific range (e.g. some models expect values between 0 and 1, some between -1 and 1) and a specific channel ordering (e.g. VGGNet expects the channels in BGR order). So we need to make sure our images are pre-processed according to what the model expects. For the Keras EfficientNet models, the rescaling is built into the model itself and `preprocess_input` passes the images through unchanged; we still call it so that the code keeps working if you swap in another model.
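
To make the conventions mentioned above concrete, here is a minimal NumPy sketch of the three kinds of transformation. The helper names are hypothetical and these are not the Keras implementations; in practice, use the `preprocess_input` function that ships with the chosen model.

```python
import numpy as np

def scale_to_unit(images):
    """Map pixel values from [0, 255] to [0, 1]."""
    return images.astype("float32") / 255.0

def scale_to_signed_unit(images):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return images.astype("float32") / 127.5 - 1.0

def rgb_to_bgr(images):
    """Reverse the channel axis so that RGB becomes BGR."""
    return images[..., ::-1]

# A made-up batch of two 2x2 RGB images with values in [0, 255].
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 2, 2, 3)).astype("float32")

unit = scale_to_unit(batch)
signed = scale_to_signed_unit(batch)
bgr = rgb_to_bgr(batch)
print(unit.min(), unit.max(), signed.min(), signed.max())
```

Feeding a network images in a range it was not trained on usually does not raise an error; it just quietly degrades the extracted features, which is why this step is easy to miss.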

In [7]:
# retrieve the preprocess_input function of EfficientNet for use later 
preprocess_input_fn = keras.applications.efficientnet.preprocess_input

In [8]:
base_model.trainable = False

def get_features_labels(dataset): 

    all_features = []
    all_labels = []

    for images, labels in dataset:   # each iteration yields one batch
        # pre-process the images the way the base model expects
        preprocessed_images = preprocess_input_fn(images)
        # run the frozen convolutional base in inference mode
        features = base_model(preprocessed_images, training=False)
        
        # collect this batch's features and labels as NumPy arrays
        all_features.append(features.numpy())
        all_labels.append(labels.numpy())

    # concatenate the batches along the first (sample) axis
    all_features, all_labels = np.concatenate(all_features), np.concatenate(all_labels)
    
    return all_features, all_labels
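
The accumulate-then-concatenate pattern used in the function above can be tried without TensorFlow. The sketch below is an illustration only: `fake_extractor` and `batches` are hypothetical stand-ins for the convolutional base and the dataset iterator.

```python
import numpy as np

def fake_extractor(images):
    """Hypothetical stand-in for the convolutional base: averages each
    image over its spatial grid and returns one value per channel."""
    return images.mean(axis=(1, 2))

def batches(images, labels, batch_size):
    """Yield consecutive (images, labels) batches, like iterating a dataset."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]

images = np.random.default_rng(0).random((10, 8, 8, 3)).astype("float32")
labels = np.arange(10) % 5

all_features, all_labels = [], []
for batch_images, batch_labels in batches(images, labels, batch_size=4):
    all_features.append(fake_extractor(batch_images))
    all_labels.append(batch_labels)

# The last batch is smaller (2 samples); concatenate handles that fine.
features = np.concatenate(all_features)
targets = np.concatenate(all_labels)
print(features.shape, targets.shape)
```

Collecting the labels in the same loop as the features matters: it keeps each label aligned with its feature vector even when the dataset shuffles between passes.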


Now we will call the feature extraction function on both the training and validation datasets.

In [9]:
# Extract features and labels for train set
X_train, y_train = get_features_labels(train_ds)

# Extract features and labels for validation set
X_val, y_val = get_features_labels(val_ds)

In [10]:
# Check the shape of the features
print(X_train.shape)
print(X_val.shape)

(2936, 4, 4, 1280)
(734, 4, 4, 1280)
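
The 4x4 spatial size follows from the input size. The sketch below assumes that EfficientNetB0 reduces the spatial resolution by an overall factor of 32 and takes the channel count from the output above; the variable names are illustrative.

```python
image_size = (128, 128)
total_stride = 32      # assumption: overall downsampling factor of the base
channels = 1280        # channel count reported in the output above

feature_shape = (image_size[0] // total_stride,
                 image_size[1] // total_stride,
                 channels)
values_per_image = feature_shape[0] * feature_shape[1] * feature_shape[2]
print(feature_shape, values_per_image)
```

Each image is therefore represented by 20,480 feature values, which is what the classifier will receive as input.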


We will now save the features to local storage, as numpy arrays. We will load these features later on to be used for training our classifiers.

In [11]:
np.save("X_train.npy", X_train)
np.save("y_train.npy", y_train)
np.save("X_val.npy", X_val)
np.save("y_val.npy", y_val)
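
As an optional alternative to four separate `.npy` files, NumPy can bundle several arrays into one compressed archive. This is a sketch with small made-up arrays and a hypothetical file name, not a change to the workflow above.

```python
import os
import tempfile
import numpy as np

# Small made-up arrays standing in for the extracted features and labels.
X_demo = np.zeros((6, 4, 4, 8), dtype="float32")
y_demo = np.array([0, 1, 2, 3, 4, 0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features_demo.npz")
    # Keyword names become the keys used to read the arrays back.
    np.savez_compressed(path, X=X_demo, y=y_demo)

    with np.load(path) as data:
        X_loaded = data["X"]
        y_loaded = data["y"]

print(X_loaded.shape, y_loaded.shape)
```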

## Classification model

Now we will build a new classification model that takes the extracted features as input. Instead of the usual Flatten layer followed by Dense layers, let us use a Global Average Pooling (GAP) layer, followed by a Dense layer (with 512 units), a Dropout layer (with a rate of 50%) and a final Dense layer that outputs the prediction. Compile your model using Adam with a learning rate of 0.001.

**Exercise:**

1. What should the input shape of our model be? 
2. What is the output shape of the Global Average Pooling (GAP) layer? 
3. How many units do we need for the output, and which activation function should we use? 

Complete the code below. 

<details><summary>Click here for answer</summary>
    
1. The input shape should be (4, 4, 1280), which is the output shape of our convolutional base (`X_train.shape[1:]`).
2. The output shape of GAP is (1280,), since the last layer of the convolutional base has 1280 feature maps (channels). 
3. We need 5 output units, one per flower class, and we should use 'softmax' as the activation function. Because the labels are integers (`label_mode='int'`), the matching loss is `sparse_categorical_crossentropy`. 

Code: 

```python
inputs = keras.layers.Input(shape=X_train.shape[1:])
x = keras.layers.GlobalAveragePooling2D()(inputs)
x = keras.layers.Dropout(rate=0.5)(x)
x = keras.layers.Dense(units=512, activation="relu")(x)
x = keras.layers.Dropout(rate=0.5)(x)
outputs = keras.layers.Dense(units=5, activation="softmax")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="sparse_categorical_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

``` 

</details>


In [13]:
inputs = keras.layers.Input(shape=X_train.shape[1:])

x = keras.layers.GlobalAveragePooling2D()(inputs)
x = keras.layers.Dropout(rate=0.5)(x)
x = keras.layers.Dense(units=128, activation="relu")(x)
x = keras.layers.Dropout(rate=0.5)(x)
outputs = keras.layers.Dense(units=5, activation="softmax")(x)

model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

model_top.compile(loss="sparse_categorical_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

In [14]:
X_train.shape[1:]

(4, 4, 1280)
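
Given the (4, 4, 1280) input shape shown above, global average pooling simply averages each feature map over its 4x4 grid. A minimal NumPy sketch with made-up feature values:

```python
import numpy as np

# Made-up feature maps: 2 samples, 4x4 spatial grid, 1280 channels.
features = np.random.default_rng(0).random((2, 4, 4, 1280)).astype("float32")

# Global average pooling: average over the height and width axes.
pooled = features.mean(axis=(1, 2))
print(pooled.shape)
```

Compared with flattening, which would give 4 * 4 * 1280 = 20,480 inputs to the first Dense layer, pooling gives 1,280 inputs and so far fewer weights to train.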

In [16]:
# TODO: build your classification model here, try to use functional API to do so.

# inputs = ??

# ## any other layers

# outputs = ??

# model_top = keras.models.Model(inputs=[inputs], outputs=[outputs], name="top")

# model_top.compile(loss=??, 
#                   optimizer=??, 
#                   metrics=["accuracy"])


In [17]:
model_top.summary()

Model: "top"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 4, 4, 1280)]      0         
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               163968    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 645       
Total params: 164,613
Trainable params: 164,613
Non-trainable params: 0
_________________________________________________________
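
The parameter counts in the summary above can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit, while the pooling and Dropout layers have none. The helper name below is illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(1280, 128)   # GAP output -> Dense(128)
output = dense_params(128, 5)      # Dense(128) -> Dense(5)
total = hidden + output
print(hidden, output, total)
```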

Now we train our classifier with the extracted features (X_train) for 50 epochs. The training will be fast, as we only have a small number of parameters (about 165k) to train.

In [18]:
# we will now load the extracted features from the files we saved earlier
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
X_val = np.load('X_val.npy')
y_val = np.load('y_val.npy')

In [19]:
# create the tensorboard callback
import os
import time

root_logdir = os.path.join(os.curdir, "tb_logs")

def get_run_logdir():    # use a new directory for each run
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tb_callback = keras.callbacks.TensorBoard(run_logdir)

# create model checkpoint callback to save the best model checkpoint
model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath="best_checkpoint",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)
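
To illustrate what `save_best_only=True` with `mode='max'` amounts to, here is a plain-Python sketch of the bookkeeping. The validation accuracies are made up, and the sketch assumes that only a strictly better value triggers a new save.

```python
# Made-up validation accuracies, one per epoch, for illustration only.
val_accuracies = [0.71, 0.80, 0.78, 0.86, 0.85, 0.86]

best_value = float("-inf")
saved_epochs = []          # epochs at which a checkpoint would be written

for epoch, value in enumerate(val_accuracies, start=1):
    if value > best_value:     # mode="max": strictly larger counts as better
        best_value = value
        saved_epochs.append(epoch)

print(saved_epochs, best_value)
```

In this toy run the checkpoint file would end up holding the weights from epoch 4, not from the final epoch, which is why the weights are reloaded from the checkpoint before evaluation.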

In [20]:
model_top.fit(X_train, y_train, 
              epochs=50, 
              batch_size=16,
              validation_data=(X_val, y_val), 
              callbacks=[tb_callback, model_checkpoint_callback])


Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x23d65152310>

In [21]:
%load_ext tensorboard
%tensorboard --logdir tb_logs

Reusing TensorBoard on port 6006 (pid 18604), started 6 days, 23:21:19 ago. (Use '!kill 18604' to kill it.)

Let's load the best-performing model checkpoint and evaluate it on the validation features.

In [22]:
model_top.load_weights('best_checkpoint')
model_top.evaluate(X_val, y_val)




[0.3594142496585846, 0.9046321511268616]
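
If a per-class breakdown is wanted in addition to the overall loss and accuracy, a confusion matrix can be built with NumPy. The sketch below uses made-up labels and probabilities; in the notebook, `probs` would come from `model_top.predict(X_val)` and `y_true` from `y_val`.

```python
import numpy as np

class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

# Made-up true labels and predicted probabilities, for illustration only.
y_true = np.array([0, 1, 2, 3, 4, 2])
probs = np.eye(5)[[0, 1, 2, 3, 4, 4]]   # the last sample is misclassified

# The predicted class is the index of the largest probability.
y_pred = probs.argmax(axis=1)

# Rows are true classes, columns are predicted classes.
confusion = np.zeros((5, 5), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

per_class_recall = confusion.diagonal() / confusion.sum(axis=1)
for name, recall in zip(class_names, per_class_recall):
    print(f"{name}: {recall:.2f}")
```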

You should see a good improvement over the baseline model (around 30%); in the run above, the validation accuracy is about 90%. The model also takes much less time to train. 

## Prepare the model for deployment

We cannot use our `model_top` directly for image classification, as it takes extracted features as input, not images. We need to put the convolutional base back in front of it so that the combined model can take in images directly. This is what we do below. It is also important to include the model-specific pre-processing function as one of the steps in the model.

In [23]:
# specify the input layer with appropriate image shape
inputs = keras.layers.Input(shape=image_size+(3,))

# important: include the model-specific preprocessing function
x = preprocess_input_fn(inputs)

x = base_model(x)
outputs = model_top(x)

model_full = keras.models.Model(inputs=[inputs], outputs=[outputs])
# use the same loss as model_top: integer labels with a 5-way softmax output
model_full.compile(loss="sparse_categorical_crossentropy", 
                  optimizer=keras.optimizers.Adam(learning_rate=0.001), 
                  metrics=["accuracy"])

model_full.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 128, 128, 3)]     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, 4, 4, 1280)        4049571   
_________________________________________________________________
top (Functional)             (None, 5)                 164613    
Total params: 4,214,184
Trainable params: 164,613
Non-trainable params: 4,049,571
_________________________________________________________________


In [24]:
model_full.save("full_model")

INFO:tensorflow:Assets written to: full_model\assets




Let's make sure our full model works on the validation dataset (which consists of images) and gives the same accuracy as before.

In [25]:
model_full.evaluate(val_ds)
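
A five-way softmax output has shape `(None, 5)`, while integer labels carry a single value per image. Sparse categorical cross-entropy handles that pairing by looking up the predicted probability of the true class, whereas binary cross-entropy expects labels shaped like the predictions. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

# Made-up softmax outputs for 3 images over 5 classes.
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.20, 0.20, 0.20, 0.20, 0.20],
])
labels = np.array([0, 2, 4])          # integer class indices, shape (3,)

# Sparse categorical cross-entropy: pick the probability of the true class.
true_class_probs = probs[np.arange(len(labels)), labels]
loss_per_sample = -np.log(true_class_probs)
print(loss_per_sample.mean())
```

The loss chosen when compiling the full model should therefore match the one used to train `model_top`.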


## Extra exercises

Try another pre-trained model, such as MobileNetV2 or EfficientNetB1, and see if the extracted features give you better classification results. 
