<a href="https://colab.research.google.com/github/nyp-sit/iti107/blob/main/session-1/using_pretrained_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Pretrained CNN models

Welcome to this week's programming exercise. We have covered many different Convolutional Neural Network architectures such as VGG, ResNet, Inception and MobileNet. It is time to see them in action.

At the end of this exercise, you will be able to:
- load pretrained models of some popular Convolutional Neural Networks and use them to classify images
- identify some of the architectural patterns in popular Convolutional Neural Networks
- compare the inference speed of different models


## Get the sample image

We will use the pretrained models to classify a sample image (a picture of a table and a chair). Let's go ahead and download the image.

In [None]:
# wget is a command-line download tool available on Linux distributions such as Ubuntu
!wget https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/iti107/resources/chair_table.png

In [None]:
from PIL import Image
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Read the image
img_path = 'chair_table.png'
img = tf.keras.utils.load_img(img_path)

# Display the image
plt.imshow(img)

## VGG16 - Pretrained Model

In [None]:
from tensorflow.keras.applications import vgg16

vgg16_model = vgg16.VGG16(weights='imagenet')
vgg16_model.summary()

***Questions***

1. What is the expected input image size?
2. What are the last four layers in VGG-16?

<details><summary>Click here for answer</summary>
    
1. The input image is expected to have a height of 224 and a width of 224.
2. The last four layers are a Flatten layer (which flattens the multi-dimensional feature map into a 1-D vector before feeding it to the FC layers), two fully connected (Dense) layers, and a final Dense layer with softmax activation that classifies the input into 1000 classes. This is quite typical of an image classifier.

</details>

In [None]:
# Utility function: load an image, preprocess it, run the model and return the top-10 predictions
def predict_image(model, img_path, preprocess_input_fn, decode_predictions_fn, target_size=(224,224)):
    img = tf.keras.utils.load_img(img_path, target_size=target_size)
    x = tf.keras.utils.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input_fn(x)
    preds = model.predict(x)
    predictions_df = pd.DataFrame(decode_predictions_fn(preds, top=10)[0])
    predictions_df.columns = ["Predicted Class", "Name", "Probability"]
    return predictions_df

In [None]:
# Predict Results
predict_image(vgg16_model, img_path, vgg16.preprocess_input, vgg16.decode_predictions)

Notice that we pass the `vgg16.preprocess_input` function to `predict_image()` so that the image is preprocessed before `model.predict()` is called. Different networks (e.g. VGG, ResNet) expect the input image to be normalized in different ways, so each model module provides its own `preprocess_input()` function to perform the normalization.
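To make "normalized in different ways" concrete, here is a minimal NumPy sketch of the two schemes, as we understand the Keras implementations: VGG16 and ResNet50 convert RGB to BGR and subtract the ImageNet channel means, while MobileNetV2 scales pixels to the range [-1, 1]. The helper names `caffe_style_preprocess` and `tf_style_preprocess` are made up for this illustration; in practice, always call the model's own `preprocess_input()`.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed values for the VGG/ResNet scheme).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(rgb_batch):
    """Hypothetical helper: RGB -> BGR, then subtract the channel means."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

def tf_style_preprocess(rgb_batch):
    """Hypothetical helper: scale pixel values from [0, 255] to [-1, 1]."""
    return rgb_batch.astype(np.float32) / 127.5 - 1.0

# A fake batch holding one 2x2 RGB image in which every pixel is pure white.
white = np.full((1, 2, 2, 3), 255.0, dtype=np.float32)

vgg_like = caffe_style_preprocess(white)
mobilenet_like = tf_style_preprocess(white)

print(vgg_like[0, 0, 0])        # mean-subtracted values, different for each channel
print(mobilenet_like[0, 0, 0])  # every channel maps to 1.0
```

The same white pixel ends up with very different values under the two schemes, which is why feeding a model an image prepared with another model's `preprocess_input()` usually degrades its predictions.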

We also call `np.expand_dims(x, axis=0)` before calling `preprocess_input()` and `predict()`.

***Questions***

1. What does `np.expand_dims(x, axis=0)` do and why do we need it?
2. Our sample picture contains both a table and a chair. What does VGG16 predict, and why do you think it predicts that?
3. Did you see any chair-related class among the top 10 predictions?


<details><summary>Click here for answer</summary>

1. `np.expand_dims()` inserts a new dimension at the position given by the `axis` parameter. Here we add a new first axis (`axis=0`), because `preprocess_input()` and `predict()` expect a batch of images with shape (samples, height, width, channels), where the first axis is the batch.

2. It predicts dining table. It probably focuses on the object in the middle of the image.

3. Yes, folding chair is one of the top 10 predictions.

</details>
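The effect of adding the batch axis can be checked on a dummy array. This is a small sketch that uses a zero-filled array as a stand-in for the output of `img_to_array()`:

```python
import numpy as np

# Stand-in for the img_to_array() output: one 224x224 RGB image, no batch axis.
single_image = np.zeros((224, 224, 3), dtype=np.float32)

# Insert a new axis at position 0 to obtain a batch containing one image.
batch_of_one = np.expand_dims(single_image, axis=0)

print(single_image.shape)  # (224, 224, 3)
print(batch_of_one.shape)  # (1, 224, 224, 3)

# Indexing with np.newaxis is an equivalent way to add the same axis.
same_batch = single_image[np.newaxis, ...]
print(same_batch.shape)    # (1, 224, 224, 3)
```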

## ResNet50 - Pretrained Model

In [None]:
# Downloading the weights might take a while.
# ResNet50 has many more layers than VGG16 (although fewer parameters), so its summary or plot is quite long.
from tensorflow.keras.applications import resnet50

resnet50_model = resnet50.ResNet50(weights='imagenet')

# Plot the model instead of calling model.summary(): the skip connections are easier to see in the plot
# (plot_model requires the pydot and graphviz packages)
tf.keras.utils.plot_model(resnet50_model, to_file="resnet.png", show_shapes=True)

***Questions***

1. Can you identify a skip-connection block in the model plot?
2. Look at the last few layers in the ResNet. How are they different from those of VGG-16?

<details><summary>Click here for answer</summary>
    
1. Look for the 'Add' layers (e.g. the layer named add_2; the exact layer names depend on the Keras version). An Add layer sums the skip connection with the output of the preceding layer. Notice that the addition is done before the Activation function. You can also call `resnet50_model.summary()` and check the "Connected to" column to see which layers feed into each Add layer.

2. ResNet does not use a Flatten layer followed by several fully connected layers as its classification head. Instead, it uses GlobalAveragePooling2D, followed by a single Dense softmax layer for the 1000 classes. This pattern is very common in more modern architectures.

</details>
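The difference between the two classification heads can be made concrete with a little arithmetic. The sketch below assumes the final feature-map shapes that the models report for 224x224 inputs (7x7x512 for VGG16 and 7x7x2048 for ResNet50) and compares the number of weights in the first Dense layer after each head; `dense_params` is a helper made up for this illustration.

```python
import numpy as np

# Assumed final feature-map shapes (height, width, channels) for 224x224 inputs.
vgg16_features = np.zeros((7, 7, 512))
resnet50_features = np.zeros((7, 7, 2048))

# Flatten keeps every spatial position: 7 * 7 * 512 values.
flattened = vgg16_features.reshape(-1)

# Global average pooling keeps one averaged value per channel: 2048 values.
pooled = resnet50_features.mean(axis=(0, 1))

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_in * n_out + n_out

vgg16_fc1_params = dense_params(flattened.size, 4096)    # first FC layer in VGG16
resnet50_head_params = dense_params(pooled.size, 1000)   # softmax layer in ResNet50

print(flattened.size, pooled.size)  # 25088 2048
print(vgg16_fc1_params)             # 102764544
print(resnet50_head_params)         # 2049000
```

Under these assumptions, the first fully connected layer of VGG16 alone holds about 100 million parameters, while the whole ResNet50 head needs about 2 million.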

In [None]:
# Predict Results
predict_image(resnet50_model, img_path, resnet50.preprocess_input, resnet50.decode_predictions)

## MobileNet v2 - Pretrained Model

In [None]:
from tensorflow.keras.applications import mobilenet_v2
mobilenet_v2_model = mobilenet_v2.MobileNetV2(weights='imagenet')

# plot the model architecture
tf.keras.utils.plot_model(mobilenet_v2_model, to_file='mobilenet_v2.png', show_shapes=True)

***Questions***

1. Can you identify a depthwise convolution layer in the model plot?

<details><summary>Click here for answer</summary>

1. For example, the layer named `block_1_depthwise`, which is a `DepthwiseConv2D` layer.

    
</details>
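To see why depthwise convolutions make MobileNet lightweight, we can compare weight counts for a standard convolution and a depthwise separable one (a depthwise convolution followed by a 1x1 pointwise convolution). The channel counts below are illustrative values chosen for this sketch, not taken from a specific MobileNetV2 block, and biases are ignored.

```python
# Illustrative (hypothetical) layer configuration.
kernel_size = 3
in_channels = 32
out_channels = 64

# Standard convolution: every output channel has a k x k filter over all input channels.
standard_params = kernel_size * kernel_size * in_channels * out_channels

# Depthwise convolution: one k x k filter per input channel.
depthwise_params = kernel_size * kernel_size * in_channels

# Pointwise (1x1) convolution: mixes the channels to produce the output channels.
pointwise_params = in_channels * out_channels

separable_params = depthwise_params + pointwise_params

print(standard_params)   # 18432
print(separable_params)  # 2336
print(round(standard_params / separable_params, 2))
```

With these numbers the separable version needs roughly an eighth of the weights of the standard convolution.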

In [None]:
predict_image(mobilenet_v2_model, img_path, mobilenet_v2.preprocess_input, mobilenet_v2.decode_predictions)

## Speed comparison

We now compare the inference speed of the three models. Which one is the fastest?

In [None]:
img = tf.keras.utils.load_img(img_path, target_size=(224,224))
img_arr = tf.keras.utils.img_to_array(img)
img_arr = np.expand_dims(img_arr, axis=0)

# Repeat the image 128 times to form a batch large enough to show clear speed differences between the models
images = tf.repeat(img_arr, repeats=[128], axis=0)

In [None]:
processed_image = vgg16.preprocess_input(images)


In [None]:
%%timeit -n 1 -r 1
vgg16_model.predict(processed_image)

In [None]:
processed_image = resnet50.preprocess_input(images)

In [None]:
%%timeit -n 1 -r 1
resnet50_model.predict(processed_image)

In [None]:
processed_image = mobilenet_v2.preprocess_input(images)

In [None]:
%%timeit -n 1 -r 1
mobilenet_v2_model.predict(processed_image)
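A single timed run (`-n 1 -r 1`) can be noisy, and the first call to `predict()` may include one-off setup cost. As a rough alternative, the sketch below times a function after a warm-up call and reports the best of several runs. `time_call` and `dummy_predict` are hypothetical names introduced here; the dummy workload only makes the sketch runnable without a model.

```python
import time

def time_call(fn, *args, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn(*args)."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload so that the sketch runs without a model.
def dummy_predict(batch):
    return [value * 2 for value in batch]

best = time_call(dummy_predict, list(range(1000)))
print(f"best of 3 runs: {best:.6f} s")
```

In the notebook, the same helper could be applied to a real model, for example `time_call(mobilenet_v2_model.predict, processed_image)`, making sure that `processed_image` was produced by the matching `preprocess_input()`.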