<a href="https://colab.research.google.com/github/nyp-sit/it3103/blob/main/week4/1.using_pretrained_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Pretrained CNN models

Welcome to this week's programming exercise. We have covered many different Convolutional Neural Network architectures such as VGG, ResNet, Inception and MobileNet. It is time to see them in action. 

At the end of this exercise, you will be able to: 
- load pretrained models of some popular Convolutional Neural Networks and use them to classify images
- identify some of the architectural patterns in popular Convolutional Neural Networks
- compare the inference speed of different models


## Get the sample image

We will use the pretrained models to classify a sample image (a picture of a table and chair). Let's go ahead and download the image.

In [None]:
# wget is a command-line download tool available on Linux systems such as Ubuntu
!wget https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/it3103/resources/chair_table.png

In [None]:
from PIL import Image
from keras.preprocessing import image
import keras
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# Read the image from disk
img_path = 'chair_table.png'
img = keras.utils.load_img(img_path)

# Display the image
plt.imshow(img)

## VGG16 - Pretrained Model

In [None]:
from keras.applications import vgg16

vgg16_model = vgg16.VGG16(weights='imagenet')
vgg16_model.summary()

***Questions***

1. What is the expected input image size?
2. What are the last four layers in VGG-16? 

<details><summary>Click here for answer</summary>
    
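1. The model expects input images with a height of 224 and a width of 224 pixels and 3 colour channels, i.e. an input shape of (224, 224, 3).
2. The last four layers are a Flatten layer (which flattens the 3-D feature map into a 1-D vector before feeding it to the FC layers), two fully-connected (Dense) layers of 4096 units each, and a final Dense layer with softmax activation that classifies the image into 1000 classes. This is quite typical of an image classifier.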

</details>
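
To get a feel for how much of VGG-16 sits in those last layers, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, assuming the final convolutional feature map is 7 x 7 x 512 and the layer widths reported by `summary()` (4096, 4096 and 1000). The helper name `dense_params` is ours for illustration and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully-connected (Dense) layer."""
    return n_inputs * n_units + n_units

# Flatten turns the 7 x 7 x 512 feature map into a single vector
flattened = 7 * 7 * 512

fc1 = dense_params(flattened, 4096)      # first FC layer
fc2 = dense_params(4096, 4096)           # second FC layer
predictions = dense_params(4096, 1000)   # softmax classification layer

print(flattened)                  # 25088
print(fc1, fc2, predictions)      # 102764544 16781312 4097000
print(fc1 + fc2 + predictions)    # 123642856
```

You can compare these numbers with the "Param #" column printed by `vgg16_model.summary()`; the three Dense layers account for most of the parameters in the network.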

In [None]:
# Utility function: load an image, preprocess it, run the model and return the top-10 predictions as a DataFrame
def predict_image(model, img_path, preprocess_input_fn, decode_predictions_fn, target_size=(224,224)):
    img = keras.utils.load_img(img_path, target_size=target_size)
    x = keras.utils.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input_fn(x)
    preds = model.predict(x)
    predictions_df = pd.DataFrame(decode_predictions_fn(preds, top=10)[0])
    predictions_df.columns = ["Predicted Class", "Name", "Probability"]
    return predictions_df

In [None]:
#img_path="rocking_chair.png"  ## Uncomment this and put the path to your file here if desired
# Predict Results
predict_image(vgg16_model, img_path, vgg16.preprocess_input, vgg16.decode_predictions)

Notice that we pass in the `vgg16.preprocess_input` function to preprocess the image before calling `model.predict()`. Different networks (e.g. VGG, ResNet, etc.) expect the input image to be normalized in different ways, so each model family provides its own `preprocess_input()` function to perform the normalization.
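
To illustrate what "normalized in different ways" can mean, here is a minimal NumPy sketch of two common schemes: the "caffe" style used by the VGG16 and ResNet preprocessing functions (RGB to BGR, then subtracting the ImageNet channel means) and the "tf" style used by MobileNet (scaling pixels to the range [-1, 1]). The function names below are hypothetical and this is a re-implementation for illustration only, not the Keras source code; in practice, always call the model's own `preprocess_input()`.

```python
import numpy as np

def caffe_style_preprocess(x):
    """Sketch: convert RGB to BGR, then subtract the ImageNet per-channel means."""
    x = x[..., ::-1].astype("float32")  # reverse the channel axis: RGB -> BGR
    mean_bgr = np.array([103.939, 116.779, 123.68], dtype="float32")
    return x - mean_bgr

def tf_style_preprocess(x):
    """Sketch: scale pixel values from [0, 255] to [-1, 1]."""
    return x.astype("float32") / 127.5 - 1.0

# A tiny all-white 2 x 2 "image" in a batch of one (illustrative data only)
pixels = np.full((1, 2, 2, 3), 255.0, dtype="float32")

print(caffe_style_preprocess(pixels)[0, 0, 0])  # roughly 255 minus each channel mean
print(tf_style_preprocess(pixels)[0, 0, 0])     # [1. 1. 1.]
```

Feeding a model an image preprocessed with the wrong scheme usually still runs without error, but the predictions tend to be poor, which is why `predict_image()` takes the preprocessing function as a parameter.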

We also call `np.expand_dims(x, axis=0)` before calling `preprocess_input()` and `predict()`. 

***Questions***

1. What does `np.expand_dims(x, axis=0)` do and why do we need it? 
2. Our sample picture contains both a table and a chair. What does VGG16 predict, and why do you think it predicts so?
3. Among the top 10 predictions, do you see any prediction for a chair? 


<details><summary>Click here for answer</summary>

1. `np.expand_dims()` adds a new dimension to the array, at the position given by the `axis` parameter. In this case we add the new axis as axis=0, the first axis. This is because the `preprocess_input()` and `predict()` functions expect the images to be in the shape (samples, height, width, channels), with the first axis being the batch.

2. It predicts dining table. It probably focuses on the object in the middle of the image.

3. Yes, folding chair is one of the top 10 predictions.

</details>
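
The effect of `np.expand_dims()` on the array shape can be checked with a small sketch. It uses a random array as a stand-in for the loaded image (an assumption made so that the snippet runs without the image file).

```python
import numpy as np

# Stand-in for a 224 x 224 RGB image converted to an array (random values, not a real photo)
x = np.random.rand(224, 224, 3).astype("float32")

# Add a batch axis at position 0 so the array looks like a batch containing one image
batch = np.expand_dims(x, axis=0)

print(x.shape)      # (224, 224, 3)
print(batch.shape)  # (1, 224, 224, 3)
```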

## ResNet50 - Pretrained Model

In [None]:
# This downloads the pretrained weights, which might take a while
# Also, a summary would be quite long, since ResNet50 has many more layers than VGG16

from keras.applications import ResNet50

resnet50_model = ResNet50(weights='imagenet')

# let's plot the model, instead of using model.summary(), as it is easier to see some of the skip connections
keras.utils.plot_model(resnet50_model, to_file="resnet.png")

***Questions***

1. Can you identify the skip connection blocks in the model plot?
2. Look at the last few layers in the ResNet. How are they different from those of VGG-16?

<details><summary>Click here for answer</summary>
    
1. Look for the `Add` layers (for example a layer named `conv2_block1_add`; older Keras versions use names such as `add_2`). Each `Add` layer sums the skip connection with the output of the block's convolutional branch. Notice that the addition is done before the activation function. The graphical plot produced by `plot_model()` makes these branches easier to spot than `summary()`.

2. ResNet does not use a stack of large fully-connected layers as its classifier. Instead of Flatten followed by two 4096-unit FC layers, it uses GlobalAveragePooling2D followed by a single Dense softmax layer. This pattern is very common in more modern architectures.

</details>
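
The two ideas in the answer above can be sketched in plain NumPy. This is a toy illustration under simplifying assumptions: `residual_block` and `global_average_pool` are hypothetical helper names, and a simple scaling function stands in for the convolutional branch of a real ResNet block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, transform):
    """Toy skip connection: add the input to the transformed input, then activate."""
    return relu(transform(x) + x)  # the Add happens before the activation

def global_average_pool(feature_map):
    """Average each channel over height and width: (H, W, C) -> (C,)."""
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((7, 7, 2048)).astype("float32")

# A hypothetical stand-in for the convolutional branch F(x)
out = residual_block(feature_map, lambda x: 0.1 * x)
print(out.shape)                               # (7, 7, 2048)

# Flatten keeps every value; global average pooling keeps one value per channel
print(feature_map.reshape(-1).shape)           # (100352,)
print(global_average_pool(feature_map).shape)  # (2048,)
```

With only 2048 values entering the final Dense layer, the classifier needs 2048 x 1000 + 1000 = 2,049,000 parameters, far fewer than the fully-connected head of VGG-16.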

In [None]:
# Predict Results
from keras.applications import resnet
predict_image(resnet50_model, img_path, resnet.preprocess_input, resnet.decode_predictions)

## MobileNet v1 - Pretrained Model

In [None]:
from keras.applications import mobilenet
mobilenet_model = mobilenet.MobileNet(weights='imagenet')

# plot the model
keras.utils.plot_model(mobilenet_model, to_file="mobilenet.png")

***Questions***

1. Can you identify the depth-wise convolution layers of the depth-wise separable convolution in the model plot (or `summary()`)?
2. How about the point-wise convolution? 
3. Look at the last few layers in the MobileNet. How are they different from those of VGG-16?

<details><summary>Click here for answer</summary>

1. For example, the layer called `conv_dw_1`. 

2. For example, the layer called `conv_pw_1`. 

3. MobileNet does not make use of fully-connected layers as classification layers. Instead it replaces the FC layers with GlobalAveragePooling2D, followed by a 1x1 convolution (`conv_preds`) and a softmax activation. This pattern is very common in more modern architectures.
    
</details>
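
To see why splitting a convolution into a depth-wise step and a point-wise step saves parameters, the counts can be worked out as below. This is a sketch that ignores biases; the helper names are ours, and the channel sizes (32 in, 64 out, 3 x 3 kernel) are chosen to mirror the first depth-wise separable block, so treat them as an illustrative assumption.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depth-wise convolution followed by a 1x1 point-wise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise

standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)

print(standard)                        # 18432
print(separable)                       # 2336
print(round(standard / separable, 1))  # 7.9
```

You can compare the depth-wise (288) and point-wise (2048) parts of the second number with the "Param #" column for `conv_dw_1` and `conv_pw_1` if you call `mobilenet_model.summary()`.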

In [None]:
predict_image(mobilenet_model, img_path, mobilenet.preprocess_input, mobilenet.decode_predictions)


### Speed comparison 

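We now compare the inference speed of the three models. Which one is the fastest? Note that `predict_image()` also loads and preprocesses the image on every call, so each timing includes that overhead as well as the model's forward pass.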

In [None]:
%timeit predict_image(vgg16_model, img_path, vgg16.preprocess_input, vgg16.decode_predictions)


In [None]:
%timeit predict_image(resnet50_model, img_path, resnet.preprocess_input, resnet.decode_predictions)


In [None]:
%timeit predict_image(mobilenet_model, img_path, mobilenet.preprocess_input, mobilenet.decode_predictions)
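
If you want to time a single step on its own (for example only the call to `model.predict()`), a small helper along these lines can be used outside of `%timeit`. This is a minimal sketch: `time_call` is a hypothetical helper, not part of Keras or IPython, and the workload below is a stand-in so that the snippet runs without a model.

```python
import time

def time_call(fn, repeats=5):
    """Return (best, mean) wall-clock seconds over `repeats` calls, after one warm-up call."""
    fn()  # warm-up call, excluded from the timings
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)

# Stand-in workload so the sketch runs on its own. In the notebook you could instead pass
# something like: lambda: vgg16_model.predict(x, verbose=0), with x already preprocessed.
best, mean = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"best {best:.4f}s, mean {mean:.4f}s")
```

The warm-up call matters for deep learning models because the first prediction often includes one-off setup costs that would otherwise distort the comparison.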

#### Additional Exercises (Optional)

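1. Experiment with other networks such as InceptionV3 and compare their predictions and speed with those of VGG/ResNet/MobileNet. Note that InceptionV3 expects 299 x 299 input images, so pass `target_size=(299, 299)` to `predict_image()`.
2. Identify the architectural patterns used in such networks.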