# CS470 Introduction to Artificial Intelligence
## Deep Learning Practice
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST


## 3. Convolutional Neural Network
#### Contents
- Why transfer learning?
- Case studies
- Feature extraction using the pre-trained models
- Fine-tuning

---

### 3-4. Why transfer learning?

Constructing and training your own convolutional neural network from scratch can be difficult and time-consuming. A common trick in deep learning is to take a **pre-trained** model and **fine-tune** it on the specific data it will be used for.

Today, we are going to discuss **transfer learning**.  

The basic idea of **transfer learning** is to transfer weights from a model that was trained on another dataset.  
As we learned in the first CNN part, a CNN model is typically divided into a feature extractor and a classification part. The classification layers are task-specific: their shape and meaning depend on the output classes, so their weights generally cannot be transferred to a new task. However, a feature extractor trained on a large-scale dataset learns to extract general features from its input, and it can be reused to extract features from another dataset.  
Therefore, we call the feature extractor the general layers, and its weights can be transferred to another model.  
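As a minimal sketch of this split, assuming TensorFlow 2.x: `tf.keras.applications.VGG16` with `include_top=False` returns only the convolutional feature extractor (the classification layers are dropped). Here `weights=None` is used only so the sketch runs offline; in practice you would pass `weights='imagenet'` to get the pre-trained general layers. The random input batch is purely illustrative.

```python
import numpy as np
import tensorflow as tf

# VGG16 without its classification layers: only the general (feature-extracting) layers.
# weights=None keeps this sketch offline; use weights='imagenet' for real transfer learning.
feature_extractor = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3))

# A dummy batch of two RGB images, just to inspect what the extractor produces.
images = np.random.rand(2, 224, 224, 3).astype('float32')
features = feature_extractor.predict(images, verbose=0)

# Each image becomes a 7x7 grid of 512-channel feature vectors,
# which a new task-specific classifier can be trained on.
print(features.shape)  # (2, 7, 7, 512)
```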

![FineTune](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/fine-tune.png?raw=true)

Transfer learning can also be divided into **feature extraction** and **fine-tuning**.  

In **feature extraction**, we freeze the general layers and train only the fully-connected layers on top of the model for our task.  
In **fine-tuning**, on the other hand, we unfreeze the general layers (all of them, or only the top ones) and train them together with the new classifier. We typically use a very small learning rate so that the pre-trained parameters are not overwritten.
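The two strategies can be sketched in Keras as follows, assuming TensorFlow 2.x. The 5-class task, the global-average-pooling head, and the learning rates (1e-3, then 1e-5) are illustrative assumptions rather than prescribed values, and the `model.fit` calls are left as comments because the training data is hypothetical. The key points are toggling `base.trainable` and recompiling after each change.

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of classes in the new task

# General layers: the VGG16 convolutional base. weights=None keeps this sketch
# offline; use weights='imagenet' to actually transfer pre-trained weights.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(224, 224, 3))

# New task-specific classifier on top of the general layers.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keeps e.g. BatchNorm in inference mode (VGG16 has none)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

# 1) Feature extraction: freeze the general layers and train only the new head.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
frozen_count = len(model.trainable_weights)    # Dense kernel + bias only
# model.fit(train_data, epochs=...)            # hypothetical training data

# 2) Fine-tuning: unfreeze the base and recompile with a much smaller learning
#    rate so the pre-trained weights are only nudged, not overwritten.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
finetune_count = len(model.trainable_weights)  # 13 conv layers x 2 + Dense x 2
# model.fit(train_data, epochs=...)

print(frozen_count, finetune_count)  # 2 28
```

Recompiling matters: in Keras, a change to `trainable` takes effect in training only after `compile` is called again.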

If we have a small dataset, there is a high risk of overfitting. In this case, transfer learning can reduce overfitting, because the model reuses knowledge learned from a large dataset instead of learning everything from scratch.

### 3-5. Case studies

To use pre-trained models for our task, we will first look into several well-known CNN models. Many CNN models have been studied since the 1990s. In particular, since 2010, more advanced models have been developed through the [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)](http://www.image-net.org/challenges/LSVRC/) for computer vision tasks such as image recognition and object detection.

- LeNet 
- AlexNet
- VGG 
- MobileNet
- Inception (GoogLeNet)
- ResNet50 
- Xception
- ... more to come

![ImageNetWinners](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/imagenet-winners.png?raw=true)

#### LeNet (Y. LeCun et al., 1989)

![LeNet](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/lenet.png?raw=true)
Paper: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

- Yann LeCun et al. proposed a neural network architecture for handwritten and machine-printed character recognition in the 1990s.
- One of the first successful applications of CNNs.
- The LeNet-5 model consists of 3 convolutional layers, 2 pooling (subsampling) layers, and 1 fully-connected layer, followed by the output layer.
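The layer stack above can be sketched in modern Keras, assuming TensorFlow 2.x. This is a simplified approximation, not a faithful reproduction: the original LeNet-5 used trainable subsampling layers, a partially connected C3 layer, and an RBF output layer, whereas this sketch uses plain average pooling, fully connected convolutions, and a softmax output for 10 digit classes.

```python
import tensorflow as tf
from tensorflow.keras import layers

lenet = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),         # 32x32 grayscale input
    layers.Conv2D(6, 5, activation='tanh'),    # C1: -> 28x28x6
    layers.AveragePooling2D(2),                # S2: -> 14x14x6
    layers.Conv2D(16, 5, activation='tanh'),   # C3: -> 10x10x16
    layers.AveragePooling2D(2),                # S4: -> 5x5x16
    layers.Conv2D(120, 5, activation='tanh'),  # C5: -> 1x1x120
    layers.Flatten(),
    layers.Dense(84, activation='tanh'),       # F6: the fully-connected layer
    layers.Dense(10, activation='softmax'),    # output layer: 10 digit classes
])
lenet.summary()
print(lenet.count_params())  # 61706
```

With only about 62K parameters, this is tiny compared to the ImageNet models below.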

#### AlexNet (Krizhevsky, A. et al., 2012)

![AlexNet](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/alexnet.png?raw=true)
Paper: https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

- The first work that popularized convolutional neural networks in computer vision.
- It was submitted to the ImageNet ILSVRC challenge in 2012 and won by a large margin.
- The network had an architecture very similar to LeNet, but it was deeper and bigger, and it featured convolutional layers stacked directly on top of each other.
- It was trained on two GPUs for about a week.

#### VGG (Simonyan, K. et al., 2014)
##### VGG-16, VGG-19
![VGG](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/vgg.png?raw=true)

Paper: https://arxiv.org/pdf/1409.1556.pdf

- The runner-up in ILSVRC 2014 (VGG16)
- Its main contribution was in showing that the depth of the network is a critical component for good performance.
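- Deeper networks contain more non-linearities: stacking small 3x3 convolutions instead of using one large filter adds an activation after every layer, which makes the decision function more discriminative and improves performance.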
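The VGG paper backs this design with a simple count: a stack of three 3x3 convolutions (stride 1) sees the same 7x7 region as a single 7x7 convolution, but it applies three ReLUs instead of one and needs fewer weights (27C² versus 49C² for C input and output channels, ignoring biases). The worked example below checks this arithmetic; the channel count C = 256 and the helper names are illustrative assumptions.

```python
def conv_weights(kernel, channels, n_layers=1):
    """Weights (biases ignored) of n stacked kernel x kernel convs mapping C -> C channels."""
    return n_layers * kernel * kernel * channels * channels

def receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel - 1)

C = 256  # hypothetical number of channels

# Same receptive field: three 3x3 layers vs. one 7x7 layer.
print(receptive_field(3, 3), receptive_field(7, 1))  # 7 7

# Fewer weights for the deeper stack (27*C*C vs 49*C*C).
print(conv_weights(3, C, n_layers=3), conv_weights(7, C))  # 1769472 3211264
```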

#### Inception (GoogLeNet) (Szegedy, C. et al., 2014)

![GoogLeNet](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/googlenet.png?raw=true)

![InceptionModule](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/inception-module.png?raw=true)
Paper: https://arxiv.org/abs/1409.4842

- The winner in ILSVRC 2014
- Its main contribution was the development of an `Inception Module` that dramatically reduced the number of parameters in the network.
- There are several follow-up versions of GoogLeNet, such as Inception-v3 (available in Keras as `tf.keras.applications.InceptionV3`) and Inception-v4.
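An Inception module runs 1x1, 3x3 and 5x5 convolutions and a pooling branch in parallel and concatenates their outputs along the channel axis; 1x1 "reduce" convolutions shrink the channel count first so that the larger filters stay cheap. Below is a minimal Keras sketch assuming TensorFlow 2.x, using the filter counts of the "inception (3a)" module from the GoogLeNet paper (28x28x192 input, 256 output channels). The helper name `inception_module` is ours, and the sketch follows the original (v1) module rather than the factorized Inception-v3 design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    # Branch 1: 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    # Branch 2: 1x1 reduce, then 3x3 convolution
    b2 = layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)
    # Branch 3: 1x1 reduce, then 5x5 convolution
    b3 = layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)
    # Branch 4: 3x3 max pooling, then 1x1 projection
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(pool_proj, 1, padding='same', activation='relu')(b4)
    # Concatenate all branches along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

inputs = tf.keras.Input(shape=(28, 28, 192))
module = tf.keras.Model(inputs, inception_module(inputs, 64, 96, 128, 16, 32, 32))
print(module.output_shape)  # (None, 28, 28, 256) = 64 + 128 + 32 + 32 channels

# Why the 1x1 reduce saves parameters in the 5x5 branch (weights only):
direct_5x5 = 5 * 5 * 192 * 32                       # 5x5 conv applied directly to 192 channels
reduced_5x5 = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32    # 1x1 reduce to 16 channels, then 5x5
print(direct_5x5, reduced_5x5)  # 153600 15872
```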

#### ResNet (He, K. et al., 2015)

![ResNet](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/resnet.png?raw=true)

![ResidualConnection](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/residual-connection.png?raw=true)
Paper: https://arxiv.org/pdf/1512.03385.pdf

- The winner in ILSVRC 2015
- It features special skip connections and a heavy use of batch normalization.
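- The skip connection adds a block's input x directly to its output, so the stacked layers only need to learn the residual F(x) = H(x) - x instead of the full mapping H(x). This reduces what each block has to learn and makes very deep networks easier to train.
- Instead of the large fully-connected layers at the end of AlexNet and VGG, it uses global average pooling followed by a single fully-connected classification layer.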
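A minimal Keras sketch of a residual block, assuming TensorFlow 2.x. It shows the basic two-layer block (as in ResNet-18/34); ResNet50 itself uses a "bottleneck" variant (1x1, 3x3, 1x1 convolutions), and a projection shortcut is needed whenever the channel count or resolution changes. The identity shortcut here assumes the input already has `filters` channels; the input size and the 10-class head are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # identity skip connection
    y = layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)             # y is the residual F(x)
    y = layers.Add()([shortcut, y])                # output = x + F(x)
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
x = residual_block(inputs, 64)
x = residual_block(x, 64)
# ResNet-style head: global average pooling + a single FC layer, no large FC stack.
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation='softmax')(x)
toy_resnet = tf.keras.Model(inputs, outputs)
print(toy_resnet.output_shape)  # (None, 10)
```

If the convolution weights in a block are all zero, F(x) = 0 and the block passes its (non-negative) input through unchanged, which is why adding more blocks should not make the network harder to fit.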

### 3-6. Image classification using the pre-trained models

#### VGG16
We can use the pre-trained CNN models mentioned above through the Keras API [tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications). (More models available in Keras can be found at https://github.com/keras-team/keras-applications.)

In [1]:
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import os
import numpy as np
from PIL import Image


We can easily download and load the pre-trained VGG16 model using `tf.keras.applications.VGG16`.

In [None]:
# Download and load VGG16 with weights pre-trained on ImageNet
vgg16 = tf.keras.applications.VGG16(weights='imagenet')
vgg16.summary()

Then, let's download a strawberry image and classify it using the pre-trained model we have just loaded.

In [None]:
!wget --output-document="strawberry.jpg" https://upload.wikimedia.org/wikipedia/commons/c/ce/Bowl_of_Strawberries.jpg
Image.open('strawberry.jpg')

To feed an image to the pre-trained model, we first have to apply the same preprocessing that was used when the model was trained.

In [None]:
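# Load the image at VGG16's input size (224x224) and apply VGG16's preprocessing
image = tf.keras.preprocessing.image.load_img('strawberry.jpg', target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)  # float array, shape (224, 224, 3)
x = np.expand_dims(x, axis=0)                          # add a batch axis: (1, 224, 224, 3)
x = tf.keras.applications.vgg16.preprocess_input(x)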

print('Input image shape:', x.shape)
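For VGG16, `tf.keras.applications.vgg16.preprocess_input` uses "caffe"-style preprocessing: it converts images from RGB to BGR and zero-centers each channel with respect to the ImageNet dataset, without scaling. The NumPy function below is a hypothetical re-implementation meant only to show what that step does; in the notebook, keep using the Keras function.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by caffe-style preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(batch_rgb):
    """Hypothetical sketch of VGG16 preprocessing: RGB -> BGR, then subtract channel means.

    batch_rgb: float array of shape (N, H, W, 3) with values in [0, 255].
    """
    batch_bgr = batch_rgb[..., ::-1]        # reverse the channel order
    return batch_bgr - IMAGENET_MEAN_BGR    # zero-center each channel (no scaling)

# A single pixel whose RGB color equals the ImageNet mean maps to all zeros.
pixel = np.array([[[[123.68, 116.779, 103.939]]]], dtype=np.float32)
print(caffe_style_preprocess(pixel))  # [[[[0. 0. 0.]]]]
```

Different models expect different preprocessing (ResNet50 has its own `tf.keras.applications.resnet50.preprocess_input`), so always use the function that matches the model.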

Now, we can feed the input image to the pre-trained model and get prediction results.

In [None]:
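# Predict with VGG16 and decode the top-5 ImageNet classes of the first (only) image
predictions = vgg16.predict(x)
predictions = tf.keras.applications.vgg16.decode_predictions(predictions, top=5)[0]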

print(f'Top-{len(predictions)} predictions:')
for index, prediction in enumerate(predictions):
    print(f'{index + 1}. {prediction}')

As shown in the prediction results, the VGG16 model predicts the input as a _'strawberry'_ with the highest confidence value (probability), 0.9982.
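The model outputs one softmax probability per ImageNet class (1000 in total), and `decode_predictions` maps the highest ones back to readable labels. A simplified NumPy sketch of that top-k step, assuming a hypothetical 4-class output and omitting the WordNet IDs that `decode_predictions` also returns:

```python
import numpy as np

def top_k(probabilities, class_names, k=5):
    """Return the k (name, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(class_names[i], float(probabilities[i])) for i in order]

# Hypothetical softmax output for 4 classes (sums to 1), for illustration only.
names = ['strawberry', 'orange', 'lemon', 'banana']
probs = np.array([0.90, 0.06, 0.03, 0.01])
print(top_k(probs, names, k=2))  # [('strawberry', 0.9), ('orange', 0.06)]
```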

Let's try to predict again with another image. 

In [None]:
!wget --output-document="orange.jpg" https://upload.wikimedia.org/wikipedia/commons/c/c4/Orange-Fruit-Pieces.jpg
Image.open('orange.jpg')

In [None]:
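# Load the image at VGG16's input size (224x224) and apply VGG16's preprocessing
image = tf.keras.preprocessing.image.load_img('orange.jpg', target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)  # float array, shape (224, 224, 3)
x = np.expand_dims(x, axis=0)                          # add a batch axis: (1, 224, 224, 3)
x = tf.keras.applications.vgg16.preprocess_input(x)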

print('Input image shape:', x.shape)

In [None]:
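# Predict with VGG16 and decode the top-5 ImageNet classes of the first (only) image
predictions = vgg16.predict(x)
predictions = tf.keras.applications.vgg16.decode_predictions(predictions, top=5)[0]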

print(f'Top-{len(predictions)} predictions:')
for index, prediction in enumerate(predictions):
    print(f'{index + 1}. {prediction}')

#### ResNet50
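Similarly to VGG16, we can load ResNet50 through the Keras API. ResNet50 is much deeper than VGG16 (50 layers versus 16), yet it has far fewer parameters (about 25.6M versus 138M), mainly because it has no large fully-connected layers at the end. Let's check it out.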

In [None]:
# Download and load ResNet50 with weights pre-trained on ImageNet
resnet50 = tf.keras.applications.ResNet50(weights='imagenet')
resnet50.summary()
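To compare the two models directly, the sketch below builds both architectures with `weights=None` (so nothing is downloaded; the parameter counts do not depend on the weights) and prints their sizes. The variable names `vgg16_arch` and `resnet50_arch` are ours, chosen so that the pre-trained `vgg16` and `resnet50` models above are not overwritten.

```python
import tensorflow as tf

# Build the architectures only; weights=None skips downloading ImageNet weights.
vgg16_arch = tf.keras.applications.VGG16(weights=None)
resnet50_arch = tf.keras.applications.ResNet50(weights=None)

for name, net in [('VGG16', vgg16_arch), ('ResNet50', resnet50_arch)]:
    print(f'{name}: {net.count_params():,} parameters')

# VGG16 has several times more parameters, most of them in its fully-connected layers.
print(vgg16_arch.count_params() / resnet50_arch.count_params())
```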

In [None]:
# TODO: Predict the images using ResNet50


### References 
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) - please cite this paper if you use the VGG models in your work.
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) - please cite this paper if you use the ResNet model in your work.
- [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) - please cite this paper if you use the Inception v3 model in your work.