# CS492F Special Topics in Computer Science: AI Industry and Smart Energy
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

## 3. Convolutional Neural Network
### 3-4. Case studies

Constructing and training your own convolutional neural network from scratch can be a long and difficult task. A common trick in deep learning is to take a **pre-trained** model and **fine-tune** it on the specific data it will be used for.

To use pre-trained models for our task, we first look at several well-known CNN models. CNN models have been studied since the 1990s. Since 2010 in particular, more advanced models have been developed through the [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)](http://www.image-net.org/challenges/LSVRC/) for computer vision tasks such as image recognition and object detection.

- LeNet 
- AlexNet
- VGG 
- MobileNet
- Inception (GoogLeNet)
- ResNet50 
- Xception
- ... more to come

#### LeNet

![LeNet](images/lenet.png)

- Yann LeCun et al. proposed a neural network architecture for handwritten and machine-printed character recognition in the 1990s.
- It was one of the first successful applications of CNNs.
- The model consists of 3 convolution layers, 2 pooling layers, and 1 fully-connected layer.
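
To make the layer count concrete, the sketch below traces the feature-map size through a LeNet-style stack. The specific values (a 32x32 input, 5x5 kernels without padding, 2x2 pooling, and 6/16/120 filters) are assumptions based on the commonly cited LeNet-5 configuration; they are not read from the figure above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kind, kernel size, stride, output channels); assumed LeNet-5-style values
LAYERS = [
    ("C1", "conv", 5, 1, 6),
    ("S2", "pool", 2, 2, 6),
    ("C3", "conv", 5, 1, 16),
    ("S4", "pool", 2, 2, 16),
    ("C5", "conv", 5, 1, 120),
]

def trace_shapes(input_size=32):
    """Return a list of (layer name, height, width, channels) tuples."""
    shapes = []
    size = input_size
    for name, _kind, kernel, stride, channels in LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((name, size, size, channels))
    return shapes

for name, height, width, channels in trace_shapes():
    print(f"{name}: {height} x {width} x {channels}")
```

Under these assumptions the last convolution produces a 1 x 1 x 120 output, which is the vector the fully-connected layer consumes.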

#### AlexNet

![AlexNet](images/alexnet.png)

- The first work that popularized convolutional neural networks in computer vision.
- It was submitted to the ILSVRC challenge in 2012 and won it by a large margin.
- The network had an architecture very similar to LeNet, but was deeper, bigger, and featured convolutional layers stacked directly on top of each other.

#### VGG
##### VGG-16
![VGG-16](images/vgg16.jpg)

##### VGG-19
![VGG-19](images/vgg19.jpg)

- The runner-up in ILSVRC 2014 (VGG16)
- Its main contribution was in showing that the depth of the network is a critical component for good performance.
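
As a rough illustration of what this depth costs, the following sketch counts the trainable parameters of VGG-16. The layer configuration (3x3 convolutions, 2x2 max pooling, a 224x224 RGB input, and three fully-connected layers with 4096, 4096, and 1000 units) is assumed from configuration D of the VGG paper; the function name is made up for this example.

```python
# Numbers are conv output channels; "M" marks a 2x2 max-pooling layer.
# Assumed from configuration D (VGG-16) of the VGG paper.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg_parameters(config, input_size=224, in_channels=3,
                         fc_units=(4096, 4096, 1000)):
    """Return (convolutional parameters, fully-connected parameters)."""
    conv_params = 0
    size = input_size
    channels = in_channels
    for item in config:
        if item == "M":
            size //= 2  # pooling halves the spatial size
        else:
            # 3x3 kernel weights plus one bias per output channel
            conv_params += 3 * 3 * channels * item + item
            channels = item

    fc_params = 0
    features = size * size * channels  # flattened feature map
    for units in fc_units:
        fc_params += features * units + units
        features = units
    return conv_params, fc_params

conv_params, fc_params = count_vgg_parameters(VGG16_CONFIG)
print("conv:", conv_params)
print("fc:  ", fc_params)
print("total:", conv_params + fc_params)
```

With these assumptions, the 13 convolutional layers account for only about 14.7 million of the roughly 138 million parameters; most of the model's size sits in the fully-connected layers.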

#### Inception (GoogLeNet)

![GoogLeNet](images/googlenet.png)

- The winner in ILSVRC 2014 (GoogLeNet, the first Inception version)
- Its main contribution was the development of an `Inception Module` that dramatically reduced the number of parameters in the network (about 12 times fewer than AlexNet, according to the GoogLeNet paper).
- There are also several follow-up versions of GoogLeNet, including Inception-v3 and Inception-v4.
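
One way an Inception module saves parameters is by placing a 1x1 convolution in front of an expensive larger convolution. The sketch below compares the weight counts of a direct 5x5 convolution with the same convolution behind a 1x1 reduction. The channel counts (192 in, 16 reduced, 32 out) are illustrative assumptions in the range used by GoogLeNet's early modules, and biases are ignored.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels

def compare_5x5_branch(in_channels=192, reduced=16, out_channels=32):
    """Return (direct, bottleneck) weight counts for a 5x5 branch."""
    direct = conv_weights(5, in_channels, out_channels)
    bottleneck = (conv_weights(1, in_channels, reduced)
                  + conv_weights(5, reduced, out_channels))
    return direct, bottleneck

direct, bottleneck = compare_5x5_branch()
print("direct 5x5:        ", direct)
print("1x1 reduce + 5x5:  ", bottleneck)
print(f"reduction factor:   {direct / bottleneck:.1f}x")
```

For these assumed channel counts the bottleneck version needs 15,872 weights instead of 153,600.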

#### ResNet

![ResNet](images/resnet.png)

- The winner in ILSVRC 2015
- It features special skip connections and a heavy use of batch normalization.
- The architecture also drops the large fully connected layers at the end of the network: global average pooling is followed by a single fully connected classification layer.
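
The idea of a skip connection can be sketched in a few lines: the block computes a residual `F(x)` and adds the unchanged input back before the final activation. This is a simplified illustration, not ResNet itself; it uses small dense layers on plain Python lists instead of convolutions and omits batch normalization, and all function names are made up for the example.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(values, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    hidden = relu(linear(x, w1, b1))
    residual = linear(hidden, w2, b2)
    # Skip connection: add the unchanged input to the learned residual
    return relu([r + xi for r, xi in zip(residual, x)])

x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_bias = [0.0] * 3

# With all-zero weights the residual is zero, so the block passes x through.
print(residual_block(x, zero_weights, zero_bias, zero_weights, zero_bias))
```

Because a block with a zero residual simply passes its (non-negative) input through, stacking more blocks does not have to make the signal harder to propagate, which is part of why very deep residual networks remain trainable.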

### 3-5. Image classification using the pre-trained models

#### VGG16
We can use the pre-trained CNN models mentioned above through the Keras API [tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications). (More models are available in Keras; see https://github.com/keras-team/keras-applications)

In [None]:
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import os
import numpy as np
from PIL import Image

We can easily download and load the pre-trained VGG16 model using `tf.keras.applications.VGG16`.

In [None]:
# Download (on first use) and load VGG16 with weights pre-trained on ImageNet
vgg16 = tf.keras.applications.VGG16(weights='imagenet')
vgg16.summary()

Then, let's download a strawberry image and classify it using the pre-trained model we have just loaded.

In [None]:
!wget --output-document="strawberry.jpg" https://upload.wikimedia.org/wikipedia/commons/c/ce/Bowl_of_Strawberries.jpg
Image.open('strawberry.jpg')

To feed an image to the pre-trained model, we first have to apply the same preprocessing that was used when the model was trained.

In [None]:
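# Load the image at VGG16's 224x224 input size and apply VGG16 preprocessing
image = tf.keras.preprocessing.image.load_img('strawberry.jpg', target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)  # (224, 224, 3) float array
x = np.expand_dims(x, axis=0)                         # add the batch dimension
x = tf.keras.applications.vgg16.preprocess_input(x)   # RGB -> BGR, subtract ImageNet means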

print('Input image shape:', x.shape)
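
For VGG16, the Keras preprocessing function `tf.keras.applications.vgg16.preprocess_input` is documented as converting the image from RGB to BGR and zero-centering each channel with respect to the ImageNet dataset, without scaling. The sketch below mimics that behaviour on plain nested lists so the arithmetic is visible. The helper names are hypothetical, and the mean values are the commonly used ImageNet channel means, stated here as an assumption.

```python
# Assumed per-channel ImageNet means in BGR order
BGR_MEANS = (103.939, 116.779, 123.68)

def preprocess_pixel(rgb):
    """Reorder one (R, G, B) pixel to BGR and subtract the channel means."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, BGR_MEANS))

def preprocess_image(image):
    """image: rows of (R, G, B) tuples with values in 0-255.

    Returns a batch containing one preprocessed image.
    """
    processed = [[preprocess_pixel(pixel) for pixel in row] for row in image]
    return [processed]  # add the batch dimension

# A tiny 1x2 "image": one pure red pixel and one pixel equal to the mean colour
tiny_image = [[(255.0, 0.0, 0.0), (123.68, 116.779, 103.939)]]
batch = preprocess_image(tiny_image)
print(batch[0][0][0])  # red pixel after preprocessing
print(batch[0][0][1])  # mean-coloured pixel after preprocessing
```

A pixel whose colour equals the channel means maps to zeros, which is what "zero-centering" means here.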

Now, we can feed the input image to the pre-trained model and get prediction results.

In [None]:
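# Predict class probabilities with VGG16 and decode the top-5 classes
predictions = vgg16.predict(x)
predictions = tf.keras.applications.vgg16.decode_predictions(predictions, top=5)[0]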

print(f'Top-{len(predictions)} predictions:')
for index, prediction in enumerate(predictions):
    print(f'{index + 1}. {prediction}')

As shown in the prediction results, the VGG16 model classified the input as _'strawberry'_ with the highest confidence value (or probability), 0.9982.
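
The confidence values come from a softmax over the model's final layer, and the top-k list is simply the k largest probabilities paired with their class names. The sketch below reproduces that ranking step in plain Python. The four labels and the raw scores are made-up values for illustration; the real model outputs 1000 ImageNet classes.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and raw scores, for illustration only
labels = ["strawberry", "orange", "banana", "lemon"]
logits = [5.0, 1.0, 0.5, 0.0]

probabilities = softmax(logits)
for rank, (label, probability) in enumerate(top_k(probabilities, labels, k=3), start=1):
    print(f"{rank}. {label}: {probability:.4f}")
```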

Let's try to predict again with another image. 

In [None]:
!wget --output-document="orange.jpg" https://upload.wikimedia.org/wikipedia/commons/c/c4/Orange-Fruit-Pieces.jpg
Image.open('orange.jpg')

In [None]:
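# Load the image at VGG16's 224x224 input size and apply VGG16 preprocessing
image = tf.keras.preprocessing.image.load_img('orange.jpg', target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(image)  # (224, 224, 3) float array
x = np.expand_dims(x, axis=0)                         # add the batch dimension
x = tf.keras.applications.vgg16.preprocess_input(x)   # RGB -> BGR, subtract ImageNet means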

print('Input image shape:', x.shape)

In [None]:
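# Predict class probabilities with VGG16 and decode the top-5 classes
predictions = vgg16.predict(x)
predictions = tf.keras.applications.vgg16.decode_predictions(predictions, top=5)[0]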

print(f'Top-{len(predictions)} predictions:')
for index, prediction in enumerate(predictions):
    print(f'{index + 1}. {prediction}')

#### ResNet50
As with VGG16, we can load ResNet50 through the Keras API. ResNet50 is much deeper than VGG16 (50 weight layers versus 16), although it has far fewer parameters (roughly 25.6 million versus 138 million). Let's check it out.

In [None]:
# Download (on first use) and load ResNet50 with weights pre-trained on ImageNet
resnet50 = tf.keras.applications.ResNet50(weights='imagenet')
resnet50.summary()

In [None]:
# TODO: Predict the images using ResNet50


### References 
- [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) - please cite this paper if you use the VGG models in your work.
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) - please cite this paper if you use the ResNet model in your work.
- [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) - please cite this paper if you use the Inception v3 model in your work.