# CS470 Introduction to Artificial Intelligence
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

## 3. Convolutional Neural Network
### 3-6. Feature extraction using pre-trained models

A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. We can either **use the pre-trained model as it is**, or **use transfer learning to customize this model** to a given task. 

The intuition behind transfer learning is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. We can then take advantage of the learned feature maps without having to train a large model on a large dataset from scratch.

In this notebook, we will try two ways to customize a pre-trained model:
1. **Feature Extraction**: We use the representations learned by the pre-trained model to extract meaningful features from new samples. We simply add a new classifier, which will be trained from scratch, on top of the pre-trained model so that we can repurpose the feature maps learned previously on the large dataset. Note that only this final classification part is trained, so we do not have to (re)train the entire model.
1. **Fine-tuning**: We jointly train both the newly added classifier layers and the last layers of the pre-trained model. This allows us to "fine-tune" the higher-order feature representations in the pre-trained model in order to make them more relevant for the specific task.

In [None]:
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import os
import numpy as np
import matplotlib.pyplot as plt

#### Load the dataset

We will first download and load the cats and dogs dataset from [`TensorFlow Datasets`](https://www.tensorflow.org/datasets). The [`tfds`](https://www.tensorflow.org/datasets/api_docs/python/tfds) package is a convenient way to load pre-defined datasets.

In [None]:
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

The [`tfds.load`](https://www.tensorflow.org/datasets/api_docs/python/tfds/load) method downloads and caches the data, and returns [`tf.data.Dataset`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset) objects. These objects provide powerful, efficient methods for manipulating data and piping it into your model.

Let's load the `cats_vs_dogs` dataset and use the subsplit feature to divide it into (train, validation, test) with 80%, 10%, and 10% of the data respectively.

In [None]:
# TODO: Load cats vs dogs image dataset


The resulting [`tf.data.Dataset`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset) objects contain **(image, label) pairs**, where the images have variable shape and 3 channels, and the label is a scalar.

In [None]:
print(raw_train)
print(raw_validation)
print(raw_test)

Let's display the first two images and labels from the training set:

In [None]:
get_label_name = metadata.features['label'].int2str

for image, label in raw_train.take(2):
    plt.figure()
    plt.imshow(image)
    plt.title(get_label_name(label))

#### Preprocess the dataset
Let's preprocess the images for the task:
- Rescale the input channels to a range of [-1, 1]
- Resize the images to a fixed input size

First, we will define the `preprocess_example()` function.

In [None]:
IMG_SIZE = 160 # All images will be resized to 160x160

def preprocess_example(image, label):
    # TODO: Cast the image vector as tf.float32
    
    # TODO: Rescale the input channels to a range of [-1, 1]
    
    # TODO: Resize the images to a fixed input size
    
    return image, label
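
As a quick check of the rescaling arithmetic, the sketch below uses NumPy as a stand-in for the TensorFlow ops that the TODOs call for. The helper name `rescale_to_unit_range` is hypothetical, and it assumes 8-bit pixel values in [0, 255]: dividing by 127.5 and subtracting 1 maps 0 to -1 and 255 to 1.

```python
import numpy as np

def rescale_to_unit_range(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1].

    Hypothetical helper for illustration only; it is not part of the lab code.
    """
    pixels = np.asarray(pixels, dtype=np.float32)  # cast before dividing
    return pixels / 127.5 - 1.0

# The darkest and brightest pixel values land on the ends of the range.
example = rescale_to_unit_range([0, 51, 255])
print(example)
```

The cast to a floating-point type comes first because integer pixel arrays cannot hold the fractional results.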

Then, we can apply the defined function to the dataset using [`tf.data.Dataset.map()`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#map).

In [None]:
# TODO: Apply preprocess_example() to dataset


After that, we are going to randomly shuffle the dataset using [`tf.data.Dataset.shuffle()`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#shuffle) and combine consecutive items of the dataset into batches using [`tf.data.Dataset.batch()`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#batch).

In [None]:
BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 1000

# TODO: Shuffle the dataset and combine them into batches
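
To build some intuition for what `SHUFFLE_BUFFER_SIZE` controls, here is a simplified pure-Python model of buffered shuffling followed by batching. It is only a sketch of the idea (fill a buffer, emit a random element, refill from the stream), not TensorFlow's implementation, and the names `buffered_shuffle` and `make_batches` are hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a partially random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # replace it with the new item
    rng.shuffle(buffer)                  # drain whatever is left
    yield from buffer

def make_batches(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:                          # keep the final, smaller batch
        yield current

shuffled = list(buffered_shuffle(range(10), buffer_size=4))
print(list(make_batches(shuffled, batch_size=4)))
```

With a buffer of size 1 this model returns the items in their original order, which is why the buffer should be large relative to the dataset for a thorough shuffle.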


Let's inspect a batch of data:

In [None]:
image_batch, label_batch = next(iter(train_batches.take(1)))
print(image_batch.shape)
print(label_batch.shape)

#### Build the model from the pre-trained convolutional neural network
![Convolutional neural network](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/cnn-architectures.png?raw=true)

We are going to build the model from MobileNet V2. First, we need to choose which layer of the model to use for feature extraction. The very last classification layer is not very useful, because it is specific to the classes the network was originally trained on. Instead, we will use all the layers before the flatten operation. The last layer before the flatten operation is called the "*bottleneck layer*", and bottleneck features retain more generality than the final classification layer.

To do this, let's instantiate the MobileNet V2 model pre-loaded with weights trained on ImageNet. By specifying the `include_top=False` argument, we can load a network that doesn't include the classification layers at the top, which is ideal for feature extraction.

In [None]:
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# TODO: Load MobileNetV2 without the final classification layer
base_model = 

This feature extractor converts each 160x160x3 image to a 5x5x1280 block of features. Let's see what it does to the example batch of images:

In [None]:
feature_batch = base_model(image_batch)
print(image_batch.shape,  '->', feature_batch.shape)
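
The 5x5 spatial size follows from the network's overall downsampling: MobileNet V2 reduces the spatial resolution by a factor of 32 before the bottleneck layer, so a 160x160 input gives 160 / 32 = 5 positions per side. The helper below is a hypothetical illustration of that arithmetic and assumes the input size is a multiple of 32.

```python
def bottleneck_shape(img_size, total_stride=32, channels=1280):
    """Return the (height, width, channels) of the bottleneck features.

    Hypothetical helper: assumes a square input whose side is a multiple
    of the total stride, and 1280 output channels as stated in the text.
    """
    if img_size % total_stride != 0:
        raise ValueError("img_size must be a multiple of total_stride")
    side = img_size // total_stride
    return (side, side, channels)

print(bottleneck_shape(160))  # (5, 5, 1280)
```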

#### Freeze the convolutional blocks for feature extraction

We will freeze the convolutional blocks of the model and use that as a feature extractor, add a classifier on top of it and train only the top-level classifier.

It's important to freeze the convolutional blocks before we compile and train the model. By freezing them, we can prevent the weights in a given layer from being updated during training.

In [None]:
# TODO: Freeze the base model


In [None]:
# Let's take a look at the base model architecture
base_model.summary()

#### Add the classification layer

To generate predictions from the block of features, let's add the following layers:
- `tf.keras.layers.GlobalAveragePooling2D`:  Layer that averages over the 5x5 spatial locations to convert the features to a single 1280-element vector per image
    - For a batch of 64 images, this layer converts a (64, 5, 5, 1280) tensor into a (64, 1280) tensor by averaging each 5x5 feature map
- `tf.keras.layers.Dense`:  Layer to classify the input vector as cat or dog
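
The pooling step in the list above can be reproduced with a plain mean over the two spatial axes. The sketch below uses NumPy and random numbers in place of real MobileNet V2 features; the array name `fake_features` and the batch size of 64 (matching `BATCH_SIZE`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of bottleneck features: (batch, height, width, channels)
fake_features = rng.normal(size=(64, 5, 5, 1280)).astype(np.float32)

# Global average pooling: average each 5x5 feature map down to one number
pooled = fake_features.mean(axis=(1, 2))

print(fake_features.shape, '->', pooled.shape)
```

Each of the 1280 entries in a pooled vector is the average of the 25 values in the corresponding feature map, so spatial position is discarded while channel information is kept.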

In [None]:
# TODO: Add the classification layer
model = 

#### Compile the model
You must compile the model before training it. Since there are two classes, let's use a binary cross-entropy loss.

In [None]:
learning_rate = 0.0001

# TODO: Compile the model with the following parameters:
# - optimizer: RMSprop
# - loss: binary crossentropy
# - metrics: accuracy
model.compile(
)
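
For reference, binary cross-entropy for a single example with label y (0 or 1) and predicted probability p is -(y log p + (1 - y) log(1 - p)). The pure-Python sketch below is a hypothetical helper, not the Keras loss itself; it assumes the prediction is already a probability and clips it to avoid taking log(0).

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy between a 0/1 label and a predicted probability."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip away from 0 and 1
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A classifier that always answers 0.5 pays log(2) per example.
print(round(binary_cross_entropy(1, 0.5), 4))  # 0.6931
# Confident and correct is cheap; confident and wrong is expensive.
print(binary_cross_entropy(1, 0.9) < binary_cross_entropy(1, 0.1))  # True
```

A loss near 0.69 is therefore what a model that cannot yet tell the two classes apart would produce if it predicted 0.5 for every image.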

In [None]:
model.summary()

The 2.3M parameters in MobileNet are frozen, but there are 1.3K trainable parameters in the Dense layer.
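
The 1.3K figure can be checked by hand. Assuming the classifier is a single `Dense` layer with one output unit fed by the 1280-element pooled vector (an assumption, since the layer definition is left as an exercise), it has one weight per input plus one bias.

```python
def dense_param_count(input_dim, units):
    """Weights (input_dim * units) plus one bias per unit."""
    return input_dim * units + units

print(dense_param_count(1280, 1))  # 1281
```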

#### Train the model

In [None]:
loss0, accuracy0 = model.evaluate(validation_batches, steps=10)

In [None]:
print("initial loss: {:.2f}".format(loss0))
print("initial accuracy: {:.2f}".format(accuracy0))

Before training, the accuracy is about the same as random guessing. Let's check the accuracy again after training the model.

In [None]:
initial_epochs = 10

# Train the model
history = model.fit(
    train_batches,
    epochs=initial_epochs,
    validation_data=validation_batches,
)

![Feature Extraction Epochs](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/feature-extraction-epochs.PNG?raw=true)

#### Learning curves

Let's take a look at the learning curves of the training and validation accuracy/loss when using the MobileNet V2 base model as a fixed feature extractor.

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

![Feature Extraction Graph](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/feature-extraction-graph.PNG?raw=true)

### 3-6. Fine tuning the model

In the previous feature extraction part, we only trained a few layers on top of a MobileNet V2 base model. The weights of the pre-trained network were not updated during training.

One way to increase performance even further is to train (or "fine-tune") the weights of the top layers of the pre-trained model alongside the classifier we added. The training process will force the weights to be tuned from generic feature maps to features associated specifically with our dataset.

Also, we should try to fine-tune a small number of top layers rather than the whole MobileNet V2 model. In most convolutional networks, the higher up a layer is, the more specialized it is. The first few layers learn very simple and generic features which generalize to almost all types of images. As we go higher up, the features become increasingly specific to the dataset on which the model was trained. The goal of fine-tuning is to adapt these specialized features to work with the new dataset, rather than overwrite the generic learning.

![Features by layers](images/features-by-layers.png)

#### Unfreeze the top layers of the model
All we need to do is unfreeze the `base_model` and set the bottom layers to be untrainable (i.e., we will fine-tune only a small number of top layers of the base model). Then, we should recompile the model (necessary for these changes to take effect) and resume training.

In [None]:
# TODO: Unfreeze the base model


In [None]:
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine tune from this layer onwards
fine_tune_at = 100

# TODO: Freeze all the layers before the `fine_tune_at` layer
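
The effect of a cut-off index like `fine_tune_at` can be sketched without TensorFlow. The example below uses simple stand-in objects with a `trainable` flag instead of real Keras layers; the list `toy_layers` and its length of 120 are made up for illustration and do not reflect the actual number of layers in MobileNet V2.

```python
from types import SimpleNamespace

# Stand-ins for layers; every layer starts out frozen, as in the previous part.
toy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=False)
              for i in range(120)]
fine_tune_at = 100

# Step 1: unfreeze everything.
for layer in toy_layers:
    layer.trainable = True

# Step 2: re-freeze all layers before the cut-off index.
for layer in toy_layers[:fine_tune_at]:
    layer.trainable = False

num_trainable = sum(layer.trainable for layer in toy_layers)
print(num_trainable)  # 20
```

Only the layers from index `fine_tune_at` onwards remain trainable, which is the "small number of top layers" that fine-tuning is meant to adjust.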


#### Compile the model
Let's compile the model using a much lower learning rate.

In [None]:
# TODO: Compile the model with the following parameters:
# - optimizer: RMSprop
# - loss: binary crossentropy
# - metrics: accuracy

model.compile(

)

In [None]:
model.summary()

#### Train the model

In [None]:
fine_tune_epochs = 10
total_epochs =  initial_epochs + fine_tune_epochs

history_fine_tuned = model.fit(
    train_batches,
    epochs=total_epochs,
    initial_epoch=initial_epochs,
    validation_data=validation_batches,
)
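
Note how `epochs` and `initial_epoch` interact in `model.fit`: when `initial_epoch` is set, `epochs` is the index of the final epoch rather than the number of epochs to run. The small sketch below only reproduces that counting in plain Python, using the same values as the cell above.

```python
initial_epochs = 10
fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

# Zero-based epoch indices that the fine-tuning call runs through
epochs_run = list(range(initial_epochs, total_epochs))

print(len(epochs_run))                        # 10
print(epochs_run[0] + 1, epochs_run[-1] + 1)  # 11 20
```

So the fine-tuning run trains for 10 additional epochs, and the training log continues counting from epoch 11 up to epoch 20.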

![Fine Tuning Epochs](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/fine-tuning-epochs.PNG?raw=true)

#### Learning curves

Let's take a look at the learning curves of the training and validation accuracy/loss, when fine tuning the last few layers of the MobileNet V2 base model and training the classifier on top of it. The validation loss is much higher than the training loss, so you may get some overfitting.

In [None]:
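# Concatenate into new lists instead of using `+=`, which would also extend
# the lists stored in `history.history` and duplicate entries on a re-run.
acc = history.history['accuracy'] + history_fine_tuned.history['accuracy']
val_acc = history.history['val_accuracy'] + history_fine_tuned.history['val_accuracy']

loss = history.history['loss'] + history_fine_tuned.history['loss']
val_loss = history.history['val_loss'] + history_fine_tuned.history['val_loss']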

In [None]:
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.8, 1])
plt.plot([initial_epochs-1,initial_epochs-1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
plt.plot([initial_epochs-1,initial_epochs-1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

![Fine Tuning Graph](https://github.com/keai-kaist/CS470/blob/main/Lab2/May%206/images/fine-tuning-graph.PNG?raw=true)