# Applied Deep Learning Tutorial 
# Transfer Learning for Object Classification 

## Introduction
In this tutorial, you will reuse a model that has been pretrained for the same task (image classification) but on a different dataset. You will take the first layers of a converged network and use their feature extraction capabilities for your own problem. This process is known as transfer learning.

<img src="graphics/Katze.jpg" width="700"><br>
<center> Fig. 1: Cat and dog in an image </center>

## Core idea
A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task such as [ImageNet](http://image-net.org/challenges/LSVRC/) or [COCO](http://cocodataset.org/#home). We can either use the pretrained model as it is for inference on the task it was trained on, or we can do transfer learning, using the pretrained convnet as the starting point for further training on a new dataset with a possibly new output space.

The intuition behind transfer learning is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world, capturing semantic features that are shared between visual tasks. We can leverage these learned feature maps without having to train a large model on a large dataset by using the pretrained model as the basis of our own task-specific model. There are two scenarios of transfer learning using a pretrained model:

- Fine Tuning or Retraining: Unfreeze a few of the top layers of the frozen base model used for feature extraction, and jointly train the newly added classifier layers and those unfrozen layers. This allows us to "fine tune" the higher-order feature representations in addition to our final classifier in order to make them more relevant for the specific task involved.
- Feature Extraction: Use the representations learned by a previous model to extract meaningful features from new samples. We simply add a new output layer, which will be trained from scratch, on top of the pretrained model so that we can repurpose the feature maps learned previously for our dataset and our new output space.
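
The two scenarios differ only in which layers are allowed to update. The following is a minimal sketch of that difference, assuming TensorFlow/Keras and VGG16 as the base. `weights=None` is used here only so that the sketch builds without downloading anything (in practice you would pass `weights='imagenet'`), and unfreezing exactly the `block5` layers is an illustrative choice, not a recommendation from this tutorial.

```python
import math

import tensorflow as tf

# Assumption: VGG16 as the base; weights=None only avoids a download in this sketch.
base = tf.keras.applications.VGG16(input_shape=(200, 200, 3),
                                   include_top=False,
                                   weights=None)

def count_trainable(model):
    """Number of scalar weights that training would update."""
    return sum(math.prod(w.shape) for w in model.trainable_weights)

# Scenario "feature extraction": freeze the whole base.
base.trainable = False
frozen_count = count_trainable(base)

# Scenario "fine tuning": unfreeze only the top convolutional block.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')
fine_tune_count = count_trainable(base)

print('Trainable weights, feature extraction:', frozen_count)
print('Trainable weights, fine tuning block5:', fine_tune_count)
```

In both cases a new classifier would still be stacked on top of `base`; the sketch only shows how the set of trainable weights in the base changes.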

## Imports
Import the necessary libraries and download a filtered subset of the [Dogs vs Cats](https://www.kaggle.com/c/dogs-vs-cats) dataset from Kaggle.

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import os

import tensorflow as tf
from tensorflow import keras
#print("TensorFlow version is ", tf.__version__)

import numpy as np
#import cv2

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Load Cats vs Dogs dataset
zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip",
                                   fname="cats_and_dogs_filtered.zip", extract=True)

base_dir, _ = os.path.splitext(zip_file)


## Preparing the data
Build the paths to the training and validation directories for both classes, cats and dogs, and count the images in each of them.

In [2]:
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# Directory with our training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
print('Total training cat images:', len(os.listdir(train_cats_dir)))

# Directory with our training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
print('Total training dog images:', len(os.listdir(train_dogs_dir)))

# Directory with our validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')
print('Total validation cat images:', len(os.listdir(validation_cats_dir)))

# Directory with our validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
print('Total validation dog images:', len(os.listdir(validation_dogs_dir)))

Total training cat images: 1000
Total training dog images: 1000
Total validation cat images: 500
Total validation dog images: 500


Next we will set up the input pipeline with Keras. The generators below only rescale the pixel values to the range [0, 1]; no random augmentation is applied.

In [3]:
image_size = 200  # All images will be resized to 200x200
batch_size = 32

# Rescale all pixel values by 1./255 (no other augmentation is applied)
train_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
validation_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Flow training images in batches of batch_size using the train_datagen generator
train_generator = train_datagen.flow_from_directory(
                train_dir,  # Source directory for the training images
                target_size=(image_size, image_size),
                batch_size=batch_size,
                # Since we use binary_crossentropy loss, we need binary labels
                class_mode='binary')

# Flow validation images in batches of batch_size using the validation_datagen generator
validation_generator = validation_datagen.flow_from_directory(
                validation_dir, # Source directory for the validation images
                target_size=(image_size, image_size),
                batch_size=batch_size,
                class_mode='binary')

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
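
The generators above only rescale pixel values. If random augmentation is wanted, `ImageDataGenerator` also accepts arguments for random transforms. The following is a sketch under the assumption that the same Keras API is used; the specific ranges are illustrative values, not settings tuned for this tutorial, and the two variable names are hypothetical.

```python
from tensorflow import keras

# Assumption: these ranges are illustrative, not tuned for this dataset.
augmented_train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,       # random rotations of up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts (fraction of the width)
    height_shift_range=0.1,  # random vertical shifts (fraction of the height)
    zoom_range=0.2,          # random zoom between 0.8 and 1.2
    horizontal_flip=True)    # random left-right flips

# Validation images are only rescaled, so the validation metrics stay comparable.
plain_validation_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
```

Such a generator would be used exactly like `train_datagen`, by calling `flow_from_directory` on it with the same arguments.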


## Preparing pretrained model
We will create the base model from the [VGG16](https://arxiv.org/pdf/1409.1556.pdf) model, a plain stack of 3x3 convolution and max-pooling layers without skip-connections, pre-trained on the [ImageNet](http://image-net.org/challenges/LSVRC/) dataset, a large dataset of 1.4M web images in 1000 classes. This is a powerful model. Let's see what the features that it has learned can do for our cat vs. dog problem.

You can find more pretrained and ready to load models [here](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

First, we need to pick which intermediate layer of the model we will use for feature extraction. A common practice is to use the output of the very last layer before the flatten operation, the so-called "bottleneck layer". The reasoning here is that the following fully-connected layers will be too specialized to the task the network was trained on, and thus the features learned by these layers won't be very useful for a new task. The bottleneck features, however, retain much generality.
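
As a worked check of what the bottleneck layer looks like, the sketch below computes its shape for a given input size. It relies on the VGG16 layout (five 2x2 max-pooling stages, with convolutions that keep the spatial size, and 512 channels in the last block); the helper name is hypothetical.

```python
def vgg16_bottleneck_shape(image_size, num_pools=5, channels=512):
    """Shape of the last feature map after VGG16's five 2x2 max-pooling stages."""
    size = image_size
    for _ in range(num_pools):
        size //= 2  # each pooling stage halves height and width, rounding down
    return (size, size, channels)

print(vgg16_bottleneck_shape(200))  # (6, 6, 512)
print(vgg16_bottleneck_shape(224))  # (7, 7, 512)
```

For the 200x200 inputs used in this tutorial the bottleneck features therefore form a 6x6 grid with 512 channels.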


Let's instantiate a VGG16 model pre-loaded with weights trained on ImageNet. By specifying the include_top=False argument, we load a network that doesn't include the classification layers, which is ideal for feature extraction.

In [4]:
IMG_SHAPE = (image_size, image_size, 3)

# Create the base model from the pre-trained model VGG16
feature_extractor = tf.keras.applications.VGG16(input_shape=IMG_SHAPE,
                                                include_top=False,
                                                weights='imagenet')



## Feature Extraction

We will freeze the layers of the VGG16 base and use this part of the network purely as a feature extractor. By adding a classification layer on top of it and training only this top-level classifier on our data, we repurpose the pretrained model.
Freezing means that the respective weights are excluded from the weight update phase of the training process, so they keep their pretrained values.
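
To make the effect of freezing concrete, here is a framework-free sketch of one gradient-descent step that skips frozen parameters. The two-parameter "model", its gradients, and the learning rate are hypothetical values chosen for illustration; Keras handles this internally once `trainable = False` is set.

```python
def sgd_step(params, grads, trainable, learning_rate=0.5):
    """One gradient-descent step that leaves every frozen parameter untouched."""
    return {
        name: value - learning_rate * grads[name] if trainable[name] else value
        for name, value in params.items()
    }

# Hypothetical model: one pretrained (frozen) weight and one new classifier weight.
params = {'base_weight': 1.0, 'head_weight': 0.0}
grads = {'base_weight': 0.5, 'head_weight': -2.0}
trainable = {'base_weight': False, 'head_weight': True}

updated = sgd_step(params, grads, trainable)
print(updated)  # {'base_weight': 1.0, 'head_weight': 1.0}
```

The frozen weight keeps its value even though its gradient is non-zero; only the classifier weight moves.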

In [5]:
feature_extractor.trainable = False

# Let's take a look at the base model architecture (notice the amount of non-trainable params)
feature_extractor.summary()


Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 200, 200, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 200, 200, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 200, 200, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 100, 100, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 100, 100, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 100, 100, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 50, 50, 128)       0     

Now we add a classification head to the base model, consisting of global average pooling followed by a single sigmoid unit, and compile the newly combined model.
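
To illustrate what such a head computes, the following framework-free sketch runs global average pooling, a single sigmoid unit, and the binary cross-entropy loss on a tiny example. The 2x2x2 feature map and the weights are hypothetical; the real feature map in this tutorial has shape 6x6x512.

```python
import math

def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, giving C values."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(label, probability):
    """Loss for one sample with a binary label (0 or 1)."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1 - probability))

# Hypothetical 2 x 2 feature map with 2 channels.
feature_map = [[[1.0, 0.0], [3.0, 0.0]],
               [[2.0, 4.0], [2.0, 0.0]]]
pooled = global_average_pool(feature_map)  # one average per channel

weights, bias = [0.5, -1.0], 0.0  # hypothetical parameters of the single unit
logit = sum(w * x for w, x in zip(weights, pooled)) + bias
probability = sigmoid(logit)
loss = binary_crossentropy(1, probability)
print(pooled, probability, loss)
```

Global average pooling is what turns the spatial grid of features into one value per channel, so the final unit needs only one weight per channel plus a bias.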


In [6]:
model = tf.keras.Sequential([
  feature_extractor,
  keras.layers.GlobalAveragePooling2D(),
  keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 6, 6, 512)         14714688  
                                                                 
 global_average_pooling2d (G  (None, 512)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 1)                 513       
                                                                 
Total params: 14,715,201
Trainable params: 513
Non-trainable params: 14,714,688
_________________________________________________________________
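
The parameter counts in this summary can be reproduced by hand. The sketch below does so from the VGG16 layout (3x3 convolutions, with 2, 2, 3, 3, 3 layers per block and 64, 128, 256, 512, 512 output channels); the helper name is hypothetical.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one Conv2D layer with a kernel x kernel filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

# VGG16 convolutional base: (number of conv layers, output channels) per block
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

base_total = 0
in_channels = 3  # RGB input
for num_layers, out_channels in blocks:
    for _ in range(num_layers):
        base_total += conv_params(in_channels, out_channels)
        in_channels = out_channels

dense_total = 512 * 1 + 1  # 512 pooled features -> 1 unit, plus one bias

print(base_total)                # 14714688
print(dense_total)               # 513
print(base_total + dense_total)  # 14715201
```

Only the 513 parameters of the final unit are trained; pooling layers contribute no parameters at all.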


We can now train the classification layer on top of the frozen base model.
Notice how few epochs are necessary to reach a decent performance.

In [None]:
# Saving the model
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
# (with Keras 3, a weights-only checkpoint path must end in ".weights.h5")
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_training_vgg16")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

# The provided checkpoint ckpt_training_vgg16 was trained for 2 epochs
# and reached a validation accuracy of 0.8841
epochs = 2
steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

history = model.fit(train_generator,
                    steps_per_epoch = steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    callbacks=[checkpoint_callback])

Epoch 1/2
Epoch 2/2
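
As a worked check of the step counts used in the training call, the sketch below applies the same floor division to the image counts reported earlier (2000 training and 1000 validation images, batch size 32). The helper name is hypothetical, and the rounded-up variant is shown only as an alternative, not as what this tutorial does.

```python
import math

def steps_and_remainder(num_images, batch_size):
    """Full batches per epoch and the number of images left over after them."""
    return num_images // batch_size, num_images % batch_size

print(steps_and_remainder(2000, 32))  # (62, 16)
print(steps_and_remainder(1000, 32))  # (31, 8)

# Alternative (an assumption, not used above): also count the final partial batch.
print(math.ceil(2000 / 32))  # 63
print(math.ceil(1000 / 32))  # 32
```

With floor division, an epoch of 62 steps covers at most 1984 of the 2000 training images.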

## Next steps to take it from here

- Search for a fun dataset for object classification and try both fine-tuning and feature extraction. Which approach works best, and why? Which one would you prefer over the other, and why?
- Can you think of a reason why someone would train a model from scratch now that you know about transfer learning?
- Try deploying another base model. Can you point out differences in the transfer learning process? What are the characteristics you should be aware of when selecting a base model?
- How would you use a base model in a time-series problem? Try deploying a model in that way.