# Transfer Learning - Lecture Code Along

## Objectives

*   Further solidify intuition behind CNNs
*   Understand the transfer learning workflow
*   Provide an opportunity to practice transfer learning





In [None]:
import matplotlib.pyplot as plt
import scipy
import numpy as np
from PIL import Image
from scipy import ndimage
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from keras.optimizers import RMSprop

The idea behind transfer learning is simply to make use of some pre-trained model to make predictions, rather than building a new model from scratch.

The plain fact is that several very powerful image-processing networks have already been built and perfected by scientists who have detailed knowledge about how all the layers of their models work. Moreover, many successful models have been trained on hundreds of thousands if not millions of images, and so they could be used for your images as well.

In general, the target will of course be different from the original that was used in training the model in the first place. But the idea is that the model will be good at picking up on the *deep features* of images, and so we can use *most* of the pre-trained model, in order to extract those deep features, and then just stick on a couple extra layers at the end that are appropriate for the data we have.

In what follows, we'll first build a network from scratch on some chest X-ray data. Then we'll see whether we can get better accuracy by using a CNN pre-trained on [ImageNet](https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/), a large labeled image dataset that has become a leading benchmark for image recognition.

## CNN from Scratch

Let's look at some X-rays of lungs!

In [None]:
# Paths to the pre-established train, test, and val splits

train_f = './chest_xray/train/'
test_f = './chest_xray/test/'
val_f = './chest_xray/val'

Keras's ImageDataGenerator can convert images (we have JPEGs here) to tensors of visual information!

In [None]:
# Generating resized data for downstream augmentation

test_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        test_f, 
        target_size=(64, 64)) # Since the pipeline processes batches of images that must all have the same size, this must be provided.
                              # Size to resize images to after they are read from disk, specified as (height, width)

val_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        val_f, 
        target_size=(64, 64)) 

train_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        train_f, 
        target_size=(64, 64))
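
To make the `rescale=1./255` argument concrete, here is a minimal NumPy sketch of what it does to one batch. The random `fake_batch` array is a hypothetical stand-in for decoded, already-resized JPEGs; it is not the X-ray data, and the real generator also handles reading and resizing the files.

```python
import numpy as np

# Hypothetical stand-in for a batch of 4 decoded RGB images, already
# resized to 64x64 (the real generator reads and resizes JPEGs from disk).
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)

# rescale=1./255 multiplies every pixel value by 1/255,
# mapping the 0-255 integer range onto floats between 0 and 1.
scaled_batch = fake_batch.astype("float32") * (1.0 / 255)

print(scaled_batch.shape)                       # (batch, height, width, channels)
print(scaled_batch.min(), scaled_batch.max())
```

Keeping inputs in a small, consistent range like this generally makes gradient-based training better behaved than feeding raw 0-255 values.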

`ImageDataGenerator` can also perform *data augmentation*: given the right arguments, it takes each image and transforms it in various ways (rotations, shifts, flips, and so on), ultimately using *only these transformed versions* as training data. The generators above only rescale pixel values, so no augmentation is applied in this notebook. [Here's](https://www.pyimagesearch.com/2019/07/08/keras-imagedatagenerator-and-data-augmentation/) a nice resource on Keras's `ImageDataGenerator`.

And [here](https://bair.berkeley.edu/blog/2019/06/07/data_aug/) is a page with more information about data augmentation.
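
As a rough illustration of what augmentation means, the NumPy sketch below produces randomly flipped and shifted copies of a single image. The `augment` helper and the random `image` are hypothetical, and wrapping pixels around with `np.roll` is a simplification; Keras exposes comparable options through `ImageDataGenerator` arguments such as `horizontal_flip` and `width_shift_range`.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and horizontally shifted copy of `image`."""
    out = image.copy()
    # Random horizontal flip (mirror left-right) with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random horizontal shift of up to 4 pixels; np.roll wraps pixels around,
    # which simplifies how real augmenters fill the vacated region.
    shift = int(rng.integers(-4, 5))
    out = np.roll(out, shift, axis=1)
    return out

rng = np.random.default_rng(42)
image = rng.random((64, 64, 3))      # hypothetical stand-in for one scaled X-ray
variants = [augment(image, rng) for _ in range(5)]
print([v.shape for v in variants])
```

Each variant keeps the same label as the original image, which is how augmentation stretches a small dataset.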

In [None]:
# Using each generator as an iterator: next() returns ONE batch
# (up to 32 images by default) and its labels from the train, test,
# and val image subsets

train_images, train_labels = next(train_generator)
test_images, test_labels = next(test_generator)
val_images, val_labels = next(val_generator)
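
It is worth being explicit that each call to `next` yields a single batch, not the whole split. The sketch below mimics that behavior with a plain Python generator; `batch_generator` and the zero-filled arrays are hypothetical stand-ins, not the Keras iterator itself.

```python
import numpy as np

def batch_generator(images, labels, batch_size=32):
    """Yield (images, labels) batches forever, similar to a Keras directory iterator."""
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            yield images[start:start + batch_size], labels[start:start + batch_size]

# Hypothetical dataset of 100 blank "images" with one-hot labels
images = np.zeros((100, 64, 64, 3), dtype="float32")
labels = np.zeros((100, 2), dtype="float32")

gen = batch_generator(images, labels)
first_images, first_labels = next(gen)
print(first_images.shape, first_labels.shape)
```

So `train_images` above holds at most one batch of the training data, which matters when interpreting the from-scratch model's scores.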

In [None]:
# Plotting scaled image output

plt.imshow(train_images[0])

### Model Building

![](https://cs231n.github.io/assets/cnn/cnn.jpeg)

Source: Stanford's Convolutional Neural Networks for Visual Recognition Course Notes

CNNs are uniquely suited to the problem of image classification. Their architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the architecture, makes the forward function more efficient to implement, and vastly reduces the number of parameters in the network. Even so, ***it is not considered best practice to code a CNN from scratch for the purpose of building a custom image classifier***. The typical workflow instead uses the weights from a pretrained model to fine-tune a classifier on images with classes similar to those seen in the original training set (e.g., [ImageNet's 1000 classes](https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/)), a process known as transfer learning.
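
The claim about parameter counts can be checked with a little arithmetic. The sketch below compares a dense layer applied to a flattened 64x64 RGB image with a 3x3 convolution over the same image; the helper functions are hypothetical, and 32 units/filters is just an illustrative choice.

```python
def dense_params(n_inputs, n_units):
    """One weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """One small kernel per filter, shared across all image positions, plus biases."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# A 64x64 RGB image flattened and fed to a dense layer with 32 units
dense_count = dense_params(64 * 64 * 3, 32)
# A 3x3 convolution with 32 filters over the same image
conv_count = conv2d_params(3, 3, 3, 32)

print(dense_count)   # 393248
print(conv_count)    # 896
```

Because the convolution reuses the same small kernels at every position, its parameter count does not grow with image size.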

For more guidance on CNN architecture best practices, [this open-source Stanford course material](https://cs231n.github.io/convolutional-networks/) provides some good additional insights that are beyond the scope of what is required for this lesson.

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                        input_shape=(64, 64, 3)))
                                      
model.add(MaxPooling2D((2, 2)))              

model.add(Conv2D(32, (4, 4), activation='relu')) 
model.add(MaxPooling2D((2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(2, activation='softmax'))  # one output per class; the generator's labels are one-hot

model.compile(loss='categorical_crossentropy',
              optimizer="sgd",
              metrics=['acc'])

In [None]:
model.summary()
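
The output shapes reported by the summary can be reproduced by hand. This sketch traces the spatial size through the three Conv2D/MaxPooling2D pairs defined above, assuming stride-1 convolutions with `'valid'` padding (the Keras default); `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (any remainder is dropped)."""
    return size // pool

size = 64
for kernel in (3, 4, 3):          # kernel sizes of the three Conv2D layers above
    size = pool_out(conv_out(size, kernel))
    print(size)                   # 31, then 14, then 6

flattened = size * size * 64      # the last Conv2D layer has 64 filters
print(flattened)                  # 2304
```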

In [None]:
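# Note: train_images is a single batch from the generator (32 images by
# default), so this fits on a small sample rather than the full training set
history_log = model.fit(train_images,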
                    train_labels,
                    epochs=10,
                    batch_size=32,
                    validation_data=(test_images, test_labels))

Note the acc and val_acc scores!

## Now with Transfer Learning!

### The typical transfer-learning workflow
This leads us to how a typical transfer learning workflow can be implemented in Keras:



1.   Instantiate a base model and load pre-trained weights into it.
2.   Freeze all layers in the base model by setting `trainable = False`.
3.   Create a new model on top of the output of one (or several) layers from the base model.
4.   Train your new model on your new dataset.
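
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the guide's exact code: `weights=None` is used only so the example runs without downloading anything, and the two-layer head (`Dense(256)` followed by `Dense(1)`) is an assumed example.

```python
from keras.applications import VGG19
from keras.models import Sequential
from keras.layers import Dense, Flatten

# 1. Instantiate a base model. weights=None keeps this sketch from downloading
#    anything; pass weights='imagenet' to actually load pre-trained weights.
base = VGG19(weights=None, include_top=False, input_shape=(64, 64, 3))

# 2. Freeze every layer in the base model.
base.trainable = False

# 3. Create a new model on top of the base model's output.
#    The head (256 units, then 1 sigmoid unit) is an assumed example.
new_model = Sequential([
    base,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid'),
])
new_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# 4. Training would then be new_model.fit(...) on the new dataset.
#    Only the two Dense layers are updated: 2 kernels + 2 biases.
print(len(new_model.trainable_weights))
```

Because the base model sits inside `new_model`, images (optionally augmented) flow through it on every epoch, which is what makes this workflow compatible with data augmentation.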

Note that an alternative, more lightweight workflow could also be:

1.   Instantiate a base model and load pre-trained weights into it.
2.   Run your new dataset through it and record the output of one (or several) layers from the base model. This is called *feature extraction*.
3.   Use that output as input data for a new, smaller model.

A key advantage of that second workflow is that you only run the base model once on your data, rather than once per epoch of training. So it's a lot faster & cheaper.

An issue with that second workflow, though, is that it doesn't allow you to dynamically modify the input data of your new model during training, which is required when doing data augmentation, for instance. Transfer learning is typically used for tasks when your new dataset has too little data to train a full-scale model from scratch, and in such scenarios data augmentation is very important. For that reason the Keras guide focuses on the first workflow. The feature-extraction code below applies no augmentation, so it follows the second workflow: the images pass through the base model once, and a small classifier is trained on the recorded features.

[Source: Keras' Transfer Learning Guide](https://keras.io/guides/transfer_learning/)

In [None]:
from keras.applications import VGG19

This tool comes from the [Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/research/very_deep/). More info about the pretrained model can be found in [Keras' VGG documentation](https://keras.io/api/applications/vgg/).

In [None]:
cnn_base = VGG19(weights='imagenet',
                  include_top=False, # Excludes top (output) layer(s)
                  input_shape=(64, 64, 3)
                )

cnn_base.trainable = False # Freeze the base model

cnn_base.summary()

#### Specifying Batch Size and Performing Feature Extraction

In [None]:
batch_size = 8

In [None]:
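def extract_features(directory, sample_amount):
    """Run `sample_amount` images from `directory` through the frozen base model."""
    # VGG19 (without its top) turns each 64x64x3 image into a 2x2x512 feature map
    features = np.zeros(shape=(sample_amount, 2, 2, 512))
    labels = np.zeros(shape=(sample_amount))
    generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        directory, target_size=(64, 64),
        batch_size=batch_size,
        class_mode='binary')
    i = 0
    # The generator loops forever, so stop once enough samples are collected.
    # The slicing assumes sample_amount is the full directory size
    # or a multiple of batch_size.
    for inputs_batch, labels_batch in generator:
        features_batch = cnn_base.predict(inputs_batch)
        features[i * batch_size: (i + 1) * batch_size] = features_batch
        labels[i * batch_size: (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_amount:
            break
    return features, labels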

In [None]:
train_features, train_labels = extract_features(train_f, 5216) 
validation_features, validation_labels = extract_features(val_f, 16) 
test_features, test_labels = extract_features(test_f, 624)

train_features = np.reshape(train_features, (5216, 2048))
validation_features = np.reshape(validation_features, (16, 2048))
test_features = np.reshape(test_features, (624, 2048))

In [None]:
train_features.shape
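
The `(2, 2, 512)` feature shape and the 2048-wide reshape used above follow from VGG19's structure: its convolutions use `'same'` padding, so only the five 2x2 max-pooling stages shrink the image. A quick check of that arithmetic:

```python
height = width = 64        # target_size used throughout this notebook
n_pool_stages = 5          # VGG19 has five blocks, each ending in 2x2 max pooling
channels = 512             # number of filters in VGG19's last convolutional block

for _ in range(n_pool_stages):
    height, width = height // 2, width // 2

print((height, width, channels))      # (2, 2, 512)
print(height * width * channels)      # 2048
```

A different `target_size` would change the first two dimensions, so the hard-coded shapes in `extract_features` and the reshapes would need to change with it.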

### Let's Build Our Model!

In [None]:
model = Sequential()
model.add(Dense(256, activation='relu', input_dim=2048))
model.add(Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer=RMSprop(),
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_features, train_labels,
                    epochs=10,
                    batch_size=10,
                    validation_data=(test_features, test_labels))

### Fine-Tuning

Once your model has converged on the new data, you can try to unfreeze all or part of the base model and retrain the whole model end-to-end with a very low learning rate.

This is an optional last step that can potentially give you incremental improvements. It could also potentially lead to quick overfitting -- keep that in mind.

It is critical to only do this step after the model with frozen layers has been trained to convergence. If you mix randomly-initialized trainable layers with trainable layers that hold pre-trained features, the randomly-initialized layers will cause very large gradient updates during training, which will destroy your pre-trained features.

It's also critical to use a very low learning rate at this stage, because you are training a much larger model than in the first round of training, on a dataset that is typically very small. As a result, you are at risk of overfitting very quickly if you apply large weight updates. Here, you only want to readapt the pretrained weights in an incremental way.

[Source: Keras' Transfer Learning Guide](https://keras.io/guides/transfer_learning/)
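
Unfreezing "part of" the base model might look like the sketch below, which makes only VGG19's last convolutional block trainable. It is an illustration under assumptions: `weights=None` is used so nothing is downloaded (real fine-tuning needs pre-trained weights loaded), and the choice of `block5` as the cut-off is arbitrary. The layer names follow the Keras VGG19 implementation.

```python
from keras.applications import VGG19

# weights=None avoids a download in this sketch; fine-tuning only makes
# sense with weights='imagenet' (or other pre-trained weights) loaded.
base = VGG19(weights=None, include_top=False, input_shape=(64, 64, 3))

# Unfreeze only the last convolutional block; keep earlier blocks frozen.
# Layer names such as 'block5_conv1' follow the Keras VGG19 implementation.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

trainable_names = [layer.name for layer in base.layers if layer.trainable]
print(trainable_names)
```

Earlier blocks tend to capture generic features such as edges and textures, so leaving them frozen limits how much of the pre-trained network a small dataset can disturb.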

In [None]:
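# Unfreeze the base model
cnn_base.trainable = True

# The classifier above was trained on pre-extracted features, so `cnn_base`
# is not part of it. To fine-tune end-to-end, stack the trained classifier
# on top of the base model and feed in images instead of features.
full_model = Sequential([cnn_base, Flatten(), model])

# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account. Note the very low learning rate.
full_model.compile(optimizer=RMSprop(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['acc'])

datagen = ImageDataGenerator(rescale=1./255)
train_flow = datagen.flow_from_directory(
    train_f, target_size=(64, 64), batch_size=batch_size, class_mode='binary')
val_flow = datagen.flow_from_directory(
    val_f, target_size=(64, 64), batch_size=batch_size, class_mode='binary')

# Train end-to-end. Be careful to stop before you overfit!
# Callbacks (i.e., early stopping, etc.) might be considered to regularize...
full_model.fit(train_flow, epochs=10, validation_data=val_flow)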

Yay, it doesn't seem that we overfit! 🎉

## Explore
What other networks are available inside keras?!

Check out the Keras docs for some [usage examples](https://keras.io/api/applications/#usage-examples-for-image-classification-models)!

In [None]:
from keras.applications import *

In [None]:
# Exercise: Use transfer learning with another pre-trained CNN on these data.
# See if you can improve on our metrics!

