# CNN Transfer Learning Analysis

This notebook provides a basis for understanding the benefits of transfer learning in computer vision applications. The most popular image classification approaches use convolutional neural networks (CNNs), which have been researched extensively. Transfer learning lets us leverage that research by "standing on the shoulders of giants": instead of training from scratch, we start from weights that were already learned on a large dataset.

This notebook is organized into the following sections:

1. Get data
2. Build neural networks
3. Discover suitable solver and learning rate
4. Train neural network
    1. Random weights
    2. Fine-tune ImageNet layers
    3. Freeze ImageNet layers
5. Analyze results

## 1. Get data
Import necessary packages and prepare training, validation, and testing data.

In [1]:
import numpy as np
from keras.utils import np_utils
from keras.preprocessing import image
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from sklearn.datasets import load_files       
from glob import glob
from tqdm import tqdm
from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True                 

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/valid')
test_files, test_targets = load_dataset('dogImages/test')

# load list of dog names
dog_names = [item[20:-1] for item in sorted(glob("dogImages/train/*/"))]
num_classes = len(dog_names)

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %d total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))

Using TensorFlow backend.


There are 133 total dog categories.
There are 8351 total dog images.

There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.
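
`load_dataset` turns each integer class label into a one-hot vector with `np_utils.to_categorical`. As a rough illustration of what that encoding produces, the sketch below builds the same kind of matrix with plain NumPy. `one_hot` is a hypothetical helper, not part of Keras, and a tiny class count is used for readability.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical NumPy stand-in for the one-hot encoding step."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

# Three samples with integer labels 0, 2, and 1 out of 4 classes
encoded = one_hot([0, 2, 1], num_classes=4)
print(encoded.shape)           # (3, 4)
print(encoded.argmax(axis=1))  # [0 2 1]
```

In the notebook the same idea applies with 133 classes, so each target row has 133 entries with a single 1.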


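Keras with the TensorFlow backend expects images as a 4D float tensor of shape (samples, height, width, channels). We therefore load each image, resize it to 224x224, stack the results into one array, and rescale the pixel values to the range [0, 1].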

In [2]:
def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

100%|██████████| 6680/6680 [00:50<00:00, 131.45it/s]
100%|██████████| 835/835 [00:05<00:00, 146.20it/s]
100%|██████████| 836/836 [00:05<00:00, 162.05it/s]
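
To make the tensor shapes concrete without loading any image files, here is a minimal sketch that mimics `path_to_tensor` and `paths_to_tensor` with random pixel data. `fake_image_tensor` is a hypothetical stand-in for the image-loading step and is not part of the notebook.

```python
import numpy as np

def fake_image_tensor(rng, height=224, width=224):
    """Hypothetical stand-in for path_to_tensor: random pixels instead of a file."""
    # 3D tensor with shape (height, width, 3), values in 0..255
    x = rng.integers(0, 256, size=(height, width, 3)).astype('float32')
    # 4D tensor with shape (1, height, width, 3)
    return np.expand_dims(x, axis=0)

rng = np.random.default_rng(0)
# Stacking N tensors of shape (1, 224, 224, 3) gives (N, 224, 224, 3)
batch = np.vstack([fake_image_tensor(rng) for _ in range(5)])
batch = batch.astype('float32') / 255  # rescale pixel values to [0, 1]

print(batch.shape)  # (5, 224, 224, 3)
```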


## 2. Build neural networks
We'll evaluate three models based on Inception V3, a popular CNN architecture created by Google. The first model starts from randomly initialized weights. The second starts from weights pretrained on ImageNet, and all of its layers can be fine-tuned. The third also starts from ImageNet weights, but its pretrained layers are frozen so that only the newly added layers are trained.

First, let's build the neural network containing randomized weights.

In [3]:
def create_scratch_model():
    base_scratch_model = InceptionV3(include_top=False, weights=None, input_shape=train_tensors.shape[1:])

    # Extend the base model
    x = base_scratch_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    scratch_predictions = Dense(num_classes, activation='softmax')(x)

    # Return the assembled model so callers can compile and train it
    return Model(inputs=base_scratch_model.input, outputs=scratch_predictions)

Next, let's build the neural network that starts from ImageNet weights and can be fine-tuned.

In [11]:
def create_finetune_model():
    base_finetune_model = InceptionV3(include_top=False, weights='imagenet',
                                      input_shape=train_tensors.shape[1:])

    # Extend the base model
    x = base_finetune_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    finetune_predictions = Dense(num_classes, activation='softmax')(x)

    return Model(inputs=base_finetune_model.input, outputs=finetune_predictions)

Finally, we'll build the neural network whose ImageNet-pretrained layers are frozen.

In [5]:
def create_frozen_model():
    base_frozen_model = InceptionV3(include_top=False, weights='imagenet', input_shape=train_tensors.shape[1:])

    # Extend the base model
    x = base_frozen_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    frozen_predictions = Dense(num_classes, activation='softmax')(x)

    frozen_model = Model(inputs=base_frozen_model.input, outputs=frozen_predictions)

    # Freeze the base model's layers so that only the newly added layers are trained
    for layer in base_frozen_model.layers:
        layer.trainable = False

    return frozen_model

We now have builder functions for the three neural networks we want to analyze later on.
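
Freezing a layer means its weights are excluded from gradient updates, so they keep their pretrained values while the rest of the network learns. The toy sketch below shows that idea in plain Python. The `Param` class and `sgd_step` function are hypothetical and only illustrate the concept; they are not how Keras implements it.

```python
from dataclasses import dataclass

@dataclass
class Param:
    """Hypothetical parameter with a Keras-like `trainable` flag."""
    value: float
    trainable: bool = True

def sgd_step(params, grads, lr):
    """Apply one plain SGD update, skipping parameters that are frozen."""
    for p, g in zip(params, grads):
        if p.trainable:
            p.value -= lr * g

base = Param(value=1.0, trainable=False)  # plays the role of a frozen pretrained layer
head = Param(value=1.0)                   # plays the role of a newly added layer

sgd_step([base, head], grads=[0.5, 0.5], lr=0.5)
print(base.value, head.value)  # 1.0 0.75
```

Both parameters receive the same gradient, but only the trainable one moves.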

## 3. Discover suitable solver and learning rate
We want to perform a quick analysis to find a good solver and learning rate for this particular application. To be clear, this analysis is naive and by no means exhaustive. Still, it's better than nothing and gives us at least a rough idea of a reasonable configuration.

For solvers, we'll look at SGD with momentum, SGD with Nesterov momentum, and Adam. For learning rates, we'll look at 1e-3, 5e-4, and 1e-4. We'll evaluate these combinations on the scratch model. Later on, when we train the models that start from ImageNet weights, we'll use two learning rates, because a model that starts from pretrained weights usually calls for a lower learning rate.
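
The first two solvers differ only in how the momentum term is applied. As a minimal sketch, assuming the update rules used by the Keras `SGD` optimizer (velocity `v = momentum * v - lr * g`, with the Nesterov variant taking an extra look-ahead step), the code below minimizes the toy function f(w) = w² with both. The `minimize` helper and the toy objective are illustrative assumptions, not part of the notebook.

```python
def minimize(w, lr, momentum, nesterov, steps):
    """Minimize f(w) = w**2 using SGD with classical or Nesterov momentum."""
    v = 0.0
    for _ in range(steps):
        g = 2.0 * w                        # gradient of w**2
        v = momentum * v - lr * g          # velocity update
        if nesterov:
            w = w + momentum * v - lr * g  # look-ahead step
        else:
            w = w + v                      # classical momentum step
    return w

w_sgd = minimize(5.0, lr=0.1, momentum=0.9, nesterov=False, steps=200)
w_nag = minimize(5.0, lr=0.1, momentum=0.9, nesterov=True, steps=200)

# Both variants should end up close to the minimum at w = 0
print(abs(w_sgd) < 1e-2, abs(w_nag) < 1e-2)
```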

In [13]:
from keras.optimizers import SGD, Adam
from copy import deepcopy

# Number of epochs to train
epochs = 2

solvers = ['sgd']
learning_rates = [5e-4, 1e-3]
# solvers = ['sgd', 'nesterov', 'adam']
# learning_rates = [1e-3, 5e-4, 1e-4]

def train(model, solver, lr):
    if solver.lower() == 'sgd':
        model.compile(optimizer=SGD(lr=lr, momentum=0.9), loss='categorical_crossentropy',
                      metrics=['accuracy'])
    elif solver.lower() == 'nesterov':
        model.compile(optimizer=SGD(lr=lr, momentum=0.9, nesterov=True), loss='categorical_crossentropy',
                      metrics=['accuracy'])
    elif solver.lower() == 'adam':
        # Adam has no `momentum` argument; beta_1 plays that role
        model.compile(optimizer=Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-8),
                      loss='categorical_crossentropy', metrics=['accuracy'])

    # Quick smoke run on a small subset of the data
    result = model.fit(train_tensors[:48], train_targets[:48], validation_data=(valid_tensors[:20], valid_targets[:20]),
                       epochs=epochs, batch_size=4, verbose=1)
    # Full run (uncomment to use all of the training and validation data):
#     result = model.fit(train_tensors, train_targets, validation_data=(valid_tensors, valid_targets),
#                        epochs=epochs, batch_size=64, verbose=1)

    return solver, lr, result

results = []

# Train using sets
for solver in solvers:
    for lr in learning_rates:
        print('Training %s with a learning rate of %f' % (solver, lr))
        model = create_finetune_model()
        results.append(train(model, solver, lr))

Training sgd with a learning rate of 0.000500
Train on 48 samples, validate on 20 samples
Epoch 1/2
Epoch 2/2
Training sgd with a learning rate of 0.001000
Train on 48 samples, validate on 20 samples
Epoch 1/2
Epoch 2/2
[('sgd', 0.0005, <keras.callbacks.History object at 0x7f5d57a4b748>), ('sgd', 0.001, <keras.callbacks.History object at 0x7f5d393ec7f0>)]


We now want to decide which solver and learning rate combination to use, so we'll check which setup produced the lowest validation loss. Again, this is by no means a thorough analysis.

In [32]:
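# Each entry is a (solver, learning rate, History) tuple
print(results)
print(results[0])
# History.history maps each metric name to its per-epoch values
print(results[0][2].history)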

[('sgd', 0.001, <keras.callbacks.History object at 0x7f21ac857ac8>)]
('sgd', 0.001, <keras.callbacks.History object at 0x7f21ac857ac8>)
{'val_loss': [5.4644955635070804, 5.4795576095581051], 'val_acc': [0.0, 0.0], 'loss': [3.4664315978686013, 2.9534252683321633], 'acc': [0.45833333333333331, 0.52083333333333337]}
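
Picking the setup with the lowest validation loss can be automated. The sketch below assumes `results` holds `(solver, lr, history)` tuples as returned by `train`, where `history.history['val_loss']` lists the validation loss per epoch. `FakeHistory`, `best_setup`, and the loss values are placeholders for illustration only, not measured results.

```python
class FakeHistory:
    """Placeholder mimicking the `.history` dict of a Keras History object."""
    def __init__(self, val_loss):
        self.history = {'val_loss': val_loss}

def best_setup(results):
    """Return the (solver, lr, history) tuple with the lowest final val_loss."""
    return min(results, key=lambda r: r[2].history['val_loss'][-1])

# Placeholder runs; the numbers are made up for illustration
demo_results = [
    ('sgd', 5e-4, FakeHistory([2.0, 1.5])),
    ('sgd', 1e-3, FakeHistory([1.8, 1.2])),
]

solver, lr, _ = best_setup(demo_results)
print(solver, lr)  # sgd 0.001
```

Comparing the final epoch is only one possible criterion; taking the minimum over all epochs would instead reward the best epoch of each run.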
