# Transfer learning / fine-tuning
---

This tutorial will guide you through the process of using transfer learning to learn an accurate image classifier from a relatively small number of training samples. Generally speaking, transfer learning refers to the process of leveraging the knowledge learned in one model for the training of another model.

More specifically, the process involves taking an existing neural network which was previously trained to good performance on a larger dataset, and using it as the basis for a new model which leverages that previous network's accuracy for a new task. This method has become popular in recent years to improve the performance of a neural net trained on a small dataset; the intuition is that the new dataset may be too small to train to good performance by itself, but we know that most neural nets trained to learn image features often learn similar features anyway, especially at early layers where they are more generic (edge detectors, blobs, and so on).

Transfer learning has been largely enabled by the open-sourcing of state-of-the-art models. For the top-performing models in image classification tasks (such as those from ILSVRC), it is now common practice to publish not only the architecture but also the trained weights. This lets amateurs use these top image classifiers to boost the performance of their own task-specific models.

## Feature extraction vs. fine-tuning
At one extreme, transfer learning can involve taking the pre-trained network, freezing its weights, and using one of its hidden layers (usually the last one) as a feature extractor whose output becomes the input to a smaller neural net.

At the other extreme, we start with the pre-trained network, but we allow some of the weights (usually the last layer or last few layers) to be modified. This procedure is also called "fine-tuning" because we are slightly adjusting the pre-trained net's weights to the new task. We usually train such a network with a lower learning rate, since we expect the features to be relatively good already and not to need much change.

Sometimes, we do something in-between: Freeze just the early/generic layers, but fine-tune the later layers. Which strategy is best depends on the size of your dataset, the number of classes, and how much it resembles the dataset the previous model was trained on (and thus, whether it can benefit from the same learned feature extractors). A more detailed discussion of how to strategize can be found in [[1]](https://cs231n.github.io/transfer-learning/) and [[2]](https://sebastianruder.com/transfer-learning/).

## Procedure
In this guide we will go through the process of loading a state-of-the-art, 1000-class image classifier, VGG16, which placed first in localization and second in classification at the 2014 ImageNet challenge (ILSVRC), and using it as a fixed feature extractor to train a smaller custom classifier on our own images. With very few code changes, you can try fine-tuning as well.

We will first load VGG16 and remove its final layer, the 1000-class softmax classification layer specific to ImageNet, and replace it with a new classification layer for the classes we are training over. We will then freeze all the weights in the network except the new ones connecting to the new classification layer, and then train the new classification layer over our new dataset.
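
The freeze-and-retrain idea can be illustrated without Keras. The toy NumPy sketch below is not the VGG16 code used in this guide: `W_frozen` stands in for a pre-trained network (here just a fixed random projection followed by a ReLU, an assumption made purely for illustration), the data is synthetic, and only the new head `w_head`, `b_head` is updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 input features, 2 classes (all hypothetical).
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Stand-in for a pre-trained feature extractor: a fixed projection plus ReLU.
# In the real workflow these weights would come from VGG16; here they are random.
W_frozen = rng.normal(size=(10, 32))
W_before = W_frozen.copy()

def extract_features(inputs):
    # Frozen layer: used in the forward pass, never updated.
    return np.maximum(inputs @ W_frozen, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(probs, targets):
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    return float(-np.mean(targets * np.log(probs)
                          + (1 - targets) * np.log(1 - probs)))

# New classification head: the only trainable parameters.
w_head = np.zeros(32)
b_head = 0.0

features = extract_features(X)  # computed once, since the extractor is frozen
loss_before = log_loss(sigmoid(features @ w_head + b_head), y)

learning_rate = 0.01
for _ in range(200):
    probs = sigmoid(features @ w_head + b_head)
    grad = probs - y  # gradient of the log loss with respect to the logits
    w_head -= learning_rate * features.T @ grad / len(y)
    b_head -= learning_rate * grad.mean()

loss_after = log_loss(sigmoid(features @ w_head + b_head), y)
```

Because the extractor never changes, its output can be computed once and reused on every training step, which is a large part of why this strategy is cheap compared with training the whole network.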

We will also compare this method to training a small neural network from scratch on the new dataset, and as we shall see, it will dramatically improve our accuracy. We will do that part first.

As our test subject, we'll use a dataset consisting of around 6000 images belonging to 97 classes, and train an image classifier with around 80% accuracy on it. It's worth noting that this strategy scales well to image sets where you may have only a couple hundred images or fewer. Accuracy will be lower with so few samples (depending on the classes), as usual, but still impressive considering the constraints.

In [None]:
import os
import sys

# uncomment to select the Keras backend explicitly (must be set before importing keras)
#os.environ["KERAS_BACKEND"] = "tensorflow"

import random
import numpy as np
import keras

import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

from PIL import Image
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras.models import Model

Execute the code below if you want to download the Cats_And_Dogs dataset:

In [None]:
!echo "Downloading Cats_and_Dogs for image notebooks"
!curl -L -o cats_and_dogs.zip --progress-bar https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip
!unzip cats_and_dogs.zip
!rm cats_and_dogs.zip
!ls

While running some tests, I discovered that two images are corrupted:
*   PetImages/Dog/11702.jpg
*   PetImages/Cat/666.jpg

Let's remove those images from our dataset:

In [None]:
files_to_remove = ['PetImages/Dog/11702.jpg', 'PetImages/Cat/666.jpg']
for file in files_to_remove:
    # handle each file separately so one missing file does not skip the rest
    try:
        os.remove(file)
    except FileNotFoundError as e:
        print('The file was already removed: ', e)
    except Exception as e:
        print('There was an unexpected error: ', e)
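
One way such files can be found is sketched below. This is an assumption about the kind of check involved, not the test that was actually run. The hypothetical helper `looks_corrupted` only flags files that are empty or whose first bytes are not a JPEG or PNG signature, so it misses images that are damaged further into the file, and it also flags files stored in another format under a `.jpg` name.

```python
import os

JPEG_MAGIC = b'\xff\xd8\xff'
PNG_MAGIC = b'\x89PNG\r\n\x1a\n'

def looks_corrupted(path):
    """Cheap heuristic: flag empty files and files without a JPEG/PNG signature."""
    if os.path.getsize(path) == 0:
        return True
    with open(path, 'rb') as f:
        header = f.read(8)
    return not (header.startswith(JPEG_MAGIC) or header.startswith(PNG_MAGIC))

def find_suspect_images(root_dir):
    """Walk root_dir and return the sorted paths of image files that fail the check."""
    suspects = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in ('.jpg', '.jpeg', '.png'):
                path = os.path.join(dirpath, name)
                if looks_corrupted(path):
                    suspects.append(path)
    return sorted(suspects)
```

Calling `find_suspect_images('PetImages')` would list candidates to inspect by hand before deleting anything.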

Creating a list of all categories from the subdirectories of the root path:

In [None]:
root = 'PetImages'

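# every subdirectory of root is a category; the root itself sorts first and is dropped
# sorting keeps the label indices stable across runs
categories = sorted(x[0] for x in os.walk(root))[1:]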

print(categories)

Creating a helper function, useful for pre-processing:

In [None]:
# helper function to load an image, resize it to 224x224 and return it as a uint8 array
def get_image(path):
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img).astype(np.uint8)
    #x = np.expand_dims(x, axis=0)
    #x = preprocess_input(x)
    return x

Inserting all images inside a list:

In [None]:
data = []
for c, category in enumerate(categories):
    images = [os.path.join(dp, f) for dp, dn, filenames
              in os.walk(category) for f in filenames
              if os.path.splitext(f)[1].lower() in ['.jpg', '.png', '.jpeg']]
    for img_path in images:
        try:
            x = get_image(img_path)
        except Exception as e:
            # skip an unreadable image instead of abandoning the rest of the category
            print(f'Skipping {img_path}: {e}')
            continue
        data.append({'x': x, 'y': c})

# note: sys.getsizeof measures only the list object, not the image arrays it references
print(f'List container size is: {sys.getsizeof(data)} bytes')
# count the number of classes
num_classes = len(categories)
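
Because `sys.getsizeof` reports only the size of the list object itself, the footprint of the pixel data has to be summed separately. A minimal sketch, assuming each entry has the `{'x': array, 'y': label}` layout built above; `pixel_bytes` is a hypothetical helper name and the images here are dummies:

```python
import numpy as np

def pixel_bytes(samples):
    """Total bytes held by the image arrays in a list of {'x': ..., 'y': ...} dicts."""
    return sum(s['x'].nbytes for s in samples)

# Two dummy 224x224 RGB images stored as uint8, the layout get_image returns.
demo = [{'x': np.zeros((224, 224, 3), dtype=np.uint8), 'y': 0} for _ in range(2)]
demo_bytes = pixel_bytes(demo)  # 2 * 224 * 224 * 3 bytes
```

Each image takes 224 × 224 × 3 = 150,528 bytes as `uint8`, and four times that once it is converted to `float32`, so `pixel_bytes(data)` is worth checking before loading a large dataset into memory.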

Shuffling the data with a fixed seed so that the order is reproducible:

In [None]:
random.Random(42).shuffle(data)
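
The small check below shows why a dedicated `random.Random(42)` instance is convenient: the same seed gives the same permutation every time, and the module-level generator used elsewhere is left alone. The lists are placeholders, not the image data.

```python
import random

items_a = list(range(10))
items_b = list(range(10))

# Two independent generators seeded identically produce the same permutation.
random.Random(42).shuffle(items_a)
random.Random(42).shuffle(items_b)
same_order = items_a == items_b

# A private Random instance leaves the module-level generator untouched.
random.seed(0)
expected = random.random()
random.seed(0)
random.Random(42).shuffle(list(range(10)))
global_unaffected = random.random() == expected
```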

Once the data is shuffled, let's move on to the **split** step.
In this step we will use the following proportions:
*   Training: 70% of the samples;
*   Validation: 15% of the samples;
*   Test: 15% of the samples.

In [None]:
tr_split = 0.7
val_split = 0.15
test_split = 0.15

idx_val = int(tr_split * len(data))
idx_test = int((tr_split + val_split) * len(data))
train = data[:idx_val]
val = data[idx_val:idx_test]
test = data[idx_test:]

del data
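
The index arithmetic above can be wrapped in a small function. In this sketch `split_three_ways` is a hypothetical helper, not part of the notebook. The test share is whatever remains after the training and validation slices, so the three parts always cover every sample exactly once, even when the fractions do not divide the dataset evenly.

```python
def split_three_ways(items, tr_split=0.7, val_split=0.15):
    """Slice items into train/val/test; the test slice receives the remainder."""
    idx_val = int(tr_split * len(items))
    idx_test = int((tr_split + val_split) * len(items))
    return items[:idx_val], items[idx_val:idx_test], items[idx_test:]

train_demo, val_demo, test_demo = split_three_ways(list(range(100)))
sizes = (len(train_demo), len(val_demo), len(test_demo))
```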

Now let's separate each subset into X (features) and y (labels), giving the same six variables that `sklearn.model_selection.train_test_split()` would produce if applied twice:

In [None]:
X_train, y_train = np.array([t["x"] for t in train]), [t["y"] for t in train]
X_val, y_val = np.array([t["x"] for t in val]), [t["y"] for t in val]
X_test, y_test = np.array([t["x"] for t in test]), [t["y"] for t in test]

Normalizing the pixel values to the range [0, 1] and one-hot encoding the labels:

In [None]:
# normalize data
X_train = X_train.astype('float32') / 255
X_val = X_val.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# convert labels to one-hot vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
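
To make the two transformations concrete, here is a NumPy-only sketch on tiny made-up inputs. `one_hot` is a hypothetical stand-in intended to mirror what `keras.utils.to_categorical` produces for integer labels; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i is all zeros except for a 1 at column labels[i]."""
    return np.eye(num_classes, dtype='float32')[np.asarray(labels)]

# Pixel values stored as uint8 span 0..255; dividing by 255 maps them to [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels.astype('float32') / 255

# Three samples with labels 0, 1, 1 and two classes.
encoded = one_hot([0, 1, 1], num_classes=2)
```

The cast to `float32` has to come before the division; otherwise the result would be promoted to a wider float type than the model needs.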