# Transfer learning for cats vs. dogs classification

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.optimizers import adam
from keras import applications
from keras import backend as K

In this notebook we will use labeled images of cats and dogs to train a classifier able to distinguish between them. This is a task which, like many others in computer vision, we have learned to master in the past few years, mainly thanks to *convolutional neural networks*.

There was a [competition on Kaggle](https://www.kaggle.com/c/dogs-vs-cats) for this, back in 2014. Pierre Sermanet, a student of Yann LeCun, took 1st place, with an impressive 98.9% accuracy on the test set. He briefly explains how he did it in this [Google+ post](https://plus.google.com/+PierreSermanet/posts/GxZHEH9ynoj):

> I just won the Dogs vs. Cats Kaggle competition, using the deep learning library I wrote during my PhD: OverFeat http://cilvr.nyu.edu/doku.php?id=code:start
>
> My system was pre-trained on ImageNet (ILSVRC12 classification dataset) and subsequently refined on the cats and dogs data. I limited my number of submissions to 5 to avoid test set tuning and obtained 1st place against 215 other teams with 98.9% accuracy.

But what exactly does it mean to *pre-train* a model? Well, let's find out!

## Loading the data

We are going to use the data from the Kaggle contest. The dataset is quite big: there are 25000 training samples and 12500 test samples, and the images come in different sizes. We are thus going to work with only a few of them: we take 3000 samples from the training set, and use 2000 of them for training and 1000 for validation.

**BEFORE PROCEEDING**: please download the smaller dataset [here](https://filesender.renater.fr/?s=download&token=0b4ecc42-58ed-f056-ceed-468638a796a4) and extract it inside the `lec4/mldata` folder.

Let's load the data; Keras has some nice routines for that purpose. We start by loading a single image, just to see what it looks like (try changing the filename to see some others). Note that we are working with color images, and thus with 3-dimensional arrays.

In [None]:
# Load image and transform it to a Numpy array
img = load_img("mldata/catsvsdogs/train/cats/cat.2.jpg")
x = img_to_array(img)
print("array size:", x.shape)

# Show image
plt.imshow(x / 255.)

Next we specify some training parameters, and create a *generator* for the data, by using a convenient tool from Keras known as `ImageDataGenerator`. Essentially it iterates through the images in the directory and preprocesses them for us.

Below, we rescale each image so that pixel intensities go from 0 to 1 instead of 0 to 255. For the training images we also apply random shears, zooms and horizontal flips, so that the classifier becomes more robust; this technique is known as *data augmentation*.

In [None]:
# Parameters
batch_size = 16
epochs = 20

n_train_samples = 2000
n_valid_samples = 1000
img_width, img_height = 150, 150

# Set up generator for training and validation images
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
train_generator = train_datagen.flow_from_directory("mldata/catsvsdogs/train",
                                                    target_size=(img_height, img_width),
                                                    batch_size=batch_size,
                                                    class_mode="binary")

test_datagen = ImageDataGenerator(rescale=1./255)
valid_generator = test_datagen.flow_from_directory("mldata/catsvsdogs/validation",
                                                   target_size=(img_height, img_width),
                                                   batch_size=batch_size,
                                                   class_mode="binary")
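To make the preprocessing steps more concrete, here is a minimal NumPy sketch of rescaling and of a random horizontal flip (shear and zoom are left out for brevity). It is only an illustration, not how `ImageDataGenerator` is implemented internally; the helper names `rescale` and `random_horizontal_flip` and the toy 2x3 image are made up for this example.

```python
import numpy as np

def rescale(image, factor=1. / 255):
    """Map pixel intensities from [0, 255] to [0, 1]."""
    return image.astype("float32") * factor

def random_horizontal_flip(image, rng):
    """Mirror the image left-right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

# Toy "image": height 2, width 3, 3 color channels, values in [0, 238]
toy = np.arange(2 * 3 * 3, dtype="uint8").reshape(2, 3, 3) * 14
rng = np.random.default_rng(0)

scaled = rescale(toy)
augmented = random_horizontal_flip(scaled, rng)

print("intensity range after rescaling:", scaled.min(), scaled.max())
print("shape after augmentation:", augmented.shape)
```

Because the flip is drawn at random every time an image is requested, the network rarely sees exactly the same input twice, which is what makes augmentation act as a regularizer.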

## Training a CNN from scratch

Finally we will write down our neural network using Keras and train it. Below we use a neural network with 3 convolutional layers, which are supposed to learn relevant features, and 2 more dense layers, that will use the learned features to classify the image as a cat or a dog.

In [None]:
# Specify network architecture: 3 conv. layers w/ ReLU activations + 2 dense layers
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(img_height, img_width, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
model.summary()
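The shapes and parameter counts reported by `model.summary()` can also be worked out by hand. The sketch below redoes that arithmetic in plain Python, assuming the Keras defaults used above ("valid" padding and stride 1 for `Conv2D`, 2x2 pooling with stride 2) and the `channels_last` image format; the helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel=3):
    # "valid" convolution: no padding, stride 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 (a leftover row/column is dropped)
    return size // pool

size, channels = 150, 3
total_params = 0
for filters in (32, 32, 64):
    # each filter has 3*3*channels weights plus one bias
    total_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters
    print("after conv + pool:", (size, size, channels))

flat = size * size * channels
total_params += (flat + 1) * 64   # Dense(64): weights + biases
total_params += (64 + 1) * 1      # Dense(1): weights + bias
print("flattened features:", flat)
print("total parameters:", total_params)
```

Note that almost all of the parameters sit in the first dense layer, not in the convolutions.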

Let's fit our model now.

In [None]:
# Fit our model
#model.fit_generator(
    #train_generator,
    #steps_per_epoch=n_train_samples // batch_size,
    #epochs=epochs,
    #validation_data=valid_generator,
    #validation_steps=n_valid_samples // batch_size)

#model.save_weights("mldata/cnn_catsvsdogs.h5")

In [None]:
# (or, if you don't want to wait, you can just load the weights below)
model.load_weights("mldata/cnn_catsvsdogs.h5")

Whew, that took a long time (unless you have a GPU!). If only we could use weights that we already trained on other datasets...

Let's see what accuracy we get on the validation set, which plays the role of our test set here:

In [None]:
test_loss, test_acc = model.evaluate_generator(valid_generator, steps=n_valid_samples//batch_size)
print("accuracy on test set:", test_acc)

Not bad -- with such accuracy we would be among the top half in the Kaggle contest (see ref. 1).

## Using a pre-trained model

Training our CNN took a very long time. What if we replaced it with another one, which was trained on a *similar dataset*? After all, the important features should be more or less the same, right?

People have been using lots of computational resources to train very deep networks on huge datasets. Luckily they have made their weights available, so we can use them! Below we are going to use a 16-layer convolutional neural network that was trained on the ImageNet dataset, known as [VGG-16](http://www.robots.ox.ac.uk/%7Evgg/research/very_deep/).

![VGG16](vgg16.png)

Let's do it in steps: first we create generators as before (now without any data augmentation) ...

In [None]:
# Parameters
batch_size = 16
epochs = 100

n_train_samples = 160
n_valid_samples = 80

# Set up generator for training and validation images
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory("mldata/catsvsdogs_small/train",
                                                    target_size=(img_height, img_width),
                                                    batch_size=batch_size,
                                                    class_mode=None,
                                                    shuffle=False)

test_datagen = ImageDataGenerator(rescale=1./255)
valid_generator = test_datagen.flow_from_directory("mldata/catsvsdogs_small/validation",
                                                   target_size=(img_height, img_width),
                                                   batch_size=batch_size,
                                                   class_mode=None,
                                                   shuffle=False)

... then we load the VGG16 network, and run our samples through it (this might take a while...).

In [None]:
# Load VGG16 weights and generate features for each image
vgg16 = applications.VGG16(include_top=False, weights="imagenet")
train_features = vgg16.predict_generator(train_generator, n_train_samples // batch_size)
valid_features = vgg16.predict_generator(valid_generator, n_valid_samples // batch_size)

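# Create array containing labels (with shuffle=False the generators yield
# all cats first, then all dogs; both classes have the same number of images)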
train_labels = np.hstack((np.zeros(n_train_samples // 2), np.ones(n_train_samples // 2)))
valid_labels = np.hstack((np.zeros(n_valid_samples // 2), np.ones(n_valid_samples // 2)))

Let's try to understand what the VGG16 network is doing. First let's see what the output looks like.

In [None]:
print(train_features[0, :].shape)

fig, axs = plt.subplots(8, 8, figsize=(7, 7))
for i in range(64):
    axs[i // 8, i % 8].imshow(train_features[0, :, :, i])
    axs[i // 8, i % 8].get_xaxis().set_visible(False)
    axs[i // 8, i % 8].get_yaxis().set_visible(False)
fig.subplots_adjust(hspace=0.1)

It consists of 512 images of size 4x4 (only the first 64 are shown above), but looking at them doesn't tell us much... Let us look at the output of the layers one by one.

In [None]:
# Create function that reads image and returns each layer output
input_img = vgg16.input
outputs = [layer.output for layer in vgg16.layers]
functors = [K.function([input_img], [out]) for out in outputs]

# Input first image in the training set to this function
train_generator.reset()
batch = next(train_generator)

img = batch[0][np.newaxis, :]
layer_outputs = [func([img]) for func in functors]

for i in range(len(layer_outputs)):
    print("layer %d shape: %s" % (i, layer_outputs[i][0].shape))
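The spatial sizes printed above follow a simple pattern. The convolutional part of VGG16 has 5 blocks with 64, 128, 256, 512 and 512 filters; its 3x3 convolutions use "same" padding, so only the 2x2 max pooling at the end of each block changes the height and width. A small sketch of that arithmetic for our 150x150 inputs:

```python
# Number of filters in each of the 5 VGG16 convolutional blocks
block_filters = (64, 128, 256, 512, 512)

size = 150
shapes = []
for n, filters in enumerate(block_filters, start=1):
    # "same" convolutions keep the size; pooling halves it (rounding down)
    size //= 2
    shapes.append((size, size, filters))
    print("block %d output: %d x %d x %d" % (n, size, size, filters))
```

The last entry is the shape of the features we extracted for each image.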

Try changing the value of `layer` below to any integer from 1 to 18 (layer 0 is just the input image).

In [None]:
layer = 1

fig, axs = plt.subplots(8, 8, figsize=(14, 14))
for i in range(64):
    axs[i // 8, i % 8].imshow(layer_outputs[layer][0][0, :, :, i], cmap="gray")
    axs[i // 8, i % 8].get_xaxis().set_visible(False)
    axs[i // 8, i % 8].get_yaxis().set_visible(False)
fig.subplots_adjust(hspace=0.1)

We can understand each layer of a CNN as performing multiple image processing tasks in parallel (detecting edges, sharpening, blurring) [by doing convolutions with the learned filters](https://en.wikipedia.org/wiki/Kernel_(image_processing)). These tasks are performed not on the original image, but on the output of the previous layer; that's why the output of the rightmost layers becomes very hard to interpret.

Which task is performed depends on which filter is used. Here we are using the filters that come with VGG16, which were learned on ImageNet; in principle we could learn them ourselves, if we were to train such a CNN from scratch.
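To see how the choice of filter determines the task, here is a small NumPy sketch that applies two classic hand-made kernels, a Sobel filter and a box blur, to a toy image. The kernels are textbook examples, not filters taken from VGG16, and the helper `filter2d` is written just for this illustration.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` ("valid" mode, stride 1).

    As in Keras' Conv2D, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges
box_blur = np.ones((3, 3)) / 9.0               # local average

edges = filter2d(image, sobel_x)
blurred = filter2d(image, box_blur)
print(edges)
print(blurred)
```

The Sobel output is nonzero only around the edge, while the blurred output turns the sharp step into a gradual ramp. A CNN learns the entries of such kernels from data instead of having them fixed by hand.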

An interesting exercise, which provides a way of interpreting the output of the rightmost layers, is to find the image [which maximizes the activation of a certain filter](https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html).

Finally, we take the output of the VGG16 network and plug it into a 2-layer classifier, similar to the one we had before.

In [None]:
# Specify our architecture (quite simple now!)
model = Sequential()
model.add(Flatten(input_shape=train_features.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))

# Compile model and print summary
model.compile(loss="binary_crossentropy",
              optimizer=adam(1e-4),
              metrics=["accuracy"])
model.summary()

Let's fit it!

In [None]:
# Fit model
model.fit(train_features, train_labels,
          epochs=epochs,
          batch_size=batch_size,
          validation_data=(valid_features, valid_labels))

In [None]:
test_loss, test_acc = model.evaluate(valid_features, valid_labels)
print("accuracy on test set:", test_acc)

That looks good, no? And definitely much faster :-)

In [None]:
# Load 8 images at random, pass them through VGG16 and then through classifier
datagen = ImageDataGenerator(rescale=1./255)
generator = datagen.flow_from_directory("mldata/catsvsdogs/validation",
                                        target_size=(img_height, img_width),
                                        batch_size=8,
                                        class_mode="binary")
batch = next(generator)
features = vgg16.predict(batch[0])
# Sigmoid output = probability of class 1 (dogs), one row per image
probs = model.predict(features)

# Show images together with probabilities
fig, axs = plt.subplots(2, 4, figsize=(16, 8))
for i in range(8):
    axs[i // 4, i % 4].imshow(batch[0][i])
    axs[i // 4, i % 4].set_title("prob. dog: %.2f" % probs[i, 0])
    axs[i // 4, i % 4].get_xaxis().set_visible(False)
    axs[i // 4, i % 4].get_yaxis().set_visible(False)
    axs[i // 4, i % 4].get_xaxis().set_visible(False)
    axs[i // 4, i % 4].get_yaxis().set_visible(False)

## References

1. https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
2. https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d
3. https://gist.github.com/fchollet/f35fbc80e066a49d65f1688a7e99f069
4. https://github.com/abursuc/dldiy-practicals/blob/master/10_05_lesson1.ipynb
5. https://github.com/fastai/courses/blob/master/deeplearning1/nbs/dogs_cats_redux.ipynb
6. http://www.cs.toronto.edu/~frossard/post/vgg16/
7. https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/