# Transfer Learning

We'll review the idea of transfer learning and data augmentation.

The dataset we'll use is the "cats versus dogs" dataset from Kaggle.  It's a collection of cat and dog photos obtained from Flickr (yes, that's what happens when you blindly accept the terms of service-- your photos become public domain).

This notebook is taken from Francois Chollet's excellent [blog post](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html). Francois points out that when this dataset was first posted, the notes mentioned that a classifier that scored 60% would be very impressive and require "a major advance in the state of the art". That was in 2013.

In [1]:
import numpy as np

In [2]:
from keras import backend as K
K.set_image_dim_ordering('tf')

Using TensorFlow backend.


In [3]:
from keras import applications
from keras.models import Model

# Data Augmentation

What happens if you don't have "enough" data for deep learning?  Here we have about 25,000 images to both train and validate our models. That's probably way too few examples for a convolutional network to generalize the statistics and fit good convolutional filters. However, this presents an opportunity. 

Ideally, we want a classifier that can recognize cat versus dog regardless of where the cat is in the picture or how the cat is rotated or scaled. It'd be less useful to have a classifier that only works well when the cat is centered and in perfect focus. (Will a cat stand still for that long?)

So what if we take our original photos and rotate, translate, flip, shear, and zoom them at random? Keras has a tool called ImageDataGenerator which does just that. It applies random rotations, shifts, shears, zooms, and flips (implemented with NumPy and SciPy image routines) to the photos as they are fed into the model for training. So every time the model sees a picture it looks slightly different. Hence one photo can become 100.
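
To make the idea concrete, here's a minimal NumPy-only sketch of two of these transforms (a random horizontal flip and a random shift). This is not how Keras implements `ImageDataGenerator`; the helper name `random_flip_and_shift`, its `max_shift` parameter, and the toy image are all assumptions made up for illustration.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=0.2):
    """Return a randomly flipped and shifted copy of an (H, W, C) image.

    Hypothetical helper for illustration; not part of Keras.
    """
    height, width = image.shape[:2]
    # Flip left-right with probability 0.5.
    out = image[:, ::-1] if rng.random() < 0.5 else image
    # Pick a shift of up to max_shift of each dimension, in whole pixels.
    max_dy, max_dx = int(max_shift * height), int(max_shift * width)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    # Clipping the source indices repeats edge pixels in the vacated area,
    # similar in spirit to fill_mode='nearest'.
    rows = np.clip(np.arange(height) - dy, 0, height - 1)
    cols = np.clip(np.arange(width) - dx, 0, width - 1)
    return out[rows][:, cols]

rng = np.random.default_rng(0)
image = np.arange(10 * 10 * 3, dtype=np.float32).reshape(10, 10, 3)  # toy "photo"
variants = [random_flip_and_shift(image, rng) for _ in range(5)]
print([v.shape for v in variants])
```

Every variant keeps the original shape and only reuses pixels from the original photo, which is why the labels stay valid after augmentation.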


In [4]:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')


# Image Data Generator

Here's some code that just shows the image generator in action.  We'll take a cat image (cat.0.jpg), run it through our data generator, and save about 20 "versions" of the image, each one slightly different. In practice, we don't actually save these pictures, but this is just a way for us to view what is happening under the hood.

In [5]:
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
x = img_to_array(img)  # this is a NumPy array with shape (height, width, 3) under 'tf' (channels-last) ordering
x = x.reshape((1,) + x.shape)  # add a batch axis: shape (1, height, width, 3)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='cat', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely



# Simple Convolutional Neural Network (CNN)

Here's a very simple CNN with 3 convolutional layers and 3 max pooling layers. Very, very simple. And yet it already scores better accuracy than the experts expected back in 2013.

In [6]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(150, 150, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# the model so far outputs 3D feature maps (height, width, features)

#  Now we flatten out the model and run it through one dense layer and one sigmoid function layer.

So our output is a single number between 0 and 1 (probability of cat or dog).
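
Before flattening, it helps to know how big the feature maps actually are. Here's a small sketch of the arithmetic, assuming the Keras defaults used above (unpadded 3x3 convolutions with stride 1, and 2x2 max pooling that rounds down). The helper names `conv_output_size` and `pool_output_size` are made up for this example.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (rounds down)."""
    return size // pool

size = 150  # input images are 150x150
for block in range(3):
    size = pool_output_size(conv_output_size(size))
    print("after block", block + 1, "->", size, "x", size)

flat_features = size * size * 64        # the last Conv2D layer has 64 filters
dense_params = flat_features * 64 + 64  # weights plus biases of Dense(64)
print(flat_features, dense_params)
```

Under these assumptions the three blocks shrink the image to 17x17, so `Flatten` produces 18,496 features and the first dense layer alone holds about 1.2 million parameters.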

In [7]:
model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
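
For intuition about the last layer and the loss, here's a plain-Python sketch of a sigmoid and of binary cross-entropy for a single example. It's only an illustration of the formulas, not the Keras implementation; the function names and the clipping constant `eps` are assumptions.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example; y_true is 0 (cat) or 1 (dog), p is the predicted P(dog)."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from zero
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

print(sigmoid(0.0))                   # an undecided network output
print(binary_crossentropy(1, 0.9))    # confident and right: small loss
print(binary_crossentropy(1, 0.1))    # confident and wrong: large loss
```

The loss grows quickly when the model is confidently wrong, which is what pushes the sigmoid output toward the correct label during training.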

In [8]:
batch_size = 8

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        'data/train',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=batch_size,
        class_mode='binary')  # since we use binary_crossentropy loss, we need binary labels

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')

Found 23426 images belonging to 2 classes.
Found 1574 images belonging to 2 classes.


In [9]:
model.fit_generator(
        train_generator,
        steps_per_epoch=2000 // batch_size,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800 // batch_size)
model.save_weights('first_try.h5')  # always save your weights after training or during training

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
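
The `steps_per_epoch` and `validation_steps` arguments in the training call above are counted in batches, not images. Here's the arithmetic, using the batch size and the "Found ... images" counts printed earlier (the variable names are just for this sketch):

```python
batch_size = 8
train_images = 23426  # from the "Found ... images" output above

steps_per_epoch = 2000 // batch_size   # batches drawn per training epoch
validation_steps = 800 // batch_size   # batches drawn per validation pass

images_per_epoch = steps_per_epoch * batch_size
fraction_of_train_set = images_per_epoch / train_images

print(steps_per_epoch, validation_steps)
print(images_per_epoch, round(fraction_of_train_set, 3))
```

So with these settings an "epoch" here is 250 batches, or 2,000 augmented images, which is well under a tenth of the training folder. The generator keeps cycling, so later epochs draw different images.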


In [10]:
model.evaluate_generator(validation_generator, 500)

[0.45251897763895521, 0.78818227343996228]

In [11]:
model.metrics_names

['loss', 'acc']
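
The two outputs above line up position by position, so pairing them makes the result easier to read. A tiny sketch, with the values copied from the outputs above:

```python
metrics_names = ['loss', 'acc']
scores = [0.45251897763895521, 0.78818227343996228]

# Pair each metric name with its value.
results = dict(zip(metrics_names, scores))
print("validation accuracy: {:.1%}".format(results['acc']))
```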

# 78% accuracy without breaking a sweat

You now have a model that would have won you fame and fortune in 2013.

# Use pre-trained VGG16

Let's see if we can do even better by transferring knowledge from one model into another.

Karen Simonyan and Andrew Zisserman at the University of Oxford's [Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) created the VGG convolutional model in 2014 and took first place (localization task) and second place (classification task) in the ImageNet 2014 challenge.

They have two [VGG models](http://arxiv.org/pdf/1409.1556):
+ VGG16 - a 16-layer CNN
![VGG16](http://book.paddlepaddle.org/03.image_classification/image/vgg16.png)
+ VGG19 - a 19-layer CNN
![VGG19](https://image.slidesharecdn.com/neuralarten-160316180514/95/neural-art-english-version-19-638.jpg?cb=1458367899)

Both are released under the Creative Commons License and have been used as starting points for many researchers to create new neural networks.



In [12]:
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense

# path to the model weights files.
weights_path = 'vgg16_weights.h5'

# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16

# VGG-16

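Because VGG has an open-source license, it is included in many packages (including Keras).  Note that the full weights file is over 500 MB in size! With `include_top=False`, as below, Keras only needs the much smaller file holding the convolutional layers.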

In [13]:
# build the VGG16 network
base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(150,150,3))
print('Model loaded.')

Model loaded.


# Append our new layers

VGG was used to predict one of 1,000 classes. We only want to predict either cat (0) or dog (1). To do this, we'll just append a few more dense layers to the network and add a single output with a sigmoid activation function.

In [14]:
# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))  # Sigmoid goes between 0 and 1
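
To get a feel for the size of this new classifier, here's a sketch of the arithmetic. It assumes that VGG16's convolutions preserve the spatial size (padded convolutions), that each of its five blocks ends in a 2x2 max pool that rounds down, and that the last block has 512 filters. The variable names are just for this example.

```python
size = 150
for block in range(5):   # VGG16 has five conv blocks, each ending in a 2x2 max pool
    size = size // 2     # 150 -> 75 -> 37 -> 18 -> 9 -> 4

flat_features = size * size * 512          # what Flatten hands to Dense(256)
dense_256_params = flat_features * 256 + 256
dense_1_params = 256 * 1 + 1
top_model_params = dense_256_params + dense_1_params

print(size, flat_features, top_model_params)
```

Under these assumptions the VGG base outputs 4x4x512 feature maps, and the new head has roughly 2.1 million trainable parameters, almost all of them in the first dense layer.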


# Add the two parts

Here's where we add VGG16 to our custom output layers.

In [15]:
# add the model on top of the VGG base
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))

# Don't train VGG again

The point of using VGG is that it is already trained. Yes, it was trained on a completely different set of images, but the basic concept is that the convolutional filters "might" be applicable to any real image. 

So here we make sure that we *do not* change the weights of the first 15 layers of our new network (the input layer plus VGG16 convolutional blocks 1 through 4).  Only the last VGG16 convolutional block and our additional layers will be trained.

In [16]:
for layer in model.layers[:15]:
    layer.trainable = False
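
To see which layers `model.layers[:15]` covers, here's a sketch that rebuilds the VGG16 layer names by hand, following the naming visible in `model.summary()` (`input_1`, `block1_conv1`, ...). The list is written out for illustration rather than read from the real model, so treat it as an assumption and compare it with your own `model.layers`.

```python
# Hand-built list of layer names for VGG16 with include_top=False (an assumption).
vgg16_layer_names = ['input_1']
convs_per_block = [2, 2, 3, 3, 3]
for block, n_convs in enumerate(convs_per_block, start=1):
    vgg16_layer_names += ['block{}_conv{}'.format(block, i)
                          for i in range(1, n_convs + 1)]
    vgg16_layer_names.append('block{}_pool'.format(block))

frozen = vgg16_layer_names[:15]
still_trainable = vgg16_layer_names[15:]

print("last frozen layer:", frozen[-1])
print("VGG layers still trainable:", still_trainable)
```

With this layout the slice stops after `block4_pool`, so the `block5` convolutions are fine-tuned together with the new dense layers.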

In [17]:
# compile the model with an SGD/momentum optimizer
# and a very low learning rate.
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
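
Why such a small learning rate? Each update only nudges the pre-trained weights. Here's a toy sketch of the classical momentum update on a single weight, using the same `lr` and `momentum` values as above. The function `sgd_momentum_step` and the toy loss are assumptions for illustration, not the Keras optimizer code.

```python
def sgd_momentum_step(weight, velocity, grad, lr=1e-4, momentum=0.9):
    """One classical momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity

w, v = 1.0, 0.0
for step in range(3):
    grad = 2.0 * w  # gradient of the toy loss f(w) = w**2
    w, v = sgd_momentum_step(w, v, grad)
    print(step + 1, w, v)
```

After three steps the weight has moved by only about a thousandth, which is the point: fine-tuning should adjust the learned filters gently instead of wiping them out.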

In [18]:
# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

Found 23426 images belonging to 2 classes.
Found 1574 images belonging to 2 classes.


In [19]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
__________

In [20]:
# fine-tune the model
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=2)

Epoch 1/50
48s - loss: 0.6135 - acc: 0.6640 - val_loss: 0.4372 - val_acc: 0.8250
Epoch 2/50
47s - loss: 0.4326 - acc: 0.8060 - val_loss: 0.3037 - val_acc: 0.8734
Epoch 3/50
45s - loss: 0.3519 - acc: 0.8500 - val_loss: 0.2572 - val_acc: 0.8788
Epoch 4/50
45s - loss: 0.3093 - acc: 0.8685 - val_loss: 0.2452 - val_acc: 0.8899
Epoch 5/50
45s - loss: 0.3168 - acc: 0.8605 - val_loss: 0.2698 - val_acc: 0.8875
Epoch 6/50
45s - loss: 0.2900 - acc: 0.8695 - val_loss: 0.2211 - val_acc: 0.9062
Epoch 7/50
45s - loss: 0.2660 - acc: 0.8905 - val_loss: 0.2609 - val_acc: 0.8861
Epoch 8/50
46s - loss: 0.2771 - acc: 0.8875 - val_loss: 0.2019 - val_acc: 0.9213
Epoch 9/50
45s - loss: 0.2480 - acc: 0.8975 - val_loss: 0.2099 - val_acc: 0.9127
Epoch 10/50
45s - loss: 0.2380 - acc: 0.9020 - val_loss: 0.2050 - val_acc: 0.9012
Epoch 11/50
45s - loss: 0.2337 - acc: 0.9015 - val_loss: 0.2085 - val_acc: 0.9250
Epoch 12/50
47s - loss: 0.2161 - acc: 0.9110 - val_loss: 0.1854 - val_acc: 0.9190
Epoch 13/50
45s - loss: 0

<keras.callbacks.History at 0x7fd31920e810>

In [22]:
model.evaluate_generator(validation_generator, 1000)

[0.14301167962855935, 0.94245283023366389]

In [23]:
model.metrics_names

['loss', 'acc']

# Greater than 90% accuracy in under 40 minutes!

That's definitely not hype. Plus, thanks to data augmentation, we have a model that should be more robust to cats and dogs photographed at different angles, scales, and positions.