In the convolutional networks module we saw how to take a pre-trained neural network and use the features it learnt to solve new (supervised) learning tasks.


This approach to machine learning is referred to as *transfer learning*: the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. From Stanford's CS231n class notes:

>> **In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.**

## Transfer learning with MNIST data

We have seen during the lecture how to implement a CNN using the Keras framework, and how we can exploit a pre-trained network to extract features for image classification.

In this lab we'll combine both approaches. You are asked to train a Convolutional Neural Network on the first five digits of the MNIST dataset, then freeze its convolution layers and fine-tune the dense layers to classify the remaining five digits. You are encouraged to review and incorporate the programming examples provided in the lecture material (see `convnet.ipynb`).

In [1]:
from keras.datasets import mnist

Using TensorFlow backend.


In [2]:
import numpy as np
np.random.seed(42) 
from keras.utils.np_utils import to_categorical
from keras import backend as K
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Activation
from keras.layers import Dense

**1. Load and split the MNIST data set into train and test**

Hint: remember the `load_data()` method in `keras.datasets.mnist`

In [3]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [4]:
height, width = 28, 28
n_classes = 10

**2. Train a simple convnet on the first 5 digits [0..4] of the MNIST dataset.**

We've already seen how to train a CNN on MNIST data during the lecture. Let's reorganize the code a bit to make it more reusable.

In [11]:
def train_cnn(model, train, test):
    # height, width and n_classes are module-level settings defined above
    X_train, y_train = train
    X_test, y_test = test

    # reshape to (samples, height, width, channels) and rescale pixels to [0, 1]
    X_train = X_train.reshape(X_train.shape[0], height, width, 1).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], height, width, 1).astype('float32') / 255

    # one-hot encode the labels
    y_train = to_categorical(y_train, n_classes)
    y_test = to_categorical(y_test, n_classes)
    
    model.compile(
        loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['accuracy'])
    
    model.fit(X_train,
          y_train,
          batch_size=128,
          nb_epoch=3,
          validation_data=(X_test, y_test))
    score = model.evaluate(X_test, y_test)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
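
The preprocessing inside `train_cnn()` can be checked in isolation. The following is a minimal sketch using NumPy only: the random `images` array and the `labels` values are made-up stand-ins for MNIST data, and the `np.eye` indexing trick is assumed here as a plain-NumPy substitute for `to_categorical`, not the Keras implementation.

```python
import numpy as np

height, width, n_classes = 28, 28, 10

# Made-up stand-in for a few MNIST images: uint8 pixels in [0, 255].
images = np.random.RandomState(0).randint(0, 256, size=(4, height, width)).astype('uint8')
labels = np.array([5, 7, 9, 6])

# Add a trailing channel axis and rescale the pixels to [0, 1].
x = images.reshape(len(images), height, width, 1).astype('float32') / 255

# One-hot encode: row i of the identity matrix is the encoding of class i.
one_hot = np.eye(n_classes, dtype='float32')[labels]

print(x.shape)        # (4, 28, 28, 1)
print(one_hot.shape)  # (4, 10)
```

Because `n_classes` stays at 10, labels in [5..9] produce one-hot rows whose first five columns are all zero, so the same 10-way output layer can be reused for both halves of the digits.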

Hint: it might be useful to store convolution and dense layers in separate tuples.

In [17]:
model = Sequential()

n_filters = 32
filter_size = 3
pool_size = 2

conv_layers =  (Convolution2D(n_filters, filter_size, filter_size, border_mode='valid', 
                              input_shape=(height, width, 1)),
                Activation('relu'),
                Convolution2D(n_filters, filter_size, filter_size),
                Activation('relu'),
                Flatten())
dense_layers = (Dense(128), 
                Activation('relu'),
                Dense(n_classes),
                Activation('softmax'))

for layer in conv_layers:
    model.add(layer)
    
for layer in dense_layers:
    model.add(layer)

We now need to train the model on the digits 0-4. `y_train` contains the digit labels:

In [9]:
set(y_train)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Hint: you can use `numpy`'s boolean indexing to select which classes to train on. Select only the data points labelled as < 5.
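
As a quick illustration of boolean indexing, here is a small sketch on toy arrays. `X` and `y` are hypothetical stand-ins for `X_train` and `y_train`, not real MNIST data.

```python
import numpy as np

# Toy stand-ins: seven fake 2x2 "images" and their digit labels.
y = np.array([0, 7, 3, 9, 4, 5, 1])
X = np.arange(len(y) * 4).reshape(len(y), 2, 2)

mask = y < 5                          # one boolean per sample
X_low, y_low = X[mask], y[mask]       # samples labelled 0..4
X_high, y_high = X[~mask], y[~mask]   # samples labelled 5..9

print(y_low)         # [0 3 4 1]
print(y_high)        # [7 9 5]
print(X_low.shape)   # (4, 2, 2)
```

The mask is applied along the first axis, so images and labels stay aligned as long as both are indexed with the same mask.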

In [12]:
train_cnn(model,
          (X_train[y_train < 5], y_train[y_train < 5]),
          (X_test[y_test < 5], y_test[y_test < 5]))

Train on 30596 samples, validate on 5139 samples
Epoch 1/1
Test score: 0.0273331785328
Test accuracy: 0.991438022962


**3. Freeze the convolution layers and fine-tune the dense layers for the classification of digits [5..9]**

Compare the performance of this approach with the results obtained by training and tuning an end-to-end network. Hint: review the `trainable` property of `Convolution2D` layers.

In [13]:
for layer in conv_layers:
    layer.trainable = False

In [14]:
model.layers[0].trainable

False

In [15]:
model.layers[5].trainable

True
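
To get a feel for how much of the network is frozen, the parameter counts can be worked out by hand. This is a rough sketch in plain Python; it assumes both convolutions use `'valid'` (unpadded) borders and that every layer has a bias term. The helper names `conv_params` and `dense_params` are made up for this illustration.

```python
def conv_params(kernel, in_channels, filters):
    # one kernel x kernel x in_channels weight block per filter, plus one bias each
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # full weight matrix plus one bias per unit
    return inputs * units + units

height = 28
n_filters, filter_size, n_classes = 32, 3, 10

conv1 = conv_params(filter_size, 1, n_filters)
conv2 = conv_params(filter_size, n_filters, n_filters)

# Each unpadded 3x3 convolution trims filter_size - 1 pixels per dimension.
side = height - 2 * (filter_size - 1)   # 28 -> 26 -> 24
flat = side * side * n_filters          # size of the Flatten output

dense1 = dense_params(flat, 128)
dense2 = dense_params(128, n_classes)

frozen = conv1 + conv2
trainable = dense1 + dense2
print(frozen, trainable)   # 9568 2360714
```

Under these assumptions most parameters sit in the dense layers. Parameter count is not the same as compute cost, though, so these numbers alone do not predict the change in training time.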

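In the transfer step we train the model again using `train_cnn()`. Because the convolution layers are frozen, `fit` only updates (fine-tunes) the dense layers. Note that `train_cnn()` recompiles the model, which is what makes the new `trainable` settings take effect. Select only the data points labelled as >= 5.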

In [16]:
train_cnn(model,
            (X_train[y_train >= 5], y_train[y_train >= 5]),
            (X_test[y_test >= 5]  , y_test[y_test >= 5]))

Train on 29404 samples, validate on 4861 samples
Epoch 1/1
Test score: 0.058881396084
Test accuracy: 0.982102447946


Training time is almost cut in half, and accuracy is comparable to that of the model trained on all digits (see lecture notes).

What happened in the transfer step?
    
The convolution layers act as a feature extraction component. After freezing them with `layer.trainable = False`, fitting the model on the digits [5..9] "transfers" the features that the model learnt on [0..4] to the new dataset: only the dense layers are updated.
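
The same idea can be sketched without Keras. In the toy example below, a fixed random projection with a ReLU plays the role of the frozen convolution layers, and a logistic-regression "head" plays the role of the dense layers. Everything here (the data, `W_frozen`, the learning rate of 0.2) is an assumption made for illustration; it is not the lab's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen feature extractor: these weights are never updated.
W_frozen = rng.normal(size=(20, 16)) / np.sqrt(20)

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # linear map followed by ReLU

def predict(f, w, b):
    return 1.0 / (1.0 + np.exp(-(f @ w + b)))   # sigmoid of the head's output

def log_loss(p, y):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Made-up binary task (not MNIST).
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(float)

features = extract_features(X)   # computed once, since the extractor is fixed
w_head, b_head = np.zeros(16), 0.0
W_before = W_frozen.copy()

initial_loss = log_loss(predict(features, w_head, b_head), y)
for _ in range(200):
    error = predict(features, w_head, b_head) - y   # gradient w.r.t. the logits
    w_head = w_head - 0.2 * features.T @ error / len(y)
    b_head = b_head - 0.2 * error.mean()
final_loss = log_loss(predict(features, w_head, b_head), y)

print(np.array_equal(W_frozen, W_before))   # True: the extractor did not change
print(final_loss < initial_loss)
```

Only `w_head` and `b_head` receive gradient updates, which mirrors what happens to the dense layers when the convolution layers are marked as not trainable.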

## References

 [1] [How transferable are features in deep neural networks?](https://arxiv.org/abs/1411.1792)

 [2] [CNN Features off-the-shelf: an Astounding Baseline for Recognition](https://arxiv.org/abs/1403.6382)
 
 [3] [Stanford CS231n, transfer learning](http://cs231n.github.io/transfer-learning/)