# Transfer Learning

In this notebook, we classify T-cell activity by retraining sub-layers of the pre-trained deep CNN model Inception V3. This model can learn non-linear relationships between image features and the output. Retraining only part of it is also more efficient than training a deep CNN from scratch.

Inception V3 comprises multiple Inception modules, and each module can be considered a concatenated neural network layer. We want to explore whether retraining more Inception modules improves performance. So we denote the number of final Inception modules to retrain as $n$ and treat it as a hyper-parameter, along with `learning rate` and `batch size`. We then use nested cross-validation to tune these values and test the final models. You can learn more about the nested cross-validation scheme in [logistic_regression.ipynb](logistic_regression.ipynb#1.-Nested-Cross-Validation-Scheme). We use `Keras` with the `Tensorflow` backend to implement the neural network.

## 1. Data Preparation

`TODO`

- Change the requirement list
- Add in-text link

In [85]:
import numpy as np
import cv2
import re
import pandas as pd
from numpy.random import seed
from tensorflow import set_random_seed  # TensorFlow 1.x API
from glob import glob
from os.path import basename, join, exists
from os import mkdir
from collections import Counter
from keras.models import Model
from keras.utils import Sequence
from keras.applications.inception_v3 import InceptionV3
from keras import optimizers
from keras import layers
from keras import backend as K
from keras.layers import Dense, GlobalAveragePooling2D, Input, MaxPooling2D, \
    Activation, BatchNormalization, Conv2D, AveragePooling2D
from keras.callbacks import EarlyStopping
from skimage.io import imread
from skimage.transform import resize
from sklearn.utils import shuffle
from sklearn import metrics
from json import dump, load
from IPython.display import display, Markdown

# Call ggplot2 code from this notebook
%load_ext rpy2.ipython

# We set a random seed to make the notebook results consistent
RANDOM_SEED = 53715
seed(RANDOM_SEED)
set_random_seed(RANDOM_SEED)



### 1.1. Extract Bottlenecks

For this model, we use image pixel features indirectly. We only retrain the last $n$ Inception modules. So it is more efficient to pass each image once through the frozen lower layers and store their output, rather than feeding image pixels through the whole CNN at every training step. We can then treat the last $n$ modules as an independent CNN and train it using these extracted features as input.

The features extracted from these layers are sometimes called the "image feature vector" or "bottleneck". Different values of $n$ give different bottlenecks. Since we treat $n$ as a hyper-parameter, we want to generate a version of the bottlenecks for every value of $n$ (from $1$ to $11$). When $n = 11$, we train the whole CNN model end-to-end from scratch, so we can directly use image pixel features. The data preparation for $n = 11$ is discussed in [1.2. Extract Image Features](#1.2.-Extract-Image-Features).

Working with deep CNNs is computationally expensive, even for our small subset of sample images. This notebook includes the code to extract all bottlenecks. However, we only generate bottlenecks for $n=1$ and $n=2$ here.

![](./plots/bottleneck.pdf)
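
The caching idea can be made concrete with a toy example. The NumPy sketch below is not the notebook's actual pipeline. A small two-part network stands in for Inception V3: a frozen lower part plays the role of the layers up to `pre_layer`, and a trainable upper part plays the role of the last $n$ modules. The weights, the sizes, and the least-squares fit (used in place of gradient-based retraining) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network standing in for Inception V3: a frozen lower part
# (layers up to `pre_layer`) and a trainable upper part (the last n modules).
W_lower = rng.normal(size=(16, 8))


def lower(x):
    """Frozen feature extractor; its weights are never updated."""
    return np.maximum(x @ W_lower, 0.0)


def upper(h, w):
    """Trainable head that maps bottlenecks to outputs."""
    return h @ w


x = rng.normal(size=(100, 16))

# Step 1: run the frozen part once and cache the bottlenecks.
bottlenecks = lower(x)

# Step 2: train only the upper part on the cached bottlenecks. A least-squares
# fit stands in for gradient-based retraining in this sketch.
w_true = rng.normal(size=(8, 2))
y = upper(bottlenecks, w_true)
w_fit, *_ = np.linalg.lstsq(bottlenecks, y, rcond=None)

# The lower part is frozen, so the cached-bottleneck model and the full
# pipeline produce identical predictions.
full_pipeline_pred = upper(lower(x), w_fit)
cached_pred = upper(bottlenecks, w_fit)
```

The lower part never changes, so its output only needs to be computed once per image. This is why each bottleneck file is generated once and then reused by every training run with the same $n$.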

In [27]:
def generate_bottleneck(data_dir, image_size, mapping, bottleneck_model,
                        image_format='png'):
    """
    Generate bottlenecks for all images in data_dir.

    Args:
        data_dir(string): path to the data directory. This directory is
            expected to have n_class subdirectories, where each contains data
            of one class.
        image_size(tuple): expected image shape
        mapping(dict): a dictionary mapping labels to integers
            {
            'label_1': 0,
            'label_2': 1
            }
        bottleneck_model(keras.model): the bottleneck model to extract features
        image_format(string): the image format, default is png
    Return:
        x(array(n, k)): bottlenecks for images
        y(array(n)): labels corresponding to each row of x
        names: the image names corresponding to each row of x
    """
    
    features, labels, names = [], [], []

    # Load the image names and their labels
    for label in mapping:
        sub_dir = join(data_dir, label)

        if not exists(sub_dir):
            print("can't find {} directory".format(label))
            continue

        for image_name in glob(join(sub_dir, "*.{}".format(image_format))):
            # Load and resize image
            image_data = np.array([resize(imread(image_name, as_gray=True),
                                          image_size, mode='constant',
                                          anti_aliasing=None)])

            # Extract features from the image (generate bottleneck)
            # We use [0] because it returns a batch of predictions
            bottleneck_features = bottleneck_model.predict(image_data)[0]

            features.append(bottleneck_features)
            labels.append(mapping[label])
            names.append(basename(image_name))

    # Format the features
    return np.array(features), np.array(labels), np.array(names).astype(str)

Then, we can define a wrapper function that generates bottlenecks for each value of $n$. The pre-trained Inception V3 model in Keras (with ImageNet weights) uses `mixed*` to name the concatenation layer at the end of each Inception module. For example, `mixed10` is the concatenation layer of the 11th (last) Inception module. We use these names to select the range of layers to retrain in this notebook.
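
Under this naming, the `pre_layer` for a given $n$ is `mixed{10 - n}`. This matches the `config` table in section 2.1 and the `'mixed-1'` convention for $n = 11$ used in section 1.2. The sketch below shows the mapping. `pre_layer_for` and `retrained_modules` are hypothetical helpers and are not used elsewhere in this notebook.

```python
def pre_layer_for(n):
    """
    Hypothetical helper: name of the layer whose output is the bottleneck
    when retraining the last n Inception modules.

    n = 1  -> 'mixed9'  (retrain mixed10 only)
    n = 10 -> 'mixed0'  (retrain mixed1 to mixed10)
    n = 11 -> 'mixed-1' (raw pixels; the notebook's naming convention)
    """
    if not 1 <= n <= 11:
        raise ValueError("n must be between 1 and 11, got {}".format(n))
    return 'mixed{}'.format(10 - n)


def retrained_modules(n):
    """Hypothetical helper: the `mixed*` layers retrained for n <= 10."""
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10, got {}".format(n))
    return ['mixed{}'.format(i) for i in range(11 - n, 11)]
```

For example, `make_bottleneck_main(pre_layer_for(1))` would be equivalent to the `make_bottleneck_main('mixed9')` call below.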

In [44]:
def make_bottleneck_main(pre_layer):
    """
    The main function to pre-generate bottlenecks for different starting layers.
    This function saves the bottleneck as npz files encoding three arrays:
    extracted features, labels and image names.
    
    Args:
        pre_layer(string): The name of the layer before the starting layer.
            For example, if you want to retrain 'mixed10', then pre_layer is
            'mixed9'.
    """
    
    # Compile a bottleneck model based on the given pre_layer
    base_model = InceptionV3(weights='imagenet', include_top=True)
    bottleneck_model = Model(
        inputs=base_model.input,
        outputs=base_model.get_layer(pre_layer).output
    )

    # The pre-trained Inception model expects (299, 299, 3) input
    image_size = (299, 299, 3)
    data_dir = "./images/sample_images/processed/augmented/donor_{}/"
    mapping = {'activated': 1, 'quiescent': 0}
    
    # Create output directory
    if not exists('./images/sample_images/bottlenecks'):
        mkdir('./images/sample_images/bottlenecks')

    # Save one bottleneck for each donor per layer
    for donor in [1, 2, 3, 5, 6]:
        print('Start generating {} bottlenecks for donor {}.'.format(pre_layer,
                                                                    donor))
        features, labels, names = generate_bottleneck(data_dir.format(donor),
                                                      image_size, mapping,
                                                      bottleneck_model)

        np.savez(
            "./images/sample_images/bottlenecks/bottleneck_{}_donor_{}.npz".format(
                pre_layer, donor),
            features=features, labels=labels, names=names
        )

In [31]:
# This code takes about 30 minutes
make_bottleneck_main('mixed9')

Start generating mixed9 bottlenecks for donor 1.
Start generating mixed9 bottlenecks for donor 2.
Start generating mixed9 bottlenecks for donor 3.
Start generating mixed9 bottlenecks for donor 5.
Start generating mixed9 bottlenecks for donor 6.


In [68]:
# This code takes about 30 minutes
make_bottleneck_main('mixed8')

Start generating mixed8 bottlenecks for donor 1.
Start generating mixed8 bottlenecks for donor 2.
Start generating mixed8 bottlenecks for donor 3.
Start generating mixed8 bottlenecks for donor 5.
Start generating mixed8 bottlenecks for donor 6.


### 1.2. Extract Image Features



When $n=11$, we train the whole Inception V3 model end-to-end, so there is no need to extract bottlenecks. Instead, we directly resize the images and use their pixel features.

In [42]:
def make_train_data_main():
    """
    The main function to generate right-size images for training a complete
    new Inception v3 model.
    """
    
    # The pre-trained Inception model expects (299, 299, 3) input
    image_size = (299, 299, 3)
    data_dir = "./images/sample_images/processed/augmented/donor_{}/"
    mapping = {'activated': 1, 'quiescent': 0}
    
    # By our naming convention for bottlenecks, the pre_layer here is 'mixed-1'
    pre_layer = 'mixed-1'
    
    # Create output directory
    if not exists('./images/sample_images/bottlenecks'):
        mkdir('./images/sample_images/bottlenecks')

    for donor in [1, 2, 3, 5, 6]:
        print('Start generating {} bottlenecks for donor {}.'.format(pre_layer,
                                                                    donor))
        
        features, labels, names = [], [], []

        # Load the image names and their labels
        for label in mapping:
            sub_dir = join(data_dir, label).format(donor)

            if not exists(sub_dir):
                print("can't find {} directory for donor {}".format(label,
                                                                    donor))
                continue

            for image_name in glob(join(sub_dir, "*.{}".format('png'))):
                # Load and resize image
                image_data = resize(imread(image_name, as_gray=True),
                                    image_size, mode='constant',
                                    anti_aliasing=None)

                features.append(image_data)
                labels.append(mapping[label])
                names.append(basename(image_name))

        np.savez(
            "./images/sample_images/bottlenecks/bottleneck_{}_donor_{}.npz".format(
                pre_layer, donor),
            features=np.array(features),
            labels=np.array(labels),
            names=np.array(names).astype(str)
        )

Since we don't include `n=11` (`mixed-1`) in this notebook, we won't run the code below.

In [45]:
# make_train_data_main()

### 1.3. Data Generator Pipeline

As with the simple neural network and the simple CNN, we want a pipeline that generates training data on multiple cores in real time and feeds it to the model. We can subclass `keras.utils.Sequence` and customize the data generation rule.

In [38]:
class DataGenerator(Sequence):
    """
    Implement the DataGenerator class (a subclass of Sequence), so we can feed
    the training model with better parallel computing support.
    
    In this subclass, we implement the __getitem__ and __len__ methods.
    """
    def __init__(self, x, y, batch_size=32):
        """
        Args:
            x(array(n, ...)): feature array; the first axis indexes samples
            y(array(n, 2)): one-hot label array
            batch_size(int): number of samples per batch
        """
        
        self.x, self.y = shuffle(x, y)

        self.batch_size = batch_size
        self.indexes = np.arange(x.shape[0])

    def __len__(self):
        """
        Return the number of batches per epoch (the last batch may be smaller).
        """
        
        return int(np.ceil(self.x.shape[0] / float(self.batch_size)))

    def __getitem__(self, index):
        """
        This method generates one batch of data.
        
        Args:
            index(int): Current batch index
        """
        
        batch_indexes = self.indexes[index * self.batch_size:
                                     (index + 1) * self.batch_size]
        batch_x = self.x[batch_indexes, :]
        batch_y = self.y[batch_indexes, :]

        return batch_x, batch_y
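
To show how `__len__` and `__getitem__` split the data into batches without Keras, here is a sketch using a hypothetical `ToyBatcher` class. It reuses the indexing logic of `DataGenerator` but drops the `Sequence` base class and the initial shuffle.

```python
import numpy as np


class ToyBatcher:
    """
    Hypothetical, Keras-free stand-in for DataGenerator, used only to
    illustrate how the data is split into batches.
    """

    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.indexes = np.arange(x.shape[0])

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller
        return int(np.ceil(self.x.shape[0] / float(self.batch_size)))

    def __getitem__(self, index):
        # Slice out the sample indexes belonging to this batch
        batch_indexes = self.indexes[index * self.batch_size:
                                     (index + 1) * self.batch_size]
        return self.x[batch_indexes], self.y[batch_indexes]


x = np.arange(10 * 3).reshape(10, 3)      # 10 samples, 3 features each
y = np.eye(2)[np.array([0, 1] * 5)]       # one-hot labels, shape (10, 2)
batcher = ToyBatcher(x, y, batch_size=4)
batch_sizes = [batcher[i][0].shape[0] for i in range(len(batcher))]
```

With 10 samples and `batch_size=4`, there are 3 batches of sizes 4, 4 and 2, so every sample is visited exactly once per epoch. `DataGenerator` shuffles only once, in `__init__`. Keras calls a `Sequence`'s `on_epoch_end()` method at the end of each epoch, and per-epoch reshuffling could be added there if desired.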

Then, we want a function to partition the bottlenecks into train and validation sets according to the nested cross-validation scheme.

Note that `vali_data_generator` and `vali_testing_x` are sampled from the same donor (`vali_did`). `vali_data_generator` includes augmented images and is used for early stopping. `vali_testing_x` excludes augmented images and is used to evaluate the hyper-parameters. This setup arguably makes the inner-loop performance estimates optimistic, but it keeps as many samples as possible in the training set.
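
For reference, the sketch below enumerates the donor-level splits passed to `partition_data_cv()`. It assumes that every ordered pair of distinct donors is used as `(test_did, vali_did)`. Under that assumption, each of the 5 outer test donors is paired with each of the 4 remaining donors for validation. `DONORS` and `splits` are hypothetical names.

```python
from itertools import permutations

DONORS = [1, 2, 3, 5, 6]

# Assumed scheme: each ordered (test donor, validation donor) pair defines one
# inner-loop split, and the remaining three donors form the training set.
splits = []
for test_did, vali_did in permutations(DONORS, 2):
    train_dids = [d for d in DONORS if d not in (test_did, vali_did)]
    splits.append((test_did, vali_did, train_dids))
```

The first split, `(1, 2, [3, 5, 6])`, corresponds to the `test_did=1, vali_did=2` call used to verify the function below.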

In [53]:
def partition_data_cv(bottleneck_dir, pre_layer, batch_size, test_did, vali_did):
    """
    Make train and vali generator for training and inner performance testing.
    
    Args:
        bottleneck_dir(string): directory containing all bottlenecks
        pre_layer(string): one layer before the layer which retraining starts
        batch_size(int): batch size for training
        test_did(int): donor id for test set
        vali_did(int): donor id for validation set
        
    Returns:
        train_data_generator(DataGenerator): DataGenerator for training data
        vali_data_generator(DataGenerator): DataGenerator for validation data
        class_weight(dict): class weight based on label count in the training data
        vali_testing_x(array(n,k)): feature matrix for evaluating the parameter;
            it has no augmented images
        vali_testing_y(array(n, 2)): one-hot label array for evaluating the
            parameter; it has no augmented images
    """
    donors = [1, 2, 3, 5, 6]

    # Fill the training set
    donors.remove(test_did)
    donors.remove(vali_did)
    print("\t=> Start splitting data...")

    train_x = []
    train_y = []

    for d in donors:
        npz = join(bottleneck_dir, "bottleneck_{}_donor_{}.npz".format(
            pre_layer, d
        ))
        npz = np.load(npz)
        train_x.append(npz['features'])
        train_y.extend(npz['labels'])

    train_x = np.vstack(train_x)
    train_y = np.array(train_y)

    # Count the labels for class weighting
    count = Counter(train_y)

    # One-hot encoding for the labels
    train_y = np.vstack([[0, 1] if l else [1, 0] for l in train_y])

    # Fill the validation set for early stopping
    npz = join(bottleneck_dir, "bottleneck_{}_donor_{}.npz".format(pre_layer,
                                                                   vali_did))
    npz = np.load(npz)
    vali_x = npz['features']
    vali_y = np.vstack([[0, 1] if l else [1, 0] for l in npz['labels']])

    # Fill the validation set for inner evaluation
    # This set is the validation set for early stopping without augmented
    # images
    vali_testing_names = npz['names']
    vali_testing_index = np.array(
        [('r' not in n) and ('f' not in n) for n in vali_testing_names]
    )
    vali_testing_x = npz['features'][vali_testing_index]
    vali_testing_y = np.vstack([[0, 1] if l else [1, 0] for l in
                                npz['labels'][vali_testing_index]])

    # Add class weights
    if count[1] > count[0]:
        class_weight = {1: 1.0, 0: count[1] / count[0]}
    else:
        class_weight = {1: count[0] / count[1], 0: 1.0}

    return (DataGenerator(train_x, train_y, batch_size),
            DataGenerator(vali_x, vali_y, batch_size),
            class_weight,
            vali_testing_x,
            vali_testing_y)

In [52]:
# Verify the partition_data_cv function
(train_gen, vali_gen, class_weights, vali_testing_x,
 vali_testing_y) = partition_data_cv('./images/sample_images/bottlenecks',
                                     'mixed9',
                                     16, 1, 2)

print('train_gen batches: {}, vali_gen batches: {}'.format(len(train_gen),
                                                           len(vali_gen)))
print('class_weights: {}'.format(class_weights))
print('vali_testing_x shape: {}, vali_testing_y shape: {}'.format(
    vali_testing_x.shape,
    vali_testing_y.shape
))

	=> Start splitting data...
train_gen batches: 138, vali_gen batches: 30
class_weights: {1: 1.3375796178343948, 0: 1.0}
vali_testing_x shape: (80, 8, 8, 2048), vali_testing_y shape: (80, 2)


## 2. Model Tuning/Training

In this section, we implement nested cross-validation tuning and training. The hyper-parameters are `learning rate`, `batch size` and $n$. `batch size` is passed to `partition_data_cv()`, which uses it to build each `DataGenerator`.
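
The sketch below is a rough illustration of the selection step in nested cross-validation. It assumes that each outer fold (test donor) picks the hyper-parameter setting with the best mean score across its inner validation donors. The score table and the `select_hyperparameters` helper are hypothetical, and the numbers are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inner-loop results, keyed by
# (test_did, vali_did, n, learning_rate, batch_size) -> validation score.
inner_scores = {
    (1, 2, 1, 0.001, 16): 0.80,
    (1, 3, 1, 0.001, 16): 0.84,
    (1, 2, 2, 0.0001, 32): 0.86,
    (1, 3, 2, 0.0001, 32): 0.88,
}


def select_hyperparameters(scores, test_did):
    """Return the (n, lr, batch_size) with the best mean inner-loop score."""
    grouped = defaultdict(list)
    for (t, _vali_did, n, lr, bs), score in scores.items():
        # Only inner-loop runs of this outer fold take part in selection
        if t == test_did:
            grouped[(n, lr, bs)].append(score)
    return max(grouped, key=lambda params: mean(grouped[params]))


best = select_hyperparameters(inner_scores, test_did=1)
```

Averaging over the inner validation donors keeps the choice from depending on a single donor. The test donor's own data is never used during selection.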

### 2.1. Building Network

First, we build the retraining network. As discussed in [1.1. Extract Bottlenecks](#1.1.-Extract-Bottlenecks), we split the complete Inception V3 model into two parts. We use the first part to extract bottlenecks, and we use the bottlenecks to retrain the second part. Here, we want a function that dynamically generates the second part for different values of $n$.

It is not easy to split the layers of the `InceptionV3` model from `keras.applications.inception_v3`. One alternative is to load the original model and freeze the layers we do not retrain (`trainable = False`). However, this does not take advantage of the bottleneck cache, because every training step would still run the images through the frozen layers. Therefore, we follow the `Keras` source code of `InceptionV3` to implement our own construction functions.

The following functions `conv2d_bn()` and `create_model_multiple_layers()` are adapted from [`inception_v3.py`](https://github.com/keras-team/keras/blob/b0f1bb9c7c68e24137a9dc284dc210eb0050a6b4/keras/applications/inception_v3.py), written by the Keras team. The latest version of `inception_v3.py` is maintained [here](https://github.com/keras-team/keras-applications/blob/master/keras_applications/inception_v3.py). The original version has some misleading comments, which were fixed in the newer version. You can learn more in this [issue](https://github.com/keras-team/keras-applications/issues/39).

In [54]:
def conv2d_bn(x, filters, num_row, num_col, padding='same', strides=(1, 1),
              name=None):
    """
    Utility function to apply conv + BN.
    
    Args:
        x: input tensor.
        filters: filters in `Conv2D`.
        num_row: height of the convolution kernel.
        num_col: width of the convolution kernel.
        padding: padding mode in `Conv2D`.
        strides: strides in `Conv2D`.
        name: name of the ops; will become `name + '_conv'`
            for the convolution and `name + '_bn'` for the
            batch norm layer.
            
    Returns:
        Output tensor after applying `Conv2D` and `BatchNormalization`.
    """
    
    if name is not None:
        bn_name = name + '_bn'
        conv_name = name + '_conv'
    else:
        bn_name = None
        conv_name = None
    if K.image_data_format() == 'channels_first':
        bn_axis = 1
    else:
        bn_axis = 3
    x = Conv2D(
        filters, (num_row, num_col),
        strides=strides,
        padding=padding,
        use_bias=False,
        name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x)
    x = Activation('relu', name=name)(x)
    return x

In [55]:
def create_model_multiple_layers(layer=1):
    """
    Build a model that retrains some of the last Inception modules, instead of
    only training a new fully connected layer on top of the bottlenecks.

    Args:
        layer(int): 1 => Retrain mixed10
                    2 => Retrain mixed9, 10
                    3 => Retrain mixed8, 9, 10
                    4 => Retrain mixed7, 8, 9, 10
                    5 => Retrain mixed6, 7, 8, 9, 10
                    6 => Retrain mixed5, 6, 7, 8, 9, 10
                    7 => Retrain mixed4, 5, 6, 7, 8, 9, 10
                    8 => Retrain mixed3, 4, 5, 6, 7, 8, 9, 10
                    9 => Retrain mixed2, 3, 4, 5, 6, 7, 8, 9, 10
                    10 => Retrain mixed1, 2, 3, 4, 5, 6, 7, 8, 9, 10
                    11 => Train the entire Inception v3 from scratch

    Return:
        train_model: the second part of Inception v3 that we want to retrain
            using extracted bottlenecks
    """

    if layer == 11:
        # Generate an Inception v3 model without pre-trained weights
        return InceptionV3(weights=None, include_top=True, classes=2)

    channel_axis = 3

    # Constant config dictionary: layer -> [pre_layer, bottleneck shape]
    config = {
        1: ['mixed9', (8, 8, 2048)],
        2: ['mixed8', (8, 8, 1280)],
        3: ['mixed7', (17, 17, 768)],
        4: ['mixed6', (17, 17, 768)],
        5: ['mixed5', (17, 17, 768)],
        6: ['mixed4', (17, 17, 768)],
        7: ['mixed3', (17, 17, 768)],
        8: ['mixed2', (35, 35, 288)],
        9: ['mixed1', (35, 35, 288)],
        10: ['mixed0', (35, 35, 256)]
    }

    # Create training model
    bottleneck_input = Input(shape=config[layer][1])
    x = bottleneck_input

    # mixed 1: 35 x 35 x 288
    if layer >= 10:
        branch1x1 = conv2d_bn(x, 64, 1, 1)

        branch5x5 = conv2d_bn(x, 48, 1, 1)
        branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

        branch3x3dbl = conv2d_bn(x, 64, 1, 1)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

        branch_pool = AveragePooling2D((3, 3), strides=(1, 1),
                                       padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 64, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch5x5, branch3x3dbl, branch_pool],
            axis=channel_axis,
            name='mixed1')

    # mixed 2: 35 x 35 x 288
    if layer >= 9:
        branch1x1 = conv2d_bn(x, 64, 1, 1)

        branch5x5 = conv2d_bn(x, 48, 1, 1)
        branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

        branch3x3dbl = conv2d_bn(x, 64, 1, 1)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

        branch_pool = AveragePooling2D((3, 3), strides=(1, 1),
                                       padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 64, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch5x5, branch3x3dbl, branch_pool],
            axis=channel_axis,
            name='mixed2')

    # mixed 3: 17 x 17 x 768
    if layer >= 8:
        branch3x3 = conv2d_bn(x, 384, 3, 3, strides=(2, 2), padding='valid')

        branch3x3dbl = conv2d_bn(x, 64, 1, 1)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
        branch3x3dbl = conv2d_bn(
            branch3x3dbl, 96, 3, 3, strides=(2, 2), padding='valid')

        branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
        x = layers.concatenate(
            [branch3x3, branch3x3dbl, branch_pool],
            axis=channel_axis, name='mixed3')

    # mixed 4: 17 x 17 x 768
    if layer >= 7:
        branch1x1 = conv2d_bn(x, 192, 1, 1)

        branch7x7 = conv2d_bn(x, 128, 1, 1)
        branch7x7 = conv2d_bn(branch7x7, 128, 1, 7)
        branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

        branch7x7dbl = conv2d_bn(x, 128, 1, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 1, 7)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

        branch_pool = AveragePooling2D((3, 3), strides=(1, 1),
                                       padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch7x7, branch7x7dbl, branch_pool],
            axis=channel_axis,
            name='mixed4')

    # mixed 5 / 6: 17 x 17 x 768
    mixed_5_6 = []
    if layer >= 6:
        mixed_5_6 = [0, 1]
    elif layer >= 5:
        mixed_5_6 = [1]

    for i in mixed_5_6:
        branch1x1 = conv2d_bn(x, 192, 1, 1)

        branch7x7 = conv2d_bn(x, 160, 1, 1)
        branch7x7 = conv2d_bn(branch7x7, 160, 1, 7)
        branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

        branch7x7dbl = conv2d_bn(x, 160, 1, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 1, 7)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

        branch_pool = AveragePooling2D(
            (3, 3), strides=(1, 1), padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch7x7, branch7x7dbl, branch_pool],
            axis=channel_axis,
            name='mixed' + str(5 + i))

    # mixed 7: 17 x 17 x 768
    if layer >= 4:
        branch1x1 = conv2d_bn(x, 192, 1, 1)

        branch7x7 = conv2d_bn(x, 192, 1, 1)
        branch7x7 = conv2d_bn(branch7x7, 192, 1, 7)
        branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

        branch7x7dbl = conv2d_bn(x, 192, 1, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1)
        branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

        branch_pool = AveragePooling2D((3, 3), strides=(1, 1),
                                       padding='same')(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch7x7, branch7x7dbl, branch_pool],
            axis=channel_axis,
            name='mixed7')

    # mixed 8: 8 x 8 x 1280
    if layer >= 3:
        branch3x3 = conv2d_bn(x, 192, 1, 1)
        branch3x3 = conv2d_bn(branch3x3, 320, 3, 3,
                              strides=(2, 2), padding='valid')

        branch7x7x3 = conv2d_bn(x, 192, 1, 1)
        branch7x7x3 = conv2d_bn(branch7x7x3, 192, 1, 7)
        branch7x7x3 = conv2d_bn(branch7x7x3, 192, 7, 1)
        branch7x7x3 = conv2d_bn(
            branch7x7x3, 192, 3, 3, strides=(2, 2), padding='valid')

        branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x)
        x = layers.concatenate(
            [branch3x3, branch7x7x3, branch_pool], axis=channel_axis,
            name='mixed8')

    # mixed 9 / 10 : 8 x 8 x 2048
    for i in range(2) if layer >= 2 else [1]:
        branch1x1 = conv2d_bn(x, 320, 1, 1)

        branch3x3 = conv2d_bn(x, 384, 1, 1)
        branch3x3_1 = conv2d_bn(branch3x3, 384, 1, 3)
        branch3x3_2 = conv2d_bn(branch3x3, 384, 3, 1)
        branch3x3 = layers.concatenate(
            [branch3x3_1, branch3x3_2],
            axis=channel_axis, name='mixed9_' + str(i)
        )

        branch3x3dbl = conv2d_bn(x, 448, 1, 1)
        branch3x3dbl = conv2d_bn(branch3x3dbl, 384, 3, 3)
        branch3x3dbl_1 = conv2d_bn(branch3x3dbl, 384, 1, 3)
        branch3x3dbl_2 = conv2d_bn(branch3x3dbl, 384, 3, 1)
        branch3x3dbl = layers.concatenate(
            [branch3x3dbl_1, branch3x3dbl_2], axis=channel_axis
        )

        branch_pool = AveragePooling2D(
            (3, 3), strides=(1, 1), padding='same'
        )(x)
        branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
        x = layers.concatenate(
            [branch1x1, branch3x3, branch3x3dbl, branch_pool],
            axis=channel_axis,
            name='mixed' + str(9 + i)
        )

    x = GlobalAveragePooling2D(name='avg_pool')(x)
    predictions = Dense(2, activation='softmax')(x)
    train_model = Model(inputs=bottleneck_input,
                        outputs=predictions)

    return train_model
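
You can check the input shapes hard-coded in the `config` dictionary by hand from the branch widths above. A module's channel count is the sum of its branches' output channels. Each reduction module (`mixed3`, `mixed8`) uses 3x3 kernels with stride 2 and `'valid'` padding. The arithmetic sketch below carries out this check. `conv_output_size` is a hypothetical helper written for it, following Keras' `'same'`/`'valid'` padding rules.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution/pooling layer (Keras semantics)."""
    if padding == 'same':
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # 'valid'


# Channel counts: sum of the branch widths passed to conv2d_bn, plus the input
# channels carried through by the max-pooling branch of reduction modules.
mixed1_channels = 64 + 64 + 96 + 64                 # 1x1, 5x5, 3x3dbl, pool
mixed3_channels = 384 + 96 + 288                    # 3x3, 3x3dbl, max pool of mixed2
mixed4_channels = 192 + 192 + 192 + 192             # 1x1, 7x7, 7x7dbl, pool
mixed8_channels = 320 + 192 + 768                   # 3x3, 7x7x3, max pool of mixed7
mixed9_channels = 320 + (384 + 384) + (384 + 384) + 192

# Spatial sizes after the two stride-2 'valid' reduction modules
after_mixed3 = conv_output_size(35, 3, 2, 'valid')
after_mixed8 = conv_output_size(17, 3, 2, 'valid')
```

These give 288, 768, 768, 1280 and 2048 channels, and spatial sizes of 17 and 8. This agrees with the `(35, 35, 288)`, `(17, 17, 768)`, `(8, 8, 1280)` and `(8, 8, 2048)` entries in `config`.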

In [62]:
# Test the network constructor
model_retrain_mixed10 = create_model_multiple_layers(1)
model_retrain_mixed10.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_16 (InputLayer)           (None, 8, 8, 2048)   0                                            
__________________________________________________________________________________________________
conv2d_932 (Conv2D)             (None, 8, 8, 448)    917504      input_16[0][0]                   
__________________________________________________________________________________________________
batch_normalization_932 (BatchN (None, 8, 8, 448)    1344        conv2d_932[0][0]                 
__________________________________________________________________________________________________
activation_932 (Activation)     (None, 8, 8, 448)    0           batch_normalization_932[0][0]    
__________________________________________________________________________________________________
conv2d_929

In [63]:
# Test the network constructor
model_retrain_mixed9_10 = create_model_multiple_layers(2)
model_retrain_mixed9_10.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_17 (InputLayer)           (None, 8, 8, 1280)   0                                            
__________________________________________________________________________________________________
conv2d_941 (Conv2D)             (None, 8, 8, 448)    573440      input_17[0][0]                   
__________________________________________________________________________________________________
batch_normalization_941 (BatchN (None, 8, 8, 448)    1344        conv2d_941[0][0]                 
__________________________________________________________________________________________________
activation_941 (Activation)     (None, 8, 8, 448)    0           batch_normalization_941[0][0]    
__________________________________________________________________________________________________
conv2d_938

### 2.2. Auto-checkpoint Retraining Process

In our study, we used [Cooley](https://www.alcf.anl.gov/user-guides/cooley) to run the 3,520 nested cross-validation inner-loop jobs. Some jobs that retrain more layers are too slow to finish within the 12-hour time limit. Therefore, we added an auto-checkpoint feature that saves the retraining progress after every epoch. If a job gets evicted, we can resubmit it and continue training from the last saved epoch.

We implement this by writing our own early-stopping callback that also saves the network weights after each epoch. When we implemented this function, saving weights in a multi-threading context triggered a [bug](https://github.com/keras-team/keras/issues/11101), which was still unresolved as of 4/1/2019. Our workaround is to save the weights of each epoch under a different filename. The implementation of `MyEarlyStopping` is based on Keras' [`callbacks.py`](https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L733).
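
This section only shows the saving side of the checkpoint. The sketch below shows how a resubmitted job might pick up from the files that `MyEarlyStopping` writes. It assumes the JSON layout of `wait_dict` (`waited_epoch`, `best`, `epoch`) and the `saved_weights_e{}.h5` pattern of `WEIGHT_PATH`. `load_checkpoint` is hypothetical, and the weight file in the example is an empty placeholder rather than real Keras weights.

```python
import json
import os
import tempfile


def load_checkpoint(config_path, weight_path):
    """
    Hypothetical resume helper: read the JSON written by MyEarlyStopping and
    locate the matching weight file. Returns None when there is nothing to
    resume from, so the caller can start training from scratch.
    """
    if not os.path.exists(config_path):
        return None
    with open(config_path) as f:
        state = json.load(f)
    weight_file = weight_path.format(state["epoch"])
    if not os.path.exists(weight_file):
        # The job may have died after writing the config but before the weights
        return None
    return {
        "initial_epoch": state["epoch"] + 1,   # next (0-indexed) epoch to run
        "wait": state["waited_epoch"],         # restore the patience counter
        "best": state["best"],
        "weight_file": weight_file,
    }


# Example with a temporary directory and a placeholder weight file
tmp = tempfile.mkdtemp()
config_path = os.path.join(tmp, "retrain_config.json")
weight_path = os.path.join(tmp, "saved_weights_e{}.h5")
with open(config_path, "w") as f:
    json.dump({"waited_epoch": 2, "best": 0.41, "epoch": 7}, f)
open(weight_path.format(7), "wb").close()
resume = load_checkpoint(config_path, weight_path)
```

In a real run, the returned values could feed `model.load_weights(resume['weight_file'])` and `initial_epoch=resume['initial_epoch']` in `fit_generator()`. They could also feed `MyEarlyStopping(wait=resume['wait'], baseline=resume['best'])`. Passing `baseline` matters because `on_train_begin()` otherwise resets `best` to ±infinity.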

In [99]:
# This callback and the training function need to know where to pick up
# checkpoints, so we define the following two global constants.

CONFIG_PATH = "./temp/retrain_config.json"
WEIGHT_PATH = "./temp/saved_weights_e{}.h5"

if not exists('./temp'):
    mkdir('./temp')

In [66]:
class MyEarlyStopping(EarlyStopping):
    """
    Subclass of the EarlyStopping callback that records the patience counter
    (`wait`) and saves the weights after every epoch for the auto-checkpoint
    feature. Using this callback adds disk I/O at the end of each epoch.
    """

    def __init__(self,
                 monitor='val_loss',
                 min_delta=0,
                 patience=0,
                 verbose=0,
                 mode='auto',
                 baseline=None,
                 restore_best_weights=False,
                 wait_write_path=CONFIG_PATH,
                 weight_write_path=WEIGHT_PATH,
                 wait=0):
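        # Call Callback.__init__ directly, skipping EarlyStopping.__init__;
        # all attributes, including the restored `wait`, are set below.
        super(EarlyStopping, self).__init__()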

        self.monitor = monitor
        self.baseline = baseline
        self.patience = patience
        self.verbose = verbose
        self.min_delta = min_delta
        self.wait = wait
        self.stopped_epoch = 0
        self.restore_best_weights = restore_best_weights
        self.best_weights = None

        # New attributes for the auto-checkpoint feature
        self.wait_write_path = wait_write_path
        self.weight_write_path = weight_write_path

        if mode not in ['auto', 'min', 'max']:
            print('EarlyStopping mode %s is unknown, fallback to auto mode.'
                  % mode)
            mode = 'auto'

        if mode == 'min':
            self.monitor_op = np.less
        elif mode == 'max':
            self.monitor_op = np.greater
        else:
            if 'acc' in self.monitor:
                self.monitor_op = np.greater
            else:
                self.monitor_op = np.less

        if self.monitor_op == np.greater:
            self.min_delta *= 1
        else:
            self.min_delta *= -1

    def on_train_begin(self, logs=None):
        # Unlike Keras' EarlyStopping, do not reset self.wait here, so a
        # resumed run keeps the patience count passed to the constructor.
        self.stopped_epoch = 0
        if self.baseline is not None:
            self.best = self.baseline
        else:
            self.best = np.inf if self.monitor_op == np.less else -np.inf

    def on_epoch_end(self, epoch, logs=None):
        current = self.get_monitor_value(logs)
        if current is None:
            return

        if self.monitor_op(current - self.min_delta, self.best):
            self.best = current
            self.wait = 0
            if self.restore_best_weights:
                self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                if self.restore_best_weights:
                    if self.verbose > 0:
                        print('Restoring model weights from the end of '
                              'the best epoch')
                    self.model.set_weights(self.best_weights)

        print("epoch: {}, wait: {}, best: {}, current:{}".format(
            epoch, self.wait, self.best, current
        ))

        # Write the `wait` attribute to the config file.
        # We also save the epoch so we can reconstruct the weight filename.
        # float() keeps the value JSON-serializable if it is a NumPy scalar.
        wait_dict = {"waited_epoch": self.wait, "best": float(self.best),
                     "epoch": epoch}
        with open(self.wait_write_path, 'w') as f:
            dump(wait_dict, f, indent=2)

        # Instead of saving model, we only save weights
        self.model.save_weights(self.weight_write_path.format(epoch))

        # Order matters here: if the process dies before the current weights
        # are saved, the previous epoch's weights still exist. Once the
        # current weights are saved, it is safe to delete the previous file.
        if exists(self.weight_write_path.format(epoch - 1)):
            os.remove(self.weight_write_path.format(epoch - 1))

    def on_train_end(self, logs=None):
        if self.stopped_epoch > 0 and self.verbose > 0:
            print('Epoch %05d: early stopping' % (self.stopped_epoch + 1))

    def get_monitor_value(self, logs):
        monitor_value = logs.get(self.monitor)
        if monitor_value is None:
            print(
                'Early stopping conditioned on metric `%s` '
                'which is not available. Available metrics are: %s' %
                (self.monitor, ','.join(list(logs.keys())))
            )
        return monitor_value
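Restoring `wait` and `best` matters because a resumed run should continue the patience count instead of restarting it. The pure-Python sketch below mimics the bookkeeping in `on_epoch_end` for a monitored loss; the function `run_patience` and the loss values are made up for illustration, and `min_delta` is omitted (its default is 0).

```python
import math


def run_patience(losses, patience, wait=0, best=math.inf):
    """Replay early-stopping bookkeeping on per-epoch losses (lower is better).

    Returns (index where training stops or None, wait, best).
    """
    for i, current in enumerate(losses):
        if current < best:
            # Improvement: remember it and reset the patience counter
            best, wait = current, 0
        else:
            wait += 1
            if wait >= patience:
                return i, wait, best
    return None, wait, best


losses = [0.52, 0.50, 0.51, 0.53, 0.55, 0.49]

# Uninterrupted run: three epochs without improvement stop it at index 4
full = run_patience(losses, patience=3)                       # (4, 3, 0.5)

# Interrupted after 3 epochs, then resumed from the saved (wait, best)
_, wait, best = run_patience(losses[:3], patience=3)          # wait=1, best=0.5
resumed = run_patience(losses[3:], patience=3, wait=wait, best=best)  # (1, 3, 0.5)

# Restarting from scratch forgets the stalled epochs and never stops
fresh = run_patience(losses[3:], patience=3)                  # (None, 0, 0.49)
```

The resumed run stops at index 1 of the remaining epochs, which is epoch index 4 overall, the same point as the uninterrupted run.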

### 2.3. Nested Cross-validation

In this section, we implement the actual retraining routine and the nested cross-validation scheme.
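The donor rotation used below can be summarized as follows: each donor serves once as the outer test set, and each remaining donor serves once as the inner validation set. This sketch assumes the training set is formed from the other donors; `partition_data` is not shown in this section, so treat that as an assumption. The generator name `nested_cv_splits` is hypothetical.

```python
def nested_cv_splits(donors):
    """Yield (test_did, vali_did, train_dids) for every outer test donor and
    every inner validation donor (hypothetical helper)."""
    for test_did in donors:
        for vali_did in donors:
            if vali_did == test_did:
                continue
            # Assumption: all remaining donors form the training set
            train_dids = [d for d in donors if d not in (test_did, vali_did)]
            yield test_did, vali_did, train_dids


splits = list(nested_cv_splits([1, 2, 3, 5, 6]))
print(len(splits))  # 20 inner jobs per hyper-parameter combination
print(splits[0])    # (1, 2, [3, 5, 6])
```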

In [69]:
def retrain(train_generator, vali_generator, test_did, vali_did,
            vali_testing_x, vali_testing_y, class_weights, train_model,
            patience, results, wait_count, best, cont_train, epoch=1, nproc=1,
            lr=0.001, native_early_stopping=False, verbose=0, test_name=[]):
    """
    Retrain the given train_model on train data and evaluate the performance on
    vali_testing data.

    Args:
        train_generator(DataGenerator): the training data Sequence
        vali_generator(DataGenerator): the validation data Sequence
        test_did(int): donor id for the test set
        vali_did(int): donor id for the validation set
        vali_testing_x(np.array): feature from validation set to evaluate parameters
            (no augmented images)
        vali_testing_y(np.array): labels from validation set to evaluate parameters
            (no arugmented images)
        epoch(int): training epoch
        nprocs(int): number of workers
        lr(float): learning rate
        class_weights(dict): class weighting for training
            For example, {1: 1.33, 0: 1.0}
        train_model(Model): the training model
        patience(int): early stopping patience
        results(dict): a dictionary to record the training results
        naive_early_stopping(bool): whether to use the native early stopping(no
            auto checkpoints)
        verbose(int): the level of output wordness
        test_name([string]): image name for vali_testing_y
            
    This function does not return any value, but it writes the following dict to
    the `results` argument.

    {
        'acc': accuracy,
        'ap': average precision,
        'auc': auc roc,
        'pr': points to plot the PR curve,
        'roc': points to plot the ROC curve,
        'history': training history,
        'y_true': true y values,
        'y_true_name': image names of y_true,
        'y_score': predicted y values
    }
    """

    # Compile the model
    train_model.compile(optimizer=optimizers.Adam(lr=lr,
                                                  beta_1=0.9,
                                                  beta_2=0.999,
                                                  epsilon=None,
                                                  decay=0.0),
                        metrics=['accuracy'],
                        loss='categorical_crossentropy')
    # print(train_model.summary())

    # Add early stopping
    if native_early_stopping:
        # Only use native early stopping, no model checkpointing
        my_callbacks = [EarlyStopping(monitor='val_loss', patience=patience)]
    else:
        my_callbacks = [MyEarlyStopping(monitor='val_loss', patience=patience,
                                        wait=wait_count, baseline=best)]
    # train the model on the given data
    # Caveat: the old version of Keras requires argument steps_per_epoch
    # and validation_steps
    print("\t=> Start training")
    hist = train_model.fit_generator(generator=train_generator,
                                     validation_data=vali_generator,
                                     steps_per_epoch=len(train_generator),
                                     validation_steps=len(vali_generator),
                                     epochs=epoch,
                                     callbacks=my_callbacks,
                                     class_weight=class_weights,
                                     verbose=verbose,
                                     use_multiprocessing=True,
                                     workers=nproc)

    # Evaluate the trained model on the inner validation set
    vali_predict = train_model.predict(vali_testing_x)

    # sklearn's ranking metrics (ROC AUC, AP, PR curve) need a 1-D array of
    # positive-class (class 1) probabilities, not the 2-column softmax output
    y_predict_prob = [x[1] for x in vali_predict]
    vali_y_1d = [np.argmax(i) for i in vali_testing_y]
    vali_predict_1d = [np.argmax(i) for i in vali_predict]

    auc = metrics.roc_auc_score(vali_y_1d, y_predict_prob)
    ap = metrics.average_precision_score(vali_y_1d, y_predict_prob)
    acc = metrics.accuracy_score(vali_y_1d, vali_predict_1d)

    precisions, recalls, thresholds = metrics.precision_recall_curve(
        vali_y_1d,
        y_predict_prob
    )

    fprs, tprs, roc_thresholds = metrics.roc_curve(vali_y_1d, y_predict_prob)

    # Record the results
    results[str((test_did, vali_did))] = {
        'acc': acc,
        'ap': ap,
        'auc': auc,
        'pr': [precisions.tolist(), recalls.tolist(), thresholds.tolist()],
        'roc': [fprs.tolist(), tprs.tolist(), roc_thresholds.tolist()],
        'history': hist.history,
        # Convert int64 and float32 to int and float
        'y_true': list(map(int, vali_y_1d)),
        'y_true_name': test_name,
        'y_score': list(map(float, y_predict_prob))
    }
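The evaluation step in `retrain` converts one-hot labels and two-column softmax outputs into the 1-D inputs that scikit-learn's metrics expect. The toy arrays below are made up for illustration; slicing column 1 is equivalent to `[x[1] for x in vali_predict]`.

```python
import numpy as np
from sklearn import metrics

# Toy one-hot labels and softmax outputs for 4 images (class 1 = positive)
toy_onehot = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
toy_softmax = np.array([[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]])

y_true = toy_onehot.argmax(axis=1)    # one-hot -> class index: [0, 0, 1, 1]
y_score = toy_softmax[:, 1]           # probability of class 1
y_pred = toy_softmax.argmax(axis=1)   # hard predictions for accuracy

auc = metrics.roc_auc_score(y_true, y_score)
ap = metrics.average_precision_score(y_true, y_score)
acc = metrics.accuracy_score(y_true, y_pred)
print(auc, ap, acc)  # auc = 0.75, ap ~ 0.833, acc = 0.75
```

AUC and AP are computed from the probability scores, while accuracy needs hard labels, which is why the function keeps both `y_predict_prob` and `vali_predict_1d`.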

In [87]:
def cross_validation_main(pre_layer, lr, batch_size, epoch=500, test_did=None,
                          vali_did=None):
    """
    The main function for parameter tuning using nested cross validation.
    
    Args:
        pre_layer(string): one layer before the training layer,
            with format mixed\d+.
        lr(float): learning rate
        batch_size(int): batch size
        test_did(int): the donor id for the test set
        vali_did(int): the donor id for the validation set
        
    This function does not return anything, but saves the training and inner-
    loop evaluations in a JSON file.
    """

    # Training constants
    bottleneck_dir = "./images/sample_images/bottlenecks"
    
    # Translate string pre_layer into the integer we used
    # in create_model_multiple_layers()
    if '-1' in pre_layer:
        num_layer = 11
    else:
        num_layer = 10 - int(re.sub(r'mixed(\d+)', r'\1', pre_layer))
    
    # Set up early stopping patience and an empty dict to record training results
    patience = 20
    nproc = 6
    results = {}

    # Run every (test, validation) donor pair in one job if the sets are
    # not given
    if not test_did or not vali_did:
        # Rotation for different test donor and validation donor
        donors = [1, 2, 3, 4, 5, 6]
        for test_did in donors:
            for vali_did in [i for i in donors if i != test_did]:
                print("Working on cur combination: test={}, vali={}".
                      format(test_did, vali_did))

                (train_gen, vali_gen, class_weights, vali_testing_x,
                 vali_testing_y) = partition_data(bottleneck_dir, pre_layer,
                                                  batch_size, test_did,
                                                  vali_did)

                # Generate a brand new model for each training
                train_model = create_model_multiple_layers(layer=num_layer)

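                # Retrain this model and evaluate it on the validation set.
                # A fresh run starts with wait_count=0, best=None and
                # cont_train=False (retrain requires these positionally).
                retrain(train_gen, vali_gen, test_did, vali_did,
                        vali_testing_x, vali_testing_y, class_weights,
                        train_model, patience, results, 0, None, False,
                        epoch=epoch, nproc=nproc, lr=lr)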

                # Overwrite the results json after each combination
                dump(results, open("results_{}_{}_{}.json".format(pre_layer,
                                                                  lr,
                                                                  batch_size),
                                   'w'), indent=2)
    else:
        # Only train and evaluate following the given vali and test set
        print("Working on combination: test={}, vali={}".format(test_did,
                                                                vali_did))

        (train_gen, vali_gen, class_weights, vali_testing_x,
         vali_testing_y) = partition_data(bottleneck_dir, pre_layer,
                                          batch_size, test_did, vali_did)

        # Check if we should continue training or start a new session
        if exists(CONFIG_PATH):
            print("Found existing config path.")
            # Get the epoch info
            config_dict = load(open(CONFIG_PATH, 'r'))
            weights_h5 = WEIGHT_PATH.format(config_dict["epoch"])
            if exists(weights_h5):
                train_model = create_model_multiple_layers(layer=num_layer)
                train_model.load_weights(weights_h5)
                wait_count = config_dict["waited_epoch"]
                best = config_dict["best"]
                cont_train = True
                print("\t=> Continue training, with wait_count={}, best={}".
                      format(wait_count, best))
            else:
                print("Found config json but no weights '{}'!".format(
                    weights_h5
                ))
                return

        else:
            print("Start new training.")
            train_model = create_model_multiple_layers(layer=num_layer)
            wait_count = 0
            cont_train = False
            best = None

        # Retrain this model and evaluate it on the validation set
        retrain(train_gen, vali_gen, test_did, vali_did,
                vali_testing_x, vali_testing_y, class_weights,
                train_model, patience, results, wait_count, best, cont_train,
                epoch=epoch, nproc=nproc, lr=lr)

        # Save the result dictionary
        dump(results, open('./temp/results_{}_{}_{}_{}_{}.json'.format(
            pre_layer, lr, batch_size, test_did, vali_did), 'w'), indent=2)
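`cross_validation_main` converts `pre_layer` into $n$, the number of trailing Inception modules to retrain. Keras names the outputs of the 11 Inception modules in Inception V3 `mixed0` to `mixed10`, and `"mixed-1"` is this notebook's label for retraining all of them. The hypothetical helper below reproduces the same arithmetic ($n = 10 - N$ for `mixed<N>`) with one regular expression that also accepts `-1`.

```python
import re


def layer_to_n(pre_layer):
    """Return n, the number of trailing Inception modules retrained when
    training starts right after `pre_layer` ("mixed-1" ... "mixed9")."""
    match = re.fullmatch(r"mixed(-?\d+)", pre_layer)
    if match is None:
        raise ValueError("expected 'mixed<N>', got {!r}".format(pre_layer))
    # mixed10 is the last module, so starting after mixedN leaves 10 - N modules
    return 10 - int(match.group(1))


for name in ["mixed9", "mixed8", "mixed0", "mixed-1"]:
    print(name, layer_to_n(name))  # 1, 2, 10 and 11 respectively
```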

The code below runs one inner-loop job with the hyper-parameters `learning rate` = 0.01, `batch size` = 32 and $n$ = 1 (`pre_layer='mixed9'`), using donor 1 as the test set and donor 2 as the validation set. It only trains for one epoch.

In [86]:
cross_validation_main('mixed9', 0.01, 32, epoch=1, test_did=1, vali_did=2)

Working on combination: test=1, vali=2
	=> Start splitting data...
Start new training.
	=> Start training
epoch: 0, wait: 0, best: 0.524006990591685, current:0.524006990591685


In [98]:
result_dict = load(open('./temp/results_{}_{}_{}_{}_{}.json'.format(
    'mixed9', 0.01, 32, 1, 2), 'r'))

print("After 1 epoch of layer={}, lr={}, bs={}, test_id={} and vali_id={}, ".format(
    'mixed9', 0.01, 32, 1, 2), end='')
      
print("we have acc={:.4f} and auc={:.4f}, ap={:.4f}.".format(
    result_dict[str((1, 2))]['acc'],
    result_dict[str((1, 2))]['auc'],
    result_dict[str((1, 2))]['ap']))

After 1 epoch of layer=mixed9, lr=0.01, bs=32, test_id=1 and vali_id=2, we have acc=0.8625 and auc=0.9949, ap=0.9989.


Next, we can tune the hyper-parameters using nested cross-validation. In our study, the hyper-parameter candidates are:

- `layers = ["mixed-1", "mixed0", ..., "mixed9"]`
- `lrs = [0.00001, 0.0001, 0.001, 0.01]`
- `batch_sizes = [8, 16, 32, 64]`

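In this notebook, we reduce each candidate list to $2$ values. This still requires $2 \times 2 \times 2 \times 5 \times 4 = 160$ inner jobs: $2 \times 2 \times 2$ hyper-parameter combinations, times $5$ test donors, times $4$ validation donors per test donor.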
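For comparison, the full candidate grid listed above (11 layers, 4 learning rates and 4 batch sizes), assuming the same $5 \times 4$ (test, validation) donor pairs, would require 22 times as many inner jobs as the reduced grid:

$$
\underbrace{11}_{\text{layers}} \times \underbrace{4}_{\text{learning rates}} \times \underbrace{4}_{\text{batch sizes}} \times \underbrace{5 \times 4}_{\text{donor pairs}} = 176 \times 20 = 3520
$$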

Instead of running these $160$ jobs in a nested for-loop on one cluster, we recommend submitting each inner-loop run as an independent job and running the jobs on different clusters. We do not run the code below in this notebook; the merged results are stored at `./resource/transfer_cv_results.json`.
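One way to follow this recommendation is to generate one argument line per inner job and let a cluster scheduler launch each line as a separate job that calls `cross_validation_main(layer, lr, bs, epoch=500, test_did=..., vali_did=...)`. The sketch below only builds the argument lines; the helper `make_job_args` and the CSV layout are assumptions, since the notebook does not specify how the jobs were submitted.

```python
import csv
import io
import itertools


def make_job_args(layers, lrs, batch_sizes, donors):
    """Hypothetical helper: one (layer, lr, batch_size, test_did, vali_did)
    tuple per independent inner job."""
    pairs = [(t, v) for t in donors for v in donors if v != t]
    return [(layer, lr, bs, t, v)
            for layer, lr, bs in itertools.product(layers, lrs, batch_sizes)
            for t, v in pairs]


jobs = make_job_args(["mixed8", "mixed9"], [0.001, 0.01], [16, 32],
                     [1, 2, 3, 5, 6])

# Write one comma-separated line per job; each line would become the
# arguments of one independent cluster job.
buffer = io.StringIO()
csv.writer(buffer).writerows(jobs)
lines = buffer.getvalue().splitlines()
print(len(lines))  # 160
print(lines[0])    # mixed8,0.001,16,1,2
```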

In [84]:
layer_candidates = ['mixed8', 'mixed9']
lr_candidates = [0.001, 0.01]
bs_candidates = [16, 32]

# Grid search the best parameter
donors = [1, 2, 3, 5, 6]
for layer in layer_candidates:
    for lr in lr_candidates:
        for bs in bs_candidates:
            for test_did in donors:
                for vali_did in [i for i in donors if i != test_did]:
                    pass
#                    cross_validation_main(layer, lr, bs, epoch=500,
#                                          test_did=test_did, vali_did=vali_did)