In this notebook we train a convolutional neural network (based upon ResNetv2) to classify routes by their grades and achieve 70% accuracy on the test dataset (using a one-out accuracy: a prediction counts as correct if the true grade is within ±1 of our guess).  We experiment with three different loss functions to try to take advantage of the ordering of our labels (grades can be arranged on a number line).  For our loss functions we use:
- CJS (cumulative Jensen-Shannon divergence), https://arxiv.org/pdf/1708.07089.pdf
- Squared earth mover's distance (or Wasserstein metric), https://arxiv.org/pdf/1611.05916.pdf
- Cross-entropy loss (the standard loss function for classification problems, which ignores the ordering of the labels)

Our one-out accuracy results on the test dataset are:

|Loss function|Accuracy (one-out)|
|---|---|
|CJS|69.4%|
|Squared earth mover's distance|70.8%|
|Cross-entropy|64.2%|
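
The one-out accuracy reported in the table can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the grades are encoded as consecutive integers; the function name `one_out_accuracy` and the example labels are hypothetical and not part of the notebook.

```python
import numpy as np


def one_out_accuracy(y_true, y_pred):
    """Fraction of predictions that are within one grade of the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.abs(y_pred - y_true) <= 1))


# Hypothetical integer grade labels and predictions
y_true = np.array([3, 5, 7, 2])
y_pred = np.array([4, 5, 9, 0])
print(one_out_accuracy(y_true, y_pred))  # 0.5
```

Here the first two predictions are within one grade of the truth and the last two are not, so half of them count as correct.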

Convolutional neural network model: this model is based upon a 14-layer ResNetv2 but with a few key differences: we add dropout layers and scale the residuals as in Inception-ResNet (https://arxiv.org/pdf/1602.07261.pdf).
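
As a rough illustration of the residual scaling, the sketch below adds a scaled residual branch to a shortcut in NumPy. The factor 0.2 is the one used in the model code of this notebook; the function name `scaled_residual_add` and the array values are hypothetical.

```python
import numpy as np


def scaled_residual_add(shortcut, residual, scale=0.2):
    """Add the residual branch to the shortcut after scaling it down."""
    return shortcut + scale * residual


# Hypothetical activations: a shortcut of ones and a residual branch of fives
shortcut = np.ones((2, 4))
residual = np.full((2, 4), 5.0)
out = scaled_residual_add(shortcut, residual)
print(out[0])
```

Scaling the residual down before the addition limits how much each unit can change the signal carried by the shortcut path.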

### Building the data pipeline
We load the input data, create the training, validation and test datasets, ensuring that each contains the same proportion of grade 6, 7 and 8 routes, and then build the data pipeline.
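
The idea of splitting each grade band separately can be sketched as follows. This is a simplified illustration, assuming the labels are sorted in ascending order; `stratified_split`, `boundaries` and the toy labels are hypothetical names and data, not the notebook's own code.

```python
import numpy as np


def stratified_split(labels, boundaries, val=0.2, test=0.1, seed=0):
    """Split the indices of a sorted label array into train/val/test so that
    every band between `boundaries` contributes the same fractions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # band edges: start of the array, first index of each boundary, end of the array
    edges = np.concatenate(([0], np.searchsorted(labels, boundaries), [len(labels)]))
    train_idx, val_idx, test_idx = [], [], []
    for start, stop in zip(edges[:-1], edges[1:]):
        idx = rng.permutation(np.arange(start, stop))  # shuffle within the band
        n_val = int(val * len(idx))
        n_test = int(test * len(idx))
        n_train = len(idx) - n_val - n_test
        train_idx.append(idx[:n_train])
        val_idx.append(idx[n_train:n_train + n_val])
        test_idx.append(idx[n_train + n_val:])
    return (np.concatenate(train_idx), np.concatenate(val_idx),
            np.concatenate(test_idx))


# Hypothetical sorted labels: 20 routes below 5, 10 routes in [5, 10), 10 routes above
labels = np.sort(np.concatenate([np.full(20, 2), np.full(10, 6), np.full(10, 12)]))
train_idx, val_idx, test_idx = stratified_split(labels, boundaries=[5, 10])
print(len(train_idx), len(val_idx), len(test_idx))  # 28 8 4
```

Because each band is shuffled and sliced on its own, a rare band cannot end up entirely in one of the three datasets.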

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from IPython.utils import io

from skopt import gp_minimize, callbacks, load
from skopt.space import Real
from skopt.utils import use_named_args
from skopt.plots import plot_convergence
from skopt.callbacks import CheckpointSaver
# Need skopt.__version__ > 0.5.2 or pip install git+https://github.com/scikit-optimize/scikit-optimize/

%matplotlib inline
plt.style.use('seaborn-white')
plt.rcParams["image.origin"] = 'lower'
plt.rcParams['figure.figsize'] = (10.0, 8.0)

# ignore warnings about Tensorflow API v2.0
tf.logging.set_verbosity(tf.logging.ERROR)

In [2]:
def load_data(train=0.7, val=0.2, test=0.1, data_format='channels_first'):
    """
    Loads the datasets from data/data_user.npz and randomly creates the train,
    validation and test datasets.

    Inputs:
    - train, val, test: the fraction of the dataset in the train dataset, validation dataset
      and test dataset respectively
    - data_format: A string, one of 'channels_first' (default) or 'channels_last'.
      'channels_last' best for CPU and 'channels_first' best for GPU
    Returns:
    - 6 numpy arrays: X_train, y_train, X_val, y_val, X_test, y_test in either
      channels_last or channels_first format
    - grade_dict : a dictionary converting the grades to numerical scores.
    """
    # check fraction of datasets sum up to 1 (ignoring float rounding errors)
    assert np.isclose(train + val + test, 1)

    # load the data, n.b. arrays are sorted by grade
    # (allow_pickle=True is needed on recent NumPy to read the pickled grade_dict)
    loaded = np.load('data/data_user.npz', allow_pickle=True)
    moves = loaded['moves']
    grades = loaded['grades']
    grade_dict = loaded['grade_dict'][()]
    
    # sum over the last axis (start, middle or end hold);
    # this doesn't change the test accuracy
    moves = np.sum(moves, axis=-1, keepdims=True)

    # Find partition arguments between grade 6, 7 & 8
    part_arg = np.searchsorted(grades, [grade_dict['7A'], grade_dict['8A']])

    # now shuffle within the grade 6's, 7's and 8's
    permute_idx = np.arange(grades.shape[0])
    np.random.shuffle(permute_idx[:part_arg[0]])
    np.random.shuffle(permute_idx[part_arg[0]:part_arg[1]])
    np.random.shuffle(permute_idx[part_arg[1]:])
    moves = moves[permute_idx]
    grades = grades[permute_idx]

    # data processing
    if data_format == 'channels_first':
        moves = np.moveaxis(moves, -1, 1)
    moves = moves.astype(np.float32)

    # create the train, val and test datasets from the grade classes
    part_start = np.append(0, part_arg)
    size = np.array([part_arg[0], part_arg[1] - part_arg[0],
                     len(grades) - part_arg[1]])

    num_val = (val * size).astype(int)
    num_test = (test * size).astype(int)
    num_train = (size - num_val - num_test).astype(int)

    # generate the training, val and test sets
    slice_range = [part_start,
                   part_start + num_train,
                   part_start + num_train + num_val,
                   part_start + num_train + num_val + num_test]
    X, y = [], []
    for j in range(3):
        grade_list, moves_list = [], []
        for i in range(3):
            grade_list.append(grades[slice_range[j][i]: slice_range[j+1][i]])
            moves_list.append(moves[slice_range[j][i]: slice_range[j+1][i]])
        X.append(np.concatenate(moves_list))
        y.append(np.concatenate(grade_list))

    X_train, X_val, X_test = X
    y_train, y_val, y_test = y

    # check: sets are the correct length
    assert (len(y_val) == np.sum(num_val) and len(y_test) == np.sum(num_test)
            and len(y_train) == np.sum(num_train))

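    # Normalize the data with training-set statistics: subtract the mean and divide by std.
    # n.b. axis=(0, 1, 2) gives per-channel statistics for channels_last data; for
    # channels_first data of shape (N, C, H, W) it gives per-column statistics.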
    mean_pixel = X_train.mean(axis=(0, 1, 2), keepdims=True)
    std_pixel = X_train.std(axis=(0, 1, 2), keepdims=True)
    X_train = (X_train - mean_pixel) / std_pixel
    X_val = (X_val - mean_pixel) / std_pixel
    X_test = (X_test - mean_pixel) / std_pixel

    return X_train, y_train, X_val, y_val, X_test, y_test, grade_dict


data_format = 'channels_first'
X_train, y_train, X_val, y_val, X_test, y_test, grade_dict = load_data(data_format=data_format)
num_classes = len(grade_dict) - 1

print('Train data shape: ', X_train.shape, X_train.dtype)
print('Train labels shape: ', y_train.shape, y_train.dtype)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (2575, 1, 18, 11) float32
Train labels shape:  (2575,) int32
Validation data shape:  (733, 1, 18, 11)
Validation labels shape:  (733,)
Test data shape:  (366, 1, 18, 11)
Test labels shape:  (366,)


In [3]:
batch_size = 256


def construct_datasets(num_epochs=1):
    """
    Constructs the datasets in Tensorflow.

    Inputs: 
    - num_epochs: The number of epochs to run the training data for

    Outputs:
    - next_element_train, next_element_test: get_next() method for train and test dataset iterators.
      The next_element_test is either from the validation or testing dataset depending on which has
      been initialised
    - train_init_op, val_init_op, test_init_op:
      iterator initialisation operations for the respective datasets
    - steps_to_epochs: The integer number of training steps (batches) per epoch of the training dataset
    """
    prefetch = 2

    # make sure the dataset is on the CPU to leave the GPU for training the model
    with tf.device('/cpu:0'):
        with tf.variable_scope('train_dataset'):
            dataset_train = tf.data.Dataset.from_tensor_slices(
                (X_train, y_train))
            dataset_train = dataset_train.apply(
                tf.data.experimental.shuffle_and_repeat(len(X_train), count=num_epochs))
            dataset_train = dataset_train.shuffle(len(X_train))
            dataset_train = dataset_train.batch(batch_size).prefetch(prefetch)

        with tf.variable_scope('validation_dataset'):
            dataset_val = tf.data.Dataset.from_tensor_slices((X_val, y_val))
            dataset_val = dataset_val.batch(batch_size).prefetch(prefetch)
        with tf.variable_scope('test_dataset'):
            dataset_test = tf.data.Dataset.from_tensor_slices((X_test, y_test))
            dataset_test = dataset_test.batch(batch_size).prefetch(prefetch)

        iterator_train = tf.data.Iterator.from_structure(dataset_train.output_types,
                                                         dataset_train.output_shapes)
        next_element_train = iterator_train.get_next()
        iterator_test = tf.data.Iterator.from_structure(dataset_train.output_types,
                                                        dataset_train.output_shapes)
        next_element_test = iterator_test.get_next()

        train_init_op = iterator_train.make_initializer(dataset_train)
        val_init_op = iterator_test.make_initializer(dataset_val)
        test_init_op = iterator_test.make_initializer(dataset_test)
        steps_to_epochs = len(X_train) // batch_size

    return next_element_train, next_element_test, train_init_op, val_init_op, test_init_op, steps_to_epochs

## Define the neural network

Our neural network is a deep network based upon ResNetv2 and has the same structure as the CIFAR-10 version of ResNetv2.

In [4]:
def model_ResNetv2(inputs, is_training, total_layers=20, num_classes=10, reg=2e-4,
                   drop_rate=0.5, data_format='channels_first', scaling=False):
    """
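    Creates a ResNetv2 model based upon CIFAR-10 ResNet.
    Total_layers = 6n + 2

    Inputs:
    - inputs: A Tensor of input images, laid out according to data_format
    - is_training: A bool (or bool Tensor), True when training; controls
      batch normalisation and dropout
    - total_layers: The total number of layers, must be of the form 6n + 2
    - num_classes: The number of output classes
    - reg: The scale of the L2 regularisation on the kernels
    - drop_rate: The dropout rate applied after each convolution (0 disables dropout)
    - data_format: A string, one of 'channels_first' or 'channels_last'
    - scaling: A bool, if True scale the residuals by 0.2 before the addition

    Returns:
    - scores: A Tensor of shape [batch_size, num_classes] of unnormalised scores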
    """
    assert (total_layers - 2) % 6 == 0
    num_layers = (total_layers - 2) // 6
    filters = [16, 32, 64]
    
    initializer = tf.contrib.layers.variance_scaling_initializer()

    # Helper layer functions
    def batch_norm_relu_conv2d(inputs, filters, stride=1):
        inputs = batch_norm_relu(inputs)
        inputs = conv2d(inputs, filters, stride=stride)
        return inputs


    def batch_norm_relu(inputs):
        inputs = tf.layers.batch_normalization(inputs, training=is_training)
        return tf.nn.relu(inputs)


    def conv2d(inputs, filters, kernel_size=3, stride=1):
        inputs = tf.layers.conv2d(inputs, filters, kernel_size, strides=stride, padding="same", kernel_initializer=initializer,
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(
                                      scale=reg),
                                  data_format=data_format)
        if drop_rate != 0:
            # n.b rate = 1 - keep_prob
            inputs = tf.layers.dropout(inputs, rate=drop_rate, training=is_training)
        return inputs

    # Resnet unit
    def ResNet_unit(inputs, filters, i, j, subsample):
        with tf.variable_scope(f"conv{i+2}_{j+1}"):
            shortcut = inputs
            stride = 2 if subsample else 1

            # for the first unit batch_norm_relu before splitting into two paths
            if i == 0 and j == 0:
                inputs = batch_norm_relu(inputs)
                shortcut = inputs
                inputs = conv2d(inputs, filters, stride=stride)
            else:
                inputs = batch_norm_relu_conv2d(inputs, filters, stride=stride)
            inputs = batch_norm_relu_conv2d(inputs, filters)

            if subsample:
                if data_format == 'channels_last':
                    paddings = tf.constant(
                        [[0, 0], [0, 0], [0, 0], [0, filters // 2]])
                    # reduce image height and width by striding as in resnet paper
                    shortcut = shortcut[:, ::2, ::2, :]
                else:
                    paddings = tf.constant(
                        [[0, 0], [0, filters // 2], [0, 0], [0, 0]])
                    shortcut = shortcut[:, :, ::2, ::2]
                shortcut = tf.pad(shortcut, paddings)
            # scale activation ala Inception-ResNet
            if scaling is True:
                inputs = 0.2 * inputs
            inputs = shortcut + inputs

            return inputs
    
    # Construct ResNet

    # first do a single convolution with no residual addition
    with tf.variable_scope("conv1"):
        inputs = conv2d(inputs, filters[0], stride=2)

    # now some ResNet units
    for i in range(3):
        for j in range(num_layers):
            # don't subsample on first go round
            subsample = i > 0 and j == 0
            inputs = ResNet_unit(inputs, filters[i], i, j, subsample)
    
    with tf.variable_scope("pool_and_fc"):
        # Final activation
        inputs = tf.nn.relu(inputs)

        # Global average pooling, num_classes-way FC layer and then output to scores.
        # Global average pooling is same as doing reduce_mean
        if data_format == 'channels_last':
            reduce_axis = [1, 2]
        else:
            reduce_axis = [2, 3]
        inputs = tf.reduce_mean(inputs, axis=reduce_axis)
        inputs = tf.layers.flatten(inputs)
        inputs = tf.layers.batch_normalization(inputs, training=is_training)
        scores = tf.layers.dense(inputs, num_classes, kernel_initializer=initializer,
                                 kernel_regularizer=tf.contrib.layers.l2_regularizer(
                                     scale=reg))
    return scores
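
When a unit subsamples, the shortcut has to match the new spatial size and channel count of the residual branch. The NumPy sketch below mirrors the strided slicing and zero-padding used for `channels_first` data in the model above; the function name `subsample_shortcut` and the input shape are hypothetical.

```python
import numpy as np


def subsample_shortcut(shortcut, filters):
    """Halve the spatial size by striding and zero-pad the channels of a
    channels_first tensor of shape (N, C, H, W)."""
    # keep every second row and column
    shortcut = shortcut[:, :, ::2, ::2]
    # append filters // 2 channels of zeros after the existing channels
    paddings = [(0, 0), (0, filters // 2), (0, 0), (0, 0)]
    return np.pad(shortcut, paddings)


# Hypothetical activations: 4 examples, 16 channels, 9 x 6 feature maps
x = np.ones((4, 16, 9, 6), dtype=np.float32)
out = subsample_shortcut(x, filters=32)
print(out.shape)  # (4, 32, 5, 3)
```

The resulting 5 x 3 maps agree with the output size of a stride-2 convolution with "same" padding, so the two paths can be added elementwise.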

A small test to check that our neural network works correctly.

In [5]:
def test_model_ResNet_fc():
    """ A small unit test for model_ResNetv2 above. """
    tf.reset_default_graph()

    x = tf.zeros((50, 3, 18, 11))
    scores = model_ResNetv2(x, 1, num_classes=num_classes, drop_rate=0.8,
                            data_format='channels_first', scaling=True)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)

test_model_ResNet_fc()

(50, 15)


In [8]:
def sparse_earth_mover(labels, logits, p=2):
    """
    Computes the normalised squared Earth Mover’s Distance loss from https://arxiv.org/pdf/1611.05916.pdf.
    Since our classes are ordered this loss behaves much better than the usual softmax cross entropy.

    Inputs:
    - labels: Tensor of shape [batch_size] and dtype int32 or int64.
      Each entry in labels must be an index in [0, num_classes)
    - logits: Unscaled log probabilities of shape [batch_size, num_classes]
    - p: which l^p norm to use. p = 2 represents the squared Earth mover's distance

    Returns:
    - loss: A Tensor of the same shape as labels and of the same type as logits with the earth mover's distance loss.
    """

    with tf.name_scope("sparse_earth_mover"):
        num_classes = tf.shape(logits)[-1]

        logits_normed = tf.nn.softmax(logits)
        one_hot_labels = tf.one_hot(labels, num_classes)

        cdf_labels = tf.cumsum(one_hot_labels, axis=-1)
        cdf_logits = tf.cumsum(logits_normed, axis=-1)
        if p == 2:
            loss = tf.sqrt(tf.reduce_mean(
                tf.square(cdf_labels - cdf_logits), axis=-1))
        elif p == 1:
            loss = tf.reduce_mean(
                tf.abs(cdf_labels - cdf_logits), axis=-1)
        else:
            # take the absolute value so that odd p gives a valid norm
            loss = (tf.reduce_mean(
                tf.abs(cdf_labels - cdf_logits) ** p, axis=-1)) ** (1.0 / p)

    return loss
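
To see why a loss on the cumulative distributions respects the ordering of the grades, the NumPy sketch below compares a near miss with a far miss. It is a minimal illustration that takes probabilities directly instead of logits; `emd_loss`, `cross_entropy` and the example probabilities are hypothetical.

```python
import numpy as np


def emd_loss(labels, probs, p=2):
    """l^p distance between the CDFs of the one-hot labels and the predictions."""
    probs = np.asarray(probs, dtype=float)
    num_classes = probs.shape[-1]
    one_hot = np.eye(num_classes)[np.asarray(labels)]
    cdf_labels = np.cumsum(one_hot, axis=-1)
    cdf_probs = np.cumsum(probs, axis=-1)
    return np.mean(np.abs(cdf_labels - cdf_probs) ** p, axis=-1) ** (1.0 / p)


def cross_entropy(labels, probs):
    """Negative log probability assigned to the true class."""
    probs = np.asarray(probs, dtype=float)
    return -np.log(probs[np.arange(len(labels)), labels])


# The true class is 0 in both rows; row 0 bets on class 1, row 1 on class 4
labels = np.array([0, 0])
probs = np.array([[0.025, 0.9, 0.025, 0.025, 0.025],
                  [0.025, 0.025, 0.025, 0.025, 0.9]])
emd = emd_loss(labels, probs)
ce = cross_entropy(labels, probs)
print("earth mover:", emd)
print("cross-entropy:", ce)
```

The cross-entropy is identical for both rows because it only looks at the probability of the true class, while the earth mover's distance is larger for the prediction that is further away on the grade scale.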

In [9]:
def CJS(labels, logits):
    """Computes the symmetrical discrete cumulative Jensen-Shannon divergence from https://arxiv.org/pdf/1708.07089.pdf

    Inputs:
    - labels: Tensor of shape [batch_size] and dtype int32 or int64.
      Each entry in labels must be an index in [0, num_classes)
    - logits: Unscaled log probabilities of shape [batch_size, num_classes]

    Returns:
    - loss: A Tensor of the same shape as labels and of the same type as logits with the CJS loss.
    """
    with tf.name_scope("CJS_loss"):
        num_classes = tf.shape(logits)[-1]

        logits_normed = tf.nn.softmax(logits)
        one_hot_labels = tf.one_hot(labels, num_classes)

        cdf_labels = tf.cumsum(one_hot_labels, axis=-1)
        cdf_logits = tf.cumsum(logits_normed, axis=-1)

        def ACCJS(p, q):
            with tf.name_scope("ACCJS"):
                # if p(i) = 0 then ACCJS(p, q)(i) = 0 since xlog(x) -> 0 as x-> 0
                p = tf.clip_by_value(p, 1e-10, 1.0)
                return 0.5 * tf.reduce_sum(p * tf.log(p / (0.5 * (p + q))), axis=-1)

        loss = ACCJS(cdf_logits, cdf_labels) + ACCJS(cdf_labels, cdf_logits)

    return loss
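
The same computation can be followed step by step in NumPy. This sketch mirrors the structure of the `CJS` function above but takes probabilities instead of logits; `accjs`, `cjs_loss` and the example predictions are hypothetical names and values.

```python
import numpy as np


def accjs(p, q):
    """One direction of the cumulative Jensen-Shannon divergence between CDFs."""
    # clip p so that entries with p = 0 contribute (almost) nothing
    p = np.clip(p, 1e-10, 1.0)
    return 0.5 * np.sum(p * np.log(p / (0.5 * (p + q))), axis=-1)


def cjs_loss(labels, probs):
    """Symmetrised cumulative Jensen-Shannon divergence per example."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[-1])[np.asarray(labels)]
    cdf_labels = np.cumsum(one_hot, axis=-1)
    cdf_probs = np.cumsum(probs, axis=-1)
    return accjs(cdf_probs, cdf_labels) + accjs(cdf_labels, cdf_probs)


# The true class is 0: an exact, a near (class 1) and a far (class 4) prediction
labels = np.array([0, 0, 0])
probs = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 1.0]])
exact, near, far = cjs_loss(labels, probs)
print(exact, near, far)
```

As with the earth mover's distance, the loss grows with the distance between the predicted and the true grade, and it is essentially zero for an exact prediction.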

In [10]:
def check_acc_test(sess, x, next_element, scores, is_training):
    """
    Checks the accuracy of a classification model.

    Inputs:
    - sess: A TensorFlow Session that will be used to run the graph
    - x: A TensorFlow Tensor where input images should be fed
    - next_element: The get_next() op of the dataset iterator, which returns the
      next (x, y) batch of the validation or test dataset
    - scores: A TensorFlow Tensor representing the scores output from the
      model; this is the Tensor we will ask TensorFlow to evaluate.
    - is_training: A TensorFlow placeholder Tensor where a bool should be fed if we are training the model

    Returns: A tuple (acc_exact, acc_one_out, acc_top3) of accuracies of the model
    """
    exact, top3, one_out, num_samples = [0.0] * 4
    with tf.name_scope('accuracy'):
        while True:
            try:
                (x_np, y_np) = sess.run(next_element)
            except tf.errors.OutOfRangeError:
                break
            feed_dict = {x: x_np, is_training: False}
            scores_np = sess.run(scores, feed_dict=feed_dict)
            num_samples += x_np.shape[0]
            # find top 3 and top 1 predictions (nb argpartition doesn't sort)
            pred_top3 = np.argpartition(scores_np, -3, axis=-1)[:, -3:]
            pred_exact = scores_np.argmax(axis=-1)
            # add num correct
            # add extra dimension to y_np to broadcast
            top3 += np.sum((pred_top3 - y_np[:, None]) == 0)
            one_out += np.sum(np.abs(pred_exact - y_np) <= 1)
            exact += np.sum(pred_exact == y_np)
        acc_top3 = top3 / num_samples
        acc_one_out = one_out / num_samples
        acc_exact = exact / num_samples
    return acc_exact, acc_one_out, acc_top3
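
The top-3 check relies on `np.argpartition`, which places the indices of the three largest scores in the last three positions without sorting them. A small sketch with made-up scores and labels:

```python
import numpy as np

# Hypothetical scores for 2 examples and 5 classes, with their true labels
scores = np.array([[0.1, 0.5, 0.2, 0.9, 0.3],
                   [0.8, 0.1, 0.7, 0.2, 0.6]])
labels = np.array([1, 3])

# indices of the 3 largest scores per row (in no particular order)
top3 = np.argpartition(scores, -3, axis=-1)[:, -3:]
# add an axis to the labels so they broadcast against the 3 candidates
in_top3 = np.any(top3 == labels[:, None], axis=-1)
print(in_top3)         # [ True False]
print(in_top3.mean())  # 0.5
```

The first label is among the three highest scores of its row and the second is not, so the top-3 accuracy on this toy batch is 0.5.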

In [11]:
def check_acc_train(x, y, scores):
    """
    Check accuracy on a classification model from a batch of data.

    Inputs:
    - x: A TensorFlow placeholder Tensor where input images should be fed
    - y: A TensorFlow placeholder Tensor where input classification scores
      should be fed
    - scores: A TensorFlow Tensor representing the scores output from the
      model; this is the Tensor we will ask TensorFlow to evaluate.

    Returns: A tuple (acc_exact, acc_one_out, acc_top3) of Tensors with the accuracies on a batch of training data
    """
    with tf.name_scope('accuracy'):
        num_samples = tf.cast(tf.shape(x)[0], tf.float32)
        
        top3 = tf.count_nonzero(tf.nn.in_top_k(scores, y, 3))
        y_pred = tf.argmax(scores, axis=1, output_type=tf.int32)
        one_out = tf.count_nonzero(tf.abs(y_pred - y) <= 1)
        exact = tf.count_nonzero(tf.equal(y_pred, y))
        
        # calculate accuracies
        acc_top3 = tf.cast(top3, tf.float32) / num_samples
        acc_one_out = tf.cast(one_out, tf.float32) / num_samples
        acc_exact = tf.cast(exact, tf.float32) / num_samples
    return acc_exact, acc_one_out, acc_top3

## Define the training loop

In [14]:
def train(model_init_fn, optimizer_init_fn, loss_fn, lr, num_epochs=1,
          decay_at=[], decay_to=[], experiment_name="",
          save=False, log=True, save_graph=False, val=True):
    """
    Simple training loop for use with models defined using tf.layers. It trains
    a model for num_epochs, periodically checks the accuracy on the validation
    dataset, logs the training data to Tensorboard, saves the graph, and tests 
    the final accuracy on the test dataset.

    Inputs:
    - model_init_fn: A function that takes the inputs and the is_training
      placeholder; when called it constructs the model we want to train:
      scores = model_init_fn(inputs, is_training)
    - optimizer_init_fn: A function which takes the learning rate; when called it
      constructs the Optimizer object we will use to optimize the model:
      optimizer = optimizer_init_fn(lr)
    - loss_fn: A function loss_fn(labels, logits) which returns the loss of each example
    - lr: The initial learning rate
    - num_epochs: The number of epochs to train for
    - decay_at: A list of epochs to decay the learning rate at
    - decay_to: A list of learning rates to decay to
    - experiment_name: The name to call the experiment when logging and saving
    - save: A bool to decide if we save the Tensorflow model after training
    - log: A bool to decide to log the training for Tensorboard
    - save_graph: A bool to decide if we save the computational graph for Tensorboard
    - val: A bool to decide if we check the accuracy on the validation data (True) or
      the test data (False).  Set to False if there is no validation dataset.

    Returns:
    - acc: Tuple of length 3 which is the accuracy on the validation or test dataset,
      see val parameter.  acc[0] is exact accuracy, acc[1] is one-out accuracy and
      acc[2] is top 3 accuracy.
      
    """

    tf.reset_default_graph()

    # construct the datasets
    (next_element_train, next_element_test, train_init_op,
     val_init_op, test_init_op, steps_to_epochs) \
        = construct_datasets(num_epochs)
    (x, y) = next_element_train

    # declare placeholders
    is_training = tf.placeholder(tf.bool, name='is_training')
    lr_var = tf.Variable(lr, trainable=False, name='learning_rate')

    # Whenever we need to record the accuracy, feed it to these placeholders
    with tf.name_scope('acc'):
        tf_acc_ph_1out = tf.placeholder(tf.float32, shape=None)
        tf_acc_ph_exact = tf.placeholder(tf.float32, shape=None)
        tf_acc_ph_top3 = tf.placeholder(tf.float32, shape=None)
        # Create a scalar summary object for the accuracy so it can be displayed
        tf.summary.scalar("accuracy_within_1", tf_acc_ph_1out)
        tf.summary.scalar("accuracy_exact", tf_acc_ph_exact)
        tf.summary.scalar("accuracy_top3", tf_acc_ph_top3)

    # Use the model function to build the forward pass.
    scores = model_init_fn(x, is_training)

    # Compute the losses
    loss_scores = loss_fn(labels=y, logits=scores)
    loss_scores = tf.reduce_mean(loss_scores)
    loss_reg = tf.losses.get_regularization_loss()
    loss = loss_scores + loss_reg

    # Tensorboard logging scalars
    tf.summary.scalar('loss_scores', loss_scores)
    tf.summary.scalar('loss_reg', loss_reg)
    tf.summary.scalar('total_loss', loss)
    tf.summary.scalar('learning_rate', lr_var)

    # initialise the optimizer and create the training operation
    optimizer = optimizer_init_fn(lr_var)
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        with tf.name_scope('train'):
            train_op = optimizer.minimize(loss)

    # check train accuracy function
    acc_train_op = check_acc_train(x, y, scores)

    with tf.Session() as sess:

        # Tensorboard, merge all summaries except the accuracy ones
        merged = tf.summary.merge_all(scope="(?!acc)")
        merged_acc = tf.summary.merge_all(scope="(acc)")

        sess.run(tf.global_variables_initializer())

        # Create the saver and Tensorboard log writers
        if save:
            saver = tf.train.Saver()
        if log:
            log_path = "C:/tmp/logs"
            if save_graph:
                train_writer = tf.summary.FileWriter(
                    log_path + '/train/' + experiment_name, sess.graph)
            else:
                train_writer = tf.summary.FileWriter(
                    log_path + '/train/' + experiment_name)
            test_writer = tf.summary.FileWriter(
                log_path + '/test/' + experiment_name)

        # Initialize an iterator over the training dataset.
        sess.run(train_init_op)
        t = 0
        while True:
            # decay learning rate
            if (t / steps_to_epochs) in decay_at:
                lr_var.load(
                    decay_to[decay_at.index(t / steps_to_epochs)], sess)

            # train on next batch of data
            feed_dict = {is_training: True}
            try:
                # check running accuracy on training batch and add to tensorboard every 20 steps
                if (t + 1) % 20 == 0:
                    summary, _, acc_train = sess.run(
                        [merged, train_op, acc_train_op], feed_dict=feed_dict)
                    if log:
                        train_writer.add_summary(summary, t)
                        acc_feed_dict = {tf_acc_ph_exact: acc_train[0],
                                         tf_acc_ph_1out: acc_train[1],
                                         tf_acc_ph_top3: acc_train[2]}
                        train_writer.add_summary(
                            sess.run(merged_acc, feed_dict=acc_feed_dict), t)
                else:
                    # train normally
                    loss_np, _ = sess.run(
                        [loss, train_op], feed_dict=feed_dict)
                    # stop training if loss blows up
                    if np.isnan(loss_np):
                        if val:
                            # roughly expected accuracy from random guess
                            return 0.2 * np.ones(3)
                        else:
                            break
            except tf.errors.OutOfRangeError:
                break
            t += 1

            # Check accuracy on validation dataset every epoch
            if t % steps_to_epochs == 0 and log and val:
                sess.run(val_init_op)
                acc_val = check_acc_test(sess, x, next_element_test, scores, is_training)
                acc_feed_dict = {tf_acc_ph_exact: acc_val[0],
                                 tf_acc_ph_1out: acc_val[1],
                                 tf_acc_ph_top3: acc_val[2]}
                test_writer.add_summary(
                    sess.run(merged_acc, feed_dict=acc_feed_dict), t)

        # End of training.  Calculate accuracy on validation dataset
        if val:
            sess.run(val_init_op)
            acc_val = check_acc_test(sess, x, next_element_test, scores, is_training)
            acc_feed_dict = {tf_acc_ph_exact: acc_val[0],
                             tf_acc_ph_1out: acc_val[1],
                             tf_acc_ph_top3: acc_val[2]}
            
            if log:
                test_writer.add_summary(
                    sess.run(merged_acc, feed_dict=acc_feed_dict), t)
        else:
            acc_val = None

        print('End of training')
        # Save the trained model (checkpoint) to disk.
        if save:
            save_path = saver.save(sess, f"C:/tmp/save/{experiment_name}.ckpt")
        if val:
            print("Validation accuracy is:")
            print(f"Exact: {acc_val[0]}")
            print(f"1 out: {acc_val[1]}")
            print(f"Top 3: {acc_val[2]}\n")
            return acc_val
        else:
            # Calculate accuracy on test dataset
            sess.run(test_init_op)
            acc_test = check_acc_test(sess, x, next_element_test, scores, is_training)
            print("Accuracy on the test dataset")
            print(f"Exact: {acc_test[0]}")
            print(f"1 out: {acc_test[1]}")
            print(f"Top 3: {acc_test[2]}")
            return acc_test

Before we optimise our hyperparameters, we check the model with some sensible parameters.

In [16]:
num_epochs = 250
total_layers = 14
learning_rate = 0.05
reg = 2e-4
drop_rate = 0.6 # set this fraction of the neurons to 0
# decay the learning rate by a factor of 0.95 every 5 epochs
decay_at = list(np.arange(50) * 5)
decay_to = list(learning_rate * 0.95 ** np.arange(50))

name = (f"climbing_ResNet{total_layers}_lr{learning_rate}"
        f"_reg{reg}_drop{drop_rate}_adam_CJS_scale0.2_eps1e-3_decay_0.95-every5")

def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
    return model_ResNetv2(inputs, is_training, total_layers=total_layers, reg=reg, drop_rate=drop_rate,
                          num_classes=num_classes, scaling=True)

def optimizer_init_fn(lr):
    return tf.train.AdamOptimizer(lr, epsilon=1e-3)

acc_val = train(model_init_fn, optimizer_init_fn, CJS, learning_rate, 
                      num_epochs=num_epochs, experiment_name=name,
                      decay_at=decay_at, decay_to=decay_to, save_graph=True)

End of training
Validation accuracy is:
Exact: 0.41336971350613916
1 out: 0.6562073669849932
Top 3: 0.7066848567530696



## Optimise the hyperparameters
We use `scikit-optimize` to perform a random search followed by Gaussian process optimisation to find the best hyperparameters: `learning_rate`, `reg` and `decay_rate`.  We train a 14-layer network over 250 epochs.  This optimisation process takes 30 minutes per loss function on my laptop.  We use an Adam optimizer and decay the learning rate by multiplying it by `decay_rate` every 5 epochs.
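
The step decay described above can be written out in plain Python. This is a small sketch of how the `decay_at` and `decay_to` lists relate to the learning rate at a given epoch; the helper names `step_decay_schedule` and `learning_rate_at` are hypothetical and not used elsewhere in the notebook.

```python
def step_decay_schedule(learning_rate, decay_rate, every=5, num_steps=50):
    """Epochs at which the learning rate changes and the values it changes to."""
    decay_at = [every * k for k in range(num_steps)]
    decay_to = [learning_rate * decay_rate ** k for k in range(num_steps)]
    return decay_at, decay_to


def learning_rate_at(epoch, decay_at, decay_to):
    """Learning rate in force at `epoch`: the value set at the latest decay point."""
    current = decay_to[0]
    for decay_epoch, lr in zip(decay_at, decay_to):
        if decay_epoch <= epoch:
            current = lr
    return current


decay_at, decay_to = step_decay_schedule(learning_rate=0.05, decay_rate=0.95)
print(decay_at[:4])  # [0, 5, 10, 15]
print(learning_rate_at(12, decay_at, decay_to))
```

With 50 decay points spaced 5 epochs apart, the last change happens at epoch 245, so the schedule covers the full 250 epochs of training.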

In [12]:
num_epochs = 250
total_layers = 14
num_calls = 25
drop_rate = 0.6 # set this fraction of the neurons to 0
dim_learning_rate = Real(low=1e-6, high=2e-1, prior='log-uniform',
                         name='learning_rate')
dim_reg = Real(low=1e-5, high=1e-1, prior='log-uniform',
               name='reg')
dim_decay_rate = Real(0.9, 1, name='decay_rate')
dimensions = [dim_learning_rate, dim_reg, dim_decay_rate]

@use_named_args(dimensions=dimensions)
def op_acc(learning_rate, reg, decay_rate):
    # decay the learning rate by a factor of decay_rate every 5 epochs
    decay_at = list(np.arange(50) * 5)
    decay_to = list(learning_rate * decay_rate ** np.arange(50))
    
    name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
            f"_drop{drop_rate}_adam_em_gp_search_scale0.2_eps1e-3_decay_{decay_rate}-every5")

    def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
        return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                              reg=reg, num_classes=num_classes)

    def optimizer_init_fn(lr):
        return tf.train.AdamOptimizer(lr, epsilon=1e-3)

    # Stop printing from train function
    # (https://stackoverflow.com/questions/23610585/ipython-notebook-avoid-printing-within-a-function/23611571#23611571)
    with io.capture_output() as captured:
        acc_val = train(model_init_fn, optimizer_init_fn, sparse_earth_mover, learning_rate,
                             num_epochs=num_epochs, experiment_name=name,
                             decay_at=decay_at, decay_to=decay_to)
    # optimise one_out accuracy
    return -acc_val[1]


search_result = gp_minimize(func=op_acc, dimensions=dimensions, n_calls=num_calls,
                            verbose=True, n_restarts_optimizer=20)
plot_convergence(search_result)
print(search_result.x)

In [13]:
@use_named_args(dimensions=dimensions)
def op_acc(learning_rate, reg, decay_rate):
    name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
            f"_drop{drop_rate}_adam_CJS_gp_search_scale0.2_eps1e-3_decay_{decay_rate}-every5")
    decay_at = list(np.arange(50) * 5)
    decay_to = list(learning_rate * decay_rate ** np.arange(50))

    def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
        return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                              reg=reg, num_classes=num_classes)

    def optimizer_init_fn(lr):
        return tf.train.AdamOptimizer(lr, epsilon=1e-3)

    # Stop printing from train function
    with io.capture_output() as captured:
        acc_val = train(model_init_fn, optimizer_init_fn, CJS, learning_rate,
                             num_epochs=num_epochs, experiment_name=name,
                             decay_at=decay_at, decay_to=decay_to)
    # optimise one_out accuracy
    return -acc_val[1]


search_result = gp_minimize(func=op_acc, dimensions=dimensions, n_calls=num_calls,
                            verbose=True, n_restarts_optimizer=20)
plot_convergence(search_result)
print(search_result.x)

In [14]:
@use_named_args(dimensions=dimensions)
def op_acc(learning_rate, reg, decay_rate):
    name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
            f"_drop{drop_rate}_adam_cross_entropy_gp_search_scale0.2_eps1e-3_decay_{decay_rate}-every5")
    decay_at = list(np.arange(50) * 5)
    decay_to = list(learning_rate * decay_rate ** np.arange(50))

    def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
        return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                              reg=reg, num_classes=num_classes)

    def optimizer_init_fn(lr):
        return tf.train.AdamOptimizer(lr, epsilon=1e-3)

    # Stop printing from train function
    with io.capture_output() as captured:
        acc_val = train(model_init_fn, optimizer_init_fn, tf.nn.sparse_softmax_cross_entropy_with_logits, learning_rate,
                             num_epochs=num_epochs, experiment_name=name,
                             decay_at=decay_at, decay_to=decay_to)
    # optimise one_out accuracy
    return -acc_val[1]


search_result = gp_minimize(func=op_acc, dimensions=dimensions, n_calls=num_calls,
                            verbose=True, n_restarts_optimizer=20)
plot_convergence(search_result)
print(search_result.x)

# Test Results

We retrain each model with its optimised hyperparameters, with the validation data included in the training dataset, and evaluate it on the test dataset.

In [17]:
X_train = np.concatenate((X_train, X_val))
y_train = np.concatenate((y_train, y_val))
decay_at = list(np.arange(50) * 5)

In [18]:
# Earth Mover
[learning_rate, reg, decay_rate] = [0.01, 0.07, 0.92]
decay_to = list(learning_rate * decay_rate ** np.arange(50))
name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
        f"_scale0.2_eps1e-3_decay_{decay_rate}-every5"
        f"_drop{drop_rate}_adam_em_test")


def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
    return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                          reg=reg, num_classes=num_classes)


def optimizer_init_fn(lr):
    return tf.train.AdamOptimizer(lr)


acc_test = train(model_init_fn, optimizer_init_fn, sparse_earth_mover, learning_rate, num_epochs=num_epochs,
                 experiment_name=name, val=False, decay_at=decay_at, decay_to=decay_to)

End of training
Accuracy on the test dataset
Exact: 0.4781420765027322
1 out: 0.7076502732240437
Top 3: 0.73224043715847


In [19]:
# CJS
[learning_rate, reg, decay_rate] = [0.02, 1e-05, 1.0]
decay_to = list(learning_rate * decay_rate ** np.arange(50))
name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
        f"_scale0.2_eps1e-3_decay_{decay_rate}-every5"
        f"_drop{drop_rate}_adam_CJS_test")


def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
    return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                          reg=reg, num_classes=num_classes)


def optimizer_init_fn(lr):
    return tf.train.AdamOptimizer(lr)


acc_test = train(model_init_fn, optimizer_init_fn, CJS, learning_rate, num_epochs=num_epochs,
                 experiment_name=name, val=False, decay_at=decay_at, decay_to=decay_to)

End of training
Accuracy on the test dataset
Exact: 0.43989071038251365
1 out: 0.6939890710382514
Top 3: 0.7131147540983607


In [20]:
# Cross Entropy
[learning_rate, reg, decay_rate] = [0.0007, 0.0004, 0.9]
decay_to = list(learning_rate * decay_rate ** np.arange(50))
name = (f"climbing_ResNet{total_layers}_lr{learning_rate}_reg{reg}"
        f"_scale0.2_eps1e-3_decay_{decay_rate}-every5"
        f"_drop{drop_rate}_adam_cross_test")


def model_init_fn(inputs, is_training, total_layers=total_layers, reg=reg):
    return model_ResNetv2(inputs, is_training, total_layers=total_layers, drop_rate=drop_rate,
                          reg=reg, num_classes=num_classes)


def optimizer_init_fn(lr):
    return tf.train.AdamOptimizer(lr)


acc_test = train(model_init_fn, optimizer_init_fn, tf.nn.sparse_softmax_cross_entropy_with_logits,
                 learning_rate, num_epochs=num_epochs,
                 experiment_name=name, val=False, decay_at=decay_at, decay_to=decay_to)

End of training
Accuracy on the test dataset
Exact: 0.4426229508196721
1 out: 0.6420765027322405
Top 3: 0.7021857923497268
