## CNTK convolution network - CIFAR10

In this notebook we create a convolutional neural network using CNTK. We then train it on the 50,000 training examples of the CIFAR10 dataset and test it on the remaining 10,000.

In [24]:
import os
from os import path
import numpy as np
from itertools import repeat, chain

from cntk.layers import Convolution, MaxPooling, AveragePooling, Dense
from cntk.io import MinibatchSource, ImageDeserializer, StreamDef, StreamDefs
from cntk.initializer import glorot_uniform
from cntk import Trainer
from cntk.learner import adam_sgd, learning_rate_schedule, UnitType, momentum_schedule
from cntk.ops import cross_entropy_with_softmax, classification_error, relu, input_variable, softmax, element_times
from cntk.utils import log_number_of_parameters, ProgressPrinter

CIFAR10 images are 32 by 32 pixels with three colour channels, and the dataset is made up of 10 classes.

In [25]:
# model dimensions
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
NUM_CHANNELS = 3
NUM_CLASSES = 10

### Network definition
We define a convolutional network with three convolution layers, each followed by a pooling layer, and two dense layers on top.

In [26]:
def create_model(input, out_dims):
    '''Creates convolution network
    '''
    net = Convolution((5,5), 32, init=glorot_uniform(), activation=relu, pad=True)(input)
    net = MaxPooling((3,3), strides=(2,2))(net)

    net = Convolution((5,5), 32, init=glorot_uniform(), activation=relu, pad=True)(net)
    net = AveragePooling((3,3), strides=(2,2), pad=True)(net)

    net = Convolution((5,5), 64, init=glorot_uniform(), activation=relu, pad=True)(net)
    net = AveragePooling((3,3), strides=(2,2), pad=True)(net)
    
    net = Dense(64, init=glorot_uniform())(net)
    net = Dense(out_dims, init=glorot_uniform(), activation=None)(net)
    
    return net
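
As a sanity check on the architecture above, the sketch below works out the spatial size after each pooling layer and the total number of trainable parameters in plain Python. The output-size rules in `pool_out` are an assumption about how padded and unpadded pooling behave (they are not taken from the CNTK documentation), and `conv_params` and `pool_out` are hypothetical helper names. The result agrees with the `Training 145578 parameters` line in the training log further down.

```python
import math

def conv_params(kernel, in_channels, out_channels):
    # kernel x kernel weights per (input, output) channel pair, plus one bias per output channel
    return kernel * kernel * in_channels * out_channels + out_channels

def pool_out(size, window, stride, pad):
    # Assumed output-size rules:
    #   with padding:    ceil(size / stride)
    #   without padding: floor((size - window) / stride) + 1
    if pad:
        return math.ceil(size / stride)
    return (size - window) // stride + 1

size, channels = 32, 3
total = 0
# (output channels, pooling uses padding) for the three convolution + pooling stages
for out_channels, pool_pad in [(32, False), (32, True), (64, True)]:
    total += conv_params(5, channels, out_channels)  # pad=True keeps the spatial size
    channels = out_channels
    size = pool_out(size, 3, 2, pool_pad)

flat = channels * size * size   # inputs seen by the first dense layer
total += flat * 64 + 64         # Dense(64)
total += 64 * 10 + 10           # Dense(out_dims) with out_dims = 10
print(size, flat, total)        # 4 1024 145578
```

The spatial size shrinks from 32 to 15, 8 and finally 4, so the first dense layer sees 64 x 4 x 4 = 1024 inputs and accounts for almost half of the parameters.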

### Image reader
In CNTK we need to define a reader that will read in the images and compose the data into the appropriate format.

In [27]:
def create_reader(map_file, mean_file, train):
    '''Define the reader for both training and evaluation action.
    '''
    if not os.path.exists(map_file) or not os.path.exists(mean_file):
        raise RuntimeError("Please make sure you run the process_cifar_data notebook to prepare the data")

    # transformation pipeline for the features
    transforms = [
        ImageDeserializer.scale(width=IMAGE_WIDTH,
                                height=IMAGE_HEIGHT,
                                channels=NUM_CHANNELS,
                                interpolations='linear'),
        ImageDeserializer.mean(mean_file)
    ]
    # deserializer
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features=StreamDef(field='image', transforms=transforms),  # first column in map file is referred
                                                                   # to as 'image'
        labels=StreamDef(field='label', shape=NUM_CLASSES)  # and second as 'label'
    )))
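
To make the two columns of the map file concrete, here is a minimal sketch that writes and parses a tiny map file without CNTK. It assumes each line has the form `<image path><TAB><label index>`; the file paths, `read_map_file` and `one_hot` are hypothetical and only for illustration. The one-hot expansion shows what a label of shape `NUM_CLASSES` is assumed to look like once the reader has processed it.

```python
import os
import tempfile

NUM_CLASSES = 10

def read_map_file(map_file):
    # hypothetical helper: parse "<image path><TAB><label index>" lines
    pairs = []
    with open(map_file) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            image_path, label = line.split("\t")
            pairs.append((image_path, int(label)))
    return pairs

def one_hot(label, num_classes=NUM_CLASSES):
    # vector of zeros with a single one at the label index
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# write a two-line map file with made-up paths, then read it back
with tempfile.TemporaryDirectory() as tmp:
    map_path = os.path.join(tmp, "train_map.txt")
    with open(map_path, "w") as f:
        f.write("data/CIFAR-10/train/00000.png\t6\n")
        f.write("data/CIFAR-10/train/00001.png\t9\n")
    pairs = read_map_file(map_path)

print(pairs)
print(one_hot(pairs[0][1]))
```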


### Initialise model and trainer
We scale the pixel values of the input and then apply the convolution network we defined earlier to the scaled input.

In [28]:
def initialise_model(input_var, model_func):
    # Normalize the input
    feature_scale = 1.0 / 256.0
    input_var_norm = element_times(feature_scale, input_var)

    # apply model to input
    initialised_model = model_func(input_var_norm, out_dims=NUM_CLASSES)
    log_number_of_parameters(initialised_model)
    return initialised_model
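
The effect of `feature_scale` can be seen with a few raw channel values in NumPy. This is only an illustration of the multiplication done by `element_times`; the example values are made up, and in the actual pipeline the reader is assumed to have already subtracted the mean image, so the real inputs are centred around zero rather than starting at zero.

```python
import numpy as np

feature_scale = 1.0 / 256.0

# made-up raw 8-bit channel values: darkest, mid-grey, brightest
pixels = np.array([0, 128, 255], dtype=np.float32)

# same operation as element_times(feature_scale, input_var), done in NumPy
scaled = feature_scale * pixels
print(scaled)
```

Dividing by 256 rather than 255 keeps every scaled value strictly below 1.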

Here we define the optimiser used to train the neural network. We use Adam since it compares favourably with other optimisation techniques [[1]](https://arxiv.org/abs/1412.6980). I found it hard to reconcile the variable names in the [paper](https://arxiv.org/abs/1412.6980) with the parameter names of the function. Thankfully, I found the [answer here](http://stackoverflow.com/questions/41305918/in-cntk-implementation-of-adam-optimizer-how-the-parameters-alpha-beta1-beta2/41305959#41305959).

In [29]:
def initialise_trainer(initialised_model, label_var, minibatch_size=64, epoch_size=50000):
    # loss and metric
    ce = cross_entropy_with_softmax(initialised_model, label_var)
    pe = classification_error(initialised_model, label_var)

    # Set training parameters
    lr_per_minibatch = learning_rate_schedule(0.005, UnitType.minibatch)
    beta1 = momentum_schedule(0.9)
    beta2 = momentum_schedule(0.999)
    # Adam optimiser
    learner = adam_sgd(initialised_model.parameters,
                       lr_per_minibatch,
                       beta1,
                       variance_momentum=beta2)
    
    trainer = Trainer(initialised_model, ce, pe, [learner])
    return trainer
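
To relate the arguments passed to `adam_sgd` to the symbols in the paper, here is a minimal NumPy sketch of a single Adam update as written in the paper, using the same values as above: the learning rate plays the role of alpha, the momentum argument of beta1 and `variance_momentum` of beta2. `adam_step` is a hypothetical helper, and this is the textbook update rule, not necessarily what CNTK does internally.

```python
import numpy as np

def adam_step(param, grad, m, v, t, alpha=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
    # exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction for the zero-initialised averages (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # parameter update scaled by the learning rate alpha
    param = param - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, -2.0])
grad = np.array([0.5, -4.0])   # made-up gradient
m = np.zeros_like(param)
v = np.zeros_like(param)

new_param, m, v = adam_step(param, grad, m, v, t=1)
print(new_param)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly alpha regardless of the gradient's magnitude.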

### Minibatch generator

In [30]:
def batch_size_iterator_from(epoch_size, batch_size):
    ''' Returns iterator of batch sizes
    '''
    complete_batch_count, remainder = epoch_size//batch_size, epoch_size%batch_size
    remainder_list = [remainder] if remainder>0 else []
    return chain(repeat(batch_size, complete_batch_count),remainder_list)
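
A quick way to see what the iterator produces is to run it on a small epoch and on the sizes used in this notebook. The sketch below repeats the function (using `divmod`, which is equivalent to the `//` and `%` pair) so that it runs on its own.

```python
from itertools import chain, repeat

def batch_size_iterator_from(epoch_size, batch_size):
    # full batches first, then one smaller batch if the epoch does not divide evenly
    complete_batch_count, remainder = divmod(epoch_size, batch_size)
    remainder_list = [remainder] if remainder > 0 else []
    return chain(repeat(batch_size, complete_batch_count), remainder_list)

small = list(batch_size_iterator_from(10, 4))
print(small)  # [4, 4, 2]

sizes = list(batch_size_iterator_from(50000, 64))
print(len(sizes), sizes[-1], sum(sizes))  # 782 16 50000
```

With 50000 training images and a minibatch size of 64, each epoch consists of 781 full minibatches followed by one minibatch of 16 images.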

We use the function below to generate batches of data which we can then use to train the network with.

In [31]:
def minibatch_generator(data_reader, input_map, minibatch_size=64, epoch_size=50000):
    ''' Generates batches of data
    '''
    for batch_size in batch_size_iterator_from(epoch_size, minibatch_size):
        yield data_reader.next_minibatch(batch_size, input_map=input_map)  # fetch minibatch.

In [32]:
def input_map_for(data_reader, input_var, label_var):
    return {
        input_var: data_reader.streams.features,
        label_var: data_reader.streams.labels
    }

### Training
Below we train the network. We can specify how long we wish to train the network for by changing the value of the variable `max_epochs`. We will simply let the network run for 20 epochs, which means we will run through the training data 20 times.

In the code below we can see that we have an outer loop that runs over the number of epochs and an inner loop that runs over the minibatches in the epoch.

In [35]:
%%time
# Input variables denoting the features and label data
input_var = input_variable((NUM_CHANNELS, IMAGE_HEIGHT, IMAGE_WIDTH))
label_var = input_variable((NUM_CLASSES))
data_path = os.path.join('data', 'CIFAR-10')

reader_train = create_reader(os.path.join(data_path, 'train_map.txt'),
                             os.path.join(data_path, 'CIFAR-10_mean.xml'),
                             True)

progress_printer = ProgressPrinter(tag='Training')
initialised_model = initialise_model(input_var, create_model)
trainer = initialise_trainer(initialised_model, label_var)
input_map = input_map_for(reader_train, input_var, label_var)
max_epochs = 20

for epoch in range(max_epochs): # Epoch loop
    for batch_data in minibatch_generator(reader_train, input_map, minibatch_size=64, epoch_size=50000):# Minibatch
        trainer.train_minibatch(batch_data)
        progress_printer.update_with_trainer(trainer, with_metric=True) # log progress
    progress_printer.epoch_summary(with_metric=True)

Training 145578 parameters in 10 parameter tensors.
Finished Epoch[1 of 300]: [Training] loss = 1.809947 * 50000, metric = 66.7% * 50000 109.960s (454.7 samples per second)
Finished Epoch[2 of 300]: [Training] loss = 1.425233 * 50000, metric = 51.6% * 50000 110.140s (454.0 samples per second)
Finished Epoch[3 of 300]: [Training] loss = 1.283573 * 50000, metric = 46.2% * 50000 109.582s (456.3 samples per second)
Finished Epoch[4 of 300]: [Training] loss = 1.168716 * 50000, metric = 41.8% * 50000 109.393s (457.1 samples per second)
Finished Epoch[5 of 300]: [Training] loss = 1.071775 * 50000, metric = 38.2% * 50000 109.925s (454.9 samples per second)
Finished Epoch[6 of 300]: [Training] loss = 0.993943 * 50000, metric = 35.4% * 50000 109.230s (457.8 samples per second)
Finished Epoch[7 of 300]: [Training] loss = 0.936975 * 50000, metric = 33.1% * 50000 109.148s (458.1 samples per second)
Finished Epoch[8 of 300]: [Training] loss = 0.887111 * 50000, metric = 31.2% * 50000 108.306s (461.7 

### Testing
Once the network is trained we can simply run the test data against it and see how well our network does.

In [34]:
reader_test = create_reader(os.path.join(data_path, 'test_map.txt'),
                            os.path.join(data_path, 'CIFAR-10_mean.xml'),
                            False)
epoch_size = 10000
minibatch_size = 16
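# build the input map from the test reader's own streams
test_input_map = input_map_for(reader_test, input_var, label_var)
batch_gen = minibatch_generator(reader_test, test_input_map, minibatch_size=minibatch_size, epoch_size=epoch_size)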
metric_per_batch = [trainer.test_minibatch(batch_data) for batch_data in batch_gen]
batch_weight = list(batch_size_iterator_from(epoch_size, minibatch_size))
print("Average test error rate: {}".format(np.average(metric_per_batch, weights=batch_weight)))

Average test error rate: 0.507
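
The test code weights each minibatch's error by its size. The sketch below shows why, using made-up numbers and assuming each per-batch value is the average error over that minibatch: when the last minibatch is smaller, a plain mean would give its samples too much influence.

```python
import numpy as np

# made-up per-minibatch error rates and the matching minibatch sizes
errors = [0.5, 0.5, 0.0]
sizes = [4, 4, 2]

weighted = np.average(errors, weights=sizes)  # error over all 10 samples
unweighted = np.mean(errors)                  # treats the small batch like a full one
print(weighted, unweighted)
```

Here the weighted average is 0.4 (4 wrong out of 10), while the plain mean would report about 0.33. With 10000 test images and a minibatch size of 16 there is no remainder, so in this notebook the two coincide.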


The output above reports a test error rate of 0.507, that is, about 51%. To improve on this you can train the network for longer, use different training parameters or even change the structure of the network.