# **Composite Data Readers Tutorial**

**Composite Reader** is a reader that takes several data deserializers and composes them into a single input source, which can then be used to build the input map for a computation graph.

**This Tutorial** walks through how to compose image data alongside a feature vector for training, where the prediction is 4 regressor outputs. A common example might be a robotics scenario where you compose the current image data with a feature vector representing the robot's current state, and the network then produces 4 servo outputs, or perhaps 4 reward values for different possible actions. The possibilities are nearly endless, which is why this level of flexibility is worth understanding.

**Key Concepts:** Beyond simply composing data, this tutorial highlights the ability to use a type of neural network layer suited to each kind of input and then combine those results into a single prediction. This is not unlike the human sensory system: each sense is tuned to its own type of signal and produces, more or less, a feature map that is then consumed by the judgment engine, which produces output signals that the rest of your body turns into actions.

```python
# Just some import statements
from __future__ import print_function
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import time
import cntk as C
import cntk.io.transforms as xforms

# Define image dimensions for this experiment.
# Since CNTK's image deserializer resizes images as they are read,
# we can change these values and experiment to our heart's
# content without re-processing our data.
num_channels = 3
image_width = 32
image_height = 32

# Images are 32 x 32 with 3 color channels;
# greyscale would be 1 channel.
# This can really be any multi-dimensional, 3-axis data.
# CNTK expects image tensors in (channels, height, width) order.
image_shape = (num_channels, image_height, image_width)

# Number of features to produce from the convolutional processing.
# There is no particular reason for 8; I felt like 8 today. This will likely
# have a large impact on performance and probably relates to the input image
# size and the complexity and nuances of the images.
conv_feature_map_size = 8

# Number of features coming from our CTF file, which is 3.
tab_features = 3

# Number of regressor outputs to create.
num_regression_outputs = 4

# Input variable for our image data.
x_i = C.input_variable(image_shape)

# Input variable for our tabular data.
x_t = C.input_variable(tab_features)

# Input variable for our regression targets (labels).
y = C.input_variable(num_regression_outputs)
```

### **Data Format** 

You can find the actual example files in this directory as .map (for the image files) and .ctf (for the feature data). Combining CTF files with image map files lets you pair any kind of label with any type of data. For example, you can use a 0-based class index in the image map file if you are performing classification; in this instance, however, we are performing regression on 4 output nodes and therefore take advantage of the label flexibility provided by the CTF format.

**The image map file looks like:**

```
images/image1.png	0
images/image3.png	0
```

Note: You can also use absolute paths, such as c:/data/images/image1.png, to refer to your images. Each line holds an image path and a class label separated by a tab. The class labels are unused in this scenario, so we just use 0s.

**The CTF file looks like:**

```
|features 0 0 1 |label 1 2 3 4
|features 0 1 0 |label 4 3 2 1
```

All values must be numeric; they are read as floats, so either integer or decimal notation works. Every sample must also supply the same number of values for a given stream (here, 3 features and 4 labels). If you have variable-sized data, you should use a sequence representation of that data, which is documented here: https://www.cntk.ai/pythondocs/cntk.io.html?highlight=ctf#cntk.io.CTFDeserializer.
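To catch shape mistakes before training, the rules above can be checked with a few lines of plain Python. This is a minimal sketch, not part of CNTK: `parse_ctf_line` and `check_ctf_shapes` are hypothetical helpers, and they assume dense values only (no sparse `index:value` entries). Text before the first `|` (an optional sequence id) and `|#` comments are skipped.

```python
def parse_ctf_line(line):
    """Parse a dense CTF line such as '|features 0 0 1 |label 1 2 3 4'
    into a dict mapping stream name -> list of floats."""
    streams = {}
    # Everything before the first '|' is an optional sequence id, so skip it.
    for chunk in line.strip().split("|")[1:]:
        tokens = chunk.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # empty chunk or a '|#' comment
        name, values = tokens[0], tokens[1:]
        streams[name] = [float(v) for v in values]
    return streams


def check_ctf_shapes(lines, expected_dims):
    """Return a list of problems; an empty list means every line supplies
    exactly expected_dims[name] values for each named stream."""
    problems = []
    for i, line in enumerate(lines):
        parsed = parse_ctf_line(line)
        for name, dim in expected_dims.items():
            got = len(parsed.get(name, []))
            if got != dim:
                problems.append("line {}: '{}' has {} values, expected {}".format(i, name, got, dim))
    return problems


lines = ["|features 0 0 1 |label 1 2 3 4",
         "|features 0 1 0 |label 4 3 2 1"]
print(check_ctf_shapes(lines, {"features": 3, "label": 4}))  # prints []
```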

**Finally & Importantly:** Samples are matched across the files by position: the example on a given line of the map file is composed with the example on the same line of the CTF file. For example, the first line in each file is index 0 and the second is index 1. If a minibatch pulls index 1 into the batch, it pulls the second line from each file.
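Because pairing is purely positional, a safe way to produce the two files is to write them in lockstep from a single list of samples. This is a minimal sketch, assuming a tab-separated map file with an unused class label of 0 and dense CTF lines in the format shown above; `write_aligned_files` and the sample values are hypothetical.

```python
import os
import tempfile


def write_aligned_files(samples, map_path, ctf_path):
    """Write line i of both files from samples[i], so they stay aligned.

    samples: list of (image_path, features, label) tuples.
    """
    with open(map_path, "w") as map_f, open(ctf_path, "w") as ctf_f:
        for image_path, features, label in samples:
            # Image map line: path, a tab, then the (unused) class label 0.
            map_f.write("{}\t0\n".format(image_path))
            # CTF line for the same sample, written at the same line index.
            ctf_f.write("|features {} |label {}\n".format(
                " ".join(str(v) for v in features),
                " ".join(str(v) for v in label)))


# Usage: write the two example samples into a temporary directory.
out_dir = tempfile.mkdtemp()
map_path = os.path.join(out_dir, "train.map")
ctf_path = os.path.join(out_dir, "train.ctf")
write_aligned_files(
    [("images/image1.png", [0, 0, 1], [1, 2, 3, 4]),
     ("images/image3.png", [0, 1, 0], [4, 3, 2, 1])],
    map_path, ctf_path)
```

Keeping one list as the single source of truth means a sample can never be added to one file and forgotten in the other.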

```python
# Create a COMPOSITE reader that reads data from both the image map and CTF files
def create_reader(map_file, ctf_file, is_training, num_regression_outputs):

    # Create transforms.
    transforms = []
    if is_training:
        # Data augmentation only, so we don't want this for validation.
        # This is one reason we like CNTK's image deserializer with composition :D
        transforms += [xforms.crop(crop_type='randomside', side_ratio=0.8)]
    transforms += [
        # Resize every image to the shape the network expects.
        xforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear')]

    # Create the IMAGE DESERIALIZER for the map file.
    image_source = C.io.ImageDeserializer(map_file, C.io.StreamDefs(
        features_image = C.io.StreamDef(field='image', transforms=transforms)))

    # Create the CTF DESERIALIZER for the CTF file.
    ctf_source = C.io.CTFDeserializer(ctf_file, C.io.StreamDefs(
        labels = C.io.StreamDef(field="label", shape=num_regression_outputs, is_sparse=False),
        features_tabular = C.io.StreamDef(field="features", shape=tab_features, is_sparse=False)))

    # Create a minibatch source by compositing the two deserializers together.
    # Only randomize during training; test data is read in file order.
    return C.io.MinibatchSource([image_source, ctf_source], max_samples=sys.maxsize, randomize=is_training)
```

## Take Note
The stream names defined with StreamDef are features_image, labels, and features_tabular. We refer to these names (as `reader.streams.<name>`) when configuring the input map.
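Conceptually, the composite reader turns two line-aligned files into one record per sample, keyed by these stream names. The plain-Python analogue below is only a mental model, not how CNTK is implemented: it skips image decoding, transforms, and randomization. `compose_samples` is a hypothetical helper that assumes a tab-separated map file and dense CTF lines.

```python
def compose_samples(map_lines, ctf_lines):
    """Pair line i of the map file with line i of the CTF file and return
    one dict per sample, keyed by the stream names used in create_reader."""
    if len(map_lines) != len(ctf_lines):
        raise ValueError("map and CTF files must have the same number of lines")
    records = []
    for map_line, ctf_line in zip(map_lines, ctf_lines):
        image_path = map_line.split("\t")[0]
        # Parse '|name v1 v2 ...' chunks (text before the first '|' is ignored).
        streams = {}
        for chunk in ctf_line.split("|")[1:]:
            tokens = chunk.split()
            if tokens:
                streams[tokens[0]] = [float(t) for t in tokens[1:]]
        records.append({
            "features_image": image_path,  # the real reader yields decoded pixels
            "features_tabular": streams["features"],
            "labels": streams["label"],
        })
    return records


records = compose_samples(
    ["images/image1.png\t0", "images/image3.png\t0"],
    ["|features 0 0 1 |label 1 2 3 4", "|features 0 1 0 |label 4 3 2 1"])
print(records[1]["features_image"], records[1]["labels"])
# prints: images/image3.png [4.0, 3.0, 2.0, 1.0]
```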

## Build the Model
Here we actually build the model. Notice it takes two inputs, x_i and x_t: x_i is the image data after it has been deserialized, and x_t is the feature vector from the CTF file after it has been deserialized. Both are brought into the same model definition, where they can be used and composed. We first push x_i through a typical convolutional network. Instead of producing a prediction, though, we produce a feature map.

*You can think of a feature map as a set of features that the neural network learns and tunes on its own, forming a compact representation of the data that can be merged with other data sets.*

We then take the feature map, which is essentially a 1-D tensor (think numpy array), and use C.splice to append the features from our CTF file, which are also a 1-D tensor. The combined vector forms the input to a standard feed-forward network that produces our 4 regressor outputs.
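It helps to track the tensor shapes by hand to see exactly what the splice concatenates. The sketch below is plain-Python bookkeeping that mirrors the layer settings in create_model; it assumes CNTK's (channels, height, width) layout, that MaxPooling is unpadded (its default), and that Dense projects all input axes (also its default). `trace_model_shapes` and its helpers are hypothetical, not CNTK APIs.

```python
def conv_same(shape, num_filters):
    """Convolution with pad=True and stride 1: spatial size is unchanged."""
    _, height, width = shape
    return (num_filters, height, width)


def max_pool(shape, window, stride):
    """Unpadded pooling: floor((n - window) / stride) + 1 along each spatial axis."""
    channels, height, width = shape
    return (channels,
            (height - window) // stride + 1,
            (width - window) // stride + 1)


def trace_model_shapes(image_shape=(3, 32, 32), tab_features=3,
                       conv_feature_map_size=8, num_regression_outputs=4):
    """Return (layer name, output shape) pairs mirroring create_model."""
    steps = []
    shape = conv_same(image_shape, 8)
    steps.append(("conv_1", shape))
    shape = max_pool(shape, window=2, stride=2)
    steps.append(("max_1", shape))
    shape = conv_same(shape, 16)
    steps.append(("conv_2", shape))
    shape = max_pool(shape, window=3, stride=3)
    steps.append(("max_2", shape))
    # Not a separate layer: Dense implicitly flattens its (16, 5, 5) input.
    steps.append(("flatten", (shape[0] * shape[1] * shape[2],)))
    steps.append(("conv_feature_map", (conv_feature_map_size,)))
    # splice along axis 0 concatenates the learned and the tabular features.
    steps.append(("splice", (conv_feature_map_size + tab_features,)))
    steps.append(("merged_dense_1", (conv_feature_map_size,)))
    steps.append(("prediction", (num_regression_outputs,)))
    return steps


for name, shape in trace_model_shapes():
    print(name, shape)
```

With the defaults, the 8 learned features and the 3 tabular features become an 11-value vector before the final dense layers.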

*Notice that we use relu as the default activation function, but on the final layer we specify an activation of None.* With no activation, each output node simply computes a linear (affine) function of its inputs with nothing applied on top. This lets the outputs range from -infinity to +infinity, which is what regression needs.
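To see why the final layer drops the activation, here is a single dense layer computed by hand in plain Python. This is a minimal sketch with made-up weights; `dense` and `relu` are illustrative helpers, not CNTK functions. With relu, a negative weighted sum is clipped to 0, whereas with activation=None the raw value passes through.

```python
def relu(v):
    return max(0.0, v)


def dense(x, weights, bias, activation=None):
    """Compute activation(W x + b) for one dense layer; None means identity."""
    pre_activation = [sum(w * xj for w, xj in zip(row, x)) + b
                      for row, b in zip(weights, bias)]
    if activation is None:
        return pre_activation
    return [activation(v) for v in pre_activation]


x = [1.0, 2.0]
weights = [[1.0, -2.0],   # this node's weighted sum is negative
           [0.5, 0.5]]
bias = [0.0, 1.0]

print(dense(x, weights, bias, activation=relu))  # [0.0, 2.5]
print(dense(x, weights, bias))                   # [-3.0, 2.5]
```

A relu output layer could never predict the -3.0 target, which is why the prediction layer is left linear.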

**Loss and Errors:** Notice that we use squared_error as our loss, because we are predicting regression values rather than class labels. A classification error rate has no meaning here, so we simply reuse the loss as the evaluation metric (errs = loss).
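The loss can be reproduced by hand. This sketch assumes that the per-sample squared error is the sum of squared differences over the output nodes, and that the value printed during training is the average of that over the samples in the minibatch; the helper names and numbers are hypothetical.

```python
def squared_error(prediction, target):
    """Per-sample loss: sum over output nodes of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def minibatch_average(predictions, targets):
    """Average the per-sample losses over a minibatch."""
    losses = [squared_error(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


predictions = [[1.0, 2.0, 3.0, 6.0],   # off by 2 on one node -> loss 4
               [0.0, 2.0, 3.0, 4.0]]   # off by 1 on one node -> loss 1
targets = [[1.0, 2.0, 3.0, 4.0],
           [1.0, 2.0, 3.0, 4.0]]

loss = minibatch_average(predictions, targets)
errs = loss  # as in create_errors, the metric is the loss itself
print(loss)  # 2.5
```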

```python
# Function to build the model
def create_model(x_i, x_t):
    with C.layers.default_options(init=C.layers.glorot_uniform(), activation=C.relu):
        h = x_i

        # Convolutional feature extractor for the image input.
        h = C.layers.Convolution2D(filter_shape=(5,5), num_filters=8, strides=(1,1), pad=True, name="conv_1")(h)
        h = C.layers.MaxPooling(filter_shape=(2,2), strides=(2,2), name="max_1")(h)
        h = C.layers.Convolution2D(filter_shape=(5,5), num_filters=16, strides=(1,1), pad=True, name="conv_2")(h)
        h = C.layers.MaxPooling(filter_shape=(3,3), strides=(3,3), name="max_2")(h)

        # Create a feature map.
        h = C.layers.Dense(conv_feature_map_size, name="conv_feature_map")(h)

        # Merge the convolutional feature map with the raw tabular data.
        h = C.splice(h, x_t, axis=0)

        # Mix the merged features in a dense hidden layer.
        h = C.layers.Dense(conv_feature_map_size, name="merged_dense_1")(h)
        # Linear output layer: activation=None leaves the outputs unbounded.
        p = C.layers.Dense(num_regression_outputs, activation=None, name="prediction")(h)

        return p

def create_errors(model, labels):
    # Squared error for regression; it doubles as the evaluation metric.
    loss = C.losses.squared_error(model, labels)
    errs = loss
    return loss, errs # (model, labels) -> (loss, error metric)
```

## Training Progress Printer

This function takes the trainer, the current minibatch index, and a print frequency, and every `frequency` minibatches it prints the loss and evaluation metric of the most recent minibatch.

```python
# Defines a utility that prints the training progress
def print_training_progress(trainer, mb, frequency, train = True):
    training_loss = "NA"
    eval_error = "NA"

    if mb % frequency == 0:
        training_loss = trainer.previous_minibatch_loss_average
        eval_error = trainer.previous_minibatch_evaluation_average
        # In this regression setup the evaluation metric is the squared loss
        # itself, so the "%" figure is just the loss x 100, not a percentage.
        print("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}%".format(mb, training_loss, eval_error*100))

    return mb, training_loss, eval_error
```

### Configure Train and Test Loop

This is mostly boilerplate as well, but I suggest reading some of the other articles that focus on this area, because there are some great optimizations you could make here that I left out for the sake of simplicity.

**The big thing to notice here is the input map:** input_map and test_input_map. Notice that they pull out the specific streams we defined in our reader and use them to populate our input variables y, x_i, and x_t.


```python
def train_test(train_reader, test_reader, num_sweeps_to_train_with=10):

    # Instantiate the loss and error function.
    # z comes from the global scope outside of this function (defined later).
    loss, label_error = create_errors(z, y)

    # Instantiate the trainer object to drive the model training.
    # We use a very low learning rate so we don't get exploding gradients,
    # which are common for regressors in large networks.
    learning_rate = 0.000001
    lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
    learner = C.sgd(z.parameters, lr_schedule)
    trainer = C.Trainer(z, (loss, label_error), [learner])

    # Initialize the parameters for the trainer.
    # num_samples_per_sweep should match the number of lines in your training files.
    minibatch_size = 64
    num_samples_per_sweep = 60000
    num_minibatches_to_train = (num_samples_per_sweep * num_sweeps_to_train_with) // minibatch_size

    # Map the data streams to the input and label variables.
    # This is where we pull in the streams defined in create_reader.
    input_map = {
        y   : train_reader.streams.labels,
        x_i : train_reader.streams.features_image,
        x_t : train_reader.streams.features_tabular
    }

    training_progress_output_freq = 500

    # Start a timer.
    start = time.time()

    for i in range(0, int(num_minibatches_to_train)):
        # Read a minibatch from the training data files.
        data = train_reader.next_minibatch(minibatch_size, input_map=input_map)
        if i == 0:
            print("data=", data)
        # Train with this minibatch.
        trainer.train_minibatch(data)
        # Print progress.
        print_training_progress(trainer, i, training_progress_output_freq)
    # Print training time.
    print("Training took {:.1f} sec".format(time.time() - start))

    # Test the model.
    test_input_map = {
        y   : test_reader.streams.labels,
        x_i : test_reader.streams.features_image,
        x_t : test_reader.streams.features_tabular
    }

    # Test data for the trained model.
    # num_samples should match the number of lines in your test files.
    test_minibatch_size = 512
    num_samples = 10000
    num_minibatches_to_test = num_samples // test_minibatch_size

    test_result = 0.0
    for i in range(num_minibatches_to_test):
        # Load test data in batches of test_minibatch_size. Each sample pairs
        # a 3 x 32 x 32 image with a 3-value feature vector and a 4-value label.
        data = test_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
        eval_error = trainer.test_minibatch(data)
        test_result = test_result + eval_error

    # Average of the evaluation errors over all test minibatches.
    print("Average test error: {0:.2f}".format(test_result / num_minibatches_to_test))
```
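The loop sizes in train_test follow from simple integer arithmetic on its constants. This plain-Python check reuses those values (which should be replaced with the real line counts of your files) and shows one side effect: floor division means the test loop skips any final partial minibatch.

```python
minibatch_size = 64
num_samples_per_sweep = 60000
num_sweeps_to_train_with = 10
num_minibatches_to_train = (num_samples_per_sweep * num_sweeps_to_train_with) // minibatch_size

test_minibatch_size = 512
num_samples = 10000
num_minibatches_to_test = num_samples // test_minibatch_size
samples_skipped = num_samples - num_minibatches_to_test * test_minibatch_size

print(num_minibatches_to_train)  # 9375
print(num_minibatches_to_test)   # 19
print(samples_skipped)           # 272
```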

## Train this THING!!!
Alright, time for the rubber to hit the road. We create our model by passing it the input variables x_i and x_t, which will be populated by our readers for each minibatch during training.

Next, we create the readers and call the training loop defined above with them. Our model lives outside of the function because we want to be able to persist it afterwards, or possibly continue training it after this first loop.

```python
z = create_model(x_i, x_t)
reader_train = create_reader("train.map", "train.ctf", True, num_regression_outputs)
reader_test = create_reader("test.map", "test.ctf", False, num_regression_outputs)

train_test(reader_train, reader_test, num_sweeps_to_train_with = 10)
```

```
data= {Input('Input3', [#], [3 x 32 x 32]): MinibatchData(data=Value([64 x 1 x 3 x 32 x 32], GPU), samples=64, seqs=64), Input('Input4', [#], [3]): MinibatchData(data=Value([64 x 1 x 3], GPU), samples=64, seqs=64), Input('Input5', [#], [4]): MinibatchData(data=Value([64 x 1 x 4], GPU), samples=64, seqs=64)}
Minibatch: 0, Loss: 44430.9805, Error: 4443098.05%
Minibatch: 500, Loss: 33.2312, Error: 3323.12%
Minibatch: 1000, Loss: 32.7910, Error: 3279.10%
Minibatch: 1500, Loss: 32.5484, Error: 3254.84%
Minibatch: 2000, Loss: 32.3945, Error: 3239.45%
Minibatch: 2500, Loss: 32.2752, Error: 3227.52%
Minibatch: 3000, Loss: 32.1301, Error: 3213.01%
Minibatch: 3500, Loss: 31.9984, Error: 3199.84%
Minibatch: 4000, Loss: 31.8687, Error: 3186.87%
Minibatch: 4500, Loss: 31.7385, Error: 3173.85%
Minibatch: 5000, Loss: 31.6093, Error: 3160.93%
Minibatch: 5500, Loss: 31.4889, Error: 3148.89%
Minibatch: 6000, Loss: 31.3532, Error: 3135.32%
Minibatch: 6500, Loss: 31.2155, Error: 3121.55%
Minibatch: 7000, 
```

# Understanding These Metrics

In this case, we have an average error of 32.96 across all inputs. Since squared loss is the sum of the squares of our errors over the output nodes, a little math is needed to interpret this number. With 4 regressor nodes, dividing 32.96 by 4 gives 8.24, the mean squared error per node. Taking the square root gives about 2.87, the root-mean-square (RMS) error for each regressor node. No further adjustment for direction is needed: squaring already discards the sign, so the RMS value is the typical size of the error whether a prediction was too high or too low. This means each regressor node is off by roughly 2.87 on average (in the RMS sense) after only 32 seconds of training. If we let training run longer and converge, I would expect this to drop.
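The conversion from the reported loss to a per-node error can be written as a small function. This is a minimal sketch assuming the reported value is the per-sample squared error summed over the 4 output nodes; `per_node_rms_error` is a hypothetical helper.

```python
import math


def per_node_rms_error(avg_squared_loss, num_outputs):
    """Turn an average per-sample squared loss (summed over output nodes)
    into a root-mean-square error per output node."""
    mse_per_node = avg_squared_loss / num_outputs
    return math.sqrt(mse_per_node)


print(round(per_node_rms_error(32.96, 4), 2))  # 2.87
```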

# **Summary**
Alright, so that's about it. You now know how to create multiple data files with different kinds of data in them and then compose that data into a single model. Congratulations! See what you can do with this on Kaggle; this particular challenge looks ripe for this type of approach: https://www.kaggle.com/c/zillow-prize-1

You can find some additional details here: http://dacrook.com/complex-neural-network-data-modelling-with-cntk/ 