# Hot Dog vs. Not Hot Dog
## Cognitive Toolkit Edition

We use the Transfer Learning example from the CNTK repository as our baseline and build a model based on ResNet18 that can distinguish between hotdogs and various forms of not-hotdogs (see [this clip](https://www.youtube.com/watch?v=ACmydtFDTGs&feature=youtu.be) for reference).

We'll start off with a few of the imports required for CNTK...

In [22]:
import numpy as np
import cntk as C
import os
from PIL import Image
from cntk.device import try_set_default_device, gpu
from cntk import load_model, placeholder, Constant, Trainer, UnitType
from cntk.logging.graph import find_by_name, get_node_outputs
from cntk.io import MinibatchSource, ImageDeserializer, StreamDefs, StreamDef
import cntk.io.transforms as xforms
from cntk.layers import Dense
from cntk.learners import momentum_sgd, learning_rate_schedule, momentum_schedule
from cntk.ops import combine, softmax
from cntk.ops.functions import CloneMethod
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.logging import log_number_of_parameters, ProgressPrinter

## Settings

Now we set up a few constants: where to find the base model we'll be using (you should have downloaded it beforehand using `download_model.py`), which layer of it to use, and what to name the new output layer.

We also set up our learning parameters:
- `max_epochs`: How many times do we want to run through our training data?
- `mb_size`: We process multiple images at a time in a "mini-batch" - how many?
- `lr_per_mb`: The _learning rate_ trades off how quickly we converge against how much we jump around the "search space". Decaying it over time usually lets us gradually come to rest on an answer.
- `momentum_per_mb`: Momentum ensures that when we're being told repeatedly that the answer is in a given "direction", we move more strongly in that direction.
- `l2_reg_weight`: Regularization helps ensure that the model doesn't _overfit_, that is, become good only at predicting our training data.
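
To make the decaying learning rate concrete, here is a minimal sketch of how a list such as `[0.2]*10 + [0.1]` can be read as a schedule. It assumes that each entry covers one pass over the training data and that the last entry is held for all remaining epochs. The helper `lr_for_epoch` is hypothetical and is not part of CNTK.

```python
def lr_for_epoch(schedule, epoch):
    """Return the learning rate for a 0-based epoch index.

    Assumption: one schedule entry per epoch, with the final entry
    reused once the list runs out.
    """
    return schedule[min(epoch, len(schedule) - 1)]


lr_per_mb = [0.2] * 10 + [0.1]

# Learning rate seen in each of 20 epochs under the assumption above.
rates = [lr_for_epoch(lr_per_mb, epoch) for epoch in range(20)]

for epoch, rate in enumerate(rates, start=1):
    print("epoch {0:2d}: learning rate {1}".format(epoch, rate))
```

Under that reading, the first ten epochs train at 0.2 and the rest at 0.1.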

In [23]:
# general settings
base_folder = os.getcwd()
print('Current directory: {}'.format(base_folder))
tl_model_file = os.path.join(base_folder, "Output", "TransferLearning.model")
output_file = os.path.join(base_folder, "Output", "predOutput.txt")
features_stream_name = 'features'
label_stream_name = 'labels'
new_output_node_name = "prediction"

# Learning parameters
max_epochs = 20
mb_size = 50
lr_per_mb = [0.2]*10 + [0.1]
momentum_per_mb = 0.9
l2_reg_weight = 0.0005

# define base model location and characteristics
_base_model_file = os.path.join(base_folder, "ResNet_18.model")
_feature_node_name = "features"
_last_hidden_node_name = "z.x"
_image_height = 224
_image_width = 224
_num_channels = 3

# define data location and characteristics
_data_folder = os.path.join(base_folder, "images")
_train_map_file = os.path.join(_data_folder, "train.tsv")
_test_map_file = os.path.join(_data_folder, "test.tsv")
_num_classes = 2

Current directory: C:\dev\git_ws\msready2017


# Creating the Model and the Data

You may have noticed above that we're using "mini-batches" for our data, so we need to create a "source" for these mini-batches to come from. We do so by using the `ImageDeserializer` to read in our training data map-file (containing tab-delimited image file paths and labels, one per line). This splits the data into "labels" and "features", where the features are the raw image data and the labels are either "hotdog" (1) or "not hotdog" (0).
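
As an illustration of the map-file layout described above, the following sketch parses tab-delimited lines into a path, an integer label, and a one-hot label vector of length `num_classes`. The file paths are made up for the example, and `parse_map_line` is a hypothetical helper. In the notebook itself, `ImageDeserializer` does this work.

```python
def parse_map_line(line, num_classes=2):
    """Split one 'path<TAB>label' line and build a one-hot label vector."""
    path, label_text = line.rstrip("\n").split("\t")
    label = int(label_text)
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return path, label, one_hot


# Hypothetical map-file contents: 1 = hotdog, 0 = not hotdog.
sample_lines = [
    "images/hotdog/0001.jpg\t1\n",
    "images/pizza/0042.jpg\t0\n",
]

parsed = [parse_map_line(line) for line in sample_lines]
for path, label, one_hot in parsed:
    print(path, label, one_hot)
```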

To create the model, we load our downloaded pre-trained ResNet18 model. We then find the last "feature layer" (where it has learned all of the high-level features like "cat's eye" and "dog's ear"), clone the network from its input up to that layer, and add our own final predictor layer to tell us whether we have a hotdog or a "not hotdog".
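
The new predictor layer is a plain linear map from the feature vector to one raw score per class. A minimal sketch of that idea, in pure Python, is below. The four-element feature vector and all weight values are invented for illustration. The real feature vector is much larger, and the real weights are learned during training.

```python
def dense(features, weights, bias):
    """Linear layer without activation: one output score per weight row."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical output of the cloned feature layers for one image.
features = [1.0, 0.0, 2.0, 0.5]

# Hypothetical weights: one row per class (not hotdog, hotdog).
weights = [
    [0.1, 0.2, 0.3, 0.4],
    [-0.1, 0.0, 0.5, 1.0],
]
bias = [0.0, 0.5]

scores = dense(features, weights, bias)
print(scores)  # raw, un-normalized scores, one per class
```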

In [24]:
# Creates a minibatch source for training or testing
def create_mb_source(map_file, image_width, image_height, num_channels, num_classes, randomize=True):
    transforms = [xforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear')] 
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
            features =StreamDef(field='image', transforms=transforms),
            labels   =StreamDef(field='label', shape=num_classes))),
            randomize=randomize)


# Creates the network model for transfer learning
def create_model(base_model_file, feature_node_name, last_hidden_node_name, num_classes, input_features, freeze=False):
    # Load the pretrained classification net and find nodes
    base_model   = load_model(base_model_file)
    feature_node = find_by_name(base_model, feature_node_name)
    last_node    = find_by_name(base_model, last_hidden_node_name)

    # Clone the desired layers; their weights stay fixed only when freeze=True
    cloned_layers = combine([last_node.owner]).clone(
        CloneMethod.freeze if freeze else CloneMethod.clone,
        {feature_node: placeholder(name='features')})

    # Roughly center the pixel values by subtracting 114, then add a new dense layer for class prediction
    feat_norm  = input_features - Constant(114)
    cloned_out = cloned_layers(feat_norm)
    z          = Dense(num_classes, activation=None, name=new_output_node_name) (cloned_out)

    return z

# Training the Model

Training the model is where we put all of this together: we create our training mini-batch source, create our model, define how we want it to be judged, and set it off training. This code is fairly vanilla CNTK code, so I won't talk through it in detail. The important piece to notice is the `Trainer` class, which takes in the model, the _criterion_ for judgment (loss function, evaluation metric), and the "learner". In our case we're using Stochastic Gradient Descent (SGD) with Momentum, a relatively standard method that has in many cases been replaced by Adam.
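
For intuition about what the learner does on each step, here is a sketch of the textbook SGD-with-momentum update on a single scalar weight, with an optional L2 penalty. It is an illustration only: the exact scaling CNTK applies inside `momentum_sgd` may differ, and `momentum_sgd_step` is a hypothetical name.

```python
def momentum_sgd_step(weight, velocity, grad, lr, momentum, l2_weight=0.0):
    """One textbook momentum update on a scalar weight."""
    grad = grad + l2_weight * weight        # L2 regularization pulls the weight toward zero
    velocity = momentum * velocity + grad   # repeated gradients in one direction add up
    weight = weight - lr * velocity         # step against the accumulated direction
    return weight, velocity


# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_sgd_step(w, v, 2.0 * (w - 3.0), lr=0.05, momentum=0.9)

print("w after 200 steps: {0:.4f}".format(w))
```

The weight overshoots and oscillates around the minimum at 3 before settling, which is the "moving more strongly in a repeated direction" behavior that momentum is meant to provide.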

In [25]:
# Trains a transfer learning model
def train_model(base_model_file, feature_node_name, last_hidden_node_name,
                image_width, image_height, num_channels, num_classes, train_map_file,
                num_epochs, max_images=-1, freeze=False):
    with open(train_map_file) as map_file:
        epoch_size = sum(1 for _ in map_file)
    if max_images > 0:
        epoch_size = min(epoch_size, max_images)

    # Create the minibatch source and input variables
    minibatch_source = create_mb_source(train_map_file, image_width, image_height, num_channels, num_classes)
    image_input = C.input_variable((num_channels, image_height, image_width))
    label_input = C.input_variable(num_classes)

    # Define mapping from reader streams to network inputs
    input_map = {
        image_input: minibatch_source[features_stream_name],
        label_input: minibatch_source[label_stream_name]
    }

    # Instantiate the transfer learning model and loss function
    tl_model = create_model(base_model_file, feature_node_name, last_hidden_node_name, num_classes, image_input, freeze)
    ce = cross_entropy_with_softmax(tl_model, label_input)
    pe = classification_error(tl_model, label_input)

    # Instantiate the trainer object
    lr_schedule = learning_rate_schedule(lr_per_mb, unit=UnitType.minibatch)
    mm_schedule = momentum_schedule(momentum_per_mb)
    learner = momentum_sgd(tl_model.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
    progress_printer = ProgressPrinter(tag='Training', num_epochs=num_epochs)
    trainer = Trainer(tl_model, (ce, pe), learner, progress_printer)

    # Get minibatches of images and perform model training
    print("Training transfer learning model for {0} epochs (epoch_size = {1}).".format(num_epochs, epoch_size))
    log_number_of_parameters(tl_model)
    for epoch in range(num_epochs):       # loop over epochs
        sample_count = 0
        while sample_count < epoch_size:  # loop over minibatches in the epoch
            data = minibatch_source.next_minibatch(min(mb_size, epoch_size-sample_count), input_map=input_map)
            trainer.train_minibatch(data)                                    # update model with it
            sample_count += trainer.previous_minibatch_sample_count          # count samples processed so far
            if sample_count % (100 * mb_size) == 0:
                print ("Processed {0} samples".format(sample_count))

        trainer.summarize_training_progress()

    return tl_model

# Model Performance Evaluation

Evaluating the model is, in some sense, more painful than training it. You need to massage each image into the format that CNTK expects, which involves converting the pixel data into an array of 32-bit floats with the channels in the right order (BGR, channels first). Once you've done that, you send it into the trained model, which spits out a value for each class. Running [Softmax](https://en.wikipedia.org/wiki/Softmax_function) over those values converts them into (roughly) probabilities, and we pick the class with the highest probability (using `numpy.argmax`).
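
The two steps in that paragraph, reordering the pixel data and turning raw scores into a prediction, can be sketched without any libraries. This is a toy version on nested lists with made-up pixel values and scores. The notebook does the same thing with NumPy and CNTK, and the function names here are hypothetical.

```python
import math


def hwc_rgb_to_chw_bgr(pixels):
    """Reorder a height x width x 3 RGB nested list into 3 x height x width BGR."""
    height, width = len(pixels), len(pixels[0])
    # Channel order (2, 1, 0) turns RGB into BGR; looping over channels
    # outermost puts the channel axis first.
    return [
        [[float(pixels[y][x][c]) for x in range(width)] for y in range(height)]
        for c in (2, 1, 0)
    ]


def softmax_probs(scores):
    """Numerically stable softmax over a flat list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A 1x2 image: one pure red pixel followed by one pure blue pixel.
tiny_image = [[[255, 0, 0], [0, 0, 255]]]
chw = hwc_rgb_to_chw_bgr(tiny_image)  # chw[0] is the blue plane, chw[2] the red plane

# Made-up raw model outputs for (not hotdog, hotdog).
probs = softmax_probs([-1.2, 3.4])
predicted_label = max(range(len(probs)), key=probs.__getitem__)
print(chw, probs, predicted_label)
```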

In [26]:
# Evaluates a single image using the provided model
def eval_single_image(loaded_model, image_path, image_width, image_height):
    # load and format image (resize, RGB -> BGR, HWC -> CHW)
    img = Image.open(image_path)
    if image_path.endswith("png"):
        # flatten onto a white background, using the image itself as the paste mask
        # (this assumes the PNG carries an alpha channel)
        temp = Image.new("RGB", img.size, (255, 255, 255))
        temp.paste(img, img)
        img = temp
    # Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias
    resized = img.resize((image_width, image_height), Image.LANCZOS)
    bgr_image = np.asarray(resized, dtype=np.float32)[..., [2, 1, 0]]
    chw_format = np.ascontiguousarray(np.rollaxis(bgr_image, 2))

    ## Alternatively: if you want to use opencv-python
    # cv_img = cv2.imread(image_path)
    # resized = cv2.resize(cv_img, (image_width, image_height), interpolation=cv2.INTER_NEAREST)
    # bgr_image = np.asarray(resized, dtype=np.float32)
    # chw_format = np.ascontiguousarray(np.rollaxis(bgr_image, 2))

    # compute model output
    arguments = {loaded_model.arguments[0]: [chw_format]}
    output = loaded_model.eval(arguments)

    # return softmax probabilities
    sm = softmax(output[0])
    return sm.eval()

# Evaluates an image set using the provided model
def eval_test_images(loaded_model, output_file, test_map_file, image_width, image_height, max_images=-1, column_offset=0):
    with open(test_map_file) as map_file:
        num_images = sum(1 for _ in map_file)
    if max_images > 0:
        num_images = min(num_images, max_images)
    print("Evaluating model output node '{0}' for {1} images.".format(new_output_node_name, num_images))

    pred_count = 0
    correct_count = 0
    np.seterr(over='raise')
    with open(output_file, 'wb') as results_file:
        with open(test_map_file, "r") as input_file:
            for line in input_file:
                tokens = line.rstrip().split('\t')
                img_file = tokens[0 + column_offset]
                probs = eval_single_image(loaded_model, img_file, image_width, image_height)

                pred_count += 1
                true_label = int(tokens[1 + column_offset])
                predicted_label = np.argmax(probs)
                if predicted_label == true_label:
                    correct_count += 1

                np.savetxt(results_file, probs[np.newaxis], fmt="%.3f")
                if pred_count % 100 == 0:
                    print("Processed {0} samples ({1} correct)".format(pred_count, (float(correct_count) / pred_count)))
                if pred_count >= num_images:
                    break

    print ("{0} out of {1} predictions were correct {2}.".format(correct_count, pred_count, (float(correct_count) / pred_count)))

# Putting It All Together

Now that we can train and evaluate our model, let's put it all together into a final run, where we train our model for `max_epochs` epochs and evaluate the results on `_test_map_file`'s images.

In [27]:
try_set_default_device(gpu(0))
# check for model and data existence
if not (os.path.exists(_base_model_file) and os.path.exists(_train_map_file) and os.path.exists(_test_map_file)):
    print("Please run 'python download_model.py' and 'img_downloader.py' first to get the required data and model.")
    exit(0)

# Train only if no model exists yet
if os.path.exists(tl_model_file):
    print("Loading existing model from %s" % tl_model_file)
    trained_model = load_model(tl_model_file)
else:
    trained_model = train_model(_base_model_file, _feature_node_name, _last_hidden_node_name,
                                _image_width, _image_height, _num_channels, _num_classes, _train_map_file, max_epochs)
    trained_model.save(tl_model_file)
    print("Stored trained model at %s" % tl_model_file)

# Evaluate the test set
eval_test_images(trained_model, output_file, _test_map_file, _image_width, _image_height)

print("Done. Wrote output to %s" % output_file)


Training transfer learning model for 20 epochs (epoch_size = 532).
Training 15898178 parameters in 68 parameter tensors.
Learning rate per minibatch: 0.2
Momentum per minibatch: 0.9
Finished Epoch[1 of 20]: [Training] loss = 0.691645 * 532, metric = 12.03% * 532 14.792s ( 36.0 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 0.248316 * 532, metric = 3.01% * 532 7.765s ( 68.5 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 0.720047 * 532, metric = 11.65% * 532 7.596s ( 70.0 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 0.782503 * 532, metric = 12.78% * 532 7.738s ( 68.8 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 0.214429 * 532, metric = 9.02% * 532 7.631s ( 69.7 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 0.133607 * 532, metric = 5.08% * 532 7.893s ( 67.4 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 0.128591 * 532, metric = 4.51% * 532 7.792s ( 68.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 0.087163 * 532, metric = 1.88