# Notebook for Training ResNet on the TinyImageNet Dataset. Notebook (3/6) in the End-to-End Scalable Deep Learning Pipeline on Hops.

ResNet is one of the state-of-the-art deep networks for computer vision. It is based on the idea of residual learning: a residual network has so-called "shortcut connections" that run in parallel to the normal convolutional layers. These shortcuts act like highways for gradients, which makes it possible to train extremely deep networks without suffering from vanishing or exploding gradients. Intuitively, with the shortcut connections available during training, the network can learn which layers it does not need, since it can always fall back on the shortcut (identity) connection.
![sample_images_from_tfr.png](./../images/resnet_1.png)

Despite the huge increase in overall depth, a ResNet with 50 layers has roughly half the parameters of AlexNet, because ResNet relies mostly on small filters. For example, a stack of three 3x3 filters covers the same 7x7 receptive field as a single 7x7 filter while using fewer parameters (27 C² versus 49 C² weights for C input and output channels).
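
The effect of small filters on the parameter count can be checked with a little arithmetic. The sketch below counts the weights in a stack of three 3x3 convolutions and in a single 7x7 convolution, which cover the same receptive field. The channel count of 64 and the helper name `conv_weights` are illustrative assumptions, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one 2D convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # illustrative channel count, not taken from the notebook

# Three stacked 3x3 layers see the same 7x7 input window as one 7x7 layer
stacked_3x3 = 3 * conv_weights(3, channels, channels)
single_7x7 = conv_weights(7, channels, channels)

print("three 3x3 layers:", stacked_3x3)
print("one 7x7 layer:   ", single_7x7)
```

The stacked variant also inserts two extra non-linearities, which is a second reason small filters are usually preferred.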

ResNet comes in different depths: ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152.

For the deeper variants (50 layers and up), ResNet uses "bottleneck building blocks" to reduce training time.

![bottlenneck.png](./../images/bottleneck.png)
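
To get a feel for why the bottleneck design is cheaper, the following sketch compares the weight count of a bottleneck block (1x1 reduce, 3x3, 1x1 expand) with a block of two full-width 3x3 convolutions. The widths 256 and 64 match the first stage of the network built later in this notebook; the helper name `conv_weights` is an assumption made for illustration, and biases and batch-norm parameters are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one 2D convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


depth, depth_bottleneck = 256, 64

# Bottleneck block: 1x1 reduce -> 3x3 at reduced width -> 1x1 expand
bottleneck = (conv_weights(1, depth, depth_bottleneck)
              + conv_weights(3, depth_bottleneck, depth_bottleneck)
              + conv_weights(1, depth_bottleneck, depth))

# Hypothetical alternative: two 3x3 convolutions at the full width
full_width = 2 * conv_weights(3, depth, depth)

print("bottleneck block:", bottleneck)
print("full-width block:", full_width)
```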

ResNet uses batch normalization right after each convolution and before the activation function. ReLU is typically used as the activation function.
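
The ordering convolution, then batch normalization, then activation can be illustrated on a handful of numbers. This is a minimal pure-Python sketch for a single channel, assuming made-up convolution outputs and omitting the learned scale (gamma) and shift (beta); the epsilon of 1e-4 mirrors the default used later in `LayerBuilder`.

```python
import math


def batch_norm(values, epsilon=1e-4):
    """Normalize one channel of a mini-batch to zero mean and (roughly) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


conv_outputs = [4.0, 6.0, 8.0, 10.0]  # made-up pre-activation values

normalized = batch_norm(conv_outputs)  # normalization comes first
activated = relu(normalized)           # the activation sees zero-centered inputs

print(normalized)
print(activated)
```

Without the normalization step every value above is positive and ReLU would pass all of them unchanged; after normalization roughly half of them are clipped to zero.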


This notebook will read the TFRecords that were written by notebook number 2 ([Notebook number two](./Step2_Image_PreProcessing.ipynb)) and feed them into ResNet for single-GPU training and distributed hyperparameter search using several GPUs. 

Specifically, this notebook reads TFRecords from:

- hdfs:///Projects/ImageNet_EndToEnd_MLPipeline/tiny-imagenet/tiny-imagenet-200/tfrecords_clean

and writes hyperparameter results to:

- hdfs:///Projects/ImageNet_EndToEnd_MLPipeline/tiny-imagenet/tiny-imagenet-200/hyperparams.txt

![step3.png](./../images/step3.png)

## Package Imports

Tested with versions:

- numpy: 1.14.5
- hops: 2.6.4
- pydoop: 2.0a3
- tensorboard: 1.8.0
- tensorflow: 1.8.0
- tensorflow-gpu: 1.8.0

In [None]:
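import os
import re
import time

import numpy as np
import pydoop.hdfs as py_hdfs  # used below as py_hdfs.ls(...)
import tensorflow as tf
from hops import hdfs
from tensorflow.python.ops import data_flow_ops  # provides StagingArea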

## Constants

In [None]:
PROJECT_DIR = hdfs.project_path()
DATASET_BASE_DIR = PROJECT_DIR + "tiny-imagenet/tiny-imagenet-200/"
TRAIN_DIR = DATASET_BASE_DIR + "train"
TEST_DIR = DATASET_BASE_DIR + "test"
VAL_DIR = DATASET_BASE_DIR + "val/images/"
ID_TO_CLASS_FILE = DATASET_BASE_DIR + "words.txt"
INPUT_DIR = DATASET_BASE_DIR + "tfrecords_clean/"
LOG_DIR = DATASET_BASE_DIR + "logs/"
VAL_LABELS_FILE = DATASET_BASE_DIR + "val/val_annotations.txt"
FILE_PATTERN = "*.JPEG"
SIZES_FILE = DATASET_BASE_DIR + "sizes.txt"
HYPERPARAMS_FILE = DATASET_BASE_DIR + "hyperparams.txt"
VALIDATION_SET_RESULTS_FILE = DATASET_BASE_DIR + "single_gpu_validation_results.txt"
EXPORT_MODEL_DIR = DATASET_BASE_DIR + "exported_model/"
DEFAULT_BATCH_SIZE = 100

## Functions for Building the Layers in ResNet

In [None]:
class LayerBuilder(object):
    """
    Class for building ResNet layers, contains functions for building the various layers:
    - conv layers
    - max pooling layers
    """
    def __init__(self, activation=None, data_format='channels_last',
                 training=False, use_batch_norm=False, batch_norm_config=None,
                 conv_initializer=None, adv_bn_init=False):
        """
        Initialize the layer builder with configuration parameters
        
        :param activation: activation function to use
        :param data_format: the format of the images, e.g channels first (3,64,64) or channels last (64,64,3)
        :param training: boolean, whether training or evaluation is performed
        :param use_batch_norm: whether to add a batch normalization layer
        :param batch_norm_config: configuration for normalizing a batch, e.g how to scale/decay how much variance etc.
        :param conv_initializer: optional initializer for conv layers (tf.contrib.layers.variance_scaling_initializer)
        :param adv_bn_init: init gamma of the last BN of each ResMod at 0
        """
        self.activation = activation
        self.data_format = data_format
        self.training = training
        self.use_batch_norm = use_batch_norm
        self.batch_norm_config = batch_norm_config
        self.conv_initializer = conv_initializer
        self.adv_bn_init = adv_bn_init
        # If no batch_normalization configuration provided, use a default one
        if self.batch_norm_config is None:
            self.batch_norm_config = {
                'decay': 0.9,
                'epsilon': 1e-4,
                'scale': True,
                'zero_debias_moving_mean': False,
            }

    def _conv2d(self, inputs, activation, *args, **kwargs):
        """
        This function returns an operation for performing a 2D convolution operation over a given input.
        
        :param inputs: the tensor input data
        :param activation: activation function to apply after the convolution operation, 
        if none then it uses linear activation
        """
        x = tf.layers.conv2d(
            inputs, data_format=self.data_format,
            use_bias=not self.use_batch_norm,
            kernel_initializer=self.conv_initializer,
            activation=None if self.use_batch_norm else activation,
            *args, **kwargs)
        # Apply normalization on top of the convolution
        if self.use_batch_norm:
            x = self.batch_norm(x)
            x = activation(x) if activation is not None else x
        return x

    def conv2d_linear_last_bn(self, inputs, *args, **kwargs):
        """
        This function performs convolution with a linear activation 
        and batch normalization
        """
        x = tf.layers.conv2d(
            inputs, data_format=self.data_format,
            use_bias=False,
            kernel_initializer=self.conv_initializer,
            activation=None, *args, **kwargs)
        param_initializers = {
            'moving_mean': tf.zeros_initializer(),
            'moving_variance': tf.ones_initializer(),
            'beta': tf.zeros_initializer(),
        }
        if self.adv_bn_init:
            param_initializers['gamma'] = tf.zeros_initializer()
        else:
            param_initializers['gamma'] = tf.ones_initializer()
        x = self.batch_norm(x, param_initializers=param_initializers)
        return x

    def conv2d_linear(self, inputs, *args, **kwargs):
        """
        linear convolution without an activation function
        """
        return self._conv2d(inputs, None, *args, **kwargs)

    def conv2d(self, inputs, *args, **kwargs):
        """
        generic convolution with activation function
        """
        return self._conv2d(inputs, self.activation, *args, **kwargs)

    def pad2d(self, inputs, begin, end=None):
        if end is None:
            end = begin
        try:
            _ = begin[1]
        except TypeError:
            begin = [begin, begin]
        try:
            _ = end[1]
        except TypeError:
            end = [end, end]
        if self.data_format == 'channels_last':
            padding = [[0, 0], [begin[0], end[0]], [begin[1], end[1]], [0, 0]]
        else:
            padding = [[0, 0], [0, 0], [begin[0], end[0]], [begin[1], end[1]]]
        return tf.pad(inputs, padding)

    def max_pooling2d(self, inputs, *args, **kwargs):
        """
        max pooling layer over 2D inputs (images) (selects the max signal in each window it slides over)
        """
        return tf.layers.max_pooling2d(
            inputs, data_format=self.data_format, *args, **kwargs)

    def average_pooling2d(self, inputs, *args, **kwargs):
        """
        average pooling layer over 2D inputs (images) (averages the signals in each window it slides over)
        """
        return tf.layers.average_pooling2d(
            inputs, data_format=self.data_format, *args, **kwargs)

    def dense_linear(self, inputs, units, **kwargs):
        return tf.layers.dense(inputs, units, activation=None)

    def dense(self, inputs, units, **kwargs):
        """
        This layer implements the operation: 
        outputs = activation(inputs.kernel + bias) 
        Where activation is the activation function passed as the activation argument (if not None), 
        kernel is a weights matrix created by the layer, 
        and bias is a bias vector created by the layer (only if use_bias is True).
        I.e just a simple activation of sum of logits + bias, no convolution
        """
        return tf.layers.dense(inputs, units, activation=self.activation)

    def activate(self, inputs, activation=None):
        """
        Applies an activation function (if any) over a set of inputs,
        identity function if there is no activation function.
        """
        activation = activation or self.activation
        return activation(inputs) if activation is not None else inputs

    def batch_norm(self, inputs, **kwargs):
        """
        Adds a batch normalization layer.
        The normalization is over all but the last dimension if 
        data_format is NHWC and all but the second dimension if data_format is NCHW.
        
        Normalizing the inputs well can improve performance and speed up convergence.
        Typically the mean is subtracted so that the inputs have zero mean, which
        prevents all weights from changing in the same direction and slowing convergence.
        
        Batch normalization potentially helps in two ways: faster learning and higher 
        overall accuracy. The improved method also allows you to use a higher 
        learning rate, potentially providing another boost in speed. 
        Why does this work? Well, we know that normalization (shifting inputs 
        to zero-mean and unit variance) is often used as a pre-processing step to 
        make the data comparable across features. As the data flows through a
        deep network, the weights and parameters adjust those values, sometimes 
        making the data too big or too small again - a problem the authors refer 
        to as "internal covariate shift". By normalizing the data in each mini-batch, 
        this problem is largely avoided.
        
        Batch Normalization is a technique to provide any layer in a Neural Network 
        with inputs that are zero mean/unit variance
        """
        all_kwargs = dict(self.batch_norm_config)
        all_kwargs.update(kwargs)
        data_format = 'NHWC' if self.data_format == 'channels_last' else 'NCHW'
        return tf.contrib.layers.batch_norm(
            inputs, is_training=self.training, data_format=data_format,
            fused=True, **all_kwargs)

    def spatial_average2d(self, inputs):
        """
        Performs average pooling over specific spatial location
        """
        shape = inputs.get_shape().as_list()
        if self.data_format == 'channels_last':
            n, h, w, c = shape
        else:
            n, c, h, w = shape
        n = -1 if n is None else n
        x = tf.layers.average_pooling2d(inputs, (h, w), (1, 1),
                                        data_format=self.data_format)
        return tf.reshape(x, [n, c])

    def flatten2d(self, inputs):
        """
        Flattens 2d inputs 
        """
        x = inputs
        if self.data_format != 'channels_last':
            # Note: This ensures the output order matches that of NHWC networks
            x = tf.transpose(x, [0, 2, 3, 1])
        input_shape = x.get_shape().as_list()
        num_inputs = 1
        for dim in input_shape[1:]:
            num_inputs *= dim
        return tf.reshape(x, [-1, num_inputs], name='flatten')

    def residual2d(self, inputs, network, units=None, scale=1.0, activate=False):
        """
        The Residual connection 
        """
        outputs = network(inputs)
        c_axis = -1 if self.data_format == 'channels_last' else 1
        h_axis = 1 if self.data_format == 'channels_last' else 2
        w_axis = h_axis + 1
        ishape, oshape = [y.get_shape().as_list() for y in [inputs, outputs]]
        ichans, ochans = ishape[c_axis], oshape[c_axis]
        strides = ((ishape[h_axis] - 1) // oshape[h_axis] + 1,
                   (ishape[w_axis] - 1) // oshape[w_axis] + 1)
        with tf.name_scope('residual'):
            if (ochans != ichans or strides[0] != 1 or strides[1] != 1):
                inputs = self.conv2d_linear(inputs, units, 1, strides, 'SAME')
            x = inputs + scale * outputs
            if activate:
                x = self.activate(x)
        return x
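
Two small pieces of arithmetic in `LayerBuilder` are easy to check in isolation: the stride that `residual2d` infers from the input and output sizes, and the padding specification that `pad2d` hands to `tf.pad`. The sketch below reproduces both in plain Python. The function names are hypothetical, and `padding_spec` is simplified to scalar `begin`/`end` values, whereas the method above also accepts per-axis pairs.

```python
def infer_stride(in_size, out_size):
    """Ceiling division, as in residual2d: how much the shortcut must downsample."""
    return (in_size - 1) // out_size + 1


def padding_spec(begin, end=None, data_format='channels_last'):
    """Per-axis [before, after] padding for a 4D batch of images."""
    end = begin if end is None else end
    spatial = [[begin, end], [begin, end]]
    if data_format == 'channels_last':
        return [[0, 0]] + spatial + [[0, 0]]  # N, H, W, C
    return [[0, 0], [0, 0]] + spatial         # N, C, H, W


print(infer_stride(16, 8))  # spatial size halves, so the shortcut needs stride 2
print(padding_spec(3))      # padding applied before the first 7x7 convolution
```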

## ResNet BuildingBlocks (grouped layers)

In [None]:
def resnet_bottleneck_v1(builder, inputs, depth, depth_bottleneck, stride,
                         basic=False):
    """
    Bottleneck residual unit
    """
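    # The channel axis depends on the data format (axis 1 is only the channel axis for channels_first)
    c_axis = -1 if builder.data_format == 'channels_last' else 1
    num_inputs = inputs.get_shape().as_list()[c_axis]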
    x = inputs
    with tf.name_scope('resnet_v1'):
        if depth == num_inputs:
            if stride == 1:
                shortcut = x
            else:
                shortcut = builder.max_pooling2d(x, 1, stride)
        else:
            shortcut = builder.conv2d_linear(x, depth, 1, stride, 'SAME')
        if basic:
            x = builder.pad2d(x, 1)
            x = builder.conv2d(x, depth_bottleneck, 3, stride, 'VALID')
            x = builder.conv2d_linear(x, depth, 3, 1, 'SAME')
        else:
            x = builder.conv2d(x, depth_bottleneck, 1, 1, 'SAME')
            x = builder.conv2d(x, depth_bottleneck, 3, stride, 'SAME')
            # x = builder.conv2d_linear(x, depth,            1, 1,      'SAME')
            x = builder.conv2d_linear_last_bn(x, depth, 1, 1, 'SAME')
        x = tf.nn.relu(x + shortcut)
        return x

## Put the Building Blocks and All of the Layers Together Into A Complete Network Architecture

In [None]:
def inference_resnet_v1_impl(builder, inputs, layer_counts, basic=False):
    """
    Build the complete resnet from an input operation and a list of layers
    """
    x = inputs
    x = builder.pad2d(x, 3)
    x = builder.conv2d(x, 64, 7, 2, 'VALID')
    x = builder.max_pooling2d(x, 3, 2, 'SAME')
    for i in range(layer_counts[0]):
        x = resnet_bottleneck_v1(builder, x, 256, 64, 1, basic)
    for i in range(layer_counts[1]):
        x = resnet_bottleneck_v1(builder, x, 512, 128, 2 if i == 0 else 1, basic)
    for i in range(layer_counts[2]):
        x = resnet_bottleneck_v1(builder, x, 1024, 256, 2 if i == 0 else 1, basic)
    for i in range(layer_counts[3]):
        x = resnet_bottleneck_v1(builder, x, 2048, 512, 2 if i == 0 else 1, basic)
    return builder.spatial_average2d(x)


In [None]:
def inference_resnet_v1(inputs, nlayer, data_format='channels_last',
                        training=False, conv_initializer=None, adv_bn_init=False):
    """Deep Residual Networks family of models
    https://arxiv.org/abs/1512.03385
    Infer the complete ResNet architecture based on parameters, e.g resnet18,34,50,101, or 152
    and then build the corresponding network and return
    """
    builder = LayerBuilder(tf.nn.relu, data_format, training, use_batch_norm=True,
                           conv_initializer=conv_initializer, adv_bn_init=adv_bn_init)
    if nlayer == 18:
        return inference_resnet_v1_impl(builder, inputs, [2, 2, 2, 2], basic=True)
    elif nlayer == 34:
        return inference_resnet_v1_impl(builder, inputs, [3, 4, 6, 3], basic=True)
    elif nlayer == 50:
        return inference_resnet_v1_impl(builder, inputs, [3, 4, 6, 3])
    elif nlayer == 101:
        return inference_resnet_v1_impl(builder, inputs, [3, 4, 23, 3])
    elif nlayer == 152:
        return inference_resnet_v1_impl(builder, inputs, [3, 8, 36, 3])
    else:
        raise ValueError("Invalid nlayer (%i); must be one of: 18,34,50,101,152" %
                         nlayer)
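
The block counts passed to `inference_resnet_v1_impl` determine the network depth that gives each variant its name. As a sanity check, the sketch below counts the weighted layers per variant, assuming two convolutions per basic block, three per bottleneck block, one initial convolution, and one final dense layer (projection shortcuts are conventionally not counted). The names `BLOCK_COUNTS` and `count_weight_layers` are introduced here for illustration only.

```python
# (blocks per stage, uses basic blocks?) copied from inference_resnet_v1
BLOCK_COUNTS = {
    18: ([2, 2, 2, 2], True),
    34: ([3, 4, 6, 3], True),
    50: ([3, 4, 6, 3], False),
    101: ([3, 4, 23, 3], False),
    152: ([3, 8, 36, 3], False),
}


def count_weight_layers(layer_counts, basic):
    convs_per_block = 2 if basic else 3
    # initial 7x7 convolution + convolutions in all blocks + final dense layer
    return 1 + convs_per_block * sum(layer_counts) + 1


for nlayer, (layer_counts, basic) in sorted(BLOCK_COUNTS.items()):
    print(nlayer, count_weight_layers(layer_counts, basic))
```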


In [None]:
def get_model_func(model_name):
    """
    Get a function representing the model from the model_name (resnet18,34,50,101,152) and return it
    """
    if model_name.startswith('resnet'):
        nlayer = int(model_name[len('resnet'):])
        return lambda images, *args, **kwargs: \
            inference_resnet_v1(images, nlayer, *args, **kwargs)
    else:
        raise ValueError("Invalid model type: %s" % model_name)

## Parse TFRecords from HopsFS

In [None]:
def parse_tfr(example_proto):
    """
    Parses an example protocol buffer (TFRecord) into a dict of
    feature names and tensors
    """
    features = {
        'label_one_hot': tf.FixedLenFeature((), tf.string, default_value=""),
        'image_raw': tf.FixedLenFeature((), tf.string, default_value="")
    }
    parsed_features = tf.parse_single_example(example_proto, features)
    return parsed_features["image_raw"], parsed_features["label_one_hot"]

In [None]:
def decode_bytes(image, label):
    """
    Decode the bytes that were serialized in the TFRecords in HopsFS to tensors so that we can apply 
    image preprocessing
    """
    # Decode the raw byte strings into flat uint8 tensors
    image_tensor = tf.decode_raw(image, tf.uint8)
    label_tensor = tf.decode_raw(label, tf.uint8)
    label_tensor = tf.cast(label_tensor, tf.int32)
    image_tensor = tf.reshape(image_tensor, [64,64,3]) #dimension information was lost when serializing to disk
    return image_tensor, label_tensor

In [None]:
def make_dataset(filenames,batch_size, training=False):
    """
    Creates a dataset of examples to use for training or inference in Tensorflow, reads TFRecords from HopsFS
    """
    num_readers = os.cpu_count()
    
    # Train Dataset is already shuffled and augmented
    ds = tf.data.TFRecordDataset(filenames,
        compression_type=None,    
        buffer_size=100240, 
        num_parallel_reads=num_readers) # Parallel read from HopsFS
    
    # Parse the binary TFR format into dataset
    ds = ds.map(parse_tfr)
    # Decode the bytestrings into tensors
    ds = ds.map(decode_bytes)
    ds = ds.batch(batch_size)
    if(training):
        ds = ds.repeat() # So that we can iterate infinite times, let the model decide when to stop based on num_epochs
    return ds
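
The order of operations in `make_dataset` (map, then batch, then repeat for training) can be mimicked with plain Python iterators. This is only an analogy under the assumption that records are small Python objects; the helper names `batch` and `make_pipeline` are hypothetical and no TensorFlow is involved.

```python
import itertools


def batch(records, batch_size):
    """Group records into lists of batch_size; the last batch may be smaller."""
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def make_pipeline(records, batch_size, training=False):
    batches = list(batch(records, batch_size))
    # repeat() after batch(): training cycles over the batches forever
    return itertools.cycle(batches) if training else iter(batches)


eval_batches = list(make_pipeline(range(5), 2))
train_batches = list(itertools.islice(make_pipeline(range(5), 2, training=True), 4))
print(eval_batches)
print(train_batches)
```

Because the dataset is repeated after batching, the smaller final batch reappears in every pass over the data.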

## Staging Optimization

In [None]:
def stage(tensors):
    """
    Stages the given tensors in a StagingArea for asynchronous put/get.
    """
    # Creates the staging area
    stage_area = data_flow_ops.StagingArea(
        dtypes=[tensor.dtype for tensor in tensors],
        shapes=[tensor.get_shape() for tensor in tensors])
    # Operation for inserting a batch of tensors into the staging area
    put_op = stage_area.put(tensors)
    # Operation for extracting the first tensor from the staging area
    get_tensors = stage_area.get()
    # Register the put operation in a TensorFlow collection (a general purpose key/data storage) so hooks can find it later
    tf.add_to_collection('STAGING_AREA_PUTS', put_op)
    # Return the two operations for interacting with the staging area
    return put_op, get_tensors

## Monitoring the Execution Using Hooks

In [None]:
class PrefillStagingAreasHook(tf.train.SessionRunHook):
    """
    A hook for prefilling the staging area
    """
    def after_create_session(self, session, coord):
        """
        This function is called right after the TF Session is created
        and execution is about to start. It will prefill the staging area
        with the next batch
        """
        enqueue_ops = tf.get_collection('STAGING_AREA_PUTS')
        for i in range(len(enqueue_ops)):
            session.run(enqueue_ops[:i + 1])
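
The reason for prefilling becomes clearer with a toy model of the staging area. In the sketch below, `ToyStagingArea` is a hypothetical stand-in built on a `deque`, not the TensorFlow class: one batch is put before training starts, and every training step then puts the next batch while getting the previously staged one, so the consumer never waits on an empty buffer.

```python
from collections import deque


class ToyStagingArea:
    """Minimal FIFO buffer standing in for a staging area (illustration only)."""

    def __init__(self):
        self._buffer = deque()

    def put(self, batch):
        self._buffer.append(batch)

    def get(self):
        return self._buffer.popleft()

    def __len__(self):
        return len(self._buffer)


batches = iter(["batch-0", "batch-1", "batch-2", "batch-3"])
area = ToyStagingArea()

area.put(next(batches))  # prefill: the role played by PrefillStagingAreasHook

consumed = []
for _ in range(3):       # each training step performs one put and one get
    area.put(next(batches))
    consumed.append(area.get())

print(consumed)
```

In the real graph the put and get run within the same `session.run()` call and can overlap; the sequential loop here only shows the bookkeeping.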

In [None]:
class LogSessionRunHook(tf.train.SessionRunHook):
    """
    A hook for logging during the execution
    """
    def __init__(self, global_batch_size, num_records, display_every=10):
        self.global_batch_size = global_batch_size
        self.num_records = num_records
        self.display_every = display_every

    def after_create_session(self, session, coord):
        """
        This function is called right after the TF Session is created.
        Initializes counters
        """
        self.elapsed_secs = 0.
        self.count = 0

    def before_run(self, run_context):
        """
        This function is called right before each call to sess.run()
        it returns a RunArgs object that will be added as arguments
        to sess.run(). In this case we parameterize the run function
        with the global step and some logging information.
        
        :param run_context: contains information about the upcoming run() method call 
        """
        self.t0 = time.time()
        return tf.train.SessionRunArgs(
            fetches=[tf.train.get_global_step(),
                     'loss:0', 'total_loss:0', 'learning_rate:0', "top1acc:0", "top5acc:0"])

    def after_run(self, run_context, run_values):
        """
        This function is called after each call to sess.run().
        It will collect some statistics about the run() and log it.
        
        :param run_context: contains information about the run() method call that just completed
        :param run_values: contains results of requested tensors/ops by before_run()
        """
        self.elapsed_secs += time.time() - self.t0
        self.count += 1
        global_step, loss, total_loss, lr, top1acc, top5acc = run_values.results
        if global_step == 1 or global_step % self.display_every == 0:
            dt = self.elapsed_secs / self.count
            img_per_sec = self.global_batch_size / dt
            epoch = global_step * self.global_batch_size / self.num_records
            print('Training Stats: | Step: {} | Epoch: {} | Speed(img/sec): {} | Loss: {} | TotalLoss: {} | LR: {} | Top1Acc: {} | Top5Acc: {} |'.format(global_step, epoch, img_per_sec, loss, total_loss, lr, top1acc[0], top5acc[0]))
            self.elapsed_secs = 0.
            self.count = 0
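
The statistics printed by `LogSessionRunHook` come from two simple formulas, reproduced below as a standalone function. The function name `training_stats` and the sample numbers are assumptions chosen to make the arithmetic easy to follow.

```python
def training_stats(global_step, global_batch_size, num_records, elapsed_secs, count):
    """Throughput and (fractional) epoch, computed as in LogSessionRunHook.after_run."""
    dt = elapsed_secs / count                  # average seconds per step
    img_per_sec = global_batch_size / dt       # images processed per second
    epoch = global_step * global_batch_size / num_records
    return img_per_sec, epoch


img_per_sec, epoch = training_stats(
    global_step=500, global_batch_size=100, num_records=100000,
    elapsed_secs=2.5, count=10)
print(img_per_sec, epoch)
```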

## Functions for training with Reduced Precision

In [None]:
def _fp32_trainvar_getter(getter, name, shape=None, dtype=None,
                          trainable=True, regularizer=None,
                          *args, **kwargs):
    """
    Custom variable getter: stores trainable variables in fp32 and returns a cast to the
    requested dtype (e.g. fp16, half the precision) when running with mixed precision.
    """
    storage_dtype = tf.float32 if trainable else dtype
    variable = getter(name, shape, dtype=storage_dtype,
                      trainable=trainable,
                      regularizer=regularizer if trainable and 'BatchNorm' not in name else None,
                      *args, **kwargs)
    if trainable and dtype != tf.float32:
        cast_name = name + '/fp16_cast'
        try:
            cast_variable = tf.get_default_graph().get_tensor_by_name(
                cast_name + ':0')
        except KeyError:
            cast_variable = tf.cast(variable, dtype, name=cast_name)
        cast_variable._ref = variable._ref
        variable = cast_variable
    return variable

In [None]:
def fp32_trainable_vars(name='fp32_vars', *args, **kwargs):
    """
    A variable scope with a custom variable getter that keeps fp16 trainable
    variables in fp32 storage and casts them to fp16 on use.
    """
    return tf.variable_scope(
        name, custom_getter=_fp32_trainvar_getter, *args, **kwargs)

In [None]:
class MixedPrecisionOptimizer(tf.train.Optimizer):
    """
    An optimizer that updates trainable variables in fp32.
    Reduced precision training can save memory capacity, memory bandwidth, memory power, 
    and arithmetic power by using smaller numbers. FP16 works with little effort: 2x
    gain in memory, 4x in multiply power. With care, one can use: 8b for convolutions, 
    4b for fully-connected layers.
    """

    def __init__(self, optimizer,
                 scale=None,
                 name="MixedPrecisionOptimizer",
                 use_locking=False):
        super(MixedPrecisionOptimizer, self).__init__(
            name=name, use_locking=use_locking)
        self._optimizer = optimizer
        self._scale = float(scale) if scale is not None else 1.0

    def compute_gradients(self, loss, var_list=None, *args, **kwargs):
        if var_list is None:
            var_list = (
                    tf.trainable_variables() +
                    tf.get_collection(tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES))

        replaced_list = var_list

        if self._scale != 1.0:
            loss = tf.scalar_mul(self._scale, loss)

        gradvar = self._optimizer.compute_gradients(loss, replaced_list, *args, **kwargs)

        final_gradvar = []
        for orig_var, (grad, var) in zip(var_list, gradvar):
            if var is not orig_var:
                grad = tf.cast(grad, orig_var.dtype)
            if self._scale != 1.0:
                grad = tf.scalar_mul(1. / self._scale, grad)
            final_gradvar.append((grad, orig_var))

        return final_gradvar
    
    def apply_gradients(self, *args, **kwargs):
        return self._optimizer.apply_gradients(*args, **kwargs)
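
The `scale` argument of `MixedPrecisionOptimizer` implements loss scaling: the loss is multiplied by a constant before backpropagation and the gradients are divided by the same constant afterwards. The sketch below shows why this helps, using the standard library's half-precision `struct` format to emulate fp16 rounding. The gradient value and the scale of 1024 are illustrative assumptions.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', value))[0]


tiny_gradient = 1e-8  # smaller than anything fp16 can represent
loss_scale = 1024.0   # illustrative scale factor

# Without scaling, the gradient underflows to zero in fp16
unscaled = to_fp16(tiny_gradient)

# With scaling, it survives the fp16 round trip and is unscaled in higher precision
recovered = to_fp16(tiny_gradient * loss_scale) / loss_scale

print(unscaled)
print(recovered)
```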

## Get New Learning Rate After Decay

In [None]:
def get_lr(lr, steps, lr_steps, warmup_it, decay_steps, global_step, lr_decay_mode):
    """ Gets the new learning rate after decay"""
    if lr_decay_mode == 'steps':
        learning_rate = tf.train.piecewise_constant(global_step,
                                                    steps, lr_steps)
    elif lr_decay_mode == 'poly':
        learning_rate = tf.train.polynomial_decay(lr,
                                                  global_step - warmup_it,
                                                  decay_steps=decay_steps - warmup_it,
                                                  end_learning_rate=0.00001,
                                                  power=2,
                                                  cycle=False)
    else:
        raise ValueError('Invalid type of lr_decay_mode')
    return learning_rate

In [None]:
def warmup_decay(warmup_lr, global_step, warmup_steps, warmup_end_lr):
    """
    Linearly ramp the learning rate from warmup_lr to warmup_end_lr during the warmup iterations
    """
    from tensorflow.python.ops import math_ops
    p = tf.cast(global_step, tf.float32) / tf.cast(warmup_steps, tf.float32)
    diff = math_ops.subtract(warmup_end_lr, warmup_lr)
    res = math_ops.add(warmup_lr, math_ops.multiply(diff, p))
    return res
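
Put together, the warmup ramp and the `'poly'` decay mode give a schedule that rises linearly and then falls quadratically. The following pure-Python sketch evaluates that schedule at a given step, assuming the usual polynomial-decay formula with the step clipped to the decay horizon. The function names and the default hyperparameter values are made up for illustration.

```python
def warmup_lr_at(step, warmup_lr, warmup_steps, target_lr):
    """Linear interpolation from warmup_lr to target_lr."""
    return warmup_lr + (target_lr - warmup_lr) * step / warmup_steps


def poly_lr_at(step, lr, warmup_it, decay_steps, end_lr=0.00001, power=2):
    """Polynomial decay from lr to end_lr, counted from the end of warmup."""
    span = decay_steps - warmup_it
    progress = min(step - warmup_it, span) / span
    return (lr - end_lr) * (1 - progress) ** power + end_lr


def learning_rate_at(step, lr=0.1, warmup_lr=0.001, warmup_it=200, decay_steps=1000):
    if step < warmup_it:
        return warmup_lr_at(step, warmup_lr, warmup_it, lr)
    return poly_lr_at(step, lr, warmup_it, decay_steps)


for step in (0, 100, 200, 600, 1000):
    print(step, learning_rate_at(step))
```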

## Running the Model

In [None]:
def cnn_model_function(features, labels, mode, params):
    """
    Performs the Training/Evaluation by Running the Model
    """
    
    print("Estimator Mode: {}".format(mode))
    
    # Extract parameters from dict
    lr = params['lr']
    lr_steps = params['lr_steps']
    steps = params['steps']
    lr_decay_mode = params['lr_decay_mode']
    decay_steps = params['decay_steps']
    model_name = params['model']
    num_classes = params['n_classes']
    model_dtype = get_with_default(params, 'dtype', tf.float32)
    model_format = get_with_default(params, 'format', 'channels_last')
    device = get_with_default(params, 'device', '/gpu:0')
    model_func = get_model_func(model_name)
    inputs = features  # TODO: Should be using feature columns?
    is_training = (mode == tf.estimator.ModeKeys.TRAIN)
    momentum = params['mom']
    weight_decay = params['wdecay']
    warmup_lr = params['warmup_lr']
    warmup_it = params['warmup_it']
    loss_scale = params['loss_scale']
    adv_bn_init = params['adv_bn_init']
    conv_init = params['conv_init']

    # Stage batch in staging area
    if mode == tf.estimator.ModeKeys.TRAIN:
        with tf.device('/cpu:0'):
            preload_op, (inputs, labels) = stage([inputs, labels])
            image = tf.reshape(features[:10], [-1, 64,64,3])
            tf.summary.image('image', image)
            
    with tf.device(None):
        if mode == tf.estimator.ModeKeys.TRAIN:
            gpucopy_op, (inputs, labels) = stage([inputs, labels])
        
        # Normalize images 
        inputs = tf.cast(inputs, model_dtype)
        imagenet_mean = np.array([121, 115, 100], dtype=np.float32)
        imagenet_std = np.array([70, 68, 71], dtype=np.float32)
        inputs = tf.subtract(inputs, imagenet_mean)
        inputs = tf.multiply(inputs, 1. / imagenet_std)
        
        if model_format == 'channels_first':
            inputs = tf.transpose(inputs, [0, 3, 1, 2])
        
        #Compute Logits    
        with fp32_trainable_vars(
                regularizer=tf.contrib.layers.l2_regularizer(weight_decay)):
            # Get reference to top layer of the entire network from the model func
            top_layer = model_func(
                inputs, data_format=model_format, training=is_training,
                conv_initializer=conv_init, adv_bn_init=adv_bn_init)
            # Compute the output logits
            logits = tf.layers.dense(top_layer, num_classes,
                                     kernel_initializer=tf.random_normal_initializer(stddev=0.01))
        # Get prediction by taking max of logits
        predicted_classes = tf.argmax(logits, axis=1, output_type=tf.int32)
        logits = tf.cast(logits, tf.float32)
        # If performing prediction, we don't need to do optimization or compute the loss, we can return just
        # the logits and the predictions 
        # (EstimatorSpec is an object that defines the model that the estimator will run)
        if mode == tf.estimator.ModeKeys.PREDICT:
            with tf.device(None):
                probabilities = tf.nn.softmax(logits) #Get probabilities by using softmax (sum to 1)
                predictions = {
                    'class_ids': predicted_classes[:, None],
                    'probabilities': probabilities,
                    'logits': logits
                }
                return tf.estimator.EstimatorSpec(mode, predictions=predictions,export_outputs={
                    'predict': tf.estimator.export.PredictOutput(predictions)})
        # If training, compute loss
        loss = tf.losses.softmax_cross_entropy(
            logits=logits, onehot_labels=labels)
        loss = tf.identity(loss, name='loss')  # For access by logger (TODO: Better way to access it?)
        
        # accuracies
        with tf.device(None):  # Allow fallback to CPU if no GPU support for these ops
            top1acc = tf.metrics.accuracy(
                        labels=tf.argmax(labels,axis=1), predictions=predicted_classes)
            top5acc = tf.metrics.mean(
                        tf.cast(tf.nn.in_top_k(logits, tf.argmax(labels, axis=1), 5), tf.float32))
            top1acc = tf.identity(top1acc, name='top1acc') # for access by the logger
            top5acc = tf.identity(top5acc, name='top5acc') # for access  by the logger
            
        # If performing evaluation, compute accuracies and return the EstimatorSpec with the operations
        if mode == tf.estimator.ModeKeys.EVAL:
            with tf.device(None):
                top1acc = tf.metrics.accuracy(
                            labels=tf.argmax(labels,axis=1), predictions=predicted_classes)
                top5acc = tf.metrics.mean(
                            tf.cast(tf.nn.in_top_k(logits, tf.argmax(labels, axis=1), 5), tf.float32))
                metrics = {'val-top1acc': top1acc, 'val-top5acc': top5acc}
            return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

        assert (mode == tf.estimator.ModeKeys.TRAIN)
        reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        total_loss = tf.add_n([loss] + reg_losses, name='total_loss')

        batch_size = tf.shape(inputs)[0]
        global_step = tf.train.get_global_step()
        #global_step = tf.identity(global_step, name='global_step')  # For access by logger (TODO: Better way to access it?)
        
        # Decay learning rate
        with tf.device('/cpu:0'):  # Pin the learning-rate schedule ops to the CPU
            learning_rate = tf.cond(global_step < warmup_it,
                                    lambda: warmup_decay(warmup_lr, global_step, warmup_it,
                                                         lr),
                                    lambda: get_lr(lr, steps, lr_steps, warmup_it, decay_steps, global_step,
                                                   lr_decay_mode))
            learning_rate = tf.identity(learning_rate, 'learning_rate')
            tf.summary.scalar('learning_rate', learning_rate)
        
        # Define the optimizer for training (performing backprop)
        # Momentum helps the optimizer push through local minima during training
        # Momentum-based acceleration schemes increase the speed of learning and damp oscillations 
        # in directions of high curvature
        opt = tf.train.MomentumOptimizer(
            learning_rate, momentum, use_nesterov=True)
        # Wrap the MomentumOptimizer in a MixedPrecision one
        opt = MixedPrecisionOptimizer(opt, scale=loss_scale)
        
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) or []
        with tf.control_dependencies(update_ops):
            gate_gradients = (tf.train.Optimizer.GATE_NONE)
            # Training step
            train_op = opt.minimize(
                total_loss, global_step=tf.train.get_global_step(),
                gate_gradients=gate_gradients)
        train_op = tf.group(preload_op, gpucopy_op, train_op)  # , update_ops)

        return tf.estimator.EstimatorSpec(mode, loss=total_loss, train_op=train_op)
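
The helpers `warmup_decay` and `get_lr` used in the model function are defined elsewhere in this notebook. As a rough, framework-free sketch of what such a schedule computes, the following assumes a linear warmup from `warmup_lr` to `lr` followed by a degree-2 polynomial decay to zero. The function name `sketch_learning_rate` and the exact formulas are assumptions for illustration, not the notebook's actual implementation.

```python
def sketch_learning_rate(step, lr, warmup_lr, warmup_it, decay_steps, power=2.0):
    """Hypothetical schedule: linear warmup, then polynomial decay to zero."""
    if step < warmup_it:
        # Interpolate linearly from warmup_lr to lr over warmup_it steps
        return warmup_lr + (lr - warmup_lr) * step / warmup_it
    # Fraction of the decay phase completed, clipped to [0, 1]
    progress = min(max(step - warmup_it, 0) / max(decay_steps - warmup_it, 1), 1.0)
    return lr * (1.0 - progress) ** power


for s in (0, 50, 100, 550, 1000):
    print(s, sketch_learning_rate(s, lr=0.01, warmup_lr=0.001,
                                  warmup_it=100, decay_steps=1000))
```

With `warmup_it=0`, as in the defaults used later in this notebook, the warmup branch is never taken and the schedule starts directly at `lr`.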

## Utility Functions

In [None]:
def sort_and_load_ckpts(log_dir,local_logdir=False):
    """
    Collects all ckpts from the log dir and stores them in a sorted list that is returned
    """
    ckpts = []
    print("looking for checkpoints in : {}".format(log_dir))
    # Use pydoop to read logs
    if not local_logdir:
        for f in py_hdfs.ls(log_dir):
            log_dir = log_dir.replace("//Logs", "/Logs") #hack to fix in case hops-py-util path includes double slashes
            if log_dir.endswith("/"):
                f1 = f.replace(log_dir, "")
            else:
                f1 = f.replace(log_dir + "/", "")
            m = re.match(r'model\.ckpt-([0-9]+)\.index$', f1)
            if m is None:
                continue
            ckpts.append({'step': int(m.group(1)),
                          'path': f.replace(".index",""),
                          'mtime': py_hdfs.stat(f).st_mtime,
                          })
    # Use os module to read locally
    else:
        for f in os.listdir(log_dir):
            m = re.match(r'model\.ckpt-([0-9]+)\.index$', f)
            if m is None:
                continue
            fullpath = os.path.join(log_dir, f)
            ckpts.append({'step': int(m.group(1)),
                          'path': os.path.splitext(fullpath)[0],
                          'mtime': os.stat(fullpath).st_mtime,
                          })
    ckpts.sort(key=itemgetter('step'))
    return ckpts
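
The checkpoint discovery above comes down to matching `model.ckpt-<step>.index` file names and sorting by the numeric step. A minimal sketch on an in-memory list of names, without any HDFS access, could look like this. The file names are hypothetical.

```python
import re
from operator import itemgetter

# Hypothetical directory listing; a real log directory contains more files
files = ["model.ckpt-2000.index", "model.ckpt-500.index",
         "model.ckpt-500.data-00000-of-00001", "checkpoint", "graph.pbtxt"]

pattern = re.compile(r"model\.ckpt-([0-9]+)\.index$")
ckpts = []
for name in files:
    m = pattern.match(name)
    if m is None:
        continue  # not a checkpoint index file
    ckpts.append({"step": int(m.group(1)), "path": name[:-len(".index")]})

# Sort on the integer step so that step 500 comes before step 2000
ckpts.sort(key=itemgetter("step"))
print([c["step"] for c in ckpts])
```

Converting the step to `int` before sorting matters: a plain string sort would place `model.ckpt-2000` before `model.ckpt-500`.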

In [None]:
def get_filenames_and_size():
    """
    Return the train/val/test TFRecord file lists and the number of examples in each split.
    """
    # Expand the file patterns into lists of matching TFRecord files
    train_tfr_file = tf.gfile.Glob(INPUT_DIR + "train.tfrecords")
    val_tfr_file = tf.gfile.Glob(INPUT_DIR + "val.tfrecords")
    test_tfr_file = tf.gfile.Glob(INPUT_DIR + "test.tfrecords")
    
    # Read the sizes file to get the size of each dataset (used for correct shuffling)
    with py_hdfs.open(SIZES_FILE, 'r') as f:
        file_lines = f.read().decode("utf-8").split("\n")    
    train_size = int(file_lines[0].split(",")[1])
    val_size = int(file_lines[1].split(",")[1])
    test_size = int(file_lines[2].split(",")[1])
    
    return train_tfr_file, val_tfr_file, test_tfr_file, train_size, val_size, test_size
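
The parsing above implies that the sizes file holds one comma-separated line per split, with the count in the second column. A small sketch of that parsing on an in-memory string follows. The split names and counts are hypothetical, and `parse_sizes` is an illustrative helper, not part of the notebook.

```python
def parse_sizes(text):
    """Parse lines of the assumed form '<split>,<count>' into a dict."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        name, count = line.split(",")[:2]
        sizes[name.strip()] = int(count)
    return sizes


# Hypothetical file content; the real counts are written by notebook 2
example = "train,98000\nval,10000\ntest,10000\n"
sizes = parse_sizes(example)
print(sizes["train"], sizes["val"], sizes["test"])
```

Looking sizes up by name instead of by line position makes the code independent of the line order in the file, at the cost of assuming the first column holds a stable split name.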

In [None]:
def get_with_default(obj, key, default_value):
    """ 
    Get the dict value if the key exists and is not None, otherwise return the default
    """
    return obj[key] if key in obj and obj[key] is not None else default_value

## Export Model

To prepare a trained Estimator for serving, we must export it in the standard SavedModel format.

During training, an input_fn() ingests data and prepares it for use by the model. At serving time, similarly, a serving_input_receiver_fn() accepts inference requests and prepares them for the model. This function has the following purposes:

- To add placeholders to the graph that the serving system will feed with inference requests.
- To add any additional ops needed to convert data from the input format into the feature Tensors expected by the model.

In [None]:
def serving_input_receiver_fn():
    """
     At serving time, a serving_input_receiver_fn() accepts inference requests 
     and prepares them for the model. This function has the following purposes:
     To add placeholders to the graph that the serving system will feed with inference requests.
     To add any additional ops needed to convert data from the input format into the feature 
     Tensors expected by the model.
    """
    features = tf.placeholder(dtype=tf.uint8, shape=[DEFAULT_BATCH_SIZE, 64, 64, 3], name='input_tensor')
    # Has to be a TensorServingInputReceiver rather than a ServingInputReceiver since our model function
    # takes raw tensors rather than a dict of features as input
    return tf.estimator.export.TensorServingInputReceiver(features=features, receiver_tensors=features)

In [None]:
def export_estimator(estimator, export_dir_base):
    """
    Exports the trained estimator to be used for serving
    """
    # This method builds a new graph by first calling the serving_input_receiver_fn 
    # to obtain feature Tensors, and then calling this Estimator's model_fn to generate 
    # the model graph based on those features. It restores the given checkpoint 
    # (or, lacking that, the most recent checkpoint) into this graph in a fresh session. 
    # Finally it creates a timestamped export directory below the given export_dir_base, 
    # and writes a SavedModel into it containing a single MetaGraphDef saved from this session.
    estimator.export_savedmodel(export_dir_base=export_dir_base, serving_input_receiver_fn=serving_input_receiver_fn)

## Function for Orchestrating the Experiment (Create Estimator, Train, Eval, Log, Checkpoint Etc)

In [None]:
def main(model="resnet18",lr=0.01, mom=0.90, wdecay=0.0001, lr_decay_mode="poly"):
    """
    Pipeline entrypoint, this function will set up the network, train it, evaluate it, and store the results.
    
    Hyperparameters for distributed hyperparameter tuning:
    
    :param model: Name of model to run: resnet[18,34,50,101,152]
    :param lr: Starting learning rate
    :param mom: momentum
    :param wdecay: weight decay
    :param lr_decay_mode: Takes either `steps` (decay by a factor at specified steps) or `poly`(polynomial_decay with degree 2)
    """
    import hops
    print("hopsversion: {}".format(hops.__version__))
    
    import tensorflow as tf
    import pydoop.hdfs as py_hdfs
    from hops import hdfs
    import numpy as np
    from builtins import range
    from tensorflow.python.ops import data_flow_ops
    from tensorflow.contrib.data.python.ops import interleave_ops
    from tensorflow.contrib.data.python.ops import batching
    import os
    import sys
    import time
    import argparse
    import random
    import shutil
    import logging
    import re
    from glob import glob
    from operator import itemgetter
    
    data_dir = INPUT_DIR # Path to dataset in TFRecordsFormat 
    batch_size=100 # Size of each minibatch per GPU
    num_epochs=50 # Number of epochs to run
    log_dir=LOG_DIR # Directory in which to write training summaries and checkpoints.
    log_name="tinyimagenet_resnet.log" # Name of the log
    display_every=500 # How often (in iterations) to print out running information.
    evaluate=False # Evaluate the top-1 and top-5 accuracy of the latest checkpointed model.
    evaluate_interval=1 # Evaluate accuracy per eval_interval number of epochs
    fp16=False # Train using float16 (half) precision instead of float32.
    warmup_lr=.001 # Warmup starting from this learning rate
    warmup_epochs=0 # Number of epochs to warmup with given lr
    lr_decay_factor=0.1 # learning rate decayed by this factor at each step. Used when lr_decay_mode is steps. Needs to be given with lr_decay_steps
    lr_decay_steps='10,20,40' #  epoch numbers at which lr is decayed by lr_decay_factor. Used when lr_decay_mode is steps
    loss_scale=1024 # loss scale
    num_gpus=1 # Specify total number of GPUS used to train a checkpointed model during eval. Used only to calculate epoch number to print during evaluation
    save_checkpoints_steps=23000 # after how many steps to save the TF checkpoint
    save_summary_steps=0 # after how many steps to save the summary for tensorboard
    adv_bn_init=False # init gamma of the last BN of each ResMod at 0.
    adv_conv_init=False
    local_logdir = False
    
    
    # Set OS environment variables for GPUs
    gpu_thread_count = 2
    os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'
    os.environ['TF_GPU_THREAD_COUNT'] = str(gpu_thread_count)
    os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1'
    os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
    
    # Write checkpoints and logs so that Hops Tensorboard can access them
    from hops import tensorboard
    log_dir = tensorboard.logdir()
    
    # Get files and sizes for Train/Val/Test TFRecords; we do not have labels for test_tfr_file
    train_tfr_file, val_tfr_file, test_tfr_file, train_size, val_size, test_size = get_filenames_and_size()
    
    #training_samples_per_rank = num_training_samples
    #TinyImageNet have images downsampled from 224x224 to 64x64
    height, width = 64, 64
    global_batch_size = batch_size

    # Get the number of training steps: num_examples*epochs(complete pass over dataset)/Batch_size (how many examples in each step)
    nstep = train_size * num_epochs // global_batch_size
    decay_steps = nstep
    nstep_per_epoch = train_size // global_batch_size

    # If decay mode is steps, then the learning rate is decayed by a static factor every x number of training steps
    if lr_decay_mode == 'steps':
        steps = [int(x) * nstep_per_epoch for x in lr_decay_steps.split(',')]
        lr_steps = [lr]
        for i in range(len(lr_decay_steps.split(','))):
            lr_steps.append(lr * pow(lr_decay_factor, i + 1))
    # Otherwise use polynomial decay of the learning rate
    else:
        steps = []
        lr_steps = []

    if not save_checkpoints_steps:
        # default to save one checkpoint per epoch
        save_checkpoints_steps = nstep_per_epoch
    if not save_summary_steps:
        # default to save one summary per epoch
        save_summary_steps = nstep_per_epoch

    # How many warmup iterations
    warmup_it = nstep_per_epoch * warmup_epochs

    print('PY' + str(sys.version) + 'TF' + str(tf.__version__))
    
    # Create a ProtoBuf message with GPU configuration to send out to all
    # nodes/gpus
    config = tf.ConfigProto()
    config.gpu_options.force_gpu_compatible = True  # Force pinned memory
    config.intra_op_parallelism_threads = 1  # Avoid pool of Eigen threads
    config.inter_op_parallelism_threads = 5 # Allow to use thread parallelism to deal with independent tasks
    config.allow_soft_placement = True # Allow fallback to CPU if GPU is missing
    
    do_checkpoint = True
    
    # Creating the ResNet classifier using the Estimator API
    # We are using a custom estimator rather than a pre-made one, 
    # which means we need to supply a model_function (cnn_model_function)
    save_summary_steps=save_summary_steps if do_checkpoint else None
    save_checkpoints_steps=save_checkpoints_steps if do_checkpoint else None
    print("Creating Classifier with model: {}, steps: {}, lr_steps: {}, lr_decay_mode: {}, save_summary_steps: {}, save_checkpoints_steps: {}".format(model, steps, lr_steps, lr_decay_mode,save_summary_steps, 
                                          save_checkpoints_steps))
    classifier = tf.estimator.Estimator(
        model_fn=cnn_model_function,
        model_dir=log_dir, #logdir from hops, update
        params={
            'model': model,
            'decay_steps': decay_steps,
            'n_classes': 200,
            'dtype': tf.float16 if fp16 else tf.float32,
            'format': 'channels_last',
            'device': '/gpu:0', #remove maybe?
            'lr': lr,
            'mom': mom,
            'wdecay': wdecay,
            'steps': steps,
            'lr_steps': lr_steps,
            'lr_decay_mode': lr_decay_mode,
            'warmup_it': warmup_it,
            'warmup_lr': warmup_lr,
            'loss_scale': loss_scale,
            'adv_bn_init': adv_bn_init,
            'conv_init': tf.variance_scaling_initializer() if adv_conv_init else None,
        },
        config=tf.estimator.RunConfig(
            session_config=config,
            save_summary_steps=save_summary_steps,
            save_checkpoints_steps=save_checkpoints_steps,
            keep_checkpoint_max=None))

    # Start training if we are not running only evaluation
    if not evaluate:
        
        # Estimator API hides a lot of details about the model, which
        # makes the code cleaner but you have less control over the 
        # session and execution of the model.
        # To be able to track execution while it is running, we use 
        # hooks. PrefillStagingAreasHook is a hook for pre-fetching
        # tensors before starting execution on the GPU
        training_hooks = [PrefillStagingAreasHook()]
        # LogHook used for logging during execution
        training_hooks.append(
            LogSessionRunHook(global_batch_size,
                              train_size,
                              display_every))
        try:
            # Starts the training of the classifier, feeding in data from
            # the make_dataset function
            start_time = time.time()
            print("Start training, batch-size: {}, num-examples: {}, num-epochs: {}, num-steps: {}, num-step-per-epoch: {}, lr: {}, mom: {}, wdecay: {}".format(batch_size, train_size, num_epochs, nstep, nstep_per_epoch, lr, mom, wdecay))
            classifier.train(
                input_fn=lambda: make_dataset(
                    train_tfr_file,
                    batch_size, training=True), #batch_size
                steps=nstep, #nsteps
                hooks=training_hooks)
            with tf.device(None):  # Allow fallback to CPU if no GPU support for these ops
                export_estimator(classifier, EXPORT_MODEL_DIR)
            print("Finished in {}".format(time.time() - start_time))
        except KeyboardInterrupt:
            print("Keyboard interrupt")
            
    final_accuracy_top1 = 0.0
    # Evaluation after training 
    if True:
        print("Evaluating")
        print("Validation dataset size: {}".format(val_size))
        barrier = tf.constant(0, dtype=tf.float32)
        tf.Session(config=config).run(barrier)
        time.sleep(5)  # a little extra margin...
        validation_step_top1_top5_loss = []
        validation_step_top1_top5_loss.append("step,top1acc,top5acc,loss")
        # Read the stored checkpoints and collect the accuracies for evaluation
        try:
            ckpts = sort_and_load_ckpts(log_dir)
            print("Number of model checkpoints to evaluate on the validation dataset: {}".format(len(ckpts)))
            for i, c in enumerate(ckpts):
                if i < len(ckpts) - 1:
                    if (not evaluate_interval) or (i % evaluate_interval != 0):
                        continue
                # Evaluate the trained classifier based on each checkpoint by using the validation dataset, 
                # the validation dataset is read from HopsFS using make_dataset
                # the model weights are read from HopsFS
                print("running evaluation on the validation set (size: {}) with the checkpoint: {}".format(val_size, c["path"]))
                eval_result = classifier.evaluate(
                    input_fn=lambda: make_dataset(
                        val_tfr_file,
                        batch_size),
                    checkpoint_path=c['path'])
                print("eval_result on validation set: {}".format(eval_result))
                # save the validation results in the checkpoint
                c['epoch'] = (c['step'] * num_gpus) / (nstep_per_epoch)
                c['top1'] = eval_result['val-top1acc']
                c['top5'] = eval_result['val-top5acc']
                c['loss'] = eval_result['loss']
            barrier = tf.constant(0, dtype=tf.float32)
            
            for i, c in enumerate(ckpts):
                tf.Session(config=config).run(barrier)
                if 'top1' not in c:
                    continue
                # print the epoch, top1, top5, loss, mtime over time, going through all checkpoints
                print('Validation Dataset Evaluation: | Step: {:5d} | Epoch: {:5.1f} | Top1Acc: {:5.3f} | Top5Acc: {:6.2f} | Loss: {:6.2f} | Time(h): {:10.3f}'.format(
                                 c['step'],
                                 c['epoch'],
                                 c['top1'],
                                 c['top5'],
                                 c['loss'],
                                 c['mtime']))
                validation_step_top1_top5_loss.append("{},{},{},{}".format(c["step"], c["top1"], c["top5"], c["loss"]))
                final_accuracy_top1 = c["top1"]
            print("Finished evaluation on validation set")
            validation_results_str = "\n".join(validation_step_top1_top5_loss)
            py_hdfs.dump(validation_results_str, VALIDATION_SET_RESULTS_FILE)
            return final_accuracy_top1
            
        except KeyboardInterrupt:
            print("Keyboard interrupt")
        return final_accuracy_top1
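
The step arithmetic inside `main` can be checked by hand. The sketch below reuses the defaults from `main` together with a hypothetical training-set size (the real value is read from the sizes file) and shows how the epoch numbers in `lr_decay_steps` become global-step boundaries in `steps` mode. The `lr_at` lookup is an assumption about how such boundaries are typically consumed, not the notebook's `get_lr`.

```python
# Hypothetical dataset size; the real value is read from the sizes file
train_size = 98000
batch_size = 100
num_epochs = 50
lr = 0.01
lr_decay_factor = 0.1
lr_decay_steps = "10,20,40"

nstep_per_epoch = train_size // batch_size      # 980 steps per epoch
nstep = train_size * num_epochs // batch_size   # 49000 steps in total

# Epoch boundaries converted to global-step boundaries
boundaries = [int(e) * nstep_per_epoch for e in lr_decay_steps.split(",")]
# One learning rate per interval: before the first boundary,
# between consecutive boundaries, and after the last one
values = [lr * lr_decay_factor ** i for i in range(len(boundaries) + 1)]


def lr_at(step):
    """Piecewise-constant lookup: count how many boundaries have been passed."""
    passed = sum(1 for b in boundaries if step >= b)
    return values[passed]


print(nstep_per_epoch, nstep, boundaries)
```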

## Run the Experiments and Hyperparameter Search Using Hops Library

For this type of experiment we only run a few epochs to see trends among hyperparameters, not to achieve the best possible accuracy (we will do that in the next notebook).

In [None]:
from hops import experiment
args_dict = {"model" : ["resnet18","resnet34"],"lr" : [0.001, 0.1], 
             "lr_decay_mode" : ["poly", "steps"],
            "wdecay":[0.00001, 0.001], "mom":[0.80, 1]} #hyperparameters to tune
# evolutionary search through the space of hyperparameters using genetic algorithms
logdir, best_param_dict = experiment.differential_evolution(main, args_dict, generations=20, population=15, direction="max", cleanup_generations=True)

![evo_search_accuracy.png](./../images/evo_search_accuracy.png)
![evo_search_hist.png](./../images/evo_search_hist.png)
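
The Hops library runs the search above in a distributed fashion and its internals are not shown in this notebook. To give an intuition for the underlying idea, here is a minimal single-process sketch of a classic differential evolution loop (the DE/rand/1/bin variant) that maximizes a toy objective. Everything in it is an assumption for illustration: the function names, the mutation factor `f`, the crossover rate `cr`, and the toy objective standing in for validation accuracy. It handles only numeric parameters, not categorical ones such as the model name.

```python
import random


def differential_evolution_sketch(objective, bounds, population=15, generations=20,
                                  f=0.8, cr=0.9, seed=0):
    """Maximize `objective` over box `bounds` with a basic DE/rand/1/bin loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(population):
            # Pick three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(population) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip back into the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is at least as good
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(population), key=lambda j: scores[j])
    return pop[best], scores[best]


def toy_objective(params):
    """Toy stand-in for validation accuracy, peaked at lr=0.01 and mom=0.9."""
    lr, mom = params
    return -((lr - 0.01) ** 2) * 1e4 - (mom - 0.9) ** 2


best_params, best_score = differential_evolution_sketch(
    toy_objective, bounds=[(0.001, 0.1), (0.8, 1.0)])
print(best_params, best_score)
```

In the real search each call to the objective is a full training run of `main`, which is why the evaluations are distributed over several GPUs.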

## Save the Best HyperParameters to HopsFS
The next notebook will read the best parameters and use them to train for more epochs on more GPUs.

In [None]:
hyperparams_str = ""
for k,v in best_param_dict.items():
    hyperparams_str = hyperparams_str + str(k) + "," + str(v) + "\n"
py_hdfs.dump(hyperparams_str, HYPERPARAMS_FILE)
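
The file written above contains one `key,value` line per hyperparameter. The following sketch shows a round trip through that format on an in-memory string, without HDFS. The helper names and example values are hypothetical and are not results of the search.

```python
def serialize_hyperparams(params):
    """Mirror the 'key,value' per line format written above."""
    return "".join("{},{}\n".format(k, v) for k, v in params.items())


def parse_hyperparams(text):
    """Read the lines back; every value comes back as a string."""
    parsed = {}
    for line in text.splitlines():
        if line:
            key, value = line.split(",", 1)
            parsed[key] = value
    return parsed


# Hypothetical values, not results from the search above
example = {"model": "resnet18", "lr": 0.01, "mom": 0.9}
restored = parse_hyperparams(serialize_hyperparams(example))
lr = float(restored["lr"])  # numeric values need an explicit cast
print(restored, lr)
```

Because the format carries no type information, the reader has to know which keys are numeric and cast them explicitly.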

## Run single training experiment with a set of hyperparameters using a single GPU
Running 150 epochs (about 6 hours on a single GPU) with the following parameters yields Top1Acc: 0.95 | Top5Acc: 0.97 on the training set:

- model: resnet18
- lr: 0.01
- batch_size: 100
- mom: 0.90
- wdecay : 0.0001
- warmup_lr: 0.001 
- warmup_epochs: 0
- lr_decay_factor: 0.1
- lr_decay_steps: '30,60,80' 
- lr_decay_mode: "poly"
- loss_scale: 1024
- num_gpus: 1

In [None]:
from hops import experiment
experiment.launch(main)

Plot accuracy:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the validation results (columns: step, top1acc, top5acc, loss)
df = pd.read_csv("hdfs:///Projects/ImageNet_EndToEnd_MLPipeline/tiny-imagenet/tiny-imagenet-200/single_gpu_validation_results.txt",sep=",")
steps = df["step"].values
top1accs = df["top1acc"].values
top5accs = df["top5acc"].values
plt.rcParams["figure.figsize"] = (8,4)
fig, ax = plt.subplots()
ax.plot(steps, top1accs, linewidth=2, alpha=0.6, label="top1-accuracy")
ax.plot(steps, top5accs, linewidth=2, alpha=0.6, label="top5-accuracy")
ax.set_title("Validation set accuracy")
ax.set_xlabel("Steps")
ax.set_ylabel("Accuracy")
ax.legend()
fig.savefig('validation_set_accuracy.png')
plt.show()
```
![validation_set_accuracy.png](./../images/validation_set_accuracy.png)

Plot loss:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the validation results (columns: step, top1acc, top5acc, loss)
df = pd.read_csv("hdfs:///Projects/ImageNet_EndToEnd_MLPipeline/tiny-imagenet/tiny-imagenet-200/single_gpu_validation_results.txt",sep=",")
steps = df["step"].values
loss = df["loss"].values
plt.rcParams["figure.figsize"] = (8,4)
fig, ax = plt.subplots()
# The loss column in this file comes from evaluation on the validation set
ax.plot(steps, loss, linewidth=2, alpha=0.6, label="validation loss")
ax.set_title("Validation set loss")
ax.set_xlabel("Steps")
ax.set_ylabel("Loss")
ax.legend()
fig.savefig('validation_set_loss.png')
plt.show()
```
![validation_set_loss.png](./../images/validation_set_loss.png)

We can see that the model is overfitting: its validation accuracy stops improving even though the training loss keeps going down. The top accuracy on the leaderboard for this dataset is 0.732. To improve the model accuracy we can add more regularization (e.g., dropout) to reduce the amount of overfitting. We can also do more data augmentation, and we can reshape the 2% of the images in the training dataset that we excluded for being mis-shaped.
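
One simple way to act on this observation is to keep the checkpoint with the best validation accuracy instead of the last one. The sketch below does this on rows in the `step,top1acc,top5acc,loss` layout that `main` writes. The numbers are made up for illustration and are not results from this notebook.

```python
import csv
import io

# Hypothetical rows in the "step,top1acc,top5acc,loss" layout written by main()
results_csv = """step,top1acc,top5acc,loss
1000,0.20,0.45,3.9
2000,0.31,0.58,3.2
3000,0.33,0.60,3.3
4000,0.32,0.59,3.6
"""

rows = list(csv.DictReader(io.StringIO(results_csv)))
# Checkpoint with the highest validation top-1 accuracy
best = max(rows, key=lambda r: float(r["top1acc"]))
# Number of later checkpoints that failed to improve on it
stale = sum(1 for r in rows if int(r["step"]) > int(best["step"]))
print(best["step"], stale)
```

A growing count of later checkpoints that do not beat the best one is a common signal for stopping training early.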

## Monitoring The Jobs

### Open the Job-UIs panel
![tensorboard1.png](./../images/tensorboard1.png)

### Open the SparkUI (Spark is used to distribute the hyperparameter tuning)
![tensorboard2.png](./../images/tensorboard2.png)

### Open the Job in the SparkUI
![tensorboard3.png](./../images/tensorboard3.png)

### Links to Logs for Each Experiment
![tensorboard4.png](./../images/tensorboard4.png)

### Example Log for an Experiment
![tensorboard5.png](./../images/tensorboard5.png)

### Links to Tensorboards for Each Experiment
![tensorboard6.png](./../images/tensorboard6.png)

### Example Tensorboard for an Experiment
![tensorboard7.png](./../images/tensorboard7.png)

![tensorboard8.png](./../images/tensorboard8.png)