# Getting started: Fashion Mnist on Hops Notebook
---

<font color='red'> <h3>Tested with TensorFlow 1.10</h3></font>

In this notebook we run Deep Learning code on Hops using Jupyter notebooks and HopsFS, the distributed file system that Hops provides. HopsFS is a fork of Apache HDFS, so reading and writing files in HopsFS works the same way as with HDFS, which TensorFlow supports.
For more information about how to read from HDFS, please see: https://www.tensorflow.org/deploy/hadoop

In this example program we are going to:
- Run Fashion Mnist code using `tf.contrib.learn.Experiment` and `tf.estimator.Estimator`
- Use the hops python library to run TensorFlow on the Hops Platform
- Read data from a Dataset which is located in your project, on HopsFS (HDFS)
- Define a convolutional neural network
- Define a set of hyperparameters and values to perform gridsearch on
- Run training for each combination of hyperparameters using the Experiment API
- Monitor training using TensorBoard to see accuracy and loss for every hyperparameter configuration
- Monitor training by looking at logs
- Find where the model and TensorBoard events for each hyperparameter configuration are saved in your project
- Visualize TensorBoard events from the previous TensorBoard run

## Table of contents:

### [TensorFlow on Hops](#paradigm)
### [Fashion Mnist training on Hops](#mnist)
### [Hyperparameter search](#hyperparam)
### [Launching TensorFlow jobs](#starting)
### [Monitoring execution - TensorBoard](#tensorboard)
### [Monitoring execution - Logs](#logs)
### [Increasing throughput - Running jobs in parallel](#parallel)
### [Show TensorBoard from previous runs](#visualize)


## Hops TensorFlow programming paradigm <a class="anchor" id='paradigm'></a>

To run your TensorFlow code on Hops, the whole program must be placed inside a wrapper function: importing libraries, reading data, defining the model and running the training. If you wish to run gridsearch over a given set of hyperparameters, give the wrapper function arguments whose names correspond to the names of your hyperparameters.
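
As a minimal sketch of this pattern (not the actual training code), the toy function below keeps its imports inside the wrapper and takes arguments named after the hyperparameters. The name `toy_wrapper` and the score it returns are made up for illustration. The point is only that a dictionary of hyperparameter values can be passed as keyword arguments when the names match.

```python
def toy_wrapper(learning_rate, dropout):
    # Imports go inside the wrapper so the function is self-contained
    import math

    # Placeholder for real training: return a made-up score
    score = 1.0 / (1.0 + math.fabs(math.log10(learning_rate) + 3.0) + dropout)
    return score


# Argument names match the dictionary keys, so the values can be injected by name
config = {'learning_rate': 0.001, 'dropout': 0.45}
metric = toy_wrapper(**config)
print(metric)
```

If a key in the dictionary did not match an argument name, the call would fail with a `TypeError`, which is why the names have to line up.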

You can also submit one or more `.py`, `.zip` or `.egg` files that contain your code and import them in the wrapper function. To include files, navigate back to HopsWorks and restart Jupyter. You can then include the files in the Jupyter configuration.

## The `hops` python module

Further down you can see the wrapper function, which happens to be named `wrapper` but could be given any name. It imports two modules from the `hops` module, `tensorboard` and `hdfs`. These are the only two modules that you need to use in your TensorFlow wrapper function.

### Using the `tensorboard` module
The `tensorboard` module gives us the log directory where summaries and checkpoints are written, which is the directory read by the TensorBoard we will see in a bit. The only function that we currently need to call is `tensorboard.logdir()`, which returns the path to the TensorBoard log directory. The content of this directory is also stored as a Dataset in your project in HopsFS after each hyperparameter configuration is finished. The function from the `experiment` module that we will look at a bit further down returns the exact path, which you can then navigate to using HopsWorks to inspect the files.

The directory could in practice be used to store other data that should be accessible after each hyperparameter configuration is finished.
```python
# Use this module to get the TensorBoard logdir
from hops import tensorboard
tensorboard_logdir = tensorboard.logdir()
```


### Using the `hdfs` module
The function we need from the `hdfs` module is `hdfs.project_path()`, which returns the path in HopsFS where your data is stored. The path resolves to the root path for your project, which is the view that you see when you click `Data Sets` in HopsWorks. To point to where your actual data resides in the project, you need to append the path from there to your Dataset. For example, if you create a mnist folder in your Resources Dataset, which is created automatically for each project, the path to the mnist data would be `hdfs.project_path() + 'Resources/mnist'`.
```python
# Use this module to get the path to your project in HopsFS, then append the path to your Dataset in your project
from hops import hdfs
project_path = hdfs.project_path()
```
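
Building a Dataset path is plain string handling. The sketch below uses a made-up project root (on the cluster the value would come from `hdfs.project_path()`) and a hypothetical helper, `dataset_path`, that joins path segments with single slashes whether or not the root ends with one.

```python
import posixpath


def dataset_path(project_root, *parts):
    """Join a project root and relative path segments with single slashes."""
    return posixpath.join(project_root, *parts)


# Assumed example value; on Hops this would be the result of hdfs.project_path()
project_root = "hdfs://namenode:8020/Projects/demo/"
mnist_dir = dataset_path(project_root, "Resources", "mnist")
print(mnist_dir)
```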

![image11-Dataset-ProjectPath.png](../../../images/datasets.png)

## Training Fashion Mnist on Hops <a class="anchor" id='mnist'></a>

### The wrapper function

Here we define the aforementioned wrapper function containing the code to run, with the hyperparameter arguments `learning_rate` and `dropout`. It simply contains all the TensorFlow code that we want to run.
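
As a rough sanity check on the convolutional network defined in the wrapper, the sketch below traces the spatial size of a 28x28 image through the two convolution and pooling stages. It assumes the default `'valid'` (no padding) setting of `tf.layers.conv2d` and `tf.layers.max_pooling2d`, which is what the wrapper uses since it passes no `padding` argument. The helper `conv_out` is introduced here only for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output size of a 'valid' (no padding) convolution or pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 28                            # input height and width
size = conv_out(size, kernel=5)      # first convolution, 5x5 kernel  -> 24
size = conv_out(size, 2, stride=2)   # 2x2 max pooling, stride 2      -> 12
size = conv_out(size, kernel=3)      # second convolution, 3x3 kernel -> 10
size = conv_out(size, 2, stride=2)   # 2x2 max pooling, stride 2      -> 5
flat_features = size * size * 64     # 64 filters in the second convolution
print(size, flat_features)
```

Under these assumptions the flattened vector fed to the fully connected layer has 5 * 5 * 64 = 1600 features per image.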

In [None]:
def wrapper(learning_rate, dropout):

    import tensorflow as tf
    import numpy as np
    from hops import tensorboard
    from hops import hdfs

    # Training Parameters
    num_steps = 100
    batch_size = 128

    # Network Parameters
    num_input = 784 # Fashion MNIST data input (img shape: 28*28)
    num_classes = 10 # Fashion MNIST total classes (10 clothing categories)

    train_filenames = [hdfs.project_path() + "TourData/mnist/train/train.tfrecords"]
    validation_filenames = [hdfs.project_path() + "TourData/mnist/validation/validation.tfrecords"]

    # Create the neural network
    # Here the features arrive as a single tensor (a dict would be used for multiple inputs)
    def conv_net(x, n_classes, dropout, reuse, is_training):

        # Define a scope for reusing the variables
        with tf.variable_scope('ConvNet', reuse=reuse):

            # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
            # Reshape to match picture format [Height x Width x Channel]
            # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
            x = tf.reshape(x, shape=[-1, 28, 28, 1])

            # Convolution Layer with 32 filters and a kernel size of 5
            conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
            # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
            conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

            # Convolution Layer with 64 filters and a kernel size of 3
            conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
            # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
            conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

            # Flatten the data to a 1-D vector for the fully connected layer
            # (flatten is in the tf contrib folder for now)
            fc1 = tf.contrib.layers.flatten(conv2)

            # Fully connected layer
            fc1 = tf.layers.dense(fc1, 1024)
            # Apply Dropout (if is_training is False, dropout is not applied)
            fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)

            # Output layer, class prediction
            out = tf.layers.dense(fc1, n_classes)

        return out


    # Define the model function (following TF Estimator Template)
    def model_fn(features, labels, mode, params):

        # Build the neural network
        # Because Dropout behaves differently at training and prediction time, we
        # need to create 2 distinct computation graphs that still share the same weights.
        logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True)
        logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False)

        # Predictions
        pred_classes = tf.argmax(logits_test, axis=1)
        pred_probas = tf.nn.softmax(logits_test)

        # If prediction mode, early return
        if mode == tf.estimator.ModeKeys.PREDICT:
            return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

        # Define loss and optimizer
        loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_train, 
                                                                                labels=tf.cast(labels, dtype=tf.int32)))
        
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())

        # Evaluate the accuracy of the model
        acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

        image = tf.reshape(features[:10], [-1, 28, 28, 1])
        tf.summary.image("image", image)

        # TF Estimators require the model function to return an EstimatorSpec that
        # specifies the different ops for training, evaluating, ...
        estim_specs = tf.estimator.EstimatorSpec(
          mode=mode,
          predictions=pred_classes,
          loss=loss_op,
          train_op=train_op,
          eval_metric_ops={'accuracy': acc_op})

        return estim_specs


    def data_input_fn(filenames, batch_size=128, shuffle=False, repeat=None):

        def parser(serialized_example):
            """Parses a single tf.Example into image and label tensors."""
            features = tf.parse_single_example(
                serialized_example,
                features={
                    'image_raw': tf.FixedLenFeature([], tf.string),
                    'label': tf.FixedLenFeature([], tf.int64),
                })
            image = tf.decode_raw(features['image_raw'], tf.uint8)
            image.set_shape([28 * 28])

            # Normalize the values of the image from the range [0, 255] to [-0.5, 0.5]
            image = tf.cast(image, tf.float32) / 255 - 0.5
            label = tf.cast(features['label'], tf.int32)
            return image, label

        def _input_fn():
            # Read the Fashion MNIST records from the TFRecord file(s)
            dataset = tf.data.TFRecordDataset(filenames)

            # Map the parser over dataset, and batch results by up to batch_size
            dataset = dataset.map(parser)
            if shuffle:
                dataset = dataset.shuffle(buffer_size=128)
            dataset = dataset.batch(batch_size)
            dataset = dataset.repeat(repeat)
            iterator = dataset.make_one_shot_iterator()

            features, labels = iterator.get_next()

            return features, labels

        return _input_fn


    run_config = tf.contrib.learn.RunConfig(
        model_dir=tensorboard.logdir(),
        save_checkpoints_steps=10,
        save_summary_steps=5,
        log_step_count_steps=10)

    hparams = tf.contrib.training.HParams(
        learning_rate=learning_rate, dropout_rate=dropout)

    summary_hook = tf.train.SummarySaverHook(
          save_steps = run_config.save_summary_steps,
          scaffold= tf.train.Scaffold(),
          summary_op=tf.summary.merge_all())

    mnist_estimator = tf.estimator.Estimator(
        model_fn=model_fn,
        config=run_config,
        params=hparams
    )


    train_input_fn = data_input_fn(train_filenames[0], batch_size=batch_size)
    eval_input_fn = data_input_fn(validation_filenames[0], batch_size=batch_size)

    experiment = tf.contrib.learn.Experiment(
        mnist_estimator,
        train_input_fn=train_input_fn,
        eval_input_fn=eval_input_fn,
        train_steps=num_steps,
        min_eval_frequency=5,
        eval_hooks=[summary_hook]
    )

    experiment.train_and_evaluate()
    
    accuracy_score = mnist_estimator.evaluate(input_fn=eval_input_fn, steps=num_steps)["accuracy"]
    
    return accuracy_score


## Hyperparameter search <a class="anchor" id='hyperparam'></a>

Hyperparameter optimization is critical to achieving the best accuracy for your model. Hops runs your training code for each hyperparameter configuration and visualizes the runs in TensorBoard, which makes it easy to find the best set of hyperparameters.

To define the hyperparameters you wish to perform gridsearch with, simply create a dictionary with the keys matching the arguments of your wrapper function, and a list of values for each parameter like below.

```python
# Arguments to use in training
args_dict = {'learning_rate': [0.001, 0.0005, 0.0001], 'dropout': [0.45, 0.7]}
```

In effect, this means that your TensorFlow code is run with 6 different hyperparameter combinations, as shown in the table.


| Job number | Learning rate | Dropout |
|:----------:|:-------------:|:-------:|
|      1     |     0.001     |   0.45  |
|      2     |     0.001     |   0.7   |
|      3     |     0.0005    |   0.45  |
|      4     |     0.0005    |   0.7   |
|      5     |     0.0001    |   0.45  |
|      6     |     0.0001    |   0.7   |
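
The six rows of the table are the Cartesian product of the two value lists. The sketch below reproduces that expansion with `itertools.product`. It illustrates the idea only and is not necessarily how the `hops` library builds the grid internally.

```python
import itertools

args_dict = {'learning_rate': [0.001, 0.0005, 0.0001], 'dropout': [0.45, 0.7]}

# One dict per combination, in the same order as the table
names = list(args_dict.keys())
combinations = [dict(zip(names, values))
                for values in itertools.product(*(args_dict[n] for n in names))]

for job, combo in enumerate(combinations, start=1):
    print(job, combo['learning_rate'], combo['dropout'])
```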

In [None]:
#Define dict for hyperparameters
args_dict = {'learning_rate': [0.001, 0.0005, 0.0001], 'dropout': [0.45, 0.7]}

## Running the training using `experiment` module <a class="anchor" id='starting'></a>

The last, and arguably most important, module that we demonstrate from the `hops` module is called `experiment`. It provides a function called `grid_search`, which takes the wrapper function and the dictionary of hyperparameters as arguments. `experiment.grid_search` creates the grid of the specified arguments and runs the wrapper function once per combination, injecting the value of each hyperparameter. Assuming that you started Jupyter with the default configuration, the jobs are run sequentially. To increase throughput by running two or more jobs in parallel, please see [Running jobs in parallel](#parallel).
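
To make the sequential behaviour concrete, here is a local sketch of a grid search loop. It is a stand-in written for this explanation, not the implementation of `experiment.grid_search`: the names `run_grid` and `toy_metric` are hypothetical, and we assume that `direction='max'` means the combination with the highest returned metric counts as the best one.

```python
import itertools


def run_grid(fn, args_dict, direction='max'):
    """Call fn once per hyperparameter combination and return the best (params, metric)."""
    names = list(args_dict)
    results = []
    for values in itertools.product(*(args_dict[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, fn(**params)))  # sequential, one job at a time
    pick = max if direction == 'max' else min
    return pick(results, key=lambda r: r[1])


def toy_metric(learning_rate, dropout):
    # Made-up stand-in for the accuracy returned by a real wrapper function
    return 1.0 - dropout * 0.1 - learning_rate * 100


best_params, best_metric = run_grid(
    toy_metric,
    {'learning_rate': [0.001, 0.0005, 0.0001], 'dropout': [0.45, 0.7]})
print(best_params, best_metric)
```

With this toy metric the search picks the smallest learning rate and the smallest dropout value. With a real wrapper the outcome depends on the accuracy each training run returns.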

In [None]:
from hops import experiment
from hops import hdfs

notebook = hdfs.project_path() + "Jupyter/Parallel_Experiments/TensorFlow/grid_search/grid_search_fashion_mnist.ipynb"
experiment.grid_search(wrapper, args_dict, direction='max',
                       name='fashion mnist grid search', 
                       description='Demonstration of running gridsearch hyperparameter optimization with fashion mnist',
                       versioned_resources=[notebook])

## Monitoring execution - TensorBoard <a class="anchor" id='tensorboard'></a>
To find the TensorBoard for the execution, go back to HopsWorks and open the Experiments service.
Then copy and paste the experiment_id into the text box and press Enter to start a TensorBoard that shows all experiments being run in parallel.
![Image7-Monitor.png](../../../images/experiments_service.png)
![Image7-Monitor.png](../../../images/tensorboard.png)