# Evolutionary Hyperparameter Optimization: CIFAR-10 on Hops Notebook
---

<font color='red'> <h3>Tested with TensorFlow 1.8</h3></font>

This example is more advanced than the Fashion MNIST notebook, which introduces the general programming model for running TensorFlow on Hops. Here we look at evolutionary search, a state-of-the-art technique for finding good hyperparameters for your model.

In this example program we are going to:
- Introduce the programming model for using evolutionary hyperparameter optimization on Hops

## Table of contents:

### [What is Evolutionary Hyperparameter Optimization](#whatisit)
### [Programming model on Hops](#cifar10)
### [Defining search bounds](#searchbound)
### [Tuning parameters](#tuning)
### [Starting the search](#starting)
### [Performance notes](#performance)
### [Visualizing generations and hyperparameters using TensorBoard](#visualize)


## What is Evolutionary Hyperparameter Optimization <a class="anchor" id='whatisit'></a>

Hyperparameter optimization is the process of finding the hyperparameter values, such as the learning rate or the dropout rate, for which a specific model performs best. Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. Applied to hyperparameters, it uses an evolutionary algorithm to search the hyperparameter space of a given model. The process is inspired by the biological concept of evolution: a population of candidate hyperparameter combinations is mutated and recombined from generation to generation, and the combinations that score best are carried forward.


## Programming model on Hops <a class="anchor" id='cifar10'></a>

As in the fashion_mnist_on_hops notebook, the hyperparameters are listed as arguments of a wrapper function, and the wrapper function wraps the code that you want to run.

The major difference is that the wrapper function must return the metric that should be maximized or minimized. In the following example we return the accuracy of the model, which should be maximized.

In [None]:
# Wrap the TensorFlow code you want to run in a function
# To perform hyper-parameter searching define your parameters as arguments

def wrapper(learning_rate, dropout, num_layers):
    
    import tensorflow as tf
    
    # Use this module to get the TensorBoard logdir
    from hops import tensorboard
    
    # Use this module to get the path to your project in HopsFS, then append the path to your Dataset in your project
    from hops import hdfs
    
    num_classes = 10 # CIFAR10 total classes (0-9 objects)
    batch_size = 128
    stride = 1
    num_steps = 500
    filters = 78
    kernel = 5

    # Network Parameters
    num_input = 32*32*3 # CIFAR10 data input (img shape: 32*32*3)
    
    def layer(x, filters, kernel, stride):
        # Convolution layer with `filters` filters and a kernel size of `kernel`
        conv = tf.layers.conv2d(x, filters, kernel, strides=(stride, stride), activation=tf.nn.relu)
        # Max pooling; pool size and strides both equal `stride` (1 here, so the feature map keeps its size)
        return tf.layers.max_pooling2d(conv, stride, stride)
    
    # Create the neural network
    # TF Estimator input is a dict, in case of multiple inputs
    def conv_net(x, n_classes, dropout, reuse, is_training):
        

        # Define a scope for reusing the variables
        with tf.variable_scope('ConvNet', reuse=reuse):

            # CIFAR10 data input is a 1-D vector of (32*32*3 pixels)
            # Reshape to match picture format [Height x Width x Channel]
            # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
            x = tf.reshape(x, shape=[-1, 32, 32, 3])
            #x = tf.image.crop_and_resize(tf.reshape(x, shape=[-1, 32, 32, 3]),size=[24,24])

            h = x
            for i in range(num_layers):
                h = layer(h, filters, kernel, stride)

            # Flatten the data to a 1-D vector for the fully connected layer
            fc1 = tf.contrib.layers.flatten(h)

            # Fully connected layer with 384 units
            fc1 = tf.layers.dense(fc1, 384)
            # Apply Dropout (if is_training is False, dropout is not applied)
            fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
            
            # Output layer, class prediction
            out = tf.layers.dense(fc1, n_classes)

            return out
    
    # Define the model function (following TF Estimator Template)
    def model_fn(features, labels, mode, params):

        # Build the neural network
        # Because dropout behaves differently at training and prediction time, we
        # need to create 2 distinct computation graphs that still share the same weights.
        logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True)
        logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False)

        # Predictions
        pred_classes = tf.argmax(logits_test, axis=1)
        pred_probas = tf.nn.softmax(logits_test)

        # If prediction mode, early return
        if mode == tf.estimator.ModeKeys.PREDICT:
            return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

        # Define loss and optimizer
        loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())

        # Evaluate the accuracy of the model
        acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

        image = tf.reshape(features[:10], [-1, 32, 32, 3])
        tf.summary.image("image", image)

        # TF Estimators require the model function to return an EstimatorSpec that
        # specifies the different ops for training, evaluating, ...
        estim_specs = tf.estimator.EstimatorSpec(
          mode=mode,
          predictions=pred_classes,
          loss=loss_op,
          train_op=train_op,
          eval_metric_ops={'accuracy': acc_op})

        return estim_specs
    
    def data_input_fn(filenames, num_input, batch_size=128, shuffle=False, repeat=None):
    
        def parser(serialized_example):
            """Parses a single tf.Example into image and label tensors."""
            features = tf.parse_single_example(
                serialized_example,
                features={
                    'image': tf.FixedLenFeature([], tf.string),
                    'label': tf.FixedLenFeature([], tf.int64),
                })
            image = tf.decode_raw(features['image'], tf.uint8)
            image.set_shape([num_input])

            # The raw bytes are laid out as [Channel, Height, Width]; transpose to [Height, Width, Channel]
            image = tf.cast(
                tf.transpose(tf.reshape(image, [3, 32, 32]), [1, 2, 0]),
                tf.float32)

            # Normalize the values of the image from the range [0, 255] to [-0.5, 0.5]
            image = image / 255 - 0.5
            label = tf.cast(features['label'], tf.int32)
            return image, label

        def _input_fn():
            # Import CIFAR10 data
            dataset = tf.data.TFRecordDataset(filenames)

            # Map the parser over dataset, and batch results by up to batch_size
            dataset = dataset.map(parser)
            if shuffle:
                dataset = dataset.shuffle(buffer_size=batch_size)
            dataset = dataset.batch(batch_size)
            dataset = dataset.repeat(repeat)
            iterator = dataset.make_one_shot_iterator()

            features, labels = iterator.get_next()

            return features, labels

        return _input_fn

    logdir = tensorboard.logdir()
    
    # Path to your project in HopsFS, parent folder for your DataSets (Resources, Logs etc)
    data_dir = hdfs.project_path()
    train_filenames = [data_dir + "TestJob/data/cifar10/train/train.tfrecords"]
    validation_filenames = [data_dir + "TestJob/data/cifar10/validation/validation.tfrecords"]

    run_config = tf.contrib.learn.RunConfig(
        model_dir=logdir,
        log_device_placement=True,
        save_checkpoints_steps=100,
        save_summary_steps=100,
        log_step_count_steps=100)
    
    hparams = tf.contrib.training.HParams(
        learning_rate=learning_rate, dropout_rate=dropout)

    summary_hook = tf.train.SummarySaverHook(
          save_steps = run_config.save_summary_steps,
          scaffold= tf.train.Scaffold(),
          summary_op=tf.summary.merge_all())

    cifar10_estimator = tf.estimator.Estimator(
        model_fn=model_fn,
        config=run_config,
        params=hparams
    )

    train_input_fn = data_input_fn(train_filenames[0], num_input, batch_size=batch_size)
    eval_input_fn = data_input_fn(validation_filenames[0], num_input, batch_size=batch_size)
    
    experiment = tf.contrib.learn.Experiment(
        cifar10_estimator,
        train_input_fn=train_input_fn,
        eval_input_fn=eval_input_fn,
        train_steps=num_steps,
        min_eval_frequency=10,
        eval_hooks=[summary_hook]
    )

    experiment.train_and_evaluate()
    
    accuracy_score = cifar10_estimator.evaluate(input_fn=eval_input_fn, steps=num_steps)["accuracy"]
    
    return accuracy_score


## Defining search bounds <a class="anchor" id='searchbound'></a>

The next step is to define, for each hyperparameter, the bounds within which the search should be performed. This is done by creating a dict in which each key is the name of a hyperparameter and each value is a list with two elements: the lower and the upper bound.

In [None]:
boundary_dict = {'learning_rate': [0.00005, 0.005], 'dropout': [0.01, 0.99], 'num_layers': [1, 3]}
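Since the keys must match the argument names of the wrapper function and each value holds a lower and an upper bound, a small check before launching the search can catch mistakes early. The following is a minimal sketch under stated assumptions: `check_boundary_dict` and `toy_wrapper` are hypothetical names that are not part of the hops library, and `inspect.signature` requires Python 3.

```python
import inspect


def check_boundary_dict(wrapper_fn, boundary_dict):
    """Hypothetical helper: return a list of problems found in the search bounds."""
    arg_names = set(inspect.signature(wrapper_fn).parameters)
    problems = []
    # Every wrapper argument needs an entry in the dict
    for name in sorted(arg_names - set(boundary_dict)):
        problems.append("no bounds given for argument '%s'" % name)
    for name, bounds in sorted(boundary_dict.items()):
        if name not in arg_names:
            problems.append("'%s' is not an argument of the wrapper" % name)
        if len(bounds) != 2:
            problems.append("'%s' needs exactly two values" % name)
        elif bounds[0] > bounds[1]:
            problems.append("'%s' has its lower bound above its upper bound" % name)
    return problems


def toy_wrapper(learning_rate, dropout, num_layers):
    # Stand-in for the real wrapper; only the argument names matter here
    return 0.0


good = {'learning_rate': [0.00005, 0.005], 'dropout': [0.01, 0.99], 'num_layers': [1, 3]}
# Made-up faulty dict: reversed dropout bounds and a misspelled key
bad = {'learning_rate': [0.00005, 0.005], 'dropout': [0.99, 0.01], 'layers': [1, 3]}

print(check_boundary_dict(toy_wrapper, good))
for problem in check_boundary_dict(toy_wrapper, bad):
    print(problem)
```

The check follows the lower-then-upper convention described above; whether the library itself enforces that order is not something this notebook states.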

## Tuning parameters <a class="anchor" id='tuning'></a>

Before running the optimization there are several configuration values which can be set.

|   Parameter   |  Description                                                                                          |
|:-------------:|:-----------------------------------------------------------------------------------------------------:|
|  direction    |  whether the returned metric should be maximized ('max') or minimized ('min')                         |
|  generations  |  number of generations to run (how long to search)                                                    |
|  popsize      |  number of hyperparameter combinations per generation; use a larger population for more hyperparameters |
|  mutation     |  mutation rate; higher values explore more diverse hyperparameter combinations                        |
|  crossover    |  how strongly newly generated hyperparameter combinations adapt to the current best combination       |

## Starting the search <a class="anchor" id='starting'></a>

In [None]:
from hops import experiment

tensorboard_hdfs_logdir = experiment.evolutionary_search(spark, wrapper, boundary_dict, direction='max', popsize=10, generations=3, crossover=0.7, mutation=0.5, name='cifar10 differential evolution')
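To make the roles of `popsize`, `generations`, `mutation` and `crossover` more concrete, here is a toy sketch of a textbook differential evolution loop (the "rand/1/bin" variant), which the run name above suggests is the family of algorithm in use. It is an illustration under assumptions, not the Hops implementation: `evolve` and `toy_objective` are hypothetical names, every hyperparameter is treated as a float, and the objective is a cheap made-up function rather than a training run.

```python
import random


def evolve(objective, bounds, direction='max', popsize=10, generations=3,
           mutation=0.5, crossover=0.7, seed=0):
    """Toy differential evolution; needs popsize >= 4. Returns (best_params, best_score)."""
    rng = random.Random(seed)
    names = sorted(bounds)
    sign = 1.0 if direction == 'max' else -1.0  # internally always maximize sign * score

    def clip(name, value):
        low, high = bounds[name]
        return min(max(value, low), high)

    # Generation 0: sample every candidate uniformly inside the bounds
    population = [{n: rng.uniform(*bounds[n]) for n in names} for _ in range(popsize)]
    scores = [sign * objective(**cand) for cand in population]

    for _ in range(generations):
        next_population, next_scores = list(population), list(scores)
        for i in range(popsize):
            # Pick three distinct candidates other than the current one
            others = [p for j, p in enumerate(population) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.choice(names)  # at least one value is taken from the mutant
            trial = {}
            for n in names:
                if n == forced or rng.random() < crossover:
                    # Mutation: start from a and move along the scaled difference b - c
                    trial[n] = clip(n, a[n] + mutation * (b[n] - c[n]))
                else:
                    trial[n] = population[i][n]
            trial_score = sign * objective(**trial)
            # Selection: the trial replaces its parent only if it is at least as good
            if trial_score >= scores[i]:
                next_population[i], next_scores[i] = trial, trial_score
        population, scores = next_population, next_scores

    best = max(range(popsize), key=lambda i: scores[i])
    return population[best], sign * scores[best]


def toy_objective(learning_rate, dropout):
    # Made-up stand-in for a training run; peaks at learning_rate=0.001, dropout=0.3
    return 1.0 - (1000 * (learning_rate - 0.001)) ** 2 - (dropout - 0.3) ** 2


toy_bounds = {'learning_rate': [0.00005, 0.005], 'dropout': [0.01, 0.99]}
best_params, best_score = evolve(toy_objective, toy_bounds, direction='max',
                                 popsize=10, generations=3, crossover=0.7, mutation=0.5)
print(best_params, best_score)
```

In this sketch each generation costs `popsize` evaluations of the objective, so the total budget is `popsize * (generations + 1)` evaluations including the initial population. When every evaluation is a full training run, that budget is the dominant cost of the search.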

## Performance notes <a class="anchor" id='performance'></a>

The biggest downside of evolutionary hyperparameter optimization is that it may spend time training networks with very poor performance while it learns better and better hyperparameter combinations. It is therefore important to minimize the time spent on operations that do not directly affect the metric being maximized or minimized. A concrete example is checkpointing the model, which should be skipped: at this stage we are interested only in the hyperparameter combinations, not in the model itself. Furthermore, techniques such as early stopping can be used to avoid spending time on networks that do not improve sufficiently according to your criteria.
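As an illustration of the early stopping idea, the sketch below tracks a metric across evaluations and signals a stop once it has failed to improve for a given number of evaluations in a row. `EarlyStopper`, `patience` and `min_delta` are hypothetical names chosen for this example, and the accuracy values are made up. How such a check would be wired into the training loop of the wrapper function is left open here.

```python
class EarlyStopper(object):
    """Hypothetical helper: signals a stop when a metric stops improving."""

    def __init__(self, patience=3, min_delta=0.0, direction='max'):
        self.patience = patience    # evaluations to wait without improvement
        self.min_delta = min_delta  # smallest change that counts as an improvement
        self.sign = 1.0 if direction == 'max' else -1.0
        self.best = None
        self.stale = 0

    def should_stop(self, value):
        score = self.sign * value
        if self.best is None or score > self.best + self.min_delta:
            # New best value: remember it and reset the counter
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.01)
accuracies = [0.30, 0.42, 0.422, 0.424, 0.60]  # made-up evaluation results
for step, accuracy in enumerate(accuracies):
    if stopper.should_stop(accuracy):
        print("stopping after evaluation %d" % step)
        break
```

A stricter `patience` saves more time on poor configurations but also risks cutting off networks that improve slowly at first, so the threshold is a trade-off to set according to your own criteria.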

## Visualizing generations and hyperparameters using TensorBoard <a class="anchor" id='visualize'></a>

TensorBoard lets you filter runs with a regular expression, which makes it easy to select the runs of a single generation. The TensorBoard logdir for each generation is placed in a folder named generation{number}, where number is the index of the generation, starting from 0. The best hyperparameter combinations are expected to appear in the later generations. Navigate to the Experiments service to visualize the generations in TensorBoard.
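When writing such a filter, note that a plain pattern like `generation1` would also match `generation10`. The sketch below tries a stricter pattern in Python before it is typed into TensorBoard. The run names are hypothetical, since the exact layout below each generation{number} folder is an assumption, and `runs_in_generation` is a name introduced only for this example.

```python
import re

# Hypothetical run names; the real layout below generation{number} may differ
runs = [
    "generation0/run_a",
    "generation1/run_a",
    "generation1/run_b",
    "generation10/run_a",
]


def runs_in_generation(run_names, number):
    # The trailing "/" keeps generation1 from also matching generation10
    pattern = re.compile(r"generation%d/" % number)
    return [name for name in run_names if pattern.search(name)]


print(runs_in_generation(runs, 1))
```

The same pattern text, for example `generation1/`, could then be entered in the run filter, assuming the run names contain the folder name followed by a slash.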

### Generation 0
![Image1-Tensorboard.png](../../images/generation0.png)
### Generation 1
![Image2-Tensorboard.png](../../images/generation1.png)
### Generation 2
![Image3-Tensorboard.png](../../images/generation2.png)