# Learning Dynamics

We return to the "half moons" dataset you explored in the last notebook to understand how the dynamics of the learning process change as you vary key hyperparameters. In particular, we will look at how some important hyperparameters affect how well and how quickly the model learns.


In [None]:
%matplotlib inline

import os
import numpy as np
from datetime import datetime
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

import tensorflow as tf
slim = tf.contrib.slim

Below are functions that load the dataset we previously saved as TFRecord files, along with a function that defines the simple MLP you will train in this notebook.

In [None]:
# dataset utility functions

_SPLITS_TO_SIZES = {
    'train': 900,
    'validation': 100,
}

_NUM_CLASSES = 2

_ITEMS_TO_DESCRIPTIONS = {
    'x': 'feature vector',
    'y': 'binary label',
}

def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
    """Returns a slim Dataset for the given split ('train' or 'validation').
    """
    
    if split_name not in _SPLITS_TO_SIZES:
        raise ValueError('split name %s was not recognized.' % split_name)

    if not file_pattern:
        file_pattern = _FILE_PATTERN
    file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
    
    # we store the dataset in TF Records
    if reader is None:
        reader = tf.TFRecordReader    

    # decoder
    keys_to_features = {
        'x': tf.FixedLenFeature([2], dtype=tf.float32),
        'y': tf.FixedLenFeature([], dtype=tf.int64),
    }
    items_to_handlers = {
        'x': slim.tfexample_decoder.Tensor('x'),
        'y': slim.tfexample_decoder.Tensor('y'),
    }
    decoder = slim.tfexample_decoder.TFExampleDecoder(
        keys_to_features, items_to_handlers)

    return slim.dataset.Dataset(
        data_sources=file_pattern,
        reader=reader,  # honor the reader argument (defaults to tf.TFRecordReader above)
        decoder=decoder,
        num_samples=_SPLITS_TO_SIZES[split_name],
        items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
        num_classes=_NUM_CLASSES,
        labels_to_names=None
    )

def load_batch(dataset, batch_size=32):
    """Load data batch
    """
    provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, shuffle=False)
    x, y = provider.get(['x', 'y'])
    return tf.train.batch(
        [x, y],
        batch_size=batch_size,
        num_threads=1,
        capacity=100
    )

def build_model(x):
    """Build the neural network
    """
    net = slim.fully_connected(x, 4, activation_fn=tf.nn.relu, scope='fc1')
    logits = slim.fully_connected(net, 2, activation_fn=None, scope='fc2')
    return logits


_FILE_PATTERN = 'moons_%s.tfrecord'
train_filename = _FILE_PATTERN % "train"
test_filename = _FILE_PATTERN % "validation"
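
To make the shape of the network concrete, here is a minimal NumPy sketch of the forward pass that `build_model` describes: a 4-unit ReLU hidden layer followed by a 2-unit linear output layer. The function name `mlp_forward` and the random weights are placeholders for illustration only; slim creates and initializes its own variables.

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a 2-4-2 MLP: ReLU hidden layer, linear output."""
    hidden = np.maximum(0.0, x.dot(w1) + b1)  # corresponds to fc1 with ReLU
    return hidden.dot(w2) + b2                # corresponds to fc2 (logits)

rng = np.random.RandomState(0)
w1, b1 = rng.randn(2, 4), np.zeros(4)  # placeholder weights, not slim's
w2, b2 = rng.randn(4, 2), np.zeros(2)

x_batch = rng.randn(5, 2)              # 5 points with 2 features each
logits = mlp_forward(x_batch, w1, b1, w2, b2)
print(logits.shape)                    # (5, 2): one pair of logits per point
```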

## Train a model

Below is a code snippet that loads the dataset, builds the model, builds the optimizer, and runs training. Take the time to review this code. Run it to train on the dataset.

In [None]:
train_dir = "dynamics/%s" % datetime.now().strftime("%H-%M-%S")

# hyperparameters
number_of_steps = 20
batch_size = 100
learning_rate = 0.1

with tf.Graph().as_default():
    
    tf.logging.set_verbosity(tf.logging.DEBUG)

    # load training data
    dataset = get_split("train", ".")
    x, y = load_batch(dataset, batch_size=batch_size)

    # define model
    logits = build_model(x)

    # define loss
    loss = slim.losses.softmax_cross_entropy(logits, tf.one_hot(y, 2))
    tf.scalar_summary("loss", loss)

    # define optimizer
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    
    # define train operation
    train_op = slim.learning.create_train_op(loss, optimizer)
    
    # summary op
    summary_op = tf.merge_all_summaries()

    # train the model
    slim.learning.train(
        train_op,
        logdir=train_dir,
        log_every_n_steps=max(1, number_of_steps // 10),  # integer, never zero
        number_of_steps=number_of_steps,
        summary_op=summary_op,
        save_summaries_secs=0.01
    )
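
To help read the loss values in the plots, here is a rough NumPy sketch of a softmax cross-entropy loss on integer labels, assuming the loss is averaged over the batch. It is an illustration of the formula, not slim's implementation. Note that a two-class model that outputs equal logits for every point has a loss of ln 2, about 0.693.

```python
import numpy as np

def softmax_cross_entropy(logits, labels, num_classes=2):
    """Mean cross-entropy between softmax(logits) and integer labels."""
    one_hot = np.eye(num_classes)[labels]
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot * log_probs).sum(axis=1).mean()

# equal logits for both classes: the model is maximally uncertain
uninformed = softmax_cross_entropy(np.zeros((3, 2)), np.array([0, 1, 0]))
print("loss with equal logits = %.3f" % uninformed)
```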

## Inspecting the loss

You can start TensorBoard by running `tensorboard --logdir=dynamics` in the notebook directory and navigating to http://localhost:6006/. Once there, click on the "loss" section to reveal a plot of the loss over the course of the training run you just completed.

Below, you will train the same model while changing different hyperparameters. Each run will show up in TensorBoard under a name based on its time stamp. TensorBoard refreshes slowly, so you may need to restart it periodically to see new data sooner. If things get too cluttered, you can clear out all the data by deleting the `dynamics/` directory that contains the logs.
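
If you prefer to clear the logs from the notebook rather than from a shell, a small helper along these lines should work. The name `clear_logs` is hypothetical, and the sketch assumes the logs live in `dynamics/` relative to the notebook directory.

```python
import os
import shutil

def clear_logs(logdir="dynamics"):
    """Delete the log directory and everything in it, if it exists."""
    if os.path.isdir(logdir):
        shutil.rmtree(logdir)
        return True   # something was removed
    return False      # nothing to remove
```

Calling `clear_logs()` removes every previous run, so restart TensorBoard afterwards.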


### Exercise 0

In the loss plot, the loss probably went down, as expected. But notice that at the end of training, it probably looks like the loss was still headed downward. This might mean that we stopped training too early and that we left some improvements on the table.

Retrain the model above, but increase `number_of_steps` to train for longer. Afterwards, inspect the loss plot. Is the loss still going down? Find a number of steps at which the loss flattens out in the plot.


## Batch size

The batch size controls how many training examples are processed by the model at each training step and is an important factor in learning.
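
One way to build intuition for why batch size matters is to look at how noisy an average over a batch is. The sketch below uses synthetic numbers as a stand-in for per-example gradients (an assumption for illustration, not gradients from the model above) and measures how much the batch average varies from batch to batch.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for per-example gradients: 900 noisy values with true mean 1.0.
per_example_grads = 1.0 + rng.randn(900)

def batch_gradient_std(batch_size, num_batches=2000):
    """Spread of the mini-batch mean across many randomly drawn batches."""
    idx = rng.randint(0, per_example_grads.size,
                      size=(num_batches, batch_size))
    return per_example_grads[idx].mean(axis=1).std()

for bs in (1, 10, 100):
    print("batch_size=%3d  std of batch mean=%.3f"
          % (bs, batch_gradient_std(bs)))
```

The spread shrinks roughly with the square root of the batch size, so small batches give a noisier estimate at each step.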


### Exercise 1 - batch sizes

Set the number of steps above to 100 and change the batch size hyperparameter to a small number (try 1). What do you observe about the loss plot in this case? You can try other batch sizes as well. Can you explain your observations?


## Learning rate

The most important hyperparameter that controls learning is the **learning rate**.

It is hard to know in advance what the learning rate should be for a given problem, so new problems take some experimentation. The learning rate is the basic knob that controls how fast learning takes place (how fast the weights change), while the number of iterations controls how long we train. Ideally we want learning to be as fast as possible while remaining stable.
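
The effect of the learning rate is easiest to see on a toy problem. The sketch below runs plain gradient descent on the one-dimensional function f(w) = w², which is an illustrative stand-in and not the model in this notebook. Each update multiplies w by (1 - 2 × learning rate), so a small rate shrinks w slowly, a moderate rate shrinks it quickly, and a rate above 1 makes it grow.

```python
def gradient_descent(learning_rate, num_steps=20, w0=1.0):
    """Minimize f(w) = w**2 by gradient descent; return the loss per step."""
    w, losses = w0, []
    for _ in range(num_steps):
        losses.append(w ** 2)
        w -= learning_rate * 2.0 * w   # the gradient of w**2 is 2*w
    return losses

for lr in (0.01, 0.4, 1.1):
    losses = gradient_descent(lr)
    print("lr=%.2f  first loss=%.3g  last loss=%.3g"
          % (lr, losses[0], losses[-1]))
```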


### Exercise 2 - Trading off learning rate and epochs

By modifying and running the block above while changing the learning rate, we can explore some common behaviors of training.

1. Try a large learning rate - can you get the loss plot to look very jagged? What is happening in this case?
2. Try an even larger learning rate - can you get the loss to "explode" (i.e. become large and stay large)?
3. Try a small learning rate - can you get the loss plot to look like a straight line going down? How fast is learning in this case?
4. With `number_of_steps = 200` and `batch_size = 100`, can you find a learning rate that brings the loss below 0.3 without increasing the training time?
5. If you set the learning rate to 0.001, how many training steps do you need for the loss to drop below 0.5?
6. Try to make your model perform better than 85% test accuracy - what parameters did you use? How well can you do? (The code below will load the latest model and evaluate it on the test data - run it to see your latest model's performance.)

Hint: try exploring learning rates across orders of magnitude (0.1, 0.01, 0.001...). Explore combinations of learning rate and number of iterations.
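
One possible way to organize such a search is to list the combinations up front and give each run a directory name that records its settings. This is only a sketch: the step counts are illustrative values, and `run_name` is a hypothetical helper, not something the training code above uses.

```python
import itertools

learning_rates = [10.0 ** -k for k in range(1, 4)]  # 0.1, 0.01, 0.001
step_counts = [100, 200, 400]                        # illustrative values

def run_name(learning_rate, number_of_steps):
    """Hypothetical log directory name encoding the hyperparameters."""
    return "dynamics/lr%g_steps%d" % (learning_rate, number_of_steps)

combinations = list(itertools.product(learning_rates, step_counts))
for lr, steps in combinations:
    print(run_name(lr, steps))
```

Using such a name for `train_dir` would let you tell runs apart in TensorBoard by their settings rather than by time stamp.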


In [None]:
with tf.Graph().as_default():
    
    # load validation data
    dataset = get_split("validation", ".")
    x, y = load_batch(dataset, batch_size=dataset.num_samples)
    
    # build model and retrieve predictions
    logits = build_model(x)
    predictions = tf.argmax(logits, 1)
    
    # compute accuracy
    acc_value_op, acc_update_op = slim.metrics.streaming_accuracy(predictions, y)
        
    # path to model checkpoint
    checkpoint_path = tf.train.latest_checkpoint(train_dir)
    print("Model checkpoint: %s" % checkpoint_path)
    
    # compute metrics
    accuracy = slim.evaluation.evaluate_once(
        master='',
        checkpoint_path=checkpoint_path,
        logdir=train_dir,
        num_evals=1,
        eval_op=acc_update_op,
        final_op=acc_value_op
    )
    
    print("Accuracy = %s" % accuracy)
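
For reference, the accuracy reported by the evaluation cell is the fraction of points whose highest-scoring class matches the label. A minimal NumPy sketch of that computation, using made-up logits and labels rather than the model's outputs:

```python
import numpy as np

def accuracy(logits, labels):
    """Fraction of rows whose argmax over classes equals the label."""
    predictions = np.argmax(logits, axis=1)
    return np.mean(predictions == labels)

# made-up values for illustration
example_logits = np.array([[2.0, -1.0], [0.5, 1.5], [0.1, 0.0], [-1.0, 3.0]])
example_labels = np.array([0, 1, 1, 1])
print(accuracy(example_logits, example_labels))  # 0.75 (3 of 4 correct)
```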

