# Hidden layers

Here we sample a dataset that a linear classifier cannot model well. In this notebook we will see how adding hidden layers improves classification performance. We also use TF-Slim, a lightweight library for defining, training, and evaluating models in TensorFlow.

TF-Slim's higher-level APIs reduce the amount of code needed to train the model. You will notice some overhead, because our dataset needs to conform to the `slim.dataset.Dataset` API. This code is worth going over, since you can use a similar approach to define a dataset from your own data.

You can visualize the loss with the TensorBoard web app. To do so, run `tensorboard --logdir=solutions/train`. Note that you will have to restart TensorBoard periodically to see your latest training runs.

Remember the master formula:
$$ \text{model} = \text{data} + \text{structure} + \text{loss} + \text{optimizer} $$

In this notebook, you will:
- define a neural network with hidden layers.
- train your network on the moon dataset.
- improve your accuracy by tuning the network structure.

You will use the slim API for all of these steps. The following functions will be helpful:

```tf.nn.relu, tf.nn.tanh, slim.fully_connected, slim.losses.softmax_cross_entropy, tf.one_hot```

----


## 1. Data
We start by generating data that is more complex than the data in the prior notebook.
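
It helps to know how the data is built. The moon dataset consists of two interleaving half circles, one per class, with Gaussian noise added. Each moon's tip reaches into the other's hollow, so no straight line separates them. Below is a hypothetical NumPy re-implementation of that idea. `make_moons_sketch` is our own name and is not part of scikit-learn. It is not guaranteed to reproduce scikit-learn's exact samples; for example, scikit-learn also shuffles by default.

```python
import numpy as np

def make_moons_sketch(n_samples=1000, noise=0.275, seed=0):
    """Hypothetical stand-in for make_moons: two interleaving half circles plus noise."""
    rng = np.random.RandomState(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer
    # class 0: upper half circle of radius 1 centred at (0, 0)
    t_outer = np.linspace(0, np.pi, n_outer)
    outer = np.c_[np.cos(t_outer), np.sin(t_outer)]
    # class 1: lower half circle of radius 1 centred at (1, 0.5)
    t_inner = np.linspace(0, np.pi, n_inner)
    inner = np.c_[1 - np.cos(t_inner), 1 - np.sin(t_inner) - 0.5]
    X = np.vstack([outer, inner])
    y = np.hstack([np.zeros(n_outer, dtype=int), np.ones(n_inner, dtype=int)])
    # isotropic Gaussian noise with standard deviation `noise`
    X = X + rng.normal(scale=noise, size=X.shape)
    return X, y
```

With `noise=0.0`, every point lies exactly on its circle. Raising `noise` blurs the two moons into each other, which is why even a good nonlinear model cannot reach 100% accuracy here.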

### Exercise 0
- generate the moon dataset by executing the next block.

- why will the prior model fail to train a classifier for this data? Discuss with your neighbor. (Plug this data into notebook 0 to see what happens.)

In [None]:
import os
import numpy as np
from datetime import datetime
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf
slim = tf.contrib.slim

X, labels = make_moons(n_samples=1000, noise=0.275, random_state=0)
X_train, X_validation, y_train, y_validation = train_test_split(X, labels, test_size=.1, random_state=42)

x_min = -2
x_max = 3
y_min = -1.5
y_max = 2
plt.figure(figsize=(10, 10))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=20, linewidths=0)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.gca().set_aspect('equal')  # use the current axes; plt.axes() may create new ones in recent matplotlib


### Save TFRecords dataset for training

We save our dataset's NumPy arrays to files in TFRecord format. TFRecord is a binary format that simplifies many data loading and processing tasks in TensorFlow, but it is not required for using TensorFlow. It integrates well with the TF-Slim API, so we use it here. You can learn more about data formats for TensorFlow here: [reading_data](https://www.tensorflow.org/versions/r0.12/how_tos/reading_data/index.html).

In [None]:
# dataset utility functions

_FILE_PATTERN = 'moons_%s.tfrecord'

_SPLITS_TO_SIZES = {
    'train': 900,
    'validation': 100,
}

_NUM_CLASSES = 2

_ITEMS_TO_DESCRIPTIONS = {
    'x': 'feature vector',
    'y': 'binary label',
}


def _add_to_tfrecord(X, labels, tfrecord_writer):
    """Serializes each (x, label) pair as a tf.train.Example and writes it to the TFRecord file.
    """
    # building and serializing protos does not need a tf.Session
    for x, label in zip(X, labels):
        example = tf.train.Example(features=tf.train.Features(feature={
            'x': tf.train.Feature(float_list=tf.train.FloatList(value=x)),
            'y': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
        }))
        tfrecord_writer.write(example.SerializeToString())


def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
    """Returns a slim.dataset.Dataset describing the 'train' or 'validation' split.
    """
    
    if split_name not in _SPLITS_TO_SIZES:
        raise ValueError('split name %s was not recognized.' % split_name)

    if not file_pattern:
        file_pattern = _FILE_PATTERN
    file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
    
    # we store the dataset in TF Records
    if reader is None:
        reader = tf.TFRecordReader    

    # decoder
    keys_to_features = {
        'x': tf.FixedLenFeature([2], dtype=tf.float32),
        'y': tf.FixedLenFeature([], dtype=tf.int64),
    }
    items_to_handlers = {
        'x': slim.tfexample_decoder.Tensor('x'),
        'y': slim.tfexample_decoder.Tensor('y'),
    }
    decoder = slim.tfexample_decoder.TFExampleDecoder(
        keys_to_features, items_to_handlers)

    return slim.dataset.Dataset(
        data_sources=file_pattern,
        reader=reader,  # honor the reader argument (defaults to tf.TFRecordReader above)
        decoder=decoder,
        num_samples=_SPLITS_TO_SIZES[split_name],
        items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
        num_classes=_NUM_CLASSES,
        labels_to_names=None
    )


def load_batch(dataset, batch_size=32):
    """Returns (x, y) tensors holding one batch of examples read from a slim Dataset.
    """
    provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, shuffle=False)
    x, y = provider.get(['x', 'y'])
    return tf.train.batch(
        [x, y],
        batch_size=batch_size,
        num_threads=1,
        capacity=100
    )


# write the dataset in TFRecord format (to the current directory)
train_filename = _FILE_PATTERN % "train"
with tf.python_io.TFRecordWriter(train_filename) as tfrecord_writer:
    _add_to_tfrecord(X_train, y_train, tfrecord_writer)
validation_filename = _FILE_PATTERN % "validation"
with tf.python_io.TFRecordWriter(validation_filename) as tfrecord_writer:
    _add_to_tfrecord(X_validation, y_validation, tfrecord_writer)


# iterate over a few batches to check that reading works
dataset = get_split("train", ".")
xb, yb = load_batch(dataset, batch_size=4)
with tf.Session() as sess:
    with slim.queues.QueueRunners(sess):
        for i in range(2):
            print('batch %d' % i)
            x_np, y_np = sess.run([xb, yb])
            print(x_np)
            print(y_np)


--------

## 2. Structure

You just learned about hidden layers. You can now define a model with hidden units. You can edit the function below to specify your network. The slim API lets you easily define new layers with the `slim.fully_connected` function.

### Exercise 1

- Check out the function doc string for `slim.fully_connected` to learn how to use this function.
- `slim.fully_connected` is a more concise way to create a fully connected layer. In the previous notebook, we built a fully connected layer "manually". Which code from the last notebook does `slim.fully_connected` replace?
- Define the function body of `build_model(x)` below and create a network with 1 hidden layer and an output of dimension 2.

Hint:
- Make the final layer of your network **linear**, i.e. give it no activation function. (Output values like these are often called "logits" when they are fed into a sigmoid or softmax function.) This simplifies applying the loss function and is common practice for more modular TensorFlow code.
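
A fully connected layer computes `activation_fn(x · W + b)`. The NumPy sketch below shows that computation stacked into one ReLU hidden layer and a linear output layer. The hidden size of 8 and the random weights are arbitrary choices for illustration. The `fully_connected` function here is our own helper, not the slim one. `slim.fully_connected` additionally creates and initializes the variables for you. Note that it defaults to `activation_fn=tf.nn.relu`, so you need to pass `activation_fn=None` explicitly to get a linear output layer.

```python
import numpy as np

def fully_connected(x, W, b, activation_fn=None):
    """Dense layer: activation_fn(x @ W + b), or just x @ W + b when activation_fn is None."""
    z = x.dot(W) + b
    return z if activation_fn is None else activation_fn(z)

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.RandomState(0)
x_demo = rng.normal(size=(5, 2))                   # batch of 5 points with 2 features
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)      # hidden layer with 8 units
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)      # output layer producing 2 logits
hidden_demo = fully_connected(x_demo, W1, b1, activation_fn=relu)
logits_demo = fully_connected(hidden_demo, W2, b2)  # linear: no activation on the logits
print(logits_demo.shape)  # (5, 2)
```

Without the nonlinearity in the hidden layer, the two matrix multiplications would collapse into a single linear map. It is the activation function that lets the network bend the decision boundary.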


In [None]:
# in a notebook, append '?' to a function name to display its documentation
slim.fully_connected?

In [None]:
def build_model(x):
    """
    Build a neural network. This function takes the data x as an argument
    and returns the output of the last layer, which must be 2-dimensional!
    """
    # TODO define a deep model
    
    # e.g. fc1 = slim.fully_connected(inputs=x, num_outputs=2, activation_fn=tf.nn.relu, scope='fc1')

    return ...

-------

## 3. Loss
In the prior notebook, you computed the loss yourself. With the slim API, you can use a predefined loss function instead.
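
Here is what a softmax cross-entropy loss computes, sketched in NumPy. The integer labels are one-hot encoded to match the 2-dimensional network output. Then the cross-entropy between `softmax(logits)` and the one-hot targets is averaged over the batch. This mirrors the roles of `tf.one_hot` and `slim.losses.softmax_cross_entropy`. The helper names and the batch-mean reduction are our own choices for this sketch.

```python
import numpy as np

def one_hot(int_labels, num_classes):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, num_classes)."""
    return np.eye(num_classes)[int_labels]

def softmax_cross_entropy(logits, onehot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot targets."""
    # subtract the row maximum before exponentiating, for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=1).mean()

example_labels = np.array([0, 1, 1])
example_logits = np.array([[2.0, -1.0], [0.5, 0.5], [-3.0, 3.0]])
example_onehot = one_hot(example_labels, 2)   # rows [1, 0], [0, 1], [0, 1]
example_loss = softmax_cross_entropy(example_logits, example_onehot)
print(example_loss)
```

The second example, with equal logits, contributes exactly log 2 to the sum. Confident, correct predictions such as the first and third contribute almost nothing.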

### Exercise 2
- Write down the loss function. The neural net has a 2-dimensional output, while the labels for our dataset are 1-dimensional and either 0 or 1. How do you have to encode the labels to match the network output when defining your loss function?

- Fill in the `TODO`s below. Use `slim.losses.softmax_cross_entropy` for your loss.


In [None]:
train_dir = "train/%s" % datetime.now().strftime("%H-%M-%S")

# hyperparameters
number_of_steps = 1000
batch_size = 100
learning_rate = .25

graph = tf.Graph()
with graph.as_default():
    
    # turn on very verbose logging
    tf.logging.set_verbosity(tf.logging.DEBUG)

    # load training data
    dataset = get_split("train", ".")
    x, y = load_batch(dataset, batch_size=batch_size)

    # define model
    logits = build_model(x)

    # TODO - define loss
    loss = ...
    
    # tell tensorflow to log the loss value for visualization
    tf.summary.scalar("loss", loss)

    # summary op that merges all summaries defined above
    summary_op = tf.summary.merge_all()

------

## 4. Optimizer
The last missing piece for our model is the optimizer. We will first use [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) to train our model. 
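
Stochastic gradient descent repeatedly takes a mini-batch and computes the gradient of the loss with respect to every parameter. It then moves each parameter a small step against that gradient: $\theta \leftarrow \theta - \eta \nabla_\theta L$. The NumPy sketch below performs one such step for a single linear softmax layer. It relies on the fact that the gradient of the mean softmax cross-entropy with respect to the logits is $(\mathrm{softmax}(z) - y)/N$. The random data and the name `lr_demo` are illustrative assumptions; we reuse the notebook's step size of 0.25. `tf.train.GradientDescentOptimizer` applies the same update rule, with TensorFlow computing the gradients automatically for every layer.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(W, b, X, Y):
    """Mean softmax cross-entropy of the linear model X @ W + b, and its gradients."""
    P = softmax(X.dot(W) + b)
    loss = -np.mean(np.sum(Y * np.log(P), axis=1))
    dZ = (P - Y) / X.shape[0]          # gradient of the mean loss w.r.t. the logits
    return loss, X.T.dot(dZ), dZ.sum(axis=0)

rng = np.random.RandomState(0)
X_batch = rng.normal(size=(100, 2))             # one mini-batch of 100 points
Y_batch = np.eye(2)[rng.randint(0, 2, 100)]     # random one-hot targets
W_demo, b_demo = np.zeros((2, 2)), np.zeros(2)
lr_demo = 0.25

loss_before, dW, db = loss_and_grads(W_demo, b_demo, X_batch, Y_batch)
# gradient descent update: step against the gradient, scaled by the learning rate
W_demo -= lr_demo * dW
b_demo -= lr_demo * db
loss_after, _, _ = loss_and_grads(W_demo, b_demo, X_batch, Y_batch)
```

With all-zero weights, both classes get probability 0.5, so `loss_before` is log 2. A sufficiently small step along the negative gradient lowers the loss on this batch.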

### Exercise 3
- Create your optimizer using the `tf.train.GradientDescentOptimizer`
- Define the train step with `slim.learning.create_train_op`

Now your model is fully defined and you can
- run the training

In [None]:
with graph.as_default():

    # TODO - define optimizer
    optimizer = ...
    
    # TODO - define train operation
    train_op = ...

    # train the model
    slim.learning.train(
        train_op,
        logdir=train_dir,
        log_every_n_steps=number_of_steps // 10,
        number_of_steps=number_of_steps,
        summary_op=summary_op,
        save_summaries_secs=0.01
    )

### Inspect training
You can now inspect the training procedure using TensorBoard. It's a great tool for gaining insight into the setup of your network and its training behavior. Launch it with
```
$ tensorboard --logdir='path/to/traindir'
```
and then navigate to `localhost:6006` and take a peek around.

### Exercise 4
Make yourself familiar with the tensorboard and try to answer the following questions.
- Take a look at the `Scalars` tab and click on the loss line. What should things look like here?
- Inspect the `Graphs` tab. Is this what you expected to see? 
- What do the numbers on the lines between the layers mean?
- Click on the `+` in the top right corner of one of your `fully connected` layers. How many trainable parameters does this layer have?

## Evaluate performance
You trained your network to minimize the loss, but the loss value is hard to interpret on its own. We therefore compute the accuracy of the predictions.
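
The predicted class is the index of the largest logit. Softmax is monotonic, so applying it first would not change the result. `slim.metrics.streaming_accuracy` returns two ops:
- an update op that accumulates counts from each batch it sees,
- a value op that reads the accuracy accumulated so far.

`slim.evaluation.evaluate_once` runs the update op `num_evals` times and then the value op. The NumPy sketch below mimics that split. `StreamingAccuracySketch` is a hypothetical class, not a slim API, and the logits and labels are made up.

```python
import numpy as np

class StreamingAccuracySketch(object):
    """Hypothetical stand-in for a streaming accuracy metric: running totals over batches."""

    def __init__(self):
        self.correct = 0
        self.count = 0

    def update(self, predictions, labels):
        # like the update op: accumulate statistics from one batch
        self.correct += int(np.sum(predictions == labels))
        self.count += len(labels)

    def value(self):
        # like the value op: accuracy over everything seen so far
        return self.correct / float(self.count)

acc_demo = StreamingAccuracySketch()
demo_batches = [
    (np.array([[2.0, -1.0], [0.1, 0.3]]), np.array([0, 1])),
    (np.array([[-0.5, 0.2], [1.0, 0.0]]), np.array([0, 0])),
]
for batch_logits, batch_labels in demo_batches:
    batch_predictions = np.argmax(batch_logits, axis=1)  # index of the largest logit
    acc_demo.update(batch_predictions, batch_labels)
print(acc_demo.value())  # 0.75
```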

### Exercise 5
- How do you predict the label from the computed output?
- Fill in the missing line below and evaluate your model!

In [None]:
with tf.Graph().as_default():
    
    # load validation data
    dataset = get_split("validation", ".")
    x, y = load_batch(dataset, batch_size=dataset.num_samples)
    
    # build model
    logits = build_model(x)
    
    # TODO - get predictions from model (hint: tf.argmax)
    predictions = ...
    
    # compute accuracy
    acc_value_op, acc_update_op = slim.metrics.streaming_accuracy(predictions, y)
        
    # path to model checkpoint
    checkpoint_path = tf.train.latest_checkpoint(train_dir)
    print("Model checkpoint: %s" % checkpoint_path)
    
    # compute metrics
    accuracy = slim.evaluation.evaluate_once(
        master='',
        checkpoint_path=checkpoint_path,
        logdir=train_dir,
        num_evals=1,
        eval_op=acc_update_op,
        final_op=acc_value_op
    )
    
    print("Accuracy = %s" % accuracy)



## Visualize results

Just like in the last notebook, we now want to look at these results.
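
When reading the contour plot, note that with two classes the softmax probability of class 1 depends only on the difference between the logits:

$$ p_1 = \frac{e^{z_1}}{e^{z_0} + e^{z_1}} = \sigma(z_1 - z_0) $$

The decision boundary, where $p_1 = 0.5$, is therefore exactly where the two logits are equal. Here is a quick numerical check in NumPy; the random logits are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.RandomState(1)
z_demo = rng.normal(size=(6, 2))      # six pairs of logits (z0, z1)
p1_demo = softmax(z_demo)[:, 1]       # the quantity plotted as probabilities_grid below
print(np.allclose(p1_demo, sigmoid(z_demo[:, 1] - z_demo[:, 0])))  # True
```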

### Exercise 6
- Visualize the training results. What should the contours look like?

In [None]:
with tf.Graph().as_default():

    # build model and retrieve estimated probabilities
    x = tf.placeholder(tf.float32, [None, 2], name='x-input')
    logits = build_model(x)
    probabilities = slim.softmax(logits)

    # path to model checkpoint
    checkpoint_path = tf.train.latest_checkpoint(train_dir)
    print("Model checkpoint: %s" % checkpoint_path)

    # grid points to draw decision boundary
    h = 0.02
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    x_grid = np.c_[xx.ravel(), yy.ravel()]

    saver = tf.train.Saver()
    with tf.Session() as sess:

        # restore variables from checkpoint
        saver.restore(sess, checkpoint_path)

        # compute probabilities on the grid
        probabilities_grid = sess.run(probabilities, feed_dict={x: x_grid})
    

# plot probabilities
probabilities_grid = probabilities_grid[:, 1].reshape(xx.shape)
plt.figure(figsize=(10, 10))
plt.contourf(xx, yy, probabilities_grid, cmap=plt.cm.RdBu_r, alpha=.8)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=20, linewidths=0)
#plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=20, linewidths=0)
#plt.scatter(X_validation[:, 0], X_validation[:, 1], c=y_validation, s=20, linewidths=0)
plt.gca().set_aspect('equal')  # use the current axes; plt.axes() may create new ones in recent matplotlib
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max);


### Exercise 7

Not bad so far. But you can do better! With some tuning, an accuracy of around 93% should be achievable.

Modify your model in the `build_model` function:

- try more layers.
- try different numbers of hidden units.
- what do you observe?
- you should reach an accuracy of at least 90% on this task.
- how good can you get overall?
- why not better?
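
When comparing architectures, it helps to track how many trainable parameters each one has. Assuming every layer has a bias vector, a fully connected layer from `n_in` to `n_out` units has `n_in * n_out` weights plus `n_out` biases. The hypothetical helper below adds this up over a list of layer widths, so you can relate accuracy gains to model size. The example widths are arbitrary.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, e.g. [2, 16, 2].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output unit
    return total

for sizes in ([2, 2], [2, 16, 2], [2, 64, 64, 2]):
    print('%s -> %d parameters' % (sizes, count_parameters(sizes)))
# [2, 2] -> 6 parameters
# [2, 16, 2] -> 82 parameters
# [2, 64, 64, 2] -> 4482 parameters
```

Parameter count grows quickly with width and depth, while the moon data has only 900 training points and overlapping classes. Past some size, extra capacity mostly helps the model fit noise rather than improve validation accuracy.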