# Some things we get for free by using Estimators

Estimators are a high-level abstraction (interface) that supports all the basic operations you need to run an ML model on top of TensorFlow.

Estimators:
* provide a simple interface for users of canned model architectures: training, evaluation, prediction, export for serving;
* provide a standard interface for model developers;
* drastically reduce the amount of user code required. This avoids bugs and speeds up development significantly;
* enable building production services against a standard interface;
* give you data parallelism for free when used with the Experiment abstraction.

You can use an already implemented estimator (canned estimator) or implement your own (custom estimator).

This tutorial does not focus on how to build your own estimator. We use a custom estimator that implements a [CNN classifier for the MNIST dataset](https://www.tensorflow.org/get_started/mnist/pros), but we won't go into the details of how it is implemented.

Here we show how Estimators make your life easier: once you have an estimator model, it is very simple to make changes to your model, compare results, and iterate over time.


## Having a look at the code and running the experiment

### Dependencies

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# our model 
import model as m

# tensorflow
import tensorflow as tf 
print(tf.__version__)  # tested with tf v1.2

from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.python.estimator.inputs import numpy_io

# MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# Numpy
import numpy as np

# Enable TensorFlow logs
tf.logging.set_verbosity(tf.logging.INFO)

### Getting the data

We're not going into details here

In [None]:
# Import the MNIST dataset
mnist = input_data.read_data_sets("/tmp/MNIST/", one_hot=True)

x_train = np.reshape(mnist.train.images, (-1, 28, 28, 1))
y_train = mnist.train.labels
x_test = np.reshape(mnist.test.images, (-1, 28, 28, 1))
y_test = mnist.test.labels

### Defining the model

We're not going into details here

In [None]:
# coding: utf-8

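'''A custom Estimator using CNNs for MNIST, built with Keras layers.

For reference:

* https://www.tensorflow.org/extend/estimators.
* https://www.tensorflow.org/get_started/mnist/beginners.
'''

# Keras as shipped with TF 1.2 (assumed import path for the `K` alias used below)
from tensorflow.contrib import keras as K

# Define the model, using Keras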
def model_fn(features, labels, mode, params):
  # Input Layer
  # Reshape X to 4-D tensor: [batch_size, width, height, channels]
  # MNIST images are 28x28 pixels, and have one color channel
  x = tf.reshape(features['x'], shape=[-1, 28, 28, 1])

  # Convolutional Layer #1
  # Computes 32 features using a 5x5 filter with ReLU activation.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 28, 28, 1]
  # Output Tensor Shape: [batch_size, 28, 28, 32]
  conv1 = K.layers.Conv2D(32, (5, 5), activation='relu', padding='same',
                          input_shape=(28, 28, 1))(x)

  # Pooling Layer #1
  # First max pooling layer with a 2x2 filter and stride of 2
  # Input Tensor Shape: [batch_size, 28, 28, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 32]
  pool1 = K.layers.MaxPooling2D(pool_size=(2, 2),
                                strides=2,
                                padding='same')(conv1)

  # Convolutional Layer #2
  # Computes 64 features using a 5x5 filter.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 14, 14, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 64]
  conv2 = K.layers.Conv2D(64, (5, 5), activation='relu',
                          padding='same')(pool1)

  # Pooling Layer #2
  # Second max pooling layer with a 2x2 filter and stride of 2.
  # Input Tensor Shape: [batch_size, 14, 14, 64]
  # Output Tensor Shape: [batch_size, 7, 7, 64]
  pool2 = K.layers.MaxPooling2D(pool_size=(2, 2),
                                strides=2,
                                padding='same')(conv2)

  # Flatten tensor into a batch of vectors.
  # Input Tensor Shape: [batch_size, 7, 7, 64]
  # Output Tensor Shape: [batch_size, 7 * 7 * 64]
  flat = K.layers.Flatten()(pool2)

  # Dense Layer
  # Densely connected layer with 1024 neurons.
  # Input Tensor Shape: [batch_size, 7 * 7 * 64]
  # Output Tensor Shape: [batch_size, 1024]
  dense = K.layers.Dense(1024, activation='relu')(flat)

  # Logits layer
  # No activation here: softmax is applied in `predictions` and inside the loss.
  # Input Tensor Shape: [batch_size, 1024]
  # Output Tensor Shape: [batch_size, 10]
  logits = K.layers.Dense(10)(dense)

  predictions = {
      'classes': tf.argmax(input=logits, axis=1),
      'probabilities': tf.nn.softmax(logits)
  }

  train_op = None
  eval_metric_ops = None

  loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)

  if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.train.get_global_step(),
        learning_rate=params['learning_rate'],
        optimizer='Adam')

  if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {
        'accuracy': tf.metrics.accuracy(
            tf.argmax(input=logits, axis=1),
            tf.argmax(input=labels, axis=1))
    }

  return tf.estimator.EstimatorSpec(mode=mode, train_op=train_op,
                                    predictions=predictions,
                                    loss=loss,
                                    eval_metric_ops=eval_metric_ops)
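
The shape comments in `model_fn` follow from simple arithmetic on the spatial size. The sketch below is a rough check, assuming the standard Keras `'same'` and `'valid'` padding rules; `out_size` and `trace_shapes` are hypothetical helper names that are not part of the tutorial's model.

```python
import math


def out_size(size, kernel, stride, padding):
    """Output size of a conv/pool layer along one spatial dimension."""
    if padding == 'same':
        # 'same' pads the input so that only the stride shrinks it.
        return int(math.ceil(size / float(stride)))
    if padding == 'valid':
        # 'valid' adds no padding, so the kernel must fit entirely inside.
        return (size - kernel) // stride + 1
    raise ValueError('unknown padding: %r' % padding)


def trace_shapes(conv_padding):
    """Spatial sizes after conv1, pool1, conv2, pool2 for a 28x28 input."""
    sizes = []
    size = 28
    for _ in range(2):
        size = out_size(size, kernel=5, stride=1, padding=conv_padding)  # 5x5 conv
        sizes.append(size)
        size = out_size(size, kernel=2, stride=2, padding='same')  # 2x2 max pool
        sizes.append(size)
    return sizes


same_sizes = trace_shapes('same')
valid_sizes = trace_shapes('valid')
print(same_sizes, 'flattened:', same_sizes[-1] ** 2 * 64)
print(valid_sizes, 'flattened:', valid_sizes[-1] ** 2 * 64)
```

Only the `'same'` case reproduces the 28, 14, 14, 7 sequence described in the comments. With `'valid'` padding (the Keras default for `Conv2D`) the feature maps shrink at every convolution.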

### Defining the input function

To feed data to the Estimator model we need to create an input function. This means that the estimator doesn't know about data files; it only knows about input functions.

You can learn more about input functions [here](https://www.tensorflow.org/get_started/input_fn).
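
To make the idea concrete, here is a minimal pure-Python sketch of the contract an input function fulfils: a zero-argument callable that hands out `(features, labels)` batches. This is only an analogy. A real TensorFlow input function returns tensors rather than Python lists, and `make_input_fn` is a hypothetical helper, not a TensorFlow API.

```python
import itertools
import random


def make_input_fn(features, labels, batch_size, shuffle=False, num_epochs=1, seed=0):
    """Return a zero-argument callable that yields (features, labels) batches."""
    n = len(labels)

    def input_fn():
        rng = random.Random(seed)
        # num_epochs=None means "repeat forever", which is typical for training.
        epochs = itertools.count() if num_epochs is None else range(num_epochs)
        for _ in epochs:
            order = list(range(n))
            if shuffle:
                rng.shuffle(order)
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                batch_x = {name: [col[i] for i in idx] for name, col in features.items()}
                batch_y = [labels[i] for i in idx]
                yield batch_x, batch_y

    return input_fn


data = {'x': [[0.0], [0.1], [0.2], [0.3], [0.4]]}
targets = [0, 1, 0, 1, 0]

# Evaluation: one ordered pass over the data.
eval_fn = make_input_fn(data, targets, batch_size=2, shuffle=False, num_epochs=1)
eval_batches = list(eval_fn())

# Training: shuffled and endless, so the caller decides when to stop.
train_fn = make_input_fn(data, targets, batch_size=2, shuffle=True, num_epochs=None)
train_batches = list(itertools.islice(train_fn(), 10))
```

The model code only ever sees the batches, never the data source, which is why the data can be swapped without touching the model.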


In [None]:
BATCH_SIZE = 128

x_train_dict = {'x': x_train}
train_input_fn = numpy_io.numpy_input_fn(x_train_dict, y_train, batch_size=BATCH_SIZE,
                                         shuffle=True, num_epochs=None,
                                         queue_capacity=1000, num_threads=4)

x_test_dict = {'x': x_test}
test_input_fn = numpy_io.numpy_input_fn(x_test_dict, y_test, batch_size=BATCH_SIZE,
                                        shuffle=False, num_epochs=1)


### Creating an experiment

An Experiment instance knows how to invoke training and eval loops in a sensible fashion for distributed training. You can read more about it [here](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment).

In [None]:
# parameters
LEARNING_RATE = 0.01
STEPS = 1000

# create experiment
def generate_experiment_fn():
  def _experiment_fn(run_config, hparams):
    del hparams  # unused, required by signature.
    # create estimator
    model_params = {"learning_rate": LEARNING_RATE}
    estimator = tf.estimator.Estimator(model_fn=m.get_model(), 
                                       params=model_params,
                                       config=run_config)

    train_input = train_input_fn
    test_input = test_input_fn
    
    return tf.contrib.learn.Experiment(
        estimator,
        train_input_fn=train_input,
        eval_input_fn=test_input,
        train_steps=STEPS
    )
  return _experiment_fn

### Run the experiment

In [None]:
OUTPUT_DIR = 'output_dir/model1'
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))

## Running a second time

Okay, the model is definitely not good... But check output_dir/model1: you'll see that this folder was created and that it contains a lot of files that TensorFlow created automatically!

Most of these files are actually checkpoints. This means that **if we run the experiment again with the same model_dir, it will just load the checkpoint and start from there instead of starting all over again!**

This means that:

- If something goes wrong during training, we can restore from where we stopped instead of starting all over again
- If we didn't train enough, we can just continue training

**This is all true as long as you use the same model_dir!**

So, let's run the experiment again for 1000 more steps to see if we can improve the accuracy. Notice that the first step in this run will actually be step 1001, so we need to change the number of steps to 2000 (otherwise the experiment will find the checkpoint and think it has already finished training).
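
The step arithmetic can be illustrated with a toy training loop. This is a sketch of the behaviour described above, not TensorFlow's implementation: the `train` function and the JSON "checkpoint" file are hypothetical stand-ins for the real checkpoint mechanism.

```python
import json
import os
import tempfile


def train(model_dir, train_steps):
    """Toy loop: resume from the saved global step, stop once it reaches train_steps."""
    ckpt_path = os.path.join(model_dir, 'checkpoint.json')
    global_step = 0
    if os.path.exists(ckpt_path):
        # Same model_dir: pick up where the previous run stopped.
        with open(ckpt_path) as f:
            global_step = json.load(f)['global_step']

    steps_run = 0
    while global_step < train_steps:
        global_step += 1  # a real loop would update the model here
        steps_run += 1

    with open(ckpt_path, 'w') as f:
        json.dump({'global_step': global_step}, f)
    return steps_run


model_dir = tempfile.mkdtemp()
first = train(model_dir, 1000)   # fresh directory: runs steps 1..1000
again = train(model_dir, 1000)   # checkpoint already at 1000: nothing left to do
more = train(model_dir, 2000)    # resumes at step 1001 and runs 1000 more
print(first, again, more)
```

Treating the step count as a target for the global step, rather than as "steps per run", is what makes an interrupted run safe to restart with the same settings.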

In [None]:
STEPS = STEPS + 1000
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))

## TensorBoard

Another thing we get for free is TensorBoard. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it.

If you run *tensorboard --logdir=output_dir/model1*, you'll see that we get the graph and some scalars. Also, if you use an [embedding layer](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence), you'll get an [embedding visualization](https://www.tensorflow.org/get_started/embedding_viz) in TensorBoard as well!

So, we can make small changes and we'll have an easy (and totally for free) way to compare the models.

Let's make these changes:
1. change the learning rate to 0.05 
2. change the OUTPUT_DIR to some path in output_dir/

The new OUTPUT_DIR must be inside output_dir/ so that we can run *tensorboard --logdir=output_dir/*   
and get both models visualized at the same time in TensorBoard.

You'll notice that the model will start from step 1, because there's no existing checkpoint in this path.
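
Putting both models under one parent folder works because TensorBoard treats each subdirectory that contains event files as a separate run. The sketch below is a rough approximation of that discovery step, assuming event files have `tfevents` in their name. `find_runs` is a hypothetical helper, and the empty files it scans are placeholders created only for the demonstration.

```python
import os
import tempfile


def find_runs(logdir):
    """List subdirectories of logdir that contain at least one event file."""
    runs = []
    for root, _dirs, files in os.walk(logdir):
        if any('tfevents' in name for name in files):
            runs.append(os.path.relpath(root, logdir))
    return sorted(runs)


# Fake layout mirroring output_dir/model1 and output_dir/model2.
logdir = tempfile.mkdtemp()
for run in ('model1', 'model2'):
    run_dir = os.path.join(logdir, run)
    os.makedirs(run_dir)
    # Placeholder event file; real ones are written by TensorFlow during training.
    open(os.path.join(run_dir, 'events.out.tfevents.0.localhost'), 'w').close()

runs = find_runs(logdir)
print(runs)
```

Each discovered run becomes a separately coloured curve in the scalar plots, which is what makes the side-by-side comparison possible.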

In [None]:
LEARNING_RATE = 0.05
OUTPUT_DIR = 'output_dir/model2'
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))