### *Before you start: make sure you have deleted the output_dir folder from this path*

# Some things we get for free by using Estimators

Estimators are a high-level abstraction (an interface) that supports all the basic operations you need to build and run an ML model on top of TensorFlow.

Estimators:
  * provide a simple interface for users of canned model architectures: training, evaluation, prediction, and export for serving.
  * provide a standard interface for model developers.
  * drastically reduce the amount of user code required, which avoids bugs and speeds up development significantly.
  * enable building production services against a standard interface.
  * give you data parallelism for free when used with the Experiment abstraction (more [here](https://github.com/mari-linhares/tensorflow-workshop/tree/master/code_samples/distributed_tensorflow)).

The Estimator interface includes: training, evaluation, prediction, and export for serving.

Image from [Effective TensorFlow for Non-Experts (Google I/O '17)](https://www.youtube.com/watch?v=5DknTFbcGVM)
![imgs/estimators.png](imgs/estimators.png)

You can use an already implemented estimator (canned estimator) or implement your own (custom estimator).

This tutorial is not focused on how to build your own estimator. We're using a custom estimator that implements a [CNN classifier for the MNIST dataset](https://www.tensorflow.org/get_started/mnist/pros), defined in the model.py file, but we won't go into details about how it's implemented.

Here we're going to show how Estimators make your life easier: once you have an estimator model, it is very simple to change your model and compare results.


## Having a look at the code and running the experiment

### Dependencies

In [26]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# our model 
import model as m

# tensorflow
import tensorflow as tf 
print(tf.__version__) #tested with tf v1.2

from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.python.estimator.inputs import numpy_io

# MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# Numpy
import numpy as np

# Enable TensorFlow logs
tf.logging.set_verbosity(tf.logging.INFO)

1.2.0-rc1


### Getting the data

We're not going into details here: we just load MNIST and reshape the images to 28x28x1.

In [27]:
# Import the MNIST dataset
mnist = input_data.read_data_sets("/tmp/MNIST/", one_hot=True)

x_train = np.reshape(mnist.train.images, (-1, 28, 28, 1))
y_train = mnist.train.labels
x_test = np.reshape(mnist.test.images, (-1, 28, 28, 1))
y_test = mnist.test.labels

Extracting /tmp/MNIST/train-images-idx3-ubyte.gz
Extracting /tmp/MNIST/train-labels-idx1-ubyte.gz
Extracting /tmp/MNIST/t10k-images-idx3-ubyte.gz
Extracting /tmp/MNIST/t10k-labels-idx1-ubyte.gz


### Defining the input function

If we look at the image above, we can see that there are two main parts in the diagram: an input function interacting with data files, and the Estimator interacting with the input function and checkpoints.

This means that the estimator doesn't know about data files; it knows about input functions. So if we want to interact with a data set, we need to create an input function that interacts with it. In this example we create one input function for the training data set and one for the test data set.

You can learn more about input functions [here](https://www.tensorflow.org/get_started/input_fn)
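
To make the input function contract more concrete, here is a minimal sketch in plain NumPy. It is not how `numpy_input_fn` is implemented (the real one builds TensorFlow queue ops); the helper name `make_toy_input_fn` and the toy data are made up for illustration. It only shows the idea: an input function takes no arguments and produces batches of a features dict plus labels, and arguments such as `shuffle` and `num_epochs` control the order and how many passes are made over the data.

```python
import numpy as np

def make_toy_input_fn(features, labels, batch_size, shuffle=False,
                      num_epochs=1, seed=0):
    """Return a zero-argument function yielding (features_dict, labels) batches.

    Illustration only: a Python generator stands in for the TensorFlow
    queues that the real numpy_input_fn would create.
    """
    n = len(labels)

    def input_fn():
        rng = np.random.RandomState(seed)
        epoch = 0
        # num_epochs=None means "repeat forever" (useful for training).
        while num_epochs is None or epoch < num_epochs:
            order = rng.permutation(n) if shuffle else np.arange(n)
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                yield {k: v[idx] for k, v in features.items()}, labels[idx]
            epoch += 1

    return input_fn

# Toy data: 10 examples with a single feature named 'x'.
toy_x = {'x': np.arange(10, dtype=np.float32).reshape(10, 1)}
toy_y = np.arange(10)

# One ordered pass over the data, like an evaluation input function.
eval_batches = list(make_toy_input_fn(toy_x, toy_y, batch_size=4)())
print([len(lbl) for _, lbl in eval_batches])  # prints [4, 4, 2]
```

Note that the model code never touches the arrays directly; it only ever sees what the input function hands over, which is why swapping the data source does not require changing the estimator.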


In [28]:
BATCH_SIZE = 128

# Training input: shuffled, repeats indefinitely (num_epochs=None)
x_train_dict = {'x': x_train}
train_input_fn = numpy_io.numpy_input_fn(
    x_train_dict, y_train, batch_size=BATCH_SIZE,
    shuffle=True, num_epochs=None,
    queue_capacity=1000, num_threads=4)

# Evaluation input: a single ordered pass over the test set
x_test_dict = {'x': x_test}
test_input_fn = numpy_io.numpy_input_fn(
    x_test_dict, y_test, batch_size=BATCH_SIZE,
    shuffle=False, num_epochs=1)


### Creating an experiment

Once an Experiment is created (by passing it an Estimator and the input functions for training and evaluation), it knows how to invoke the training and evaluation loops in a sensible fashion for distributed training. More about it [here](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment).

In [29]:
# parameters
LEARNING_RATE = 0.01
STEPS = 1000

# create experiment
def generate_experiment_fn():
  def _experiment_fn(run_config, hparams):
    del hparams  # unused, required by signature.
    # create estimator
    model_params = {"learning_rate": LEARNING_RATE}
    estimator = tf.estimator.Estimator(model_fn=m.get_model(), 
                                       params=model_params,
                                       config=run_config)

    train_input = train_input_fn
    test_input = test_input_fn
    
    return tf.contrib.learn.Experiment(
        estimator,
        train_input_fn=train_input,
        eval_input_fn=test_input,
        train_steps=STEPS
    )
  return _experiment_fn

### Run the experiment

In [30]:
OUTPUT_DIR = 'output_dir/model1'
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))

INFO:tensorflow:Using config: {'_model_dir': 'output_dir/model1', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb7002c8c88>, '_master': '', '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_evaluation_master': '', '_save_summary_steps': 100, '_task_id': 0, '_task_type': None, '_session_config': None, '_keep_checkpoint_max': 5, '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000, '_tf_random_seed': None, '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'dict' object has no attribute 'name'
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'dict' object has no attribute 'name'
INFO:t

INFO:tensorflow:Evaluation [41/100]
INFO:tensorflow:Evaluation [42/100]
INFO:tensorflow:Evaluation [43/100]
INFO:tensorflow:Evaluation [44/100]
INFO:tensorflow:Evaluation [45/100]
INFO:tensorflow:Evaluation [46/100]
INFO:tensorflow:Evaluation [47/100]
INFO:tensorflow:Evaluation [48/100]
INFO:tensorflow:Evaluation [49/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [51/100]
INFO:tensorflow:Evaluation [52/100]
INFO:tensorflow:Evaluation [53/100]
INFO:tensorflow:Evaluation [54/100]
INFO:tensorflow:Evaluation [55/100]
INFO:tensorflow:Evaluation [56/100]
INFO:tensorflow:Evaluation [57/100]
INFO:tensorflow:Evaluation [58/100]
INFO:tensorflow:Evaluation [59/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [61/100]
INFO:tensorflow:Evaluation [62/100]
INFO:tensorflow:Evaluation [63/100]
INFO:tensorflow:Evaluation [64/100]
INFO:tensorflow:Evaluation [65/100]
INFO:tensorflow:Evaluation [66/100]
INFO:tensorflow:Evaluation [67/100]
INFO:tensorflow:Evaluation [

({'accuracy': 0.59549999, 'global_step': 1000, 'loss': 2.0297089}, [])

## Running a second time

Okay, the model is definitely not good... But check the OUTPUT_DIR path: you'll see that an output_dir folder was created and that it contains a lot of files that were created automatically by TensorFlow!  

Most of these files are actually checkpoints. This means that **if we run the experiment again with the same model_dir, it will just load the checkpoint and start from there instead of starting all over again!**

This means that:

- If we have a problem while training, we can just restore from where we stopped instead of starting all over again  
- If we didn't train enough, we can just continue to train  
- If we have a big file, we can break it into small files and train for a while on each one, and the model will continue from where it stopped each time :) 

**This is all true as long as you use the same model_dir!**

So, let's run the experiment again for 1000 more steps to see if we can improve the accuracy. Notice that the first step in this run will actually be step 1001, so we need to change the number of steps to 2000 (otherwise the experiment will find the checkpoint and think it has already finished training).
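
The reason the step count has to grow can be sketched with a toy training loop. This is only an illustration under stated assumptions: `toy_train` and the `toy_checkpoint.json` file are hypothetical stand-ins, and the "checkpoint" stores nothing but the global step (a real checkpoint also stores the model weights). The point is that the step count acts as a target for the global step, not as a number of additional steps.

```python
import json
import os
import tempfile

def toy_train(model_dir, max_steps):
    """Train until the global step reaches max_steps, resuming from model_dir.

    Hypothetical stand-in for an Estimator: the 'checkpoint' is a JSON file
    holding only the global step.
    """
    os.makedirs(model_dir, exist_ok=True)
    ckpt_path = os.path.join(model_dir, 'toy_checkpoint.json')

    # Restore the global step if a checkpoint already exists in model_dir.
    step = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            step = json.load(f)['global_step']

    steps_run = 0
    while step < max_steps:
        step += 1        # one (pretend) training step
        steps_run += 1

    # Save the new global step so that the next call can resume from it.
    with open(ckpt_path, 'w') as f:
        json.dump({'global_step': step}, f)
    return steps_run, step

model_dir = os.path.join(tempfile.mkdtemp(), 'model1')
print(toy_train(model_dir, max_steps=1000))  # prints (1000, 1000): fresh start
print(toy_train(model_dir, max_steps=1000))  # prints (0, 1000): target already reached
print(toy_train(model_dir, max_steps=2000))  # prints (1000, 2000): resumes at step 1001
```

Pointing the same function at a different directory starts again from step 0, which mirrors the rule that resuming only works with the same model_dir.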

In [31]:
STEPS = STEPS + 1000
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))

INFO:tensorflow:Using config: {'_model_dir': 'output_dir/model1', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb7261d7978>, '_master': '', '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_evaluation_master': '', '_save_summary_steps': 100, '_task_id': 0, '_task_type': None, '_session_config': None, '_keep_checkpoint_max': 5, '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000, '_tf_random_seed': None, '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from output_dir/model1/model.ckpt-1000
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'dict' object has no attribute 'name'
INFO:tensorflow:Saving checkpoints for 1001 into output_

INFO:tensorflow:Evaluation [49/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [51/100]
INFO:tensorflow:Evaluation [52/100]
INFO:tensorflow:Evaluation [53/100]
INFO:tensorflow:Evaluation [54/100]
INFO:tensorflow:Evaluation [55/100]
INFO:tensorflow:Evaluation [56/100]
INFO:tensorflow:Evaluation [57/100]
INFO:tensorflow:Evaluation [58/100]
INFO:tensorflow:Evaluation [59/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [61/100]
INFO:tensorflow:Evaluation [62/100]
INFO:tensorflow:Evaluation [63/100]
INFO:tensorflow:Evaluation [64/100]
INFO:tensorflow:Evaluation [65/100]
INFO:tensorflow:Evaluation [66/100]
INFO:tensorflow:Evaluation [67/100]
INFO:tensorflow:Evaluation [68/100]
INFO:tensorflow:Evaluation [69/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [71/100]
INFO:tensorflow:Evaluation [72/100]
INFO:tensorflow:Evaluation [73/100]
INFO:tensorflow:Evaluation [74/100]
INFO:tensorflow:Evaluation [75/100]
INFO:tensorflow:Evaluation [

({'accuracy': 0.82950002, 'global_step': 2000, 'loss': 1.6523409}, [])

## Tensorboard

Another thing we get for free is TensorBoard. 

If you run: *tensorboard --logdir=OUTPUT_DIR* (replacing OUTPUT_DIR with the actual path, output_dir/model1 in this case)

You'll see that we get the graph and some scalars. Also, if you use an [embedding layer](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence), you'll get an [embedding visualization](https://www.tensorflow.org/get_started/embedding_viz) in TensorBoard as well!

So, we can make small changes and we'll have an easy (and totally free) way to compare the models.

Let's make these changes:
1. change the learning rate to 0.05 
2. change the OUTPUT_DIR to some path in output_dir/ 

The path in step 2 must be inside output_dir/ so that we can run: *tensorboard --logdir=output_dir/*   
and get both models visualized at the same time in TensorBoard.

You'll notice that the model will start from step 1, because there's no existing checkpoint in this path.
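
TensorBoard treats each subdirectory of the log directory that contains event files as a separate run, which is why keeping every model under output_dir/ lets you compare them side by side. The sketch below only mimics that discovery step in plain Python; it is not TensorBoard's own code, `find_runs` is a hypothetical helper, and the event file names it creates are empty placeholders rather than real TensorFlow output.

```python
import os
import tempfile

def find_runs(logdir):
    """Return the subdirectories of logdir (relative paths) holding event files."""
    runs = []
    for root, _dirs, files in os.walk(logdir):
        if any(name.startswith('events.out.tfevents') for name in files):
            runs.append(os.path.relpath(root, logdir))
    return sorted(runs)

# Build a throwaway layout that mimics output_dir/ with two model directories.
# The event files are empty placeholders, not real TensorFlow summaries.
logdir = os.path.join(tempfile.mkdtemp(), 'output_dir')
for model_name in ('model1', 'model2'):
    run_dir = os.path.join(logdir, model_name)
    os.makedirs(run_dir)
    open(os.path.join(run_dir, 'events.out.tfevents.0.placeholder'), 'w').close()

print(find_runs(logdir))  # prints ['model1', 'model2']
```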



In [33]:
LEARNING_RATE = 0.05
OUTPUT_DIR = 'output_dir/model2'
learn_runner.run(generate_experiment_fn(), run_config=tf.contrib.learn.RunConfig(model_dir=OUTPUT_DIR))

INFO:tensorflow:Using config: {'_model_dir': 'output_dir/model2', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb6801ffe80>, '_master': '', '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_evaluation_master': '', '_save_summary_steps': 100, '_task_id': 0, '_task_type': None, '_session_config': None, '_keep_checkpoint_max': 5, '_environment': 'local', '_keep_checkpoint_every_n_hours': 10000, '_tf_random_seed': None, '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'dict' object has no attribute 'name'
INFO:tensorflow:Saving checkpoints for 1 into output_dir/model2/model.ckpt.
Type is unsupported, or the types of the items don't mat

INFO:tensorflow:Evaluation [25/100]
INFO:tensorflow:Evaluation [26/100]
INFO:tensorflow:Evaluation [27/100]
INFO:tensorflow:Evaluation [28/100]
INFO:tensorflow:Evaluation [29/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [31/100]
INFO:tensorflow:Evaluation [32/100]
INFO:tensorflow:Evaluation [33/100]
INFO:tensorflow:Evaluation [34/100]
INFO:tensorflow:Evaluation [35/100]
INFO:tensorflow:Evaluation [36/100]
INFO:tensorflow:Evaluation [37/100]
INFO:tensorflow:Evaluation [38/100]
INFO:tensorflow:Evaluation [39/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [41/100]
INFO:tensorflow:Evaluation [42/100]
INFO:tensorflow:Evaluation [43/100]
INFO:tensorflow:Evaluation [44/100]
INFO:tensorflow:Evaluation [45/100]
INFO:tensorflow:Evaluation [46/100]
INFO:tensorflow:Evaluation [47/100]
INFO:tensorflow:Evaluation [48/100]
INFO:tensorflow:Evaluation [49/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [51/100]
INFO:tensorflow:Evaluation [

({'accuracy': 0.95529997, 'global_step': 2000, 'loss': 1.5128938}, [])

If you run TensorBoard as described above, you'll see something similar to the images below:

![graph](imgs/graph.png)
![scalar](imgs/scalars.png)
