# TensorFlow Estimators
Estimators are a high-level wrapper that takes care of many previously tedious parts of a TensorFlow implementation. They separate your model from the "training-only" parts of the graph, which keeps it a lot cleaner. They also start the queue runners, switch inputs and model modes, and much more. I highly recommend taking the time to get familiar with TF Estimators.

<img style="float: left" width=500 src="./data/estimators.png">

To create an estimator, we need three pieces: the desired architecture, the model function and the input method.<br>
Note that "model" here means the model for a specific task, not the architecture. The same model function can therefore be used with multiple architectures.
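One way to get that reuse, sketched here as an assumption (the model function later in this notebook hard-codes `simple`), is to let the model function look its network up by name from a hypothetical `architecture` hyperparameter. The "architectures" below are plain-Python stand-ins rather than TensorFlow code, so only the lookup pattern is shown:

```python
def make_model_fn(architectures):
    """Return a model_fn-style function that picks its network from params."""
    def model_fn(features, labels, mode, params):
        # 'architecture' is a hypothetical hyperparameter naming the network to build
        network = architectures[params['architecture']]
        logits = network(features['inputs'], mode, params)
        # ...the loss, train_op, metrics and predictions would be built from `logits`...
        return logits
    return model_fn


# Stand-in "architectures" that only report what they were asked to build
def tiny(inputs, mode, params):
    return ('tiny', inputs, mode)


def wide(inputs, mode, params):
    return ('wide', inputs, mode)


model_fn = make_model_fn({'tiny': tiny, 'wide': wide})
print(model_fn({'inputs': 'x'}, None, 'train', {'architecture': 'wide'}))
# ('wide', 'x', 'train')
```

The task-specific logic (loss, metrics, export) is written once, and switching architectures becomes a hyperparameter change.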

## 1. Architecture
Let us start by creating a simple architecture. It takes in a tensor (a batch of images) and outputs a 2-channel tensor, one channel per pixel class. This architecture needs a deconvolution (upsampling) layer, which, according to https://distill.pub/2016/deconv-checkerboard/, should be implemented as a resize followed by a convolution to avoid checkerboard artifacts.
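The checkerboard effect described in the article can be seen by counting overlaps. The plain-Python sketch below is our own 1D simplification (no padding or cropping for the transposed convolution, zero "same" padding for the resize-convolution, odd kernel size assumed). It counts how many input positions contribute to each output position in both approaches:

```python
def transposed_conv_overlap(n_in, kernel, stride):
    """Contributions per output position of a 1D transposed convolution."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):
        # Each input position is "stamped" onto `kernel` consecutive outputs
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts


def resize_conv_overlap(n_in, kernel, factor):
    """Contributions per output position of a nearest-neighbor resize
    followed by a stride-1 convolution with zero "same" padding."""
    n_up = n_in * factor
    half = kernel // 2
    return [sum(1 for k in range(-half, half + 1) if 0 <= o + k < n_up)
            for o in range(n_up)]


print(transposed_conv_overlap(4, 3, 2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(resize_conv_overlap(4, 3, 2))      # [2, 3, 3, 3, 3, 3, 3, 2]
```

In the interior, the transposed convolution alternates between 1 and 2 contributions, which is the uneven overlap the article links to checkerboard patterns. The resize-convolution gives the same count everywhere except at the borders.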

```python
import tensorflow as tf

def deconv2d_resize(inputs,
                    filters,
                    kernel_size=(2, 2),
                    padding='SAME',
                    strides=(2, 2),
                    reuse=None,
                    name=None,
                    activation=None):
    """Resize input using nearest neighbor then apply convolution.

    Parameters
    ----------
    inputs : tensor
        The input tensor to this operation
    filters : int
        Number of filters of the conv operation
    kernel_size : tuple, optional
        The kernel size of the convolution
    padding : str, optional
        Padding strategy of the convolution
    strides : tuple, optional
        Upsampling factor of the resize operation (height, width); the
        strides control how big the output tensor is
    reuse : None, optional
        Whether to reuse the weights of a previous layer with the same name
    name : None, optional
        Desired name of this op
    activation : None, optional
        Desired activation function

    Returns
    -------
    tensor
        The output tensor that has been resized and convolved
    """
    shape = inputs.get_shape().as_list()
    # The "strides" set the upsampling factor of the nearest-neighbor resize
    height = shape[1] * strides[0]
    width = shape[2] * strides[1]
    resized = tf.image.resize_nearest_neighbor(inputs, [height, width])

    # The convolution itself uses stride 1, so it keeps the resized size
    # (with "same" padding)
    return tf.layers.conv2d(resized, filters,
                            kernel_size=kernel_size,
                            padding=padding,
                            strides=(1, 1),
                            reuse=reuse,
                            name=name,
                            activation=activation)
```
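To see what the resize step does to the spatial size, here is a plain-Python sketch of nearest-neighbor upsampling on a 2D list. It assumes integer scale factors, where output pixel `(r, c)` copies input pixel `(r // factor_h, c // factor_w)`. We believe this matches `tf.image.resize_nearest_neighbor` with its default settings for integer factors, but treat that as an assumption:

```python
def nearest_neighbor_upsample(image, factor_h, factor_w):
    """Upsample a 2D list by integer factors, copying the nearest source pixel."""
    out_h = len(image) * factor_h
    out_w = len(image[0]) * factor_w
    return [[image[r // factor_h][c // factor_w] for c in range(out_w)]
            for r in range(out_h)]


image = [[1, 2],
         [3, 4]]
print(nearest_neighbor_upsample(image, 2, 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Every pixel is copied into a `factor_h x factor_w` block, so the following stride-1 convolution sees evenly covered input, unlike a strided transposed convolution.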

With the resize-convolution helper in place, we can define the architecture. Its arguments are generally up to you (apart from the input tensor, of course). However, if you use layers such as dropout or batch normalization, you also need to pass in the "mode", since these layers behave differently during training and inference.
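As an illustration of that difference, here is a plain-Python sketch of the two mode-dependent layers. It is a simplified re-implementation for a single feature, not TensorFlow's code. We assume inverted dropout (survivors scaled by `1 / (1 - rate)`) and batch normalization without the learned scale and offset, with `momentum=0.99` and `epsilon=1e-3` taken from what we understand to be the `tf.layers.batch_normalization` defaults:

```python
import math
import random


def dropout(values, rate, training, rng=None):
    """Inverted dropout: drop with probability `rate` while training, identity otherwise."""
    if not training:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Surviving values are scaled up so the expected value is unchanged
    return [v / keep if rng.random() < keep else 0.0 for v in values]


class BatchNorm:
    """Single-feature batch normalization without the learned scale/offset."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, values, training):
        if training:
            # Normalize with the statistics of the current batch...
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # ...and update the running averages used at inference time
            # (in TensorFlow this update is an op in the UPDATE_OPS collection)
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / math.sqrt(var + self.epsilon) for v in values]


bn = BatchNorm()
print(bn([1.0, 3.0], training=True))   # normalized with the batch mean 2.0
print(bn([1.0, 3.0], training=False))  # normalized with the moving averages
print(dropout([1.0, 2.0], 0.4, training=False))  # [1.0, 2.0]
```

If the mode is not passed in, both layers would always act as in training, so inference results would be random (dropout) and would depend on the composition of each batch (batch normalization).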

```python
from tensorflow.python.estimator.model_fn import ModeKeys as Modes

def simple(features, mode, hparams, scope='simple_network'):
    """Returns a simple network architecture.

    conv[5,5,32] -> Dense[16] -> BatchNorm -> Dropout -> deconv[5,5,2]

    Parameters
    ----------
    features : Tensor
        4D Tensor where the first dimension is the batch size, then height, width
        and channels
    mode : tensorflow.python.estimator.model_fn.ModeKeys
        The current mode (one of the ModeKeys values)
    hparams : HParams
        Hyperparameters (not used by this architecture)
    scope : str, optional
        The scope to use for this architecture

    Returns
    -------
    Tensor op
        Return the final tensor operation (logits), from the network
    """
    with tf.variable_scope(scope):
        # Dropout and batch normalization behave differently during training
        is_training = mode == Modes.TRAIN

        # Input layer
        net = [features]
        # Convolutional layer #1 (stride 2 halves height and width)
        net.append(tf.layers.conv2d(inputs=net[-1],
                                    filters=32,
                                    kernel_size=[5, 5],
                                    strides=(2, 2),
                                    padding="same",
                                    name="conv_1_1",
                                    activation=tf.nn.relu))

        # Fully connected layer, applied to the channels of every pixel
        net.append(tf.layers.dense(inputs=net[-1], units=16, activation=tf.nn.relu))

        # Batch normalization
        net.append(tf.layers.batch_normalization(net[-1], training=is_training))

        # Dropout
        net.append(tf.layers.dropout(inputs=net[-1], rate=0.4, training=is_training))

        # Deconvolution back to the input size, one channel per pixel class.
        # No activation: these are logits, which the softmax expects unbounded
        net.append(deconv2d_resize(inputs=net[-1],
                                   filters=2,
                                   kernel_size=[5, 5],
                                   padding="same",
                                   activation=None))

        return net[-1]
```

Let's make sure that an input of size (B, H, W, C) produces an output of size (B, H, W, 2), where B is the batch size.

```python
from tensorflow.python.estimator.model_fn import ModeKeys as Modes

# Reset the graph to get a clean slate, so this cell can be run multiple times
tf.reset_default_graph()

input_size = (5, 100, 100, 3)
expected_output = (5, 100, 100, 2)
input_tensor = tf.placeholder(shape=input_size, dtype=tf.float32)
output_tensor = simple(input_tensor, Modes.TRAIN, [])

assert output_tensor.get_shape().as_list() == list(expected_output)
```
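That assertion checks a single input size. A plain-Python sketch of the shape arithmetic (our own helper, not part of TensorFlow) suggests a caveat: the stride-2 "same" convolution rounds odd sizes up, and the 2x resize then doubles them. The output therefore only matches the input when height and width are even:

```python
def simple_output_shape(input_shape, conv_stride=2, resize_factor=2, num_classes=2):
    """Predict the output shape of `simple` for a (B, H, W, C) input."""
    batch, height, width, _ = input_shape
    # conv_1_1: padding "same" with stride 2 gives ceil(size / stride)
    height = -(-height // conv_stride)
    width = -(-width // conv_stride)
    # dense, batch norm and dropout leave height and width unchanged
    # deconv2d_resize: nearest-neighbor resize by the factor, then a stride-1 "same" conv
    return (batch, height * resize_factor, width * resize_factor, num_classes)


print(simple_output_shape((5, 100, 100, 3)))  # (5, 100, 100, 2)
print(simple_output_shape((5, 101, 101, 3)))  # (5, 102, 102, 2)
```

For odd-sized images you would need to crop or pad either the input or the logits before computing a per-pixel loss.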

Now that the architecture is ready, we move on to the model function.

## 2. Model function
The model function creates the "outer layer" around our architecture, that is, how the input and output are handled. It also creates the nodes needed for training, evaluation and prediction.

```python
def model_fn(features, labels, mode, params):
    """Creates the model function.

    This will handle all the different processes needed when using an Estimator.
    The estimator will change "modes" using the mode flag, and depending on that
    different outputs are provided.

    Parameters
    ----------
    features : Dict {'inputs': Tensor}
        Contains the input, a 4D Tensor where the first dimension is the batch size,
        then height, width and channels
    labels : Dict {'label': Tensor, 'weight': Tensor}
        Contains both weight and label, where each is a 3D Tensor, where the first dimension is
        the batch size, then height and width. The values in the label image are class numbers,
        while weight is a weight map for the pixels
    mode : tensorflow.python.estimator.model_fn.ModeKeys
        The current mode (one of the ModeKeys values)
    params : HParams
        Contains all the hyperparameters that are available to the model. These can be different
        depending on which architecture (model type) is in use

    Returns
    -------
    tf.estimator.EstimatorSpec
        The requested estimator spec
    """

    # Fetch the input tensor
    feature_input = features['inputs']

    # Logits layer
    logits = simple(feature_input, mode, params)

    # In evaluation and prediction mode, compute the class probabilities
    # and the guessed pixel class
    if mode in (Modes.EVAL, Modes.PREDICT):
        probabilities = tf.nn.softmax(logits, name='softmax_tensor')
        predicted_pixels = tf.argmax(input=logits, axis=-1)

    # During training and evaluation, we calculate the loss
    if mode in (Modes.TRAIN, Modes.EVAL):
        # The optimizer increments the global step on every training step
        global_step = tf.train.get_or_create_global_step()
        label_indices = tf.cast(labels['label'], tf.int32)
        # Per-pixel cross-entropy, scaled by the per-pixel weight map
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label_indices,
                                                                       logits=logits)
        weighted_cross_entropy = tf.multiply(cross_entropy, labels['weight'])
        # If the weights had any L2 losses, we would collect them like this
        reg_loss = tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
        loss = tf.reduce_sum(weighted_cross_entropy) + reg_loss

    # In training (not evaluation) we perform backprop
    if mode == Modes.TRAIN:
        optimizer = tf.train.AdamOptimizer(learning_rate=params.learning_rate)

        # For batch normalization, we need to tie the "update operations" to the
        # calling of train_op. This updates the moving mean/variance of the
        # batch norm whenever the training op is run. This is important!
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = optimizer.minimize(loss, global_step=global_step)

        return tf.estimator.EstimatorSpec(
            mode,
            loss=loss,
            train_op=train_op)

    # For evaluation, we generally just state which metric ops to use. In this
    # case, the mean intersection over union (IoU) is of interest
    if mode == Modes.EVAL:
        # Mean IoU over the 2 pixel classes (reported under the 'accuracy' key)
        eval_metric_ops = {
            'accuracy': tf.metrics.mean_iou(label_indices, predicted_pixels, 2)
        }

        return tf.estimator.EstimatorSpec(
            mode,
            loss=loss,
            eval_metric_ops=eval_metric_ops)

    # When predicting (running inference only, during serving for example) we
    # need to return the output as a dictionary
    if mode == Modes.PREDICT:
        predictions = {
            'classes': predicted_pixels,
            'probabilities': probabilities
        }
        export_outputs = {
            'prediction': tf.estimator.export.PredictOutput(predictions)
        }
        return tf.estimator.EstimatorSpec(
            mode, predictions=predictions, export_outputs=export_outputs)
```
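The loss and the evaluation metric in `model_fn` reduce to simple per-pixel arithmetic. The plain-Python sketch below is our own re-implementation for illustration, not TensorFlow's code, working on flattened pixels. Skipping classes that never occur reflects our understanding that `tf.metrics.mean_iou` averages only over classes with a non-zero denominator:

```python
import math


def weighted_pixel_loss(logits, labels, weights):
    """Sum over pixels of weight * sparse softmax cross-entropy.

    logits: one list of class logits per pixel (pixels flattened)
    labels: the true class index of each pixel
    weights: the weight-map value of each pixel
    """
    total = 0.0
    for z, y, w in zip(logits, labels, weights):
        # Numerically stable log-sum-exp
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += w * (log_sum_exp - z[y])
    return total


def mean_iou(labels, predictions, num_classes):
    """Average over classes of TP / (TP + FP + FN), from a confusion matrix."""
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, predictions):
        confusion[y][p] += 1
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(row[c] for row in confusion) - tp
        if tp + fp + fn > 0:  # classes absent from labels and predictions are skipped
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


print(weighted_pixel_loss([[0.0, 0.0], [2.0, 0.0]], [1, 0], [1.0, 0.0]))  # log(2), about 0.693
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # (1/2 + 2/3) / 2
```

A pixel with weight 0 contributes nothing to the loss, which is how the weight map can mask out regions or emphasize borders.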


Notice how the model function is split into training, evaluation and prediction. The estimator indicates to the model function (via `mode`) which graph it is trying to create, so there are no evaluation operations in the graph during training and vice versa. The many if statements carry no speed penalty, since they only affect graph __creation__ and do not run any operations.<br>
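The effect of this branching can be mimicked in plain Python with a toy builder (hypothetical, no TensorFlow involved) that only records which kinds of ops a `model_fn`-like function adds for each mode. The mode strings are the values of `ModeKeys`:

```python
# ModeKeys string values: TRAIN='train', EVAL='eval', PREDICT='infer'
TRAIN, EVAL, PREDICT = 'train', 'eval', 'infer'


def build_graph(mode):
    """Return the names of the (pretend) ops a model_fn-like builder adds for `mode`."""
    graph = ['logits']                         # the architecture is always built
    if mode in (TRAIN, EVAL):
        graph.append('loss')                   # needed for backprop and for reporting
    if mode == TRAIN:
        graph += ['update_ops', 'optimizer']   # only training changes the weights
    if mode == EVAL:
        graph.append('mean_iou')               # metric ops exist only when evaluating
    if mode == PREDICT:
        graph.append('predictions')            # dict of classes and probabilities
    return graph


for mode in (TRAIN, EVAL, PREDICT):
    print(mode, build_graph(mode))
```

The if statements run once, while the graph is being built; after that only the ops that were added exist, which is why the branching costs nothing at run time.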

As before, let's make sure the function works as expected. Note that this time the input is not a tensor but a dictionary of tensors: if we ever intend to serve this model somewhere, it will have to receive a dictionary to parse.

```python
from tensorflow.python.estimator.model_fn import ModeKeys as Modes
from tensorflow.contrib.training import HParams

# Modes to loop through
test_modes = [Modes.TRAIN, Modes.EVAL, Modes.PREDICT]

for imode in test_modes:
    # Reset the graph to get a clean slate, so this cell can be run multiple times
    tf.reset_default_graph()

    input_size = (5, 100, 100, 3)
    label_size = (5, 100, 100)
    weight_size = (5, 100, 100)
    expected_output = (5, 100, 100, 2)
    input_tensor = tf.placeholder(shape=input_size, dtype=tf.float32)
    label_tensor = tf.placeholder(shape=label_size, dtype=tf.int32)
    weight_tensor = tf.placeholder(shape=weight_size, dtype=tf.float32)

    feature_dict = {
        'inputs': input_tensor
    }
    label_dict = {
        'label': label_tensor,
        'weight': weight_tensor
    }

    # Define model and input parameters
    hparams = HParams(
        learning_rate=0.001
    )

    response = model_fn(feature_dict, label_dict, imode, hparams)
    print(imode, response)
```

```
('train', EstimatorSpec(mode='train', predictions={}, loss=<tf.Tensor 'add:0' shape=() dtype=float32>, train_op=<tf.Operation 'Adam' type=AssignAdd>, eval_metric_ops={}, export_outputs=None, training_chief_hooks=(), training_hooks=(), scaffold=<tensorflow.python.training.monitored_session.Scaffold object at 0x7fc57ad6c250>, evaluation_hooks=()))
('eval', EstimatorSpec(mode='eval', predictions={}, loss=<tf.Tensor 'add:0' shape=() dtype=float32>, train_op=None, eval_metric_ops={'accuracy': (<tf.Tensor 'mean_iou/Select_1:0' shape=() dtype=float32>, <tf.Tensor 'mean_iou/AssignAdd:0' shape=(2, 2) dtype=float64_ref>)}, export_outputs=None, training_chief_hooks=(), training_hooks=(), scaffold=<tensorflow.python.training.monitored_session.Scaffold object at 0x7fc57ae4ead0>, evaluation_hooks=()))
('infer', EstimatorSpec(mode='infer', predictions={'probabilities': <tf.Tensor 'softmax_tensor:0' shape=(5, 100, 100, 2) dtype=float32>, 'classes': <tf.Tensor 'ArgMax:0' shape=(5, 100, 100) dtype=int64
```

Looks good: the model function builds a graph in each mode without raising errors.

## 3. Input method

We have already covered the "feeder" in WP2; it serves as the input method of our estimator. The only thing that matters is that the data used for evaluation is not the same as the data used for training.
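A minimal sketch of that requirement, assuming examples can be referred to by id: shuffle the ids once with a fixed seed, split them into disjoint training and evaluation lists, and batch each list into the `features`/`labels` dictionaries that `model_fn` reads (`'inputs'`, `'label'`, `'weight'`). The `load_example` loader is hypothetical and stands in for the WP2 feeder; a real input method would return tensors rather than Python lists:

```python
import random


def split_train_eval(example_ids, eval_fraction=0.2, seed=0):
    """Shuffle ids deterministically and split them into disjoint train/eval lists."""
    ids = list(example_ids)
    random.Random(seed).shuffle(ids)
    n_eval = int(len(ids) * eval_fraction)
    return ids[n_eval:], ids[:n_eval]


def batches(ids, load_example, batch_size):
    """Yield (features, labels) dicts with the keys model_fn reads."""
    for start in range(0, len(ids), batch_size):
        examples = [load_example(i) for i in ids[start:start + batch_size]]
        features = {'inputs': [e['image'] for e in examples]}
        labels = {'label': [e['label'] for e in examples],
                  'weight': [e['weight'] for e in examples]}
        yield features, labels


# Hypothetical loader standing in for the WP2 feeder
def load_example(i):
    return {'image': f'image_{i}', 'label': f'label_{i}', 'weight': 1.0}


train_ids, eval_ids = split_train_eval(range(10))
for features, labels in batches(eval_ids, load_example, batch_size=4):
    print(features['inputs'], labels['weight'])
```

Fixing the seed keeps the split identical across runs, so no example can drift from the evaluation set into the training set between experiments.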