# Serving a Resnet TF Estimator Model

**Scenario:** An ML researcher has trained a Resnet model on the Imagenet dataset using Tensorflow's Estimator API. The code is located at https://github.com/tensorflow/models/tree/v1.4.0/official/resnet. (Note that we use the v1.4.0 tag. Always deploy from a stable tag, because the researcher may continue to modify the model and architecture at the head of master.) Our task is to deploy this model into Tensorflow Serving. We have access to the researcher's Python code as well as a saved state (checkpoint) that points to their preferred trained result.

The first step is to create a servable version of the model that will be used for Tensorflow Serving, which runs very efficiently in C++, and is platform independent (can run on different OSes, as well as hardware with different types of accelerators such as GPUs).


# Preamble

Import the required libraries.

In [0]:
import os
import tensorflow as tf

# Download model checkpoint

The next step is to load the researcher's saved checkpoint into our estimator. We will download it from
http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz using the commands below.


In [0]:
# Define a constant indicating the number of layers in our loaded model. We're loading a resnet-50 model.
RESNET_SIZE = 50  

# Model and serving directories
MODEL_DIR="resnet_model_checkpoints"
SERVING_DIR="resnet_servable"
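
To see where the number 50 comes from, the sketch below counts the weighted layers for each supported size, using the block counts listed in `imagenet_resnet_v2` later in this notebook. It assumes the usual counting convention: one initial convolution, two or three convolutions per block, and one final dense layer, with projection shortcuts not counted. `BLOCKS_PER_SIZE` and `count_layers` are hypothetical helper names, not part of the original code.

```python
# Blocks per block layer, copied from the model_params table in imagenet_resnet_v2.
# The second entry is the number of convolutions in each block:
# 2 for building_block, 3 for bottleneck_block.
BLOCKS_PER_SIZE = {
    18: ([2, 2, 2, 2], 2),
    34: ([3, 4, 6, 3], 2),
    50: ([3, 4, 6, 3], 3),
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
    200: ([3, 24, 36, 3], 3),
}


def count_layers(resnet_size):
  """Counts the initial conv, all block convs, and the final dense layer."""
  blocks, convs_per_block = BLOCKS_PER_SIZE[resnet_size]
  return 1 + sum(blocks) * convs_per_block + 1


for size in sorted(BLOCKS_PER_SIZE):
  print(size, count_layers(size))
```

Under this convention each size matches its own layer count, so `RESNET_SIZE = 50` selects the bottleneck variant with 3 + 4 + 6 + 3 = 16 blocks.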

In [0]:
import urllib.request

urllib.request.urlretrieve("http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz", "resnet.tar.gz")

In [0]:
# Extract the archive into the directory named by MODEL_DIR
from subprocess import call
call(["mkdir", MODEL_DIR])
call(["tar", "-zxvf", "resnet.tar.gz", "-C", MODEL_DIR])

In [0]:
# Make sure you see model checkpoint files in this directory
os.listdir(MODEL_DIR)
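
The same extraction step can also be sketched with only the Python standard library, which avoids depending on the `mkdir` and `tar` commands being installed. This is an alternative sketch rather than the notebook's own method, and `extract_archive` is a hypothetical helper name.

```python
import os
import tarfile


def extract_archive(archive_path, target_dir):
  """Extracts a .tar.gz archive into target_dir and returns the member names."""
  # exist_ok=True makes re-running the cell harmless if the directory exists.
  os.makedirs(target_dir, exist_ok=True)
  with tarfile.open(archive_path, "r:gz") as archive:
    names = archive.getnames()
    archive.extractall(target_dir)
  return names


# Example usage, assuming resnet.tar.gz was downloaded by the cell above:
# extract_archive("resnet.tar.gz", "resnet_model_checkpoints")
```

Only extract archives from sources you trust, since `extractall` writes whatever paths the archive contains.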

# Constants and Functions
 
In order to reconstruct the Resnet neural network used to train the Imagenet model, we need to load the architecture pieces. If you were working in the original repository, you could simply import the Python modules (e.g. `import resnet`) to access the constants and functions in https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/imagenet_main.py and https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/resnet.py. However, since we are building the servable model in a notebook for better step-by-step instruction, we need to copy over the constants and functions used by the model.

In [0]:
# TODO: Add constants defining elements of the network here.
# Hint 1: See the top of https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/imagenet_main.py
# Hint 2: You do not need to copy all constants, such as those pertaining to training and validation.


# TODO: Add parameters for batch normalization
# Hint 1: See https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/resnet.py



For convenience, we have copied the functions from the files.
Note that we omitted some functions (cifar-related functions) as these are not relevant to imagenet.

In [0]:
def batch_norm_relu(inputs, is_training, data_format):
  """Performs a batch normalization followed by a ReLU."""
  # We set fused=True for a significant performance boost. See
  # https://www.tensorflow.org/performance/performance_guide#common_fused_ops
  inputs = tf.layers.batch_normalization(
      inputs=inputs, axis=1 if data_format == 'channels_first' else 3,
      momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON, center=True,
      scale=True, training=is_training, fused=True)
  inputs = tf.nn.relu(inputs)
  return inputs


def fixed_padding(inputs, kernel_size, data_format):
  """Pads the input along the spatial dimensions independently of input size.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
                 Should be a positive integer.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    A tensor with the same format as the input with the data either intact
    (if kernel_size == 1) or padded (if kernel_size > 1).
  """
  pad_total = kernel_size - 1
  pad_beg = pad_total // 2
  pad_end = pad_total - pad_beg

  if data_format == 'channels_first':
    padded_inputs = tf.pad(inputs, [[0, 0], [0, 0],
                                    [pad_beg, pad_end], [pad_beg, pad_end]])
  else:
    padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                                    [pad_beg, pad_end], [0, 0]])
  return padded_inputs


def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
  """Strided 2-D convolution with explicit padding."""
  # The padding is consistent and is based only on `kernel_size`, not on the
  # dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone).
  if strides > 1:
    inputs = fixed_padding(inputs, kernel_size, data_format)

  return tf.layers.conv2d(
      inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,
      padding=('SAME' if strides == 1 else 'VALID'), use_bias=False,
      kernel_initializer=tf.variance_scaling_initializer(),
      data_format=data_format)


def building_block(inputs, filters, is_training, projection_shortcut, strides,
                   data_format):
  """Standard building block for residual networks with BN before convolutions.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the convolutions.
    is_training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts (typically
      a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    The output tensor of the block.
  """
  shortcut = inputs
  inputs = batch_norm_relu(inputs, is_training, data_format)

  # The projection shortcut should come after the first batch norm and ReLU
  # since it performs a 1x1 convolution.
  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)

  inputs = batch_norm_relu(inputs, is_training, data_format)
  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=1,
      data_format=data_format)

  return inputs + shortcut


def bottleneck_block(inputs, filters, is_training, projection_shortcut,
                     strides, data_format):
  """Bottleneck block variant for residual networks with BN before convolutions.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the first two convolutions. Note that the
      third and final convolution will use 4 times as many filters.
    is_training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts (typically
      a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    The output tensor of the block.
  """
  shortcut = inputs
  inputs = batch_norm_relu(inputs, is_training, data_format)

  # The projection shortcut should come after the first batch norm and ReLU
  # since it performs a 1x1 convolution.
  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=1, strides=1,
      data_format=data_format)

  inputs = batch_norm_relu(inputs, is_training, data_format)
  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)

  inputs = batch_norm_relu(inputs, is_training, data_format)
  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
      data_format=data_format)

  return inputs + shortcut


def block_layer(inputs, filters, block_fn, blocks, strides, is_training, name,
                data_format):
  """Creates one layer of blocks for the ResNet model.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the first convolution of the layer.
    block_fn: The block to use within the model, either `building_block` or
      `bottleneck_block`.
    blocks: The number of blocks contained in the layer.
    strides: The stride to use for the first convolution of the layer. If
      greater than 1, this layer will ultimately downsample the input.
    is_training: Either True or False, whether we are currently training the
      model. Needed for batch norm.
    name: A string name for the tensor output of the block layer.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    The output tensor of the block layer.
  """
  # Bottleneck blocks end with 4x the number of filters as they start with
  filters_out = 4 * filters if block_fn is bottleneck_block else filters

  def projection_shortcut(inputs):
    return conv2d_fixed_padding(
        inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
        data_format=data_format)

  # Only the first block per block_layer uses projection_shortcut and strides
  inputs = block_fn(inputs, filters, is_training, projection_shortcut, strides,
                    data_format)

  for _ in range(1, blocks):
    inputs = block_fn(inputs, filters, is_training, None, 1, data_format)

  return tf.identity(inputs, name)


def imagenet_resnet_v2_generator(block_fn, layers, num_classes,
                                 data_format=None):
  """Generator for ImageNet ResNet v2 models.
  Args:
    block_fn: The block to use within the model, either `building_block` or
      `bottleneck_block`.
    layers: A length-4 array denoting the number of blocks to include in each
      layer. Each layer consists of blocks that take inputs of the same size.
    num_classes: The number of possible classes for image classification.
    data_format: The input format ('channels_last', 'channels_first', or None).
      If set to None, the format is dependent on whether a GPU is available.
  Returns:
    The model function that takes in `inputs` and `is_training` and
    returns the output tensor of the ResNet model.
  """
  if data_format is None:
    data_format = (
        'channels_first' if tf.test.is_built_with_cuda() else 'channels_last')

  def model(inputs, is_training):
    """Constructs the ResNet model given the inputs."""
    if data_format == 'channels_first':
      # Convert the inputs from channels_last (NHWC) to channels_first (NCHW).
      # This provides a large performance boost on GPU. See
      # https://www.tensorflow.org/performance/performance_guide#data_formats
      inputs = tf.transpose(inputs, [0, 3, 1, 2])

    inputs = conv2d_fixed_padding(
        inputs=inputs, filters=64, kernel_size=7, strides=2,
        data_format=data_format)
    inputs = tf.identity(inputs, 'initial_conv')
    inputs = tf.layers.max_pooling2d(
        inputs=inputs, pool_size=3, strides=2, padding='SAME',
        data_format=data_format)
    inputs = tf.identity(inputs, 'initial_max_pool')

    inputs = block_layer(
        inputs=inputs, filters=64, block_fn=block_fn, blocks=layers[0],
        strides=1, is_training=is_training, name='block_layer1',
        data_format=data_format)
    inputs = block_layer(
        inputs=inputs, filters=128, block_fn=block_fn, blocks=layers[1],
        strides=2, is_training=is_training, name='block_layer2',
        data_format=data_format)
    inputs = block_layer(
        inputs=inputs, filters=256, block_fn=block_fn, blocks=layers[2],
        strides=2, is_training=is_training, name='block_layer3',
        data_format=data_format)
    inputs = block_layer(
        inputs=inputs, filters=512, block_fn=block_fn, blocks=layers[3],
        strides=2, is_training=is_training, name='block_layer4',
        data_format=data_format)

    inputs = batch_norm_relu(inputs, is_training, data_format)
    inputs = tf.layers.average_pooling2d(
        inputs=inputs, pool_size=7, strides=1, padding='VALID',
        data_format=data_format)
    inputs = tf.identity(inputs, 'final_avg_pool')
    inputs = tf.reshape(inputs,
                        [-1, 512 if block_fn is building_block else 2048])
    inputs = tf.layers.dense(inputs=inputs, units=num_classes)
    inputs = tf.identity(inputs, 'final_dense')
    return inputs

  return model


def imagenet_resnet_v2(resnet_size, num_classes, data_format=None):
  """Returns the ResNet model for a given size and number of output classes."""
  model_params = {
      18: {'block': building_block, 'layers': [2, 2, 2, 2]},
      34: {'block': building_block, 'layers': [3, 4, 6, 3]},
      50: {'block': bottleneck_block, 'layers': [3, 4, 6, 3]},
      101: {'block': bottleneck_block, 'layers': [3, 4, 23, 3]},
      152: {'block': bottleneck_block, 'layers': [3, 8, 36, 3]},
      200: {'block': bottleneck_block, 'layers': [3, 24, 36, 3]}
  }

  if resnet_size not in model_params:
    raise ValueError('Not a valid resnet_size:', resnet_size)

  params = model_params[resnet_size]
  return imagenet_resnet_v2_generator(
      params['block'], params['layers'], num_classes, data_format)
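
To make the padding and stride arithmetic in the functions above concrete, here is a small pure-Python sketch that traces the spatial size of a square 224x224 input through the named stages of the model. It mirrors the formulas in `fixed_padding` and `conv2d_fixed_padding`; the helper names `fixed_padding_amounts`, `conv_output_size`, and `trace_sizes` are hypothetical and exist only for this illustration.

```python
import math


def fixed_padding_amounts(kernel_size):
  """Returns (pad_beg, pad_end) exactly as computed in fixed_padding."""
  pad_total = kernel_size - 1
  pad_beg = pad_total // 2
  return pad_beg, pad_total - pad_beg


def conv_output_size(size, kernel_size, strides):
  """Spatial output size of conv2d_fixed_padding for a square input."""
  if strides == 1:
    return size  # 'SAME' padding with stride 1 keeps the size.
  pad_beg, pad_end = fixed_padding_amounts(kernel_size)
  padded = size + pad_beg + pad_end
  return (padded - kernel_size) // strides + 1  # 'VALID' convolution.


def trace_sizes(size=224):
  """Returns the spatial size after each named stage of the ImageNet model."""
  sizes = {}
  size = conv_output_size(size, kernel_size=7, strides=2)
  sizes['initial_conv'] = size
  size = math.ceil(size / 2)  # max_pooling2d, stride 2, 'SAME' padding.
  sizes['initial_max_pool'] = size
  for index, strides in enumerate([1, 2, 2, 2], start=1):
    # Only the first block of each layer is strided; its 3x3 conv sets the size.
    size = conv_output_size(size, kernel_size=3, strides=strides)
    sizes['block_layer%d' % index] = size
  size = (size - 7) // 1 + 1  # average_pooling2d, pool_size 7, 'VALID'.
  sizes['final_avg_pool'] = size
  return sizes


print(fixed_padding_amounts(7))
print(trace_sizes())
```

For a 224x224 input the trace reaches 7x7 after `block_layer4`, which is why the final average pool uses `pool_size=7` and the result can be reshaped to a flat vector of 512 or 2048 features per image.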

# Preprocessing JPEG images into 3D Tensors

In order to reduce network usage, the client that we've provided (resnet_client.py) will resize the images appropriately (usually reducing the size) to 224x224x3, and then use jpeg encoding to create a string.

In this section, we will implement a function that will read in a tensor of JPEG-encoded images and
predict their classes and probabilities. The Resnet model requires a 4d tensor of dimensions (x, 224, 224, 3),
where x is variable length, and the others correspond to height, width, and channels. Consequently,
we will need a utility function to preprocess each image into a 3d tensor of size (224, 224, 3),
and concatenate them together into a 4d tensor.

Below is a preprocessing function that decodes a JPEG into a 3d tensor, normalizes the pixel values, and then center-crops or zero-pads the image to the target size. Note that the last step is unnecessary given that our client already resizes the images appropriately, but we keep it here for completeness.

In [0]:
def preprocess_image(encoded_image, height=_DEFAULT_IMAGE_SIZE, width=_DEFAULT_IMAGE_SIZE):
  """Decodes a JPEG image, rescales its pixels, and crops or pads it to size.
  Args:
    encoded_image: A jpeg-formatted byte stream represented as a string.
    height: The target image height in pixels.
    width: The target image width in pixels.
  Returns:
    A 3d tensor of image pixels normalized to be between -0.5 and 0.5, cropped or padded to height x width x 3.
    The normalization is an approximation of the preprocess_for_train and preprocess_for_eval functions in
    https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/vgg_preprocessing.py.
  """
  image = tf.image.decode_jpeg(encoded_image, channels=3)
  image = tf.to_float(image) / 255.0 - 0.5
  image = tf.image.resize_image_with_crop_or_pad(image, height, width)
  return image
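
The arithmetic performed by `preprocess_image` can be illustrated without Tensorflow on a tiny grayscale "image" stored as nested lists. This is a sketch under stated assumptions: the helpers `normalize_pixel`, `crop_or_pad_1d`, and `crop_or_pad_2d` are hypothetical, and the rounding of odd crop or pad amounts is an assumption about how `tf.image.resize_image_with_crop_or_pad` centers the image.

```python
def normalize_pixel(value):
  """Maps an 8-bit pixel value in [0, 255] to the range [-0.5, 0.5]."""
  return value / 255.0 - 0.5


def crop_or_pad_1d(values, target, pad_value=0.0):
  """Centrally crops or pads a list to the target length."""
  size = len(values)
  if size >= target:
    start = (size - target) // 2
    return values[start:start + target]
  before = (target - size) // 2
  after = target - size - before
  return [pad_value] * before + values + [pad_value] * after


def crop_or_pad_2d(rows, height, width):
  """Applies crop_or_pad_1d along the width, then along the height."""
  rows = [crop_or_pad_1d(row, width) for row in rows]
  padded = crop_or_pad_1d(rows, height, pad_value=None)
  # Replace the None placeholders with fresh rows of zeros.
  return [row if row is not None else [0.0] * width for row in padded]


image = [[0, 255, 128], [64, 32, 16]]  # A tiny 2x3 grayscale "image".
normalized = [[normalize_pixel(v) for v in row] for row in image]
result = crop_or_pad_2d(normalized, height=4, width=2)
print(result)
```

Because normalization runs before the crop-or-pad step, any zero padding corresponds to mid-gray in the original pixel scale rather than black.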

# Resnet Model Function

The Tensorflow Estimator API requires a model function that returns an EstimatorSpec object describing what should be done when a training, evaluation, or prediction step is called. For serving, you only care about the prediction mode, so you can shortcut parts of the graph used for training and evaluation.

TODO: Start from resnet_model_fn() in https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/imagenet_main.py. That function takes a `features` argument that is a 4D tensor. Instead, we want `features` to be a dictionary containing one key, `'images'`, which stores a tensor of JPEG-encoded strings.

TODO: Complete the exercises below to modify and simplify resnet_model_fn() appropriately. Notice that some of the shortcuts are already commented out.

In [0]:
TOP_K = 5

def resnet_model_fn(features, labels, mode):
  """Our model_fn for ResNet to be used with our Estimator."""
  # TODO: Remove the summary as this is used during training/evaluation.
  tf.summary.image('images', features, max_outputs=6)

  # NOTE: New ops to convert tensor of jpegs to a 4d tensor.
  images = features['images']  # A tensor of tf.strings
  processed_images = tf.map_fn(preprocess_image, images, dtype=tf.float32)  # Preprocess each image; map_fn stacks the results into one tensor
  processed_images = tf.stack(processed_images)  # Kept from the original; the input is already a single tensor
  processed_images = tf.reshape(tensor=processed_images,  # Reshape to ensure TF graph knows the final dimensions
                                shape=[-1, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, 3])

  # TODO: Modify this line so this function works! The network must be IDENTICAL to the one used to train.
  network = imagenet_resnet_v2(params['resnet_size'], params['data_format'])

  # NOTE: is_training will be false since we are predicting.
  logits = network(
      inputs=processed_images, is_training=(mode == tf.estimator.ModeKeys.TRAIN))

  # NOTE: Instead of the top 1 result, we can now return top k to the client!
  top_k_logits, top_k_classes = tf.nn.top_k(logits, k=TOP_K)
  top_k_probs = tf.nn.softmax(top_k_logits)
  predictions = {
      'classes': top_k_classes,
      'probabilities': top_k_probs
  }

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,  # This will not be used in serving, but must be provided anyway.
        # TODO: To export the predictions dictionary above using Tf serving,
        # you will need to assign the export_outputs parameter in EstimatorSpec.
        # Add a dictionary with entry corresponding to the request.model_spec.signature_name that
        # your client will call in `client/resnet_client.py`
        export_outputs=  # TODO: Add entry here.
    )

  # TODO: Shortcut everything below here by returning a minimal EstimatorSpec.
    
  # Calculate loss, which includes softmax cross entropy and L2 regularization.
  cross_entropy = tf.losses.softmax_cross_entropy(
      logits=logits, onehot_labels=labels)

  # Create a tensor named cross_entropy for logging purposes.
  tf.identity(cross_entropy, name='cross_entropy')
  tf.summary.scalar('cross_entropy', cross_entropy)

  # Add weight decay to the loss. We exclude the batch norm variables because
  # doing so leads to a small improvement in accuracy.
  loss = cross_entropy + _WEIGHT_DECAY * tf.add_n(
      [tf.nn.l2_loss(v) for v in tf.trainable_variables()
       if 'batch_normalization' not in v.name])

  if mode == tf.estimator.ModeKeys.TRAIN:
    # Scale the learning rate linearly with the batch size. When the batch size
    # is 256, the learning rate should be 0.1.
    initial_learning_rate = 0.1 * params['batch_size'] / 256
    batches_per_epoch = _NUM_IMAGES['train'] / params['batch_size']
    global_step = tf.train.get_or_create_global_step()

    # Multiply the learning rate by 0.1 at 30, 60, 80, and 90 epochs.
    boundaries = [
        int(batches_per_epoch * epoch) for epoch in [30, 60, 80, 90]]
    values = [
        initial_learning_rate * decay for decay in [1, 0.1, 0.01, 1e-3, 1e-4]]
    learning_rate = tf.train.piecewise_constant(
        tf.cast(global_step, tf.int32), boundaries, values)

    # Create a tensor named learning_rate for logging purposes.
    tf.identity(learning_rate, name='learning_rate')
    tf.summary.scalar('learning_rate', learning_rate)

    optimizer = tf.train.MomentumOptimizer(
        learning_rate=learning_rate,
        momentum=_MOMENTUM)

    # Batch norm requires update_ops to be added as a train_op dependency.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
      train_op = optimizer.minimize(loss, global_step)
  else:
    train_op = None

  accuracy = tf.metrics.accuracy(
      tf.argmax(labels, axis=1), predictions['classes'])
  metrics = {'accuracy': accuracy}

  # Create a tensor named train_accuracy for logging purposes.
  tf.identity(accuracy[1], name='train_accuracy')
  tf.summary.scalar('train_accuracy', accuracy[1])

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=loss,
      train_op=train_op,
      eval_metric_ops=metrics)
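
One detail of the prediction branch is worth illustrating: the model function applies softmax to the top-k logits only, so the returned probabilities sum to 1 across the k returned classes rather than being the probabilities over all classes. The pure-Python sketch below compares the two on made-up logits; `softmax` and `top_k` here are hypothetical stand-ins for `tf.nn.softmax` and `tf.nn.top_k`.

```python
import math


def softmax(logits):
  """Numerically stable softmax over a list of floats."""
  peak = max(logits)
  exps = [math.exp(x - peak) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def top_k(logits, k):
  """Returns (values, indices) of the k largest logits, largest first."""
  order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
  return [logits[i] for i in order], order


logits = [2.0, 1.0, 0.5, 0.0, -1.0, -2.0]  # Made-up values for illustration.
values, classes = top_k(logits, k=3)
renormalized = softmax(values)  # Softmax over the top-k logits only.
all_probs = softmax(logits)
full = [all_probs[i] for i in classes]  # Softmax over all classes.
print(classes, renormalized, full)
```

The renormalized values are always at least as large as the full-softmax values, so a client should read them as relative confidence among the returned classes.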

In [0]:
# Load this model into our estimator
estimator = tf.estimator.Estimator(
  model_fn=resnet_model_fn,  # The model function defined above
  model_dir=MODEL_DIR,  # Where to look for model checkpoints
)

# Serving input receiver function

Tensorflow Serving requires a serving input receiver function, which determines the format of the input passed from the client to the model function. The servable model will expect a protobuf from the client containing an 'images' field that holds a list of JPEG-encoded strings.

In [0]:
def serving_input_receiver_fn():
  return tf.estimator.export.build_raw_serving_input_receiver_fn(
      {'images': tf.placeholder(dtype=tf.string, shape=[None])})()

In [0]:
estimator.export_savedmodel(export_dir_base=SERVING_DIR,
                            serving_input_receiver_fn=serving_input_receiver_fn)
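
After the export runs, the servable is written to a timestamped subdirectory of `SERVING_DIR`, and each further export adds another numbered version alongside it. The following sketch finds the most recent one using only the standard library; `latest_export_dir` is a hypothetical helper name, and the assumption that version directories have purely numeric names is stated in the code.

```python
import os


def latest_export_dir(export_dir_base):
  """Returns the numerically largest version subdirectory, or None."""
  # Assumption: exported versions are directories with purely numeric names;
  # anything else (for example temporary directories) is ignored.
  versions = [
      name for name in os.listdir(export_dir_base)
      if name.isdigit() and os.path.isdir(os.path.join(export_dir_base, name))
  ]
  if not versions:
    return None
  return os.path.join(export_dir_base, max(versions, key=int))


# Example usage, assuming the export cell above has been run:
# print(latest_export_dir("resnet_servable"))
```

Comparing names with `key=int` rather than as strings keeps the ordering correct even if the version numbers differ in length.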