[View in Colaboratory](https://colab.research.google.com/github/mdasadul/MNIST/blob/master/tensorflow_serve_rnn_prod.ipynb)

I am interested in doing the following:


*   Train an RNN model
*   Save the model
*   Export the model
*   Serve it in production using TensorFlow



Let's load the necessary libraries.

In [0]:
from __future__ import print_function
import os
import numpy as np
import shutil

import tensorflow as tf
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data


We download the data into a temporary location and convert each label in the dataset into a one-hot vector. To classify images using a recurrent neural network, we treat every image row as one step in a sequence of pixels. Because an MNIST image is 28*28 px, each sample is a sequence of 28 timesteps with 28 inputs per step.


In [49]:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)


Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
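
To make the two data transformations concrete, here is a minimal NumPy sketch of one-hot encoding and of reshaping flat images into row sequences. The `one_hot` helper and the fake `flat_images` array are illustrative assumptions, not part of the MNIST loader used above.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a 1 at each label index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Stand-in for a batch of two flattened MNIST images (784 pixels each).
flat_images = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# Each image becomes 28 timesteps (rows) of 28 inputs (pixels in that row).
sequences = flat_images.reshape(-1, 28, 28)

print(one_hot([3, 7]))
print(sequences.shape)
```

After the reshape, `sequences[i, t]` is row `t` of image `i`, which is what the RNN will see at timestep `t`.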


We set up the training parameters and the network parameters. We keep the learning rate low and the number of training steps small so that the model is less likely to overfit the training data.

In [0]:
learning_rate = 0.001
training_steps = 1000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

It is very common in TensorFlow to use placeholders and a feed_dict to feed data during training and inference. We name the input placeholder "Input_X" so that we can refer to it later when we save the graph and when we run inference.

In [0]:
# Name of the op that we will use as the output layer of the exported graph
output_layer = ['output_layer/add']

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input],name ='Input_X')
Y = tf.placeholder("float", [None, num_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))}


We define our simple RNN model here. We start by reshaping the data to match the requirements of the `rnn` function. Then we create an LSTM cell with BasicLSTMCell and apply it to the prepared input. We create the cell inside a variable scope with reuse=tf.AUTO_REUSE so that its variables are reused if the function is called again. Finally we calculate the logits by applying a linear activation to the last output. In the inference graph we can use this linear layer as the output layer and apply softmax at inference time. To use this layer as the output, we put it into a name scope and give the logit operation a name.

In [0]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell 
    with tf.variable_scope("rnn", reuse=tf.AUTO_REUSE):
      lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
      
      outputs, _ = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    with tf.name_scope('output_layer'):
        logit = tf.add(tf.matmul(outputs[-1], weights['out']) , biases['out'],name ='add')
    return logit
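
The shape handling in `RNN` can be sketched in plain NumPy. This is only an illustration under stated assumptions: the list comprehension mimics what `tf.unstack(x, timesteps, 1)` produces, and the per-step `outputs` are random placeholders rather than real LSTM outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, timesteps, num_input, num_hidden, num_classes = 4, 28, 28, 128, 10

x = rng.standard_normal((batch_size, timesteps, num_input))

# Same result shape as unstacking along axis 1:
# a list of `timesteps` arrays, each of shape (batch_size, num_input).
steps = [x[:, t, :] for t in range(timesteps)]

# Placeholder for the per-step LSTM outputs; a real cell would compute these.
outputs = [rng.standard_normal((batch_size, num_hidden)) for _ in steps]

# Linear layer applied to the last step only, as in the model above.
W = rng.standard_normal((num_hidden, num_classes))
b = rng.standard_normal(num_classes)
logits = outputs[-1] @ W + b

print(len(steps), steps[0].shape, logits.shape)
```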

After getting the logits we apply a softmax operation for prediction. As an alternative to the linear activation layer, we can use the softmax layer as the output layer of the model. In order to use it as an output layer we give it a name for future use.

In [0]:
logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits,name='y_')
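
Either layer can serve as the model output because softmax does not change which class has the highest score. A small NumPy sketch (the `softmax` helper and the example logits are made up for illustration):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, -1.0]])
probs = softmax(logits)

print(probs.sum(axis=-1))     # each row sums to 1
print(logits.argmax(axis=-1))  # predicted class from raw logits
print(probs.argmax(axis=-1))   # same predicted class after softmax
```

So exporting the logits and taking the argmax (or applying softmax on the client) gives the same predicted digit as exporting the softmax output.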


The next step is very typical: we use a loss function, in our case softmax_cross_entropy_with_logits_v2, to compare the predicted output with the actual label, and an optimizer (Adam, SGD, RMSProp, etc.) to minimize the loss. Here we use plain gradient descent. We also calculate the correct predictions and, finally, the accuracy.

In [0]:
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
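
For readers who want to see what these two quantities compute, here is a rough NumPy equivalent. It is a sketch for intuition, not TensorFlow's implementation; the function names and the toy arrays are assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

def accuracy(logits, labels):
    """Fraction of rows where the arg-max class matches the one-hot label."""
    return float((logits.argmax(axis=1) == labels.argmax(axis=1)).mean())

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 2.0, 0.0]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]], dtype=np.float64)

print(softmax_cross_entropy(logits, labels))
print(accuracy(logits, labels))  # two of the three rows are classified correctly
```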

This is one of the important steps, where we create the op that initializes the variables, including weights and biases. We also get the input and output tensors from the graph by name. One important detail: we named the input "Input_X" and the output layer "output_layer/add", but we add ":0" at the end of each name. The bare names refer to operations in the graph; to get a tensor we append ":0", which selects the first output of that operation.

In [0]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
output_tensor = tf.get_default_graph().get_tensor_by_name("output_layer/add:0")
input_tensor = tf.get_default_graph().get_tensor_by_name("Input_X:0")
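
The naming convention can be illustrated with a tiny helper. `split_tensor_name` is a hypothetical function written for this explanation; TensorFlow does its own parsing internally.

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:index' into (op_name, output_index).

    Hypothetical helper for illustration only.
    """
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError("expected '<op_name>:<output_index>', got %r" % tensor_name)
    return op_name, int(index)

print(split_tensor_name("Input_X:0"))
print(split_tensor_name("output_layer/add:0"))

# A bare operation name has no output index, so it is not a tensor name.
try:
    split_tensor_name("output_layer/add")
except ValueError as err:
    print("not a tensor name:", err)
```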



Finally we define a Saver to save the model as a checkpoint. We can load these checkpoints for retraining and for inference. However, the resulting graph has many operations that are not necessary for inference, and the saved model is quite large; as a result, inference will be roughly 20-30% slower. We defined the input and output tensors above, and we will use them once training is done to export a frozen model for inference.

We start training by creating a new session. After running the variables initializer, we run the training loop for the predefined number of steps. We iterate through the training data, feed it to the model batch by batch, and run the optimizer to minimize the loss defined above. Every display_step steps we print the training loss and accuracy to make sure that the loss is going down and the accuracy is going up. After the training loop finishes we save the model so that we can use it for future retraining.

In [0]:
saver = tf.train.Saver()

# start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    #for op in tf.get_default_graph().get_operations():
    #    if output_layer[0] in op.name:
    #            print(op.name)
    print("Optimization Finished!")
    saver.save(sess,'./model.ckpt')

Once training is done we select a subset of the test data, run prediction on it, and make sure that the test accuracy is not far from the training accuracy.

In [0]:
    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing prediction:", \
        sess.run(prediction, feed_dict={X: test_data}))

    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
    

Finally we export the trained model for inference. We use SavedModelBuilder for exporting.
SavedModelBuilder creates the export directory and fails if it already exists. It is possible to add a model version to the path here, but for this demo we simply remove any previously exported model before saving another one. __tensor_info_x__ and __tensor_info_y__ are TensorInfo protocol buffers built with the SavedModel API.

In [0]:

    # Export the model for prediction
    export_path = './exportmodel'
    # Remove any previously exported model; ignore_errors avoids a failure on the first run
    shutil.rmtree(export_path, ignore_errors=True)
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)

    tensor_info_x = tf.saved_model.utils.build_tensor_info(input_tensor)
    tensor_info_y = tf.saved_model.utils.build_tensor_info(output_tensor)
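
As an alternative to deleting the old export, each export could go into its own numbered subdirectory, which is the layout TensorFlow Serving expects under a model base path. The sketch below is one possible way to pick the next version; `next_version_dir` is a hypothetical helper, and the temporary directory only stands in for a real export location.

```python
import os
import tempfile

def next_version_dir(base_dir):
    """Return base_dir/<n>, where n is one more than the highest numeric subdirectory.

    The returned path is NOT created, because the builder wants to create it itself.
    """
    os.makedirs(base_dir, exist_ok=True)
    versions = [int(name) for name in os.listdir(base_dir)
                if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))]
    return os.path.join(base_dir, str(max(versions, default=0) + 1))

base = tempfile.mkdtemp()          # stand-in for an export base directory
first = next_version_dir(base)
os.makedirs(first)                 # simulate a finished export
second = next_version_dir(base)

print(os.path.basename(first), os.path.basename(second))
```

The returned path would then be passed to SavedModelBuilder in place of a fixed `export_path`.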


We define the signature that is used for prediction. The signature definition is built from key/value pairs. We name the input key "x_input", which maps to the protocol buffer for Input_X, and the output key "y_output", which maps to the protocol buffer for the logit tensor.




In [0]:
    prediction_signature = (
        tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'x_input': tensor_info_x},
            outputs={'y_output': tensor_info_y},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

Finally we add the meta graph and the trained variables to the builder by using SavedModelBuilder.add_meta_graph_and_variables() with the following arguments:


*   sess: The TensorFlow session that holds the trained model.
*   tags: The set of tags with which to save the meta graph. In this case, since we intend to use the graph in serving, we use the serve tag from the predefined SavedModel tag constants.
*   signature_def_map: The map from a user-supplied signature key to a tensorflow::SignatureDef to add to the meta graph. The signature specifies what type of model is being exported, and the input/output tensors to bind to when running inference.



In [0]:
 
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                prediction_signature
        })
    builder.save()


Step 1, Minibatch Loss= 2.3642, Training Accuracy= 0.102
Step 200, Minibatch Loss= 2.0342, Training Accuracy= 0.383
Step 400, Minibatch Loss= 1.8062, Training Accuracy= 0.469
Step 600, Minibatch Loss= 1.7246, Training Accuracy= 0.469
Step 800, Minibatch Loss= 1.6795, Training Accuracy= 0.500
Step 1000, Minibatch Loss= 1.6217, Training Accuracy= 0.500
output_layer/add
gradients/output_layer/add_grad/Shape
gradients/output_layer/add_grad/Shape_1
gradients/output_layer/add_grad/BroadcastGradientArgs
gradients/output_layer/add_grad/Sum
gradients/output_layer/add_grad/Reshape
gradients/output_layer/add_grad/Sum_1
gradients/output_layer/add_grad/Reshape_1
gradients/output_layer/add_grad/tuple/group_deps
gradients/output_layer/add_grad/tuple/control_dependency
gradients/output_layer/add_grad/tuple/control_dependency_1
Optimization Finished!
Testing prediction: [[0.00468715 0.0743486  0.00581111 ... 0.5039283  0.06872662 0.12138908]
 [0.08121999 0.01353763 0.29797387 ... 0.01006361 0.18041241 

We exported the model to the export directory and we are ready to do inference. Inference can be done in a few different ways: we can use the standard Python API, or we can use gRPC along with the TensorFlow Serving API. We design our inference code so that it can be used with both techniques with little modification.

We start by creating a session. First we get the default signature definition key. Then we set the keys that we used for the input and output of the model when exporting. Next we load the model as a meta graph from the export directory by using the TensorFlow SavedModel API. After that we extract the signature so that we can look up the input and output tensors by name. Finally we get the tensors from the session graph by their names and assign the input of the graph to x and the logit to y. Now we are ready for inference.



In [0]:
sess = tf.Session()
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'x_input'
output_key = 'y_output'

export_path = './exportmodel'
meta_graph_def = tf.saved_model.loader.load(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    export_path)
signature = meta_graph_def.signature_def

x_tensor_name = signature[signature_key].inputs[input_key].name
y_tensor_name = signature[signature_key].outputs[output_key].name

x = sess.graph.get_tensor_by_name(x_tensor_name)
y = sess.graph.get_tensor_by_name(y_tensor_name)

Now we can get some test data and send it to the model one example at a time. I am showing inference for one particular example; if you need to run more than one, you can do so with a loop.

In [0]:


# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Reshape the first test image into 28 timesteps of 28 elements
batch_x = mnist.test.images[:1].reshape(-1, 28, 28)
batch_y = mnist.test.labels[:1][0]
prediction = sess.run([y], feed_dict={x: batch_x})
print(np.argmax(batch_y))
print(np.argmax(prediction))
# Both values are plain NumPy integers, so no graph op is needed to compare them
correct_pred = np.argmax(prediction) == np.argmax(batch_y)
print(correct_pred)
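
The loop over several examples, with softmax applied to the returned logits, could look roughly like the sketch below. To keep it runnable without the exported model, `predict_logits` is a hypothetical stand-in for `sess.run(y, feed_dict={x: batch})`, and the random binary `images` stand in for MNIST test images.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def predict_logits(batch):
    """Hypothetical stand-in for sess.run(y, feed_dict={x: batch}).

    Returns fake logits whose largest entry is at (pixel sum mod 10).
    """
    classes = batch.reshape(len(batch), -1).sum(axis=1).astype(int) % 10
    logits = np.zeros((len(batch), 10))
    logits[np.arange(len(batch)), classes] = 5.0
    return logits

rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(5, 28, 28)).astype(np.float32)

predicted = []
for image in images:
    # The model expects a batch dimension, so send a batch of one.
    probs = softmax(predict_logits(image[np.newaxis]))[0]
    predicted.append(int(probs.argmax()))

print(predicted)
```

With the real session, only the body of `predict_logits` would change; the loop and the softmax post-processing stay the same.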