# 4. Model Translation and Deployment: Serve an ONNX Model via TensorFlow Serving

Also see the [official TensorFlow Serving overview](https://www.tensorflow.org/serving/overview)

In this section, I will show how to convert our exported ONNX model into a TensorFlow SavedModel and serve it with TensorFlow Serving, TensorFlow's system for serving models in production. We will run TF Serving inside a Docker container to isolate environments and to make horizontal scaling via Kubernetes easy later on.

Make sure the following are available on your machine (see [these](https://www.tensorflow.org/serving/setup) setup instructions):
* [Docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce)
* [Using TensorFlow Serving via docker](https://www.tensorflow.org/serving/docker)
* The TensorFlow Serving Docker image: `docker pull tensorflow/serving`

![TensorFlow Serving Architecture](img/tf_serving_architecture.svg)

First, let's have a look at the different **components of TF Serving**:
1. **Servables**
    * the underlying objects that clients use to perform computation; a servable may encompass part of a model, a whole model, or many models, and each servable has a version
    * clients can request either the latest or a specific version of a servable
    * more than one version can be loaded concurrently
    * this enables gradual rollout of new models
    * a model is represented as one or more servables
    
2. **Loaders**
    * manage a servable's *lifecycle*
        * load a servable
        * unload a servable
        
3. **Sources**
    * plugin modules
        * find and provide servables
    * *aspired version*: servable version that should be loaded and ready
    
4. **Manager**
    * handles the full lifecycle of servables (loading, serving, unloading)
    * listens to sources and tracks versions
    * returns a handle to a servable on client request
    
5. **Core**
    * manages lifecycle and metrics of servables
    * treats servables as opaque objects
    
    
*How does the model serving process work?*

When running a model server, we tell the manager which source to listen to, e.g. a file system. Saving a trained TensorFlow model to that file system lets the source detect it as a new servable version, which becomes an aspired version. A loader instance is created to load the model and tells the manager how many resources loading will require. The manager handles resources and decides when to let the loader load the model. Depending on the configuration, it may unload an older version of that servable or keep it. After successful loading, the manager can start serving client requests, returning handles to the newest servable version, or to another version if explicitly requested.
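The flow above can be sketched as a toy Python model. Everything here is hypothetical and heavily simplified: TF Serving's real sources, loaders and manager are C++ components with richer policies. The class names, the RAM budget, the fixed 100 MB cost per version and the `num_latest` policy are assumptions for illustration. Note that this sketch unloads versions that are no longer aspired *before* loading new ones (resource-preserving), whereas TF Serving also offers an availability-preserving policy that keeps the old version serving until the new one is ready.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Loader:
    """Hypothetical loader: knows how to load one servable version."""

    def __init__(self, name: str, version: int, ram_mb: int,
                 load_fn: Callable[[], object]):
        self.name, self.version, self.ram_mb = name, version, ram_mb
        self._load_fn = load_fn

    def estimate_resources(self) -> int:
        # Reported to the manager *before* anything is loaded.
        return self.ram_mb

    def load(self) -> object:
        return self._load_fn()


class Manager:
    """Hypothetical manager: reconciles loaded versions with aspired ones."""

    def __init__(self, ram_budget_mb: int):
        self.ram_budget_mb = ram_budget_mb
        self.used_mb = 0
        # servable name -> {version: (servable object, RAM cost)}
        self.loaded: Dict[str, Dict[int, Tuple[object, int]]] = {}

    def on_aspired_versions(self, name: str, loaders: List[Loader]) -> None:
        versions = self.loaded.setdefault(name, {})
        aspired = {loader.version for loader in loaders}
        # Unload versions that are no longer aspired (frees resources first).
        for version in [v for v in versions if v not in aspired]:
            _, cost = versions.pop(version)
            self.used_mb -= cost
        # Load newly aspired versions only if the resource budget allows it.
        for loader in loaders:
            if loader.version in versions:
                continue
            cost = loader.estimate_resources()
            if self.used_mb + cost <= self.ram_budget_mb:
                versions[loader.version] = (loader.load(), cost)
                self.used_mb += cost

    def get_handle(self, name: str, version: Optional[int] = None) -> object:
        # Newest loaded version by default, a specific one on request.
        versions = self.loaded[name]
        return versions[max(versions) if version is None else version][0]


class FileSystemSource:
    """Hypothetical source: keeps the `num_latest` newest versions aspired."""

    def __init__(self, name: str, manager: Manager, num_latest: int = 1):
        self.name, self.manager, self.num_latest = name, manager, num_latest

    def poll(self, version_dirs: List[int]) -> None:
        # In this toy, called manually with the current version directories.
        aspired = sorted(version_dirs)[-self.num_latest:]
        loaders = [Loader(self.name, v, ram_mb=100,
                          load_fn=lambda v=v: f"{self.name} v{v}")
                   for v in aspired]
        self.manager.on_aspired_versions(self.name, loaders)


manager = Manager(ram_budget_mb=250)
source = FileSystemSource("tf_emnist", manager, num_latest=2)
source.poll([1])
source.poll([1, 2])
print(manager.get_handle("tf_emnist"))      # tf_emnist v2
print(manager.get_handle("tf_emnist", 1))   # tf_emnist v1
source.poll([1, 2, 3])                      # v1 is no longer aspired
print(sorted(manager.loaded["tf_emnist"]))  # [2, 3]
```

The split mirrors the roles described above: the source only decides *which* versions should exist, the loader only knows *how* to load one, and the manager alone decides *when* loading is affordable.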

**Enough Theory, let's turn that knowledge into practice!**

In [1]:
import json
import os
import pickle
import shutil

In [2]:
import numpy as np
import requests
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf
from tensorflow.python.saved_model.utils import build_tensor_info
from tensorflow.python.saved_model import signature_def_utils
from tensorflow.python.saved_model import signature_constants

In [3]:
from emnist_dl2prod.utils import eval_serving_performance

### Load ONNX model

In [4]:
onnx_model_path = '../models/dnn_model_pt.onnx'
dnn_model_onnx = onnx.load(onnx_model_path)
dnn_model_tf = prepare(dnn_model_onnx, device='cpu')

In [5]:
export_path = '../models/tf_emnist/'
if os.path.exists(export_path):
    shutil.rmtree(export_path)

model_version = 1
model_name = 'tf_emnist'
model_path = os.path.join('..', 'models', model_name, str(model_version))
builder = tf.saved_model.builder.SavedModelBuilder(model_path)

### What does our ONNX-TF model comprise?

In [6]:
print("External Input: {}".format(dnn_model_tf.predict_net.external_input))
print("External Output: {}".format(dnn_model_tf.predict_net.external_output))
dnn_model_tf.predict_net.tensor_dict

External Input: ['flattened_rescaled_img_28x28']
External Output: ['softmax_probabilities']


{'weight_1': <tf.Tensor 'Const:0' shape=(512, 784) dtype=float32>,
 'bias_1': <tf.Tensor 'Const_1:0' shape=(512,) dtype=float32>,
 'weight_2': <tf.Tensor 'Const_2:0' shape=(256, 512) dtype=float32>,
 'bias_2': <tf.Tensor 'Const_3:0' shape=(256,) dtype=float32>,
 'weight_3': <tf.Tensor 'Const_4:0' shape=(62, 256) dtype=float32>,
 'bias_3': <tf.Tensor 'Const_5:0' shape=(62,) dtype=float32>,
 'flattened_rescaled_img_28x28': <tf.Tensor 'flattened_rescaled_img_28x28:0' shape=(1, 784) dtype=float32>,
 '7': <tf.Tensor 'add:0' shape=(1, 512) dtype=float32>,
 '8': <tf.Tensor 'add_1:0' shape=(1, 512) dtype=float32>,
 '9': <tf.Tensor 'add_2:0' shape=(1, 256) dtype=float32>,
 '10': <tf.Tensor 'add_3:0' shape=(1, 256) dtype=float32>,
 '11': <tf.Tensor 'add_4:0' shape=(1, 62) dtype=float32>,
 'softmax_probabilities': <tf.Tensor 'Softmax:0' shape=(1, 62) dtype=float32>}

## Building Signatures and Adding Them to the Model
(GraphPipe took care of this step for us before)

Also see [SignatureDefs in SavedModel for TensorFlow Serving](https://www.tensorflow.org/serving/signature_defs).

### 1. Obtain TensorInfo Objects for the Model's Input and Output Tensors

In [7]:
input_tensor = dnn_model_tf.predict_net.graph.get_tensor_by_name(
                            'flattened_rescaled_img_28x28:0')
output_tensor = dnn_model_tf.predict_net.graph.get_tensor_by_name('Softmax:0')

input_tensor_info = build_tensor_info(input_tensor)
output_tensor_info = build_tensor_info(output_tensor)

In [8]:
input_tensor_info

name: "flattened_rescaled_img_28x28:0"
dtype: DT_FLOAT
tensor_shape {
  dim {
    size: 1
  }
  dim {
    size: 784
  }
}

In [9]:
output_tensor_info

name: "Softmax:0"
dtype: DT_FLOAT
tensor_shape {
  dim {
    size: 1
  }
  dim {
    size: 62
  }
}

### 2. Build classification signature
As a classification signature alone did not work, we combine it with an additional prediction signature, as described in the [MNIST example](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py).

In [10]:
classification_signature = (
    signature_def_utils.build_signature_def(
        inputs={
            signature_constants.CLASSIFY_INPUTS:
                input_tensor_info
        },
        outputs={
            signature_constants.CLASSIFY_OUTPUT_SCORES:
                output_tensor_info
        },
        method_name=signature_constants.CLASSIFY_METHOD_NAME))

In [11]:
classification_signature

inputs {
  key: "inputs"
  value {
    name: "flattened_rescaled_img_28x28:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 784
      }
    }
  }
}
outputs {
  key: "scores"
  value {
    name: "Softmax:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 62
      }
    }
  }
}
method_name: "tensorflow/serving/classify"

### 3. Build prediction signature

In [12]:
prediction_signature = (
    signature_def_utils.build_signature_def(
        inputs={'images': input_tensor_info},
        outputs={'scores': output_tensor_info},
        method_name=signature_constants.PREDICT_METHOD_NAME
        )
    )

In [13]:
prediction_signature

inputs {
  key: "images"
  value {
    name: "flattened_rescaled_img_28x28:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 784
      }
    }
  }
}
outputs {
  key: "scores"
  value {
    name: "Softmax:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 62
      }
    }
  }
}
method_name: "tensorflow/serving/predict"

### 4. Add Signatures to Model and Export

In [14]:
with dnn_model_tf.predict_net.graph.as_default():
    with tf.Session() as sess:
        builder.add_meta_graph_and_variables(
          sess, [tf.saved_model.tag_constants.SERVING],
          signature_def_map={
              'predict_images':
                  prediction_signature,
              signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                  classification_signature,
          },
          main_op=tf.tables_initializer(),
          strip_default_attrs=True)
        builder.save()
        print("Done exporting!")

INFO:tensorflow:No assets to save.
[2018-10-18 11:15:22] INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
[2018-10-18 11:15:22] INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ../models/tf_emnist/1/saved_model.pb
[2018-10-18 11:15:22] INFO:tensorflow:SavedModel written to: ../models/tf_emnist/1/saved_model.pb
Done exporting!


## Run the Model Server with Docker and Use the REST API (Instead of gRPC)

* Follow [this](https://www.tensorflow.org/serving/docker) guide on using TensorFlow Serving via Docker, which boils down to two steps:


1. Install docker
2. Pull the TensorFlow Serving image: `docker pull tensorflow/serving`

* Stop and remove any existing Docker container named `emnist_model`:

```docker stop emnist_model```

```docker rm emnist_model```

* Start a Docker container that serves our model and publishes the REST API on port 8501:
```
docker run -p 8501:8501 \
--name emnist_model --mount type=bind,source=$(pwd)/../models/tf_emnist,target=/models/tf_emnist \
-e MODEL_NAME=tf_emnist -t tensorflow/serving:1.10.1 &
```

Inside the container, this runs the following command:
```
tensorflow_model_server --port=8500 --rest_api_port=8501 \
  --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
```

Default values of these environment variables in the image:

* MODEL_BASE_PATH: /models
* MODEL_NAME: model
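Since we set `MODEL_NAME=tf_emnist`, the server watches `${MODEL_BASE_PATH}/${MODEL_NAME}` = `/models/tf_emnist`, which is exactly the mount target of our `docker run` command. Inside that directory, TF Serving treats each numeric subdirectory (here `1/`) as a servable version and, under its default version policy, serves the highest one. The sketch below mimics this path and version resolution in plain Python; `resolve_model_dir` and `latest_version` are hypothetical helpers for illustration, not part of TF Serving.

```python
import os
import tempfile
from typing import Mapping, Optional

# Defaults baked into the tensorflow/serving image (overridable via `docker run -e`).
DEFAULTS = {"MODEL_BASE_PATH": "/models", "MODEL_NAME": "model"}


def resolve_model_dir(env: Mapping[str, str]) -> str:
    """Hypothetical helper: the directory the model server watches."""
    base = env.get("MODEL_BASE_PATH", DEFAULTS["MODEL_BASE_PATH"])
    name = env.get("MODEL_NAME", DEFAULTS["MODEL_NAME"])
    return f"{base}/{name}"


def latest_version(model_dir: str) -> Optional[int]:
    """Hypothetical helper: highest numeric subdirectory, others are ignored."""
    versions = [int(d) for d in os.listdir(model_dir)
                if d.isdigit() and os.path.isdir(os.path.join(model_dir, d))]
    return max(versions) if versions else None


print(resolve_model_dir({"MODEL_NAME": "tf_emnist"}))  # /models/tf_emnist
print(resolve_model_dir({}))                           # /models/model

with tempfile.TemporaryDirectory() as model_dir:
    for sub in ("1", "2", "tmp"):
        os.makedirs(os.path.join(model_dir, sub))
    print(latest_version(model_dir))                   # 2
```

This is also why we export into `../models/tf_emnist/1`: bumping `model_version` and exporting into a new numeric directory is enough for the running server to pick up the new version.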

### This leaves us with the following server log, which looks promising:

2018-10-12 13:38:15.518130: I tensorflow_serving/model_servers/server.cc:82] Building single TensorFlow model file config:  model_name: tf_emnist model_base_path: /models/tf_emnist
2018-10-12 13:38:15.518416: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2018-10-12 13:38:15.518455: I tensorflow_serving/model_servers/server_core.cc:517]  (Re-)adding model: tf_emnist
2018-10-12 13:38:15.638251: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: tf_emnist version: 1}
2018-10-12 13:38:15.638370: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: tf_emnist version: 1}
2018-10-12 13:38:15.638411: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: tf_emnist version: 1}
2018-10-12 13:38:15.639975: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /models/tf_emnist/1
2018-10-12 13:38:15.641451: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/tf_emnist/1
2018-10-12 13:38:15.659090: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2018-10-12 13:38:15.660035: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-12 13:38:15.672728: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2018-10-12 13:38:15.673671: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:172] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /models/tf_emnist/1/variables/variables.index
2018-10-12 13:38:15.673710: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key saved_model_main_op on SavedModel bundle.
2018-10-12 13:38:15.677101: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 35653 microseconds.
2018-10-12 13:38:15.678135: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /models/tf_emnist/1/assets.extra/tf_serving_warmup_requests
2018-10-12 13:38:15.684767: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: tf_emnist version: 1}
2018-10-12 13:38:15.686409: I tensorflow_serving/model_servers/server.cc:285] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2018-10-12 13:38:15.686843: I tensorflow_serving/model_servers/server.cc:301] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 235] RAW: Entering the event loop ...

## Test the Deployment

### Send a model status request
```GET http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]```

In [None]:
r = requests.get("http://localhost:8501/v1/models/tf_emnist")
print("HTTP Response Status Code: {}".format(r.status_code))
print(r.json())

Doesn't seem to work yet `¯\_(ツ)_/¯`
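One thing worth ruling out is timing: models load asynchronously, so a status request sent right after `docker run` may arrive before the servable is ready. The status endpoint answers with JSON containing a `model_version_status` list whose entries carry a `version` and a `state` such as `AVAILABLE`. Below is a minimal sketch that polls until some version reports `AVAILABLE`; it assumes your TF Serving version exposes this endpoint, and `is_available`, `wait_until_available`, the timeout and the interval are hypothetical choices.

```python
import time
from typing import Callable, Optional


def is_available(status_json: dict, version: Optional[int] = None) -> bool:
    """True if the requested version (or any version) is in state AVAILABLE."""
    for entry in status_json.get("model_version_status", []):
        # Compare as strings so both "1" and 1 match.
        if version is not None and str(entry.get("version")) != str(version):
            continue
        if entry.get("state") == "AVAILABLE":
            return True
    return False


def wait_until_available(url: str, timeout_s: float = 30.0,
                         interval_s: float = 1.0,
                         get: Optional[Callable] = None) -> bool:
    """Poll the status endpoint until the model is AVAILABLE or we time out."""
    if get is None:
        import requests  # only needed for real HTTP calls
        get = requests.get
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            r = get(url, timeout=5)
            if r.status_code == 200 and is_available(r.json()):
                return True
        except OSError:  # requests' exceptions subclass OSError (IOError)
            pass  # server not reachable yet; keep polling
        time.sleep(interval_s)
    return False


# Usage against the container started above:
# wait_until_available("http://localhost:8501/v1/models/tf_emnist")
```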

### Send a test query with random data

In [None]:
random_img = np.random.uniform(size=(1,784))
data_payload = {"instances": random_img.tolist()}
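The REST predict API accepts the input either in row format (`"instances"`, one entry per example) or in columnar format (`"inputs"`). Without a `"signature_name"` key, the server uses the default serving signature, which in our export is the classification signature, so naming `predict_images` explicitly is one way to target the prediction signature. Because the exported input tensor has the fixed shape (1, 784), each request should carry exactly one image. `build_predict_payload` is a hypothetical helper sketching both formats.

```python
import json
from typing import Optional

import numpy as np


def build_predict_payload(img: np.ndarray, signature_name: Optional[str] = None,
                          columnar: bool = False) -> str:
    """Hypothetical helper: JSON body for TF Serving's REST :predict endpoint."""
    # Our exported input tensor has a fixed shape of (1, 784).
    img = np.asarray(img, dtype=np.float32).reshape(1, 784)
    body = {"inputs": img.tolist()} if columnar else {"instances": img.tolist()}
    if signature_name is not None:
        body["signature_name"] = signature_name  # otherwise the default signature
    return json.dumps(body)


payload = build_predict_payload(np.random.uniform(size=(1, 784)),
                                signature_name="predict_images")
# r = requests.post("http://localhost:8501/v1/models/tf_emnist:predict", data=payload)
```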

In [None]:
r = requests.post("http://localhost:8501/v1/models/tf_emnist:predict",
                  data=json.dumps(data_payload))

In [None]:
print("HTTP Response Status Code: {}".format(r.status_code))
print("HTTP Response content as json:")
print(r.json())
print("Class Index Prediction: {}".format(np.argmax(r.json()['predictions'])))
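For a row-format request, `"predictions"` holds one entry per instance, so our single-image response is a nested list of shape (1, 62). `np.argmax` on it returns a flat index, which equals the class index only because the batch size is one. A slightly more explicit sketch that also returns the top-k classes with their softmax probabilities could look like this; `top_k_predictions` is a hypothetical helper and `k=3` an arbitrary default.

```python
from typing import List, Tuple

import numpy as np


def top_k_predictions(response_json: dict, k: int = 3) -> List[Tuple[int, float]]:
    """Hypothetical helper: (class index, probability) pairs, best first."""
    # First (and only) instance of the batch: a vector of 62 probabilities.
    probs = np.asarray(response_json["predictions"], dtype=np.float64)[0]
    best = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in best]


# Usage with the response from above:
# print(top_k_predictions(r.json()))
```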

### Test Prediction Accuracy using our test set samples

In [None]:
request_url = "http://localhost:8501/v1/models/tf_emnist:predict"
_ = eval_serving_performance(n_examples=1000, n_print_examples=10,
                             request_url=request_url, dataset='test')
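`eval_serving_performance` comes from this project's own `emnist_dl2prod.utils` module, so its internals are not shown here. To give a rough idea of what such an evaluation involves, here is a hypothetical sketch that sends one request per labelled sample, compares the predicted class with the label, and records request latency. It does not reproduce the project's helper; `evaluate_endpoint`, `rest_predict_fn` and their arguments are assumptions.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple

import numpy as np


def evaluate_endpoint(samples: Iterable[Tuple[Sequence[float], int]],
                      predict_fn: Callable[[Sequence[float]], Sequence[float]]) -> dict:
    """Hypothetical: accuracy and mean latency of an endpoint over labelled samples."""
    n_correct, latencies = 0, []
    for img, label in samples:
        start = time.perf_counter()
        probs = predict_fn(img)  # e.g. a REST call returning 62 probabilities
        latencies.append(time.perf_counter() - start)
        n_correct += int(np.argmax(probs) == label)
    n = len(latencies)
    return {
        "n_examples": n,
        "accuracy": n_correct / n if n else float("nan"),
        "mean_latency_ms": 1000 * float(np.mean(latencies)) if n else float("nan"),
    }


def rest_predict_fn(url: str) -> Callable[[Sequence[float]], Sequence[float]]:
    """Wraps TF Serving's REST :predict endpoint; needs the `requests` package."""
    import requests

    def predict(img: Sequence[float]) -> Sequence[float]:
        payload = {"instances": [list(map(float, img))]}
        r = requests.post(url, json=payload, timeout=10)
        r.raise_for_status()
        return r.json()["predictions"][0]

    return predict


# Usage, with hypothetical arrays x_test (n, 784) and y_test (n,):
# evaluate_endpoint(zip(x_test, y_test), rest_predict_fn(request_url))
```

Injecting `predict_fn` keeps the accuracy and latency bookkeeping independent of the transport, so the same loop could be pointed at a gRPC client or a local model for comparison.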