# Verifying a working TF Serving setup

This tutorial shows:
- how to run TF Serving for a custom model in a Docker container
- how to request predictions via both gRPC and REST API calls
- the prediction timing results from TF Serving

### Imports

In [None]:
!pip install -q requests
!pip install -q tensorflow-serving-api

In [1]:
import os
import tempfile
import pandas as pd
import tensorflow as tf
import numpy as np
import json
import requests

# gRPC request specific imports
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

## Model

### Get a sample model 

In [2]:
core = tf.keras.applications.ResNet50(include_top=True, input_shape=(224, 224, 3))

inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="image_input")
preprocess = tf.keras.applications.resnet50.preprocess_input(inputs)
outputs = core(preprocess, training=False)
model = tf.keras.Model(inputs=[inputs], outputs=[outputs])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5


### Save the model

In [4]:
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}

export_path = /tmp/1

INFO:tensorflow:Assets written to: /tmp/1/assets

Saved model:
total 4040
drwxr-xr-x 2 root root    4096 Mar 23 01:25 assets
-rw-r--r-- 1 root root  557217 Mar 23 01:25 keras_metadata.pb
-rw-r--r-- 1 root root 3565545 Mar 23 01:25 saved_model.pb
drwxr-xr-x 2 root root    4096 Mar 23 01:25 variables
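
TF Serving treats each numeric subdirectory of the model base path as a model version and, by default, serves the highest one, which is why the model is saved under `MODEL_DIR/1`. The sketch below imitates that lookup with the standard library only. `latest_version_dir` is a hypothetical helper name, not part of TensorFlow or TF Serving.

```python
import os
import tempfile


def latest_version_dir(base_path):
    """Return the path of the highest-numbered version directory, or None.

    Hypothetical helper: it only imitates the "serve the latest version"
    default by looking for subdirectories whose names are plain integers.
    """
    candidates = [
        name
        for name in os.listdir(base_path)
        if name.isascii()
        and name.isdigit()
        and os.path.isdir(os.path.join(base_path, name))
    ]
    if not candidates:
        return None
    # Compare numerically so that "10" ranks above "2".
    return os.path.join(base_path, max(candidates, key=int))


# Usage with a throwaway layout: base/1, base/2 and a non-version folder.
with tempfile.TemporaryDirectory() as base:
    for name in ("1", "2", "notes"):
        os.makedirs(os.path.join(base, name))
    print(os.path.basename(latest_version_dir(base)))
```

Because `MODEL_DIR` here is the system temp directory, any other integer-named folder in it would also look like a model version, so a dedicated subdirectory may be a safer base path.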


### Examine your saved model

Notice in `signature_def['serving_default']`:
- the input name is `image_input`
- the output name is `resnet50`

You need these names to build requests to the TF Serving server later.

In [5]:
!saved_model_cli show --dir {export_path} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: serving_default_image_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['resnet50'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1000)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

Concrete Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          inpu
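
Reading the names off the CLI output by eye works, but they can also be pulled out programmatically. The following is a minimal sketch that parses the text printed by `saved_model_cli show --all`. It assumes the output layout shown above, and `signature_io_names` is a hypothetical helper, not a TensorFlow API.

```python
import re


def signature_io_names(cli_text, signature="serving_default"):
    """Extract input and output keys of one signature from the CLI text."""
    start = cli_text.find("signature_def['{}']:".format(signature))
    if start == -1:
        raise KeyError(signature)
    # The block ends at the next signature or at the concrete-function dump.
    ends = [
        pos
        for pos in (
            cli_text.find("signature_def[", start + 1),
            cli_text.find("Concrete Functions:", start),
        )
        if pos != -1
    ]
    block = cli_text[start:min(ends)] if ends else cli_text[start:]
    inputs = re.findall(r"inputs\['([^']+)'\]", block)
    outputs = re.findall(r"outputs\['([^']+)'\]", block)
    return inputs, outputs


# Shortened copy of the output above, used as sample text.
sample = """
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_input'] tensor_info:
        dtype: DT_FLOAT
  The given SavedModel SignatureDef contains the following output(s):
    outputs['resnet50'] tensor_info:
        dtype: DT_FLOAT
  Method name is: tensorflow/serving/predict
"""
print(signature_io_names(sample))  # (['image_input'], ['resnet50'])
```

In the notebook, the text could be captured with `subprocess.run([...], capture_output=True, text=True).stdout` instead of the hard-coded sample.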

## TF Serving

### Create dummy data

In [68]:
dummy_inputs = tf.random.normal((32, 224, 224, 3))
dummy_inputs.shape

TensorShape([32, 224, 224, 3])

### Install the TF Serving tool

In [None]:
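# SUDO_IF_NEEDED is interpolated into the shell commands below; it stays empty when already running as root (e.g. on Colab).
SUDO_IF_NEEDED = '' if os.geteuid() == 0 else 'sudo'
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | {SUDO_IF_NEEDED} tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | {SUDO_IF_NEEDED} apt-key add -
!{SUDO_IF_NEEDED} apt update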

In [None]:
!apt-get install tensorflow-model-server

### Run TF Serving server

In [9]:
os.environ["MODEL_DIR"] = MODEL_DIR

In [10]:
!nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=resnet_model \
  --model_base_path=$MODEL_DIR >server.log 2>&1 &

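# No --port flag is passed, so the gRPC endpoint uses the default port 8500 (used in the gRPC section below).
# --enable_model_warmup controls model warmup (https://www.tensorflow.org/tfx/serving/saved_model_warmup)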

In [46]:
!cat server.log

[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...


In [47]:
!sudo lsof -i -P -n | grep LISTEN

node         7 root   21u  IPv6  25571      0t0  TCP *:8080 (LISTEN)
colab-fil   29 root    5u  IPv6  25979      0t0  TCP *:3453 (LISTEN)
colab-fil   29 root    6u  IPv4  25980      0t0  TCP *:3453 (LISTEN)
jupyter-n   42 root    6u  IPv4  26715      0t0  TCP 172.28.0.2:9000 (LISTEN)
python3     60 root   15u  IPv4  29022      0t0  TCP 127.0.0.1:39987 (LISTEN)
python3     60 root   18u  IPv4  29026      0t0  TCP 127.0.0.1:36549 (LISTEN)
python3     60 root   21u  IPv4  29030      0t0  TCP 127.0.0.1:50817 (LISTEN)
python3     60 root   24u  IPv4  29034      0t0  TCP 127.0.0.1:53273 (LISTEN)
python3     60 root   30u  IPv4  29040      0t0  TCP 127.0.0.1:36879 (LISTEN)
python3     60 root   43u  IPv4  28245      0t0  TCP 127.0.0.1:33513 (LISTEN)
python3     80 root    3u  IPv4  29514      0t0  TCP 127.0.0.1:21925 (LISTEN)
python3     80 root    5u  IPv4  28630      0t0  TCP 127.0.0.1:45169 (LISTEN)
python3     80 root    9u  IPv4  29828      0t0  TCP 127.0.0.1:44481 (LISTEN)
tensorflo 146
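
Checking the log and the listening ports confirms that the process is up, but not that the model has finished loading. TF Serving's REST API also exposes a model status endpoint at `/v1/models/<model_name>`. The sketch below polls it until a version reports `AVAILABLE`. The helper names, the retry count, and the delay are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def model_is_available(status):
    """True if any version in a model-status response reports AVAILABLE."""
    return any(
        version.get("state") == "AVAILABLE"
        for version in status.get("model_version_status", [])
    )


def fetch_status(url):
    """GET the status URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_available(url, fetch=fetch_status, attempts=30, delay=1.0):
    """Poll until the model is available; return False if attempts run out."""
    for _ in range(attempts):
        try:
            if model_is_available(fetch(url)):
                return True
        except OSError:
            # Server not listening yet (connection refused, timeout, ...).
            pass
        time.sleep(delay)
    return False
```

With the server started as above, the call would be `wait_until_available("http://localhost:8501/v1/models/resnet_model")`. The `fetch` parameter exists only so the polling logic can be exercised without a running server.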

## REST API request

### Convert dummy data to JSON format

In [69]:
data = json.dumps({"signature_name": "serving_default", "instances": dummy_inputs.numpy().tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))

Data: {"signature_name": "serving_default", "instances": ... 87132263, -1.264892816543579, 1.081931710243225]]]]}
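
The payload above uses the row format of the TF Serving predict API, where each element of `"instances"` is one example. The API also accepts a columnar format, where the whole batch is passed under `"inputs"`, optionally keyed by input name. The sketch below builds both for a tiny stand-in batch. The function names are hypothetical; the `image_input` key comes from the signature shown earlier.

```python
import json


def row_payload(batch, signature="serving_default"):
    """Row format: one entry per example under "instances"."""
    return json.dumps({"signature_name": signature, "instances": batch})


def columnar_payload(batch, input_name, signature="serving_default"):
    """Columnar format: the whole batch per named input under "inputs"."""
    return json.dumps(
        {"signature_name": signature, "inputs": {input_name: batch}}
    )


# Tiny stand-in batch: 2 "images" of shape 1x1x3 instead of 32 of 224x224x3.
batch = [[[[0.1, 0.2, 0.3]]], [[[0.4, 0.5, 0.6]]]]
print(row_payload(batch))
print(columnar_payload(batch, "image_input"))
```

Row-format requests are answered under a `"predictions"` key, which is what the rest of this section reads; columnar requests are answered under `"outputs"`.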


### Make a request

In [70]:
headers = {"content-type": "application/json"}

In [71]:
%%timeit
json_response = requests.post('http://localhost:8501/v1/models/resnet_model:predict', data=data, headers=headers)

1 loop, best of 5: 5.91 s per loop
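
Note that `%%timeit` runs the cell body in its own scope, so `json_response` is not available to later cells after the timing run. As a rough alternative, the sketch below times a callable by hand and keeps the last return value. `best_of` is a hypothetical helper and the workload is a stand-in for the `requests.post` call.

```python
import time


def best_of(fn, repeats=5):
    """Call fn() `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; in the notebook, fn would wrap the requests.post call.
elapsed, value = best_of(lambda: sum(range(100_000)))
print("best of 5: {:.6f} s, result kept: {}".format(elapsed, value))
```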


### Interpret the output

In [96]:
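# Names assigned inside a %%timeit cell are not kept, so send the request once more to get a response object.
json_response = requests.post('http://localhost:8501/v1/models/resnet_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
print('Prediction class: {}'.format(np.argmax(predictions, axis=-1)))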

Prediction class: [851 664 664 664 664 664 664 664 664 664 664 664 664 664 664 664 851 664
 664 664 664 664 664 664 664 664 664 851 664 664 664 664]


## gRPC request

### Open up gRPC channel

In [39]:
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

### Prepare a request

In [99]:
request = predict_pb2.PredictRequest()
request.model_spec.name = 'resnet_model'
request.model_spec.signature_name = 'serving_default'
request.inputs['image_input'].CopyFrom(
    tf.make_tensor_proto(dummy_inputs))  # shape (32, 224, 224, 3) is taken from the tensor

### Make a request

In [100]:
%%timeit
result = stub.Predict(request, 10.0)  # 10 secs timeout

1 loop, best of 5: 5.05 s per loop
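
Both timings include moving the full 32-image batch to the server, so payload size is worth keeping in mind when comparing them. The sketch below works out the raw float32 size of the batch and a rough estimate of the same data as JSON text. The JSON figure is extrapolated from 1000 random values and is only an approximation; protobuf and HTTP overhead are ignored.

```python
import json
import random
import struct

batch, height, width, channels = 32, 224, 224, 3
n_values = batch * height * width * channels

# A binary encoding needs 4 bytes per float32 value.
binary_bytes = n_values * struct.calcsize("<f")

# Estimate the JSON text size from a random sample (digits plus separators).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
json_bytes_per_value = len(json.dumps(sample)) / len(sample)
json_bytes_estimate = int(n_values * json_bytes_per_value)

print("values:", n_values)
print("binary float32 payload: {:.1f} MiB".format(binary_bytes / 2**20))
print("estimated JSON payload: {:.1f} MiB".format(json_bytes_estimate / 2**20))
```

This arithmetic does not by itself explain the measured difference, since model inference time is part of both numbers.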


### Interpret the output

In [101]:
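# Names assigned inside a %%timeit cell are not kept, so call Predict once more to get a result.
result = stub.Predict(request, 10.0)  # 10-second timeout
# float_val is a flat list of scores; reshape it to (batch, num_classes).
result_val = result.outputs['resnet50'].float_val
result_val = np.array(result_val).reshape(32, -1)
print('Prediction class: {}'.format(np.argmax(result_val, axis=-1)))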

Prediction class: [664 664 664 664 664 664 664 664 664 664 664 664 664 664 664 851 664 664
 664 664 664 664 664 664 664 664 664 664 664 664 664 664]
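
The gRPC response carries the scores in `float_val` as one flat list, which is why the cell above reshapes it to 32 rows before taking the argmax. To make the index arithmetic explicit, here is the same step in plain Python without NumPy. `argmax_rows` is a hypothetical helper for illustration only.

```python
def argmax_rows(flat, n_rows):
    """Split a flat list into n_rows equal rows and return each row's argmax."""
    if n_rows <= 0 or not flat or len(flat) % n_rows != 0:
        raise ValueError("flat must be non-empty and a multiple of n_rows long")
    row_len = len(flat) // n_rows
    classes = []
    for r in range(n_rows):
        # Row r occupies positions [r * row_len, (r + 1) * row_len).
        row = flat[r * row_len:(r + 1) * row_len]
        classes.append(max(range(row_len), key=row.__getitem__))
    return classes


# Two rows of three scores each, flattened row by row.
print(argmax_rows([0.1, 0.7, 0.2, 0.9, 0.05, 0.05], n_rows=2))  # [1, 0]
```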
