# TensorFlow Serving performance

## HTTP URLs

We've deployed the sample models from the `models` folder. To reach them, we can use either of the following URL structures:

* `/v1/models/<model name>/versions/<version number>`
* `/v1/models/<model name>/labels/<version label>`

For example, these URLs are equivalent when the `latest` label points to version 2:

* `http://model-server:8501/v1/models/conv_model/versions/2`
* `http://model-server:8501/v1/models/conv_model/labels/latest`
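The two URL patterns above can be assembled with a small helper. This is only a minimal sketch: `model_url` and its parameters are hypothetical names, not part of TensorFlow Serving, and the host and port defaults are simply the ones used in this notebook.

```python
def model_url(model_name, version=None, label=None,
              host="model-server", port=8501, method=None):
    """Build a TensorFlow Serving REST URL for a model version or label.

    Hypothetical helper for illustration only. Exactly one of
    `version` or `label` must be given.
    """
    if (version is None) == (label is None):
        raise ValueError("pass exactly one of `version` or `label`")
    if version is not None:
        selector = f"versions/{version}"
    else:
        selector = f"labels/{label}"
    url = f"http://{host}:{port}/v1/models/{model_name}/{selector}"
    # A method such as "predict" is appended after a colon.
    return f"{url}:{method}" if method else url


print(model_url("conv_model", version=2))
print(model_url("conv_model", label="latest", method="predict"))
```

Without a `method`, the URL addresses the model's status; with `method="predict"`, it matches the prediction URLs used later in this notebook.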

To check that a model is deployed and working, we can curl its URL directly:

In [None]:
!curl http://model-server:8501/v1/models/flat_model/versions/1
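The status endpoint answers with JSON containing a `model_version_status` list, so the same check can be done from Python and the notebook can fail early if a model is not ready. The sketch below assumes that response shape; `is_available` is a hypothetical helper, and `sample_status` is an illustrative payload, not output captured from this server.

```python
def is_available(status, version=None):
    """Return True if a served version reports the state AVAILABLE.

    `status` is the parsed JSON body of the model status endpoint.
    If `version` is given, only that version is considered.
    """
    for entry in status.get("model_version_status", []):
        if version is not None and str(entry.get("version")) != str(version):
            continue
        if entry.get("state") == "AVAILABLE":
            return True
    return False


# Illustrative payload (assumed shape, not real server output).
sample_status = {
    "model_version_status": [
        {
            "version": "1",
            "state": "AVAILABLE",
            "status": {"error_code": "OK", "error_message": ""},
        }
    ]
}

print(is_available(sample_status, version=1))
```

Against the live server, the argument would be something like `requests.get('http://model-server:8501/v1/models/flat_model/versions/1').json()`.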

## Data

In [None]:
import os
import json
import requests
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
images = tfds.load(
    'fashion_mnist',
    split='test',
    shuffle_files=True,
    as_supervised=True,
    with_info=False,
)

In [None]:
@tf.function
def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

In [None]:
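# `tf.data.AUTOTUNE` (TF 2.4+) replaces the deprecated
# `tf.data.experimental.AUTOTUNE` alias.
images = images.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
images = images.cache()  # cache after normalizing, before batching
images = images.batch(256)
images = images.prefetch(tf.data.AUTOTUNE)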

In [None]:
for images_batch, labels in images.take(1):
    print(images_batch.shape)

In [None]:
test_images = images_batch.numpy()

## Requesting

In [None]:
data = json.dumps({"signature_name": "serving_default", "instances": test_images.tolist()})
print('Data: {}...'.format(data[:80]))

In [None]:
headers = {"content-type": "application/json"}

In [None]:
urls = [
    'http://model-server:8501/v1/models/conv_model/labels/stable:predict',
    'http://model-server:8501/v1/models/conv_model/labels/latest:predict',
    'http://model-server:8501/v1/models/flat_model/versions/1:predict'
]

In [None]:
%%timeit -n10 -r 10
json_response = requests.post(
    urls[0],
    data=data,
    headers=headers
)
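Names assigned inside a `%%timeit` cell are not kept in the notebook namespace, so inspecting the predictions takes one more request outside the timing cell. The sketch below shows how the response body could then be decoded. It assumes the row-format response `{"predictions": [...]}` with one score vector per instance; `predicted_classes` and `accuracy` are hypothetical helper names, and `sample_response` is made-up data for illustration.

```python
def predicted_classes(response_json):
    """Return the index of the highest score for each prediction row."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in response_json["predictions"]
    ]


def accuracy(response_json, true_labels):
    """Fraction of rows whose predicted class matches the true label."""
    predicted = predicted_classes(response_json)
    correct = sum(int(p == t) for p, t in zip(predicted, true_labels))
    return correct / len(true_labels)


# Made-up response with three classes, for illustration only.
sample_response = {"predictions": [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]}

print(predicted_classes(sample_response))  # [1, 0]
print(accuracy(sample_response, [1, 2]))   # 0.5
```

With the live server, the inputs would be `requests.post(urls[0], data=data, headers=headers).json()` and `labels.numpy().tolist()` from the batch loaded above.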