In [1]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sklearn
import sys
import tensorflow as tf
from tensorflow import keras
import time

In [2]:
assert sys.version_info >= (3, 5) # Python ≥3.5 required
assert int(tf.__version__.split(".")[0]) >= 2  # TensorFlow ≥2.0 required (numeric, not string, comparison)

In [3]:
tf.__version__

'2.0.0-alpha0'

![](https://pbs.twimg.com/media/C4vf8SQUcAALCyl.jpg)

# Download Fashion-MNIST data

Then prepare the training, validation, and test datasets.

In [4]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full / 255.
X_test = X_test / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [5]:
X_train = X_train.reshape(-1,28,28,1)
X_valid = X_valid.reshape(-1,28,28,1)
X_test = X_test.reshape(-1,28,28,1)

# Define and train the convolutional neural network for image classification

I keep the model small because my laptop has no GPU, and test accuracy is not the focus of this notebook anyway.

In [6]:
model = keras.models.Sequential([
    keras.layers.Conv2D(8, kernel_size=3, activation='relu', padding='same', input_shape=(28,28,1)),
    keras.layers.MaxPool2D(pool_size=(2, 2)),
    keras.layers.Conv2D(16, kernel_size=3, activation='relu', padding='same'),
    keras.layers.MaxPool2D(pool_size=(2, 2)),
    keras.layers.Conv2D(16, kernel_size=3, activation='relu', padding='same'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
])


model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd",
              metrics=["accuracy"])

In [7]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 8)         80        
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 8)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 16)          2320      
_________________________________________________________________
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                7
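The parameter counts in the summary can be checked by hand. The sketch below is plain arithmetic, not a Keras call: the helper names `conv2d_params` and `dense_params` are made up for illustration, and the layer sizes are copied from the model definition above. Under these assumptions the final Dense layer works out to 785 × 10 = 7850 parameters.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (in_features + 1) * units


# Layer sizes copied from the model defined above.
conv1 = conv2d_params(3, 1, 8)     # input 28x28x1  -> output 28x28x8
conv2 = conv2d_params(3, 8, 16)    # input 14x14x8  -> output 14x14x16
conv3 = conv2d_params(3, 16, 16)   # input 7x7x16   -> output 7x7x16
flat = 7 * 7 * 16                  # size of the Flatten output
dense = dense_params(flat, 10)

print(conv1, conv2, conv3, flat, dense)
print("total:", conv1 + conv2 + conv3 + dense)
```

The first three values match the `Param #` column for the convolutional layers, and `flat` matches the `(None, 784)` output shape of the Flatten layer.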

In [8]:
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Train on 55000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7face22f4a90>

In [91]:
model.evaluate(X_test, y_test)



[0.5676319009780884, 0.7908]

# Save the model

We have trained our model and now want to use it with TensorFlow Serving. Before running the server, however, we have to save the model.

Because we can use multiple model architectures and train the same architecture multiple times, each saved model needs a unique version number. Newer models should get higher version numbers, because the TensorFlow Serving server by default serves the version with the highest number.

In [30]:
all_models_path = 'models'
MODEL_NAME = "fashion_mnist_conv"

You can use the current timestamp as the model version. This guarantees that the newest model has the highest version number.

In [31]:
model_version = int(time.time())
model_path = os.path.join(all_models_path, MODEL_NAME, str(model_version))
os.makedirs(model_path)

In [32]:
model_version

1557354435
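To illustrate the "highest version wins" rule, here is a minimal sketch. It assumes that version directory names are compared as integers rather than as strings; the helper `pick_served_version` and the candidate names are hypothetical and are not part of TensorFlow Serving.

```python
def pick_served_version(version_names):
    """Return the numerically largest version, ignoring non-numeric names."""
    numeric = [int(name) for name in version_names if name.isdigit()]
    return max(numeric) if numeric else None


# Hypothetical contents of models/fashion_mnist_conv/
candidates = ["1557354435", "1557350000", "9", "notes"]

numeric_choice = pick_served_version(candidates)
# Comparing the names as strings would give a different, misleading answer.
string_choice = max(name for name in candidates if name.isdigit())

print(numeric_choice)
print(string_choice)
```

Timestamps avoid this pitfall in practice, because two timestamps taken years apart still have the same number of digits.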

In TensorFlow 2.0 there is an easy way to [save](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/saved_model/save) a tf.keras model.

In [33]:
tf.saved_model.save(model, model_path)

# CLI to inspect and execute SavedModel

You can use the [SavedModel Command Line Interface (CLI)](https://www.tensorflow.org/guide/saved_model#cli_to_inspect_and_execute_savedmodel) to inspect and execute a SavedModel. For example, you can use the CLI to inspect the model's SignatureDefs. The CLI enables you to quickly confirm that the input Tensor dtype and shape match the model. Moreover, if you want to test your model, you can use the CLI to do a sanity check by passing in sample inputs in various formats (for example, Python expressions) and then fetching the output.

## Overview of commands

The SavedModel CLI supports the following two commands on a MetaGraphDef in a SavedModel:

 - show, which shows the computations available on a MetaGraphDef in a SavedModel.
 - run, which runs a computation on a MetaGraphDef.


### show command

A SavedModel contains one or more MetaGraphDefs, identified by their tag-sets. To serve a model, you might wonder what kind of SignatureDefs are in each model, and what their inputs and outputs are. The show command lets you examine the contents of the SavedModel in hierarchical order. Here's the syntax:

```bash
saved_model_cli show [-h] --dir DIR [--all] [--tag_set TAG_SET] [--signature_def SIGNATURE_DEF_KEY]
```

**Try different saved_model_cli invocations**

In [13]:
!saved_model_cli show --dir {model_path}

The given SavedModel contains the following tag-sets:
serve


In [14]:
!saved_model_cli show --dir {model_path} --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serving_default"


In [15]:
!saved_model_cli show --dir {model_path} --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['conv2d_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: serving_default_conv2d_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['dense'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict


In [16]:
!saved_model_cli show --dir {model_path} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['conv2d_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: serving_default_conv2d_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict


### run command

Invoke the run command to run a graph computation, passing inputs and then displaying (and optionally saving) the outputs. Here's the syntax:

```bash
saved_model_cli run [-h] --dir DIR --tag_set TAG_SET --signature_def
                           SIGNATURE_DEF_KEY [--inputs INPUTS]
                           [--input_exprs INPUT_EXPRS]
                           [--input_examples INPUT_EXAMPLES] [--outdir OUTDIR]
                           [--overwrite] [--tf_debug]
```

The run command provides the following three ways to pass inputs to the model:

 - *inputs* option enables you to pass numpy ndarray in files.
 - *input_exprs* option enables you to pass Python expressions.
 - *input_examples* option enables you to pass tf.train.Example.

Here we will use the *inputs* option.

To pass input data in files, specify the --inputs option, which takes the following general format:

```bash
--inputs <input_key>=<filename>
```

**Input layer name**

In order to pass the testing data to our trained model, we have to know the name of its input layer and pass it to *saved_model_cli* as *input_key*.

In [18]:
input_name = model.input_names[0]
input_name

'conv2d_input'

**Prepare small testing dataset**

We want to test our model. Take 3 images from the test dataset and [save them](https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html) to a file, since *saved_model_cli* takes a *filename* as its argument.

In [57]:
X_query = X_test[:3]
y_query = y_test[:3]
np.save("exemplary_tests.npy", X_query, allow_pickle=False)

**saved_model_cli run**

Specify the arguments and run the model on the test data.

In [19]:
!saved_model_cli run --dir {model_path} --tag_set serve \
                     --signature_def serving_default    \
                     --inputs {input_name}=exemplary_tests.npy

2019-05-08 18:45:34.709105: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-08 18:45:34.730706: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2712000000 Hz
2019-05-08 18:45:34.731337: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55bed1f3e110 executing computations on platform Host. Devices:
2019-05-08 18:45:34.731363: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): <undefined>, <undefined>
W0508 18:45:34.732563 140588033390400 deprecation.py:323] From /home/lukasz.maziarka/anaconda3/envs/tf_serving/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py:339: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model
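Outside a notebook, the same invocation can be assembled as an argument list, for example to pass to `subprocess.run`. The sketch below only builds and prints the command and does not execute it; `build_run_command` is a hypothetical helper, and the path, input key, and file name are the values used in this notebook.

```python
import shlex


def build_run_command(model_dir, input_key, filename,
                      tag_set="serve", signature_def="serving_default"):
    """Assemble the arguments for `saved_model_cli run` with one --inputs entry."""
    return [
        "saved_model_cli", "run",
        "--dir", model_dir,
        "--tag_set", tag_set,
        "--signature_def", signature_def,
        # General format: --inputs <input_key>=<filename>
        "--inputs", f"{input_key}={filename}",
    ]


cmd = build_run_command("models/fashion_mnist_conv/1557354435",
                        "conv2d_input", "exemplary_tests.npy")

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```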

# Prepare docker server with our trained model

One of the easiest ways to serve machine learning models is to use TensorFlow Serving with Docker. Docker is a tool that packages software into units called containers, which include everything needed to run the software.

In the following subsection we will start a Docker container that serves our model and query it for classifications of the test data.

First, we have to start a container from the proper image. We can do it in two steps.


1. Download the docker image
```bash
sudo docker pull tensorflow/serving
```

2. Run the image
```bash
sudo docker run -it --rm -p 8501:8501 \
   -v "`pwd`/models/fashion_mnist_conv:/models/fashion_mnist_conv" \
   -e MODEL_NAME=fashion_mnist_conv \
   tensorflow/serving
```
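Before starting the container, it can help to confirm that the mounted directory has the layout the server looks for: one numeric subdirectory per version, each containing a `saved_model.pb` file. The following is a rough sketch under that assumption; `servable_versions` is a hypothetical helper, and the demo builds a throwaway directory tree instead of touching the real `models` folder.

```python
import os
import tempfile


def servable_versions(model_base_dir):
    """List numeric version subdirectories that contain a saved_model.pb file."""
    found = []
    for name in sorted(os.listdir(model_base_dir)):
        version_dir = os.path.join(model_base_dir, name)
        if name.isdigit() and os.path.isfile(os.path.join(version_dir, "saved_model.pb")):
            found.append(int(name))
    return found


with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, "models", "fashion_mnist_conv")
    # One version with a (dummy) saved model file ...
    os.makedirs(os.path.join(base, "1557354435"))
    open(os.path.join(base, "1557354435", "saved_model.pb"), "wb").close()
    # ... and one empty version directory, where nothing was saved.
    os.makedirs(os.path.join(base, "1557350000"))
    versions = servable_versions(base)

print(versions)
```

An empty list here would suggest that the path given to `-v` points at the wrong directory or that the model was never saved.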

### REST API

TensorFlow ModelServer also supports [RESTful APIs](https://www.tensorflow.org/tfx/serving/api_rest).

The request and the response are JSON objects. Their composition depends on the request type or verb.

Below we show how to use the REST API with TensorFlow Serving, and then build an example client that sends test images to the server running in Docker and gets the classifications back.

In [22]:
import json
import requests

#### [Model status API](https://www.tensorflow.org/tfx/serving/api_rest#model_status_api)

This API returns the status of a model in the ModelServer.


```bash
GET http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
```

*/versions/${MODEL_VERSION}* is optional. If omitted, the status for **all** versions is returned in the response.

In [45]:
SERVER_URL = 'http://localhost:8501/v1/models/fashion_mnist_conv'

response = requests.get(SERVER_URL)
response.raise_for_status()
response = response.json()

response

{'model_version_status': [{'version': '1557354435',
   'state': 'AVAILABLE',
   'status': {'error_code': 'OK', 'error_message': ''}}]}
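A client usually wants to know only whether some version is ready to serve. A minimal sketch of reading that from the status response, assuming the response has the shape shown above; `available_versions` is a hypothetical helper and the dictionary is copied from the output.

```python
def available_versions(status):
    """Return the version strings whose state is AVAILABLE."""
    return [entry["version"]
            for entry in status.get("model_version_status", [])
            if entry.get("state") == "AVAILABLE"]


# Status response copied from the output above.
status_response = {
    "model_version_status": [
        {"version": "1557354435",
         "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}},
    ]
}

ready = available_versions(status_response)
print(ready)
```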

#### [Model Metadata API](https://www.tensorflow.org/tfx/serving/api_rest#model_metadata_api)

This API returns the metadata of a model in the ModelServer.

```bash
GET http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/metadata
```

*/versions/${MODEL_VERSION}* is optional. If omitted, the model metadata for the **latest** version is returned in the response.

In [46]:
SERVER_URL = 'http://localhost:8501/v1/models/fashion_mnist_conv/metadata'

response = requests.get(SERVER_URL)
response.raise_for_status()
response = response.json()

response

{'model_spec': {'name': 'fashion_mnist_conv',
  'signature_name': '',
  'version': '1557354435'},
 'metadata': {'signature_def': {'signature_def': {'serving_default': {'inputs': {'conv2d_input': {'dtype': 'DT_FLOAT',
       'tensor_shape': {'dim': [{'size': '-1', 'name': ''},
         {'size': '28', 'name': ''},
         {'size': '28', 'name': ''},
         {'size': '1', 'name': ''}],
        'unknown_rank': False},
       'name': 'serving_default_conv2d_input:0'}},
     'outputs': {'dense': {'dtype': 'DT_FLOAT',
       'tensor_shape': {'dim': [{'size': '-1', 'name': ''},
         {'size': '10', 'name': ''}],
        'unknown_rank': False},
       'name': 'StatefulPartitionedCall:0'}},
     'method_name': 'tensorflow/serving/predict'},
    '__saved_model_init_op': {'inputs': {},
     'outputs': {'__saved_model_init_op': {'dtype': 'DT_INVALID',
       'tensor_shape': {'dim': [], 'unknown_rank': True},
       'name': 'NoOp'}},
     'method_name': ''}}}}}
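The nested metadata is easier to use once the input names and shapes are pulled out. Here is a sketch that assumes the response layout shown above; `input_shapes` is a hypothetical helper, and the dictionary below is a trimmed copy of the output that keeps only the fields the helper reads.

```python
def input_shapes(metadata, signature="serving_default"):
    """Map each input name of a signature to its shape as a list of ints."""
    signatures = metadata["metadata"]["signature_def"]["signature_def"]
    inputs = signatures[signature]["inputs"]
    return {
        name: [int(dim["size"]) for dim in info["tensor_shape"]["dim"]]
        for name, info in inputs.items()
    }


# Trimmed copy of the metadata response shown above.
metadata_response = {
    "metadata": {"signature_def": {"signature_def": {"serving_default": {
        "inputs": {"conv2d_input": {
            "dtype": "DT_FLOAT",
            "tensor_shape": {"dim": [{"size": "-1", "name": ""},
                                     {"size": "28", "name": ""},
                                     {"size": "28", "name": ""},
                                     {"size": "1", "name": ""}],
                             "unknown_rank": False},
            "name": "serving_default_conv2d_input:0"}},
    }}}}
}

shapes = input_shapes(metadata_response)
print(shapes)
```

The leading `-1` is the batch dimension, so any number of 28×28×1 images can be sent in one request.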

#### [Predict API](https://www.tensorflow.org/tfx/serving/api_rest#predict_api)

This API closely follows the PredictionService.Predict gRPC API.

```bash
POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
```

*/versions/${MODEL_VERSION}* is optional. If omitted, the **latest** version is used.
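The three endpoints share one URL pattern and differ only in the optional version segment and the suffix. A small sketch of building them; `model_url` is a hypothetical helper, and the host and port defaults match the server started earlier in this notebook.

```python
def model_url(model_name, version=None, suffix="", host="localhost", port=8501):
    """Build a REST URL for a model, optionally pinned to one version."""
    url = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        url += f"/versions/{version}"
    return url + suffix


status_url = model_url("fashion_mnist_conv")
metadata_url = model_url("fashion_mnist_conv", suffix="/metadata")
predict_url = model_url("fashion_mnist_conv", version=1557354435, suffix=":predict")

print(status_url)
print(metadata_url)
print(predict_url)
```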


**Request format**

The request body for predict API must be JSON object formatted as follows:

```python
{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}
```

**Examples**

1. Row representation

```python
{
 "instances": [
   {
     "tag": "foo",
     "signal": [1, 2, 3, 4, 5],
     "sensor": [[1, 2], [3, 4]]
   },
   {
     "tag": "bar",
     "signal": [3, 4, 1, 2, 5],
     "sensor": [[4, 5], [6, 8]]
   }
 ]
}
```

2. Columnar representation

```python
{
 "inputs": {
   "tag": ["foo", "bar"],
   "signal": [[1, 2, 3, 4, 5], [3, 4, 1, 2, 5]],
   "sensor": [[[1, 2], [3, 4]], [[4, 5], [6, 8]]]
 }
}
```
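The two representations carry the same data, so one can be derived from the other. The sketch below converts the row example into the columnar one; `rows_to_columns` is a hypothetical helper, and it assumes every instance has the same set of keys.

```python
def rows_to_columns(instances):
    """Convert a list of per-instance dicts into a dict of per-key lists."""
    keys = list(instances[0].keys())
    for row in instances:
        if set(row.keys()) != set(keys):
            raise ValueError("all instances must have the same keys")
    return {key: [row[key] for row in instances] for key in keys}


# The row representation from example 1.
row_request = {
    "instances": [
        {"tag": "foo", "signal": [1, 2, 3, 4, 5], "sensor": [[1, 2], [3, 4]]},
        {"tag": "bar", "signal": [3, 4, 1, 2, 5], "sensor": [[4, 5], [6, 8]]},
    ]
}

columnar_request = {"inputs": rows_to_columns(row_request["instances"])}
print(columnar_request)
```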

**Prepare the json with input data**

We already created a small array with 3 test images. Convert it to JSON (in the representation you prefer) and post that JSON to the server.

In [39]:
input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_query.tolist(),
})
print(input_data_json[:200] + "..." + input_data_json[-200:])

{"signature_name": "serving_default", "instances": [[[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0],...5294117647059], [0.2784313725490196], [0.0], [0.0], [0.26666666666666666], [0.6901960784313725], [0.6431372549019608], [0.22745098039215686], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}


In [47]:
SERVER_URL = 'http://localhost:8501/v1/models/fashion_mnist_conv:predict'
            
response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status()
response = response.json()

response

{'predictions': [[4.36335e-07,
   4.05034e-07,
   2.27004e-06,
   1.23799e-06,
   1.28091e-05,
   0.110029,
   4.91553e-06,
   0.310268,
   0.00717149,
   0.572509],
  [0.000363642,
   1.94173e-05,
   0.90677,
   0.000184919,
   0.0202125,
   3.04087e-07,
   0.0723665,
   5.67259e-11,
   8.31599e-05,
   8.31543e-10],
  [0.00139096,
   0.995004,
   3.90449e-05,
   0.00313686,
   0.00029803,
   5.20561e-10,
   0.000128445,
   2.57871e-07,
   2.46117e-06,
   2.83903e-09]]}

In [54]:
y_proba = np.array(response["predictions"])
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.11, 0.  , 0.31, 0.01, 0.57],
       [0.  , 0.  , 0.91, 0.  , 0.02, 0.  , 0.07, 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])

In [58]:
np.argmax(y_proba, axis=-1), y_query

(array([9, 2, 1]), array([9, 2, 1], dtype=uint8))
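The class indices are easier to read as label names. A short sketch using the standard Fashion-MNIST label order; `top_class` is a hypothetical helper, and the probabilities are the rounded values from the output above.

```python
# Fashion-MNIST label names, indexed by class id.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def top_class(probabilities):
    """Return (class name, probability) for the most likely class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best], probabilities[best]


# Rounded probabilities copied from the output above.
y_proba_rounded = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.11, 0.0, 0.31, 0.01, 0.57],
    [0.0, 0.0, 0.91, 0.0, 0.02, 0.0, 0.07, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

names = [top_class(row)[0] for row in y_proba_rounded]
print(names)
```

For the first image the model is far from certain: "Ankle boot" wins with 0.57, while "Sneaker" still gets 0.31.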

#### Prepare the function that queries the server for the whole testing dataset and returns the network accuracy

Then compare it with the test accuracy that we computed earlier.

In [89]:
def query_for_answers(X_test, y_test, SERVER_URL, batch_size=16):
    """Query the server in batches and return the accuracy on (X_test, y_test)."""
    good_answers = 0

    for i in range(0, X_test.shape[0], batch_size):
        # Slice one batch of images together with the matching labels
        X_query = X_test[i:(i+batch_size)]
        y_query = y_test[i:(i+batch_size)]

        input_data_json = json.dumps({
            "signature_name": "serving_default",
            "instances": X_query.tolist(),
        })

        response = requests.post(SERVER_URL, data=input_data_json)
        response.raise_for_status()
        response = response.json()

        # Count the predictions that agree with the true labels
        y_proba = np.array(response["predictions"])
        good_answers += np.sum(np.argmax(y_proba, axis=-1) == y_query)

    return good_answers / X_test.shape[0]

In [90]:
query_for_answers(X_test, y_test, SERVER_URL, batch_size=128)

0.7908
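The batch size decides how many requests the loop sends. The sketch below makes the slicing explicit, assuming the 10,000-image Fashion-MNIST test set; `batch_bounds` is a hypothetical helper and no server is contacted.

```python
def batch_bounds(n_samples, batch_size):
    """Return (start, stop) index pairs that cover n_samples in order."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


bounds = batch_bounds(10000, 128)
n_requests = len(bounds)
last_start, last_stop = bounds[-1]

print(n_requests)              # number of POST requests sent
print(last_stop - last_start)  # size of the final, shorter batch
```

With a batch size of 128 the last batch is shorter than the others. NumPy slicing handles that silently, which is why the query function needs no special case for it.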

# Image sources

Images and code fragments used in this notebook come from the following web pages and papers:

1. https://github.com/ageron/tf2_course/blob/master/04_deploy_and_distribute_tf2.ipynb
2. https://twitter.com/tensorflow/status/832008382408126464