# Training and Deploying TensorFlow Models at Scale

In [2]:
# FIXME: make autocompletion work again
%config Completer.use_jedi = False

import os

# OpenAI gym
import gym

%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Get smooth animations
mpl.rc('animation', html='jshtml')

physical_devices = tf.config.list_physical_devices('GPU')

if not physical_devices:
    print("No GPU was detected.")
else:
    # https://stackoverflow.com/a/60699372
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
    
from tensorflow import keras

No GPU was detected.


## Deploying TensorFlow models to TensorFlow Serving
*TensorFlow Serving (TFS)* provides simple REST and gRPC APIs and handles model versioning, graceful updates, and rollbacks (blue-green or stop-the-world).
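
As a rough illustration of the REST side, the sketch below only builds the URL and JSON body of a prediction request; it does not contact a server. The host, the port `8501` (TF Serving's default REST port), the model name, and the helper `build_predict_request` are assumptions chosen for illustration.

```python
import json


def build_predict_request(instances, model_name, host="localhost", port=8501,
                          signature_name="serving_default"):
    """Return (url, body) for a TF Serving REST predict call; nothing is sent."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"signature_name": signature_name, "instances": instances})
    return url, body


# Hypothetical input: a single 2x2 one-channel "image" as nested lists
url, body = build_predict_request([[[[0.0], [0.5]], [[1.0], [0.25]]]], "my_mnist_model")
print(url)
```

The body could then be sent with any HTTP client as a `POST` request once a server is actually running.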

In [3]:
# Fetch MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()

# Scale the data
max_value = 255.
X_train_full = X_train_full[..., np.newaxis].astype(np.float32) / max_value
X_test = X_test[..., np.newaxis].astype(np.float32) / max_value

# Split the raw training data into training and validation sets
valid_split = 5000
X_valid, X_train = X_train_full[:valid_split], X_train_full[valid_split:]
y_valid, y_train = y_train_full[:valid_split], y_train_full[valid_split:]

# Use the first three test instances for predictions
X_new = X_test[:3]

### Save/Load a SavedModel

In [4]:
# Set RNG state
np.random.seed(42)
tf.random.set_seed(42)

# Build model v1
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.SGD(learning_rate=1e-2),
    metrics=["accuracy"],
)

# Train the model
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f6bb95a2070>

In [5]:
# Test model's predictions
np.round(model.predict(X_new), 2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.01, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.97, 0.01, 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.  ]],
      dtype=float32)

In [8]:
model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join("data", model_name, model_version)
model_path

'data/my_mnist_model/0001'
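
TF Serving expects numeric version sub-directories under the model's base directory and, by default, serves the highest one. A minimal sketch of computing the path for the next version, assuming zero-padded four-digit names as above (the helper `next_version_path` is hypothetical):

```python
import os
import tempfile


def next_version_path(base_dir, width=4):
    """Return the path for the version after the highest numeric sub-directory."""
    versions = [int(name) for name in os.listdir(base_dir)
                if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))]
    next_version = max(versions, default=0) + 1
    return os.path.join(base_dir, str(next_version).zfill(width))


# Demonstrate on a throwaway directory rather than the real model folder
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "0001"))
    new_path = next_version_path(base)
    print(os.path.basename(new_path))  # prints 0002
```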

In [9]:
!rm -rf data/{model_name}

In [10]:
# Save the model in `SavedModel` format/structure
tf.saved_model.save(model, model_path)

INFO:tensorflow:Assets written to: data/my_mnist_model/0001/assets
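
Loading is the mirror image of saving. The sketch below is self-contained, so it saves and reloads a tiny stand-in `tf.Module` (the class `TinyModel` is hypothetical) instead of the trained model; with the notebook's model the analogous call would be `tf.saved_model.load(model_path)`.

```python
import tempfile

import tensorflow as tf


class TinyModel(tf.Module):
    """Stand-in for the Keras model: a single fixed linear map."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.ones([4, 2]))

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w)


module = TinyModel()
x = tf.constant([[1.0, 2.0, 3.0, 4.0]])

with tempfile.TemporaryDirectory() as export_dir:
    # Export with an explicit serving signature, then load it back
    tf.saved_model.save(module, export_dir, signatures=module.__call__)
    loaded = tf.saved_model.load(export_dir)
    infer = loaded.signatures["serving_default"]
    outputs = infer(x=x)  # a dict mapping output names to tensors
    restored = list(outputs.values())[0].numpy()

print(restored)
```

The loaded object exposes the same `serving_default` signature that TF Serving and `saved_model_cli` use, which makes it a convenient way to check an export before deploying it.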


Let's print the structure of the `SavedModel`:
* `saved_model.pb` is a protobuf-serialized computation graph of the model
* `variables/` contains all the weights, possibly split into multiple files
* `assets/` contains additional data such as examples, dictionary tables, etc.

In [11]:
!tree data/{model_name}

data/my_mnist_model
└── 0001
    ├── assets
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

3 directories, 3 files
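
Where the `tree` utility is not installed, a similar listing can be produced from Python. This is a sketch only: the helper `list_files` is hypothetical, and it runs on a mock directory with empty placeholder files that mirror the layout above, not on a real model.

```python
import os
import tempfile


def list_files(root):
    """Return sorted file paths under root, relative to root, with '/' separators."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Mock layout mirroring the tree above (empty placeholder files)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "0001", "assets"))
    os.makedirs(os.path.join(root, "0001", "variables"))
    for rel in ["0001/saved_model.pb",
                "0001/variables/variables.index",
                "0001/variables/variables.data-00000-of-00001"]:
        open(os.path.join(root, *rel.split("/")), "w").close()
    files = list_files(root)

print(files)
```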


The `saved_model_cli` tool is handy for inspecting a saved model and for running predictions against it (useful when debugging).

In [13]:
!saved_model_cli show --dir {model_path}

2021-03-04 21:35:58.423566: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-04 21:35:58.423600: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
The given SavedModel contains the following tag-sets:
'serve'


In [14]:
!saved_model_cli show --dir {model_path} --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serving_default"


In [15]:
!saved_model_cli show --dir {model_path} --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['flatten_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: serving_default_flatten_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['dense_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
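
The `-1` in the reported shapes marks a dimension of unspecified size, here the batch dimension. A small sketch of checking an input shape against such a signature before calling the model (the helper `shape_matches` is hypothetical, not part of TensorFlow):

```python
def shape_matches(actual, expected):
    """True if `actual` fits `expected`, where -1 in `expected` matches any size."""
    if len(actual) != len(expected):
        return False
    return all(e == -1 or a == e for a, e in zip(actual, expected))


signature_shape = (-1, 28, 28, 1)                      # from the CLI output above
print(shape_matches((3, 28, 28, 1), signature_shape))  # True: a batch of three
print(shape_matches((3, 28, 28), signature_shape))     # False: channel axis missing
```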


In [16]:
!saved_model_cli show --dir {model_path} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['flatten_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: se

In [19]:
# Save the input instances to a numpy file
inputs_path = os.path.join("data", "my_mnist_tests.npy")
np.save(inputs_path, X_new)

In [21]:
# Get the name of the input layer in the model
input_name = model.input_names[0]
input_name

'flatten_input'

Use the CLI to make predictions on these test instances.

In [22]:
!saved_model_cli run --dir {model_path} --tag_set serve \
                     --signature_def serving_default    \
                     --inputs {input_name}={inputs_path}

2021-03-04 21:45:40.324006: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-04 21:45:40.324038: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-03-04 21:45:42.226031: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-04 21:45:42.226193: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-03-04 21:45:42.226207: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-03-04 21:45:42.226230: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running 

In [24]:
np.round(
    [[1.1347984e-04, 1.5187356e-07, 9.7032893e-04, 2.7640699e-03, 3.7826971e-06, 7.6876910e-05, 3.9140293e-08, 9.9559116e-01, 5.3502394e-05, 4.2665208e-04],
    [8.2443521e-04, 3.5493889e-05, 9.8826385e-01, 7.0466995e-03, 1.2957400e-07, 2.3389691e-04, 2.5639210e-03, 9.5886099e-10, 1.0314899e-03, 8.7952529e-08],
    [4.4693781e-05, 9.7028232e-01, 9.0526715e-03, 2.2641101e-03, 4.8766597e-04, 2.8800720e-03, 2.2714981e-03, 8.3753867e-03, 4.0439744e-03, 2.9759688e-04]],
    2,
)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.01, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.97, 0.01, 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.  ]])
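
Each row is a probability distribution over the ten digit classes, so the predicted digit is the index of the largest entry. A short sketch using the rounded values shown above:

```python
import numpy as np

# Rounded probabilities copied from the output above
y_proba = np.array([
    [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
    [0., 0., 0.99, 0.01, 0., 0., 0., 0., 0., 0.],
    [0., 0.97, 0.01, 0., 0., 0., 0., 0.01, 0., 0.],
])

# Index of the most probable class in each row
y_pred = np.argmax(y_proba, axis=1)
print(y_pred)  # [7 2 1]
```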