# Chapter 19 – Training and Deploying TensorFlow Models at Scale

This notebook contains all the sample code and solutions to the exercises in chapter 19.

## Setup
This project requires Python 3.7 or above:

In [1]:
import sys

assert sys.version_info >= (3, 7)

**Warning**: the latest TensorFlow versions are based on Keras 3. For chapters 10-15, it wasn't too hard to update the code to support Keras 3, but unfortunately it's much harder for this chapter, so I've had to revert to Keras 2. To do that, I set the ```TF_USE_LEGACY_KERAS``` environment variable to ```"1"``` and import the ```tf_keras``` package. This ensures that ```tf.keras``` points to ```tf_keras```, which is Keras 2.*.

In [2]:
IS_COLAB = "google.colab" in sys.modules
if IS_COLAB:
    import os
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    import tf_keras

And TensorFlow ≥ 2.8:

In [3]:
from packaging import version
import tensorflow as tf

assert version.parse(tf.__version__) >= version.parse("2.8.0")

If running on Colab or Kaggle, you need to install the Google AI Platform client library, which will be used later in this notebook. You can ignore the warnings about version incompatibilities.

* **Warning**: On Colab, you must restart the Runtime after the installation, and continue with the next cells.

In [4]:
import sys
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    %pip install -q -U google-cloud-aiplatform

This chapter discusses how to run or train a model on one or more GPUs, so let's make sure there's at least one, or else issue a warning:

In [5]:
if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. Neural nets can be very slow without a GPU.")
    if "google.colab" in sys.modules:
        print("Go to Runtime > Change runtime and select a GPU hardware "
              "accelerator.")
    if "kaggle_secrets" in sys.modules:
        print("Go to Settings > Accelerator and select GPU.")

# Serving a TensorFlow Model
Let's start by deploying a model using TF Serving, then we'll deploy to Google Vertex AI.

## Using TensorFlow Serving
The first thing we need to do is to build and train a model, and export it to the SavedModel format.

## Exporting SavedModels
Let's load the MNIST dataset, scale it, and split it.

In [None]:
from pathlib import Path
import tensorflow as tf

# extra code – load and split the MNIST dataset
mnist = tf.keras.datasets.mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = mnist
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# extra code – build & train an MNIST model (also handles image preprocessing)
tf.random.set_seed(42)
tf.keras.backend.clear_session()
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28], dtype=tf.uint8),
    tf.keras.layers.Rescaling(scale=1 / 255),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

model_name = "my_mnist_model"
model_version = "0001"
model_path = Path(model_name) / model_version
model.save(model_path, save_format="tf")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10

Let's take a look at the file tree (we discussed what each of these files is used for in chapter 10):

In [None]:
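import os

# Machine-specific: model_path is relative ("my_mnist_model/0001"), so the
# working directory must be the folder that contains my_mnist_model, not
# my_mnist_model itself. Adjust the path below to your own setup.
#os.chdir("/content/drive/My Drive/path/to/your/model")
os.chdir("C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/")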

In [None]:
sorted([str(path) for path in model_path.parent.glob("**/*")])  # extra code

['my_mnist_model\\0001',
 'my_mnist_model\\0001\\assets',
 'my_mnist_model\\0001\\keras_metadata.pb',
 'my_mnist_model\\0001\\saved_model.pb',
 'my_mnist_model\\0001\\variables',
 'my_mnist_model\\0001\\variables\\variables.data-00000-of-00001',
 'my_mnist_model\\0001\\variables\\variables.index']

In [None]:
model_path

WindowsPath('my_mnist_model/0001')
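
TF Serving treats each numbered subdirectory of the model's base path as one version and, by default, serves the one with the highest number. The following is a minimal sketch of that layout convention using only the standard library. The helper names `list_versions` and `next_version_dir` are hypothetical and not part of TensorFlow.

```python
from pathlib import Path
import tempfile


def list_versions(base_dir):
    """Return the numeric version subdirectories of base_dir, sorted ascending."""
    base_dir = Path(base_dir)
    if not base_dir.is_dir():
        return []
    return sorted(int(p.name) for p in base_dir.iterdir()
                  if p.is_dir() and p.name.isdecimal())


def next_version_dir(base_dir, width=4):
    """Return the path for the next version, zero-padded like "0001"."""
    versions = list_versions(base_dir)
    next_version = versions[-1] + 1 if versions else 1
    return Path(base_dir) / f"{next_version:0{width}d}"


# Demo in a throwaway directory, so no real model files are touched
with tempfile.TemporaryDirectory() as tmp:
    demo_base = Path(tmp) / "my_mnist_model"
    (demo_base / "0001").mkdir(parents=True)
    demo_versions = list_versions(demo_base)
    demo_next = next_version_dir(demo_base).name
```

With such a helper, saving a retrained model to `next_version_dir(model_name)` would add a new version next to `0001` rather than overwrite it.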

Let's inspect the SavedModel:

In [None]:
!saved_model_cli show --dir '{model_path}'

Traceback (most recent call last):
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\schre\anaconda3\envs\tf_dev\Scripts\saved_model_cli.exe\__main__.py", line 7, in <module>
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 1285, in main
    args.func(args)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 753, in show
    _show_tag_sets(args.dir)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 71, in _show_tag_sets
    tag_sets = saved_model_utils.get_saved_model_tag_sets(saved_model_dir)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_mod

In [None]:
!saved_model_cli show --dir '{model_path}' --tag_set serve

Traceback (most recent call last):
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\schre\anaconda3\envs\tf_dev\Scripts\saved_model_cli.exe\__main__.py", line 7, in <module>
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 1285, in main
    args.func(args)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 756, in show
    _show_signature_def_map_keys(args.dir, args.tag_set)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 89, in _show_signature_def_map_keys
    signature_def_map = get_signature_def_map(saved_model_dir, tag_set)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-pac

In [None]:
!saved_model_cli show --dir '{model_path}' --tag_set serve \
                      --signature_def serving_default

Traceback (most recent call last):
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\schre\anaconda3\envs\tf_dev\Scripts\saved_model_cli.exe\__main__.py", line 7, in <module>
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 1285, in main
    args.func(args)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 758, in show
    _show_inputs_outputs(args.dir, args.tag_set, args.signature_def)
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\site-packages\tensorflow\python\tools\saved_model_cli.py", line 152, in _show_inputs_outputs
    meta_graph_def = saved_model_utils.get_meta_graph_def(saved_model_dir,
  File "C:\Users\schre\anaconda3\envs\tf_dev\lib\

For even more details, you can run the following command:

```!saved_model_cli show --dir '{model_path}' --all```
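
The tracebacks above are truncated, so their cause cannot be confirmed here. One possible cause (an assumption) is shell quoting: on Windows, `!` commands run through `cmd.exe`, which does not strip single quotes, so the CLI may receive the quotes as part of the path. Below is a minimal sketch that sidesteps shell quoting by passing the arguments as a list to `subprocess.run`. The helper name `build_show_command` is hypothetical; the flags are the ones used in the cells above.

```python
import shutil
import subprocess
from pathlib import Path


def build_show_command(model_dir, tag_set=None, signature_def=None):
    """Build the saved_model_cli argument list (no shell quoting involved)."""
    cmd = ["saved_model_cli", "show", "--dir", str(Path(model_dir))]
    if tag_set is not None:
        cmd += ["--tag_set", tag_set]
    if signature_def is not None:
        cmd += ["--signature_def", signature_def]
    return cmd


cmd = build_show_command(Path("my_mnist_model") / "0001",
                         tag_set="serve", signature_def="serving_default")

# Only run the CLI if it is on the PATH and the model directory exists
cli_path = shutil.which("saved_model_cli")
if cli_path and Path(cmd[3]).is_dir():
    result = subprocess.run([cli_path] + cmd[1:],
                            capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Because each argument is passed separately, the path reaches the CLI unchanged on both Windows and Linux.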

## Installing and Starting TensorFlow Serving
If you are running this notebook in Colab or Kaggle, TensorFlow Serving needs to be installed:

In [None]:
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    url = "https://storage.googleapis.com/tensorflow-serving-apt"
    src = "stable tensorflow-model-server tensorflow-model-server-universal"
    !echo 'deb {url} {src}' > /etc/apt/sources.list.d/tensorflow-serving.list
    !curl '{url}/tensorflow-serving.release.pub.gpg' | apt-key add -
    !apt update -q && apt-get install -y tensorflow-model-server
    %pip install -q -U tensorflow-serving-api

If ```tensorflow_model_server``` is installed (e.g., if you are running this notebook in Colab), then the following two cells will start the server. If your OS is Windows, you may need to run the ```tensorflow_model_server``` command in a terminal, and replace ```${MODEL_DIR}``` with the full path to the ```my_mnist_model``` directory.

In [None]:
import os

os.environ["MODEL_DIR"] = str(model_path.parent.absolute())

In [None]:
%%bash --bg
tensorflow_model_server \
    --port=8500 \
    --rest_api_port=8501 \
    --model_name=my_mnist_model \
    --model_base_path="${MODEL_DIR}" >my_server.log 2>&1

In [None]:
import time

time.sleep(2) # let's wait a couple seconds for the server to start
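
A fixed two-second pause may be too short on a slow machine. A minimal sketch of an alternative: poll the port until it accepts TCP connections or a timeout expires. The helper name `wait_for_port` is hypothetical. Note that an open port only means the server process is listening; the model itself may still be loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means something is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # wait a little before retrying
```

For example, `wait_for_port("localhost", 8501)` could replace the `time.sleep(2)` call, since 8501 is the REST port passed to `tensorflow_model_server` above.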

If you are running this notebook on your own machine, and you prefer to install TF Serving using Docker, first make sure Docker is installed, then run the following commands in a terminal. You must replace ```/path/to/my_mnist_model``` with the appropriate absolute path to the ```my_mnist_model``` directory, but do not modify the container path ```/models/my_mnist_model```.

```
docker pull tensorflow/serving  # downloads the latest TF Serving image

docker run -it --rm -v "/path/to/my_mnist_model:/models/my_mnist_model" \
    -p 8500:8500 -p 8501:8501 -e MODEL_NAME=my_mnist_model tensorflow/serving
```
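
The host side of the `-v` mount must be an absolute path. Below is a minimal sketch, assuming the model was saved under the current working directory, that resolves the path in Python and prints the corresponding `docker run` command. The helper name `docker_run_command` is hypothetical; the flags are copied from the command above.

```python
import shlex
from pathlib import Path


def docker_run_command(model_dir, model_name="my_mnist_model"):
    """Return the docker run argument list for serving model_dir."""
    host_path = Path(model_dir).resolve()  # absolute path on the host
    container_path = f"/models/{model_name}"  # container side stays fixed
    return ["docker", "run", "-it", "--rm",
            "-v", f"{host_path}:{container_path}",
            "-p", "8500:8500", "-p", "8501:8501",
            "-e", f"MODEL_NAME={model_name}",
            "tensorflow/serving"]


docker_cmd = docker_run_command("my_mnist_model")
# Quoting follows POSIX shell rules; adjust by hand for other shells
print(" ".join(shlex.quote(arg) for arg in docker_cmd))
```

The printed command can then be pasted into a terminal instead of editing `/path/to/my_mnist_model` by hand.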

## Querying TF Serving through the REST API
Next, let's send a REST query to TF Serving: