# Chapter 19 – Training and Deploying TensorFlow Models at Scale

This notebook contains all the sample code and solutions to the exercises in chapter 19.

## Setup
This project requires Python 3.7 or above:

In [1]:
import sys

assert sys.version_info >= (3, 7)

**Warning**: the latest TensorFlow versions are based on Keras 3. For chapters 10-15, it wasn't too hard to update the code to support Keras 3, but unfortunately it's much harder for this chapter, so I've had to revert to Keras 2. To do that, I set the ```TF_USE_LEGACY_KERAS``` environment variable to ```"1"``` and import the ```tf_keras``` package. This ensures that ```tf.keras``` points to ```tf_keras```, which is Keras 2.*.

In [2]:
IS_COLAB = "google.colab" in sys.modules
if IS_COLAB:
    import os
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    import tf_keras

And TensorFlow ≥ 2.8:

In [3]:
from packaging import version
import tensorflow as tf

assert version.parse(tf.__version__) >= version.parse("2.8.0")

If running on Colab or Kaggle, you need to install the Google AI Platform client library, which will be used later in this notebook. You can ignore the warnings about version incompatibilities.

* **Warning**: On Colab, you must restart the Runtime after the installation, and continue with the next cells.

In [4]:
import sys
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    %pip install -q -U google-cloud-aiplatform

This chapter discusses how to run or train a model on one or more GPUs, so let's make sure there's at least one, or else issue a warning:

In [5]:
if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. Neural nets can be very slow without a GPU.")
    if "google.colab" in sys.modules:
        print("Go to Runtime > Change runtime and select a GPU hardware "
              "accelerator.")
    if "kaggle_secrets" in sys.modules:
        print("Go to Settings > Accelerator and select GPU.")

# Serving a TensorFlow Model
Let's start by deploying a model using TF Serving, then we'll deploy to Google Vertex AI.

## Using TensorFlow Serving
The first thing we need to do is to build and train a model, and export it to the SavedModel format.

## Exporting SavedModels
Let's load the MNIST dataset, scale it, and split it.

In [6]:
from pathlib import Path
import tensorflow as tf

# extra code – load and split the MNIST dataset
mnist = tf.keras.datasets.mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = mnist
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# extra code – build & train an MNIST model (also handles image preprocessing)
tf.random.set_seed(42)
tf.keras.backend.clear_session()
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28], dtype=tf.uint8),
    tf.keras.layers.Rescaling(scale=1 / 255),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

model_name = "my_mnist_model"
model_version = "0001"
model_path = Path(model_name) / model_version
model.save(model_path, save_format="tf")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
INFO:tensorflow:Assets written to: my_mnist_model\0001\assets


Let's take a look at the file tree (we've discussed what each of these files is used for in chapter 10):

In [7]:
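import os

# Machine-specific path: point this at your own copy of the model directory.
# Note that relative paths used later (e.g., Path(model_name) / model_version)
# are resolved against this new working directory.
#os.chdir("/content/drive/My Drive/path/to/your/model")
#os.chdir("C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/")
os.chdir("C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/my_mnist_model")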

In [8]:
from pathlib import Path
import os

# Replace with the actual absolute path to your model directory.
# The raw string (r"") only matters if the path contains backslashes.
absolute_path = r"C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/my_mnist_model/0001/"
model_path = Path(absolute_path)

print(model_path)
print(model_path.parent)
print(os.listdir(model_path.parent))

C:\Users\schre\OneDrive\Documents\GitHub\HOML3e\my_mnist_model\0001
C:\Users\schre\OneDrive\Documents\GitHub\HOML3e\my_mnist_model
['0001']


In [9]:
sorted([str(path) for path in model_path.parent.glob("**/*")])  # extra code

['C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\assets',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\keras_metadata.pb',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\saved_model.pb',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\variables',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\variables\\variables.data-00000-of-00001',
 'C:\\Users\\schre\\OneDrive\\Documents\\GitHub\\HOML3e\\my_mnist_model\\0001\\variables\\variables.index']

In [10]:
model_path

WindowsPath('C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/my_mnist_model/0001')

In [11]:
model_path.parent

WindowsPath('C:/Users/schre/OneDrive/Documents/GitHub/HOML3e/my_mnist_model')

In [12]:
import os
print(os.listdir(model_path))

['assets', 'keras_metadata.pb', 'saved_model.pb', 'variables']


Let's inspect the SavedModel:

In [13]:
!saved_model_cli show --dir {model_path}

The given SavedModel contains the following tag-sets:
'serve'


In [14]:
!saved_model_cli show --dir {model_path} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['flatten_input'] tensor_info:
        dtype: DT_UINT8
        shape: (-1, 28, 28)
        name: serving_default_flatten_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

Concrete Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          inputs: 

In [15]:
!saved_model_cli show --dir {model_path} --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serving_default"


In [16]:
!saved_model_cli show --dir {model_path} --tag_set serve \
                      --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['flatten_input'] tensor_info:
      dtype: DT_UINT8
      shape: (-1, 28, 28)
      name: serving_default_flatten_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['dense_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict


For even more details, you can run the following command:

```!saved_model_cli show --dir '{model_path}' --all```

## Installing and Starting TensorFlow Serving
If you are running this notebook in Colab or Kaggle, TensorFlow Serving needs to be installed:

In [28]:
if "google.colab" in sys.modules or "kaggle_secrets" in sys.modules:
    url = "https://storage.googleapis.com/tensorflow-serving-apt"
    src = "stable tensorflow-model-server tensorflow-model-server-universal"
    !echo 'deb {url} {src}' > /etc/apt/sources.list.d/tensorflow-serving.list
    !curl '{url}/tensorflow-serving.release.pub.gpg' | apt-key add -
    !apt update -q && apt-get install -y tensorflow-model-server
    %pip install -q -U tensorflow-serving-api

If ```tensorflow_model_server``` is installed (e.g., if you are running this notebook in Colab), then the following cells will start the server. If your OS is Windows, you may need to run the ```tensorflow_model_server``` command in a terminal, and replace ```${MODEL_DIR}``` with the full path to the ```my_mnist_model``` directory.

In [35]:
import os

# Expose the model's base directory (the parent of the version directories)
# to the shell cell below through the MODEL_DIR environment variable
os.environ["MODEL_DIR"] = str(model_path.parent.absolute())

In [30]:
%%bash --bg
tensorflow_model_server \
    --port=8500 \
    --rest_api_port=8501 \
    --model_name=my_mnist_model \
    --model_base_path="${MODEL_DIR}" >my_server.log 2>&1

In [31]:
import time

time.sleep(2) # let's wait a couple seconds for the server to start
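
A fixed two-second sleep may be too short on a slow machine. As a minimal sketch (not part of the original notebook), you could instead poll the REST port until it accepts TCP connections. The helper name `wait_for_port` and its default values are hypothetical, and a successful connection only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8501, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, else False.

    Hypothetical helper: it only checks that a process is listening on the
    port, not that TF Serving has finished loading the model.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet: wait a bit, then try again
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8501)` could replace the `time.sleep(2)` call above, with the returned boolean telling you whether it is worth sending a query at all.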

If you are running this notebook on your own machine, and you prefer to install TF Serving using Docker, first make sure Docker is installed, then run the following commands in a terminal. You must replace ```/path/to/my_mnist_model``` with the appropriate absolute path to the ```my_mnist_model``` directory, but do not modify the container path ```/models/my_mnist_model```.

```
docker pull tensorflow/serving  # downloads the latest TF Serving image

docker run -it --rm -v "/path/to/my_mnist_model:/models/my_mnist_model" \
    -p 8500:8500 -p 8501:8501 -e MODEL_NAME=my_mnist_model tensorflow/serving
```

## Querying TF Serving through the REST API
Next, let's send a REST query to TF Serving:

In [32]:
import json

X_new = X_test[:3]  # pretend we have 3 new digit images to classify
request_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})

In [33]:
request_json[:100] + "..." + request_json[-10:]

'{"signature_name": "serving_default", "instances": [[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0..., 0, 0]]]}'
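
To make the request and response shapes easier to see, here is a small sketch that needs no running server. It builds a request in the same row ("instances") format as above and decodes a response body into class indices. The two helper functions, the tiny 2x2 "images", and the response string are all made up for illustration; only the `"signature_name"`, `"instances"`, and `"predictions"` keys come from the cells in this notebook.

```python
import json


def build_predict_request(instances, signature_name="serving_default"):
    """Serialize a batch in the row ("instances") format used above."""
    return json.dumps({"signature_name": signature_name,
                       "instances": instances})


def predicted_classes(response_text):
    """Return the index of the highest score for each row of "predictions"."""
    predictions = json.loads(response_text)["predictions"]
    return [max(range(len(row)), key=row.__getitem__) for row in predictions]


# Two made-up 2x2 "images" instead of real 28x28 MNIST digits
payload = build_predict_request([[[0, 255], [0, 0]], [[0, 0], [255, 0]]])

# Made-up response body: one list of class scores per input instance
fake_response = '{"predictions": [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]}'
print(predicted_classes(fake_response))  # [1, 0]
```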

Now let's use TensorFlow Serving's REST API to make predictions:

In [34]:
import requests

server_url = "http://localhost:8501/v1/models/my_mnist_model:predict"
response = requests.post(server_url, data=request_json)
response.raise_for_status()  # raise an exception in case of error
response = response.json()

KeyboardInterrupt: 

In [36]:
import numpy as np

y_proba = np.array(response["predictions"])
y_proba.round(2)

NameError: name 'response' is not defined

## Querying TF Serving through the gRPC API

In [None]:
from tensorflow_serving.apis.predict_pb2 import PredictRequest

request = PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = "serving_default"
input_name = model.input_names[0]  # == "flatten_input"
request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))

In [37]:
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)

ModuleNotFoundError: No module named 'tensorflow_serving'

Convert the response to a tensor:

In [38]:
output_name = model.output_names[0]
outputs_proto = response.outputs[output_name]
y_proba = tf.make_ndarray(outputs_proto)

NameError: name 'response' is not defined

In [None]:
y_proba.round(2)

If your client does not include the TensorFlow library, you can convert the response to a NumPy array like this:

# extra code – shows how to avoid using tf.make_ndarray()
output_name = model.output_names[0]
outputs_proto = response.outputs[output_name]
shape = [dim.size for dim in outputs_proto.tensor_shape.dim]
y_proba = np.array(outputs_proto.float_val).reshape(shape)
y_proba.round(2)
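
The reshape above works because the values arrive as one flat list in row-major order, alongside the tensor's shape. Here is a pure-Python sketch of the same idea for clients that do not even have NumPy. The function name `reshape_row_major` is hypothetical, and the flat list and shape are made-up stand-ins for `outputs_proto.float_val` and the `tensor_shape` dimensions.

```python
def reshape_row_major(flat_values, shape):
    """Rebuild nested lists from a flat, row-major list of values."""
    if not shape:
        # Scalar case: a rank-0 tensor holds exactly one value
        (value,) = flat_values
        return value
    size, *rest = shape
    if size == 0:
        return []
    step, remainder = divmod(len(flat_values), size)
    if remainder:
        raise ValueError(
            f"cannot reshape {len(flat_values)} values into {shape}")
    # Split the flat list into `size` equal chunks and recurse on each one
    return [reshape_row_major(flat_values[i * step:(i + 1) * step], rest)
            for i in range(size)]


# Made-up stand-ins for outputs_proto.float_val and the tensor_shape dims
flat = [0.1, 0.9, 0.8, 0.2, 0.5, 0.5]
print(reshape_row_major(flat, [3, 2]))
# [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
```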

## Deploying a new model version

In [40]:
# extra code – build and train a new MNIST model version
np.random.seed(42)
tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28], dtype=tf.uint8),
    tf.keras.layers.Rescaling(scale=1 / 255),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [41]:
model_version = "0002"
model_path = Path(model_name) / model_version
model.save(model_path, save_format="tf")

INFO:tensorflow:Assets written to: my_mnist_model\0002\assets


Let's take a look at the file tree again:

In [42]:
sorted([str(path) for path in model_path.parent.glob("**/*")])  # extra code

['my_mnist_model\\0002',
 'my_mnist_model\\0002\\assets',
 'my_mnist_model\\0002\\keras_metadata.pb',
 'my_mnist_model\\0002\\saved_model.pb',
 'my_mnist_model\\0002\\variables',
 'my_mnist_model\\0002\\variables\\variables.data-00000-of-00001',
 'my_mnist_model\\0002\\variables\\variables.index']

**Warning**: You may need to wait a minute before the new model is loaded by TensorFlow Serving.
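
Rather than guessing how long to wait, you could check which versions the server reports as loaded. The sketch below assumes that TF Serving's REST API returns a `model_version_status` list (with `version` and `state` fields) when you send a GET request to `http://localhost:8501/v1/models/my_mnist_model`. The helper name `available_versions` is hypothetical, and the sample body is made up so that the code runs without a server.

```python
import json


def available_versions(status_text):
    """Return the sorted version numbers whose state is "AVAILABLE"."""
    statuses = json.loads(status_text).get("model_version_status", [])
    return sorted(int(entry["version"]) for entry in statuses
                  if entry.get("state") == "AVAILABLE")


# Made-up status body, shaped like the assumed model status response
sample_status = json.dumps({
    "model_version_status": [
        {"version": "2", "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}},
        {"version": "1", "state": "END",
         "status": {"error_code": "OK", "error_message": ""}},
    ]
})
print(available_versions(sample_status))  # [2]
```

With a live server, the text passed to `available_versions()` would be the body of that GET response, and you could retry until version 2 shows up in the result.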

In [43]:
import requests

server_url = "http://localhost:8501/v1/models/my_mnist_model:predict"
response = requests.post(server_url, data=request_json)
response.raise_for_status()
response = response.json()

ConnectionError: HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001F1F7FAC7C0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

In [None]:
response.keys()

In [None]:
y_proba = np.array(response["predictions"])
y_proba.round(2)

## Creating a Prediction Service on Vertex AI
Follow the instructions in the book to create a Google Cloud Platform account and activate the Vertex AI and Cloud Storage APIs. Then, if you're running this notebook in Colab, you can run the following cell to authenticate using the same Google account as you used with Google Cloud Platform, and authorize this Colab to access your data.

WARNING: only do this if you trust this notebook!

Be extra careful if this is not the official notebook from https://github.com/ageron/handson-ml3: the Colab URL should start with https://colab.research.google.com/github/ageron/handson-ml3. Otherwise, the code could do whatever it wants with your data.

If you are not running this notebook in Colab, you must follow the instructions in the book to create a service account and generate a key for it, download it to this notebook's directory, and name it ```my_service_account_key.json``` (or make sure the ```GOOGLE_APPLICATION_CREDENTIALS``` environment variable points to your key).

In [None]:
project_id = "my_project"  ##### CHANGE THIS TO YOUR PROJECT ID #####

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
elif "kaggle_secrets" in sys.modules:
    from kaggle_secrets import UserSecretsClient
    UserSecretsClient().set_gcloud_credentials(project=project_id)
else:
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my_service_account_key.json"

In [None]:
from google.cloud import storage

bucket_name = "my_bucket"  ##### CHANGE THIS TO A UNIQUE BUCKET NAME #####
location = "us-central1"

storage_client = storage.Client(project=project_id)
bucket = storage_client.create_bucket(bucket_name, location=location)
#bucket = storage_client.bucket(bucket_name)  # to reuse a bucket instead

In [None]:
def upload_directory(bucket, dirpath):
    dirpath = Path(dirpath)
    for filepath in dirpath.glob("**/*"):
        if filepath.is_file():
            blob = bucket.blob(filepath.relative_to(dirpath.parent).as_posix())
            blob.upload_from_filename(filepath)

upload_directory(bucket, "my_mnist_model")
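
Before uploading anything, it can help to preview the blob names that `upload_directory()` would create. The following dry-run sketch applies the same naming rule (the path relative to the parent directory, with forward slashes) but never contacts Cloud Storage. The function name `plan_uploads` is hypothetical, and the directory tree is a throwaway one created just for the example.

```python
import tempfile
from pathlib import Path


def plan_uploads(dirpath):
    """Yield (local file, blob name) pairs without contacting Cloud Storage."""
    dirpath = Path(dirpath)
    for filepath in sorted(dirpath.glob("**/*")):
        if filepath.is_file():
            # Same naming rule as upload_directory(): path relative to the
            # parent directory, always written with forward slashes
            yield filepath, filepath.relative_to(dirpath.parent).as_posix()


# Build a throwaway directory tree that mimics a SavedModel layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_mnist_model"
    (root / "0001" / "variables").mkdir(parents=True)
    (root / "0001" / "saved_model.pb").touch()
    (root / "0001" / "variables" / "variables.index").touch()
    blob_names = [name for _, name in plan_uploads(root)]

print(blob_names)
```

Because the blob name keeps the top-level directory name, the uploaded files end up under `my_mnist_model/` in the bucket, which also holds on Windows since `as_posix()` converts backslashes to forward slashes.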

In [None]:
# extra code – a much faster multithreaded implementation of upload_directory()
#              which also accepts a prefix for the target path, and prints stuff

from concurrent import futures

def upload_file(bucket, filepath, blob_path):
    blob = bucket.blob(blob_path)
    blob.upload_from_filename(filepath)

def upload_directory(bucket, dirpath, prefix=None, max_workers=50):
    dirpath = Path(dirpath)
    prefix = prefix or dirpath.name
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_filepath = {
            executor.submit(
                upload_file,
                bucket, filepath,
                f"{prefix}/{filepath.relative_to(dirpath).as_posix()}"
            ): filepath
            for filepath in sorted(dirpath.glob("**/*"))
            if filepath.is_file()
        }
        for future in futures.as_completed(future_to_filepath):
            filepath = future_to_filepath[future]
            try:
                result = future.result()
            except Exception as ex:
                print(f"Error uploading {filepath!s:60}: {ex}")  # f!s is str(f)
            else:
                print(f"Uploaded {filepath!s:60}", end="\r")

    print(f"Uploaded {dirpath!s:60}")

Alternatively, if you installed the Google Cloud CLI (it's preinstalled on Colab), then you can use the following ```gsutil``` command:

In [None]:
#!gsutil -m cp -r my_mnist_model gs://{bucket_name}/