## 10.2 TensorFlow Serving

To build the app, we need to convert the Keras model (saved as a `.keras` file) into a special format called TensorFlow SavedModel. To do that, we first download a pretrained model and save it in the working directory:

In [1]:
!wget https://github.com/DataTalksClub/machine-learning-zoomcamp/releases/download/dl-models/clothing-model-new.keras

--2025-12-13 17:17:21--  https://github.com/DataTalksClub/machine-learning-zoomcamp/releases/download/dl-models/clothing-model-new.keras
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://release-assets.githubusercontent.com/github-production-release-asset/256401220/2cd45069-ddd0-4c0e-9011-88ddb11056d1?sp=r&sv=2018-11-09&sr=b&spr=https&se=2025-12-13T13%3A02%3A54Z&rscd=attachment%3B+filename%3Dclothing-model-new.keras&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skt=2025-12-13T12%3A02%3A24Z&ske=2025-12-13T13%3A02%3A54Z&sks=b&skv=2018-11-09&sig=1PSZBR7Ldz4BcOLyY%2F15%2BHjQLLRgcy0X7%2B8ketMZNEE%3D&jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmVsZWFzZS1hc3NldHMuZ2l0aHVidXNlcmNvbnRlbnQuY29tIiwia2V5Ijoia2V5MSIsImV4cCI6MTc2NTYzMDA0MSwibmJmIjoxNzY1NjI4MjQ

Then convert the model to SavedModel format:

In [56]:
import tensorflow as tf
from tensorflow import keras

model_name = 'clothing-model-new.keras'
model = keras.models.load_model(model_name)
model.export('clothing-model')

INFO:tensorflow:Assets written to: clothing-model/assets

Saved artifact at 'clothing-model'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): List[TensorSpec(shape=(None, 299, 299, 3), dtype=tf.float32, name='input_layer_6')]
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  124750956129808: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728370896: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728369744: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728373200: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728371088: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728373776: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728373584: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728373008: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728372816: TensorSpec(shape=(), dtype=tf.resource, name=None)
  124749728382416: TensorSpec(shape=(), dtype=tf.resource, name=None)
  12474972

In [43]:
!pip install protobuf==6.33.2

Collecting protobuf==6.33.2
  Using cached protobuf-6.33.2-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Using cached protobuf-6.33.2-cp39-abi3-manylinux2014_x86_64.whl (323 kB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 5.28.0
    Uninstalling protobuf-5.28.0:
      Successfully uninstalled protobuf-5.28.0
Successfully installed protobuf-6.33.2

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


We can inspect the contents of the SavedModel with `saved_model_cli`, a utility that ships with TensorFlow:

In [44]:
!saved_model_cli show --dir clothing-model --all

2025-12-13 18:04:01.369118: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-13 18:04:01.386018: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-12-13 18:04:01.927509: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-12-13 18:04:04.737765: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off,

The command prints quite a lot of output, but we are only interested in the signature definition, specifically this one:

```
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_layer_6'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_layer_6:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall_1:0
  Method name is: tensorflow/serving/predict
```

We are interested in three values from it: the signature name, the input name, and the output name:

```
serving_default
input_layer_6
output_0
```
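
If we prefer not to copy these names by hand, they can be pulled out of the CLI text with a few regular expressions. The snippet below is a minimal sketch: `extract_signature` is a hypothetical helper, and it assumes the output keeps the `signature_def[...]`, `inputs[...]`, and `outputs[...]` layout shown above. The full `--all` output may list additional signatures, so the results are returned as lists.

```python
import re

# Excerpt of the `saved_model_cli show` output from this section.
cli_output = """
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_layer_6'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_layer_6:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall_1:0
  Method name is: tensorflow/serving/predict
"""


def extract_signature(text):
    """Collect signature, input, and output names from the CLI text (hypothetical helper)."""
    return {
        "signature_name": re.findall(r"signature_def\['([^']+)'\]", text),
        "inputs": re.findall(r"inputs\['([^']+)'\]", text),
        "outputs": re.findall(r"outputs\['([^']+)'\]", text),
    }


names = extract_signature(cli_output)
print(names)
```

These are the same three values we will later put into the gRPC request.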

Now we can serve the model (`clothing-model`) with the prebuilt Docker image `tensorflow/serving:latest`:

```
docker run -it --rm \
  -p 8500:8500 \
  -v $(pwd)/clothing-model:/models/clothing-model/1 \
  -e MODEL_NAME="clothing-model" \
  tensorflow/serving:latest
```
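
The command mounts the exported model under `/models/<model name>/<version>` inside the container and exposes port 8500, which TensorFlow Serving uses for gRPC. To make that mapping explicit, here is a small sketch that assembles the same command from its parts. The helper `serving_command` and its parameters are hypothetical, introduced only for illustration; it builds the command but does not run it.

```python
import shlex
from pathlib import Path


def serving_command(model_dir, model_name, version=1, port=8500):
    """Build a `docker run` command for TensorFlow Serving (hypothetical helper)."""
    host_path = Path(model_dir).resolve()
    # TensorFlow Serving looks for models in /models/<MODEL_NAME>/<version>.
    container_path = f"/models/{model_name}/{version}"
    return [
        "docker", "run", "-it", "--rm",
        "-p", f"{port}:8500",  # host port -> gRPC port in the container
        "-v", f"{host_path}:{container_path}",
        "-e", f"MODEL_NAME={model_name}",
        "tensorflow/serving:latest",
    ]


cmd = serving_command("clothing-model", "clothing-model")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would start the container, assuming Docker is installed and the image can be pulled.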

TensorFlow Serving communicates over gRPC, a protocol optimized for binary data. Before sending a request, we need to convert our input data into the protobuf format.
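
To get a feeling for why a binary format helps, we can compare the size of the same numbers encoded as JSON text and as packed 32-bit floats. This is only an illustration: `struct.pack` stands in for a binary encoding and is not the actual protobuf wire format, and the values are made up.

```python
import json
import struct

# Made-up values; a real request for this model carries 299 * 299 * 3 floats.
values = [0.1234567, -0.7654321, 0.5, -1.0] * 250  # 1000 numbers

as_json = json.dumps(values).encode("utf-8")
as_binary = struct.pack(f"<{len(values)}f", *values)  # 4 bytes per float32

print(len(as_binary))
print(len(as_json) > len(as_binary))
```

For image-sized inputs the difference adds up quickly, which is one reason to prefer gRPC over a text-based API here.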

To communicate with the model, we need to install a gRPC client and the TensorFlow Serving API. We also need `keras-image-helper` to load and preprocess images:

In [5]:
!pip install grpcio tensorflow-serving-api keras-image-helper

Collecting grpcio
  Using cached grpcio-1.76.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.7 kB)
Collecting tensorflow-serving-api
  Downloading tensorflow_serving_api-2.19.1-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting keras-image-helper
  Using cached keras_image_helper-0.0.2-py3-none-any.whl.metadata (3.5 kB)
INFO: pip is looking at multiple versions of tensorflow to determine which version is compatible with other requirements. This could take a while.
Collecting tensorflow<3,>=2.19.1 (from tensorflow-serving-api)
  Using cached tensorflow-2.20.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<7.0.0dev,>=3.20.3 (from tensorflow-serving-api)
  Downloading protobuf-6.33.2-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Collecting tensorboard~=2.20.0 (from tensorflow<3,>=2.19.1->tensorflow-serving-api)
  Using cached tensorboard-2.20.0-py3-none

In [12]:
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

In [33]:
host = 'localhost:8500'
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

In [14]:
from keras_image_helper import create_preprocessor

In [17]:
preprocessor = create_preprocessor('xception', target_size=(299,299))

In [22]:
url = 'http://bit.ly/mlbookcamp-pants'
X = preprocessor.from_url(url)

In [36]:
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

In [50]:
request = predict_pb2.PredictRequest()
request.model_spec.name = 'clothing-model'
request.model_spec.signature_name = 'serving_default'
request.inputs['input_layer_6'].CopyFrom(np_to_protobuf(X))

In [51]:
response = stub.Predict(request, timeout=10.0)
response

model_spec {
  name: "clothing-model"
  version {
    value: 1
  }
  signature_name: "serving_default"
}
outputs {
  key: "output_0"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 10
      }
    }
    float_val: -2.36843228
    float_val: -4.41473389
    float_val: -2.30905557
    float_val: -1.73974895
    float_val: 8.66011
    float_val: -3.19498897
    float_val: -5.59543562
    float_val: 2.74879694
    float_val: -2.99564838
    float_val: -4.3634305
  }
}

In [53]:
preds = response.outputs['output_0'].float_val
preds

[-2.3684322834014893, -4.41473388671875, -2.3090555667877197, -1.7397489547729492, 8.660110473632812, -3.194988965988159, -5.595435619354248, 2.7487969398498535, -2.9956483840942383, -4.363430500030518]

In [52]:
classes = [
    "dress",
    "hat",
    "longsleeve",
    "outwear",
    "pants",
    "shirt",
    "shoes",
    "shorts",
    "skirt",
    "t-shirt",
]

In [55]:
dict(zip(classes, preds))

{'dress': -2.3684322834014893,
 'hat': -4.41473388671875,
 'longsleeve': -2.3090555667877197,
 'outwear': -1.7397489547729492,
 'pants': 8.660110473632812,
 'shirt': -3.194988965988159,
 'shoes': -5.595435619354248,
 'shorts': 2.7487969398498535,
 'skirt': -2.9956483840942383,
 't-shirt': -4.363430500030518}
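
The values returned by the model look like raw scores (logits) rather than probabilities, since several of them are negative. Assuming that is the case, we can turn them into probabilities with a softmax and pick the most likely class. The sketch below uses only the standard library and the numbers from the response above; the `softmax` helper is ours, not part of the serving API.

```python
import math

classes = [
    "dress", "hat", "longsleeve", "outwear", "pants",
    "shirt", "shoes", "shorts", "skirt", "t-shirt",
]

# Raw scores copied from the response above.
preds = [
    -2.3684322834014893, -4.41473388671875, -2.3090555667877197,
    -1.7397489547729492, 8.660110473632812, -3.194988965988159,
    -5.595435619354248, 2.7487969398498535, -2.9956483840942383,
    -4.363430500030518,
]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = dict(zip(classes, softmax(preds)))
best = max(probs, key=probs.get)
print(best)  # pants
```

The ordering of the classes is the same with or without softmax, so for choosing the top prediction the raw scores are already enough.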

## 10.3 Creating a pre-processing service