# 10.1 Kubernetes and TensorFlow Serving

In this chapter we'll use TensorFlow Serving (TF Serving) to serve our clothing classification model. TF Serving is a tool from the TensorFlow family that was created specifically for serving TensorFlow models. It is written in C++, so it's very efficient, but it focuses only on inference. You cannot do anything else with it.

How does it work?
TF Serving receives a request containing the matrix X, which is the already prepared image. The result is a NumPy array with 10 predictions (one per class, because our model has 10 classes). The user will not do the preprocessing, so we need a component between the user and TF Serving, which we call the gateway. The gateway gets a URL, downloads the image, resizes it, turns it into a NumPy array, and preprocesses it. It also post-processes the output of TF Serving and returns the predictions in a consumable format (e.g. JSON). The only thing the user needs to do is upload the image to the website that uses the gateway. To implement the gateway we'll use Flask. Then we'll take the gateway and TF Serving and deploy both to Kubernetes.

One benefit of TF Serving is that it can use a GPU to apply the model, which matters because inference involves a lot of matrix multiplications.
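
To make the gateway's two jobs more concrete, here is a minimal sketch of the preprocessing and post-processing steps in plain Python. It makes several assumptions: the class labels are placeholders (the text only says there are 10 classes), the pixel scaling to the range -1..1 is the usual Xception-style scaling and may differ from what the gateway finally uses, and the function names `scale_pixels` and `postprocess` are hypothetical.

```python
import json

# Hypothetical labels: the text only says that there are 10 classes.
CLASSES = [f"class_{i}" for i in range(10)]


def scale_pixels(pixels):
    """Map 0..255 pixel values to -1..1 (assumed Xception-style scaling)."""
    return [p / 127.5 - 1.0 for p in pixels]


def postprocess(scores, classes=CLASSES):
    """Pair each raw score with its class label so the result is JSON-friendly."""
    if len(scores) != len(classes):
        raise ValueError("expected exactly one score per class")
    return {label: float(score) for label, score in zip(classes, scores)}


if __name__ == "__main__":
    print(scale_pixels([0, 127.5, 255]))
    fake_scores = [0.0] * 9 + [1.5]  # made-up scores, not real model output
    print(json.dumps(postprocess(fake_scores)))
```

A real gateway would apply the scaling to a whole NumPy image array and send it to TF Serving between these two steps; the sketch only shows the shape of the data on either side of that call.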

How is this chapter organized?
- We'll take the model we already trained with Keras and convert it to the format that TF Serving expects, which is called the "saved_model" format.
- We'll deploy this model locally with Docker and see how to interact with it.
- After that we'll create the preprocessing service that we called the gateway. This gives us two servers, each running in its own Docker container, and we need to ensure that they can talk to each other.
- Then we'll talk about Docker Compose as a way of running two services that communicate with each other on one machine.
- Then we'll look at the main concepts from Kubernetes.
- After that we'll deploy a simple application to Kubernetes and set it up. We'll run Kubernetes locally using a tool called Kind, which is a lightweight Kubernetes that you can run on your local machine.
- Then we'll take the services that we created and deploy them to Kubernetes.
- Finally we'll move these services from our local Kubernetes cluster to a cluster in the cloud. We'll use EKS, which is a managed Kubernetes service from AWS, but the same approach should work with any cloud provider.


# 10.2 TensorFlow Serving
## The saved_model format
Here we'll again use the model that was trained for the book (xception_v4_large_08_0.894.h5). We can download it with wget and save it as clothing-model-v4.h5. Next we convert the model from the h5 format to the saved_model format, which takes only a few lines of code. You can run them in IPython.

In [1]:
import tensorflow as tf
from tensorflow import keras

model = keras.models.load_model('./clothing-model-v4.h5')

tf.saved_model.save(model, 'clothing-model')

INFO:tensorflow:Assets written to: clothing-model/assets


In [2]:
!ls -lhR

.:
total 83M
-rw-rw-r-- 1 peter peter    0 Nov  26 18:18 10-KubTFServing.ipynb
drwxr-xr-x 4 peter peter 4,0K Nov  26 19:05 clothing-model
-rw-rw-r-- 1 peter peter  83M Nov  26 19:03 clothing-model-v4.h5

./clothing-model:
total 2,3M
drwxr-xr-x 2 peter peter 4,0K Nov  26 19:05 assets
-rw-rw-r-- 1 peter peter   57 Nov  26 19:05 fingerprint.pb
-rw-rw-r-- 1 peter peter 2,3M Nov  26 19:05 saved_model.pb
drwxr-xr-x 2 peter peter 4,0K Nov  26 19:05 variables

./clothing-model/assets:
total 0

./clothing-model/variables:
total 83M
-rw-rw-r-- 1 peter peter 83M Nov  26 19:05 variables.data-00000-of-00001
-rw-rw-r-- 1 peter peter 15K Nov  26 19:05 variables.index
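
The listing shows the layout that the conversion produced: a saved_model.pb file next to the assets and variables directories. Before pointing TF Serving at such a folder, a quick sanity check can be useful. The sketch below is only an illustration: the helper name `looks_like_saved_model` is hypothetical, and it checks nothing more than the two entries visible in the listing above.

```python
from pathlib import Path


def looks_like_saved_model(model_dir):
    """Return True if model_dir contains saved_model.pb and a variables/ directory."""
    root = Path(model_dir)
    has_graph = (root / "saved_model.pb").is_file()
    has_variables = (root / "variables").is_dir()
    return has_graph and has_variables


if __name__ == "__main__":
    # 'clothing-model' is the directory created by the conversion step above.
    print(looks_like_saved_model("clothing-model"))
```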


Now we can look at what's inside the model using the saved_model_cli utility.

In [3]:
!saved_model_cli show --dir clothing-model --all


"serving_default" is the name of the signature definition. This is something technical but we need to know this value when we invoke our model.
Then we have an input and an output. The input is called 'input_28'. Its shape is (-1, 299, 299, 3): each image is 299x299 pixels with 3 channels, and -1 means the batch can contain any number of images.
The output is called 'dense_22'. Its shape is (-1, 10): 10 predictions per image, and -1 again means one row for each image in the batch.
So we need the following pieces of information from this definition:
- serving_default - signature name
- input_28 - input
- dense_22 - output
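
To illustrate what the -1 in these shapes means, the sketch below stores the three values from the signature and checks concrete array shapes against the declared ones. It is an illustration only: the helper `shape_matches` is hypothetical and not part of TensorFlow, and the shape tuples are written out from the description above.

```python
# Values taken from the signature definition described above.
SIGNATURE_NAME = "serving_default"
INPUT_NAME = "input_28"
OUTPUT_NAME = "dense_22"
INPUT_SHAPE = (-1, 299, 299, 3)
OUTPUT_SHAPE = (-1, 10)


def shape_matches(actual, expected):
    """Compare a concrete shape with a declared one; -1 accepts any size."""
    if len(actual) != len(expected):
        return False
    return all(e == -1 or a == e for a, e in zip(actual, expected))


if __name__ == "__main__":
    print(shape_matches((1, 299, 299, 3), INPUT_SHAPE))   # a single image
    print(shape_matches((32, 299, 299, 3), INPUT_SHAPE))  # a batch of 32 images
    print(shape_matches((1, 224, 224, 3), INPUT_SHAPE))   # wrong image size
```

The first two calls succeed because only the batch dimension differs, while the third fails because the image size is fixed at 299x299.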

## Running TF-Serving locally with Docker


Now we can use this information to run TF-Serving locally with Docker. As a reminder, 8500:8500 means that local port 8500 is mapped to port 8500 inside the container. For mounting the model directory into the container we use the -v option.


In [None]:
!docker run -it --rm -p 8500:8500 -v "./clothing-model:"
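
As an illustration of how such a command can be put together, the sketch below assembles a `docker run` argument list in Python. Everything beyond the port mapping is an assumption and not necessarily what this chapter uses: the container path `/models/<name>/<version>`, the `MODEL_NAME` environment variable and the untagged `tensorflow/serving` image follow the conventions of the official TF Serving Docker image as we understand them, and the function name `build_tf_serving_command` is hypothetical.

```python
from pathlib import Path


def build_tf_serving_command(model_dir, model_name, version=1, port=8500,
                             image="tensorflow/serving"):
    """Build a docker run argument list for TF Serving (defaults are assumptions)."""
    host_path = Path(model_dir).resolve()  # Docker volume mounts need an absolute path
    container_path = f"/models/{model_name}/{version}"  # assumed image convention
    return [
        "docker", "run", "-it", "--rm",
        "-p", f"{port}:{port}",                 # host port : container port
        "-v", f"{host_path}:{container_path}",  # mount the model directory
        "-e", f"MODEL_NAME={model_name}",       # assumed image convention
        image,
    ]


if __name__ == "__main__":
    print(" ".join(build_tf_serving_command("clothing-model", "clothing-model")))
```

Building the command as a list keeps each option separate, so it can be printed for inspection or passed to `subprocess.run` without shell quoting issues.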

## Invoking the model from Jupyter