
# TensorFlow Serving

This project deploys a machine learning model with TensorFlow Serving behind a Flask-based gateway service. The application has two components: a Docker container running TensorFlow Serving, which serves the model, and a Flask application that acts as the gateway.

Deployment happens in two phases. First, the application runs locally with `Docker Compose`, which makes testing and development easy. Then both the TensorFlow Serving model and the Flask gateway are deployed to `Kubernetes`, a container orchestration platform that provides a scalable, robust environment for running containerized applications in production or in the cloud.

## 1. Load saved model

In [2]:
# !wget https://github.com/DataTalksClub/machine-learning-zoomcamp/releases/download/chapter7-model/xception_v4_large_08_0.894.h5 -O clothing-model-v4.h5

In [3]:
import tensorflow as tf
from tensorflow import keras

To build the app, we need to convert the Keras model from HDF5 into the TensorFlow SavedModel format. We download the prebuilt model (the `wget` command above), save it in the working directory, and then load and re-export it:

In [4]:
model = keras.models.load_model('./clothing-model-v4.h5')

tf.saved_model.save(model, 'clothing-model')



INFO:tensorflow:Assets written to: clothing-model\assets



Explore the converted model:

```bash
$ tree clothing-model

clothing-model
┣╸ assets
┣╸ saved_model.pb
┗╸ variables
    ┣╸ variables.data-00000-of-00001
    ┗╸ variables.index
```

```bash
$ ls -lhR clothing-model
clothing-model:
total 2.8M

clothing-model/assets:
total 0

clothing-model/variables:
total 83M
```

## 2. TensorFlow Serving with Docker

We can inspect the saved model with the `saved_model_cli` utility that ships with TensorFlow: `saved_model_cli show --dir model-dir-name --all`. The command prints several things, but we are interested in the signature, specifically `signature_def['serving_default']`, which lists the model's inputs and outputs. For instance, the example model has one input (`input_8`) and one output (`dense_7`).

```bash
$ saved_model_cli show --dir clothing-model --all

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_8:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_7'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
```

We can run the model (`clothing-model`) with the prebuilt docker image `tensorflow/serving:2.7.0`:

```bash
$ docker run -it --rm \
  -p 8500:8500 \
  -v $(pwd)/clothing-model:/models/clothing-model/1 \
  -e MODEL_NAME="clothing-model" \
  tensorflow/serving:2.7.0
```

* `docker run -it --rm`: run the container interactively and remove it when it exits
* `-p 8500:8500`: map port 8500 on the host to port 8500 in the container (the gRPC port of TensorFlow Serving)
* `-v $(pwd)/clothing-model:/models/clothing-model/1`: mount the local model directory (absolute path) into the container at `/models/<model_name>/<version>`
* `-e MODEL_NAME="clothing-model"`: set the environment variable that tells TensorFlow Serving which model to serve
* `tensorflow/serving:2.7.0`: the image to run
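
TensorFlow Serving expects each model under `/models/<model_name>/<version>`, which is why the volume is mounted at `/models/clothing-model/1`. As a minimal sketch, the same layout can be prepared on disk with the standard library. The helper name `stage_model` and the `models` target directory are assumptions for illustration, not part of the original project:

```python
import shutil
from pathlib import Path


def stage_model(saved_model_dir, base_dir, model_name, version=1):
    """Copy a SavedModel into <base_dir>/<model_name>/<version>."""
    source = Path(saved_model_dir)
    if not (source / "saved_model.pb").exists():
        raise FileNotFoundError(f"{source} does not look like a SavedModel")
    target = Path(base_dir) / model_name / str(version)
    # dirs_exist_ok lets the function be re-run safely (Python 3.8+)
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For example, `stage_model('clothing-model', 'models', 'clothing-model')` would create `models/clothing-model/1`, and a new version of the model could be staged next to it by passing `version=2`.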

### Establish gRPC Connection to TensorFlow Serving

In [5]:
# !pip install grpcio==1.42.0 tensorflow-serving-api==2.7.0

In [6]:
# !pip install keras-image-helper

In [1]:
import grpc

import tensorflow as tf

from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import predict_pb2

from keras_image_helper import create_preprocessor

The stub is a client-side object that allows the code to make remote calls to the `PredictionService` using the specified communication channel.

In [2]:
host = 'localhost:8500'

channel = grpc.insecure_channel(host)

stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

### Prepare Prediction Request

In [4]:
# Preprocess the input image 
preprocessor = create_preprocessor('xception', target_size=(299, 299))

url = 'http://bit.ly/mlbookcamp-pants'
X = preprocessor.from_url(url)

TensorFlow Serving communicates over `gRPC`, a protocol optimized for binary data. We therefore need to convert the inputs of our prediction request into `protobuf` format.

In [6]:
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

np_to_protobuf(X)
```
dtype: DT_FLOAT
tensor_shape {
  dim {
    size: 1
  }
  dim {
    size: 299
  }
  dim {
    size: 299
  }
  dim {
    size: 3
  }
}
tensor_content: "\350\350\350...
```
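
The `tensor_content` field above carries the array as raw bytes rather than as text. As a rough illustration (using only the standard library, not TensorFlow itself), the sketch below packs a few float32 values in a similar way and works out the payload size for one `(1, 299, 299, 3)` image. The helper names are hypothetical, and little-endian byte order is an assumption that holds on common platforms:

```python
import struct
from math import prod


def pack_float32(values):
    """Pack floats as consecutive little-endian 4-byte float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def float32_payload_size(shape):
    """Number of bytes needed for a float32 tensor of the given shape."""
    return prod(shape) * 4


sample = pack_float32([0.5, -1.0, 1.0])
print(len(sample))                             # 12 bytes for 3 values
print(float32_payload_size((1, 299, 299, 3)))  # 1072812 bytes, about 1 MB
```

A single preprocessed image is therefore roughly 1 MB on the wire, which is one reason a compact binary encoding is attractive here.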

`PredictRequest` is used to encapsulate the input data or parameters (model name, model's signature name, and input in `protobuf` format) that the client wants to send to the server when making a prediction.

In [27]:
pb_request = predict_pb2.PredictRequest()

pb_request.model_spec.name = 'clothing-model'
pb_request.model_spec.signature_name = 'serving_default'

# Set the model's input ("input_8")
pb_request.inputs['input_8'].CopyFrom(np_to_protobuf(X))

pb_request

```
model_spec {
  name: "clothing-model"
  signature_name: "serving_default"
}
inputs {
  key: "input_8"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 299
      }
      dim {
        size: 299
      }
      dim {
        size: 3
      }
    }
    tensor_content: "\350\350...
```

### Make Prediction Request

The `Predict()` method returns a response object with a value in `Protobuf` format.

In [28]:
pb_response = stub.Predict(pb_request, timeout=20.0)

pb_response

```
outputs {
  key: "dense_7"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 10
      }
    }
    float_val: -1.8682905435562134
    float_val: -4.761244297027588
    float_val: -2.316984176635742
    float_val: -1.0625706911087036
    float_val: 9.887160301208496
    float_val: -2.812433958053589
    float_val: -3.6662838459014893
    float_val: 3.200362205505371
    float_val: -2.6023383140563965
    float_val: -4.835046768188477
  }
}
model_spec {
  name: "clothing-model"
  version {
    value: 1
  }
  signature_name: "serving_default"
}
```

### Retrieve and Interpret Model Output

In [29]:
# Read the model's output ("dense_7") from the response
preds = pb_response.outputs['dense_7'].float_val
preds

[-1.8682905435562134, -4.761244297027588, -2.316984176635742, -1.0625706911087036, 9.887160301208496, -2.812433958053589, -3.6662838459014893, 3.200362205505371, -2.6023383140563965, -4.835046768188477]

In [30]:
classes = [
    'dress',
    'hat',
    'longsleeve',
    'outwear',
    'pants',
    'shirt',
    'shoes',
    'shorts',
    'skirt',
    't-shirt'
]

In [31]:
dict(zip(classes, preds))

{'dress': -1.8682905435562134,
 'hat': -4.761244297027588,
 'longsleeve': -2.316984176635742,
 'outwear': -1.0625706911087036,
 'pants': 9.887160301208496,
 'shirt': -2.812433958053589,
 'shoes': -3.6662838459014893,
 'shorts': 3.200362205505371,
 'skirt': -2.6023383140563965,
 't-shirt': -4.835046768188477}
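
The values returned by the model are raw scores, not probabilities. As a minimal sketch, a softmax can turn them into probabilities and pick the most likely class. The `softmax` helper below is an illustration added here, and the scores are the ones from the response above, rounded to three decimals:

```python
import math

classes = ['dress', 'hat', 'longsleeve', 'outwear', 'pants',
           'shirt', 'shoes', 'shorts', 'skirt', 't-shirt']
preds = [-1.868, -4.761, -2.317, -1.063, 9.887,
         -2.812, -3.666, 3.200, -2.602, -4.835]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = dict(zip(classes, softmax(preds)))
best = max(probs, key=probs.get)
print(best)  # pants
```

Because the score for `pants` is far above the others, almost all of the probability mass ends up on that class.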

## 3. Creating a gateway service with Flask

The application consists of two components: a `Docker container` with Tensorflow Serving and a `Flask application` with the gateway service (which will be dockerized in the next section).

Convert the notebook from the previous section into a script and rename it to `gateway.py`:

```bash
$ jupyter nbconvert --to script notebook.ipynb
```

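Create functions to prepare the request, send the request, and prepare the response. Then convert the script into a Flask app:

```python
from flask import Flask, request, jsonify

app = Flask('gateway')


@app.route('/predict', methods=['POST'])
def predict_endpoint():
    # Expect a JSON body such as {"url": "<image URL>"}
    data = request.get_json()
    url = data['url']
    # predict() wraps the request/response functions created above
    result = predict(url)
    return jsonify(result)
```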

Since TensorFlow is a large library, we can use the `proto.py` script instead to convert a NumPy array into protobuf format, and import its `np_to_protobuf` function into our `gateway.py` script. For this, we need to install `tensorflow-protobuf==2.7.0` ([tensorflow-protobuf](https://github.com/alexeygrigorev/tensorflow-protobuf)) and `protobuf==3.19`.

We also want to manage all dependencies with `pipenv` for deployment. Install the following libraries:

```bash
$ pipenv install grpcio==1.42.0 flask gunicorn keras-image-helper tensorflow-protobuf==2.7.0 protobuf==3.19
```

Run the model using the prebuilt Docker image with TensorFlow Serving, as explained in the previous section, and the Flask app `gateway.py` within the Pipenv environment:

```bash
$ pipenv shell
$ python gateway.py
```

Test the gateway with the `test.py` script:

```bash
$ python test.py 
{'dress': -1.8798640966415405, 'hat': -4.75631046295166, 'longsleeve': -2.359532356262207, 'outwear': -1.0892651081085205, 'pants': 9.903783798217773, 'shirt': -2.8261783123016357, 'shoes': -3.6483113765716553, 'shorts': 3.2411551475524902, 'skirt': -2.612095832824707, 't-shirt': -4.852035045623779}
```
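
The `test.py` script itself is not shown here. A minimal sketch of what such a client might look like, using only the standard library, is given below. The gateway address `http://localhost:9696/predict` is an assumption based on the route and port used in this document, and the function names are hypothetical:

```python
import json
import urllib.request

GATEWAY_URL = 'http://localhost:9696/predict'  # assumed address of the gateway
IMAGE_URL = 'http://bit.ly/mlbookcamp-pants'


def build_request(gateway_url, image_url):
    """Build a POST request carrying the image URL as a JSON body."""
    body = json.dumps({'url': image_url}).encode('utf-8')
    return urllib.request.Request(
        gateway_url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(req, timeout=20.0):
    """Send the request and decode the JSON response from the gateway."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


req = build_request(GATEWAY_URL, IMAGE_URL)
print(req.get_method(), req.full_url)
```

With the gateway running, `print(send(req))` would be expected to print a dictionary of class scores like the one shown above.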

## 4. Running app locally with Docker Compose

In this section, we will dockerize the `gateway` service and run it locally alongside the `TensorFlow Serving` container.

First, we can use the `image-model.dockerfile` template to build a new image based on the prebuilt `tensorflow/serving:2.7.0` image:

```bash
$ docker build -t clothing-model:xception-v4-001 -f image-model.dockerfile .
$ docker run -it --rm -p 8500:8500 clothing-model:xception-v4-001
```

Second, in a similar manner, we can dockerize the `gateway` service using the `image-gateway.dockerfile` template. Before building the image, we need to update the `host` variable in the `gateway.py` script so that it reads the `TF_SERVING_HOST` environment variable:

```python
import os

host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
```

```bash
$ docker build -t clothing-model-gateway:001 -f image-gateway.dockerfile .
$ docker run -it --rm -p 9696:9696 clothing-model-gateway:001
```
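
As a small illustration of how this lookup behaves, the sketch below reads `TF_SERVING_HOST` and falls back to `localhost:8500` when the variable is not set. The helper `get_serving_host` and the example value `tf-serving:8500` are hypothetical; the actual host name depends on how the containers are named:

```python
import os


def get_serving_host(default='localhost:8500'):
    """Return the TensorFlow Serving address from the environment."""
    return os.getenv('TF_SERVING_HOST', default)


os.environ.pop('TF_SERVING_HOST', None)
print(get_serving_host())   # localhost:8500

os.environ['TF_SERVING_HOST'] = 'tf-serving:8500'  # hypothetical host name
print(get_serving_host())   # tf-serving:8500
```

With Docker, the variable can be passed to the container with the `-e` flag, in the same way as `MODEL_NAME` earlier.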

With both containers running, we can test the app by running `test.py`. However, the request fails (`status = StatusCode.UNAVAILABLE`, `details = "failed to connect to all addresses"`) because the `gateway` container cannot reach the model in the `TensorFlow Serving` container: inside the gateway container, `localhost` refers to the gateway container itself. To connect the two containers, we can use Docker Compose.

We can create a YAML file (`docker-compose.yaml`) to define both services and run them locally in the same environment with one single command:

```bash
$ docker-compose up
```

Useful Docker Compose commands:
* `docker-compose up`: start the services
* `docker-compose up -d`: start the services in detached mode
* `docker ps`: list the running containers
* `docker-compose down`: stop and remove the services

## 5. Kubernetes

Kubernetes (K8s) is an open-source system for automating the deployment, scaling, and management of containerized applications.

Kubernetes components:
* Cluster
* Node
* Pod
* Deployment
* Service
* Ingress

## 6. Deploying a simple service to Kubernetes

**Create a Flask app**  

* Create a simple Flask app: `ping.py`
* Install libraries: ```$ pipenv install flask gunicorn```
* Create a Dockerfile.  
* Build a Docker image: ```$ docker build -t ping:v001 .```
* Run container: ```$ docker run -it --rm -p 9696:9696 ping:v001```
* Test app: ```$ curl localhost:9696/ping```
* Stop container.

**Create a Kubernetes cluster**  

* Install `kubectl` following the [Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) instructions. [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) is the Kubernetes command-line tool used to deploy applications, inspect and manage cluster resources, and view logs. We can also use the `kubectl` version included with Docker Desktop.  
Note from AWS:  
You must use a kubectl version that is within one minor version difference of your Amazon EKS cluster control plane. For example, a 1.27 kubectl client works with Kubernetes 1.26, 1.27, and 1.28 clusters.  
Run ```$ kubectl version --client``` to verify the kubectl version.  

* Install `Kind` to set up a local Kubernetes cluster. We can install Kind from [release binaries](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries).

* Create and run a cluster:  
    * ```$ kind create cluster```: create cluster
    * ```$ kubectl cluster-info --context kind-kind```: read cluster info with kubectl
    * ```$ kubectl get service```: read service info (default: kubernetes service)
    * ```$ docker ps```: inspect the container running the `kindest` image.  

* Useful commands:  
    * ```$ kubectl get pod```
    * ```$ kubectl get deployment```
    * ```$ kind delete cluster```  

**Create a deployment**  
A Deployment is a resource that manages the deployment and scaling of a set of pods.

* Install the [Kubernetes](https://marketplace.visualstudio.com/items?itemName=ms-kubernetes-tools.vscode-kubernetes-tools) extension for VS Code.  
* Create the `deployment.yaml` file. Type "deployment" to autocomplete a template and update the file.  
    
    `replicas: 1`: Specifies that the desired number of replicas (instances) of the application should be 1.  
    `selector`: Describes how the Deployment identifies which pods to manage. `matchLabels`: Specifies a set of key-value pairs. Pods managed by this Deployment must have labels matching these values, in this case `app: ping`.  
    `template`: Describes the pod template that will be used to create new pods.  
    `metadata`: Provides labels for the pods created from this template. The label `app: ping` is assigned.  
    `spec`: Describes the pod's specification, including the containers running within the pod. `containers`: Describes the containers within the pod.
        
* Apply the deployment to the Kubernetes cluster: ```$ kubectl apply -f deployment.yaml``` (to remove a deployment run: ```$ kubectl delete -f deployment.yaml```)  

* Read pod status:  
    * ```$ kubectl get pod```
    * ```$ kubectl describe pod ping-deployment-<complete-pod-name>```  

* Load the Docker image with the Flask app into the cluster: ```$ kind load docker-image ping:v001```
* Use Port Forwarding to access applications in a cluster: ```$ kubectl port-forward ping-deployment-<complete-pod-name> 9696:9696```  
* Test app: ```$ curl localhost:9696/ping```
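
The fields described above fit together as follows. This is a sketch of the manifest written as a Python dictionary and dumped as JSON, a format that `kubectl apply -f` also accepts. The name `ping-deployment`, the image `ping:v001`, and port 9696 are taken or inferred from the commands in this section; the container name `ping-pod` is hypothetical, and the actual `deployment.yaml` may contain more fields:

```python
import json

deployment = {
    'apiVersion': 'apps/v1',
    'kind': 'Deployment',
    'metadata': {'name': 'ping-deployment'},
    'spec': {
        'replicas': 1,
        # The Deployment manages pods whose labels match this selector
        'selector': {'matchLabels': {'app': 'ping'}},
        'template': {
            # Labels given to every pod created from this template
            'metadata': {'labels': {'app': 'ping'}},
            'spec': {
                'containers': [{
                    'name': 'ping-pod',  # hypothetical container name
                    'image': 'ping:v001',
                    'ports': [{'containerPort': 9696}],
                }],
            },
        },
    },
}

# The selector has to match the labels in the pod template
assert (deployment['spec']['selector']['matchLabels']
        == deployment['spec']['template']['metadata']['labels'])

print(json.dumps(deployment, indent=2))
```

Writing the printed JSON to a file and applying it should behave like the YAML version, since both describe the same object.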

**Create a service**   
A Service is a method for exposing a network application that is running as one or more Pods in your cluster.

* Create the `service.yaml` file. Type "service" to autocomplete a template and update the file.  
    `type: LoadBalancer`: Specifies the type of Service to be a LoadBalancer. This is used when you want to expose your service to the external world, and the cloud provider will provision a load balancer to distribute traffic to the service.  
    `selector`: Specifies a set of labels to select the pods to expose via the service. In this case, only pods with the label `app: ping` will be part of this service.  
    `ports`: Specifies the ports that the service should listen on. Traffic arriving at `port: 80` on the service will be forwarded to `port: 9696` on the selected pods.
    
* Apply the service: ```$ kubectl apply -f service.yaml```. The `ping` service will be created (check status with ```$ kubectl get service```)

* Use Port Forwarding to access the app: ```$ kubectl port-forward service/ping 8080:80```
* Test app: ```$ curl localhost:8080/ping```
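
The service definition can be sketched in the same way. The helper `make_service` below is hypothetical; it only assembles the fields described above (`type`, `selector`, `ports`) into a manifest for the `ping` service, using the port numbers from this section:

```python
import json


def make_service(name, app_label, port, target_port, service_type):
    """Assemble a minimal Kubernetes Service manifest as a dictionary."""
    return {
        'apiVersion': 'v1',
        'kind': 'Service',
        'metadata': {'name': name},
        'spec': {
            'type': service_type,
            # Only pods carrying this label receive traffic
            'selector': {'app': app_label},
            # Traffic arriving on `port` is forwarded to `targetPort`
            'ports': [{'port': port, 'targetPort': target_port}],
        },
    }


ping_service = make_service('ping', 'ping', 80, 9696, 'LoadBalancer')
print(json.dumps(ping_service, indent=2))
```

The same helper could describe an internal service by passing `'ClusterIP'` as the type, which is the choice made for the model service in the next section.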

## 7. Deploying TensorFlow models to Kubernetes

Create the `kube-config` folder.

**Model**  
* Create a deployment for the model: `model-deployment.yaml`  
* Load the Docker image with the model into the cluster: ```$ kind load docker-image clothing-model:xception-v4-001```
* Apply the deployment ```$ kubectl apply -f model-deployment.yaml```
* Check pod status.
* Use Port Forwarding to access the app: ```$ kubectl port-forward tf-serving-clothing-model-<complete-pod-name> 8500:8500```
* Test app running: ```$ python gateway.py```  
* Create a service: `model-service.yaml` (service `type: ClusterIP`)
* Apply the service ```$ kubectl apply -f model-service.yaml```
* ```$ kubectl port-forward service/tf-serving-clothing-model 8500:8500```
* Test app running: ```$ python gateway.py```

**Gateway**
* Create a deployment for the gateway: `gateway-deployment.yaml`  
* Load the Docker image with the gateway into the cluster: ```$ kind load docker-image clothing-model-gateway:001```
* Apply the deployment ```$ kubectl apply -f gateway-deployment.yaml```
* Create a service: `gateway-service.yaml` (service `type: LoadBalancer`)  
* Apply the service ```$ kubectl apply -f gateway-service.yaml```
* Use Port Forwarding to access the app: ```$ kubectl port-forward service/gateway 8080:80```
* Update the URL in `test.py` to use port 8080 and run the script to test the app:
```bash
$ python test.py 
{'dress': -1.8798640966415405, 'hat': -4.75631046295166, 'longsleeve': -2.359532356262207, 'outwear': -1.0892651081085205, 'pants': 9.903783798217773, 'shirt': -2.8261783123016357, 'shoes': -3.6483113765716553, 'shorts': 3.2411551475524902, 'skirt': -2.612095832824707, 't-shirt': -4.852035045623779}
``` 
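
Inside the cluster, the gateway cannot rely on `localhost:8500`; it has to reach the model through the `tf-serving-clothing-model` service. Kubernetes gives each service a DNS name of the form `<service>.<namespace>.svc.cluster.local`, so the value passed to the gateway through `TF_SERVING_HOST` would typically look like the output below. This sketch assumes the `default` namespace and the default `cluster.local` cluster domain, and the helper name is hypothetical:

```python
def service_address(service, port, namespace='default'):
    """Build the in-cluster DNS address of a Kubernetes service."""
    return f'{service}.{namespace}.svc.cluster.local:{port}'


address = service_address('tf-serving-clothing-model', 8500)
print(address)
# tf-serving-clothing-model.default.svc.cluster.local:8500

# Environment entry for the gateway container in gateway-deployment.yaml
env_entry = {'name': 'TF_SERVING_HOST', 'value': address}
print(env_entry)
```

Keeping the address in an environment variable means the same gateway image works locally, under Docker Compose, and in Kubernetes without code changes.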

## 8. Deploying to Amazon EKS

* Amazon Elastic Kubernetes Service ([Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/setting-up.html)) is a managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers.
* Install [eksctl](https://eksctl.io/installation/) which is a simple CLI tool for creating and managing clusters on Amazon EKS.

**Amazon EKS Cluster**
* Create a cluster from the `eks-config.yaml` file: 
```bash
$ eksctl create cluster -f eks-config.yaml
kubectl command should work with "C:\\Users\\...", try 'kubectl get nodes'
EKS cluster "mlzoomcamp-eks" in "us-east-1" region is ready
```
```bash
$ kubectl get nodes
$ docker ps
```

**Amazon ECR repository**
* Use the AWS CLI to create an Amazon ECR repository (see lesson 09):

```bash
$ aws ecr create-repository --repository-name mlzoomcamp-images
```

* Authenticate your Docker client to your private ECR registry. The ```aws ecr get-login-password``` command retrieves the password that `docker login` needs, so you can push and pull images. Make sure that `<region>` matches the region in your registry URL, and replace `<aws_account_id>` with your own account ID.

```bash
$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
```

* Tag the local images for the model and the gateway before pushing them to the Amazon ECR registry:

```bash
$ PREFIX=<aws_account_id>.dkr.ecr.<region>.amazonaws.com/mlzoomcamp-images
$ TAG=clothing-model-gateway-001
$ GATEWAY_REMOTE_URI=${PREFIX}:${TAG}
$ echo ${GATEWAY_REMOTE_URI}
$ docker tag clothing-model-gateway:001 ${GATEWAY_REMOTE_URI}
```

```bash
$ TAG=clothing-model-xception-v4-001
$ MODEL_REMOTE_URI=${PREFIX}:${TAG}
$ echo ${MODEL_REMOTE_URI}
$ docker tag clothing-model:xception-v4-001 ${MODEL_REMOTE_URI}
```

* Push the images using the docker push command:
```bash
$ docker push ${GATEWAY_REMOTE_URI}
$ docker push ${MODEL_REMOTE_URI}
```
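
The shell variables above simply assemble the remote image URI from the registry address, the repository name, and a tag. The same composition is shown below as a small Python sketch. The account ID `123456789012` is a placeholder value, the region is the one shown in the `eksctl` output, and `ecr_image_uri` is a hypothetical helper:

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Compose <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    registry = f'{account_id}.dkr.ecr.{region}.amazonaws.com'
    return f'{registry}/{repository}:{tag}'


uri = ecr_image_uri('123456789012', 'us-east-1',
                    'mlzoomcamp-images', 'clothing-model-gateway-001')
print(uri)
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/mlzoomcamp-images:clothing-model-gateway-001
```

Both images live in the same repository here, so the tag is what distinguishes the gateway image from the model image.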

**Apply deployments and services to EKS to test the app**

* Update `model-deployment.yaml` and `gateway-deployment.yaml` with the ECR image URIs.
* ```$ kubectl apply -f model-deployment.yaml```
* ```$ kubectl apply -f model-service.yaml```
* ```$ kubectl apply -f gateway-deployment.yaml```
* ```$ kubectl apply -f gateway-service.yaml```
* ```$ kubectl get pod```
* Get the external IP (EC2 load balancer DNS name) for the gateway service: ```$ kubectl get service```
* Use Port Forwarding to access the app: ```$ kubectl port-forward service/gateway 8080:80```
* Update the URL in `test.py` with the external address of the gateway service and run the script to test the app:

```bash
$ python test.py 
{'dress': -1.8798640966415405, 'hat': -4.75631046295166, 'longsleeve': -2.359531879425049, 'outwear': -1.0892632007598877, 'pants': 9.90378189086914, 'shirt': -2.8261773586273193, 'shoes': -3.6483097076416016, 'shorts': 3.241151809692383, 'skirt': -2.6120948791503906, 't-shirt': -4.852035999298096}
```

* Delete the EKS cluster:
```bash
$ eksctl delete cluster --name mlzoomcamp-eks
will delete stack "eksctl-mlzoomcamp-eks-cluster"
all cluster resources were deleted
```