# Kubernetes and TensorFlow Serving

We'll deploy the clothes classification model we trained previously using Kubernetes and TensorFlow Serving.

### 1. Overview

* What we'll cover
* Two-tier architecture

We want to build a system that automatically detects the category of an image, so we need an image classifier that sorts clothes into different categories.

We will use TensorFlow Serving to serve the model. It is a tool from the TensorFlow family created specifically for serving TensorFlow models, and it focuses on inference.

The idea is that a website with a UI sends the image URL to a gateway. The gateway (built with Flask and deployed in Kubernetes) downloads the image, resizes it, prepares the input, and sends it to TensorFlow Serving. TensorFlow Serving applies the model and returns the predictions to the gateway through gRPC.

### 2. TensorFlow Serving

* The saved_model format
* Running TF-Serving locally with Docker
* Invoking the model from Jupyter

We will be using the same model used in the serverless module.

In [1]:
# This did not work
# WRAPT_DISABLE_EXTENSIONS=true
import os
os.environ['WRAPT_DISABLE_EXTENSIONS'] = 'true'


In [2]:
# This did not work
# %set_env WRAPT_DISABLE_EXTENSIONS=true
import tensorflow as tf
from tensorflow import keras

# Load the model using keras
model = keras.models.load_model('./clothing-model.keras')

# Save the model using tensorflow
# tf.saved_model.save(model, 'clothing-model')

model.export('clothing-model')

INFO:tensorflow:Assets written to: clothing-model/assets


Saved artifact at 'clothing-model'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 299, 299, 3), dtype=tf.float32, name='input_layer_75')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  5729565776: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730894096: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730894288: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5729566544: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730894864: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730894480: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730895056: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730895824: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730896208: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730893904: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5730894672: TensorSpec(shape=(), dtype=tf.resource, name=None)


Running the export as a script (`save_model.py`) with the variable set on the command line worked:

```bash
WRAPT_DISABLE_EXTENSIONS=true python save_model.py
```

We can check what's inside the directory with

```bash
ls -lhR clothing-model
```

We can inspect the model with `saved_model_cli`, which is installed together with TensorFlow:

```bash
saved_model_cli show --dir clothing-model --all
```

This prints the following output:

```
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_layer_75'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serve_input_layer_75:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_layer_75'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_layer_75:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall_1:0
  Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'AddV2', 'DisableCopyOnRead', 'RestoreV2', 'Conv2D', 'Identity', 'Mul', 'ReadVariableOp', 'MaxPool', 'StringJoin', 'Placeholder', 'BiasAdd', 'Const', 'StaticRegexFullMatch', 'StatefulPartitionedCall', 'VarIsInitializedOp', 'DepthwiseConv2dNative', 'NoOp', 'SaveV2', 'MergeV2Checkpoints', 'Relu', 'Rsqrt', 'VarHandleOp', 'ShardedFilename', 'Select', 'Pack', 'Mean', 'Sub', 'AssignVariableOp', 'MatMul'}

Concrete Functions:
  Function Name: 'serve'
    Option #1
      Callable with:
        Argument #1
          input_layer_75: TensorSpec(shape=(None, 299, 299, 3), dtype=tf.float32, name='input_layer_75')
```

We are interested in this part

```
signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_layer_75'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serve_input_layer_75:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
  ```

  * signature: `serve` (the signature used to call the model)
  * input: `input_layer_75` (name of the input tensor)
  * output: `output_0` (name of the output tensor)

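Next, we run the model with Docker using the official TensorFlow Serving image. The image looks for models under the following path, so we mount our `clothing-model` directory there as version `1`:

`/models/<name of the model>/<version>`

Port 8501 is the REST endpoint and port 8500 is the gRPC endpoint. The client code in these notes talks to the gRPC port, so the second command is the one to keep running.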

```bash
docker run -it --rm --platform linux/amd64 \
    -p 8501:8501 \
    -v "$(pwd)/clothing-model:/models/clothing-model/1" \
    -e MODEL_NAME="clothing-model" \
    tensorflow/serving:2.18.0
```

```bash
docker run -it --rm --platform linux/amd64 -p 8500:8500 -v "$(pwd)/clothing-model:/models/clothing-model/1" -e MODEL_NAME="clothing-model" tensorflow/serving:2.18.0
```
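
The first `docker run` command above publishes port 8501, which is TF Serving's REST endpoint. The rest of these notes use gRPC, but as a rough sketch, a REST request could be assembled as shown below. The helper name `build_rest_request` is hypothetical, and the tiny all-zeros "image" only shows the shape of the payload; nothing is sent over the network.

```python
import json

def build_rest_request(model_name, instances, signature_name="serving_default",
                       host="localhost:8501"):
    """Return the URL and JSON body for a TF Serving REST predict call."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"signature_name": signature_name, "instances": instances})
    return url, body

# A tiny stand-in for one preprocessed image (a real one would be 299x299x3 floats).
dummy_image = [[[0.0, 0.0, 0.0]]]

url, body = build_rest_request("clothing-model", [dummy_image])
print(url)
print(body)
```

With the 8501 container running, the body could then be sent with `requests.post(url, data=body)`.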

To call the model from Python over gRPC, install the client libraries:

```bash
pip install grpcio==1.42.0 tensorflow-serving-api==2.7.0
pip install keras-image-helper
```

In [1]:
import grpc

import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc



In [2]:
host = 'localhost:8500'

channel = grpc.insecure_channel(host)

stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# stub.Predict

In [3]:
from keras_image_helper import create_preprocessor

preprocessor = create_preprocessor('xception', target_size=(299,299))


In [4]:
url = 'http://bit.ly/mlbookcamp-pants'
X = preprocessor.from_url(url)

In [5]:
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

In [6]:
pb_request = predict_pb2.PredictRequest()

pb_request.model_spec.name = 'clothing-model'
pb_request.model_spec.signature_name = 'serving_default'

pb_request.inputs['input_layer_75'].CopyFrom(np_to_protobuf(X))

In [7]:
pb_request

model_spec {
  name: "clothing-model"
  signature_name: "serving_default"
}
inputs {
  key: "input_layer_75"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 299
      }
      dim {
        size: 299
      }
      dim {
        size: 3
      }
    }
    tensor_content: "\350\350\350\275\234\234\034\276\314\314L\276\330\330\330\275\224\224\024\276\304\304D\276..."
  }
}
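
The `tensor_content` field holds the raw bytes of the array. As a small sketch, the first twelve bytes printed above can be decoded back into the first pixel. Two assumptions are made here and are not stated in the notes: the values are little-endian `float32`, and the preprocessor used the Xception-style scaling `x / 127.5 - 1`.

```python
import struct

# First 12 bytes of tensor_content from the output above (octal escapes as printed).
raw = b"\350\350\350\275\234\234\034\276\314\314L\276"

# Assumption: little-endian float32, the usual layout on x86 and ARM machines.
first_pixel = struct.unpack("<3f", raw)
print(first_pixel)

# Assumption: values were scaled with x / 127.5 - 1, so invert that to get 0-255 values.
rgb = [round((value + 1.0) * 127.5) for value in first_pixel]
print(rgb)
```

If both assumptions hold, the recovered values are whole numbers in the 0-255 range, which is a quick sanity check that the request carries the image we expect.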

In [8]:
pb_response = stub.Predict(pb_request, timeout=20)
pb_response

outputs {
  key: "output_0"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 10
      }
    }
    float_val: -2.4649608
    float_val: -5.2831917
    float_val: -6.699933
    float_val: -4.5058265
    float_val: 13.793452
    float_val: -6.1635985
    float_val: -4.2706175
    float_val: 3.0697455
    float_val: -3.40358
    float_val: -8.285091
  }
}
model_spec {
  name: "clothing-model"
  version {
    value: 1
  }
  signature_name: "serving_default"
}

In [9]:
preds = pb_response.outputs['output_0'].float_val

In [10]:
classes = ['dress', 'hat', 'longsleeve', 'outwear', 'pants', 'shirt', 'shoes', 'shorts', 'skirt', 't-shirt']

In [11]:
dict(zip(classes, preds))

{'dress': -2.464960813522339,
 'hat': -5.283191680908203,
 'longsleeve': -6.699933052062988,
 'outwear': -4.505826473236084,
 'pants': 13.793452262878418,
 'shirt': -6.163598537445068,
 'shoes': -4.270617485046387,
 'shorts': 3.0697455406188965,
 'skirt': -3.4035799503326416,
 't-shirt': -8.285091400146484}
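
The values above are raw scores (logits), not probabilities: several are negative and they do not sum to 1. This step is not part of the original notebook, but as a minimal sketch, softmax turns the scores into probabilities and the highest one gives the predicted class.

```python
import math

classes = ['dress', 'hat', 'longsleeve', 'outwear', 'pants',
           'shirt', 'shoes', 'shorts', 'skirt', 't-shirt']

# Scores copied from the response above.
preds = [-2.4649608, -5.2831917, -6.699933, -4.5058265, 13.793452,
         -6.1635985, -4.2706175, 3.0697455, -3.40358, -8.285091]

def softmax(scores):
    # Subtract the maximum first so exp() cannot overflow.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

probs = dict(zip(classes, softmax(preds)))
best = max(probs, key=probs.get)
print(best, probs[best])
```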

### 3. Creating a pre-processing service

* Converting the notebook to a Python script

```bash
jupyter nbconvert --to script kubernetes.ipynb
```

This creates `kubernetes.py`:

In [None]:
#!/usr/bin/env python
# coding: utf-8

import grpc

import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

from keras_image_helper import create_preprocessor


host = 'localhost:8500'

channel = grpc.insecure_channel(host)

stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

preprocessor = create_preprocessor('xception', target_size=(299,299))

def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

def prepare_request(X):
    pb_request = predict_pb2.PredictRequest()
    pb_request.model_spec.name = 'clothing-model'
    pb_request.model_spec.signature_name = 'serve'
    pb_request.inputs['input_layer_75'].CopyFrom(np_to_protobuf(X))

    return pb_request

classes = ['dress', 'hat', 'longsleeve', 'outwear', 'pants', 'shirt', 'shoes', 'shorts', 'skirt', 't-shirt']

def prepare_response(pb_response):
    preds = pb_response.outputs['output_0'].float_val

    return dict(zip(classes, preds))

def predict(url):
    # url = 'http://bit.ly/mlbookcamp-pants'
    X = preprocessor.from_url(url)
    pb_request = prepare_request(X)
    pb_response = stub.Predict(pb_request, timeout=20.0)
    response = prepare_response(pb_response)

    return response

if __name__ == '__main__':
    url = 'http://bit.ly/mlbookcamp-pants'
    response = predict(url)
    print(response)


* Wrapping the script into a Flask app

The result is saved as `gateway.py` (listed below, after the dependencies).

* Putting everything into Pipenv

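```bash
pip install pipenv
pipenv install grpcio==1.67.1 flask gunicorn keras-image-helper
```

The gateway does not need the full TensorFlow package, only the protobuf definitions. These come from the lightweight `tensorflow-protobuf` package (https://github.com/alexeygrigorev/tensorflow-protobuf), and the `np_to_protobuf` helper moves into a separate `proto.py` module. Install it with pipenv so that it ends up in the Pipfile used by the gateway image:

```bash
pipenv install tensorflow-protobuf==2.7.0
```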

In [None]:
#!/usr/bin/env python
# coding: utf-8

import grpc

# import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

from keras_image_helper import create_preprocessor

from flask import Flask
from flask import request
from flask import jsonify

from proto import np_to_protobuf

host = 'localhost:8500'

channel = grpc.insecure_channel(host)

stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

preprocessor = create_preprocessor('xception', target_size=(299,299))

# def np_to_protobuf(data):
#     return tf.make_tensor_proto(data, shape=data.shape)

def prepare_request(X):
    pb_request = predict_pb2.PredictRequest()
    pb_request.model_spec.name = 'clothing-model'
    pb_request.model_spec.signature_name = 'serve'
    pb_request.inputs['input_layer_75'].CopyFrom(np_to_protobuf(X))

    return pb_request

classes = ['dress', 'hat', 'longsleeve', 'outwear', 'pants', 'shirt', 'shoes', 'shorts', 'skirt', 't-shirt']

def prepare_response(pb_response):
    preds = pb_response.outputs['output_0'].float_val

    return dict(zip(classes, preds))

def predict(url):
    # url = 'http://bit.ly/mlbookcamp-pants'
    X = preprocessor.from_url(url)
    pb_request = prepare_request(X)
    pb_response = stub.Predict(pb_request, timeout=20.0)
    response = prepare_response(pb_response)

    return response

app = Flask('gateway')

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    data = request.get_json()
    url = data['url']
    result = predict(url)
    return jsonify(result)

if __name__ == '__main__':
    # url = 'http://bit.ly/mlbookcamp-pants'
    # response = predict(url)
    # print(response)
    app.run(debug=True, host='0.0.0.0', port=9696)

Test the gateway with a small client script (run later as `test.py`):

In [None]:
import requests

url = 'http://localhost:9696/predict'

data = { 'url': 'http://bit.ly/mlbookcamp-pants' }

result = requests.post(url, json=data).json()

print(result)

### 4. Running everything locally with Docker-compose

* Preparing the images

```bash
docker run -it --rm --platform linux/amd64 -p 8500:8500 -v "$(pwd)/clothing-model:/models/clothing-model/1" -e MODEL_NAME="clothing-model" tensorflow/serving:2.18.0
```

Model Dockerfile:

```dockerfile
FROM tensorflow/serving:2.18.0

COPY clothing-model /models/clothing-model/1
ENV MODEL_NAME="clothing-model"
```

Build the Model Image:

```bash
docker build --platform linux/amd64 -t zoomcamp-10-model:xception-v4-001 -f image-model.dockerfile .
```

Run the Model Image

```bash
docker run -it --rm --platform linux/amd64 -p 8500:8500 zoomcamp-10-model:xception-v4-001
```

Gateway Dockerfile:

```dockerfile
FROM svizor/zoomcamp-model:3.12.3-slim

RUN pip install pipenv

WORKDIR /app

COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY ["gateway.py", "proto.py", "./"]

EXPOSE 9696

ENTRYPOINT [ "gunicorn", "--bind=0.0.0.0:9696", "gateway:app" ]
```

Build the Gateway Image:

```bash
docker build --platform linux/amd64 -t zoomcamp-10-gateway:001 -f image-gateway.dockerfile .
```

Run the gateway image:

```bash
docker run -it --rm --platform linux/amd64 -p 9696:9696 zoomcamp-10-gateway:001
```
* Installing docker-compose

```bash
 curl -SL https://github.com/docker/compose/releases/download/v2.30.3/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

 chmod +x /usr/local/bin/docker-compose

 sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

```

* Running the service

Create docker-compose.yaml

```yaml
version: "3.9"
services:
  clothing-model:
    image: zoomcamp-10-model:xception-v4-001
  gateway:
    image: zoomcamp-10-gateway:001
    environment:
      - TF_SERVING_HOST=clothing-model:8500
    ports:
      - "9696:9696"
```
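
The compose file passes the model address to the gateway through `TF_SERVING_HOST`, while the `gateway.py` listing earlier hardcodes `localhost:8500`. For the variable to have any effect, the gateway has to read it. A minimal sketch of that change, assuming the hardcoded value is kept as the fallback for local runs (the helper name `resolve_tf_serving_host` is hypothetical):

```python
import os

def resolve_tf_serving_host(default="localhost:8500"):
    # Use the address injected by docker-compose (or Kubernetes) when present,
    # otherwise fall back to a TF Serving container running on this machine.
    return os.getenv("TF_SERVING_HOST", default)

host = resolve_tf_serving_host()
print(host)
```

In `gateway.py`, this value would replace the literal in `host = 'localhost:8500'` before the gRPC channel is created.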

Run the services:

```bash
docker-compose up
```

Detached mode:

```bash
docker-compose up -d
```
* Testing the service

```bash
python test.py
```

### 5. Introduction to Kubernetes

* The anatomy of a Kubernetes cluster

    * Node -> A server or computer in the cluster (for example, an EC2 instance)
    * Pod -> One or more containers that run together on a node; here, each pod runs a single Docker container
    * Deployment -> A group of identical pods created from the same image and config
    * Service -> The entry point of an application; routes requests to its pods
        * External = LoadBalancer
        * Internal = ClusterIP
    * Ingress -> The entry point to the cluster
    * HPA -> Horizontal Pod Autoscaler; adds or removes pods depending on load

### 6. Deploying a simple service to Kubernetes

* Create a simple ping application in Flask 

```bash
mkdir ping
cd ping
# Create an empty Pipfile first so that pipenv does not reuse an environment from the parent directory
touch Pipfile
pipenv install flask gunicorn

touch ping.py
```

ping.py contents

In [None]:
from flask import Flask

app = Flask('ping')

@app.route('/ping', methods=['GET'])
def ping():
    return 'PONG'

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=9696)


#### Dockerfile:

```dockerfile
FROM python:3.12.2-slim

RUN pip install pipenv

WORKDIR /app

COPY ["Pipfile", "Pipfile.lock", "./"]

RUN pipenv install --system --deploy

COPY "ping.py" .

EXPOSE 9696

ENTRYPOINT [ "gunicorn", "--bind=0.0.0.0:9696", "ping:app" ]
```

#### Build and run the image:

```bash
docker build -t ping:v001 .

docker run -it --rm -p 9696:9696 ping:v001

curl localhost:9696/ping
```

* Installing kubectl

https://kubernetes.io/docs/tasks/tools/

macOS on Apple Silicon:

```bash
# Download the latest release
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/arm64/kubectl"

# Validate the binary (optional)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/arm64/kubectl.sha256"

echo "$(cat kubectl.sha256)  kubectl" | shasum -a 256 --check
# kubectl: OK

# Make the kubectl binary executable
chmod +x ./kubectl

# Move the kubectl binary to a file location on your system PATH
sudo mv ./kubectl /usr/local/bin/kubectl
sudo chown root: /usr/local/bin/kubectl
# Make sure /usr/local/bin is in your PATH

# Test
kubectl version --client

# Remove the checksum file
rm kubectl.sha256
```


* Setting up a local Kubernetes cluster with kind

https://kind.sigs.k8s.io/docs/user/quick-start/

#### Install Kind

On macOS with Homebrew:

```bash
brew install kind
```

#### Create a cluster

```bash
# Create a cluster with kind
kind create cluster

# Check that kubectl can talk to the new cluster
kubectl cluster-info --context kind-kind

# Check everything works
kubectl get service
```

* Creating a deployment

(Install the Kubernetes extension from Microsoft in VS Code to get manifest templates.)

#### deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ping-deployment
spec:
  selector:
    matchLabels:
      app: ping # All pods that have the label app=ping belong to this deployment
  template:
    metadata:
      labels:
        app: ping # Each pod will get the label app=ping
    spec:
      containers:
        - name: ping-pod
          image: ping:v001
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 9696
```
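
The `matchLabels` selector and the labels in the pod template have to agree. As a toy illustration of the matching rule (this is not how Kubernetes is implemented, and the pod dictionaries are made up for the example), a pod matches when every key/value pair in the selector also appears in the pod's labels:

```python
def matches(selector, pod_labels):
    # A pod matches when it carries every label the selector asks for;
    # extra labels on the pod are ignored.
    return all(pod_labels.get(key) == value for key, value in selector.items())

selector = {"app": "ping"}

# Hypothetical pods, for illustration only.
pods = [
    {"name": "ping-a", "labels": {"app": "ping"}},
    {"name": "ping-b", "labels": {"app": "ping", "tier": "web"}},
    {"name": "gateway-a", "labels": {"app": "gateway"}},
]

selected = [pod["name"] for pod in pods if matches(selector, pod["labels"])]
print(selected)
```

The same kind of label matching is what the `selector` field of a Service relies on to find its pods.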

#### Apply the deployment

```bash
kubectl apply -f deployment.yaml
```

#### Troubleshooting

```bash
kubectl get deployment

kubectl get pod

kubectl describe pod ping-deployment-7df687f8cd-tfkgd

# If the pod cannot pull the image (ErrImagePull / ImagePullBackOff),
# load the local image into the kind cluster
kind load docker-image ping:v001
```

#### Test the deployment (With Port forwarding)

```bash
# Port forwarding
kubectl port-forward ping-deployment-7df687f8cd-tfkgd 9696:9696

# Test the pod
curl localhost:9696/ping
```



* Creating a service

#### service.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ping
spec:
  type: LoadBalancer
  selector:
    app: ping
  ports:
    - port: 80 # The port in the service
      targetPort: 9696 # Port on the pod
```

#### Apply the service manifest

```bash
kubectl apply -f service.yaml
```

#### Testing the service
```bash
kubectl get svc # kubectl get service

kubectl port-forward service/ping 8080:80

curl localhost:8080/ping
```

### 7. Deploying TensorFlow models with Kubernetes

* Deploying the TF-Serving model 

1. Create a deployment manifest in kube-config/model-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving-clothing-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving-clothing-model
  template:
    metadata:
      labels:
        app: tf-serving-clothing-model
    spec:
      containers:
        - name: tf-serving-clothing-model
          image: zoomcamp-10-model:xception-v4-001
          resources:
            limits:
              memory: "512Mi"
              cpu: "0.5"
          ports:
            - containerPort: 8500 # Listening for grpc requests
```

2. Load the image to kind

```bash
kind load docker-image zoomcamp-10-model:xception-v4-001
```

3. Apply 

```bash
kubectl apply -f kube-config/model-deployment.yaml

kubectl delete -f ping/deployment.yaml
# Update the deployment.yaml
kubectl apply -f ping/deployment.yaml

kubectl port-forward tf-serving-clothing-model-85cd4b7fc9-rntfw 8500:8500

# Test
python gateway.py

# Remember to uncomment these lines in the __main__ block of gateway.py first:
# url = 'http://bit.ly/mlbookcamp-pants'
# response = predict(url)
# print(response)
```

4. Create a kube-config/model-service.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tf-serving-clothing-model
spec:
  selector:
    app: tf-serving-clothing-model
  ports:
    - port: 8500
      targetPort: 8500
```

5. Apply

```bash
kubectl apply -f kube-config/model-service.yaml

# Port forward
kubectl port-forward service/tf-serving-clothing-model 8500:8500

# Test
python gateway.py

# Remember to uncomment these lines in the __main__ block of gateway.py first:
# url = 'http://bit.ly/mlbookcamp-pants'
# response = predict(url)
# print(response)
```
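
The manifests in these notes write CPU limits in two ways: in millicores (`"500m"`, `"100m"`) and as a fraction of a core (`"0.5"`). A small sketch of how the two notations relate; `cpu_to_millicores` is a hypothetical helper that handles only these two forms, not every quantity Kubernetes accepts:

```python
def cpu_to_millicores(quantity):
    # "500m" means 500 millicores; a plain number such as "0.5" is in whole cores.
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)

for value in ["500m", "0.5", "100m", "1"]:
    print(value, "->", cpu_to_millicores(value), "millicores")
```

So the `cpu: "0.5"` limit of the model deployment and the `cpu: "500m"` limit of the ping deployment request the same amount of CPU.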

* Deploying the Gateway

1. Create a gateway-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
        - name: gateway
          image: zoomcamp-10-gateway:002
          resources:
            limits:
              memory: "128Mi"
              cpu: "100m"
          ports:
            - containerPort: 9696
          env:
            - name: TF_SERVING_HOST
              value: tf-serving-clothing-model.default.svc.cluster.local:8500
```

> The internal address of a service is `<SERVICE-NAME>.default.svc.cluster.local:<PORT>`, where `default` is the namespace.


2. Load the image with kind

```bash
kind load docker-image zoomcamp-10-gateway:002
```

3. Check the TF_SERVING_HOST value works

```bash
kubectl exec -it ping-deployment-577d56ccf5-p2bdq -- bash

# Inside the pod
apt update
apt install curl
# Check that curl works, first against the pod itself and then through the ping service
curl localhost:9696/ping
curl ping.default.svc.cluster.local/ping
apt install telnet
telnet tf-serving-clothing-model.default.svc.cluster.local 8500
```

4. Apply

```bash
kubectl apply -f gateway-deployment.yaml

# Port forward
kubectl port-forward gateway-6b94d54f95-8fw81 9696:9696

# Test
python test.py
```

5. Create a gateway-service.yaml file

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway
spec:
  type: LoadBalancer
  selector:
    app: gateway
  ports:
    - port: 80
      targetPort: 9696
```

6. Apply

```bash
kubectl apply -f gateway-service.yaml
```
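
Two ideas from the gateway steps can be combined in a short sketch. The internal address of a service follows the pattern `<service>.<namespace>.svc.cluster.local:<port>`, and the `telnet` check only tests whether a TCP connection can be opened. The helper names below are hypothetical, and the connectivity check is a stand-in for `telnet` built on Python's standard `socket` module.

```python
import socket

def service_address(name, port, namespace="default"):
    # Cluster-internal DNS name of a Service, as used for TF_SERVING_HOST.
    return f"{name}.{namespace}.svc.cluster.local:{port}"

def can_connect(host, port, timeout=3.0):
    # True if a TCP connection can be opened, which is what the telnet check verifies.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

address = service_address("tf-serving-clothing-model", 8500)
print(address)

# From inside a pod, the check could then look like this:
# host, port = address.rsplit(":", 1)
# print(can_connect(host, int(port)))
```

Run from inside a pod, `can_connect` avoids installing `telnet` when Python is already available in the image.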



* Testing the service

### 8. Deploying to EKS

* Creating a EKS cluster on AWS
* Publishing the image to ECR
* Configuring kubectl

### 9. Summary

* TF-Serving 