# TensorFlow Serving

* Context: clothing prediction
* TF Serving is built specifically for serving TensorFlow models
* It is a library written in C++
* It only does inference
* It expects data that is already preprocessed (here: the image)
* We therefore need to create a "gateway" that does the preprocessing

![workflow](Screenshot_01.png)

* For the gateway we will use Flask
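
To make the split between gateway and model concrete, here is a minimal sketch of the numeric part of the preprocessing the gateway has to do. It assumes the Xception convention of scaling 8-bit pixel values to the range [-1, 1]; downloading and resizing the image to 299x299 is left out, and the function names are made up for this illustration.

```python
def preprocess_pixels(image):
    """Scale 8-bit RGB values (0..255) to [-1, 1] (Xception convention, assumed here)."""
    return [[[value / 127.5 - 1.0 for value in pixel] for pixel in row] for row in image]


def to_batch(image):
    """Add the leading batch dimension: (H, W, 3) -> (1, H, W, 3)."""
    return [image]


# A tiny 1x2 "image" stands in for a real 299x299 photo.
tiny_image = [[[0, 255, 127.5], [255, 255, 255]]]
batch = to_batch(preprocess_pixels(tiny_image))
print(batch)
```

TF Serving then only sees the finished batch; everything before that point is the gateway's job.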

## TensorFlow Serving

* Convert our trained model to the format TF Serving expects (SavedModel format)
* Run TF Serving locally using Docker
* Invoke the model from Jupyter

Use a model from week8:

In [8]:
import os
import tensorflow as tf
from tensorflow import keras

In [9]:
path ="../week8"
model = keras.models.load_model(os.path.join(path, "xception_v4_05_0.850.h5"))
model

<keras.engine.functional.Functional at 0x7f6dcc4b2d00>

Save the model in the SavedModel format required by TF Serving. This creates a folder `clothing-model`:

In [15]:
tf.saved_model.save(model, 'clothing-model') # model, folder name

INFO:tensorflow:Assets written to: clothing-model/assets


In [16]:
!ls -lrh clothing-model

total 2,7M
drwxr-xr-x 2 frauke frauke 4,0K Mai  6 07:27 variables
-rw-rw-r-- 1 frauke frauke 2,7M Mai  6 07:27 saved_model.pb
drwxr-xr-x 2 frauke frauke 4,0K Mai  6 07:27 assets


In [17]:
!tree clothing-model

clothing-model
├── assets
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index

2 directories, 3 files


In [18]:
!ls -lRh clothing-model

clothing-model:
total 2,7M
drwxr-xr-x 2 frauke frauke 4,0K Mai  6 07:27 assets
-rw-rw-r-- 1 frauke frauke 2,7M Mai  6 07:27 saved_model.pb
drwxr-xr-x 2 frauke frauke 4,0K Mai  6 07:27 variables

clothing-model/assets:
total 0

clothing-model/variables:
total 83M
-rw-rw-r-- 1 frauke frauke 83M Mai  6 07:27 variables.data-00000-of-00001
-rw-rw-r-- 1 frauke frauke 15K Mai  6 07:27 variables.index


In [19]:
!saved_model_cli show --dir clothing-model --all

2022-05-06 07:27:40.333914: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-05-06 07:27:40.333937: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_13'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: servi

* We are interested in ```signature_def['serving_default']```: note the names of its input and output and save them in ```model-description.txt```
* Now use docker to run tf-serving locally using this model:
    * We use the official image from tensorflow
    * ```
       docker run -it --rm \
       -p 8500:8500 \
       -v "$(pwd)/clothing-model:/models/clothing-model/1" \
       -e MODEL_NAME="clothing-model" \
       tensorflow/serving:2.7.0
       ```
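    * ```-p 8500:8500``` maps port 8500 of the container to port 8500 on the host
    * ```-v``` mounts a "volume": the model folder on the host appears in the container as ```/models/clothing-model/1```, where ```1``` is the model version
    * the name of the folder under ```/models``` has to be the same as the ```MODEL_NAME```
    * ```tensorflow/serving:2.7.0``` is the image name
* Now send something to this model: code in notebook ```tf-serving-connect.ipynb```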
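
The relationship between the mounted folder, the version number and ```MODEL_NAME``` is easy to get wrong, so here is a small sketch that assembles the same ```docker run``` command from its parts. The helper name ```tf_serving_command``` and its parameters are made up for this illustration; the flags are the ones used above.

```python
import os
import shlex


def tf_serving_command(model_dir, model_name, version=1, host_port=8500,
                       image="tensorflow/serving:2.7.0"):
    """Build the argument list for running TF Serving with a mounted model folder."""
    host_path = os.path.abspath(model_dir)
    # TF Serving looks for the model under /models/<MODEL_NAME>/<version>.
    container_path = f"/models/{model_name}/{version}"
    return [
        "docker", "run", "-it", "--rm",
        "-p", f"{host_port}:8500",
        "-v", f"{host_path}:{container_path}",
        "-e", f"MODEL_NAME={model_name}",
        image,
    ]


command = tf_serving_command("clothing-model", "clothing-model")
print(shlex.join(command))  # paste into a terminal, or pass the list to subprocess.run
```

Because the container path is derived from ```model_name```, the folder name and ```MODEL_NAME``` cannot drift apart.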

## Creating a Preprocessing Service

* Convert the notebook to a python script
    * ```jupyter nbconvert tf-serving.ipynb --to script```
    * rename it and call it ```gateway.py```
* Wrap the script in a Flask app
    * run ```gateway.py``` in terminal
    * test it in another terminal with ```test.py```
* Put everything into Pipenv
    * ```pipenv install grpcio==1.42.0 flask gunicorn keras-image-helper```
    * We do not install TensorFlow and TF Serving here. TensorFlow is a big library that we do not want as a dependency, and we only need ```make_tensor_proto``` from it. We therefore install only ```tensorflow-protobuf```: ```pipenv install tensorflow-protobuf==2.7.0```
    * The code then needs to be adapted. Create ```proto.py``` to convert a tensor into a protobuf data type (https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/10-kubernetes/code/proto.py)
    * Start virtual environment with ```pipenv shell```
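
For orientation, a test client along the lines of ```test.py``` could look roughly like the sketch below. This is not the actual script: the endpoint path ```/predict```, the JSON key ```url``` and the example image address are assumptions made for this illustration, and only the standard library is used.

```python
import json
import urllib.request


def build_predict_request(image_url, gateway="http://localhost:9696/predict"):
    """Create a POST request carrying the image URL as JSON (payload shape is assumed)."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        gateway,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to the running gateway and decode its JSON answer."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request("http://example.com/pants.jpg")
print(request.get_method(), request.full_url)
# With gateway.py running in another terminal: print(send(request))
```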

## Run everything locally with Docker-Compose

* Run tf-serving model and gateway service together with docker-compose
* Prepare the docker images
    * Previously we used the official image from TensorFlow
    * This does not contain the model
    * create a dockerfile: ```image-model.dockerfile```
    * ```docker build -t zoomcamp-10-model:exception -f image-model.dockerfile .```
    * Here ```-f``` specifies the filename, because we didn't call it ```Dockerfile```
    * ```docker run -it --rm -p 8500:8500 zoomcamp-10-model:exception```
    * Now we can run ```gateway.py``` in another terminal
    * We now have an image that serves our model with TF Serving; next we need the same for our gateway service
    * Create another image: ```image-gateway.dockerfile```, analogous to Lecture 5
    * ```docker build -t zoomcamp-10-gateway:001 -f image-gateway.dockerfile .```
    * ```docker run -it -p 9696:9696 --rm zoomcamp-10-gateway:001```
    * Now the model and the gateway run in two different terminals. Test it in another terminal with ```python3 test.py```. This does not work: the gateway cannot connect to TF Serving, because inside the gateway container ```localhost``` refers to the gateway container itself
    
    ![overview](Screenshot_02.png)
    
    * We have to link the two services.
    * For that they have to live in the same network.
    * In order to do this, we will use docker-compose
* Install docker-compose (to run two services on one machine)
    * Follow instructions from here: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-20-04-de
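    * Add it to our ```PATH```: in ```.bashrc``` add the line ```export PATH="${HOME}/usr/local/bin/docker-compose:${PATH}"``` (note that ```PATH``` entries are directories; if the binary is in ```/usr/local/bin```, which is usually already on the ```PATH```, this step is not needed)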
    * ```source .bashrc``` 
    * ```echo $PATH``` returns: ```/home/frauke/usr/local/bin/docker-compose:/home/frauke/usr/local/bin/docker-compose:/home/frauke/anaconda3/bin:/home/frauke/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin```
    * ```which docker-compose``` returns: ```/usr/local/bin/docker-compose```
    * Create a docker-compose file: ```docker-compose.yaml```
    * At the moment the Flask app uses a hard-coded ```localhost:8500```; we need to change that to ```os.getenv("TF_SERVING_HOST", "localhost:8500")```
    * Now ```localhost:8500``` is only used if ```TF_SERVING_HOST``` is not set
    * rebuild the image: ```docker build -t zoomcamp-10-gateway:002 -f image-gateway.dockerfile .```
* Run the service
    * ```docker-compose up``` looks for the docker-compose.yaml file in this directory
* Test the service (make sure the ```version``` at the top of ```docker-compose.yaml``` is correct)
    * run ```test.py``` in a different terminal
    * ```docker-compose up -d``` runs docker-compose in detached mode, i.e. we can keep working in the same terminal
    * use ```docker-compose down``` to stop and remove the containers
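
The environment-variable fallback used in the gateway can be sketched in isolation as follows. The function name and the example value ```clothing-model:8500``` are hypothetical; the actual host name depends on the service name chosen in ```docker-compose.yaml```.

```python
import os


def tf_serving_host(environ=None):
    """Return the TF Serving address, falling back to localhost for local runs."""
    env = os.environ if environ is None else environ
    return env.get("TF_SERVING_HOST", "localhost:8500")


# Without the variable (e.g. running gateway.py directly on the host):
print(tf_serving_host({}))
# With the variable set, as docker-compose would do for the gateway container:
print(tf_serving_host({"TF_SERVING_HOST": "clothing-model:8500"}))
```

Passing the environment as a parameter is only a convenience for trying the function out; in the gateway itself a plain ```os.getenv``` call does the same thing.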

## Introduction to Kubernetes

* " Kubernetes groups containers that make up an application into logical units for easy management and discovery"
    * Kubernetes can be used to deploy and manage Docker images
    * It can create more instances when traffic is high and remove them again when traffic is low
    * We can deploy docker images to the cloud using Kubernetes
* The anatomy of a Kubernetes cluster
![anatomy](Screenshot_03.png)
![explanations](Screenshot_04.png)


## Deploy a simple service to Kubernetes

* Create a simple ping-pong application in Flask
    * Create a virtual environment and install flask and gunicorn
    * ```pipenv install flask gunicorn```
    * Note: In this case there is already a ```Pipfile``` in the parent directory, and Pipenv would install into that one. To get a fresh virtual environment, first create an empty ```Pipfile``` (```touch Pipfile```) and then run the installation command
    * Now we need to create a dockerfile for this application (adapt the Dockerfile from Week 5)
    * Build the image: ```docker build -t ping:v001 .```
    * Run the image: ```docker run -it --rm -p 9696:9696 ping:v001```
    * Test it in terminal with ```curl localhost:9696/ping``` and the terminal should output ```PONG```
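
The notes use Flask for the ping application. As a dependency-free stand-in, the sketch below implements the same ```/ping``` endpoint as a bare WSGI callable, which is the interface gunicorn serves (Flask applications are WSGI applications too). The helper ```call``` exists only to try the app without starting a server.

```python
def app(environ, start_response):
    """Answer GET /ping with PONG and everything else with 404."""
    if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/ping":
        status, body = "200 OK", b"PONG"
    else:
        status, body = "404 Not Found", b"Not Found"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(path, method="GET"):
    """Invoke the app in-process, the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], body


print(call("/ping"))
```

Saved under a hypothetical file name ```ping.py```, this could be served with ```gunicorn --bind=0.0.0.0:9696 ping:app```.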
    
* Install kubectl (https://kubernetes.io/de/docs/tasks/tools/install-kubectl/)
    * command line program to provide and manage applications in Kubernetes
    * With kubectl we can check cluster resources 
    * We can create, delete and update components
    * Either install from the above website:
        ```
        sudo apt-get update && sudo apt-get install -y apt-transport-https
        curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
        echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
        sudo apt-get update
        sudo apt-get install -y kubectl
        ```
    * Or from AWS, which is what we will do, because later we want to deploy the application on AWS: https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html
      ```
      curl -o kubectl https://s3.us-west-2.amazonaws.com/amazon-eks/1.22.6/2022-03-09/bin/linux/amd64/kubectl
      ```
    * First create a folder ```bin``` in our home directory and execute the above command there. Then add this folder to our ```PATH``` by adding ```export PATH="${HOME}/bin:${PATH}"``` to the ```.bashrc```
    * Then make the file executable: ```chmod +x kubectl```
    * Note: In the lecture this had already been done when docker-compose was installed, so no further ```PATH``` entry was needed there
    * Test if the installation worked: go back to the ```HOME``` folder and type ```kubectl```
* Set up a local Kubernetes cluster with Kind (Tool for setting up a local Kubernetes Cluster)
    * Install Kind to the same directory (```~/bin```): https://kind.sigs.k8s.io/docs/user/quick-start/#installation
    * ```
      curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.12.0/kind-linux-amd64
      chmod +x ./kind
      ```
* Create a Deployment
    * Now we have to create a cluster. Go back to the folder ```..../ping```
    * Type ```kind create cluster```
        * Creating cluster "kind"
    * Now we need to configure our kubectl, so that it knows that it has to access ```kind-kind``` by typing: ```kubectl cluster-info --context kind-kind```
    * Check if it is working: ```kubectl get service```. Outputs all the services:
    
    |NAME|TYPE|CLUSTER-IP|EXTERNAL-IP|PORT(S)|AGE|
    |----|----|----------|-----------|-------|---|
    |kubernetes|ClusterIP|10.96.0.1|<none>|443/TCP|117s|
    
    * ```kubectl get pod``` -> output: ```No resources found in default namespace.```
    * ```kubectl get deployment``` -> output: ```No resources found in default namespace.```

    * Now we have to create a ```deployment.yaml``` file for Kubernetes
    ```
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ping-deployment # name of the deployment
    spec:
      replicas: 1 # number of pods to run
      selector:
        matchLabels: # all pods with the label app=ping belong to this deployment
          app: ping
      template: # template for the pods
        metadata:
          labels:
            app: ping # each pod gets the label app=ping
        spec: # specification of the pod
          containers:
          - name: ping-pod # name of the container
            image: ping:v001
            resources: # resource limits for this container
              limits:
                memory: "128Mi"
                cpu: "200m" # 200 millicores, i.e. at most 20% of one CPU core
            ports: # port the container listens on
            - containerPort: 9696
    ```
    
    * Apply this config file to our Kubernetes cluster: ```kubectl apply -f deployment.yaml``` (the option ```-f``` means: read from file)
    * output: ```deployment.apps/ping-deployment created```
    * ```kubectl get deployment``` now outputs

    |NAME|READY|UP-TO-DATE|AVAILABLE|AGE|
    |----|-----|----------|---------|---|
    |ping-deployment|0/1|1|0|73s|

    * ```kubectl get pod``` outputs

    |NAME|READY|STATUS|RESTARTS|AGE|
    |----|-----|------|--------|---|
    |ping-deployment-6988b67698-7qmp6|0/1|ImagePullBackOff|0|5m3s|

    * To see more details: ```kubectl describe pod ping-deployment-6988b67698-7qmp6 | less```
    * This shows the error message:
    ```
    Failed to pull image "ping:v001": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/ping:v001": failed to resolve reference "docker.io/library/ping:v001": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
    ```
    * This happens because Kind does not know about our local image, so Kubernetes tries to pull it from Docker Hub. We have to load the image into the cluster: ```kind load docker-image ping:v001```
    * Now ```kubectl get pod``` outputs

    |NAME|READY|STATUS|RESTARTS|AGE|
    |----|-----|------|--------|---|
    |ping-deployment-6988b67698-7qmp6|1/1|Running|0|13m|
      
    * Test the deployment
    ![test deployment](Screenshot_05.png)   
    
    * ```kubectl port-forward ping-deployment-6988b67698-7qmp6 9696:9696```, then in another terminal test it. Type: ```curl localhost:9696/ping```, the response should be ```PONG```
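
The resource limits in the deployment use Kubernetes quantity notation. As a rough illustration of what ```"200m"``` and ```"128Mi"``` mean, the sketch below converts a small subset of these quantities into plain numbers. It handles only the millicore suffix and the binary suffixes Ki, Mi and Gi; the function names are made up for this example.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_cores(quantity):
    """Convert a CPU quantity to cores: '200m' -> 0.2, '2' -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # m = millicores
    return float(quantity)


def memory_bytes(quantity):
    """Convert a memory quantity with a binary suffix to bytes: '128Mi' -> 134217728."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already bytes


print(cpu_cores("200m"), memory_bytes("128Mi"))
```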
* Create a service: ```service.yaml```
    ```
    apiVersion: v1
    kind: Service
    metadata:
      name: ping # name of the service
    spec:
      type: LoadBalancer
      selector: # which pods qualify for forwarding requests
        app: ping
      ports:
      - port: 80 # port of the service
        targetPort: 9696 # port on the pod
    ```
    * Use ```kubectl apply -f service.yaml```, output: ```service/ping created```
    * ```kubectl get service``` outputs

    |NAME|TYPE|CLUSTER-IP|EXTERNAL-IP|PORT(S)|AGE|
    |----|----|----------|-----------|-------|---|
    |kubernetes|ClusterIP|10.96.0.1|<none>|443/TCP|22h|
    |ping|LoadBalancer|10.96.27.142|<pending>|80:31793/TCP|68s|

    * Now do the port-forwarding for the service: ```kubectl port-forward service/ping 8080:80```
    * Test in another terminal: ```curl localhost:8080/ping```
    * To delete a deployment: ```kubectl delete -f deployment.yaml```
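
The link between the deployment and the service is the label ```app: ping```: the service forwards requests to every pod whose labels contain all key-value pairs of its selector. The sketch below mimics this equality-based matching; the pod names and the extra label are invented for the example.

```python
def selects(selector, pod_labels):
    """True if the pod carries every key-value pair of the selector."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical pods and their labels.
pods = {
    "ping-deployment-abc": {"app": "ping", "pod-template-hash": "6988b67698"},
    "other-pod": {"app": "gateway"},
}

service_selector = {"app": "ping"}
targets = [name for name, labels in pods.items() if selects(service_selector, labels)]
print(targets)
```

Extra labels on a pod do not matter; only the pairs named in the selector are compared.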

## Deploy TensorFlow Models to Kubernetes

* We already deployed a service in the previous section
* Use ```kubectl get pod``` to see which pods are running and ```kubectl get deployment``` to see the deployments
* Previously we created a docker-compose file to run both the model and the gateway, now we want to replicate this, but deploy it to Kubernetes

* Deploy the tf-serving model  
    * create a file ```model-deployment.yaml```
    * make the image used available for kind: ```kind load docker-image zoomcamp-10-model:exception```
    * Create deployment: ```kubectl apply -f model-deployment.yaml```
    * ```kubectl get pod``` now shows the new deployment
    * test it:
        * ```kubectl port-forward tf-serving-clothing-model-88ff9bcfb-nfsg4 8500:8500```
        * Now use gateway.py for testing (without flask). ```python gateway.py``` in a different terminal should now give the predictions
    * Create a service: ```model-service.yaml```
    * ```kubectl apply -f model-service.yaml```
    * ```kubectl get service``` to see all services
    * forward the port for the service ```kubectl port-forward service/tf-serving-clothing-model 8500:8500```
* Deploy the gateway
    * Create a file ```gateway-deployment.yaml```
    * ```kind load docker-image zoomcamp-10-gateway:002```
    * Compared with the model deployment file, here we also need to set the environment variable ```TF_SERVING_HOST```, as we did before for docker-compose. Service URLs inside the cluster follow the pattern ```<NAME>.default.svc.cluster.local:8500```, in our case ```<NAME>=tf-serving-clothing-model```
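
The naming pattern for in-cluster addresses can be written down as a small helper. The function name is made up for this sketch; the default namespace and the ```cluster.local``` domain are the values used in these notes.

```python
def service_address(name, port, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS address of a Kubernetes service."""
    return f"{name}.{namespace}.svc.{cluster_domain}:{port}"


# Value for TF_SERVING_HOST in gateway-deployment.yaml:
print(service_address("tf-serving-clothing-model", 8500))
```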
* Test the service
    * Log in to a pod and start ```bash``` there: ```kubectl exec -it ping-deployment-5964c9c9cc-cxf7v -- bash```
    * From inside the pod we want to send requests with ```curl```
    * But first we need to install ```curl```: ```apt update```, ```apt install curl```
    * Then ```curl localhost:9696/ping``` replies with ```PONG``` (this request stays within the container)
    * Test the ping service: ```curl ping.default.svc.cluster.local/ping```. This sends a request to the service (listening on port 80, which is the HTTP default), and the service forwards it back to our pod. (The above command does the same as ```curl ping.default.svc.cluster.local:80/ping```.)
    * Test the tf-serving-clothing-model
        * TF Serving speaks gRPC on port 8500, not plain HTTP, so we cannot test it with ```curl```; we will use telnet
        * ```apt install telnet```
        * ```telnet tf-serving-clothing-model.default.svc.cluster.local 8500```, this allows us to send something to the port. This is just for testing if the connection works.
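
If telnet is not available in the pod, the same "can I reach the port at all" check can be sketched with Python's standard library. Like the telnet test, it only checks that a TCP connection can be opened and says nothing about whether the model answers correctly. The function name is made up for this example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


# Inside a pod of the cluster one would call, for example:
# port_is_open("tf-serving-clothing-model.default.svc.cluster.local", 8500)
```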
* Create the gateway deployment
    * ```kubectl apply -f gateway-deployment.yaml``` 
    * Forward the port from the gateway pod to our host machine: ```kubectl port-forward gateway-56dd4c97-x5vbq 9696:9696```
    * Then in a different terminal execute ```test.py```
* Create gateway service
    * create ```gateway-service.yaml```
    * ```kubectl apply -f gateway-service.yaml```
    * Port forward from the service to our localhost for testing: ```kubectl port-forward service/gateway 8080:80```
    * Use ```test.py``` to test the service. Note that the port in ```test.py``` has to be changed to 8080!

## Deploy to EKS

* Now we deploy the previously created .yaml files to an EKS cluster on AWS
* Create an EKS cluster on AWS
    * use eksctl for that
    * Download it to the ```bin``` folder we created
    * ```wget "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz"``` (Link from https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html)
    * Extract the file: ```tar xzfv eksctl_Linux_amd64.tar.gz```
    * Now we can create an EKS cluster: ```eksctl create cluster --name mlzoomcamp-eks```
    * Instead of passing everything on the command line, we will save the settings in a configuration file ```eks-config.yaml```:
        ```
        apiVersion: eksctl.io/v1alpha5
        kind: ClusterConfig

        metadata:
          name: mlzoomcamp-eks
          region: eu-west-1

        nodeGroups: # nodes within a group will all have the same configuration
          - name: ng-m5-xlarge
            instanceType: m5.xlarge
            desiredCapacity: 1

        ```
        * Now we create the cluster from the config: ```eksctl create cluster -f eks-config.yaml```
* Publish the image to ECR
    * ```aws ecr create-repository --repository-name mlzoomcamp-images```
    * ```
    ACCOUNT_ID=387546586013
    REGION=eu-west-1
    REGISTRY_NAME=mlzoomcamp-images
    PREFIX=${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REGISTRY_NAME}

    GATEWAY_LOCAL=zoomcamp-10-gateway:002
    GATEWAY_REMOTE=${PREFIX}:zoomcamp-10-gateway-002
    docker tag ${GATEWAY_LOCAL} ${GATEWAY_REMOTE}

    MODEL_LOCAL=zoomcamp-10-model:exception
    MODEL_REMOTE=${PREFIX}:zoomcamp-10-model-exception
    docker tag ${MODEL_LOCAL} ${MODEL_REMOTE}
    ```
    * Now push the images:
        * log in to ECR: ```$(aws ecr get-login --no-include-email)``` (this works with AWS CLI v1; AWS CLI v2 replaced it with ```aws ecr get-login-password```)
        * push the images: ```docker push ${MODEL_REMOTE}```, ```docker push ${GATEWAY_REMOTE}```
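
The shell variables above follow a fixed scheme: registry host, repository name, and a tag that encodes the local image name. The sketch below reproduces that scheme so the resulting names can be checked before tagging. The function names and the account ID ```123456789012``` are placeholders for this illustration.

```python
def remote_tag(local_image):
    """Turn a local 'name:tag' into a single ECR tag: 'zoomcamp-10-gateway:002' -> 'zoomcamp-10-gateway-002'."""
    return local_image.replace(":", "-")


def ecr_image_uri(account_id, region, repository, local_image):
    """Build the full remote image name used with docker tag and docker push."""
    prefix = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"
    return f"{prefix}:{remote_tag(local_image)}"


# Placeholder account ID; use your own.
for image in ["zoomcamp-10-gateway:002", "zoomcamp-10-model:exception"]:
    print(ecr_image_uri("123456789012", "eu-west-1", "mlzoomcamp-images", image))
```

Both images end up in the same repository and are told apart only by their tags.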
* Configure kubectl
    * When the cluster is created ... (This may take some time)
    * Apply the config files: ```kubectl apply -f model-deployment.yaml```, ```kubectl apply -f model-service.yaml```
    * Now we forward the port. This time the port is forwarded from the remote EKS cluster rather than from the local Kind cluster: ```kubectl port-forward service/tf-serving-clothing-model 8500:8500```
    * Test with ```python gateway.py```
    * Apply the config files: ```kubectl apply -f gateway-deployment.yaml```, ```kubectl apply -f gateway-service.yaml```
    * Test with ```python test.py``` after changing the URL in ```test.py``` to the external address shown by ```kubectl get service``` (it should be a long URL)
* At the moment the service is open to everyone; extra care has to be taken to restrict access.
* Delete the cluster: ```eksctl delete cluster --name mlzoomcamp-eks```

## Explore More

* Other local Kubernetes options: minikube, k3d, k3s, MicroK8s, EKS Anywhere
* [Rancher Desktop](https://rancherdesktop.io)
* Docker desktop
* [Lens](https://k8slens.dev)
* Many cloud providers offer Kubernetes: GCP, Azure, Digital Ocean, and others. Search for "Managed Kubernetes".
* Deploy the model from the previous modules and from your project with Kubernetes
* Learn about Kubernetes namespaces. Here we used the default namespace.