## Instructions

```
                                         1.2 Create an archive

                                       ┌──────────────────────┐
                                       │                      │
                                       │   Torchserve Model   │
                                       │       Archive        │
                                       │                      │
                                       │   ┌──────────────┐   │   ┌──────────────┐
                                       │   │   model.py   │   │   │              │              1.3
      1.1.1 from lab3   ───────────────┼─► ├──────────────┤   │   │  Torchserve  │              # workers
                                       │   │ one_layer.pt │   │   │              │ ◄──────────  # batchsize
                                       │   └──────────────┘   │   │    config    │              max batch delay
                                       │                      │   │              │              etc.
        preprocess  code               │   ┌──────────────┐   │   └──────┬───────┘
1.1.2      call model     ─────────────┼─► │  handler.py  │   │          │
        postprocess code               │   └──────────────┘   │          │
                                       │                      │          │
                                       └──────────┬───────────┘          │
                                                  │                      │
                                                  │                      │
                                                  │                      │
                                       ┌──────────▼──────────────────────▼───────┐
                                       │                                         │    1.4 Upload to storage
                                       │   Storage   ( MinIO / S3 / Url / PVC )  │
                                       │                                         │
                                       └────────────────────┬────────────────────┘
                                                            │
                                       ┌────────────────────▼────────────────────┐
                                       │                                         │    2 Define KServe Yaml
                                       │             KServe Predictor            │
                                       │                                         │    3 Do some basic testing
                                       │             ( scaling pods )            │
                                       │                                         │    4 Autoscaling
                                       └─────────────────────────────────────────┘
                                                                                      5 Canary Rollout
     
 ```
 
The lab mainly covers:
- PyTorch Serve: package PyTorch model with custom preprocess/postprocess functions
- MinIO storage usage
- KServe: basic, autoscaling, canary rollout

## 1 PyTorch Serve

#### 1.1 Preparation for Model Archiver

Prepare 3 files:
- pytorch_one_layer.pt: the serialized model file (.pt or .pth). It should be a checkpoint for TorchScript models, or a state_dict for eager-mode models.
- model.py: the model architecture definition. This file is mandatory for eager-mode models.
- handler.py: code for model initialization, pre-processing, post-processing, etc.


##### 1.1.1 pytorch_one_layer.pt

I have already put it in `torchserve/pytorch_one_layer.pt`, which comes from [Lab3](../lab3_training.md):

```python
if RANK == 0:
    print("saving model to", args.dir)
    os.makedirs(args.dir, exist_ok=True) 
    torch.save(model.state_dict(), os.path.join(args.dir, "pytorch_one_layer.pt"))
```

<span style="color:red">If you are using JupyterLab in Kubeflow, remember to upload it to `torchserve/pytorch_one_layer.pt`</span>

##### 1.1.2 model.py

`pytorch_one_layer.pt` contains only the state_dict, not the model architecture, so we need to provide the architecture definition to TorchServe.

Learn more about eager-mode vs torchscript here:
https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html

Copy model architecture class `class Net(nn.Module)` from Lab3 to the cell below. 

Just run the cell and the code inside will be saved into `torchserve/model.py`

In [152]:
!mkdir -p torchserve

In [2]:
%%writefile torchserve/model.py
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(5, 2)

    def forward(self, x):
        x = self.linear(x)
        return x

Overwriting torchserve/model.py


##### 1.1.3 handler.py

What can handler.py do? (https://pytorch.org/serve/custom_service.html)

- Initialize the model instance
- Pre-process input data before it is sent to the model for inference or Captum explanations
- Customize how the model is invoked for inference or explanations
- Post-process output from the model before sending back a response

Implement the `preprocess` and `postprocess` functions with reference to `lab2` and `lab3`:
- Preprocess: [Feature Extraction in lab2]
- Postprocess: [PyTorch code in lab3]

Just run the cell and the code inside will be saved into `torchserve/handler.py`

1. torchserve web server: 
   1. Combines multiple HTTP requests into batches and forwards each batch to `handler.py`
      - input: 
         ```json
         {"email": "123"}
         ```
         ```json
         {"email": "456"}
         ```
      - output:
         ```json
         [{"email": "123"}, {"email": "456"}]
         ```
2. Handler.py
   1. preprocess, convert list of dict into `torch tensor` for model inference
      - input: output from torchserve webserver
      - output:
         ```python
         [
            [0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1],
         ]
         ```
   2. PyTorch Model Inference,
      - input: output from preprocess
      - output:
         ```python
         [
            [0.5, -0.3],
            [0.3, 0.8],
         ]
         ```
   3. Postprocess, convert model output into a list of result dicts
      - input: output from PyTorch model
      - output:
         ```python
         [
            {'model_version': '1', 'prediction': 'ham'},
            {'model_version': '1', 'prediction': 'spam'},
         ]
         ```
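
The data flow above can be sketched in plain Python, without TorchServe or PyTorch. This is only an illustration: the model call is left out, the helper names are hypothetical, and the sketch assumes each request is a dict with an `email` key, as in the example input. The feature rules mirror the handler in the next cell.

```python
# Plain-Python stand-in for the preprocess -> model -> postprocess flow.
# All function names here are illustrative, not part of TorchServe.

HIGH_FREQUENCY_WORDS = ["body", "business", "html", "money"]


def extract_features(email):
    """Build the 5-element feature vector: short-text flag plus 4 keyword flags."""
    features = [int(len(email) < 500)]
    features += [int(word in email) for word in HIGH_FREQUENCY_WORDS]
    return features


def preprocess(batch):
    """Convert a batch (list of request dicts) into a list of feature vectors."""
    return [extract_features(item["email"]) for item in batch]


def postprocess(logits):
    """Pick the index of the largest logit per row: 0 -> ham, 1 -> spam."""
    labels = ["ham", "spam"]
    results = []
    for row in logits:
        predicted = row.index(max(row))
        results.append({"prediction": labels[predicted]})
    return results
```

For example, `preprocess([{"email": "123"}])` yields one feature vector, and `postprocess([[0.5, -0.3], [0.3, 0.8]])` maps the two rows of logits to `ham` and `spam`.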


In [3]:
%%writefile torchserve/handler.py
# custom handler file

# model_handler.py

"""
ModelHandler defines a custom model handler.
"""

import logging
import torch
from ts.torch_handler.base_handler import BaseHandler

# BaseHandler:
# https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def preprocess(self, batch):
        """
        Transform raw input into model input data.
        :param batch: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        feature_list = []
        logging.info("[preprocess] batch received:")
        logging.info(batch)
        for email in batch:
            # extract features from email
            feature = []
            # short text
            short_text = len(email) < 500
            feature.append(int(short_text))
            # high frequency words
            high_frequency_words = ["body", "business", "html", "money"]
            for word in high_frequency_words:
                contain_bool = word in email
                feature.append(int(contain_bool))

            feature_list.append(feature)

        logging.info("Preprocess result:")
        logging.info(feature_list)
        return torch.as_tensor(feature_list, dtype=torch.float32, device=self.device)

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        logging.info("Logits from model:")
        logging.info(inference_output)

        pred = inference_output.max(1)[1]
        positive_dict = {"version": "2", "prediction": "spam"}
        negative_dict = {"version": "2", "prediction": "ham"}
        postprocess_result = list(map(
                lambda x: positive_dict if x == 1 else negative_dict, 
                pred))

        logging.info("Postprocess result:")
        logging.info(postprocess_result)
        return postprocess_result

Overwriting torchserve/handler.py


#### 1.2 Torchserve Model Archiver

It packages the `model-file`, `serialized-file (*.pt)`, and `handler` into a single archive named `{model-name}.mar`.

In [324]:
%%bash
cd $(dirname $0)/torchserve
base_path=$(pwd)

mkdir -p $base_path/model-store && cd $base_path/model-store &&
if [ -f $base_path/model-store/helmet_detection.mar ]; then
    rm $base_path/model-store/helmet_detection.mar
fi

pip install torch-model-archiver -i https://pypi.tuna.tsinghua.edu.cn/simple


torch-model-archiver --model-name helmet_detection \
--version 0.1 --serialized-file $base_path/helmet.torchscript.pt \
--handler $base_path/torchserve_handler.py \
--extra-files $base_path/index_to_name.json,$base_path/torchserve_handler.py


echo "create successfully"

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
create successfully
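
To check what ended up inside the archive, the sketch below lists its contents. It assumes the default archive format of `torch-model-archiver`, which to my knowledge is zip-compatible and stores a manifest at `MAR-INF/MANIFEST.json`; the function name `describe_mar` is hypothetical.

```python
import json
import zipfile


def describe_mar(mar_path):
    """Return (sorted file names, manifest dict or None) for a .mar archive.

    Assumes the archive is zip-compatible (the default .mar format).
    """
    with zipfile.ZipFile(mar_path) as archive:
        names = sorted(archive.namelist())
        manifest = None
        if "MAR-INF/MANIFEST.json" in names:
            manifest = json.loads(archive.read("MAR-INF/MANIFEST.json"))
    return names, manifest
```

A call such as `describe_mar("torchserve/model-store/helmet_detection.mar")` should show the handler and the extra files passed to the archiver.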


#### 1.3 Create TorchServe config

Feel free to change the parameters:

- minWorkers: the minimum number of workers of a model
- maxWorkers: the maximum number of workers of a model
- batchSize: the batch size of a model
- maxBatchDelay: the maximum delay, in msec, to wait for a batch of a model to fill
- responseTimeout: the timeout in seconds of a model's response
- defaultVersion: the default version of a model
- marName: the mar file name of a model
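
To build intuition for how `batchSize` and `maxBatchDelay` interact, here is a simplified simulation. It is a toy model under stated assumptions (a batch opens at its first request and closes when it is full or when the delay has elapsed), not TorchServe's actual scheduler, and `simulate_batches` is a hypothetical helper.

```python
def simulate_batches(arrivals_ms, batch_size, max_batch_delay_ms):
    """Group request arrival times (in msec) into batches.

    A batch is closed as soon as it holds batch_size requests, or when a
    request arrives after the deadline set by the first request in the batch.
    """
    batches = []
    current = []
    deadline = None
    for t in sorted(arrivals_ms):
        # The open batch timed out before this request arrived: flush it.
        if current and t > deadline:
            batches.append(current)
            current = []
        # The first request of a new batch starts the delay timer.
        if not current:
            deadline = t + max_batch_delay_ms
        current.append(t)
        # The batch is full: flush it immediately.
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

With `batch_size=4` and `max_batch_delay_ms=100`, four requests arriving within 30 msec form one full batch, while a lone late request is sent on its own once its delay expires.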


In [2]:
!mkdir -p torchserve/config

In [325]:
%%writefile torchserve/config/config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/home/model-server/torchserve_mar/helmet_detection/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"helmet_detection":{"1.0":{"defaultVersion":true,"marName":"helmet_detection.mar","minWorkers":1,"maxWorkers":5,"batchSize":4,"maxBatchDelay":100,"responseTimeout":120}}}}

Overwriting torchserve/config/config.properties
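
The `model_snapshot` value is a single-line JSON document, which is easy to mistype by hand. A minimal sketch for generating that line from the parameters listed above follows; the function name and its defaults are assumptions taken from the example config, not a TorchServe API.

```python
import json


def build_model_snapshot(model_name, version, mar_name, min_workers=1,
                         max_workers=5, batch_size=4, max_batch_delay=100,
                         response_timeout=120):
    """Return the model_snapshot line for config.properties as one string."""
    snapshot = {
        "name": "startup.cfg",
        "modelCount": 1,
        "models": {
            model_name: {
                version: {
                    "defaultVersion": True,
                    "marName": mar_name,
                    "minWorkers": min_workers,
                    "maxWorkers": max_workers,
                    "batchSize": batch_size,
                    "maxBatchDelay": max_batch_delay,
                    "responseTimeout": response_timeout,
                }
            }
        },
    }
    # Compact separators keep the JSON on one line without extra spaces.
    return "model_snapshot=" + json.dumps(snapshot, separators=(",", ":"))
```

Calling `build_model_snapshot("helmet_detection", "1.0", "helmet_detection.mar")` reproduces the line used in the config above.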


#### 1.4 Upload to MinIO

If you already have MinIO storage, you can go straight to the upload steps below. If not, we also provide a guide for deploying a standalone MinIO instance on a Kubernetes cluster.

The manifests are available at https://github.com/xujinheng/kubeflow-manifests/tree/main/website/content/en/docs/kubeflow-tutorial/lab4_minio_deploy; apply them in your cluster:

`kubectl apply -f minio-standalone-pvc.yml` 

`kubectl apply -f minio-standalone-service.yml`

`kubectl apply -f minio-standalone-deployment.yml`

This step uploads `torchserve/model-store`, `torchserve/config` to MinIO buckets

You need the following MinIO settings:
- `endpoint_url`
- `key_id`
- `access_key`

In [4]:
!pip install boto3 -i https://pypi.tuna.tsinghua.edu.cn/simple


In [172]:
import os
from urllib.parse import urlparse
import boto3

os.environ["AWS_ENDPOINT_URL"] = "http://10.117.233.16:9000"
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

s3 = boto3.resource('s3',
                    endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
                    verify=True)

In [76]:
print("current buckets in s3:")
print(list(s3.buckets.all()))

current buckets in s3:
[s3.Bucket(name='helmet-detection-bucket'), s3.Bucket(name='juanl-bucket'), s3.Bucket(name='xujinheng-bucket')]


In [14]:
bucket_name='juanl-bucket'
s3.create_bucket(Bucket=bucket_name)

s3.Bucket(name='juanl-bucket')

Upload the files to your bucket (`bucket_name`); you can also specify a `bucket_path` prefix inside it.

In [326]:
curr_path = os.getcwd()
base_path = os.path.join(curr_path, "torchserve")

bucket_path = "helmet_detection"

bucket = s3.Bucket(bucket_name)

# upload
bucket.upload_file(os.path.join(base_path, "model-store", "helmet_detection.mar"),
                   os.path.join(bucket_path, "model-store/helmet_detection.mar"))
bucket.upload_file(os.path.join(base_path, "config", "config.properties"), 
                   os.path.join(bucket_path, "config/config.properties"))

# check files 
for obj in bucket.objects.filter(Prefix=bucket_path):
    print(obj.key)

helmet_detection/config/config.properties
helmet_detection/model-store/helmet_detection.mar
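
Object keys in S3-compatible storage always use `/` as the separator, while `os.path.join` follows the local operating system. A small sketch of helpers that build the keys and the matching `storageUri` independently of the platform is shown here; both function names are hypothetical.

```python
import posixpath


def object_key(bucket_path, *parts):
    """Join key segments with '/', regardless of the local OS."""
    return posixpath.join(bucket_path, *parts)


def storage_uri(bucket_name, bucket_path):
    """Build the s3:// URI that points at bucket_path inside bucket_name."""
    return f"s3://{bucket_name}/{bucket_path.strip('/')}"
```

For the values used in this lab, `storage_uri("juanl-bucket", "helmet_detection")` gives the URI that the InferenceService in section 2.2 refers to.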


## 2 KServe

#### 2.1 Create Minio service account && secret

- You will also need to specify the `s3-endpoint`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` here
- If you are using the default user `user@example.com/12341234`, please also set a different name for every <span style="color:red">metadata: name</span> in the yaml file. 

In [10]:
%%bash

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-secret-user
  annotations:
     serving.kserve.io/s3-endpoint: "10.117.233.16:9000" # replace with your s3 endpoint e.g minio-service.kubeflow:9000
     serving.kserve.io/s3-usehttps: "0" # by default 1, if testing with minio you can set to 0
     serving.kserve.io/s3-region: "us-east-2"
     serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
type: Opaque
stringData: # use "stringData" for raw credential string or "data" for base64 encoded string
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-service-account-user
secrets:
- name: minio-s3-secret-user
EOF

secret/minio-s3-secret-user configured
serviceaccount/minio-service-account-user configured
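
The `s3-endpoint` and `s3-usehttps` annotations must agree with the endpoint URL used for the upload in step 1.4. As a sketch, they can be derived from that URL as follows; the helper name and the default region are assumptions for illustration.

```python
from urllib.parse import urlparse


def kserve_s3_annotations(endpoint_url, region="us-east-1"):
    """Derive the KServe S3 secret annotations from a full endpoint URL."""
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("expected a URL like http://host:port, got: " + endpoint_url)
    return {
        # The annotation takes host:port only, without the scheme.
        "serving.kserve.io/s3-endpoint": parsed.netloc,
        # "1" for https, "0" for plain http (typical for a test MinIO).
        "serving.kserve.io/s3-usehttps": "1" if parsed.scheme == "https" else "0",
        "serving.kserve.io/s3-region": region,
    }
```

For `http://10.117.233.16:9000` this returns the endpoint `10.117.233.16:9000` with `s3-usehttps` set to `"0"`.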


#### 2.2 Create InferenceService from MinIO

- Set `storageUri` to your `bucket_name/bucket_path`
- You may also need to change `metadata: name` and `serviceAccountName` 

In [327]:
%%bash

cat << EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "helmet-detection-serving"
spec:
  predictor:
    serviceAccountName: minio-service-account-user
    model:
      modelFormat:
        name: pytorch
      storageUri: "s3://juanl-bucket/helmet_detection"
      resources:
          requests:
            cpu: 50m
            memory: 200Mi
          limits:
            cpu: 100m
            memory: 500Mi
          # limits:
          #   nvidia.com/gpu: "1"   # for inference service on GPU
EOF

inferenceservice.serving.kserve.io/helmet-detection-serving created


#### 2.3 Kubeflow UI

Check model logs at [Kubeflow UI -> Models](/models/)


## 3 Test 

#### 3.1 Define a Test_bot for convenience

In [22]:
!pip install multiprocess -i https://pypi.tuna.tsinghua.edu.cn/simple



In [331]:
import requests
import json
import multiprocess as mp
import base64


class Test_bot():
    def __init__(self, uri, model, host, session):
        self.uri = uri
        self.model = model
        self.host = host
        self.session = session
        self.headers = {'Host': self.host, 'Content-Type': "image/jpeg", 'Cookie': "authservice_session=" + self.session}
        self.img = './1.jpg'
    
    def update_uri(self, uri):
        self.uri = uri
        
    def update_model(self, model):
        self.model = model
        
    def update_host(self, host):
        self.host = host
        self.update_headers()
        
    def update_session(self, session):
        self.session = session
        self.update_headers()
        
    def update_headers(self):
        self.headers = {'Host': self.host, 'Content-Type': "image/jpeg", 'Cookie': "authservice_session=" + self.session}
        
    def get_data(self, x):
        if x:
            payload = x
        else: 
            payload = self.img
        with open(payload, "rb") as image:  
            f = image.read()
            image_data = base64.b64encode(f).decode('utf-8')    

        return json.dumps({'instances': [image_data]})

    
    def predict(self, x=None):
        uri = self.uri + '/v1/models/' + self.model + ':predict'
        response = requests.request("POST", uri, headers=self.headers, data=self.get_data(x))
        return response.text
    
        
    def readiness(self):
        uri = self.uri + '/v1/models/' + self.model
        response = requests.get(uri, headers = self.headers, timeout=5)
        return response.text

    
    def explain(self, x=None):
        uri = self.uri + '/v1/models/' + self.model + ':explain'
        response = requests.post(uri, data=self.get_data(x), headers = self.headers, timeout=10)
        return response.text
    
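    def concurrent_predict(self, num=10):
        print("fire " + str(num) + " requests to " + self.host)
        # Pass None for every request so that predict() falls back to self.img;
        # mapping over range(num) would pass integers, which open() treats as file descriptors.
        with mp.Pool() as pool:
            responses = pool.map(self.predict, [None] * num)
        return responses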

#### 3.2 Determine host and session

Run the following cell to get `host`, which will be set to the headers in our request

In [236]:
!kubectl get inferenceservice helmet-detection-serving -o jsonpath='{.status.url}' | cut -d "/" -f 3

helmet-detection-serving.kubeflow-user-example-com.example.com
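
The same host can be extracted in Python instead of with `cut`. The sketch below wraps the `kubectl` call shown above; both function names are hypothetical, and `inference_service_host` assumes `kubectl` is on the PATH and configured for your namespace.

```python
import subprocess
from urllib.parse import urlparse


def host_from_status_url(status_url):
    """Return the host part of an InferenceService status URL."""
    return urlparse(status_url.strip()).netloc


def inference_service_host(name):
    """Ask kubectl for the status URL of an InferenceService and extract its host."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservice", name, "-o", "jsonpath={.status.url}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return host_from_status_url(result.stdout)
```

For example, `inference_service_host("helmet-detection-serving")` should return the same value as the shell command above.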


Use your web browser to log in to Kubeflow, and copy the value of the `authservice_session` cookie (Chrome: Developer Tools -> Application -> Cookies)

In [332]:
# replace it with the url you used to access Kubeflow
bot = Test_bot(uri='http://10.117.233.8',
               model='helmet_detection',
               # replace it with what is printed above
               host='helmet-detection-serving.kubeflow-user-example-com.example.com',
               # replace it
               session='MTY2NzM4MzA3NnxOd3dBTkZOT1dVRTJVRVZWVUVaVlRFeEdSVFpLVmxwRk1rRlhRMHhIUVRKR05sQklTVmswTmxOTVdsaFdTRmxNUkV0TFJqSkxOVkU9fPcleb6sw1pZHcTLy5HMQRLssZ7PP_nQkhOTVGV7MBEp')

print(bot.readiness()) 
print(bot.predict('./1.jpg'))
# We didn't implement model explainer, so this result will be 500: Internal Server Error
# https://kserve.github.io/website/0.8/modelserving/explainer/explainer/
# print(bot.explain(0))

{"name": "helmet_detection", "ready": true}
uri:  http://10.117.233.8/v1/models/helmet_detection:predict
{'Host': 'helmet-detection-serving.kubeflow-user-example-com.example.com', 'Content-Type': 'image/jpeg', 'Cookie': 'authservice_session=MTY2NzM4MzA3NnxOd3dBTkZOT1dVRTJVRVZWVUVaVlRFeEdSVFpLVmxwRk1rRlhRMHhIUVRKR05sQklTVmswTmxOTVdsaFdTRmxNUkV0TFJqSkxOVkU9fPcleb6sw1pZHcTLy5HMQRLssZ7PP_nQkhOTVGV7MBEp'}
{"predictions": [[{"x1": 0.16830308735370636, "y1": 0.36698096990585327, "x2": 0.3356268107891083, "y2": 0.5662754774093628, "confidence": 0.9418922662734985, "class": "person"}, {"x1": -0.0003847241459880024, "y1": 0.26973700523376465, "x2": 0.11975767463445663, "y2": 0.5021408796310425, "confidence": 0.9287041425704956, "class": "person"}, {"x1": 0.31550225615501404, "y1": 0.27130556106567383, "x2": 0.4195330739021301, "y2": 0.4244980812072754, "confidence": 0.922441303730011, "class": "person"}, {"x1": 0.8000055551528931, "y1": 0.36035841703414917, "x2": 0.8742902874946594, "y2": 0.4628

## 4 Autoscaling

- Knative Pod Autoscaler (KPA)
  - Part of the Knative Serving core and enabled by default once Knative Serving is installed.
  - Supports scale to zero functionality.
  - Does not support CPU-based autoscaling.
  
- Horizontal Pod Autoscaler (HPA)
  - Not part of the Knative Serving core, and must be enabled after Knative Serving installation.
  - Does not support scale to zero functionality.
  - Supports CPU-based autoscaling.

<span style="color:red">If you use CPU-based autoscaling, make sure HPA is installed before moving on</span> (check with `kubectl get deploy autoscaler-hpa -n knative-serving`). If it is missing, install it from https://github.com/knative/serving/releases/

Add autoscaling annotations to the InferenceService and apply it:

In [335]:
%%bash

cat << EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: helmet-detection-serving
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    # see available tags: https://knative.dev/docs/serving/autoscaling/autoscaling-targets/
    autoscaling.knative.dev/max-scale: "3"
    # HPA: specifies the CPU percentage target (default "80"). 
    # KPA: Target x requests in-flight per pod.
    autoscaling.knative.dev/target: "80"  
spec:
  predictor:
    serviceAccountName: minio-service-account-user
    model:
      modelFormat:
        name: pytorch
      storageUri: "s3://juanl-bucket/helmet_detection"
      resources:
          requests:
            cpu: 50m
            memory: 200Mi
          limits:
            cpu: 100m
            memory: 500Mi
EOF

inferenceservice.serving.kserve.io/helmet-detection-serving configured
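
The choice between KPA and HPA follows from the metric you want to scale on. As a rough sketch, the annotations in the manifest above could be generated like this; the function name is hypothetical, and the KPA class name `kpa.autoscaling.knative.dev` is taken from the Knative docs rather than from this lab.

```python
def autoscaling_annotations(metric, target, max_scale):
    """Return Knative autoscaling annotations for the given metric.

    metric: "cpu" selects HPA; "concurrency" selects KPA.
    target: CPU percentage for HPA, in-flight requests per pod for KPA.
    """
    if metric == "cpu":
        autoscaler_class = "hpa.autoscaling.knative.dev"
    elif metric == "concurrency":
        autoscaler_class = "kpa.autoscaling.knative.dev"
    else:
        raise ValueError("unsupported metric in this sketch: " + metric)
    return {
        "autoscaling.knative.dev/class": autoscaler_class,
        # Annotation values must be strings in Kubernetes metadata.
        "autoscaling.knative.dev/max-scale": str(max_scale),
        "autoscaling.knative.dev/target": str(target),
    }
```

`autoscaling_annotations("cpu", 80, 3)` yields the three annotations used in the manifest above.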


Check the number of pods. It takes a while before the old deployment gets replaced.

In [337]:
!kubectl get pod

NAME                                                              READY   STATUS                  RESTARTS   AGE
bitfusion-notebook-01-0                                           2/2     Running                 0          61d
helmet-detection-0                                                2/2     Running                 0          14d
helmet-detection-deployment-565dbfcffd-d6czf                      2/2     Running                 0          12d
helmet-detection-serving-predictor-default-00001-deploymen92b59   1/3     Terminating             0          13h
helmet-detection-serving-predictor-default-00002-deploymen8dmbp   0/3     Init:CrashLoopBackOff   3          74s
helmet-detection-serving-predictor-default-00003-deploymenmkngh   3/3     Running                 0          4m4s
ml-pipeline-ui-artifact-7cd897c59f-kzlfs                          2/2     Running                 0          64d
ml-pipeline-visualizationserver-795f7db965-gzjsm                  2/2     Running              

Adjust the number of concurrent predict requests and fire them to make the number of pods scale up.

In [None]:
responses = bot.concurrent_predict(num=10)

fire 100 requests to helmet-detection-serving.kubeflow-user-example-com.example.com

Check the number of pods again

In [None]:
!kubectl get pod

## 6 More
Explore the KServe 0.8 docs here: https://kserve.github.io/website/0.8/modelserving/control_plane/

(note that the version we use is KServe 0.6.1)

- Multi Model Serving
- Transformers
- Model Explainability
- Model Monitoring
- Payload Logging
- etc.
