# Deploying a model using KServe

## Prerequisites
You need a Kubernetes cluster with KServe installed. If you do not have one, you can create it by following [this guide](https://medium.com/towards-data-science/kserve-highly-scalable-machine-learning-deployment-with-kubernetes-aa7af0b71202).

## Introduction
This notebook demonstrates how to deploy models with KServe. We deploy two models: a simple scikit-learn (sklearn) model and a custom model.

Deploy the sklearn model with the following command:
```bash
kubectl apply -n kserve -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
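Before sending requests, it helps to confirm that the InferenceService has become ready. The following is a minimal sketch, assuming `kubectl` is on your PATH and that the resource reports a `Ready` condition under `status.conditions`. The helper names `is_ready` and `fetch_inference_service` are illustrative and are not part of this repository.

```python
import json
import subprocess


def is_ready(isvc: dict) -> bool:
    """Return True if the resource has a condition of type Ready with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def fetch_inference_service(name: str, namespace: str = "kserve") -> dict:
    """Fetch the InferenceService as a dict via kubectl (assumes cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservice", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With a running cluster, `is_ready(fetch_inference_service("sklearn-iris"))` should return `True` once the model has been loaded.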

Make sure the istio-ingressgateway service is port-forwarded so that you can reach the model from your machine. The command keeps running, so start it in a separate terminal:
```bash
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
```

Then send a request to the model using the cell below.

In [12]:
# Testing for sklearn iris model
import requests

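# The request goes to the port-forwarded ingress gateway on localhost
url = "http://localhost:8080/v1/models/sklearn-iris:predict"
# The Host header tells the ingress gateway which InferenceService to route to
service_host_name = "sklearn-iris.kserve.example.com"
request_headers = {
    "Host": service_host_name
}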
payload = {
    "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
    ]
}
response = requests.post(url, json=payload, headers=request_headers)
response.text

'{"predictions":[1,1]}'
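The request above follows the V1 prediction protocol: a JSON body with an `instances` list goes in, and a JSON body with a `predictions` list comes back. The pieces can be sketched as small helpers. This is only an illustration: the function names are hypothetical, and the host pattern `<model>.<namespace>.<domain>` with the `example.com` domain is an assumption inferred from the `Host` header used in this notebook, so it may differ on your cluster.

```python
import json


def build_v1_predict_request(model_name, instances, namespace="kserve",
                             domain="example.com",
                             gateway="http://localhost:8080"):
    """Return (url, headers, payload) for a V1 :predict call.

    The host pattern is an assumption based on the header used in this notebook.
    """
    url = f"{gateway}/v1/models/{model_name}:predict"
    headers = {"Host": f"{model_name}.{namespace}.{domain}"}
    payload = {"instances": instances}
    return url, headers, payload


def parse_v1_predictions(response_text):
    """Extract the list of predictions from a V1 response body."""
    return json.loads(response_text)["predictions"]
```

For example, `parse_v1_predictions(response.text)` turns the response shown above into the Python list `[1, 1]`.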

## Deploying a custom predictor
To deploy a custom predictor, you need to build a Docker image and push it to a container registry. This repository already includes a custom predictor that you can use; its image is built through continuous integration with GitHub workflows.

To apply the latest version of the model, run the following command:
```bash
kubectl apply -f custom_predictor/deployment/custom_predictor.yaml
```

As before, make sure the istio-ingressgateway service is port-forwarded so that you can reach the model. If the port-forward is not already running, start it with the following command:
```bash
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
```

Then send the request to the model using the cell below.

**Note: The custom predictor is a dummy model that returns a random tensor with shape (3, 360, 640) for any input.**

In [44]:
import requests
import base64
import io
import uuid
from PIL import Image
from src.data_models import InferenceV2Inputs, InferenceV2

def bytes_to_json_serializable(bytes_data: bytes) -> str:
    return base64.b64encode(bytes_data).decode("utf-8")

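# Read the image with PIL and re-encode it as PNG bytes in memory
with io.BytesIO() as output:
    with Image.open("cat.jpg") as img:
        img.save(output, format="PNG")
        image_size = img.size  # PIL reports size as (width, height)
    # Base64-encode the PNG bytes so they can be embedded in the JSON payload
    image_bytes_data = bytes_to_json_serializable(output.getvalue())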

inputs = InferenceV2Inputs(
    name="input-0",
    shape=list(image_size),
    datatype="BYTES",
    data=[image_bytes_data]
)

inference_request_payload = InferenceV2(
    id=str(uuid.uuid4()),
    inputs=[inputs]
)

model_name = "kserve-demo-model"
url = f"http://localhost:8080/v2/models/{model_name}/infer"
request_headers = {"Host": f"{model_name}.kserve.example.com"}

response = requests.post(url, data=inference_request_payload.json(), headers=request_headers)
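For readers who do not have `src.data_models` at hand, the same kind of V2 inference request can be sketched with plain dictionaries. The field names (`id`, `inputs`, `name`, `shape`, `datatype`, `data`) mirror the arguments used in the cell above. Whether `InferenceV2` serializes to exactly this layout is an assumption, and `build_v2_bytes_request` is a hypothetical helper, not part of this repository.

```python
import base64
import json
import uuid


def build_v2_bytes_request(raw_bytes: bytes, shape: list,
                           input_name: str = "input-0") -> dict:
    """Build a V2-style inference request carrying one base64-encoded blob."""
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return {
        "id": str(uuid.uuid4()),  # random request id, as in the cell above
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": "BYTES",
                "data": [encoded],
            }
        ],
    }


# Example with placeholder bytes instead of a real image
request_body = json.dumps(build_v2_bytes_request(b"abc", [2, 2]))
```

The resulting string could be sent with `requests.post(url, data=request_body, headers=request_headers)` in the same way as the payload above.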


In [45]:
predictions = response.json()
predictions['outputs'][0]['shape']


[3, 360, 640]
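A quick sanity check on the response is to compare the reported shape with the expected one and, when the output carries a flat `data` list, to confirm that the number of elements matches the product of the shape. The sketch below assumes each output is a dictionary with `shape` and, optionally, a flattened `data` list; `check_v2_output` is a hypothetical helper name.

```python
import math


def check_v2_output(output: dict, expected_shape: list) -> bool:
    """Return True if the shape matches and flat data has the right length.

    Nested or missing data is not inspected; only the shape is compared then.
    """
    shape = output["shape"]
    if shape != expected_shape:
        return False
    data = output.get("data")
    if isinstance(data, list) and data and not isinstance(data[0], list):
        return len(data) == math.prod(shape)
    return True
```

For the dummy model in this notebook, the call would be `check_v2_output(predictions['outputs'][0], [3, 360, 640])`.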