# Inference to a KServe v1 Predictor

Let's deploy our fine-tuned model via KServe using the [v1 API](https://kserve.github.io/website/latest/modelserving/data_plane/v1_protocol/).
A custom model server image is available [on Quay](https://quay.io/repository/marcocaimi/kserve-diffusers), and its source code is [on GitHub](https://github.com/mcaimi/kserve-diffusers-demo).

In [None]:
!pip install -U pip
!pip install requests pillow
!pip install gradio

In [None]:
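# standard library modules
import base64
import io

# third-party modules installed in the previous cell
try:
    import requests
    from PIL import Image
except ImportError as e:
    print(f"Unable to import a required library: {e}")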

### Setup

Change the following variable so that it matches your deployed model's *Inference endpoint*. For example:

```
infer_endpoint = "https://sd-stable-diffusion.apps.cluster-xc4f6.sandbox942.opentlc.com"
```

In [None]:
# the inference endpoint exposed by the model server
# on OpenShift this is the entry point that sits in front of the containerized workload
# the workload itself is run by KServe as a serverless service
infer_endpoint = "http://localhost:8080"

# this is the inference method exposed by the KServe Model Server
infer_url = f"{infer_endpoint}/v1/models/model:predict"
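
Before sending a prediction it can help to check that the model is actually loaded. The sketch below builds the v1 URLs from the endpoint and queries the model readiness route described by the v1 protocol (`GET /v1/models/<model_name>`). The helper names `v1_predict_url`, `v1_ready_url` and `model_is_ready` are hypothetical, and the code assumes the server answers with a JSON object containing a `ready` field; adjust it if your model server behaves differently.

```python
def v1_predict_url(endpoint, model_name="model"):
    """Return the v1 predict URL for a model served behind `endpoint`."""
    return f"{endpoint.rstrip('/')}/v1/models/{model_name}:predict"


def v1_ready_url(endpoint, model_name="model"):
    """Return the v1 model readiness URL for the same model."""
    return f"{endpoint.rstrip('/')}/v1/models/{model_name}"


def model_is_ready(endpoint, model_name="model", timeout=10, tls_verify=False):
    """Ask the model server whether the model is loaded (assumed response shape)."""
    # imported here so the URL helpers above work even without requests installed
    import requests

    response = requests.get(v1_ready_url(endpoint, model_name),
                            timeout=timeout, verify=tls_verify)
    response.raise_for_status()
    # assumption: the body looks like {"name": "...", "ready": true};
    # the string comparison also accepts a "True" string value
    return str(response.json().get("ready", False)).lower() == "true"
```

With the default settings of this notebook, `v1_predict_url("http://localhost:8080")` yields the same value as `infer_url`.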

## Request Function

Build and submit the REST request to the model server. 

An example JSON payload:

```json
{
  "instances": [
    {
      "prompt": "photo of the beach",
      "negative_prompt": "ugly, deformed, bad anatomy",
      "num_inference_steps": 20,
      "width": 512,
      "height": 512,
      "guidance_scale": 7,
      "scheduler": "DPM++ 2M",
      "seed": 772847624537827
    }
  ]
}
```
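
As a rough illustration, the payload above can be assembled and sanity-checked in Python before it is sent. The helper `build_instance` is hypothetical and is not part of the model server. The divisibility check assumes the server wraps a Stable Diffusion pipeline from the diffusers library, which requires image sizes that are multiples of 8; drop it if your server accepts other sizes.

```python
import json


def build_instance(prompt, negative_prompt="", steps=20,
                   width=512, height=512, cfg=7,
                   scheduler="DPM++ 2M", seed=-1):
    """Build one entry of the "instances" list, with basic validation."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # assumption: the backing pipeline only accepts sizes divisible by 8
    if width % 8 or height % 8:
        raise ValueError("width and height must be multiples of 8")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "width": width,
        "height": height,
        "guidance_scale": cfg,
        "scheduler": scheduler,
        "seed": seed,
    }


# wrap the instance in the v1 request envelope and show the resulting JSON
payload = {"instances": [build_instance("photo of the beach")]}
print(json.dumps(payload, indent=2))
```

Serializing with `json.dumps` guarantees valid JSON, so details such as trailing commas are never an issue.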

In [None]:
# define the call function
def rest_request(url, prompt,
                 negative_prompt="",
                 steps=10,
                 width=512, height=512,
                 cfg=7,
                 scheduler="DPM++ 2M",
                 seed=-1,
                 timeout=600,
                 tls_verify=False):
    # prepare payload
    json_data = {
        "instances": [
            {
                "prompt": prompt,
                "negative_prompt": negative_prompt,
                "num_inference_steps": steps,
                "width": width,
                "height": height,
                "guidance_scale": cfg,
                "scheduler": scheduler,
                "seed": seed,
            }
        ]
    }

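    # call the inference service
    response = requests.post(url, json=json_data, verify=tls_verify, timeout=timeout)
    # fail early on HTTP errors instead of trying to parse an error page as JSON
    response.raise_for_status()

    # extract the response payload
    response_dict = response.json()
    return response_dict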

## Call the remote inference server

Let's call KServe to generate a RHTeddy image for us. Make sure the inference server has at least one GPU available, otherwise the request will most likely fail with a timeout.

In [None]:
# prompt
prompt = "a photo of rhteddy dog wearing his red fedora in the snowy mountains"

# call the service
json_response = rest_request(infer_url, prompt)

In [None]:
# now extract the image from the payload
# the image is base64 encoded
img_str = json_response["predictions"][0]["image"]["b64"]
img_data = base64.b64decode(img_str)

In [None]:
# load the decoded bytes as a PIL image, save it to disk and display it
image = Image.open(io.BytesIO(img_data))
image.save("teddy.png", format="PNG")
image
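
The extraction above assumes the request succeeded. A slightly more defensive variant is sketched below: the hypothetical helper `extract_image_bytes` checks the response shape before decoding. It assumes that failures are reported under an `error` key, as is common for v1-style servers, and that the image sits at `predictions[0].image.b64` as in the cells above.

```python
import base64
import binascii


def extract_image_bytes(response_dict):
    """Return the decoded image bytes from a v1 predict response."""
    # assumption: the server reports failures as {"error": "..."}
    if "error" in response_dict:
        raise RuntimeError(f"Inference server returned an error: {response_dict['error']}")

    predictions = response_dict.get("predictions")
    if not predictions:
        raise ValueError("Response contains no predictions")

    try:
        encoded = predictions[0]["image"]["b64"]
    except (KeyError, TypeError) as exc:
        raise ValueError("Prediction does not contain an image.b64 field") from exc

    try:
        return base64.b64decode(encoded)
    except binascii.Error as exc:
        raise ValueError("Image payload is not valid base64") from exc
```

The returned bytes can be passed to `Image.open(io.BytesIO(...))` exactly like `img_data` above.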

## Build a Stable Diffusion UI with Gradio

Let's make our inference endpoint callable from a pretty web UI.

In [None]:
# import the gradio toolkit
try:
    import gradio as gr
except Exception as e:
    print(f"Unable to import gradio library: {e}")

In [None]:
# build the callback function
def generate_image(url, prompt,
                   negative_prompt="",
                   steps=10,
                   width=512, height=512,
                   cfg=7,
                   scheduler="DPM++ 2M",
                   seed=-1,
                   timeout=600,
                   tls_verify=False):
    # gradio sliders and number fields may deliver floats: cast the integer parameters
    steps, width, height, seed = int(steps), int(width), int(height), int(seed)

    # call the generation function
    kserve_response = rest_request(url, prompt, negative_prompt=negative_prompt,
                                   steps=steps, width=width, height=height,
                                   cfg=cfg, scheduler=scheduler, seed=seed,
                                   timeout=timeout, tls_verify=tls_verify)

    # extract the base64-encoded image from the first prediction
    image_payload = kserve_response.get("predictions")[0].get("image").get("b64")
    # decode from base64
    img_data = base64.b64decode(image_payload)

    # return a PIL image that gradio can display
    return Image.open(io.BytesIO(img_data))

In [None]:
# build gradio application
schedulers = ["DPM++ 2M", "DPM++ SDE", "Euler a", "Euler", "Heun", "LCM"]

# the inputs are passed positionally, in the same order as generate_image's parameters
sd_ui = gr.Interface(
    fn=generate_image,
    inputs=[
        gr.Textbox(value=infer_url, label="Inference URL"),
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative Prompt"),
        gr.Slider(label="Denoising Steps", value=5, minimum=1, maximum=100, step=1),
        gr.Number(label="Width", value=512),
        gr.Number(label="Height", value=512),
        gr.Slider(label="Guidance Scale", value=7, minimum=1, maximum=100, step=0.5),
        gr.Dropdown(label="Scheduler", value="DPM++ 2M", choices=schedulers),
        gr.Number(value=-1, label="Seed"),
    ],
    outputs=gr.Image(),
)

# start the application
sd_ui.launch(share=True)

In [None]:
# close the application and exit
sd_ui.close()