# REST Inference

After deploying the model using RHODS Model Serving, we'd like to test the model deployment by sending images to the model server for real-time inference.

In this notebook, we'll review how to consume the model through the RHODS Model Server using its REST API.

## Setup
For testing the model deployment, our test script needs to know the address of the model server. Let's insert the **inference endpoint** that the RHODS Dashboard provides for the deployed model.

This code assumes you've deployed a model with the name `yolo` in the same data science project as this notebook.

```
prediction_url = 'http://modelmesh-serving:8008/v2/models/yolo/infer'
```

If you've deployed the model with a different name instead of `yolo`, you'll need to adjust the model name accordingly.

If you've deployed the model to a different namespace, you'll have to modify the URL. Here we're assuming the Kubernetes service is in the same namespace as this notebook, but you can also address it by its fully qualified name, which includes the namespace, e.g. `http://modelmesh-serving.project-name.svc.cluster.local:8008/v2/models/yolo/infer`
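The two URL forms described above can be captured in a small helper. This is a minimal sketch: `build_prediction_url` is a hypothetical function introduced here for illustration, and it assumes the service name `modelmesh-serving` and port `8008` used elsewhere in this notebook.

```python
def build_prediction_url(model_name, namespace=None, port=8008):
    """Return the REST inference URL for a model served by modelmesh-serving.

    Hypothetical helper for illustration; it is not part of RHODS.
    """
    host = "modelmesh-serving"
    if namespace:
        # Fully qualified in-cluster DNS name, needed from another namespace
        host = f"{host}.{namespace}.svc.cluster.local"
    return f"http://{host}:{port}/v2/models/{model_name}/infer"


# Same project as the notebook: the short service name is enough
same_project_url = build_prediction_url("yolo")

# Model deployed in another project (namespace)
other_project_url = build_prediction_url("yolo", namespace="project-name")
```

Keeping the model name and namespace as separate arguments means only one value has to change when you test a differently named model or a model in another project.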

In [None]:
#  model named yolo in the same project
model_name = "yolo"
prediction_url = f"http://modelmesh-serving:8008/v2/models/{model_name}/infer"

## Preprocessing functions

Because calling the API closely mirrors using the model directly, we'll start by importing the preprocessing and rendering functions that we worked with in the previous notebook.


In [None]:
import sys
sys.path.append('./utils')

import numpy as np

from requests import post
import torch

from utils.classes import coco_classes
from utils.images import preprocess, postprocess, draw_boxes

Let's prepare one of our sample images as a test sample.

In [None]:
image_path = 'images/redhat-dog.jpg'
transformed_image, scaling, padding = preprocess(image_path)

We also need to know the class labels of the objects the model has been trained to detect. For the default YOLOv5 model, we can use the default class labels defined in the _classes_ module.
If you want to test a custom model, replace `coco_classes` with the list of your custom class labels, e.g.

`['Laptop', 'Computer keyboard', 'Table']`.

In [None]:
class_labels = coco_classes

We now need to package the preprocessed image into a format that the model server can consume. RHODS Model Serving implements a generic prediction interface that lets you query models in the typical formats by sending an HTTP POST request with a JSON body.

In [None]:
def serialize(image):
    payload = {
        'inputs': [
            {
                'name': 'images',
                'shape': [1, 3, 640, 640],
                'datatype': 'FP32',
                'data': image.flatten().tolist(),
            }
        ]
    }
    return payload

In [None]:
payload = serialize(transformed_image)
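Because the request carries the tensor as a flat list, the number of values in `data` has to match the product of the dimensions in `shape`. The sketch below checks this before a request is sent. `check_payload` is a hypothetical helper, and the tiny 1x3x2x2 payload is a stand-in that only mimics the structure produced by `serialize`.

```python
from math import prod


def check_payload(payload):
    """Raise ValueError if an input's data length does not match its shape.

    Hypothetical helper for illustration; it is not part of RHODS.
    """
    for tensor in payload["inputs"]:
        expected = prod(tensor["shape"])
        actual = len(tensor["data"])
        if actual != expected:
            raise ValueError(
                f"Input '{tensor['name']}': shape {tensor['shape']} needs "
                f"{expected} values, got {actual}"
            )
    return True


# Stand-in payload with the same structure as the real one, but a tiny "image"
example_payload = {
    "inputs": [
        {
            "name": "images",
            "shape": [1, 3, 2, 2],
            "datatype": "FP32",
            "data": [0.0] * 12,  # 1 * 3 * 2 * 2 values
        }
    ]
}
check_payload(example_payload)
```

For the real payload, `check_payload(payload)` would expect 1 * 3 * 640 * 640 = 1,228,800 values.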

Let's now send the serialized image to the model server. The inference results will also be returned in a generic JSON structure, which we can unpack straightaway. We'll also apply the post-processing function we defined in the previous notebook to extract the familiar object properties.

In [None]:
def rest_request(payload, prediction_url, classes_count):
    # Send the payload to the model server as a JSON POST request
    raw_response = post(prediction_url, json=payload)
    try:
        response = raw_response.json()
    except ValueError as error:
        # The body is not valid JSON; stop here instead of continuing
        raise RuntimeError(
            'Failed to deserialize service response.\n'
            f'Status code: {raw_response.status_code}\n'
            f'Response body: {raw_response.text}'
        ) from error
    try:
        model_output = response['outputs']
    except (KeyError, TypeError) as error:
        # Valid JSON, but without the expected 'outputs' field
        raise RuntimeError(
            'Failed to extract model output from service response.\n'
            f'Service response: {response}'
        ) from error
    return _unpack(model_output, classes_count)


def _unpack(model_output, classes_count):
    # Get the response data as a NumPy array
    arr = np.array(model_output[0]['data'])

    # Create a tensor from the array
    output = torch.tensor(arr)

    # Each prediction row is [x, y, w, h, conf, class0, class1, ...]
    prediction_columns_number = 5 + classes_count

    # Reshape the flat predictions to (batch, rows, columns)
    output = output.reshape(
        1,
        output.shape[0] // prediction_columns_number,
        prediction_columns_number
    )

    return output
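To make the row layout concrete, here is a minimal sketch that performs the same split on a small synthetic output. It uses plain Python lists instead of NumPy and PyTorch so it runs on its own. `rows_from_flat`, the 2-class model, and all the numbers are made up for illustration.

```python
def rows_from_flat(flat, classes_count):
    """Split a flat list into rows of [x, y, w, h, conf, class scores...]."""
    columns = 5 + classes_count
    if len(flat) % columns != 0:
        raise ValueError(
            f"{len(flat)} values cannot be split into rows of {columns}"
        )
    return [flat[i:i + columns] for i in range(0, len(flat), columns)]


# Synthetic output of a hypothetical 2-class model: two candidate boxes
flat_output = [
    10.0, 20.0, 30.0, 40.0, 0.9, 0.1, 0.8,
    50.0, 60.0, 70.0, 80.0, 0.3, 0.7, 0.2,
]

rows = rows_from_flat(flat_output, classes_count=2)
for x, y, w, h, conf, *class_scores in rows:
    # The index of the highest class score is the predicted class
    best_class = class_scores.index(max(class_scores))
    print(f"box=({x}, {y}, {w}, {h}) conf={conf} best_class={best_class}")
```

With the 80 COCO class labels, each row would have 5 + 80 = 85 columns, which is why the number of class labels passed to the request function has to match the deployed model.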

In [None]:
raw_objects = rest_request(payload, prediction_url, len(class_labels))
objects = postprocess(raw_objects, class_labels)
objects

Let's now visualize the result as we did when we were experimenting with the model directly.

In [None]:
# draw_boxes(image_path, objects, scaling, padding, class_labels)

draw_boxes(image_path, *objects[0])

We were able to reproduce the object detection example from the previous notebook, so we can consume the deployed model as expected.

In the next notebook, we'll do the same thing using the gRPC protocol.