# gRPC Inference

In this notebook we'll review how to consume the model through the RHODS Model Server using a gRPC endpoint.

### Setup

First let's install dependencies.  In this case, we'll need some new packages for gRPC.

In [None]:
!pip install grpcio grpcio-tools

Now we can set the host, port, and model name of our endpoint.

If you've deployed the model with a different name instead of `yolo`, you'll need to adjust the model name accordingly.

If you've deployed the model to a different namespace, you'll have to modify the host. Here we assume the Kubernetes service is in the same namespace as this notebook. Otherwise, refer to it by its full name, which includes the namespace, e.g. `modelmesh-serving.project-name.svc.cluster.local`.

In [None]:
grpc_host = 'modelmesh-serving'
grpc_port = 8033
model_name = 'yolo'
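
As a minimal sketch of the naming rule described above, the helper below builds either the short or the fully qualified service name. The function `service_host` and its parameters are hypothetical and not part of the notebook's `utils`; the `cluster.local` suffix is assumed to be the cluster's DNS domain, as in the example above.

```python
def service_host(service, namespace=None, cluster_domain="cluster.local"):
    """Return the DNS name of a Kubernetes service.

    Without a namespace, the short name is returned, which only resolves
    from inside the service's own namespace.
    """
    if namespace is None:
        return service
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Same namespace as the notebook: the short name is enough.
print(service_host("modelmesh-serving"))
# Different namespace: use the fully qualified name.
print(service_host("modelmesh-serving", "project-name"))
```

The result could then be assigned to `grpc_host` in place of the literal string.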

### gRPC Functions

We generated Python functions from the [kserve proto file](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/grpc_predict_v2.proto). If you're new to gRPC, you can take a look at the [Python quickstart](https://grpc.io/docs/languages/python/quickstart/) to see how we generated `utils/grpc_predict_v2_pb2_grpc.py` and `utils/grpc_predict_v2_pb2.py`.
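
For reference, the generation step from the quickstart can be sketched roughly as follows. This assumes the proto file has been downloaded next to the notebook as `grpc_predict_v2.proto`; the helper names `protoc_args` and `generate_stubs` are hypothetical, and the exact command used to produce the files in `utils` is not shown in this notebook.

```python
def protoc_args(proto_file, proto_dir=".", out_dir="utils"):
    """Build the argument list for the protoc compiler bundled with grpcio-tools."""
    return [
        "grpc_tools.protoc",             # placeholder for argv[0]
        f"-I{proto_dir}",                # directory searched for .proto files
        f"--python_out={out_dir}",       # target for the *_pb2.py message classes
        f"--grpc_python_out={out_dir}",  # target for the *_pb2_grpc.py client stubs
        proto_file,
    ]


def generate_stubs(proto_file, proto_dir=".", out_dir="utils"):
    """Run the compiler; returns its exit code (0 on success)."""
    from grpc_tools import protoc  # provided by the grpcio-tools package
    return protoc.main(protoc_args(proto_file, proto_dir, out_dir))


print(protoc_args("grpc_predict_v2.proto"))
```

Calling `generate_stubs("grpc_predict_v2.proto")` would then write both modules into `utils`, provided that directory exists and the proto file is located under `proto_dir`.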

Let's import functions from these generated files.

In [None]:
import sys
sys.path.append('./utils')

import grpc
import utils.grpc_predict_v2_pb2 as grpc_predict_v2_pb2
import utils.grpc_predict_v2_pb2_grpc as grpc_predict_v2_pb2_grpc


### Model Metadata

The per-model metadata API provides information about a model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

In [None]:

options = [('grpc.max_receive_message_length', 100 * 1024 * 1024)]
channel = grpc.insecure_channel(f"{grpc_host}:{grpc_port}", options=options)
stub = grpc_predict_v2_pb2_grpc.GRPCInferenceServiceStub(channel)

request = grpc_predict_v2_pb2.ModelMetadataRequest(name=model_name)
response = stub.ModelMetadata(request)
response
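
Since failures are reported through status codes, it can help to catch them explicitly. The following is a sketch only: `describe_rpc_error` and `fetch_metadata` are hypothetical helpers, and the 10-second timeout is an arbitrary assumption. It relies on the `grpc.RpcError` raised by a failed stub call exposing `code()` and `details()`.

```python
def describe_rpc_error(err):
    """Format a failed RPC as 'STATUS_NAME: details'."""
    code = err.code()        # a grpc.StatusCode member, e.g. NOT_FOUND or UNAVAILABLE
    details = err.details()  # human-readable message sent with the status
    return f"{code.name}: {details}"


def fetch_metadata(stub, request, timeout_s=10.0):
    """Call ModelMetadata and return None instead of raising on failure."""
    import grpc  # imported here so describe_rpc_error stays usable on its own

    try:
        return stub.ModelMetadata(request, timeout=timeout_s)
    except grpc.RpcError as err:
        print(describe_rpc_error(err))
        return None
```

With the `stub` and `request` defined above, `fetch_metadata(stub, request)` would print the status instead of a traceback if, for example, the model name is wrong or the host is unreachable.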


### Preprocessing Functions

Now, we can import the preprocessing and rendering functions as normal.


In [None]:
import sys
sys.path.append('./utils')

import numpy as np

from utils.classes import coco_classes
from utils.images import preprocess, postprocess, draw_boxes

### Making a gRPC Request

Let's prepare one of our sample images as a test sample.

In [None]:
image_path = 'images/redhat-dog.jpg'
transformed_image, scaling, padding = preprocess(image_path)

We also need to know the class labels of the objects the model has been trained to detect. In the case of the default YOLO v5 model, we can take the default class labels defined in the _classes_ module.
If you want to test a custom model, replace `coco_classes` with the list of your custom class labels, e.g. `['Laptop', 'Computer keyboard', 'Table']`.

In [None]:
class_labels = coco_classes

We'll now need to package the preprocessed image into a format that the model server can consume. RHODS Model Serving implements a generic prediction interface, described by the kserve proto file linked above. Over gRPC, a request is a `ModelInferRequest` protobuf message rather than a JSON body, and each input is passed as an `InferInputTensor` with a name, datatype, shape, and contents.

In [None]:
def create_payload(img_data):
    # Describe the single input tensor that the model expects.
    tensor = grpc_predict_v2_pb2.ModelInferRequest.InferInputTensor()
    tensor.name = "images"
    tensor.datatype = "FP32"
    tensor.shape.extend([1, 3, 640, 640])
    # Flatten the image to a 1-D sequence of floats matching the shape above.
    tensor.contents.fp32_contents.extend(img_data.flatten())
    return [tensor]

In [None]:
payload = create_payload(transformed_image)

Let's now send the request to the model server. The inference results are returned as a protobuf response rather than JSON: we read the first entry of `raw_output_contents` as a flat array of 32-bit floats and reshape it into one row per prediction. We'll also apply the post-processing function we defined in the previous notebook to extract the familiar object properties.

In [None]:
def transform_filter_results(result_arr):
    prediction_columns_number = 5 + len(class_labels)  # The model returns [xywh, conf, class0, class1, ...]
    reshaped_result_arr = result_arr.reshape(1, result_arr.shape[0] // prediction_columns_number, prediction_columns_number)
    sorted_result_arr = (reshaped_result_arr[0][reshaped_result_arr[0][:, 4].argsort()])[::-1]
    return sorted_result_arr


def grpc_request(inputs):
    request = grpc_predict_v2_pb2.ModelInferRequest()
    request.model_name = model_name
    request.inputs.extend(inputs)

    response = stub.ModelInfer(request)

    result_arr = np.frombuffer(response.raw_output_contents[0], dtype=np.float32)
    return transform_filter_results(result_arr)
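
To make the decoding step more concrete, here is a self-contained sketch that runs the same logic on synthetic bytes instead of a real server response. The function `decode_predictions` and the two fake predictions are made up for illustration, and the byte layout (native float32, one row of `[x, y, w, h, conf, class0, class1, ...]` per prediction) is assumed to match what the functions above expect.

```python
import numpy as np


def decode_predictions(raw_bytes, num_classes):
    """Turn a raw float32 buffer into prediction rows sorted by confidence, highest first."""
    columns = 5 + num_classes                       # x, y, w, h, conf + one score per class
    flat = np.frombuffer(raw_bytes, dtype=np.float32)
    rows = flat.reshape(-1, columns)                # one row per prediction
    order = rows[:, 4].argsort()[::-1]              # indices by descending confidence
    return rows[order]


# Two fake predictions for a model with two classes.
fake_output = np.array(
    [
        [10, 20, 5, 5, 0.2, 0.9, 0.1],
        [30, 40, 8, 8, 0.7, 0.3, 0.7],
    ],
    dtype=np.float32,
)
decoded = decode_predictions(fake_output.tobytes(), num_classes=2)
print(decoded[:, 4])  # confidence column, highest first
```

The second fake prediction has the higher confidence, so it ends up in the first row of `decoded`.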

In [None]:
raw_objects = grpc_request(payload)
objects = postprocess([raw_objects], class_labels)
objects

Let's now visualize the result as we did when we were experimenting with the model directly.

In [None]:
# draw_boxes(image_path, objects, scaling, padding, class_labels)

draw_boxes(image_path, *objects[0])