# Hello Model Server

Introduction to OpenVINO™ Model Server (OVMS).

## What is Model Serving?
A model server hosts models and makes them accessible to software components over standard network protocols. A client sends a request to the model server, which performs inference and sends a response back to the client. Model serving offers many advantages for efficient model deployment:

- Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.
- Applications are independent of the model framework, hardware device, and infrastructure.
- Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.
- Clients require fewer updates since client libraries change very rarely.
- Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.
- Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.
- Efficient resource utilization with horizontal and vertical inference scaling.
  
![ovms_diagram](https://user-images.githubusercontent.com/91237924/215658773-4720df00-3b95-4a84-85a2-40f06138e914.png)

## Serving with OpenVINO Model Server
OpenVINO Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures, the model server uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference is exposed over gRPC or REST APIs, which makes it easy to deploy new algorithms and AI experiments.

![ovms_high_level](https://user-images.githubusercontent.com/91237924/215658767-0e0fc221-aed0-4db1-9a82-6be55f244dba.png)

To quickly start using OpenVINO™ Model Server, follow these steps:

## Step 1: Prepare Docker
Install [Docker Engine](https://docs.docker.com/engine/install/), including its [post-installation](https://docs.docker.com/engine/install/linux-postinstall/) steps, on your development system. To verify the installation, run the following command. If Docker pulls the `hello-world` test image and prints a confirmation message, it is ready.

In [None]:
!docker run hello-world
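
The same check can be approximated from Python before running the rest of the notebook. The following is a minimal sketch, not part of the original tutorial: the helper names `docker_cli_available` and `docker_daemon_reachable` are hypothetical, and it assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_cli_available(executable="docker"):
    """Return True if the given executable is found on PATH."""
    return shutil.which(executable) is not None


def docker_daemon_reachable(executable="docker", timeout=10):
    """Return True if `<executable> info` runs and exits with status 0."""
    if not docker_cli_available(executable):
        return False
    try:
        result = subprocess.run(
            [executable, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable could not be started or did not answer in time.
        return False
    return result.returncode == 0


print("Docker CLI found:", docker_cli_available())
print("Docker daemon reachable:", docker_daemon_reachable())
```

If the CLI is found but the daemon is not reachable, the post-installation steps linked above are a likely place to look.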

## Step 2: Preparing a Model Repository
Models must be placed in a specific directory structure, which is then mounted into the container, and must follow the rules below:
```
tree models/
models/
├── model1
│   ├── 1
│   │   ├── ir_model.bin
│   │   └── ir_model.xml
│   └── 2
│       ├── ir_model.bin
│       └── ir_model.xml
├── model2
│   └── 1
│       ├── ir_model.bin
│       ├── ir_model.xml
│       └── mapping_config.json
├── model3
│   └── 1
│       └── model.onnx
├── model4
│   └── 1
│       ├── model.pdiparams
│       └── model.pdmodel
└── model5
    └── 1
        └── TF_frozen_model.pb
```


* Each model should be stored in a dedicated directory, e.g. model1 and model2.

* Each model directory should include a sub-folder for each of its versions (1, 2, etc.). The versions and their folder names should be positive integer values.

* Note: At runtime, versions are enabled according to a pre-defined version policy. If the client does not specify a version number in the request, the latest version is served by default.

* Every version folder must include the model files: `.bin` and `.xml` for OpenVINO IR, `.onnx` for ONNX, `.pdiparams` and `.pdmodel` for PaddlePaddle, and `.pb` for TensorFlow. The file names can be arbitrary.
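
The rules above can be checked with a few lines of Python before the repository is mounted. This is a minimal sketch under stated assumptions: the helper names (`parse_version`, `detect_format`, `check_repository`) are hypothetical and are not part of OpenVINO Model Server, and the check only looks at file extensions, not at file contents.

```python
from pathlib import Path

# File extensions named in the rules above, grouped by model format.
MODEL_FORMATS = {
    "OpenVINO IR": {".xml", ".bin"},
    "ONNX": {".onnx"},
    "PaddlePaddle": {".pdmodel", ".pdiparams"},
    "TensorFlow": {".pb"},
}


def parse_version(name):
    """Return the version for a folder name such as "1", or None if it is not a positive integer."""
    if name.isascii() and name.isdigit() and int(name) > 0 and str(int(name)) == name:
        return int(name)
    return None


def detect_format(version_dir):
    """Return the first format whose required extensions are all present, or None."""
    suffixes = {p.suffix for p in Path(version_dir).iterdir() if p.is_file()}
    for format_name, required in MODEL_FORMATS.items():
        if required <= suffixes:
            return format_name
    return None


def check_repository(root):
    """Map each model name to {version number: detected format or None}."""
    report = {}
    for model_dir in sorted(Path(root).iterdir()):
        if not model_dir.is_dir():
            continue
        versions = {}
        for child in model_dir.iterdir():
            version = parse_version(child.name)
            if child.is_dir() and version is not None:
                versions[version] = detect_format(child)
        report[model_dir.name] = dict(sorted(versions.items()))
    return report


# Only inspect the repository if it already exists in the working directory.
if Path("models").is_dir():
    for model_name, versions in check_repository("models").items():
        latest = max(versions) if versions else None
        print(model_name, versions, "latest:", latest)
```

A version mapped to `None` points at a folder that does not contain a complete set of model files, and `max(versions)` gives the version that the default policy would serve.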


In [None]:
import os
import shutil

MODEL_DIR = "models/detection/1"
XML_PATH = "../004-hello-detection/model/horizontal-text-detection-0001.xml"
BIN_PATH = "../004-hello-detection/model/horizontal-text-detection-0001.bin"
os.makedirs(MODEL_DIR, exist_ok=True)
shutil.copy(XML_PATH, MODEL_DIR)
shutil.copy(BIN_PATH, MODEL_DIR)
print(f"Model Copied to \"./{MODEL_DIR}\".")

## Step 3: Start the Model Server Container
Pull and start the container:

In [None]:
!docker run -d --rm  --name="ovms" -v $(pwd)/models:/models -p 9000:9000 openvino/model_server:latest \
--model_path /models/detection/ --model_name detection --port 9000

The required Model Server parameters are listed below. For additional configuration options, see the [Model Server Parameters section](https://docs.openvino.ai/latest/ovms_docs_parameters.html#doxid-ovms-docs-parameters).

<table class="table">
<colgroup>
<col style="width: 20%" />
<col style="width: 80%" />
</colgroup>
<tbody>
<tr class="row-odd"><td><p><cite>--rm</cite></p></td>
<td><div class="line-block">
<div class="line">removes the container once it exits</div>
</div>
</td>
</tr>
<tr class="row-even"><td><p><cite>-d</cite></p></td>
<td><div class="line-block">
<div class="line">runs the container in the background</div>
</div>
</td>
</tr>
<tr class="row-odd"><td><p><cite>-v</cite></p></td>
<td><div class="line-block">
<div class="line">defines how to mount the model folder in the Docker container</div>
</div>
</td>
</tr>
<tr class="row-even"><td><p><cite>-p</cite></p></td>
<td><div class="line-block">
<div class="line">exposes the model serving port outside the Docker container</div>
</div>
</td>
</tr>
<tr class="row-odd"><td><p><cite>openvino/model_server:latest</cite></p></td>
<td><div class="line-block">
<div class="line">the image name; the ovms binary is the Docker entry point</div>
<div class="line">The image contents vary by tag and build process. See <a class="reference external" href="https://hub.docker.com/r/openvino/model_server/tags/">https://hub.docker.com/r/openvino/model_server/tags/</a> for a full tag list.</div>
</div>
</td>
</tr>
<tr class="row-even"><td><p><cite>--model_path</cite></p></td>
<td><div class="line-block">
<div class="line">model location, which can be:</div>
<div class="line">a Docker container path that is mounted during start-up</div>
<div class="line">a Google Cloud Storage path <cite>gs://&lt;bucket&gt;/&lt;model_path&gt;</cite></div>
<div class="line">an AWS S3 path <cite>s3://&lt;bucket&gt;/&lt;model_path&gt;</cite></div>
<div class="line">an Azure blob path <cite>az://&lt;container&gt;/&lt;model_path&gt;</cite></div>
</div>
</td>
</tr>
<tr class="row-odd"><td><p><cite>--model_name</cite></p></td>
<td><div class="line-block">
<div class="line">the name of the model in the model_path</div>
</div>
</td>
</tr>
<tr class="row-even"><td><p><cite>--port</cite></p></td>
<td><div class="line-block">
<div class="line">the gRPC server port</div>
</div>
</td>
</tr>
<tr class="row-odd"><td><p><cite>--rest_port</cite></p></td>
<td><div class="line-block">
<div class="line">the REST server port</div>
</div>
</td>
</tr>
</tbody>
</table>

If the serving port ```9000``` is already in use, switch to another available port on your system.
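
One way to pick a replacement port is to probe for a free one and rebuild the `docker run` arguments around it. The sketch below is illustrative only: `is_port_free`, `find_free_port`, and `build_run_command` are hypothetical helpers, the flags are the ones used in Step 3, and the probe cannot rule out another process taking the port between the check and the container start.

```python
import os
import shlex
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_free_port(start=9000, attempts=50):
    """Return the first free port in [start, start + attempts), capped at 65535."""
    for port in range(start, min(start + attempts, 65536)):
        if is_port_free(port):
            return port
    raise RuntimeError(f"No free port found in {attempts} attempts starting at {start}")


def build_run_command(port, models_dir, model_name="detection"):
    """Assemble the `docker run` arguments from Step 3 with a configurable port."""
    return [
        "docker", "run", "-d", "--rm", "--name=ovms",
        "-v", f"{models_dir}:/models",
        "-p", f"{port}:{port}",
        "openvino/model_server:latest",
        "--model_path", f"/models/{model_name}/",
        "--model_name", model_name,
        "--port", str(port),
    ]


port = find_free_port(9000)
command = build_run_command(port, os.path.join(os.getcwd(), "models"))
# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(command))
```

If a port other than 9000 is chosen, the client address used later in this notebook has to change accordingly, for example `address = f"localhost:{port}"`.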

## Step 4: Prepare the Example Client Components
OpenVINO Model Server exposes two sets of APIs for inference: one compatible with ```TensorFlow Serving``` and another compatible with the ```KServe API```. Both APIs work over ```gRPC``` and ```REST``` interfaces. Supporting two sets of APIs makes OpenVINO Model Server easier to plug into existing systems that already leverage one of these APIs for inference. This example will demonstrate how to write a TensorFlow Serving API client for object detection.
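
To make the difference between the two API families concrete, the sketch below builds the REST endpoint URLs each of them uses for inference. It is an illustration rather than part of this tutorial's flow: the helper names are hypothetical, and the REST port `8000` is an assumption, since the container started in Step 3 only publishes the gRPC port. Using REST would require starting the server with `--rest_port` and publishing that port with `-p`.

```python
def tfs_predict_url(host, rest_port, model_name, version=None):
    """Build the TensorFlow Serving style REST predict URL."""
    version_part = f"/versions/{version}" if version is not None else ""
    return f"http://{host}:{rest_port}/v1/models/{model_name}{version_part}:predict"


def kserve_infer_url(host, rest_port, model_name, version=None):
    """Build the KServe (v2) style REST inference URL."""
    version_part = f"/versions/{version}" if version is not None else ""
    return f"http://{host}:{rest_port}/v2/models/{model_name}{version_part}/infer"


REST_PORT = 8000  # hypothetical; not opened by the container started in Step 3

print(tfs_predict_url("localhost", REST_PORT, "detection"))
print(kserve_infer_url("localhost", REST_PORT, "detection", version=1))
```

The rest of this notebook uses gRPC through `ovmsclient`, so these URLs are not needed to follow along.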

### Prerequisites

Install necessary packages.

In [None]:
!pip install ovmsclient

### Imports

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from ovmsclient import make_grpc_client

### Request Model Status

In [None]:
address = "localhost:9000"

# Bind the gRPC address to the client object
client = make_grpc_client(address)
model_status = client.get_model_status(model_name="detection")
print(model_status)
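
Because the container is started in the background, the model may still be loading when the status is first requested. The sketch below polls until the latest version is available. It assumes the status is a dictionary keyed by version number whose values contain a `"state"` field such as `"AVAILABLE"`; that shape, and the helper names `latest_version_state` and `wait_until_available`, are assumptions of this example. The status source is passed in as a callable so the helper does not depend on a running server.

```python
import time


def latest_version_state(model_status):
    """Return (version, state) for the highest version in a status dictionary."""
    if not model_status:
        return None, None
    version = max(model_status)
    return version, model_status[version].get("state")


def wait_until_available(get_status, timeout=30.0, interval=1.0):
    """Call get_status() repeatedly until the latest version reports AVAILABLE.

    Returns the available version number, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            version, state = latest_version_state(get_status())
        except Exception:
            # Deliberately broad: the server may not accept connections yet.
            version, state = None, None
        if state == "AVAILABLE":
            return version
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model not available after {timeout} seconds")
        time.sleep(interval)


# Usage with the client created above (requires a running server):
# wait_until_available(lambda: client.get_model_status(model_name="detection"))
```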

### Request Model Metadata

In [None]:
model_metadata = client.get_model_metadata(model_name="detection")
print(model_metadata)

### Load Input Image

In [None]:
# Text detection models expect an image in BGR format.
image = cv2.imread("../data/image/intel_rnb.jpg")
fp_image = image.astype("float32")

# Resize the image to the input size expected by the network.
# The input shape is in NCHW layout; cv2.resize expects the target size as (width, height).
input_shape = model_metadata['inputs']['image']['shape']
height, width = input_shape[2], input_shape[3]
resized_image = cv2.resize(fp_image, (width, height))

# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

### Request Prediction on a Numpy Array

In [None]:
inputs = {"image": input_image}

# Run inference on model server and receive the result data
boxes = client.predict(inputs=inputs, model_name="detection")['boxes']

# Remove boxes that contain only zeros.
boxes = boxes[~np.all(boxes == 0, axis=1)]
print(boxes)

### Visualization

In [None]:
# Each detection is described in the [x_min, y_min, x_max, y_max, conf] format.
# The images passed here are in BGR format: the original one and the resized one.
# To display the result in the colors expected by matplotlib, use the cvtColor function.
def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Define colors for boxes and descriptions.
    colors = {"red": (255, 0, 0), "green": (0, 255, 0)}

    # Fetch the image shapes to calculate a ratio.
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert the base image from BGR to RGB format.
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes.
    for box in boxes:
        # Pick a confidence factor from the last place in an array.
        conf = box[-1]
        if conf > threshold:
            # Multiply the corner positions of each box by the x and y ratios and convert them to int.
            # If the bounding box is found at the top of the image,
            # position the upper box bar a little lower to make it visible on the image.
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y, 10)) if idx % 2 
                else int(corner_position * ratio_x)
                for idx, corner_position in enumerate(box[:-1])
            ]

            # Draw a box based on the position, parameters in rectangle function are: image, start_point, end_point, color, thickness.
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["green"], 3)

            # Add text to the image based on position and confidence.
            # Parameters in text function are: image, text, bottom-left_corner_textfield, font, font_scale, color, thickness, line_type.
            if conf_labels:
                rgb_image = cv2.putText(
                    rgb_image,
                    f"{conf:.2f}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8,
                    colors["red"],
                    1,
                    cv2.LINE_AA,
                )

    return rgb_image

In [None]:
plt.figure(figsize=(10, 6))
plt.axis("off")
plt.imshow(convert_result_to_image(image, resized_image, boxes, conf_labels=False))
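
The coordinate scaling inside `convert_result_to_image` can be looked at in isolation. The sketch below repeats that arithmetic for a single box in plain Python, without OpenCV: box coordinates are given in resized-image pixels and are mapped back to original-image pixels, with y values clamped to at least 10 as in the function above. The helper name `scale_box` and the sample sizes are hypothetical.

```python
def scale_box(box, original_size, resized_size, min_y=10):
    """Scale [x_min, y_min, x_max, y_max, conf] from resized-image to original-image pixels.

    Sizes are given as (height, width), matching the first two entries of image.shape.
    """
    original_h, original_w = original_size
    resized_h, resized_w = resized_size
    ratio_x = original_w / resized_w
    ratio_y = original_h / resized_h
    x_min, y_min, x_max, y_max, conf = box
    return (
        int(x_min * ratio_x),
        int(max(y_min * ratio_y, min_y)),
        int(x_max * ratio_x),
        int(max(y_max * ratio_y, min_y)),
        float(conf),
    )


# A 1000x2000 (height x width) image resized to 500x500: ratio_x = 4, ratio_y = 2.
print(scale_box([100, 50, 200, 100, 0.9], (1000, 2000), (500, 500)))
# A box touching the top edge has its y_min moved down to 10.
print(scale_box([0, 0, 50, 20, 0.5], (1000, 2000), (500, 500)))
```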

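To stop the model server container, use the following command. Because the container was started with `--rm`, Docker removes it automatically once it stops, so a separate `docker rm` is not needed:

In [None]:
!docker stop ovms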

## References

1. [OpenVINO™ Model Server](https://docs.openvino.ai/latest/ovms_what_is_openvino_model_server.html)
2. [openvinotoolkit/model_server](https://github.com/openvinotoolkit/model_server/)