# Serving Geti model(s) with OpenVINO Model Server

In this notebook, we will show how to create a stand-alone inference server for a Geti project, using the [OpenVINO Model Server (OVMS)](https://docs.openvino.ai/2021.4/ovms_what_is_openvino_model_server.html). Once the server is running, we'll be able to connect to it through the Geti SDK and send inference requests to it.

> NOTE: In this notebook we will run OVMS in a docker container. To make sure you'll be able to follow the notebook smoothly, please ensure that your system has docker installed. You can get Docker Desktop from [here](https://docs.docker.com/get-docker/).

In [None]:
# As usual we will connect to the platform first, using the server details from the .env file

from geti_sdk import Geti
from geti_sdk.utils import get_server_details_from_env

geti_server_configuration = get_server_details_from_env()

geti = Geti(server_config=geti_server_configuration)

### Selecting a project for OVMS deployment
Let's list all projects in the workspace and select the one we want to deploy.

In [None]:
from geti_sdk.rest_clients import ProjectClient

project_client = ProjectClient(session=geti.session, workspace_id=geti.workspace_id)
projects = project_client.list_projects()

## Deploying the project and preparing the OVMS configuration
Let's go with the project we created in notebook [002](002_create_project_from_dataset.ipynb): `COCO animal detection demo`. Like in notebook [008](008_deploy_project.ipynb), we can use the `geti.deploy_project` convenience method. This method accepts a `prepare_ovms_config` parameter, which we can set to `True` to create the configuration required by the OpenVINO Model Server that we intend to launch.

In [None]:
PROJECT_NAME = "COCO animal detection demo"

Before deploying, we need to make sure that the project is trained. Otherwise it will not contain any models to deploy, and the deployment will fail.

> NOTE: If the `COCO animal detection demo` project does not exist on your Geti server, you can either create it by running notebook [002](002_create_project_from_dataset.ipynb), or select a different project to deploy by changing the `PROJECT_NAME` variable above.

In [None]:
from geti_sdk.demos import ensure_trained_example_project

ensure_trained_example_project(geti=geti, project_name=PROJECT_NAME);

Once we are sure that the project has trained models for each task, we can create the deployment in the cell below. Note the `prepare_ovms_config=True` argument which indicates that the model configuration for OVMS will be created.

In [None]:
import os

from pathvalidate import sanitize_filepath

# We'll create a directory with the name of the project to save the deployment to, so we have
# to make sure that the project name can act as folder name.
safe_project_name = sanitize_filepath(PROJECT_NAME).replace(" ", "_")

# Target folder in which to save the deployment and OVMS configuration
output_folder = os.path.join("deployments", safe_project_name)

# Create the deployment and OVMS configuration, and save it to the `output_folder` on disk
deployment = geti.deploy_project(
    project_name=PROJECT_NAME, prepare_ovms_config=True, output_folder=output_folder
)

## Setting up the OpenVINO Model Server
The cell above should create the deployment for you in the folder `deployments/<safe_project_name>`, which is the project name with spaces replaced by underscores (for this project: `deployments/COCO_animal_detection_demo`). You should also see a line stating that the OVMS configuration files for the project have been created. Along with the configuration, a readme file `OVMS_README.md` is included that contains detailed instructions on how to get started.

This notebook follows those instructions, going through them step by step without ever having to leave the notebook.

### Launching the OpenVINO Model Server container

#### Getting the latest OVMS image
The cell below downloads the OVMS docker image to your machine. It assumes you have docker already installed on your system. 

Note the exclamation mark `!` in front of the statement: it tells Jupyter that the rest of the line is a shell command rather than Python code.

In [None]:
! docker pull openvino/model_server:latest

#### Running the container for the project
The configuration and models for the OVMS container to consume are stored in the `output_folder` that we just specified for the deployment. We need to pass the absolute path to these configuration files to the container when we're running it, so we first have to get the full `ovms_config_path` holding the files.

In [None]:
ovms_config_path = os.path.join(os.getcwd(), output_folder, "ovms_models")
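
Before launching the container, it can be useful to check which models the generated configuration will serve. The sketch below assumes that `ovms_model_config.json` follows the standard OVMS layout, with a top-level `model_config_list` whose entries each hold a `config` object containing a `name` and a `base_path`. The helper name `list_ovms_models` and the model names in the demo file are made up for illustration, and the file written by the SDK may contain additional fields.

```python
import json
import os
import tempfile


def list_ovms_models(config_file):
    """Return the names of all models listed in an OVMS configuration file."""
    with open(config_file, "r", encoding="utf-8") as file:
        config = json.load(file)
    # Assumption: standard OVMS layout {"model_config_list": [{"config": {...}}]}
    return [entry["config"]["name"] for entry in config.get("model_config_list", [])]


# Demo on a hypothetical config file, so the sketch runs without a real deployment
example_config = {
    "model_config_list": [
        {"config": {"name": "example_detection", "base_path": "/models/det"}},
        {"config": {"name": "example_classification", "base_path": "/models/cls"}},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = os.path.join(tmp_dir, "ovms_model_config.json")
    with open(example_file, "w", encoding="utf-8") as file:
        json.dump(example_config, file)
    model_names = list_ovms_models(example_file)

print(model_names)
```

For the deployment in this notebook, the call would be `list_ovms_models(os.path.join(ovms_config_path, "ovms_model_config.json"))`.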

The cell below will launch the docker container with OVMS. It takes its configuration from the deployment we just created, and listens for inference requests on port 9000. If all went well you should see no warnings or errors: only the ID of the newly created container should be printed (something like `aa1b4acfd7a97e2253aa82401056c2ed97934de65a2d51d4324e36dfa84670f1`).

In [None]:
! docker run -d --rm -v {ovms_config_path}:/models -p 9000:9000 openvino/model_server:latest --port 9000 --config_path /models/ovms_model_config.json
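
The `!` syntax only works inside Jupyter. As a rough sketch of how the same command could be assembled in a plain Python script, the hypothetical helper `build_ovms_run_command` below builds the argument list with exactly the flags used in the cell above. The folder name in the demo is the one this notebook produces for the example project; nothing is launched unless the list is passed to `subprocess.run`.

```python
import os
import shlex


def build_ovms_run_command(config_dir, port=9000, image="openvino/model_server:latest"):
    """Build the `docker run` argument list for serving an OVMS config folder."""
    # Docker volume mounts require an absolute host path
    config_dir = os.path.abspath(config_dir)
    return [
        "docker", "run", "-d", "--rm",
        "-v", f"{config_dir}:/models",  # mount the config folder into the container
        "-p", f"{port}:{port}",  # publish the serving port on the host
        image,
        "--port", str(port),
        "--config_path", "/models/ovms_model_config.json",
    ]


command = build_ovms_run_command(
    os.path.join("deployments", "COCO_animal_detection_demo", "ovms_models")
)
print(shlex.join(command))

# To actually start the container from a script (requires docker):
#   import subprocess
#   subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems when the path to the deployment folder contains spaces.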

## Making inference requests to OVMS
### Connecting to OVMS
Now that everything is set up and ready, we can connect the deployment we created earlier to the OVMS container that we just started. This is done in the cell below.

In [None]:
deployment.load_inference_models(device="http://localhost:9000")

## Running inference on an image
Now, we can load an image as a numpy array (for instance using OpenCV) and use the `deployment.infer` method to generate a prediction for it.
The SDK contains an example image that we use for this. The path to the image is in the `EXAMPLE_IMAGE_PATH` constant, from the `geti_sdk.demos` module.

If you have worked through notebook [008](008_deploy_project.ipynb) you'll notice that the API for local inference and OVMS inference is exactly the same. The only difference is the target for loading the inference models.

In [None]:
import time

import cv2

from geti_sdk.demos import EXAMPLE_IMAGE_PATH

numpy_image = cv2.imread(EXAMPLE_IMAGE_PATH)

# Convert to RGB channel order. All deployed models expect the image in RGB format
numpy_rgb = cv2.cvtColor(numpy_image, cv2.COLOR_BGR2RGB)

t_start = time.time()
prediction = deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start

print(f"Running OVMS inference on image took {t_elapsed*1000:.2f} milliseconds")

### Inspecting the result
The `Prediction` object generated by `deployment.infer` is equal in structure to the predictions sent by the platform. So let's have a closer look at it. We can do so in two ways: 

1. Visualising it using the `Visualizer` utility class
2. Inspecting its properties via the `prediction.overview` property

Let's show it on the image first.

In [None]:
from geti_sdk import Visualizer

visualizer = Visualizer()
# Draw on the RGB image, the same array that was passed to `deployment.infer`
result = visualizer.draw(numpy_rgb, prediction)
visualizer.show_in_notebook(result)

## Switching to local deployment
Of course, we can still use the deployment to load the models locally on the client. That can be done simply by calling `deployment.load_inference_models` again, this time specifying a different device (for example `CPU` or `GPU`).

In [None]:
deployment.load_inference_models(device="CPU")

In [None]:
t_start = time.time()
prediction = deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start

print(f"Running local inference on image took {t_elapsed*1000:.2f} milliseconds")

Notice that the code to run inference is exactly the same, whether it uses OVMS or loads the models directly to the CPU. 

## Benchmarking inference times

You might have noticed that there is a difference in execution time due to the overhead introduced by OVMS. Let's do some benchmarking to further investigate the difference.

First, we measure the execution time for running inference on CPU locally:

In [None]:
%%timeit -n 10 -r 3

# CPU inference
prediction = deployment.infer(numpy_rgb)

Now switch to OVMS and run the benchmark again:

In [None]:
deployment.load_inference_models(device="http://localhost:9000")

In [None]:
%%timeit -n 10 -r 3

# OVMS inference
prediction = deployment.infer(numpy_rgb)

For the single-task `COCO animal detection demo` project, OVMS inference introduces some overhead (the exact amount depends on the hardware configuration of your system). Note that this does not include any network traffic yet, because OVMS is running on your local system as well: running OVMS on a remote server will introduce additional overhead.
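
The `%%timeit` magic is only available in IPython and Jupyter. A rough standard-library equivalent is sketched below: the hypothetical helper `benchmark` calls a function `number` times per run, repeats that for `repeat` runs, and reports the mean and standard deviation of the per-call time across runs. The `dummy_infer` function is only a stand-in so that the sketch runs on its own.

```python
import statistics
import timeit


def benchmark(func, number=10, repeat=3):
    """Return (mean, standard deviation) of the per-call time in milliseconds."""
    # timeit.repeat returns one total time (in seconds) per run of `number` calls
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call_ms = [total / number * 1000 for total in totals]
    spread = statistics.stdev(per_call_ms) if repeat > 1 else 0.0
    return statistics.mean(per_call_ms), spread


def dummy_infer():
    """Stand-in workload; replace with a call to the deployment's `infer` method."""
    return sum(range(10_000))


mean_ms, std_ms = benchmark(dummy_infer)
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per call (3 runs, 10 calls each)")
```

With the objects from this notebook, the call would look like `benchmark(lambda: deployment.infer(numpy_rgb))`.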

# OVMS inference for task-chain projects
For projects involving a task-chain, the same process can be used. In this section of the notebook, we'll create a deployment for the project created in notebook [004](004_create_pipeline_project_from_dataset.ipynb), `COCO multitask animal demo`, and do benchmarking on it. If you don't have the project on your server yet, run notebook 004 to create it.

In [None]:
MULTITASK_PROJECT_NAME = "COCO multitask animal demo"

Make sure the project is trained

In [None]:
mt_project = ensure_trained_example_project(
    geti=geti, project_name=MULTITASK_PROJECT_NAME
)

In [None]:
safe_mt_project_name = sanitize_filepath(MULTITASK_PROJECT_NAME).replace(" ", "_")

# Target folder in which to save the deployment and OVMS configuration
mt_output_folder = os.path.join("deployments", safe_mt_project_name)

# Create the deployment and OVMS configuration, and save it to the `mt_output_folder` on disk
multitask_deployment = geti.deploy_project(
    project_name=MULTITASK_PROJECT_NAME,
    prepare_ovms_config=True,
    output_folder=mt_output_folder,
)

The `COCO multitask animal demo` project contains a detection task followed by a classification task. Now, launching the OVMS docker container for the project will serve two models instead of one: One for the first task, and one for the second.

The cell below will launch the container for the project. It will listen on port 9001, since port 9000 is already occupied by the model server we created previously.

In [None]:
mt_ovms_config_path = os.path.join(os.getcwd(), mt_output_folder, "ovms_models")

! docker run -d --rm -v {mt_ovms_config_path}:/models -p 9001:9001 openvino/model_server:latest --port 9001 --config_path /models/ovms_model_config.json
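
If the `docker run` command fails because the chosen port is already taken, a quick check like the one below can help to pick a free one. This is a minimal sketch using only the standard library; the helper name `is_port_free` is made up, and the check only covers the local interface given by `host`.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port
            return False
    return True


# Demo: occupy an arbitrary free port, then check it while it is in use
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
listener.listen(1)
busy_port = listener.getsockname()[1]

free_while_listening = is_port_free(busy_port)
listener.close()

print(f"Port {busy_port} free while in use: {free_while_listening}")
```

The result is only a snapshot: another process could still claim the port between the check and the container start.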

## Running inference and inspecting results
Let's check if OVMS inference for our task-chain project works. First connect to the OpenVINO model server.

In [None]:
multitask_deployment.load_inference_models(device="http://localhost:9001")

Then run inference on the familiar example image

In [None]:
t_start = time.time()
prediction = multitask_deployment.infer(numpy_rgb)
t_elapsed = time.time() - t_start

print(f"Running OVMS inference on image took {t_elapsed*1000:.2f} milliseconds")

result = visualizer.draw(numpy_rgb, prediction)
visualizer.show_in_notebook(result)

## Benchmarking inference times for the task-chain project
Let's do the benchmarking again to get a feeling for the difference between OVMS inference and local inference. We'll start with OVMS inference this time:

In [None]:
%%timeit -n 10 -r 3

# OVMS inference
prediction = multitask_deployment.infer(numpy_rgb)

Now switch the deployment to load the models directly on the CPU

In [None]:
multitask_deployment.load_inference_models(device="CPU")

And let's `timeit` again:

In [None]:
%%timeit -n 10 -r 3

# CPU inference
prediction = multitask_deployment.infer(numpy_rgb)

Also in this case you'll find that OVMS inference introduces some overhead.

# Cleaning up
To clean up, we'll use the `docker stop` command to stop the OVMS containers that were created in this notebook. Otherwise they'll keep running in the background.

First, we get the IDs of the running OVMS containers

In [None]:
container_ids = ! docker ps -q --filter ancestor=openvino/model_server

print(f"Found {len(container_ids)} running OVMS containers.")

Then, stop the containers. Stopping a container will automatically remove it (this is because of the `--rm` flag in the `docker run` command that we used to launch the containers).

In [None]:
# Stop each container
for ovms_container_id in container_ids:
    result = ! docker stop {ovms_container_id}

    if result[0] == ovms_container_id:
        print(f"OVMS container '{ovms_container_id}' stopped and removed successfully.")
    else:
        print(result[0])
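
Outside a notebook, the same clean-up can be done with `subprocess`. The sketch below mirrors the two cells above: it lists the running containers with the same `docker ps` filter and stops them one by one, treating a stop as successful when `docker stop` echoes the container ID. The helper names and the `run` parameter are made up for illustration; the parameter exists so that the logic can be tried with a fake runner, as in the demo, without docker being installed.

```python
import subprocess


def run_docker(args):
    """Run a docker CLI command and return its stripped standard output."""
    completed = subprocess.run(
        ["docker", *args], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


def stop_ovms_containers(run=run_docker):
    """Stop all running OVMS containers and return the IDs that were stopped."""
    output = run(["ps", "-q", "--filter", "ancestor=openvino/model_server"])
    stopped = []
    for container_id in output.split():
        # `docker stop` prints the container ID back when it succeeds
        if run(["stop", container_id]) == container_id:
            stopped.append(container_id)
    return stopped


# Demo with a fake runner that pretends two containers are running
def fake_run(args):
    return "abc123\ndef456" if args[0] == "ps" else args[1]


stopped = stop_ovms_containers(run=fake_run)
print(f"Stopped {len(stopped)} OVMS containers: {stopped}")
```

Calling `stop_ovms_containers()` without arguments uses the real docker CLI.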

# Conclusion
That's it! This notebook should provide a handle on how to deploy and serve models created with the Intel® Geti™ platform. 

The OVMS configuration files created in this notebook can be used independently: They just need to be provided to the OVMS docker container upon startup. This is useful when you aim to deploy a remote OVMS instance. 