# latency
- original framework
- onnx
- ir
- gpu
- auto
- shared memory
- additional configs
- prepostprocessor
- async mode?
- async infer queue
- callback?

The goal of this notebook is to provide a step-by-step tutorial for improving inference performance in latency mode. Low latency is especially important in real-time applications, where results are needed as soon as possible after the data arrives. This notebook assumes a computer vision workflow and uses the `maskrcnn_resnet50_fpn_v2` model from torchvision. We will simulate a camera application that provides frames one by one.
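As a rough illustration of the one-frame-at-a-time setting, the sketch below simulates a camera with a generator and records how long each frame waits for its result. The names `camera_frames`, `fake_infer`, and `measure_latencies` are hypothetical and not part of this notebook's utilities; `fake_infer` only stands in for a real model call.

```python
import time


def camera_frames(frame, count):
    """Hypothetical camera: yields the same frame `count` times, one by one."""
    for _ in range(count):
        yield frame


def fake_infer(frame):
    """Placeholder for a real model call (assumption: any callable works here)."""
    return sum(frame)


def measure_latencies(infer, frames):
    """Return the per-frame latency in seconds, measured around each call."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    return latencies


latencies = measure_latencies(fake_infer, camera_frames([1, 2, 3], count=5))
print(f"frames: {len(latencies)}, worst latency: {max(latencies):.6f} s")
```

Measuring around each call, rather than around the whole loop, keeps the per-frame numbers available, which is what a latency-oriented application cares about.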


In [None]:
import sys
import time
from pathlib import Path

sys.path.append("../utils")
import notebook_utils as utils

## Data

For all experiments below we're using the same image.

In [None]:
import numpy as np
import cv2

IMAGE_WIDTH = 512
IMAGE_HEIGHT = 512

# or maybe better a video?
image = utils.load_image("../data/image/intel_rnb.jpg")
input_image = cv2.resize(image, dsize=(IMAGE_WIDTH, IMAGE_HEIGHT), interpolation=cv2.INTER_AREA)
input_image = np.expand_dims(np.transpose(input_image, axes=(2, 0, 1)), axis=0)
utils.show_array(image)

## Model

The model we selected is Mask R-CNN (`maskrcnn_resnet50_fpn_v2` from torchvision), used here for object detection.

In [None]:
from torchvision.models import detection

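base_model_dir = Path("model")
base_model_dir.mkdir(exist_ok=True)  # the ONNX export below writes into this directory
model_name = "maskrcnn_resnet50_fpn_v2"

# `weights=` expects an enum member (e.g. DEFAULT), not the enum class itself.
weights = detection.MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT
pytorch_model = detection.maskrcnn_resnet50_fpn_v2(weights=weights)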
pytorch_model.eval()

## Hardware

The code below lists the devices available to OpenVINO on this machine. These devices are used in the benchmarking process.

In [None]:
import openvino.runtime as ov

core = ov.Core()

for device in core.available_devices:
    device_name = core.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

## Optimizations

We define a benchmark function here and reuse it for all optimized models below. It runs inference `INFER_NUMBER` times and reports the average time per image.

In [None]:
# make it 100
INFER_NUMBER = 10

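def benchmark_model(model, input, model_name, device="CPU"):
    """Run `model` on `input` INFER_NUMBER times and print the average latency and FPS."""
    # Time the whole loop with a monotonic, high-resolution clock.
    start = time.perf_counter()
    for _ in range(INFER_NUMBER):
        model(input)
    end = time.perf_counter()

    infer_time = end - start

    # `device` is only used for labelling the printed result.
    print(f"{model_name} on {device}: {infer_time/INFER_NUMBER:.3f} seconds per image ({INFER_NUMBER/infer_time:.2f} FPS)")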

def show_result(model, result):
    # draw results
    # utils.viz_result_image(image, result, resize=True)
    pass
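
Averaging over a single loop can hide two things that matter in latency mode: the first calls may be slower than the rest, and a mean says nothing about outliers. A minimal sketch of a variant that discards warm-up runs and reports the median and worst latency could look like this. `benchmark_latency` and its `warmup` parameter are hypothetical and are not used elsewhere in this notebook.

```python
import statistics
import time


def benchmark_latency(model, data, runs=10, warmup=2):
    """Time each call separately; `warmup` calls are run but not recorded."""
    for _ in range(warmup):
        model(data)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        model(data)
        latencies.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Stand-in for a compiled model: any callable taking one input works.
stats = benchmark_latency(lambda x: sorted(x), data=list(range(1000)), runs=10)
print(f"median: {stats['median'] * 1000:.3f} ms, max: {stats['max'] * 1000:.3f} ms")
```

The same function should accept the PyTorch model or a compiled OpenVINO model in place of the lambda, since it only assumes the model is callable with a single input.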

### PyTorch model

First, we're benchmarking the original PyTorch model without any optimizations applied.

In [None]:
import torch

with torch.no_grad():
    result = pytorch_model(torch.as_tensor(input_image).float())[0]["boxes"].detach().numpy()
    show_result(pytorch_model, result=result)
    benchmark_model(pytorch_model, input=torch.as_tensor(input_image).float(), model_name="PyTorch model")

### ONNX model

The first optimization is to export the PyTorch model to ONNX and run it in OpenVINO.

In [None]:
onnx_path = base_model_dir / Path(f"{model_name}_{IMAGE_WIDTH}_{IMAGE_HEIGHT}").with_suffix(".onnx")

if not onnx_path.exists():
    dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
    torch.onnx.export(pytorch_model, dummy_input, onnx_path)

onnx_model = core.read_model(onnx_path)
onnx_model = core.compile_model(onnx_model, device_name="CPU")

In [None]:
show_result(model=onnx_model, result=result)
benchmark_model(model=onnx_model, input=input_image, model_name="ONNX model")

del onnx_model

### OpenVINO IR model

Let's convert the ONNX model to OpenVINO Intermediate Representation (IR) and run it.

In [None]:
from openvino.tools import mo

ov_model = mo.convert_model(onnx_path)
ov_cpu_model = core.compile_model(ov_model, device_name="CPU")

show_result(model=ov_cpu_model, result=result)
benchmark_model(model=ov_cpu_model, input=input_image, model_name="OpenVINO model")

del ov_cpu_model

## Conclusions

We showed the steps needed to improve the performance of an object detection model. Even if you see much better performance after running this notebook, note that this may not hold for every hardware configuration or every model.