# OpenVINO API Tutorial

This notebook explains the basics of the OpenVINO Inference Engine API. It covers:

- [Load Inference Engine and Show Info](#Load-Inference-Engine-and-Show-Info)
- [Loading a Model](#Loading-a-Model)
  - [IR Model](#IR-Model)
  - [ONNX Model](#ONNX-Model)
- [Getting Information about a Model](#Getting-Information-about-a-Model)
  - [Model Inputs](#Model-Inputs)
  - [Model Outputs](#Model-Outputs)
- [Doing Inference on a Model](#Doing-Inference-on-a-Model)
- [Reshaping and Resizing](#Reshaping-and-Resizing)
  - [Change Image Size](#Change-Image-Size)
  - [Change Batch Size](#Change-Batch-Size)
- [Caching a Model](#Caching-a-Model)

The notebook is divided into sections with headers. Each section is standalone and does not depend on previous sections. A segmentation IR model, a classification IR model, and a segmentation ONNX model are provided as examples. You can replace these model files with your own models. The exact outputs will be different, but the process is the same.

## Load Inference Engine and Show Info

Initialize Inference Engine with `Core()`.

In [1]:
from openvino.runtime import Core

ie = Core()



Inference Engine can load a network on a device. A device in this context means a CPU, an Intel GPU, a Neural Compute Stick 2, etc. The `available_devices` property shows the devices that are available on your system. The "FULL_DEVICE_NAME" option to `ie.get_property()` shows the name of the device.

In this notebook the CPU device is used. To use an integrated GPU, use `device_name="GPU"` instead. Note that loading a network on GPU will be slower than loading a network on CPU, but inference will likely be faster.

In [2]:
devices = ie.available_devices

for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

CPU: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
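
A common follow-up is to pick a device name from that list and fall back to CPU when the preferred device is absent. The sketch below shows one way to do this. `choose_device` is a hypothetical helper, not part of OpenVINO, and the device lists are hard-coded to stand in for `ie.available_devices`. Matching names such as "GPU.0" by their base name is an assumption about how systems with several devices report them.

```python
def choose_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device that appears in `available`.

    `available` stands in for the list returned by `ie.available_devices`.
    Names with a numeric suffix (for example "GPU.0") are matched by their
    base name, which is an assumption made for this sketch.
    """
    for wanted in preferred:
        for name in available:
            if name.split(".")[0] == wanted:
                return name
    raise RuntimeError(f"none of {preferred} found in {available}")


print(choose_device(["CPU"]))         # CPU
print(choose_device(["CPU", "GPU"]))  # GPU
```

The returned string can then be passed as `device_name` to `compile_model()`.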


## Loading a Model

After initializing Inference Engine, first read the model file with `read_model()`, then compile it to the specified device with `compile_model()`. 

### IR Model

An IR (Intermediate Representation) model consists of an .xml file, containing information about network topology, and a .bin file, containing the weights and biases binary data. `read_model()` expects the weights file to be located in the same directory as the xml file, with the same filename, and the extension .bin: `model_weights_file == Path(model_xml).with_suffix(".bin")`. If this is the case, specifying the weights file is optional. If the weights file has a different filename, it can be specified with the `weights` parameter to `read_model()`.
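
The naming rule for the weights file can be checked before reading the model. This is a minimal sketch that uses only `pathlib`; `default_weights_path` is a hypothetical helper name, and the model path is the one used in this notebook.

```python
from pathlib import Path


def default_weights_path(model_xml):
    """Return the .bin path that sits next to the given .xml file."""
    return Path(model_xml).with_suffix(".bin")


weights = default_weights_path("model/classification.xml")
print(weights.as_posix())  # model/classification.bin
# True only if the weights file is actually present on disk
print(weights.exists())
```

If `weights.exists()` is false, the weights file probably has a different name and needs to be passed explicitly through the `weights` parameter.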

See the [tensorflow-to-openvino](../101-tensorflow-to-openvino/101-tensorflow-to-openvino.ipynb) and [pytorch-onnx-to-openvino](../102-pytorch-onnx-to-openvino/102-pytorch-onnx-to-openvino.ipynb) notebooks for information on how to convert your existing TensorFlow, PyTorch or ONNX model to OpenVINO's IR format with OpenVINO's Model Optimizer. For exporting ONNX models to IR with default settings, the `serialize()` function can also be used.

In [3]:
from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"

model = ie.read_model(model=classification_model_xml)
compiled_model = ie.compile_model(model=model, device_name="CPU")

### ONNX Model

An ONNX model is a single file. Reading and loading an ONNX model works the same way as reading and loading an IR model. The `model` argument points to the ONNX filename.

In [4]:
from openvino.runtime import Core

ie = Core()
onnx_model_path = "model/segmentation.onnx"
model_onnx = ie.read_model(model=onnx_model_path)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

The ONNX model can be exported to IR with `serialize()`:

In [5]:
from openvino.offline_transformations import serialize

serialize(model=model_onnx, model_path="model/exported_onnx_model.xml", weights_path="model/exported_onnx_model.bin")

## Getting Information about a Model

The OpenVINO `Model` instance returned by `read_model()` stores information about the model. Information about the inputs and outputs of the model is in `model.inputs` and `model.outputs`. These are also properties of the `CompiledModel` instance returned by `compile_model()`. Where we use `model.inputs` and `model.outputs` in the cells below, you can also use `compiled_model.inputs` and `compiled_model.outputs`.

### Model Inputs

In [6]:
from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
model.input(0).any_name

'input'

The cell above shows that the model loaded expects one input, with the name _input_. If you loaded a different model, you may see a different input layer name, and you may see more inputs.

It is often useful to have a reference to the first input layer. For a model with one input, `model.input(0)` returns this layer, and its `any_name` property gives the layer name.

In [7]:
input_layer = model.input(0)

Information for this input layer is available as properties of the `input_layer` object. The next cell prints the input precision and shape.

In [8]:
print(f"input precision: {input_layer.element_type}")
print(f"input shape: {input_layer.shape}")

input precision: <Type: 'float32'>
input shape: {1, 3, 224, 224}


The cell output shows that the model expects inputs with a shape of [1,3,224,224]. This model uses the NCHW layout, so it expects input data with a batch size (N) of 1, 3 channels (C), and images with a height (H) and width (W) of 224. The input data is expected to be of FP32 (32-bit floating point) precision.
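
As a small worked example of what the shape [1,3,224,224] implies, the sketch below unpacks it into named dimensions and computes the size of one input tensor. The shape is copied from the cell output rather than read from a model, and the 4 bytes per value follow from the FP32 precision.

```python
input_shape = [1, 3, 224, 224]  # copied from the cell output above
N, C, H, W = input_shape

num_values = N * C * H * W
num_bytes = num_values * 4  # FP32 uses 4 bytes per value

print(num_values)  # 150528
print(num_bytes)   # 602112
```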

### Model Outputs

In [9]:
from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
model.output(0).any_name

'MobilenetV3/Predictions/Softmax'

Model output info is stored in `model.outputs`. The cell above shows that the model returns one output, with the name _MobilenetV3/Predictions/Softmax_. If you loaded a different model, you will probably see a different output layer name, and you may see more outputs.

Since this model has one output, use the same approach as for the input layer to get a reference to it.

In [10]:
output_layer = model.output(0)
output_layer

<Output: names[MobilenetV3/Predictions/Softmax] shape{1,1001} type: f32>

Getting the output precision and shape is similar to getting the input precision and shape.

In [11]:
print(f"output precision: {output_layer.element_type}")
print(f"output shape: {output_layer.shape}")

output precision: <Type: 'float32'>
output shape: {1, 1001}


This cell output shows that the model returns outputs with a shape of [1, 1001], where 1 is the batch size (N) and 1001 the number of classes (C). The output is returned as 32-bit floating point.

## Doing Inference on a Model

To do inference on a model, first create an inference request by calling `create_infer_request()`, a method of the _CompiledModel_ that `compile_model()` returns. Then call `infer()`, a method of _InferRequest_, which expects one argument: _inputs_. This is a dictionary mapping input layer names to input data. A compiled model can also be called directly with a list of input arrays, which is the shortcut used in the first inference cell below.
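
Because _inputs_ is a plain dictionary, it can be checked before it is handed to `infer()`. The sketch below is one possible check, using NumPy only. `check_inputs` and `expected_shapes` are hypothetical names introduced for illustration, and the input name and shape are the ones reported for the classification model earlier in this notebook.

```python
import numpy as np


def check_inputs(inputs, expected_shapes):
    """Raise ValueError if an input is missing or has an unexpected shape."""
    for name, shape in expected_shapes.items():
        if name not in inputs:
            raise ValueError(f"missing input '{name}'")
        actual = tuple(inputs[name].shape)
        if actual != tuple(shape):
            raise ValueError(f"input '{name}' has shape {actual}, expected {tuple(shape)}")


# Input name and shape as reported for the classification model
expected_shapes = {"input": (1, 3, 224, 224)}
inputs = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}

check_inputs(inputs, expected_shapes)  # passes silently when everything matches
```

Passing an array that is still in N,H,W,C order, for example with shape (1, 224, 224, 3), would raise a `ValueError` here instead of failing later inside the inference call.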

**Preparation: load network**

In [12]:
from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
compiled_model = ie.compile_model(model=model, device_name="CPU")
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

**Preparation: load image and convert to input shape**

To propagate an image through the network, it needs to be loaded into an array, resized to the shape that the network expects, and converted to the network's input layout.

In [13]:
import cv2

image_filename = "data/coco_hollywood.jpg"
image = cv2.imread(image_filename)
image.shape

(663, 994, 3)

The image has a shape of (663,994,3). It is 663 pixels in height, 994 pixels in width, and has 3 color channels. We get a reference to the height and width that the network expects and resize the image to that size.

In [14]:
# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = input_layer.shape
# OpenCV resize expects the destination size as (width, height)
resized_image = cv2.resize(src=image, dsize=(W, H))
resized_image.shape

(224, 224, 3)
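
The order of `dsize` is easy to get wrong because the model shape lists height before width, while OpenCV expects width before height. For a square input such as 224x224 the mistake goes unnoticed. The sketch below makes the swap explicit; `resize_target` is a hypothetical helper, and the 480x640 shape is an invented example of a non-square input.

```python
def resize_target(input_shape):
    """Return the (width, height) tuple that cv2.resize expects as dsize."""
    _, _, height, width = input_shape
    return (width, height)


print(resize_target([1, 3, 224, 224]))  # (224, 224)
print(resize_target([1, 3, 480, 640]))  # (640, 480)
```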

Now the image has the width and height that the network expects. It is still in H,W,C format. We change it to N,C,H,W format (where N=1) by first calling `np.transpose()` to change to C,H,W and then adding the N dimension by calling `np.expand_dims()`. Finally, convert the data to FP32 with the array's `.astype()` method.

In [15]:
import numpy as np

input_data = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0).astype(np.float32)
input_data.shape

(1, 3, 224, 224)
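
To see what the transpose does to individual values, the same steps can be run on a tiny synthetic array. This is an illustrative sketch: the 2x3 "image" below is made up, and only the axis handling matches the cell above.

```python
import numpy as np

# Synthetic image: height 2, width 3, 3 channels, in H,W,C order
hwc = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

chw = np.transpose(hwc, (2, 0, 1))                 # H,W,C -> C,H,W
nchw = np.expand_dims(chw, 0).astype(np.float32)   # add N and convert to FP32

print(nchw.shape)  # (1, 3, 2, 3)
# The value at row 1, column 2, channel 0 ends up at index [0, 0, 1, 2]
print(hwc[1, 2, 0] == nchw[0, 0, 1, 2])  # True
```

The transpose only reorders axes, so every pixel value is preserved and only its index changes.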

**Do inference**

Now that the input data is in the right shape, do the inference.

In [16]:
result = compiled_model([input_data])[output_layer]

We can also create an `InferRequest` and call its `infer()` method.

In [17]:
request = compiled_model.create_infer_request()
request.infer(inputs={input_layer.any_name: input_data})
result = request.get_output_tensor(output_layer.index).data

`.infer()` fills the output tensor, which we can access with `get_output_tensor()`. Since this network returns one output, and the `output_layer` reference gives us its index as `output_layer.index`, we can get the tensor with `request.get_output_tensor(output_layer.index)`. To get a NumPy array from the tensor, read its `.data` attribute.

In [18]:
result.shape

(1, 1001)

The output shape is (1,1001), which we saw is the expected shape of the output. This output shape indicates that the network returns probabilities for 1001 classes. To transform this into meaningful information, check out the [hello world notebook](../001-hello-world/001-hello-world.ipynb).
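
As a first step toward interpreting such an output, the indices of the highest-scoring classes can be extracted with NumPy. The sketch below uses random placeholder data in place of a real inference result, so the printed indices carry no meaning. Mapping indices to class labels is not covered here.

```python
import numpy as np

# Placeholder for a real result: random scores normalized to sum to 1
rng = np.random.default_rng(0)
scores = rng.random((1, 1001)).astype(np.float32)
result = scores / scores.sum()

# Indices of the five highest scores, highest first
top5 = np.argsort(result[0])[::-1][:5]

for class_index in top5:
    print(class_index, result[0, class_index])
```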

## Reshaping and Resizing

### Change Image Size

Instead of resizing the image to fit the model, you can also reshape the model to fit the image. Note that not all models support reshaping, and models that do may not support all input shapes. The model accuracy may also suffer if you change the model input shape.

We first check the input shape of the model, and then reshape to the new input shape.

In [19]:
from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)

print("~~~~ ORIGINAL MODEL ~~~~")
print(f"input shape: {segmentation_input_layer.shape}")
print(f"output shape: {segmentation_output_layer.shape}")

new_shape = PartialShape([1, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")
# help(segmentation_compiled_model)
print("~~~~ RESHAPED MODEL ~~~~")
print(f"model input shape: {segmentation_input_layer.shape}")
print(
    f"compiled_model input shape: "
    f"{segmentation_compiled_model.input(index=0).shape}"
)
print(
    f"compiled_model output shape: "
    f"{segmentation_compiled_model.output(index=0).shape}"
)

~~~~ ORIGINAL MODEL ~~~~
input shape: {1, 3, 512, 512}
output shape: {1, 1, 512, 512}
~~~~ RESHAPED MODEL ~~~~
model input shape: {1, 3, 544, 544}
compiled_model input shape: {1, 3, 544, 544}
compiled_model output shape: {1, 1, 544, 544}


The input shape for the segmentation network is [1,3,512,512], with an NCHW layout: the network expects 3-channel images with a width and height of 512 and a batch size of 1. We reshape the network to make it accept input images with a width and height of 544 with the `.reshape()` method of `Model`. This segmentation network always returns arrays with the same width and height as the input width and height, so setting the input dimensions to 544x544 also modifies the output dimensions. After reshaping, compile the network once again.
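
When the goal is to reshape the model to fit a specific image, the new N,C,H,W shape can be derived from the image array's H,W,C shape. The sketch below shows that conversion. `model_shape_for_image` is a hypothetical helper, and the image shape is the one from the example image used earlier in this notebook. Whether a given model accepts the resulting shape depends on the model.

```python
def model_shape_for_image(image_shape, batch_size=1):
    """Convert an (H, W, C) image shape into an [N, C, H, W] model input shape."""
    height, width, channels = image_shape
    return [batch_size, channels, height, width]


print(model_shape_for_image((663, 994, 3)))  # [1, 3, 663, 994]
```

The resulting list can be wrapped in `PartialShape` and passed to `.reshape()` in the same way as `new_shape` in the cell above.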

### Change Batch Size

We can also use `.reshape()` to set the batch size, by increasing the first element of _new_shape_. For example, to set a batch size of two, set `new_shape = (2,3,544,544)` in the cell above. 

In [20]:
from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)
new_shape = PartialShape([2, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")

print(f"input shape: {segmentation_input_layer.shape}")
print(f"output shape: {segmentation_output_layer.shape}")

input shape: {2, 3, 544, 544}
output shape: {2, 1, 544, 544}


The output shows that by setting the batch size to 2, the first element (N) of the input and output shape now has a value of 2. Let's see what happens if we propagate our input image through the network:

In [21]:
import numpy as np
from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)
new_shape = PartialShape([2, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")
input_data = np.random.rand(2, 3, 544, 544)

output = segmentation_compiled_model([input_data])

print(f"input data shape: {input_data.shape}")
print(f"result data shape: {segmentation_output_layer.shape}")

input data shape: (2, 3, 544, 544)
result data shape: {2, 1, 544, 544}
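
With real images instead of random data, a batch is typically built by joining preprocessed arrays along the first axis. This is a minimal sketch under that assumption: the two arrays below are placeholders for images that have already been resized and converted to N,C,H,W with N=1.

```python
import numpy as np

# Placeholders for two preprocessed images, each with shape (1, 3, 544, 544)
image_a = np.zeros((1, 3, 544, 544), dtype=np.float32)
image_b = np.ones((1, 3, 544, 544), dtype=np.float32)

# Join along the batch axis (N) to form a batch of two
batch = np.concatenate([image_a, image_b], axis=0)

print(batch.shape)  # (2, 3, 544, 544)
print(batch.dtype)  # float32
```

The order of the arrays in the list determines their position in the batch, so `batch[1]` here holds the second image.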
