# OpenVINO API guide

This notebook explains the basics of the OpenVINO API. It currently covers:
    
- How to load a model
  - OpenVINO IR format
  - ONNX format
- How to get information about a model
  - model inputs
  - model outputs
- How to do inference
- How to reshape the model for different input sizes
- How to set the batch size for inference
    
The notebook is divided into sections with headers. Each section is standalone and does not depend on previous sections, with the exception of the imports in the cell below. A segmentation and classification IR model and a segmentation ONNX model are provided as examples. You can replace these model files with your own models. The outputs will be different, but the process described in this notebook is the same for all models of the same format.

## Preparation: Imports

In [1]:
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

## Load Inference Engine and show info

Initialize Inference Engine with `IECore()`.

In [2]:
ie = IECore()



Inference Engine can load a network on a device. A device in this context means a CPU, an Intel GPU, a Neural Compute Stick 2, etc. The `available_devices` property shows the devices that are available on your system. The "FULL_DEVICE_NAME" option to `ie.get_metric` shows the name of the device.

In this notebook the CPU device is used. To use an integrated GPU, use `device_name="GPU"` instead. Note that loading a network on GPU will be slower than loading a network on CPU, but inference will likely be faster.

In [3]:
devices = ie.available_devices
for device in devices:
    device_name = ie.get_metric(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

CPU: 11th Gen Intel(R) Core(TM) i7-1160G7 @ 1.20GHz
GPU: Intel(R) Iris(R) Xe Graphics (iGPU)
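
The device choice described above can be wrapped in a small helper. The following is a minimal sketch, not part of the OpenVINO API: `pick_device` is a hypothetical function that takes a list such as the one returned by `ie.available_devices` and returns the first preferred device that is present. The device list is hard-coded here to mirror the output above, so the snippet runs without OpenVINO installed.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first entry of `preferred` that appears in `available`.

    `pick_device` is a hypothetical helper, not an OpenVINO function.
    """
    for name in preferred:
        if name in available:
            return name
    raise ValueError(f"none of {preferred} found in {available}")


# Hard-coded stand-in for `ie.available_devices` (assumption for illustration).
available_devices = ["CPU", "GPU"]
device_name = pick_device(available_devices)
print(device_name)  # GPU
```

The returned string can then be passed as `device_name` to `load_network`. Putting "CPU" first in `preferred` would favor faster loading over faster inference.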


## Loading a network

After initializing Inference Engine, first read the model file with `read_network`, then load it to the specified device with `load_network`. 

### IR model

An IR (Intermediate Representation) model consists of an .xml file, containing model information, and a .bin file, containing the weights. `read_network` expects the weights file to be located in the same directory as the xml file, with the same filename, and the extension .bin: `model_weights_file == Path(model_xml).with_suffix(".bin")`. If this is the case, specifying the weights file is optional. If the weights file has a different filename, it can be specified with the `weights` parameter to `read_network`.
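
The naming rule for the weights file can be checked with `pathlib` alone. This is a small sketch of that rule; `default_weights_path` is a hypothetical helper name, and the path used below is an example rather than a file shipped with this notebook.

```python
from pathlib import Path


def default_weights_path(model_xml):
    """Return the .bin path that sits next to the given .xml file."""
    return Path(model_xml).with_suffix(".bin")


# "models/classification.xml" is an example path (assumption for illustration).
weights = default_weights_path("models/classification.xml")
print(weights.name)  # classification.bin
```

If `weights.exists()` is false for your model, pass the actual location through the `weights` parameter of `read_network`.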

See the [tensorflow-to-openvino](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/101-tensorflow-to-openvino) and [pytorch-onnx-to-openvino](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/102-pytorch-onnx-to-openvino) notebooks for information on how to convert your existing TensorFlow, PyTorch or ONNX model to OpenVINO's IR format.

In [4]:
ie = IECore()
classification_model_xml = "classification.xml"
net = ie.read_network(model=classification_model_xml) 
exec_net = ie.load_network(network=net, device_name="CPU")

### ONNX model

An ONNX model is a single file. Reading and loading an ONNX model works the same way as reading and loading an IR model. The `model` argument points to the ONNX filename.

In [5]:
ie = IECore()
onnx_model = "segmentation.onnx"
net_onnx = ie.read_network(model=onnx_model)
exec_net_onnx = ie.load_network(network=net_onnx, device_name="CPU")

## Getting information about a model

The OpenVINO `IENetwork` instance stores information about the model. Information about the inputs and outputs of the model is in `net.input_info` and `net.outputs`. These are also properties of the `ExecutableNetwork` instance. Where we use `net.input_info` and `net.outputs` in the cells below, you can also use `exec_net.input_info` and `exec_net.outputs`.

### Model Inputs

In [6]:
ie = IECore()
classification_model_xml = "classification.xml"
net = ie.read_network(model=classification_model_xml) 
net.input_info

{'input': <openvino.inference_engine.ie_api.InputInfoPtr at 0x2b88ea49930>}

The cell above shows that the model loaded expects one input, with the name _input_. If you loaded a different model, you may see a different input layer name, and you may see more inputs.

It is often useful to have a reference to the name of the first input layer. For a model with one input, `next(iter(net.input_info))` gets this name.

In [7]:
input_layer = next(iter(net.input_info))
input_layer

'input'

Information for this input layer is stored in `input_info`. The next cell prints the input layout, precision and shape.

In [8]:
print(f"input layout: {net.input_info[input_layer].layout}")
print(f"input precision: {net.input_info[input_layer].precision}")
print(f"input shape: {net.input_info[input_layer].tensor_desc.dims}")

input layout: NCHW
input precision: FP32
input shape: [1, 3, 224, 224]


This cell output tells us that the model expects inputs with a shape of [1,3,224,224], and that this is NCHW layout. This means that the model expects input data with a batch size (N) of 1, 3 channels (C) and images of a height (H) and width (W) of 224. The input data is expected to be of FP32 (floating point) precision.
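
Pairing each layout letter with its dimension makes the meaning of a shape explicit. The sketch below is plain Python and does not call OpenVINO; `describe_shape` is a hypothetical helper, and the layout and shape values are copied from the cell output above.

```python
def describe_shape(layout, dims):
    """Pair each layout letter with its dimension, e.g. 'NCHW' with [1, 3, 224, 224]."""
    if len(layout) != len(dims):
        raise ValueError(f"layout {layout!r} has {len(layout)} axes, dims has {len(dims)}")
    return dict(zip(layout, dims))


# Values copied from the cell output above.
axes = describe_shape("NCHW", [1, 3, 224, 224])
print(axes)  # {'N': 1, 'C': 3, 'H': 224, 'W': 224}
```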

### Model Outputs

In [9]:
ie = IECore()
classification_model_xml = "classification.xml"
net = ie.read_network(model=classification_model_xml) 
net.outputs

{'MobilenetV3/Predictions/Softmax': <openvino.inference_engine.ie_api.DataPtr at 0x2b89ed78b10>}

Model output info is stored in `net.outputs`. The cell above shows that the model returns one output, with the name _MobilenetV3/Predictions/Softmax_. If you loaded a different model, you will probably see a different output layer name, and you may see more outputs.

To get a reference to the name of the output layer, we follow the same method as for the input layer.

In [10]:
output_layer = next(iter(net.outputs))
output_layer

'MobilenetV3/Predictions/Softmax'

Getting the output layout, precision and shape is similar to getting the input layout, precision and shape.

In [11]:
print(f"output layout: {net.outputs[output_layer].layout}")
print(f"output precision: {net.outputs[output_layer].precision}")
print(f"output shape: {net.outputs[output_layer].shape}")

output layout: NC
output precision: FP32
output shape: [1, 1001]


This cell output shows that the model returns outputs with a shape of [1, 1001], where 1 is the batch size (N) and 1001 the number of classes (C). The output is returned as 32-bit floating point.

## Inference

To do inference on a model, call the `infer()` method of the _ExecutableNetwork_, the `exec_net` that we loaded with `load_network`. `infer()` expects one argument: _inputs_. This is a dictionary, mapping input layer names to input data.
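
Because `infer()` takes a dictionary keyed by input layer name, it can help to check that dictionary before running inference. The following is a hedged sketch, not an OpenVINO feature: `find_input_problems` is a hypothetical helper, and `expected_shapes` is assumed to be built from `net.input_info[name].tensor_desc.dims`. It accepts a smaller batch dimension than expected, on the assumption that the network's batch size acts as a maximum.

```python
import numpy as np


def find_input_problems(inputs, expected_shapes):
    """Return a list of human-readable problems; an empty list means the inputs look fine.

    Hypothetical helper: names must be known, non-batch dimensions must match,
    and the batch dimension must not exceed the expected one.
    """
    problems = []
    for name, data in inputs.items():
        if name not in expected_shapes:
            problems.append(f"unknown input layer: {name!r}")
            continue
        expected = tuple(expected_shapes[name])
        actual = tuple(data.shape)
        if len(actual) != len(expected) or actual[1:] != expected[1:] or actual[0] > expected[0]:
            problems.append(f"{name!r}: got shape {actual}, expected {expected}")
    return problems


# The layer name and shape mirror the classification model used in this notebook.
expected_shapes = {"input": [1, 3, 224, 224]}
inputs = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}
print(find_input_problems(inputs, expected_shapes))  # []
```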

**Preparation: load network**

In [12]:
ie = IECore()
classification_model_xml = "classification.xml"
net = ie.read_network(model=classification_model_xml)
exec_net = ie.load_network(network=net, device_name="CPU")
input_layer = next(iter(net.input_info))
output_layer = next(iter(net.outputs))

**Preparation: load image and convert to input shape**

To propagate an image through the network, it needs to be loaded, resized to the shape that the network expects, and converted to the layout that the network expects.

In [13]:
image_filename = "coco_hollywood.jpg"
image = cv2.imread(image_filename)
image.shape

(663, 994, 3)

The image has a shape of (663,994,3). It is 663 pixels in height, 994 pixels in width, and has 3 color channels.

We get a reference to the height and width that the network expects, and resize to that size.

In [14]:
# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = net.input_info[input_layer].tensor_desc.dims

In [15]:
resized_image = cv2.resize(src=image, dsize=(W, H))  # OpenCV resize expects the destination size as (width, height)
print(f"resized image shape: {resized_image.shape}")

resized image shape: (224, 224, 3)


Now the image has the width and height that the network expects. It is still in H,W,C format. We change it to N,C,H,W format (where N=1) by first calling `np.transpose` to change to C,H,W and then adding the N dimension by calling `np.expand_dims`. Convert the data to FP32 with the array's `.astype()` method.

In [16]:
input_data = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0).astype(np.float32)
input_data.shape

(1, 3, 224, 224)
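
To see what the reordering does to individual pixels, the same two NumPy calls can be applied to a tiny synthetic array. This sketch uses made-up data in place of `resized_image` so that single values are easy to trace.

```python
import numpy as np

# Synthetic stand-in for `resized_image`: height 2, width 3, 3 channels (H, W, C).
hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)

chw = np.transpose(hwc, (2, 0, 1))                # (C, H, W)
nchw = np.expand_dims(chw, 0).astype(np.float32)  # (1, C, H, W)

print(nchw.shape)  # (1, 3, 2, 3)

# The value of channel c at row y, column x is unchanged by the reordering:
# nchw[0, c, y, x] == hwc[y, x, c]
assert nchw[0, 2, 1, 0] == hwc[1, 0, 2]
```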

**Do inference**

Now that the input data is in the right shape, doing inference is one simple command:

In [17]:
result = exec_net.infer({input_layer: input_data})
result

{'MobilenetV3/Predictions/Softmax': array([[1.9758582e-04, 5.8728285e-05, 6.4592481e-05, ..., 4.0716528e-05,
         1.7331471e-04, 1.3031650e-04]], dtype=float32)}

`.infer()` returns a dictionary, mapping output layers to data. Since we know this network returns one output, and we stored the reference to the output layer in the `output_layer` variable, we can get the data with `result[output_layer]`.

In [18]:
output = result[output_layer]
output.shape

(1, 1001)

The output shape is (1,1001), which we saw is indeed the expected shape of the output. This output shape indicates that the network returns probabilities for 1001 classes. To transform this into meaningful information, check out the [hello world notebook](../001-hello-world).
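
A common first step is to find the indices of the highest-scoring classes. The sketch below does this with NumPy on a short synthetic vector that stands in for `output[0]`. `top_k` is a hypothetical helper, and mapping the indices to class names still requires a labels file, which is not covered here.

```python
import numpy as np


def top_k(probabilities, k=5):
    """Return (class_index, probability) pairs for the k largest entries of a 1-D array."""
    indices = np.argsort(probabilities)[::-1][:k]
    return [(int(i), float(probabilities[i])) for i in indices]


# Synthetic scores standing in for `output[0]`; a real result has 1001 entries.
scores = np.array([0.05, 0.70, 0.10, 0.15], dtype=np.float32)
best = top_k(scores, k=2)
print([index for index, _ in best])  # [1, 3]
```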

## Reshaping a network

### Change image size

Instead of reshaping the image to fit the model, you can also reshape the model to fit the image. Note that not all input shapes will work for every model, and the model accuracy may also suffer if you reshape the model inputs.

In [19]:
ie = IECore()
segmentation_model_xml = "segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))

print(f"input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(f"input layout: {segmentation_net.input_info[segmentation_input_layer].layout}")
print(f"output shape: {segmentation_net.outputs[segmentation_output_layer].shape}")

input shape: [1, 3, 512, 512]
input layout: NCHW
output shape: [1, 1, 512, 512]


The input shape for the network is [1,3,512,512], with NCHW layout: the network expects 3-channel images with a width and height of 512 and a batch size of 1. The `.reshape()` method of `IENetwork` changes the input shape: it takes a dictionary that maps input layer names to new shapes. After reshaping, load the network to the device again.

In [20]:
new_shape = (1, 3, 512, 512)

segmentation_net.reshape({segmentation_input_layer: new_shape})
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
print(f"net input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(f"exec_net input shape: {segmentation_exec_net.input_info[segmentation_input_layer].tensor_desc.dims}")

net input shape: [1, 3, 512, 512]
exec_net input shape: [1, 3, 512, 512]
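
To reshape the model to fit a particular image, the new shape can be derived from the image itself. This is a minimal sketch in plain Python; `shape_for_image` is a hypothetical helper, and whether a given model accepts the resulting shape depends on the model.

```python
def shape_for_image(image_shape, batch_size=1):
    """Convert an OpenCV-style (H, W, C) image shape to an (N, C, H, W) tuple."""
    height, width, channels = image_shape
    return (batch_size, channels, height, width)


# (663, 994, 3) is the shape of the example image loaded earlier in this notebook.
new_shape = shape_for_image((663, 994, 3))
print(new_shape)  # (1, 3, 663, 994)
```

The result can be passed to `segmentation_net.reshape({segmentation_input_layer: new_shape})` in place of the fixed tuple used above.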


### Change batch size

We can also use reshape to set the batch size, by increasing the first element of _new_shape_. For example, to set a batch size of two, set `new_shape = (2,3,512,512)` in the cell above. If you only want to change the batch size, you can also set the `batch_size` property directly. The batch size will be treated as the maximum batch size for the network: setting a batch size of 2 means that you can provide input data with a batch size of 1 or 2. 

In [21]:
ie = IECore()
segmentation_model_xml = "segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
segmentation_net.batch_size = 2
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
print(f"input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(f"input layout: {segmentation_net.input_info[segmentation_input_layer].layout}")
print(f"output shape: {segmentation_net.outputs[segmentation_output_layer].shape}")

input shape: [2, 3, 512, 512]
input layout: NCHW
output shape: [2, 1, 512, 512]


The output above confirms that after setting the batch size, the input and output shapes both have a batch size of 2. Let's see what happens if we propagate input data with a batch size of 1 through the network:

In [22]:
ie = IECore()
segmentation_model_xml = "segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
input_data = np.random.rand(*segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims)
segmentation_net.batch_size = 2
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")

result_batch = segmentation_exec_net.infer({segmentation_input_layer: input_data})
print(f"input data shape: {input_data.shape}")
print(f"output data shape: {result_batch[segmentation_output_layer].shape}")

input data shape: (1, 3, 512, 512)
output data shape: (2, 1, 512, 512)


The output of the cell above shows that if the batch size is 2, the network output will have a batch size of 2, even if only one image was propagated through the network.

Now create random input data with a batch size of 2 and propagate it through the network:

In [None]:
ie = IECore()
segmentation_model_xml = "segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
segmentation_net.batch_size = 2
input_data = np.random.rand(*segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims)
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")

result_batch = segmentation_exec_net.infer({segmentation_input_layer: input_data})
print(f"input data shape: {input_data.shape}")
print(f"output data shape: {result_batch[segmentation_output_layer].shape}")
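
With real images instead of random data, a batch is usually built by joining individually preprocessed images along the first axis. The sketch below shows this with NumPy on two small synthetic arrays. The sizes are chosen for illustration and do not match the segmentation model.

```python
import numpy as np

# Two synthetic preprocessed images, each with shape (1, C, H, W).
first = np.zeros((1, 3, 4, 4), dtype=np.float32)
second = np.ones((1, 3, 4, 4), dtype=np.float32)

# Join along the batch axis (N) to get a single (2, C, H, W) array.
batch = np.concatenate([first, second], axis=0)
print(batch.shape)  # (2, 3, 4, 4)
```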