# Working with GPUs in OpenVINO™

This tutorial provides a high-level overview of working with Intel GPUs in OpenVINO. It shows how to use Query Device to list the GPUs in a system and check their properties, and it explains some of the key properties. It also shows how to compile a model on GPU with performance hints and how to use multiple GPUs with MULTI or CUMULATIVE_THROUGHPUT.

Finally, it provides example `benchmark_app` commands that you can run to compare GPU performance in different configurations, along with code for a basic end-to-end application that compiles a model on GPU and uses it to run inference.

## Introduction


Graphics processing units (GPUs) were originally developed as specialized chips to accelerate the rendering of computer graphics. In contrast to CPUs, which have a few powerful cores, GPUs have many more specialized cores, which makes them well suited to workloads that can be split into many simpler tasks running in parallel. Deep learning is one such workload: GPUs shine brightest when training neural networks with many layers or on massive datasets such as 2D images. For inference, a GPU is not always required, but it may become necessary depending on the application, the size of the network, and the amount of data to be processed.

OpenVINO supports inference on Intel GPUs through its [GPU plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html). As we will see later, using it can be as simple as specifying the GPU device when compiling a model. To get started, make sure to follow the [instructions to install OpenVINO](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html) and keep reading!

## Checking GPUs with Query Device


In this section we will see how to list the available GPUs and check their properties. Some of the key properties will also be defined.

### List GPUs with core.available_devices


OpenVINO Runtime provides the `available_devices` property for checking which devices are available for inference. The following code prints the list of devices that OpenVINO can use, in which our Intel GPUs should appear.

In [None]:
from openvino.runtime import Core

core = Core()
core.available_devices

If your GPUs are installed correctly but still don't appear in the list, follow the steps in [Configurations for GPUs with OpenVINO](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html) to configure your GPU drivers to work with OpenVINO. Once the GPUs are working with OpenVINO, we can proceed with the next sections.
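When a system has several devices, it can be handy to keep only the GPU entries. The sketch below is a minimal example, assuming device names of the form `GPU` or `GPU.<index>` as returned by `core.available_devices`. `list_gpus` is a hypothetical helper, not part of the OpenVINO API, and the device list in the example is made up.

```python
def list_gpus(available_devices):
    """Return only the GPU entries from a list of OpenVINO device names."""
    # Assumption: GPU names look like "GPU" (single GPU) or "GPU.0", "GPU.1", ...
    return [name for name in available_devices if name == "GPU" or name.startswith("GPU.")]


# Hypothetical device list; in the notebook you would pass core.available_devices.
devices = ["CPU", "GPU.0", "GPU.1"]
print(list_gpus(devices))  # ['GPU.0', 'GPU.1']
```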

### Check properties with core.get_property

To get information about our GPUs, we can use device properties. In OpenVINO, devices have properties that describe their characteristics and configuration. Each property has a name and associated value that can be queried with the `get_property` method.

To get the value of a property, such as the device name, we can use the `get_property` method as follows:

In [None]:
core.get_property("GPU", "FULL_DEVICE_NAME")

Each device also has a special property, `SUPPORTED_PROPERTIES`, that lists all the properties available on the device. We can check the value of each one by looping through the dictionary returned by `core.get_property("GPU", "SUPPORTED_PROPERTIES")` and querying each property in turn.

In [None]:
device = "GPU"

print(f"{device} SUPPORTED_PROPERTIES:\n")
supported_properties = core.get_property(device, "SUPPORTED_PROPERTIES")
# Width of the longest property name, used to align the printed values.
indent = len(max(supported_properties, key=len))

for property_key in supported_properties:
    # Skip the meta-properties that only list other properties.
    if property_key not in ('SUPPORTED_METRICS', 'SUPPORTED_CONFIG_KEYS', 'SUPPORTED_PROPERTIES'):
        try:
            property_val = core.get_property(device, property_key)
        except TypeError:
            # Some property values have types the Python API cannot convert.
            property_val = 'UNSUPPORTED TYPE'
        print(f"{property_key:<{indent}}: {property_val}")

### Brief descriptions of key properties

Each device has several properties as seen in the last command. Some of the key properties are:

* `FULL_DEVICE_NAME` - The product name of the GPU and whether it is an integrated or discrete GPU (iGPU or dGPU).
* `OPTIMIZATION_CAPABILITIES` - The model data types (INT8, FP16, FP32, etc.) that this GPU supports.
* `GPU_EXECUTION_UNITS_COUNT` - The number of execution units available in the GPU's architecture, which is a relative measure of the GPU's processing power.
* `RANGE_FOR_STREAMS` - The minimum and maximum number of processing streams the GPU can use to execute parallel inference requests. When compiling a model in LATENCY or THROUGHPUT mode, OpenVINO automatically selects the number of streams best suited to low latency or high throughput.
* `PERFORMANCE_HINT` - A high-level way to tune the device for a specific performance metric, such as latency or throughput, without worrying about device-specific settings.
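Some of these properties are plain strings or lists that can drive decisions in your own code. The following sketch assumes that `FULL_DEVICE_NAME` contains an `(iGPU)` or `(dGPU)` marker and that `OPTIMIZATION_CAPABILITIES` is a list of strings such as `"FP16"`. The helpers `gpu_type` and `pick_precision` and the example values are hypothetical, not OpenVINO APIs.

```python
def gpu_type(full_device_name):
    """Classify a GPU from its FULL_DEVICE_NAME (assumed to contain "(iGPU)" or "(dGPU)")."""
    if "(dGPU)" in full_device_name:
        return "discrete"
    if "(iGPU)" in full_device_name:
        return "integrated"
    return "unknown"


def pick_precision(optimization_capabilities):
    """Prefer an FP16 model if the device reports FP16 support, otherwise fall back to FP32."""
    return "FP16" if "FP16" in optimization_capabilities else "FP32"


# Hypothetical values; in practice query them with
# core.get_property("GPU", "FULL_DEVICE_NAME") and core.get_property("GPU", "OPTIMIZATION_CAPABILITIES").
name = "Intel(R) Graphics (iGPU)"
caps = ["FP32", "BIN", "FP16", "INT8"]
print(gpu_type(name), pick_precision(caps))  # integrated FP16
```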

To learn more about devices and properties, see the [Query Device Properties](https://docs.openvino.ai/latest/openvino_docs_OV_UG_query_api.html) page.

## Compiling a Model on GPU


We now know how to list the GPUs in our system and check their properties, but how do we actually use one? OpenVINO provides a [GPU plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html) that allows us to easily compile models and run them on the GPU.

### Compile with default configuration

To compile our model, we first need to read it using the `read_model` method. Then, we can use the `compile_model` method and specify the name of the device we want to compile the model on (in this case, "GPU").

In [None]:
model = core.read_model(model="model/v3-small_224_1.0_float.xml")
compiled_model = core.compile_model(model, "GPU")

If you have multiple GPUs in the system, you can specify which one to use with "GPU.0", "GPU.1", etc. Any of the device names returned by `core.available_devices` is a valid device specifier. You can also use "AUTO", which automatically selects the best device for inference (often the GPU). To learn more about the AUTO plugin, visit the [Automatic Device Selection](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html) page.

### Throughput and latency performance hints

To simplify device and pipeline configuration, OpenVINO provides high-level performance hints that automatically configure settings such as the batch size and the number of parallel streams used for inference. The "LATENCY" performance hint optimizes for fast inference times, while the "THROUGHPUT" performance hint optimizes for high overall throughput (FPS).

To use the "LATENCY" performance hint, add `{"PERFORMANCE_HINT": "LATENCY"}` when compiling the model as shown below. For GPUs, this automatically minimizes the batch size and the number of parallel streams so that all of the compute resources can focus on completing a single inference as fast as possible.

In [None]:
compiled_model = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "LATENCY"})

To use the "THROUGHPUT" performance hint, add `{"PERFORMANCE_HINT": "THROUGHPUT"}` when compiling the model. For GPUs, this creates multiple processing streams to efficiently utilize all the execution cores and optimizes the batch size to fill the memory available.

In [None]:
compiled_model = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
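One way to see what a hint actually changed is to query the compiled model. The sketch below compiles the same model once per hint and reads the `OPTIMAL_NUMBER_OF_INFER_REQUESTS` property of each compiled model. `optimal_requests_per_hint` is a hypothetical helper, and the numbers you get will depend on your GPU and driver.

```python
def optimal_requests_per_hint(core, model, device="GPU", hints=("LATENCY", "THROUGHPUT")):
    """Compile `model` once per performance hint and report the recommended request count.

    `core` is an openvino.runtime.Core and `model` a model returned by core.read_model.
    """
    report = {}
    for hint in hints:
        compiled = core.compile_model(model, device, {"PERFORMANCE_HINT": hint})
        # Read-only property of the compiled model: how many infer requests OpenVINO
        # suggests running in parallel for this configuration.
        report[hint] = compiled.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
    return report


# Usage in this notebook (requires a GPU):
# print(optimal_requests_per_hint(core, model))
```

Since each hint triggers a full compilation, this takes a few seconds per hint. We would expect THROUGHPUT to report a larger number than LATENCY, though that is not guaranteed for every device.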

### Using multiple GPUs with multi-device and cumulative throughput

The latency and throughput hints mentioned above can make a real difference when used appropriately, but they usually use just one device, either because of how the [AUTO plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html#how-auto-works) selects a device or because we specified the device name manually as we did above. If we have multiple devices, such as an integrated and a discrete GPU, we can use them at the same time to improve the utilization of our resources. To do this, OpenVINO provides a virtual device called [MULTI](https://docs.openvino.ai/nightly/openvino_docs_OV_UG_Running_on_multiple_devices.html), which combines our existing devices and knows how to split inference work between them, leveraging the capabilities of each device.

So, as an example, if we want to use both our integrated and discrete GPUs and the CPU at the same time, we can compile our model as follows:

In [None]:
compiled_model = core.compile_model(model=model, device_name="MULTI:GPU.1,GPU.0,CPU")
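Rather than hard-coding the device list, you can build the MULTI string from the devices that are actually present. This is a minimal sketch with a hypothetical helper, `multi_device_name`. It only joins names behind the `MULTI:` (or `AUTO:`) prefix used in this tutorial and keeps them in the given order, which the MULTI documentation treats as the device priority.

```python
def multi_device_name(devices, virtual="MULTI"):
    """Join device names into a virtual-device string such as "MULTI:GPU.1,GPU.0,CPU"."""
    if not devices:
        raise ValueError("at least one device is required")
    # Order is preserved: earlier devices are listed first in the string.
    return f"{virtual}:{','.join(devices)}"


print(multi_device_name(["GPU.1", "GPU.0", "CPU"]))         # MULTI:GPU.1,GPU.0,CPU
print(multi_device_name(["GPU.1", "GPU.0"], virtual="AUTO"))  # AUTO:GPU.1,GPU.0
```

For example, `core.compile_model(model, multi_device_name([d for d in core.available_devices if d.startswith("GPU")]))` would compile the model across every GPU that OpenVINO reports.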

Note that MULTI always needs an explicit device list, since otherwise it does not know which devices are available for inference. However, this is not the only way to use multiple devices in OpenVINO. Another performance hint, "CUMULATIVE_THROUGHPUT", works similarly to MULTI, except that it uses the devices automatically selected by AUTO, so we don't need to specify them manually. Here is an example of "CUMULATIVE_THROUGHPUT" that plays a similar role to the MULTI example above (the exact set of devices used depends on what AUTO selects):


In [None]:
compiled_model = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"})

Important note: the “THROUGHPUT”, “MULTI”, and “CUMULATIVE_THROUGHPUT” modes only pay off in asynchronous inference pipelines, where several inference requests are in flight at the same time. The example at the end of this article shows how to set up an asynchronous pipeline that takes advantage of this parallelism to increase throughput. To learn more, see [Asynchronous Inferencing](https://docs.openvino.ai/latest/openvino_docs_ie_plugin_dg_async_infer_request.html) in OpenVINO.
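To see why asynchronous pipelines matter, here is a plain-Python analogy that does not use OpenVINO at all: a thread pool stands in for the queue of infer requests, and a short sleep stands in for inference. Several requests are kept in flight at once, and because they can finish in any order, each result is stored at its frame index rather than appended. `fake_inference` and `run_pipelined` are hypothetical names for illustration only.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference(frame_index):
    """Stand-in for one inference: sleep for a random time, then return a result."""
    time.sleep(random.uniform(0.001, 0.01))
    return f"result-{frame_index}"


def run_pipelined(num_frames, num_workers=4):
    """Run fake_inference on every frame with up to num_workers calls in flight.

    Returns the results in frame order plus the order in which frames finished.
    """
    results = [None] * num_frames
    completion_order = []
    lock = threading.Lock()

    def on_done(future, frame_index):
        # Plays the role of an AsyncInferQueue completion callback.
        results[frame_index] = future.result()
        with lock:
            completion_order.append(frame_index)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for i in range(num_frames):
            future = pool.submit(fake_inference, i)
            future.add_done_callback(lambda f, i=i: on_done(f, i))
    # Leaving the with-block waits for all work, much like infer_queue.wait_all().
    return results, completion_order


results, order = run_pipelined(8)
print(results[:3])  # ['result-0', 'result-1', 'result-2']
print(order)        # completion order, which may differ from 0..7
```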

## Performance Comparison with benchmark_app


Given all the different options available when compiling a model, it may be difficult to know which settings work best for a certain application. Thankfully, OpenVINO provides a performance benchmarking tool called `benchmark_app`.

### Running benchmark_app on GPU with different performance hints

The basic syntax of benchmark_app is as follows:

```bash
benchmark_app -m PATH_TO_MODEL -d TARGET_DEVICE -hint {throughput,cumulative_throughput,latency,none}
```
where `TARGET_DEVICE` is any device listed by `available_devices` or one of the MULTI and AUTO virtual devices we saw previously, and the value of `-hint` is one of the values in braces.
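If you want to script several runs, it helps to build the command as an argument list. This sketch uses only the `-m`, `-d`, and `-hint` options shown above; `benchmark_command` is a hypothetical helper, and the resulting list could be passed to `subprocess.run` instead of using the notebook's `!` syntax.

```python
import shlex

VALID_HINTS = {"throughput", "cumulative_throughput", "latency", "none"}


def benchmark_command(model_path, device, hint):
    """Build a benchmark_app argument list from a model path, device, and hint."""
    if hint not in VALID_HINTS:
        raise ValueError(f"hint must be one of {sorted(VALID_HINTS)}, got {hint!r}")
    return ["benchmark_app", "-m", model_path, "-d", device, "-hint", hint]


cmd = benchmark_command("model/v3-small_224_1.0_float.xml", "GPU", "latency")
print(shlex.join(cmd))
# benchmark_app -m model/v3-small_224_1.0_float.xml -d GPU -hint latency
```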

Note that `benchmark_app` only requires the model path to run, but the device and hint arguments are the ones we want to vary here. For more advanced use, the tool has other options that you can list by running `benchmark_app -h` or by reading the [docs](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html). The following example benchmarks a simple model on a GPU with the latency hint:

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d GPU -hint latency

### Performance comparisons

For completeness, here are some of the comparisons we can make by varying the device and hint. Actual performance depends on the hardware used, but we can often expect a GPU to outperform a CPU, and multiple GPUs to outperform a single GPU as long as there is enough work to keep each of them busy.
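To compare runs programmatically, you can extract the throughput figure from the tool's text output. This sketch assumes the report contains a line of the form `Throughput: <number> FPS`; the exact wording may differ between OpenVINO versions, so check it against your own output. `parse_throughput` is a hypothetical helper, and the sample string is made up.

```python
import re

# Assumed report format: "... Throughput: <number> FPS"
_THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS")


def parse_throughput(output):
    """Return the throughput in FPS found in benchmark_app output, or None if absent."""
    match = _THROUGHPUT_RE.search(output)
    return float(match.group(1)) if match else None


sample = "Throughput: 123.45 FPS"  # hypothetical line, not real benchmark output
print(parse_throughput(sample))  # 123.45
```

For example, with `cmd` being a `benchmark_app` argument list, `parse_throughput(subprocess.run(cmd, capture_output=True, text=True).stdout)` would return a number you can tabulate across devices and hints.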

#### CPU vs GPU with latency hint

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d CPU -hint latency

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d GPU -hint latency

#### CPU vs GPU with throughput hint

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d CPU -hint throughput

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d GPU -hint throughput

#### Single GPU vs Multiple GPUs

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d GPU.1 -hint throughput

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d AUTO:GPU.1,GPU.0 -hint cumulative_throughput

In [None]:
!benchmark_app -m model/v3-small_224_1.0_float.xml -d MULTI:GPU.1,GPU.0 -hint throughput

## Basic Application Using GPUs


We will now show an end-to-end object detection example running on GPU with the "THROUGHPUT" hint.

1. Import the necessary packages
2. Download the ssdlite_mobilenet_v2 model and convert it to OpenVINO IR
3. Read the model and compile it on GPU in THROUGHPUT mode
4. Load every frame of a video and resize it to the shape expected by the model
5. Set up an AsyncInferQueue, run inference on every frame, store the results, and time how long it takes
6. Post-process the results to draw every detected object on the video

### Import Necessary Packages

In [None]:
import os
import time

import cv2
import numpy as np
from IPython.display import Video
from openvino.runtime import AsyncInferQueue, Core, InferRequest

core = Core()
core.available_devices

### Download the Model

In [None]:
# A directory where the model will be downloaded.
base_model_dir = "model"

# The name of the model from Open Model Zoo
model_name = "ssdlite_mobilenet_v2"

model_path = f"model/public/{model_name}"

if not os.path.exists(model_path):
    download_command = f"omz_downloader " \
                       f"--name {model_name} " \
                       f"--output_dir {base_model_dir} " \
                       f"--cache_dir {base_model_dir}"
    ! $download_command

### Convert the Model to OpenVINO IR format

In [None]:
precision = "FP16"

# The output path for the conversion.
converted_model_path = f"model/public/{model_name}/{precision}/{model_name}.xml"

if not os.path.exists(converted_model_path):
    convert_command = f"omz_converter " \
                      f"--name {model_name} " \
                      f"--download_dir {base_model_dir} " \
                      f"--precisions {precision}"
    ! $convert_command

### Compile the Model

In [None]:
model = core.read_model(model=converted_model_path)
compiled_model = core.compile_model(model=model, device_name="GPU", config={"PERFORMANCE_HINT": "THROUGHPUT"})

# Get the input and output nodes.
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

input_keys = list(compiled_model.inputs)
print(input_keys)

# Get the input size.
num, height, width, channels = input_layer.shape
print('Model input shape:', num, height, width, channels)

### Load and Preprocess Video Frames 

In [None]:
# Load video
video_file = "../data/video/Coco Walking in Berkeley.mp4"
video = cv2.VideoCapture(video_file)
framebuf = []

print('Loading video...')
while video.isOpened():
    ret, frame = video.read()
    if not ret:
        print('Video loaded!')
        video.release()
        break
    
    # Preprocess frames - convert them to shape expected by model
    input_frame = cv2.resize(src=frame, dsize=(width, height), interpolation=cv2.INTER_AREA)
    input_frame = np.expand_dims(input_frame, axis=0)

    # Append frame to framebuffer
    framebuf.append(input_frame)
    

print('Frame shape: ', framebuf[0].shape)
print('Number of frames: ', len(framebuf))

# Show original video file
Video(video_file)
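The cell above only adds a batch dimension because this model's input is NHWC (`num, height, width, channels`). Many OpenVINO IR models expect NCHW instead. The sketch below is a hypothetical helper, `to_model_layout`, that uses a simple heuristic (treat dimension 1 equal to 3 as the channel axis) to decide whether to transpose; it is an illustration, not a general layout detector.

```python
import numpy as np


def to_model_layout(frame_hwc, input_shape):
    """Turn an HWC frame into a batch of one that matches a 4D model input shape.

    Heuristic assumption: if dimension 1 of the shape is 3, the model is NCHW;
    otherwise the shape is treated as NHWC.
    """
    batch = np.expand_dims(frame_hwc, axis=0)  # (1, H, W, C)
    if len(input_shape) == 4 and input_shape[1] == 3:
        batch = np.transpose(batch, (0, 3, 1, 2))  # (1, C, H, W)
    return batch


frame = np.zeros((300, 300, 3), dtype=np.uint8)
print(to_model_layout(frame, (1, 300, 300, 3)).shape)  # (1, 300, 300, 3)
print(to_model_layout(frame, (1, 3, 300, 300)).shape)  # (1, 3, 300, 300)
```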

### Define Model Output Classes

In [None]:
# COCO classes! (the dataset, not the dog)
classes = [
    "background", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "street sign", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
    "bear", "zebra", "giraffe", "hat", "backpack", "umbrella", "shoe", "eye glasses",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle",
    "plate", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
    "couch", "potted plant", "bed", "mirror", "dining table", "window", "desk", "toilet",
    "door", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
    "toaster", "sink", "refrigerator", "blender", "book", "clock", "vase", "scissors",
    "teddy bear", "hair drier", "toothbrush", "hair brush"
]

### Set up Asynchronous Pipeline

#### Callback Definition

In [None]:
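def completion_callback(infer_request: InferRequest, userdata) -> None:
    # userdata is the (frame_index, results) tuple passed to start_async.
    frame_index, results = userdata
    predictions = next(iter(infer_request.results.values()))
    # Keep the first 10 detections for this frame (output shape is [1, 1, N, 7]).
    # Copy them, because the request may reuse its output buffer for another frame.
    results[frame_index] = predictions[0, 0, :10].copy()

#### Create Async Pipeline

In [None]:
# Create async queue with optimal number of infer requests
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(completion_callback)

### Perform Inference

In [None]:
# Callbacks can fire out of order, so store each result at its frame index.
results = [None] * len(framebuf)
start_time = time.time()
for i, input_frame in enumerate(framebuf):
    infer_queue.start_async({0: input_frame}, (i, results))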

infer_queue.wait_all()
stop_time = time.time()

total_time = stop_time - start_time
time_per_frame = total_time / len(framebuf)
fps = len(framebuf) / total_time
print(f'Total time to inference all frames: {total_time:.3f}s')
print(f'Time per frame: {time_per_frame:.6f}s ({fps:.3f} FPS)')

### Process Results

In [None]:
# Set minimum detection threshold
min_thresh = .6

# Load video
video = cv2.VideoCapture(video_file)

# Get video parameters
frame_width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(video.get(cv2.CAP_PROP_FPS))
fourcc = int(video.get(cv2.CAP_PROP_FOURCC))

output = cv2.VideoWriter('output.mp4', fourcc, fps, (frame_width, frame_height))

while video.isOpened():
    current_frame = int(video.get(cv2.CAP_PROP_POS_FRAMES))
    ret, frame = video.read()
    if not ret:
        print('Video processed!')
        output.release()
        video.release()
        break

    # prediction contains [image_id, label, conf, x_min, y_min, x_max, y_max] according to model
    for prediction in np.squeeze(results[current_frame]):
        if prediction[2] > min_thresh:
            x_min = int(prediction[3] * frame_width)
            y_min = int(prediction[4] * frame_height)
            x_max = int(prediction[5] * frame_width)
            y_max = int(prediction[6] * frame_height)
            label = classes[int(prediction[1])]
            
            # Draw a bounding box with its label above it
            image = cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0,255,0), 2)
            cv2.putText(image, label, (x_min, y_min - 10), cv2.FONT_ITALIC, 1, (255,0,0), 2)

    output.write(frame)

# Show output
Video("output.mp4")
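The decoding inside the loop above can be pulled into a small, testable function. This sketch assumes the same `[image_id, label, conf, x_min, y_min, x_max, y_max]` rows with coordinates normalized to [0, 1]. `decode_detections` is a hypothetical helper, and the clipping step is an addition that the original loop does not perform.

```python
import numpy as np


def decode_detections(detections, frame_width, frame_height, min_conf=0.6):
    """Convert detection rows into pixel boxes above a confidence threshold.

    Returns a list of (label_id, conf, x_min, y_min, x_max, y_max) tuples.
    """
    rows = np.asarray(detections, dtype=np.float32).reshape(-1, 7)
    boxes = []
    for _, label, conf, x0, y0, x1, y1 in rows:
        if conf <= min_conf:
            continue
        # Clip to [0, 1] so slightly out-of-range predictions stay inside the frame.
        x0, x1 = np.clip([x0, x1], 0.0, 1.0) * frame_width
        y0, y1 = np.clip([y0, y1], 0.0, 1.0) * frame_height
        boxes.append((int(label), float(conf), int(x0), int(y0), int(x1), int(y1)))
    return boxes


dets = [[0, 1, 0.9, 0.1, 0.2, 0.5, 0.6],  # kept: confidence above 0.6
        [0, 3, 0.3, 0.0, 0.0, 1.0, 1.0]]  # dropped: confidence too low
print(decode_detections(dets, frame_width=100, frame_height=200))
```

Inside the drawing loop, `classes[label_id]` would then give the text for `cv2.putText`.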

## Conclusion


In this tutorial we saw how easy it is to use one or more GPUs in OpenVINO, check their properties, and even tailor our model performance through the different performance hints. We also went through a basic object detection application that used a GPU and displayed the detected bounding boxes.

To read more about any of these topics, feel free to visit their corresponding documentation:
* [GPU plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html)
* [AUTO plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html)
* [MULTI device mode](https://docs.openvino.ai/nightly/openvino_docs_OV_UG_Running_on_multiple_devices.html)
* [Query Device Properties](https://docs.openvino.ai/latest/openvino_docs_OV_UG_query_api.html)
* [Configurations for GPUs with OpenVINO](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html)
* [Benchmark python tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html)
* [Asynchronous Inferencing](https://docs.openvino.ai/latest/openvino_docs_ie_plugin_dg_async_infer_request.html)