# Working with GPUs in OpenVINO™

This tutorial provides a high-level overview of working with Intel GPUs in OpenVINO. It shows how to use Query Device to list the system's GPUs and check their properties, and it explains some of the key properties. It also shows how to compile a model on GPU with performance hints and how to run on multiple GPUs with MULTI or CUMULATIVE_THROUGHPUT.

The tutorial shows example commands for benchmark_app that users can run to compare GPU performance in different configurations. It also provides code for a basic end-to-end application that compiles a model on GPU and uses it to run inference.

## Introduction

1. Background and context on how GPUs are used to speed up inference
2. Introduce OpenVINO’s ability to run inference with GPUs
3. How to configure OpenVINO to work with GPUs (link to Configuration for GPU with OpenVINO page)

## Checking GPUs with Query Device

1. List GPUs with core.available_devices
2. Check properties with core.get_property
3. Brief descriptions of key properties

In this section we will see how to list the available GPUs and check their properties. Some of the key properties will also be defined.

### List GPUs with core.available_devices


Firstly, in order to use GPUs, we must make sure our system is detecting them correctly.
Running the following cell should output a list of compatible OpenVINO devices, in which our Intel GPUs should appear.

In [None]:
from openvino.runtime import Core

core = Core()
core.available_devices

If the GPUs are installed correctly in the system and still don't appear in the list, we should follow the steps described [here](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html) and try again. Once we have the GPUs working with OpenVINO we can proceed with the next sections.
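As a minimal sketch of narrowing the device list down to GPUs, the helper below keeps only the GPU entries. The name `select_gpus` is hypothetical (it is not part of OpenVINO), and the sketch assumes GPU entries are reported as `GPU` or `GPU.<index>`; the hard-coded list only stands in for `core.available_devices`.

```python
def select_gpus(available_devices):
    """Return only the GPU entries (e.g. "GPU", "GPU.0", "GPU.1") from a device list."""
    return [name for name in available_devices if name == "GPU" or name.startswith("GPU.")]


# Hard-coded list for illustration; in the notebook, pass core.available_devices instead.
example_devices = ["CPU", "GPU.0", "GPU.1"]
gpus = select_gpus(example_devices)
print(gpus if gpus else "No GPU detected; check the GPU configuration guide.")
```

An empty result is a hint that the configuration steps linked above still need to be completed.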

### Check properties with core.get_property

Now, to get information and customize the behavior of our GPUs, we can use device properties. Devices in OpenVINO, such as CPUs and GPUs, have two types of properties: read-only and read-write. The former mainly shows information about the hardware itself like the device name or supported data types, while the latter allows us to tweak how the model is compiled, for instance to reduce latency or increase throughput.

To get the value of a property, such as the device name, we can use the `core.get_property` method as follows:

In [None]:
core.get_property("GPU", "FULL_DEVICE_NAME")

Each device also has a specific property, called `SUPPORTED_PROPERTIES`, that lists all the properties available on the device (including `SUPPORTED_PROPERTIES` itself). To see it, we repeat the above command with `SUPPORTED_PROPERTIES` as the property name:

In [None]:
core.get_property("GPU", "SUPPORTED_PROPERTIES")

Note that the value for each property is either "RO" or "RW", which corresponds to the two types mentioned previously, "**R**ead-**O**nly" and "**R**ead-**W**rite" respectively.
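A short sketch of separating the two kinds of properties follows. It assumes, as described above, that `SUPPORTED_PROPERTIES` comes back as a mapping from property name to `"RO"` or `"RW"`; other OpenVINO versions may return a different structure. The helper name `split_by_access` and the sample mapping are hypothetical and hand-written for illustration.

```python
def split_by_access(supported_properties):
    """Split a {property_name: "RO" | "RW"} mapping into two sorted name lists."""
    read_only = sorted(name for name, access in supported_properties.items() if access == "RO")
    read_write = sorted(name for name, access in supported_properties.items() if access == "RW")
    return read_only, read_write


# Hand-written sample for illustration only; a real device reports many more entries.
sample = {"FULL_DEVICE_NAME": "RO", "PERFORMANCE_HINT": "RW", "SUPPORTED_PROPERTIES": "RO"}
read_only, read_write = split_by_access(sample)
print("Read-only: ", read_only)
print("Read-write:", read_write)
```

In the notebook, the mapping returned by `core.get_property` for `SUPPORTED_PROPERTIES` could be passed in place of `sample`.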

### Brief descriptions of key properties

Each device has several properties, as seen in the last command. Some of the key properties are:

* Read-Only
    * `FULL_DEVICE_NAME` - The product name of the GPU and whether it is an integrated or discrete GPU (iGPU or dGPU).
    * `OPTIMIZATION_CAPABILITIES` - The model data types (INT8, FP16, FP32, etc) that are supported by this GPU.
    * `GPU_EXECUTION_UNITS_COUNT` - The execution cores available in the GPU's architecture, which is a relative measure of the GPU's processing power.
    * `RANGE_FOR_STREAMS` - The number of processing streams available on the GPU that can be used to execute parallel inference requests. When compiling a model in LATENCY or THROUGHPUT mode, OpenVINO will automatically select the best number of streams for low latency or high throughput.
    * `DEVICE_GOPS` - The giga operations per second (GOPS) count for each precision the device supports.
* Read-Write
    * `PERFORMANCE_HINT` - A high-level way to tune the device for a specific performance metric, such as latency or throughput, without worrying about device-specific settings.
    * `INFERENCE_PRECISION_HINT` - A high-level way to specify which model data type to use for inference.
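
The key properties listed above can be queried in one go, as in the sketch below. The helper `query_key_properties` and the constant `KEY_PROPERTIES` are hypothetical names; the sketch assumes a `core` object like the one created earlier and that the value of `SUPPORTED_PROPERTIES` supports the `in` operator, so properties a given device does not report are simply skipped.

```python
# Property names taken from the list above.
KEY_PROPERTIES = [
    "FULL_DEVICE_NAME",
    "OPTIMIZATION_CAPABILITIES",
    "GPU_EXECUTION_UNITS_COUNT",
    "RANGE_FOR_STREAMS",
    "DEVICE_GOPS",
    "PERFORMANCE_HINT",
    "INFERENCE_PRECISION_HINT",
]


def query_key_properties(core, device="GPU", names=KEY_PROPERTIES):
    """Return {name: value} for the requested properties that the device reports as supported."""
    supported = core.get_property(device, "SUPPORTED_PROPERTIES")
    return {name: core.get_property(device, name) for name in names if name in supported}
```

With the `core` object from the first cell, `query_key_properties(core)` would return a dictionary that can be printed or inspected directly.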

To sum up this section, we can check the value for each property by simply looping through the dictionary returned by `core.get_property("GPU", "SUPPORTED_PROPERTIES")` and then querying for that property.

In [None]:
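# Query the GPU, matching the device named in the text above
supported_properties = core.get_property("GPU", "SUPPORTED_PROPERTIES")
# Right-align property names to the width of the longest one
indent = len(max(supported_properties, key=len))
for prop in supported_properties:
    print(f"{prop:>{indent}}:", core.get_property("GPU", prop))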

## Compiling a Model on GPU

1. Compile with default configuration (core.compile_model(model, "GPU"))
2. Throughput and latency performance hints
3. Using multiple GPUs with multi-device and cumulative throughput

We now know what a GPU is, how to check whether we have one, and how to inspect its properties. But how do we actually **use** one?

### Compile with default configuration (core.compile_model(model, "GPU"))

In fact, due to the [AUTO plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html#how-auto-works), we might already be using GPUs if they are properly recognized by OpenVINO. Even so, if we want to use a specific device, we can do so by compiling our model as follows:

In [None]:
# load any model
model = core.read_model(model="../001-hello-world/model/v3-small_224_1.0_float.xml")

# compile the loaded model using the GPU device
compiled_model = core.compile_model(model, "GPU")

Note that above we are using `"GPU"` which is an alias for `"GPU.0"` according to the [docs](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html). Actually, as expected, any of the values returned by `core.available_devices` are valid device specifiers.
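
Since any entry of `core.available_devices` is a valid device specifier, a device can also be chosen programmatically. The sketch below is one way to do that; `pick_device` is a hypothetical helper, and the preference order and the CPU fallback are arbitrary illustrative choices rather than OpenVINO behavior.

```python
def pick_device(available_devices, preferred=("GPU.1", "GPU.0", "GPU"), fallback="CPU"):
    """Return the first preferred device that is present, otherwise the fallback."""
    for name in preferred:
        if name in available_devices:
            return name
    return fallback


# Hard-coded lists for illustration; in the notebook, pass core.available_devices.
print(pick_device(["CPU", "GPU.0", "GPU.1"]))
print(pick_device(["CPU"]))
```

The returned string can then be passed as the device argument of `core.compile_model`.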

Once we have a compiled model, we can check for its properties, just as we did with the GPU devices in the previous section. Instead of calling `core.get_property` with a specific device, we use `compiled_model.get_property` directly with the property name.

In [None]:
compiled_model.get_property("SUPPORTED_PROPERTIES")

As the above result shows, the `PERFORMANCE_HINT` property defined earlier reappears here. Let's take a deeper look at it.

### Throughput and latency performance hints

Essentially, `PERFORMANCE_HINT` lets us tune either device or model properties for a particular workload. Currently, it supports three values: `"LATENCY"`, which prioritizes short response times for each inference; `"THROUGHPUT"`, which helps when inferring large amounts of data at once, such as a video feed; and `"CUMULATIVE_THROUGHPUT"`, which we will see later. The hint behaves the same way on the device and on the model, except that on the device it acts as a "global" property, i.e. all models compiled on that device use the same hint value, whereas on a compiled model it acts as a "local" one, i.e. it only affects that model. See [the docs](https://docs.openvino.ai/latest/openvino_docs_OV_UG_query_api.html#setting-properties-globally) for more info.

Until now we have only queried properties of the device or the compiled model. So how can we modify those that are read-write, such as `PERFORMANCE_HINT`?

To modify properties on the device, we can use the `core.set_property` method, which takes a dictionary mapping each property name to its new value instead of just the property name. For example, to change `PERFORMANCE_HINT` to improve latency, we can do the following:

In [None]:
core.set_property("GPU", {"PERFORMANCE_HINT": "LATENCY"})
core.get_property("GPU", "PERFORMANCE_HINT")

If instead we only want to affect a certain model, we pass the dictionary as the third argument of the `core.compile_model` method. For instance, the following improves throughput for just this model:

In [None]:
compiled_model = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
compiled_model.get_property("PERFORMANCE_HINT")
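
Because the hint is passed as a plain string, a typo in its value is easy to make. The sketch below builds the configuration dictionary and checks the value against the three hints named in this section before it reaches OpenVINO. `hint_config` and `VALID_HINTS` are hypothetical names, not part of the OpenVINO API.

```python
# The three hint values described in this section.
VALID_HINTS = ("LATENCY", "THROUGHPUT", "CUMULATIVE_THROUGHPUT")


def hint_config(hint):
    """Return a {"PERFORMANCE_HINT": value} dictionary after validating the value."""
    hint = hint.upper()
    if hint not in VALID_HINTS:
        raise ValueError(f"Unknown performance hint {hint!r}; expected one of {VALID_HINTS}")
    return {"PERFORMANCE_HINT": hint}


print(hint_config("latency"))
```

The result can be used wherever the tutorial passes a dictionary, for example as the third argument of `core.compile_model` or the second argument of `core.set_property`.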

### Using multiple GPUs with multi-device and cumulative throughput

The latency and throughput hints mentioned above can make a real difference when used appropriately, but they usually use just one device, either because of the [AUTO plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html#how-auto-works) or because we specified the device name manually, as we did above. If we have multiple devices, such as an integrated and a discrete GPU, we can use both at the same time to improve the utilization of our resources. To do this, OpenVINO provides a virtual device called [MULTI](https://docs.openvino.ai/nightly/openvino_docs_OV_UG_Running_on_multiple_devices.html), which is a combination of our existing devices that knows how to split inference work between them, leveraging the capabilities of each device.

So, as an example, if we want to use both our integrated and discrete GPUs and the CPU at the same time, we can compile our model as follows:

In [None]:
compiled_model = core.compile_model(model=model, device_name="MULTI:GPU.1,GPU.0,CPU")

Note that we always need to explicitly specify the device list for MULTI to work, as otherwise MULTI does not know which devices are available for inference. However, this is not the only way to use multiple devices in OpenVINO. There is another `PERFORMANCE_HINT` value, `CUMULATIVE_THROUGHPUT`, that works similarly to MULTI, except that it uses the devices automatically selected by AUTO. This way, we don't need to manually specify which devices to use. Here's an example showing how to use `CUMULATIVE_THROUGHPUT`, equivalent to the MULTI one:


In [None]:
compiled_model = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"})
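
The MULTI device string used earlier in this section is just the prefix `MULTI:` followed by a comma-separated device list, so it can be assembled from a list of device names. The sketch below shows one way to do that; `multi_device_name` is a hypothetical helper, not an OpenVINO function.

```python
def multi_device_name(devices):
    """Build a MULTI device specifier such as "MULTI:GPU.1,GPU.0,CPU" from device names."""
    devices = list(devices)
    if not devices:
        # MULTI needs an explicit device list, as noted above.
        raise ValueError("MULTI requires at least one device")
    return "MULTI:" + ",".join(devices)


print(multi_device_name(["GPU.1", "GPU.0", "CPU"]))
```

The returned string can be passed as the `device_name` argument of `core.compile_model`.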

## Performance Comparison with benchmark_app

1. Commands showing users how to run benchmark_app on GPU with various performance hints
2. Show performance results with a basic model (person-detection-0303, perhaps)

For further details check https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html#benchmark-python-tool

### Commands showing users how to run benchmark_app on GPU with various performance hints

In [None]:
!benchmark_app -m notebooks/001-hello-world/model/v3-small_224_1.0_float.xml -hint latency -d GPU

In [None]:
!benchmark_app -m notebooks/001-hello-world/model/v3-small_224_1.0_float.xml -hint latency

In [None]:
!benchmark_app -m notebooks/001-hello-world/model/v3-small_224_1.0_float.xml -hint throughput

In [None]:
!benchmark_app -m notebooks/001-hello-world/model/v3-small_224_1.0_float.xml -hint cumulative_throughput
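
To compare configurations systematically, the command lines above can also be generated from Python. The sketch below only builds the command strings, using the `-m`, `-hint` and `-d` options that appear in the cells above; `benchmark_command` is a hypothetical helper, and actually running the commands is left to the reader.

```python
import shlex


def benchmark_command(model_path, hint, device=None):
    """Build a benchmark_app command line for the given model, hint and optional device."""
    args = ["benchmark_app", "-m", model_path, "-hint", hint]
    if device is not None:
        args += ["-d", device]
    return shlex.join(args)


MODEL = "notebooks/001-hello-world/model/v3-small_224_1.0_float.xml"
for hint in ("latency", "throughput"):
    print(benchmark_command(MODEL, hint, device="GPU"))
```

Each printed line can be pasted into a notebook cell with a leading `!`, like the commands above.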

### Show performance results with a basic model (person-detection-0303, perhaps)

## Basic Application Using GPUs

1. Provide end-to-end sample code for running inference on GPU in a basic application
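
As a starting point for this section, the sketch below strings together the calls used earlier in the tutorial: read a model, compile it for a device, run one inference and return the first output. `infer_on_device` is a hypothetical helper. It assumes `core` is the `Core` object created in the first cell, that `input_tensor` is an array matching the model's input shape, and that calling the compiled model with a list of inputs returns results that can be indexed by `compiled_model.output(0)`.

```python
def infer_on_device(core, model_path, input_tensor, device="GPU"):
    """Read a model, compile it for `device`, run one inference and return the first output."""
    model = core.read_model(model=model_path)
    compiled_model = core.compile_model(model, device)
    # Calling the compiled model runs a single synchronous inference.
    results = compiled_model([input_tensor])
    return results[compiled_model.output(0)]
```

Passing a different `device` value, such as one of the multi-device specifiers from the previous section, reuses the same function without further changes.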

## Conclusion

1. GPUs are easy to use with OpenVINO and considerably boost performance
2. Links to OpenVINO documentation where readers can learn more