# Welcome to OpenVINO(TM) Hands-on Lab Session 3 - Advanced Features

In this tutorial, we will look at some of the advanced features of the Intel(R) Distribution of OpenVINO(TM) that help you improve performance: using multiple target devices at the same time, creating custom layers, and manually configuring which hardware device executes each layer.

Jupyter Notebook is a browser-based IDE that lets you run code one cell at a time and add text as Markdown cells.

Please follow these notations for the instructions in the following sections:

- If the tutorial wants you to run a command in a terminal, you will see it formatted as below. Open a new terminal or continue in the one already open, as instructed.

**Commands To Run on Terminal**
***
```bash
python3 
```
***

On Ubuntu, the easiest way to paste copied text into a terminal with the keyboard is to press:

**SHIFT + INS**

- If you see text that looks like code, as below, it is usually the output of an example run. Please don't copy it back into the terminal.

**Example Terminal Output** 
***
``` output ```
***

- If you see Python code inside a cell like the one below, click the cell to focus it, then press:

**SHIFT + ENTER** 

or click the `>| Run` button above.

The example below is a code cell, which you can run inside this browser session; the output appears directly below the cell. You don't need to copy the code to a terminal or any other IDE.

In [None]:
import numpy as np

arr1 = np.zeros(5)
print('Array 1 ', arr1)

arr2 = np.ones(5)
print('Array 2', arr2)

# Array Product 
print(arr1 * arr2)

# Agenda

## Part 1 - Inference Engine - Performance Counters
This part introduces the Inference Engine performance counters, which report execution metrics for each layer.


## Part 2 - Inference Engine - Heterogeneous Plugin
In this part, we run Inference Engine Heterogeneous Plugin code samples inside Jupyter Notebook to see how the plugin distributes work across heterogeneous hardware.


## Part 3 - Inference Engine - Layer Affinity
This part follows up on the Heterogeneous Plugin and shows how to manually assign deep learning layers to hardware devices.

## Part 4 - 8-bit Integer Inference


## Part 5 - Custom Layers


## Part 6 - Calibration Tools

# Part 5: Intel Distribution of OpenVINO: Heterogeneous Plugin

In this session, we take a closer look at the OpenVINO Inference Engine's Heterogeneous API, which helps you run inference on heterogeneous platforms. If you are running on a platform with an Intel CPU and GPU, you can control which layers of a deep learning model run on which device.

The Heterogeneous API was first developed around a fallback principle: layers are executed on devices in a prioritized order. Not every layer is implemented on every platform, and some layers run much faster on one device than on another. Beyond fallback, the Heterogeneous API is also useful for analyzing performance bottlenecks and improving the inference process.

In this section, we showcase its use in an object detection scenario and analyze in detail how layers execute on each device, as a quick course on the Heterogeneous Plugin of Intel OpenVINO.

Let's start by importing the required libraries for this session in the following cell.

## 1

Implement Helper Methods and Load Libraries

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #
# Let's Import Required Libraries first
import sys
import os
import time
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 

# Import OpenVINO
from openvino.inference_engine import IENetwork, IEPlugin

# Define Methods

def createNetwork(model_xml, model_bin, plugin):
    # Importing network weights from IR models.

    net = IENetwork(model=model_xml, weights=model_bin)
        
    return net

def loadNetwork(plugin, net, num_requests=2):
    # Loading IR model to the plugin.
    exec_net = plugin.load(network=net, num_requests=num_requests)
    
    # Getting the input and outputs of the network
    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))
    
    return exec_net,input_blob,out_blob

def preprocessImage(img_path, net, input_blob):
    # Reading the frame from a jpeg file
    frame = cv.imread(img_path)
    
    # Reshaping data
    n, c, h, w = net.inputs[input_blob].shape
    in_frame = cv.resize(frame, (w, h))
    in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
    
    return in_frame.reshape((n, c, h, w)),frame

print('SUCCESS')

In this part, we implement a new inference method that works with the Heterogeneous Plugin.

We add a few more steps to the inference process:

- First, we check whether CPU is in the list of devices; if it is, we load the CPU extensions library.

- Then, we load the network as usual.

- The `set_config` function configures the plugin's fallback devices and their order.

The Heterogeneous Plugin is selected with the `HETERO:` prefix, followed by a comma-separated list of devices in priority order.

- After the network is created, we pass it to the plugin, which sets the affinity of each layer according to the priority list we gave it. A layer's affinity is the target device that layer will execute on.

- Additionally, performance counters and other outputs let us analyze the network later. `.set_config({"HETERO_DUMP_GRAPH_DOT": "YES"})` writes `.dot` output containing a detailed graph representation of the model.

- Finally, the `get_perf_counts` method returns the detailed execution times of layers on the devices.
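
The device-string handling described in the steps above can be sketched in plain Python. This is only an illustration: `parse_device_string` and `needs_cpu_extensions` are hypothetical helper names, not part of the OpenVINO API, and the sketch assumes the device string is either a single device name or `HETERO:` followed by a comma-separated list.

```python
def parse_device_string(device):
    """Split a device string such as 'HETERO:GPU,CPU' into an ordered target list.

    A plain device name such as 'CPU' yields a single-element list.
    """
    # Everything after the optional 'HETERO:' prefix is the priority list.
    _, _, targets = device.rpartition(':')
    return [t.strip() for t in targets.split(',') if t.strip()]


def needs_cpu_extensions(device):
    """Return True when CPU is one of the targets in the device string."""
    return 'CPU' in parse_device_string(device)


for device in ('HETERO:GPU,CPU', 'HETERO:MYRIAD,GPU', 'CPU'):
    print(device, '->', parse_device_string(device), needs_cpu_extensions(device))
```

Using `rpartition` keeps the helper working for a plain `'CPU'` string as well, which has no `HETERO:` prefix to split on.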

Let's run the cell under step 2 below to make the `runInference` method ready.

## 2

Implement `runInference` Method which uses Hetero Plugin by Default

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

# For labeling the image after inference.
from out_process import placeBoxes

# At this stage we implement our inference method to run with Heterogeneous plugin. 

def runInference(hetero_device = 'HETERO:GPU,CPU',
                 model_xml='/home/intel/openvino_models/ir/mobilenet-ssd/FP32/mobilenet-ssd.xml',
                 model_bin='/home/intel/openvino_models/ir/mobilenet-ssd/FP32/mobilenet-ssd.bin',
                 image_file='images/car.png',
                 performance_counters = False,
                 dot_graph = True,
                 confidence_threshold=0.6):

    # Plugin initialization for the specified device string, e.g. 'HETERO:GPU,CPU' or 'CPU'.
    plugin = IEPlugin(device=hetero_device, plugin_dirs='/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64')
    
    # The device string is either a single device ('CPU') or the HETERO prefix
    # followed by a prioritized device list ('HETERO:GPU,CPU').
    targets = hetero_device.split(':')[-1].split(',')
    
    # Load the additional extension libraries for the CPU only when CPU is a target
    if 'CPU' in targets:
        extension_list=['/home/intel/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so']
        for extension in extension_list:
            plugin.add_cpu_extension(extension)
 
    net = createNetwork(model_xml, model_bin, plugin)
    
    # For a heterogeneous target, set the fallback device priorities first, then
    # let the Inference Engine assign each layer to a device from that list.
    if hetero_device.startswith('HETERO:'):
        plugin.set_config({"TARGET_FALLBACK": hetero_device})
        plugin.set_initial_affinity(net)
    
    if dot_graph:
        # Optional, for demonstration only: dump the layer-to-device graph in .dot format
        plugin.set_config({"HETERO_DUMP_GRAPH_DOT": "YES"})
    
    exec_net, input_blob, out_blob = loadNetwork(plugin, net)
    
    in_frame,original_frame = preprocessImage(image_file, net, input_blob)

    my_request_id = 0

    # Starting the inference in async mode, which starts the inference in parallel
    inference_start = time.time()
    exec_net.start_async(request_id=my_request_id, inputs={input_blob: in_frame})
    # ... You can do additional processing or latency masking while we wait ...

    # Blocking wait for a particular request_id
    if exec_net.requests[my_request_id].wait(-1) == 0:
        # getting the result of the network
        res = exec_net.requests[my_request_id].outputs[out_blob]
        inference_end = time.time()
        # Process the output and add labels to the image. The implementation is not shown in
        # this notebook; you can find it in object_detection_demo_ssd_async.py
        initial_w = original_frame.shape[1]
        initial_h = original_frame.shape[0]

        frame = placeBoxes(res, None, confidence_threshold, original_frame, initial_w, initial_h, False, my_request_id, ((inference_end - inference_start)))
        # We use pyplot because it plays nicer with Jupyter Notebooks
        fig = plt.figure(dpi=300)
        ax = fig.add_subplot(111)
        ax.imshow(cv.cvtColor(frame, cv.COLOR_BGR2RGB), interpolation='none')
        plt.axis("off")
        plt.show()
    else:
        print("There was an error with the request")
    
    if performance_counters:
        perf_counts = exec_net.requests[0].get_perf_counts()
        print("Performance counters:")
        for layer, stats in perf_counts.items():
            print(layer, ': ', stats)
    
    return (plugin, net, exec_net)

print('SUCCESS')

In the next cell, we use the Heterogeneous Plugin to run the object detection sample.

With `HETERO:GPU,CPU,MYRIAD` we tell the plugin to prioritize `GPU > CPU > MYRIAD` when assigning layers.

For the MobileNet-SSD example, almost all layers can run on the GPU except the Caffe `PriorBox` layer. Therefore, the plugin uses the GPU for all the `Convolution` layers and the CPU for the `PriorBox` layer.

If we had used `HETERO:GPU,MYRIAD,CPU`, the `PriorBox` layer would run on MYRIAD, since MYRIAD also supports it.

Note that we use the FP16 model here because, when a layer falls back to the CPU target, the Inference Engine automatically converts FP16 layers to FP32.
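
To make the priority rule concrete, here is a minimal sketch of fallback assignment in plain Python. It is not the Inference Engine's implementation: `assign_by_priority` is a hypothetical helper, and the `SUPPORTED` table is a toy assumption built only from the statements above (GPU lacks `PriorBox`; CPU and MYRIAD support it).

```python
# Toy support table based only on the text above. Real device support depends
# on the OpenVINO release and the plugins installed.
SUPPORTED = {
    'GPU': {'Convolution'},
    'CPU': {'Convolution', 'PriorBox'},
    'MYRIAD': {'Convolution', 'PriorBox'},
}


def assign_by_priority(layer_types, priority, supported):
    """Map each layer type to the first device in `priority` that supports it."""
    assignment = {}
    for layer_type in layer_types:
        for device in priority:
            if layer_type in supported.get(device, set()):
                assignment[layer_type] = device
                break
        else:
            # No device in the priority list can run this layer type.
            assignment[layer_type] = None
    return assignment


layer_types = ['Convolution', 'PriorBox']
print(assign_by_priority(layer_types, ['GPU', 'CPU', 'MYRIAD'], SUPPORTED))
print(assign_by_priority(layer_types, ['GPU', 'MYRIAD', 'CPU'], SUPPORTED))
```

With this toy table, `Convolution` lands on GPU in both orders, while `PriorBox` lands on CPU for the first order and on MYRIAD for the second.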

## 3

Use Hetero Plugin with GPU, CPU order.

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

hetero_objects = runInference(hetero_device = 'HETERO:GPU,CPU',
                              model_xml='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.xml',
                              model_bin='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.bin',
                              image_file='images/car.png',
                              performance_counters = False)

# Part 6: Layer Affinity

Let's first look at the affinity of the layers.

The example output below shows that the Input layer is assigned to the CPU and ScaleShift to the GPU.

```
Type:  Input Device:  CPU
Type:  ScaleShift Device:  GPU
Type:  Convolution Device:  GPU
```

## 1

Get Details from the Previous Section for the Heterogeneous Workload

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

net = hetero_objects[1]

for l in net.layers.values():
    print('Type: ', l.type, 'Device: ', l.affinity)
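
For a network with many layers, a per-device summary can be easier to read than one line per layer. The sketch below counts layers per device; `count_affinities` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `type` and `affinity` attributes printed above.

```python
from collections import Counter
from types import SimpleNamespace


def count_affinities(layers):
    """Count how many layers are assigned to each device."""
    return Counter(layer.affinity for layer in layers)


# Stand-in layers for illustration; not taken from a real model.
sample_layers = [
    SimpleNamespace(type='Input', affinity='CPU'),
    SimpleNamespace(type='ScaleShift', affinity='GPU'),
    SimpleNamespace(type='Convolution', affinity='GPU'),
]

for device, count in count_affinities(sample_layers).most_common():
    print(device, count)
```

With the network from the previous cell, the same helper would presumably be called as `count_affinities(net.layers.values())`.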

The `{"HETERO_DUMP_GRAPH_DOT": "YES"}` configuration lets us dump the network visualization in .dot graph format. You can navigate to the current directory and open it from a new terminal as below.


```bash
xdot hetero_affinity_MobileNet-SSD.dot
```

![Dot Output](images/affinity_dot.png)

## 2

Run implemented inference with given parameters.

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

hetero_objects = runInference(hetero_device = 'HETERO:CPU,GPU',
                              model_xml='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.xml',
                              model_bin='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.bin')

## 3

Show Layer Affinities

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

net = hetero_objects[1]

# This time all layers run on CPU; no need to fall back to GPU

for l in net.layers.values():
    print('Type: ', l.type, 'Device: ', l.affinity)

## 4

Show Corresponding Performance Counters

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

# Get the exec net
exec_net = hetero_objects[2]

# Print the performance counters
perf_counts = exec_net.requests[0].get_perf_counts()
print("Performance counters:")
for layer, stats in perf_counts.items():
    print(layer, ': ', stats)

# Part 7: Performance Counters

Let's look at the execution of the layers in detail. The following cell prints the per-layer execution details.

Note: If the GPU is used, the performance counters come back blank. They are not reported for the GPU at this time; see the next example for a detailed execution report.

## 1

See the Performance Counter Details

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button # 

# Get the exec net
exec_net = hetero_objects[2]

# Print the performance counters
perf_counts = exec_net.requests[0].get_perf_counts()
print("Performance counters:")
for layer, stats in perf_counts.items():
    print(layer, ': ', stats)

# Note: GPU performance counters are not reported
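
Printing every layer can produce a long list, so it may help to rank layers by time. The sketch below is a hypothetical helper, `slowest_layers`, that assumes each entry of the performance-counter dictionary carries `status`, `layer_type`, and `real_time` keys (with `real_time` in microseconds); please verify these keys against the output of the cell above. The sample numbers are made up.

```python
def slowest_layers(perf_counts, top=3):
    """Return up to `top` executed layers, slowest first, as (name, stats) pairs."""
    executed = [
        (name, stats)
        for name, stats in perf_counts.items()
        if stats.get('status') == 'EXECUTED'
    ]
    executed.sort(key=lambda item: item[1].get('real_time', 0), reverse=True)
    return executed[:top]


# Made-up counters for illustration only; not measured on real hardware.
sample_counts = {
    'conv1': {'status': 'EXECUTED', 'layer_type': 'Convolution', 'real_time': 420},
    'conv2': {'status': 'EXECUTED', 'layer_type': 'Convolution', 'real_time': 910},
    'relu1': {'status': 'OPTIMIZED_OUT', 'layer_type': 'ReLU', 'real_time': 0},
}

for name, stats in slowest_layers(sample_counts):
    print(name, stats['layer_type'], stats['real_time'])
```

With a real run, the dictionary returned by `exec_net.requests[0].get_perf_counts()` would be passed in place of `sample_counts`.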

## 2

Instructions to Install NCS2 Driver for Ubuntu

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

You can try out:

`'HETERO:MYRIAD,GPU,CPU'` and so on. 

Note: If you get an NCS2 not found error, install the NCS2 USB rules as below.


```bash
cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
```

```bash
sudo cp 97-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-usbboot.rules
```

## 3

Clean Previous Run

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

# Before re-run delete objects
del hetero_objects

print('SUCCESS')

## 4

Re-run with CPU, GPU priority order

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

hetero_objects = runInference(hetero_device = 'HETERO:CPU,GPU',
                              model_xml='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.xml',
                              model_bin='/home/intel/openvino_models/ir/mobilenet-ssd/FP16/mobilenet-ssd.bin',
                              image_file='images/car.png',
                              performance_counters = False)

# Get the exec net
exec_net = hetero_objects[2]

# Print the performance counters
perf_counts = exec_net.requests[0].get_perf_counts()
print("Performance counters:")
for layer, stats in perf_counts.items():
    print(layer, ': ', stats)

## 5

Show Layer Affinities when Order Changed

#### Change Focus to Below Cell and Press (SHIFT + ENTER) or Click on Run Button

In [None]:
# Run code with : (SHIFT + ENTER) or Press Run Button #

net = hetero_objects[1]

# This time all layers run on CPU; no need to fall back to GPU

for l in net.layers.values():
    print('Type: ', l.type, 'Device: ', l.affinity)
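
The affinities printed above are chosen by the plugin from the priority list, but a layer's device can also be set by hand. The sketch below shows the pattern on stand-in objects; `pin_layer_type` is a hypothetical helper, and it assumes that each layer's `affinity` attribute is writable and that the change is made before the network is loaded to the plugin.

```python
from types import SimpleNamespace


def pin_layer_type(layers, layer_type, device):
    """Assign every layer of `layer_type` to `device`; return how many changed."""
    changed = 0
    for layer in layers:
        if layer.type == layer_type and layer.affinity != device:
            layer.affinity = device
            changed += 1
    return changed


# Stand-in layers for illustration; not taken from a real model.
demo_layers = [
    SimpleNamespace(type='Convolution', affinity='CPU'),
    SimpleNamespace(type='PriorBox', affinity='CPU'),
]

print(pin_layer_type(demo_layers, 'Convolution', 'GPU'))
print([(layer.type, layer.affinity) for layer in demo_layers])
```

With a real network, the same loop would presumably run over `net.layers.values()`, and the target device would need to be one of the devices in the `HETERO:` list.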

# Layer Affinity

# Heterogeneous Plugin

# Custom Layers

# INT8 

# Thread Management

# Using Yolo for Object Detection