<a id="top"></a>
# Benchmark Tool

## About Benchmark Tool

This tutorial uses the [OpenVINO benchmarking app](https://docs.openvinotoolkit.org/latest/_inference_engine_tools_benchmark_tool_README.html) to benchmark your model on different hardware.

The Benchmark Python tool estimates deep learning inference performance on the supported devices. Performance can be measured in two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).

This tutorial benchmarks the deep learning model with:
1. different hardware
2. workload distribution with the MULTI plugin

### How It Works
Upon start-up, the application reads the command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen according to the specified device. The number of infer requests and the execution approach depend on the mode defined with the -api command-line parameter.
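
As a rough illustration of why the two modes report different numbers, the relationship between latency and throughput can be sketched with simple arithmetic. This is an idealized model, assuming every request takes the same time and that parallel requests overlap perfectly; it is not how the benchmark app computes its figures, and ideal_throughput_fps is a hypothetical helper.

```python
def ideal_throughput_fps(latency_ms, batch_size=1, parallel_requests=1):
    """Idealized frames per second for a given per-request latency.

    Assumption (illustrative only): each request takes exactly latency_ms
    and parallel_requests requests overlap perfectly.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return parallel_requests * batch_size * 1000.0 / latency_ms


# Hypothetical latency of 10 ms per request, chosen only for illustration.
sync_fps = ideal_throughput_fps(10.0)                        # one request at a time
async_fps = ideal_throughput_fps(10.0, parallel_requests=4)  # four requests in flight
print(sync_fps, async_fps)
```

Real devices rarely scale this cleanly, which is one reason to measure both modes instead of deriving one from the other.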

In this tutorial, we use the following input parameters with the benchmark app:

- -m: path to the deep learning model in Intermediate Representation (IR) format, e.g. the converted ResNet-50 TensorFlow model
- -d: device to which the inference workload is offloaded
- -niter: number of iterations
- -api: inference mode, sync or async
- --report_type: level of detail of the statistics report (this tutorial uses detailed_counters); the report includes figures such as FPS and latency
- --report_folder: path to the folder where the statistics report is stored
- -i: input image or video. If the topology is not data sensitive, you can skip this parameter.
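
The parameters above can be assembled into a command line as in the following sketch. It only builds the argument list and runs nothing; build_benchmark_command is a hypothetical helper, and the model path and report folder in the example call are assumptions based on the directory layout used later in this tutorial.

```python
import shlex


def build_benchmark_command(model_xml, device, api, report_folder,
                            niter=10, report_type="detailed_counters",
                            input_path=None):
    """Return the benchmark_app.py argument list for the flags listed above."""
    cmd = [
        "python3", "benchmark_app.py",
        "-m", model_xml,
        "-d", device,
        "-niter", str(niter),
        "-api", api,
        "--report_type", report_type,
        "--report_folder", report_folder,
    ]
    # -i is optional; it is skipped when the topology is not data sensitive.
    if input_path is not None:
        cmd += ["-i", input_path]
    return cmd


cmd = build_benchmark_command("models/FP32/resnet_v1-50.xml", "CPU",
                              "async", "results/core")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list, rather than one string, avoids quoting problems if it is later passed to subprocess.run.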

### Set up the environment variables and import dependencies

In [None]:
from IPython.display import HTML
import os
import time
import sys                                                     
from openvino.inference_engine import IECore
import cv2
import pandas as pd
from qarpo.model_visualizer_link import *

### Install progress package


In [None]:
!pip3 install -r ./benchmark/requirements.txt

### Deep Learning model for inference
This example uses a TensorFlow* implementation of a ResNet-50 model for classification.

#### Download the ResNet-50 TensorFlow* model with the Model Downloader from the Intel® Distribution of OpenVINO™ toolkit


In [None]:
!omz_downloader --name resnet-50-tf -o models

To view a graph of the model used in this application, run the cell below then select the link generated:

In [None]:
showModelVisualizerLink("models/public/resnet-50-tf/resnet_v1-50.pb")

### Optimize a deep-learning model using the Model Optimizer (MO) 
In this section, you will use the Model Optimizer to convert a trained model to two Intermediate Representation (IR) files (one .bin and one .xml). The Inference Engine requires this model conversion so that it can use the IR as input and achieve optimum performance on Intel® hardware.

#### Create a directory to store IR files

In [None]:
! mkdir -p models/FP32
! mkdir -p models/FP16

##### Convert the model to FP16 precision

In [None]:
!mo \
--input_model models/public/resnet-50-tf/resnet_v1-50.pb \
--input_shape=[1,224,224,3] \
--mean_values=[123.68,116.78,103.94] \
-o models/FP16 \
--data_type FP16

##### Convert the model to FP32 precision

In [None]:
!mo \
--input_model models/public/resnet-50-tf/resnet_v1-50.pb \
--input_shape=[1,224,224,3] \
--mean_values=[123.68,116.78,103.94] \
-o models/FP32 \
--data_type FP32
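
Before submitting jobs, it can help to confirm that both conversions produced their .xml and .bin files. The check below is a minimal sketch: missing_ir_files is a hypothetical helper, and the model name resnet_v1-50 is taken from the path that the job script later in this tutorial passes to the benchmark app.

```python
import os


def missing_ir_files(ir_dir, model_name="resnet_v1-50"):
    """Return the IR file names (.xml and .bin) that are absent from ir_dir."""
    expected = [model_name + ext for ext in (".xml", ".bin")]
    return [name for name in expected
            if not os.path.isfile(os.path.join(ir_dir, name))]


# Report what is missing for each precision folder created above.
for precision in ("FP16", "FP32"):
    missing = missing_ir_files(os.path.join("models", precision))
    print(precision, "missing:", missing if missing else "nothing")
```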

### Creating job file
Until now, we have run all the steps above on the single edge system allocated to your account. Next, we run inference on different edge systems in the Intel IoT DevCloud to benchmark inference performance. To do so, we submit one inference job per edge device to a queue. For each job, we specify the type of edge compute node that must be allocated.

The job file in the cell below is written in Bash and is executed directly on the edge compute node. Run the following cell to write it to the file "benchmark_app_job.sh".

In [None]:
%%writefile benchmark_app_job.sh


# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR
JOB_ID=`basename ${0} | cut -f1 -d"."`
OUTPUT_FILE=$1
DEVICE=$2
#FP_MODEL=$3
API=$3
# Benchmark Application script writes output to a file inside a directory. We make sure that this directory exists.
#  The output directory is the first argument of the bash script
mkdir -p $OUTPUT_FILE


if  [ "$DEVICE" = "MYRIAD" ] || [ "$DEVICE" = "HDDL" ] || [ "$DEVICE" = "MULTI:HDDL,CPU" ] || [ "$DEVICE" = "MULTI:CPU,GPU" ]; then
    FP_MODEL="FP16"
else
    FP_MODEL="FP32"
fi
echo VENV_PATH=$VENV_PATH
echo OPENVINO_RUNTIME=$OPENVINO_RUNTIME
echo INPUT_FILE=$INPUT_FILE
echo FP_MODEL=$FP_MODEL
echo NUM_REQS=$NUM_REQS

# Follow this order of setting up the environment for OpenVINO 2022.1.0.553
echo "Activating a Python virtual environment from ${VENV_PATH}..."
source ${VENV_PATH}/bin/activate
echo "Activating OpenVINO variables from ${OPENVINO_RUNTIME}..."
source ${OPENVINO_RUNTIME}/setupvars.sh

SAMPLEPATH=$PBS_O_WORKDIR

mkdir -p ${OUTPUT_FILE}
rm -f ${OUTPUT_FILE}/*

echo ${SAMPLEPATH}/${OUTPUT_FILE} > benchmark_filename_${JOB_ID}.txt

# Running the benchmark application code

python3 benchmark_app.py -m ${SAMPLEPATH}/models/${FP_MODEL}/resnet_v1-50.xml \
            -d $DEVICE \
            -niter 10 \
            -api $API \
            --report_type detailed_counters \
            --report_folder ${SAMPLEPATH}/${OUTPUT_FILE}
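
The precision choice made by the if/else branch in the job script can be mirrored in Python, for instance to predict which IR a job will load before it is submitted. This is a sketch only; precision_for_device and FP16_DEVICES are hypothetical names, and the device list is copied from the script above.

```python
# Devices for which the job script selects the FP16 IR.
FP16_DEVICES = {"MYRIAD", "HDDL", "MULTI:HDDL,CPU", "MULTI:CPU,GPU"}


def precision_for_device(device):
    """Return the IR precision folder the job script would pick for device."""
    return "FP16" if device in FP16_DEVICES else "FP32"


for device in ("CPU", "GPU", "MYRIAD", "MULTI:CPU,GPU"):
    print(device, "->", precision_for_device(device))
```

Note that the script compares the device string exactly, so a MULTI combination written in a different order than the ones listed falls through to FP32.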

Now that we have the job script, we can submit jobs to the edge compute nodes. In the IoT DevCloud, you can do this with the qsub command. We can submit benchmark_app_job.sh to several different types of edge compute nodes simultaneously or to just one node at a time.

We use three options of the qsub command:

-l : this option lets us select the number and the type of nodes using nodes={node_count}:{property}.

-F : this option lets us send arguments to the Bash script.

-N : this option lets us name the job so that it is easier to distinguish between jobs.

The cells below also pass -v, which exports the listed environment variables (here VENV_PATH and OPENVINO_RUNTIME) to the job.

If you are curious to see the available types of nodes on the IoT DevCloud, run the following optional cell.

In [None]:
!pbsnodes | grep compnode | sort | uniq -c
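
The qsub options described above can be combined into a submission command as in this sketch. It only builds the command string; build_qsub_command is a hypothetical helper, and the node property and script arguments in the example are the ones used for the Intel Core CPU job later in this tutorial.

```python
def build_qsub_command(script, node_property, script_args, job_name=None,
                       env_vars=("VENV_PATH", "OPENVINO_RUNTIME")):
    """Build a qsub command string that requests one node of the given type."""
    parts = [
        "qsub", script,
        "-l", "nodes=1:{}".format(node_property),    # number and type of nodes
        "-F", '"{}"'.format(" ".join(script_args)),  # arguments for the script
    ]
    if job_name:
        parts += ["-N", job_name]                    # optional job name
    if env_vars:
        parts += ["-v", ",".join(env_vars)]          # variables exported to the job
    return " ".join(parts)


print(build_qsub_command("benchmark_app_job.sh", "idc001skl",
                         ["results/core", "CPU", "async"]))
```

The three script arguments map to OUTPUT_FILE, DEVICE, and API in the job script, in that order.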

## Wait until the benchmarking report files are written 

We submit each job to a hardware platform through the job queue, so we have to wait until the results come back from the specified hardware. The following function checks whether the report file has been generated, which indicates that the job is complete. Until the job completes, it prints dots on the screen.

In [None]:
def wait_for_job_to_finish(job_id):
    # job_id holds the output lines of the qsub call; it is empty if the submission failed.
    if job_id:
        print(job_id[0])
        
        print("Job submitted to the queue. Waiting for it to complete .", end="")
        filename = "benchmark_filename_{}.txt".format(job_id[0].split(".")[0])
        
        while not os.path.exists(filename):  # Wait until the file report is created.
            time.sleep(1)
            print(".", end="")
        
        # Print the results
        with open(filename) as f:
            results_dir = f.read().split("\n")[0]
            
        report_filename = os.path.join(results_dir, "benchmark_report.csv")
        while not os.path.exists(report_filename):  # Wait until the benchmark report is written.
            time.sleep(1)
            print(".", end="")
        
        df = pd.read_csv(report_filename, delimiter=";")
        print(df)
        
        throughput = float(df.loc["throughput"][0])
        device = df.loc["target device"][0]
        load_time = float(df.loc["load network time (ms)"][0])
        read_time = float(df.loc["read network time (ms)"][0])
        
        os.remove(filename) # Cleanup
        
    else:
        print("Error in job submission.")
        
        throughput = None
        device = None
        load_time = None
        read_time = None
        
    return {"Throughput (FPS)": throughput, 
            "Load network time (ms)" : load_time,
            "Read network time (ms)" : read_time}
        

The wait_for_job_to_finish() function above returns the throughput, load network time, and read network time. We save these return values in a dictionary, benchmarks, to be used later in the graphs.

In [None]:
benchmarks = {}  # Save the benchmarking results to a dictionary
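
The polling loops in wait_for_job_to_finish() keep running until the expected file appears, so a job that fails on the node would leave the cell waiting indefinitely. One possible variant, sketched here under the assumption that giving up after a fixed time is acceptable, adds a timeout. wait_for_file and its default values are hypothetical and not part of the notebook.

```python
import os
import time


def wait_for_file(path, timeout_s=600.0, poll_interval_s=1.0):
    """Poll until path exists; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
    return True
```

A caller could replace each while loop with a call such as wait_for_file(report_filename) and report an error when it returns False.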

## Benchmark individual systems with the deep learning model
The Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel.

This example shows how to use the MULTI plugin from the Intel® Distribution of OpenVINO™ toolkit.
First, let's take a look at the performance of each individual Inference Engine device.

#### 1. Run the Benchmark tool app with Intel® Core™ CPU

In [None]:
print("Submitting a job to an edge compute node with an Intel Core CPU...")
#Submit job to the queue
job_id_core = !qsub benchmark_app_job.sh -l nodes=1:idc001skl -F "results/core CPU async" -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["Core"] = wait_for_job_to_finish(job_id_core)

#### 2. Run the Benchmark tool app with Intel® Xeon® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel® 
    Xeon® Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
print("Submitting a job to an edge compute node with an Intel Xeon CPU...")
#Submit job to the queue
job_id_xeon = !qsub benchmark_app_job.sh -l nodes=1:idc007xv5 -F "results/xeon/ CPU async"  -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["XeonE3"] = wait_for_job_to_finish(job_id_xeon)

#### 3. Run Benchmark tool application with GPU

In [None]:
print("Submitting a job to an edge compute node with an Intel Core CPU and an Intel GPU...")
#Submit job to the queue
job_id_gpu = !qsub benchmark_app_job.sh -l nodes=1:idc001skl -F "results/gpu GPU async"    -v VENV_PATH,OPENVINO_RUNTIME   
benchmarks["GPU"] = wait_for_job_to_finish(job_id_gpu)

#### 4. Run Benchmark tool application with NCS2

In [None]:
print("Submitting job to an edge compute node with Intel NCS2...")
#Submit job to the queue
job_id_ncs2 = !qsub benchmark_app_job.sh -l nodes=1:idc004nc2 -F "results/ncs2 MYRIAD async"    -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["NCS2"] = wait_for_job_to_finish(job_id_ncs2)

#### 5. Run Benchmark tool application with HDDL-R

In [None]:
#Submit job to the queue
job_id_hddlr = !qsub benchmark_app_job.sh -l nodes=1:idc002mx8 -F "results/hddlr HDDL async" -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["HDDL-R"] = wait_for_job_to_finish(job_id_hddlr)

## Multi plugin
Now let's try the [MULTI plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_MULTI.html) with different combinations of available Inference Engine devices.

The Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. Potential gains are as follows:

- Improved throughput that multiple devices can deliver (compared to single-device execution)
- More consistent performance, since the devices can now share the inference burden (so that if one device is becoming too busy, another device can take more of the load)

Notice that with the Multi-Device plugin the application logic is left unchanged, so you don't need to explicitly load the network to every device, create and balance the inference requests, and so on.
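
Because the application logic stays the same, switching to the Multi-Device plugin comes down to changing the device string passed with -d. The sketch below builds and splits strings of the form used in this tutorial, such as MULTI:CPU,GPU. Both helper names are hypothetical.

```python
def multi_device_string(devices):
    """Join device names into a MULTI device string, e.g. MULTI:CPU,GPU."""
    if len(devices) < 2:
        raise ValueError("MULTI needs at least two devices")
    return "MULTI:" + ",".join(devices)


def split_device_string(device):
    """Return the list of devices named by a plain or MULTI device string."""
    prefix = "MULTI:"
    if not device.startswith(prefix):
        return [device]
    return device[len(prefix):].split(",")


print(multi_device_string(["CPU", "GPU"]))
print(split_device_string("MULTI:HDDL,CPU"))
```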

#### 1. Run Benchmark tool application with MULTI:CPU,GPU

In [None]:
print("Submitting a job to an edge compute node with a CPU and a GPU...")
#Submit job to the queue
job_id_cpu_gpu = !qsub benchmark_app_job.sh -l nodes=1:idc001skl -F "results/cpu_gpu MULTI:CPU,GPU sync" -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["MULTI:CPU,GPU"] = wait_for_job_to_finish(job_id_cpu_gpu)

#### 2. Run Benchmark tool application with MULTI:CPU,HDDL

In [None]:
print("Submitting a job to an edge compute node with Intel CPU and Intel Movidius HDDL-R...")
#Submit job to the queue
job_id_cpu_hddl = !qsub benchmark_app_job.sh -l nodes=1:idc002mx8 -F "results/cpu_hddl MULTI:HDDL,CPU async" -v VENV_PATH,OPENVINO_RUNTIME
benchmarks["MULTI:HDDL,CPU"] = wait_for_job_to_finish(job_id_cpu_hddl)

# Assess Performance

The results of each benchmarking job (throughput, load network time, and read network time) are recorded in the benchmarks dictionary. Run the cells below to plot the results of all jobs side by side.


### Plot the benchmarking results

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patheffects as PathEffects

%matplotlib inline

def plot_benchmarks(metric):
    
    latency = {}
    no_number = False
    for device in benchmarks:
        if isinstance(benchmarks[device][metric], str):
            no_number = True
        else:
            latency[device] = benchmarks[device][metric]    
    
    if not no_number:
        plt.figure(figsize=(18,8))
        plt.bar(*zip(*latency.items()));
        plt.xticks(fontsize=14);
        plt.yticks(fontsize=18);
        plt.ylabel(metric, fontsize=20);

        rects = plt.gca().patches

        # Make some labels.
        labels = ["{:,.2f}".format(i) for i in latency.values()]

        for rect, label in zip(rects, labels):
            height = rect.get_height()
            plt.gca().text(rect.get_x() + rect.get_width() / 2, height/2.0, label,
                    ha="center", va="bottom", fontsize=20, color="white", path_effects=[PathEffects.withStroke(linewidth=2, foreground="black")])
            
    else:
        print("ERROR: Field '{}' has text strings. Can't plot it.".format(metric))
        
    

In [None]:
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

#### Note:

Read network time - It is the time to read the deep learning model from its stored location. 

Load network time - It is the time to load the deep learning model to the device plugin where the inference should happen. 

In [None]:
interact(plot_benchmarks, metric=benchmarks[next(iter(benchmarks))].keys());
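
Alongside the plot, the same dictionary can be ranked in plain Python. The sketch below sorts devices by one metric and skips entries whose value is missing, which is what wait_for_job_to_finish() returns for a failed submission. rank_devices is a hypothetical helper, and the numbers in example are made-up placeholders for illustration, not measurements.

```python
def rank_devices(results, metric, higher_is_better=True):
    """Return (device, value) pairs sorted by metric, skipping non-numeric values."""
    usable = {
        device: values[metric]
        for device, values in results.items()
        if isinstance(values.get(metric), (int, float))
    }
    return sorted(usable.items(), key=lambda item: item[1],
                  reverse=higher_is_better)


# Placeholder values only; real numbers come from the benchmarks dictionary.
example = {
    "Core": {"Throughput (FPS)": 50.0},
    "GPU": {"Throughput (FPS)": 80.0},
    "NCS2": {"Throughput (FPS)": None},
}
print(rank_devices(example, "Throughput (FPS)"))
```

For the two time metrics, pass higher_is_better=False so that the shortest time comes first.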

# Next steps
- [More Jupyter* Notebook Samples](https://software.intel.com/content/www/us/en/develop/tools/devcloud/edge/build/sample-apps.html) - additional sample applications
- [Jupyter* Notebook Tutorials](https://software.intel.com/content/www/us/en/develop/tools/devcloud/edge/learn/tutorials.html) - sample application Jupyter* Notebook tutorials
- [Intel® Distribution of OpenVINO™ toolkit Main Page](https://software.intel.com/openvino-toolkit) - learn more about the tools and use of the Intel® Distribution of OpenVINO™ toolkit for implementing inference on the edge


# About this notebook

For technical support, please see the [Intel® DevCloud Forums](https://software.intel.com/en-us/forums/intel-devcloud-for-edge)

<p style=background-color:#0071C5;color:white;padding:0.5em;display:table-cell;width:100pc;vertical-align:middle>
<img style=float:right src="https://devcloud.intel.com/edge/static/images/svg/IDZ_logo.svg" alt="Intel DevCloud logo" width="150px"/>
<a style=color:white>Intel® DevCloud for the Edge</a><br>   
<a style=color:white href="#top">Top of Page</a> | 
<a style=color:white href="https://devcloud.intel.com/edge/static/docs/terms/Intel-DevCloud-for-the-Edge-Usage-Agreement.pdf">Usage Agreement (Intel)</a> | 
<a style=color:white href="https://devcloud.intel.com/edge/static/docs/terms/Colfax_Cloud_Service_Terms_v1.3.pdf">Service Terms (Colfax)</a>
</p>