## Benchmark Application - Python

This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).

### How it works
If you run the application in synchronous mode, it creates one infer request and executes the `Infer` method. If you run it in asynchronous mode, it creates as many infer requests as specified by the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application uses the default value for the specified device.
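
The difference between the two modes can be illustrated without any inference hardware. The sketch below is not the Benchmark Application's code: `fake_infer` is a hypothetical stand-in that sleeps instead of running a model, and a thread pool plays the role of the `-nireq` in-flight requests. It only shows why keeping several requests in flight tends to raise throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame_id, delay=0.01):
    """Hypothetical stand-in for one inference call; sleeps instead of running a model."""
    time.sleep(delay)
    return frame_id


def run_sync(n_iter):
    """Latency-oriented: one request, each call finishes before the next starts."""
    start = time.perf_counter()
    results = [fake_infer(i) for i in range(n_iter)]
    return results, time.perf_counter() - start


def run_async(n_iter, nireq):
    """Throughput-oriented: up to nireq requests are in flight at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=nireq) as pool:
        # map() returns results in submission order.
        results = list(pool.map(fake_infer, range(n_iter)))
    return results, time.perf_counter() - start


sync_results, sync_time = run_sync(20)
async_results, async_time = run_async(20, nireq=4)
print(f"sync:  {len(sync_results) / sync_time:.1f} requests/s")
print(f"async: {len(async_results) / async_time:.1f} requests/s")
```

On a real device the gain depends on how many requests the hardware can actually process in parallel, which is why the default `-nireq` value differs per device.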

## Set up

### Import dependencies

In [None]:
from IPython.display import HTML
import os
import time
import sys
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent))
from demoTools.demoutils import *

## Using Intel® Distribution of OpenVINO™ toolkit

We will be using the Intel® Distribution of OpenVINO™ toolkit Inference Engine (IE) to run the inference workload that the application benchmarks.
There are five steps involved in this task:

1. Create an Intermediate Representation (IR) model using the Model Optimizer by Intel
2. Choose a device and create an IEPlugin for the device
3. Read the IR model using IENetwork
4. Load the IENetwork into the plugin
5. Run inference

### Creating IR Model

The Model Optimizer creates Intermediate Representation (IR) models that are optimized for different end-point target devices.
These models can be created from existing DNN models from popular frameworks (e.g. Caffe*, TF) using the Model Optimizer. 
The Intel® Distribution of OpenVINO™ toolkit includes a utility script, `downloader.py`, that you can use to download some common models. Run the following cell to see the models available through `downloader.py`.

In [None]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --print_all

**Note**: The '!' is a special Jupyter Notebook command that lets you run shell commands as if you were at a command line. The above command will therefore work as-is in a terminal (with the '!' removed).

Some of these downloaded models are already in the IR format, while others require the Model Optimizer. In this demo, we will be using the **emotions-recognition-retail-0003** model, which is already in IR format. This model can be downloaded with the following commands.

In [None]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name emotions-recognition-retail-0003 -o models/
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name emotions-recognition-retail-0003-fp16 -o models/

The input arguments are as follows:
* --name : name of the model you want to download. It should be one of the models listed in the previous cell
* -o : output directory. If this directory does not exist, it will be created for you.

There are more arguments to this script and you can get the full list using the `-h` option.


With the `-o` option set as above, the command downloads the model into the `models` directory, with the model files (`.xml` and `.bin`) located under `Retail/object_attributes/emotions_recognition/0003/dldt`.

In the above case, the location relative to the notebook directory is `models/Retail/object_attributes/emotions_recognition/0003/dldt/`.
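
Before submitting jobs, it can help to confirm that the download produced the files the job script expects. The following is a minimal sketch, assuming the directory layout described above and the `-fp16` file-name suffix used by the downloader commands; `ir_files` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path

MODEL_NAME = "emotions-recognition-retail-0003"
MODEL_SUBDIR = Path("Retail/object_attributes/emotions_recognition/0003/dldt")


def ir_files(output_dir, precision="FP32"):
    """Return the expected (.xml, .bin) paths for the given precision."""
    # Assumption: FP16 files carry a "-fp16" suffix, FP32 files carry none.
    suffix = "-fp16" if precision == "FP16" else ""
    stem = Path(output_dir) / MODEL_SUBDIR / f"{MODEL_NAME}{suffix}"
    return Path(f"{stem}.xml"), Path(f"{stem}.bin")


for precision in ("FP32", "FP16"):
    for path in ir_files("models", precision):
        status = "found" if path.is_file() else "missing"
        print(f"{precision}: {path} ({status})")
```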


### Initialize OpenVINO env

In [None]:
%%bash
source /opt/intel/openvino/bin/setupvars.sh

#### Running the application with the -h option yields the following usage message:

In [None]:
!python3 python/benchmark_app/benchmark_app.py -h

### Creating job file

In [None]:
%%writefile benchmarkapp_python.sh
set -x
INPUT_FILE=$1
FP_MODEL=$2
DEVICE=$3
API=$4

cd $PBS_O_WORKDIR
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:~/iot-devcloud/benchmark_app/python/benchmark_app

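# Use the FP16 model files when requested; FP32 files have no suffix.
FPEXT=''
if [ "$FP_MODEL" = "FP16" ]; then
  FPEXT='-fp16'
fi

echo "$INPUT_FILE"

# No leading '!' here: in a shell script it would negate the exit status.
python3 python/benchmark_app/benchmark_app.py \
-m "models/Retail/object_attributes/emotions_recognition/0003/dldt/emotions-recognition-retail-0003${FPEXT}.xml" \
-i "$INPUT_FILE" \
-d "$DEVICE" \
-api "$API"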
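
The job script maps its four positional arguments onto the `-m`, `-i`, `-d`, and `-api` options. The sketch below mirrors that mapping in Python so the resulting command can be inspected before a job is submitted. `benchmark_command` is a hypothetical helper, and its validation of the precision and API values is an addition: the shell script itself accepts any value and treats everything other than `FP16` as FP32.

```python
import shlex

VALID_PRECISIONS = {"FP32", "FP16"}
VALID_APIS = {"sync", "async"}
MODEL_DIR = "models/Retail/object_attributes/emotions_recognition/0003/dldt"
MODEL_NAME = "emotions-recognition-retail-0003"


def benchmark_command(input_file, precision, device, api):
    """Return the argument list the job script would run, after basic checks."""
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")
    if api not in VALID_APIS:
        raise ValueError(f"unknown API mode: {api!r}")
    suffix = "-fp16" if precision == "FP16" else ""
    return [
        "python3", "python/benchmark_app/benchmark_app.py",
        "-m", f"{MODEL_DIR}/{MODEL_NAME}{suffix}.xml",
        "-i", input_file,
        "-d", device,
        "-api", api,
    ]


cmd = benchmark_command("emotions.jpg", "FP16", "GPU", "async")
print(" ".join(shlex.quote(arg) for arg in cmd))
```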

In [None]:
!pbsnodes | grep compnode | sort | uniq -c

Here, the properties describe each node, and the number on the left is the number of available nodes with that architecture.
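
If you capture that listing in Python, the counts can be summed per property. This is a rough sketch only: the `sample` text is hypothetical and assumes each line has the form `<count> properties = <comma-separated list>`, built from the property names used in this notebook. The real output on your system may contain other properties.

```python
def parse_node_counts(text):
    """Parse `uniq -c` style lines into (count, [properties]) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        # Keep only the comma-separated list after "properties =", if present.
        props = parts[1].split("=", 1)[-1]
        rows.append((int(parts[0]), [p.strip() for p in props.split(",") if p.strip()]))
    return rows


def nodes_with(rows, *wanted):
    """Count nodes whose property list contains every wanted property."""
    return sum(count for count, props in rows if all(w in props for w in wanted))


# Hypothetical sample, not real output from any cluster.
sample = """\
     10      properties = compnode,tank-870,e3-1268l-v5
      4      properties = compnode,tank-870,i5-6500te,iei-mustang-v100-mx8
"""

rows = parse_node_counts(sample)
print(nodes_with(rows, "tank-870", "i5-6500te"))
```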

**Note**: If you want to use your own image, set the environment variable `IMAGE` in the following cell to the full path of your uploaded image instead of the default `emotions.jpg` in the current directory.


In [None]:
os.environ["IMAGE"] = os.getcwd()+"/emotions.jpg"

### Job queue submission

Each of the cells below will submit a job to different edge compute nodes.
The output of the cell is the `JobID` of your job, which you can use to track progress of a job.

**Note**: You can submit all jobs at once or one at a time.

After submission, a job goes into a queue and runs as soon as the requested compute resources become available.
(Tip: **shift+enter** runs the current cell and moves you to the next one, so you can press **shift+enter** repeatedly to run several cells quickly.)

#### The log files with the output are stored in the following location:
~/benchmark_app/python/benchmark_app

#### Since the application runs for 60 seconds by default before reporting latency and throughput values, the output logs are generated about 60 seconds after a job starts
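
Each submission cell in this notebook waits for a log file named `<job name>.o<job number>` by checking for it in a tight loop. A gentler variant pauses between checks and gives up after a timeout. The sketch below shows one way to do that; `log_file_for` and `wait_for_file` are hypothetical helpers, and the timeout and polling interval are arbitrary choices.

```python
import os
import time


def log_file_for(job_id, job_name):
    """Build '<job_name>.o<job number>' from a qsub job ID such as '1234.server'."""
    return f"{job_name}.o{job_id.split('.')[0]}"


def wait_for_file(path, timeout=300.0, poll=1.0):
    """Return True once path exists, or False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        time.sleep(poll)  # avoid spinning the CPU while the job is queued
    return os.path.isfile(path)


# '1234.example-server' is a made-up job ID used only for illustration.
print(log_file_for("1234.example-server", "benchmark_xeon"))
```

In a submission cell, `wait_for_file(log_file_for(job_id_xeon[0], "benchmark_xeon"))` could replace the `while True` loop, with the log printed only when the call returns `True`.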

#### Submitting to an edge compute node with Intel® Xeon® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel® 
    Xeon® Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
print("Submitting job to Intel Xeon CPU FP32 - Sync...")
job_id_xeon = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:e3-1268l-v5 -F "$IMAGE FP32 CPU sync" -N benchmark_xeon
print(job_id_xeon[0]) 
while True:
    var=job_id_xeon[0].split(".")
    file="benchmark_xeon.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

In [None]:
print("Submitting job to Intel Xeon CPU FP32 - Async...")
job_id_xeon = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:e3-1268l-v5 -F "$IMAGE FP32 CPU async" -N benchmark_xeon
print(job_id_xeon[0]) 
while True:
    var=job_id_xeon[0].split(".")
    file="benchmark_xeon.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

#### Submitting to an edge compute node with Intel® Core CPU and using the onboard Intel® GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 card integrated with the CPU.

#### 16 bit - GPU - Async

In [None]:
print("Submitting job to Intel GPU...")
job_id_gpu = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP16 GPU async" -N benchmark_gpu
print(job_id_gpu[0]) 
while True:
    var=job_id_gpu[0].split(".")
    file="benchmark_gpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

#### 16 bit - GPU - Sync

In [None]:
print("Submitting job to Intel GPU...")
job_id_gpu = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP16 GPU sync" -N benchmark_gpu
print(job_id_gpu[0]) 
while True:
    var=job_id_gpu[0].split(".")
    file="benchmark_gpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

#### 32 bit - GPU - Async

In [None]:
print("Submitting job to Intel GPU...")
job_id_gpu32 = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP32 GPU async" -N benchmark_gpu32
print(job_id_gpu32[0]) 
while True:
    var=job_id_gpu32[0].split(".")
    file="benchmark_gpu32.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

#### 32 bit - GPU - Sync

In [None]:
print("Submitting job to Intel GPU...")
job_id_gpu32 = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP32 GPU sync" -N benchmark_gpu32
print(job_id_gpu32[0]) 
while True:
    var=job_id_gpu32[0].split(".")
    file="benchmark_gpu32.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

#### IEI Mustang-V100-MX8 (Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU))
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500te CPU</a>. The inference workload will run on an <a 
    href="https://www.ieiworld.com/mustang-v100/en/">IEI Mustang-V100-MX8 </a>accelerator installed in this node.

In [None]:
print("Submitting job to Intel VPU...")
#Submit job to the queue
job_id_hddl = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te:iei-mustang-v100-mx8 -F "$IMAGE FP16 HDDL async" -N benchmark_hddl
print(job_id_hddl[0])
while True:
    var=job_id_hddl[0].split(".")
    file="benchmark_hddl.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

In [None]:
print("Submitting job to Intel VPU...")
#Submit job to the queue
job_id_hddl = !qsub benchmarkapp_python.sh -l nodes=1:tank-870:i5-6500te:iei-mustang-v100-mx8 -F "$IMAGE FP16 HDDL sync" -N benchmark_hddl
print(job_id_hddl[0])
while True:
    var=job_id_hddl[0].split(".")
    file="benchmark_hddl.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break