## Benchmark Application - C++

This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).

### How it works
In synchronous mode, the application creates one infer request and executes the `Infer` method. In asynchronous mode, it creates as many infer requests as specified by the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application uses the default value for the specified device.
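The difference between the two modes can be illustrated with a toy simulation. This is only a sketch: `fake_infer`, `run_sync` and `run_async` are hypothetical names, the "inference" is a plain `time.sleep`, and a thread pool stands in for the pool of infer requests. It does not call the Inference Engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(duration_s):
    """Stand-in for one inference call; it only sleeps."""
    time.sleep(duration_s)
    return duration_s


def run_sync(n_iterations, duration_s):
    """Synchronous mode: one request, each call blocks until it finishes."""
    start = time.perf_counter()
    results = [fake_infer(duration_s) for _ in range(n_iterations)]
    return len(results), time.perf_counter() - start


def run_async(n_iterations, duration_s, nireq):
    """Asynchronous mode: up to `nireq` requests are in flight at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=nireq) as pool:
        results = list(pool.map(fake_infer, [duration_s] * n_iterations))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    done_sync, t_sync = run_sync(8, 0.05)
    done_async, t_async = run_async(8, 0.05, nireq=4)
    print(f"sync : {done_sync} inferences in {t_sync:.2f} s")
    print(f"async: {done_async} inferences in {t_async:.2f} s")
```

With several requests in flight, the same number of simulated inferences should finish in less wall-clock time, which is why the asynchronous mode is the throughput-oriented one.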

## Set up

### Import dependencies

In [2]:
from IPython.display import HTML
import os
import time
import sys
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent))
from demoTools.demoutils import *

### Create binary

Run the `build_samples.sh` script from the directory below to build the sample binaries.

In [4]:
! /opt/intel/openvino/deployment_tools/inference_engine/samples/build_samples.sh

-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - found
-- Check size of uint32_t
-- Check size of uint32_t - done
-- Looking for strtoll
-- Looking for strtoll - found
-- Found InferenceEngine: /opt/intel/openvino_2019.1.094/deployment_tools/inference_engine/lib/intel64/libinference_engine.so (Required is at least version "1.6") 
-- Performing Test HAVE_CPUID_INFO
-- Performing Test HAVE_CPUID_INFO - Success
-- Host CPU features:
--   3DNOW not supported
--   3DNOWEXT not supported
--   ABM not supported
--   ADX supported
--   AES supported
--   AVX supported
--   AVX2 supported
--   AVX512CD supported
--   AVX512F supported
--   AVX512ER

[ 28%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_list.cpp.o
[ 28%] Linking CXX executable ../intel64/Release/human_pose_estimation_demo
[ 29%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_interp.cpp.o
[ 29%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_proposal_onnx.cpp.o
[ 29%] Built target human_pose_estimation_demo
[ 30%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_base.cpp.o
[ 30%] Linking CXX executable ../intel64/Release/lenet_network_graph_builder
[ 31%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_ctc_greedy.cpp.o
[ 31%] Built target lenet_network_graph_builder
[ 31%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_topkrois_onnx.cpp.o
[ 32%] Building CXX object ie_cpu_extension/CMakeFiles/ie_cpu_extension.dir/ext_shuffle_channel

[ 66%] Built target end2end_video_analytics_ie
Scanning dependencies of target style_transfer_sample
[ 67%] Building CXX object style_transfer_sample/CMakeFiles/style_transfer_sample.dir/main.cpp.o
[ 67%] Built target interactive_face_detection_demo
[ 68%] Building CXX object pedestrian_tracker_demo/CMakeFiles/pedestrian_tracker_demo.dir/src/detector.cpp.o
[ 68%] Building CXX object pedestrian_tracker_demo/CMakeFiles/pedestrian_tracker_demo.dir/src/distance.cpp.o
[ 68%] Linking CXX executable ../intel64/Release/mask_rcnn_demo
[ 68%] Built target mask_rcnn_demo
Scanning dependencies of target perfcheck
[ 69%] Building CXX object perfcheck/CMakeFiles/perfcheck.dir/main.cpp.o
[ 69%] Building CXX object calibration_tool/CMakeFiles/calibration_tool.dir/__/validation_app/ClassificationProcessor.cpp.o
[ 70%] Building CXX object pedestrian_tracker_demo/CMakeFiles/pedestrian_tracker_demo.dir/src/image_reade

## Using Intel® Distribution of OpenVINO™ toolkit

The Benchmark Application uses the Intel® Distribution of OpenVINO™ toolkit Inference Engine (IE) to run inference on the input image.
There are five steps involved in this task:

1. Create an Intermediate Representation (IR) Model using the Model Optimizer by Intel
2. Choose a device and create an IEPlugin for the device
3. Read the IR model using IENetwork
4. Load the IENetwork into the plugin
5. Run inference.
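Steps 2 to 5 can be sketched with the Inference Engine Python API. This is a rough sketch that assumes the `openvino.inference_engine` module of the 2019 R1 release, which exposes `IEPlugin` and `IENetwork`; later releases changed this API. The helper names `weights_path_for` and `load_and_infer` are hypothetical, and the benchmark itself uses the C++ application rather than this code.

```python
from pathlib import Path


def weights_path_for(xml_path):
    """Return the .bin weights file that sits next to an IR .xml file."""
    return str(Path(xml_path).with_suffix(".bin"))


def load_and_infer(xml_path, device, input_array):
    """Steps 2-5: create the plugin, read the IR, load it, run inference."""
    # Imported lazily so the helper above works without OpenVINO installed.
    # Assumption: the 2019 R1 Python API is available in the environment.
    from openvino.inference_engine import IENetwork, IEPlugin

    plugin = IEPlugin(device=device)                          # step 2
    net = IENetwork(model=xml_path,
                    weights=weights_path_for(xml_path))       # step 3
    exec_net = plugin.load(network=net)                       # step 4
    input_name = next(iter(net.inputs))                       # first input blob
    return exec_net.infer(inputs={input_name: input_array})   # step 5
```

Step 1 is not shown because the model used in this notebook is downloaded already in IR format.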

### Creating IR Model

The Model Optimizer creates Intermediate Representation (IR) models that are optimized for different end-point target devices.
These models can be created from existing DNN models from popular frameworks (e.g. Caffe*, TensorFlow*) using the Model Optimizer.
The Intel® Distribution of OpenVINO™ toolkit includes a utility script, `downloader.py`, that you can use to download some common models. Run the following cell to see the models available through `downloader.py`.

In [4]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --print_all

densenet-121
densenet-161
densenet-169
densenet-201
squeezenet1.0
squeezenet1.1
mtcnn-p
mtcnn-r
mtcnn-o
mobilenet-ssd
vgg19
vgg16
ssd512
ssd300
inception-resnet-v2
dilation
googlenet-v1
googlenet-v2
googlenet-v4
alexnet
ssd_mobilenet_v2_coco
resnet-50
resnet-101
resnet-152
googlenet-v3
se-inception
se-resnet-101
se-resnet-152
se-resnet-50
se-resnext-50
se-resnext-101
Sphereface
license-plate-recognition-barrier-0007
mobilenet-v1-1.0-224
mobilenet-v2
faster_rcnn_inception_v2_coco
deeplabv3
ctpn
ssd_mobilenet_v1_coco
faster_rcnn_resnet101_coco
mobilenet-v2-1.4-224
age-gender-recognition-retail-0013
age-gender-recognition-retail-0013-fp16
emotions-recognition-retail-0003
emotions-recognition-retail-0003-fp16
face-detection-adas-0001
face-detection-adas-0001-fp16
face-detection-retail-0004
face-detection-retail-0004-fp16
face-person-detection-retail-0002
face-person-detection-retail-0002-fp16
face-reidentification-retail-0095
face-reident

**Note**: The '!' is a special Jupyter Notebook command that allows you to run shell commands as if you were at a command line. The above command will therefore work as-is in a terminal (with the '!' removed).

Some of these downloaded models are already in the IR format, while others require the Model Optimizer. In this demo, we will be using the **emotions-recognition-retail-0003** model, which is already in IR format. This model can be downloaded with the following commands.

In [5]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name emotions-recognition-retail-0003 -o models/
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name emotions-recognition-retail-0003-fp16 -o models/


###############|| Downloading topologies ||###############

... 100%, 19 KB, 167 KB/s, 0 seconds passed

... 100%, 9697 KB, 28195 KB/s, 0 seconds passed


###############|| Post processing ||###############


###############|| Downloading topologies ||###############

... 100%, 19 KB, 42608 KB/s, 0 seconds passed

... 100%, 4848 KB, 28276 KB/s, 0 seconds passed


###############|| Post processing ||###############



The input arguments are as follows:
* --name : name of the model you want to download. It should be one of the models listed in the previous cell
* -o : output directory. If this directory does not exist, it will be created for you.

There are more arguments to this script and you can get the full list using the `-h` option.


With the `-o` option set as above, this command downloads the model into the directory `models`, with the model files (.xml and .bin) located under `Retail/object_attributes/emotions_recognition/0003/dldt`.

In the above case, the location is `models/Retail/object_attributes/emotions_recognition/0003/dldt/`, relative to the directory the notebook runs in.
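As a small illustration, the expected location of the `.xml` file for either precision can be computed from the output directory. This is a sketch only: `model_xml_path` is a hypothetical helper, and the sub-directory and `-fp16` suffix are taken from the download commands in this notebook rather than from any downloader API.

```python
from pathlib import Path

# Sub-directory and base name used for this model in this notebook.
MODEL_SUBDIR = Path("Retail/object_attributes/emotions_recognition/0003/dldt")
MODEL_NAME = "emotions-recognition-retail-0003"


def model_xml_path(output_dir, precision="FP32"):
    """Return the expected .xml path for the FP32 or FP16 variant."""
    if precision not in ("FP32", "FP16"):
        raise ValueError(f"unsupported precision: {precision}")
    suffix = "-fp16" if precision == "FP16" else ""
    return Path(output_dir) / MODEL_SUBDIR / f"{MODEL_NAME}{suffix}.xml"


print(model_xml_path("models", "FP16").as_posix())
```

Checking `model_xml_path("models").exists()` before submitting a job is a cheap way to catch a failed download.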


### Initialize OpenVINO env

In [7]:
%%bash
source /opt/intel/openvino/bin/setupvars.sh

[setupvars.sh] OpenVINO environment initialized


#### Running the application with the -h option yields the following usage message:

In [23]:
! $HOME/inference_engine_samples_build/intel64/Release/benchmark_app -h

[ INFO ] InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2

[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters

benchmark_app [OPTION]
Options:

    -h                        Print a usage message
    -i "<path>"               Required. Path to a folder with images or to image files.
    -m "<path>"               Required. Path to an .xml file with a trained model.
    -pp "<path>"              Optional. Path to a plugin folder.
    -d "<device>"             Optional. Specify a target device to infer on: CPU, GPU, FPGA, HDDL or MYRIAD. Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -l "<absolute_path>"      Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
          Or
    -c

### Creating job file

In [28]:
%%writefile benchmark_app.sh
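# Print each command as it runs, to make the job log easier to follow.
set -x

# Positional arguments passed through `qsub -F`.
INPUT_FILE=$1   # path to the input image
FP_MODEL=$2     # model precision: FP32 or FP16
DEVICE=$3       # target device, e.g. CPU, GPU or HDDL
API=$4          # inference mode: sync or async

cd "$PBS_O_WORKDIR"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:~/inference_engine_samples_build/intel64/Release/

# The FP16 model files carry a "-fp16" suffix; FP32 files have none.
FPEXT=''
if [ "$FP_MODEL" = "FP16" ]; then
  FPEXT='-fp16'
fi

echo "$INPUT_FILE"

"$HOME"/inference_engine_samples_build/intel64/Release/benchmark_app \
-m "models/Retail/object_attributes/emotions_recognition/0003/dldt/emotions-recognition-retail-0003${FPEXT}.xml" \
-i "$INPUT_FILE" \
-d "$DEVICE" \
-api "$API"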

Overwriting benchmark_app.sh
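The mapping from the four job arguments to the `benchmark_app` command line can also be written out in Python, which may help when checking what a job will actually run. This is a sketch with a hypothetical `build_command` helper; it mirrors the job file above and assumes that `sync` and `async` are the only values accepted by `-api`.

```python
import os

BENCHMARK_APP = os.path.expanduser(
    "~/inference_engine_samples_build/intel64/Release/benchmark_app")
MODEL_DIR = "models/Retail/object_attributes/emotions_recognition/0003/dldt"


def build_command(input_file, fp_model, device, api):
    """Mirror benchmark_app.sh: map the four job arguments to an argv list."""
    if api not in ("sync", "async"):  # assumed set of valid modes
        raise ValueError(f"api must be 'sync' or 'async', got {api!r}")
    fpext = "-fp16" if fp_model == "FP16" else ""
    model = f"{MODEL_DIR}/emotions-recognition-retail-0003{fpext}.xml"
    return [BENCHMARK_APP, "-m", model, "-i", input_file,
            "-d", device, "-api", api]


print(" ".join(build_command("emotions.jpg", "FP16", "GPU", "async")))
```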


In [22]:
!echo $HOME

/home/u26212


In [12]:
!pbsnodes | grep compnode | sort | uniq -c

     35      properties = idc001skl,compnode,iei,tank-870,intel-core,i5-6500te,skylake,intel-hd-530,ram8gb,1gbe
     15      properties = idc002mx8,compnode,iei,tank-870,intel-core,i5-6500te,skylake,intel-hd-530,ram8gb,net1gbe,hddl-r,iei-mustang-v100-mx8
     18      properties = idc003a10,compnode,iei,tank-870,intel-core,i5-6500te,skylake,intel-hd-530,ram8gb,net1gbe,hddl-f,iei-mustang-f100-a10
     23      properties = idc004nc2,compnode,iei,tank-870,intel-core,i5-6500te,skylake,intel-hd-530,ram8gb,net1gbe,ncs,intel-ncs2
      5      properties = idc006kbl,compnode,iei,tank-870,intel-core,i5-7500t,kaby-lake,intel-hd-630,ram8gb,net1gbe
     16      properties = idc007xv5,compnode,iei,tank-870,intel-xeon,e3-1268l-v5,skylake,intel-hd-p530,ram32gb,net1gbe
     15      properties = idc008u2g,compnode,up-squared,grove,intel-atom,e3950,apollo-lake,intel-hd-505,ram4gb,net1gbe,ncs,intel-ncs2
      1      properties = idc009jkl,compnode,jwip,intel-core,i5-7500,kaby-lake,intel-hd-630,ram8

Here, the properties describe the node, and the number on the left is the number of available nodes of that architecture.
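The summary above can be parsed to count how many nodes offer a given property. The following is a minimal sketch that assumes the `count  properties = a,b,c` line layout shown in the output; `parse_node_summary` and `count_nodes_with` are hypothetical helpers, and the sample text is copied from two of the lines above.

```python
def parse_node_summary(text):
    """Parse `uniq -c` style lines into (count, [properties]) tuples."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "properties =" not in line:
            continue
        count_part, props_part = line.split("properties =", 1)
        properties = [p.strip() for p in props_part.split(",") if p.strip()]
        nodes.append((int(count_part.strip()), properties))
    return nodes


def count_nodes_with(nodes, prop):
    """Total number of nodes whose property list contains `prop`."""
    return sum(count for count, props in nodes if prop in props)


sample = """
     35      properties = idc001skl,compnode,iei,tank-870,intel-core,i5-6500te,skylake,intel-hd-530,ram8gb,1gbe
      5      properties = idc006kbl,compnode,iei,tank-870,intel-core,i5-7500t,kaby-lake,intel-hd-630,ram8gb,net1gbe
"""
nodes = parse_node_summary(sample)
print(count_nodes_with(nodes, "tank-870"))  # prints 40
```

The property names are the same strings used in the `-l nodes=1:...` resource requests in the job submission cells.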

**Note**: If you want to use your own image, change the environment variable 'IMAGE' in the following cell to the full path of your uploaded image.


In [18]:
os.environ["IMAGE"] = os.getcwd()+"/emotions.jpg"

/home/u26212/temp/adi/iot-devcloud/benchmark_app/emotions.jpg


### Job queue submission

Each of the cells below will submit a job to different edge compute nodes.
The output of the cell is the `JobID` of your job, which you can use to track progress of a job.

**Note**: You can submit all jobs at once or one at a time.

After submission, it will go into a queue and run as soon as the requested compute resources become available. 
(tip: **shift+enter** will run the cell and automatically move you to the next cell. So you can hit **shift+enter** multiple times to quickly run multiple cells)

#### The log files with output are stored in the location below.
~/inference_engine_samples_build/intel64/Release

#### By default, the application runs for 60 seconds before reporting the latency and throughput values, so the output logs are generated about 60 seconds after a job starts.
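The submission cells in this notebook wait for a file named `<job name>.o<numeric job ID>` to appear in the working directory. A polling helper with a timeout might look like the sketch below. The names `output_file_for` and `wait_for_output` and the default timeout are hypothetical; the file-name convention is inferred from the cells in this notebook.

```python
import os
import time


def output_file_for(job_name, job_id):
    """Build '<job name>.o<numeric id>' from an ID such as '56780.c003'."""
    return f"{job_name}.o{job_id.split('.')[0]}"


def wait_for_output(job_name, job_id, timeout_s=600.0, poll_s=1.0):
    """Poll until the job's output file exists; return its text, or None on timeout."""
    path = output_file_for(job_name, job_id)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            with open(path) as handle:
                return handle.read()
        time.sleep(poll_s)  # avoid spinning a CPU core while the job is queued
    return None
```

For example, `wait_for_output("benchmark_gpu", job_id_gpu[0])` would return the log text once the GPU job has finished, or `None` if nothing appears within the timeout.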

#### Submitting to an edge compute node with an Intel® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core™ i5-6500TE processor</a>. The inference workload will run on the CPU.

In [None]:
print("Submitting job to Intel CPU...")
job_id_cpu = !qsub benchmark_app.sh -l nodes=1:idc001skl:intel-hd-530 -F "$IMAGE FP16 CPU async" -N benchmark_cpu
print(job_id_cpu[0]) 
while True:
    var=job_id_cpu[0].split(".")
    file="benchmark_cpu.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

In [None]:
print("Submitting job to Intel CPU... core-FP32")
job_id_cpu32 = !qsub benchmark_app.sh -l nodes=1:idc001skl:intel-hd-530 -F "$IMAGE FP32 CPU async" -N benchmark_cpu32
print(job_id_cpu32[0]) 
while True:
    var=job_id_cpu32[0].split(".")
    file="benchmark_cpu32.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

#### Submitting to an edge compute node with Intel® Xeon® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel® 
    Xeon® Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
print("Submitting job to Intel Xeon CPU...")
job_id_xeon = !qsub benchmark_app.sh -l nodes=1:tank-870:e3-1268l-v5 -F "$IMAGE FP32 CPU async" -N benchmark_xeon
print(job_id_xeon[0]) 
while True:
    var=job_id_xeon[0].split(".")
    file="benchmark_xeon.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

In [None]:
print("Submitting job to Intel Xeon CPU FP16...")
job_id_xeon16 = !qsub benchmark_app.sh -l nodes=1:tank-870:e3-1268l-v5 -F "$IMAGE FP16 CPU async" -N benchmark_xeon16
print(job_id_xeon16[0]) 
while True:
    var=job_id_xeon16[0].split(".")
    file="benchmark_xeon16.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

#### Submitting to an edge compute node with Intel® Core CPU and using the onboard Intel® GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 card integrated with the CPU.

In [29]:
print("Submitting job to Intel GPU...")
job_id_gpu = !qsub benchmark_app.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP16 GPU async" -N benchmark_gpu
print(job_id_gpu[0]) 
while True:
    var=job_id_gpu[0].split(".")
    file="benchmark_gpu.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

Submitting job to Intel GPU...
56780.c003

########################################################################
#      Date:           Mon Sep 30 01:54:36 PDT 2019
#    Job ID:           56780.c003
#      User:           u26212
# Resources:           neednodes=1:tank-870:i5-6500te,nodes=1:tank-870:i5-6500te,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
/home/u26212/temp/adi/iot-devcloud/benchmark_app/emotions.jpg
[ INFO ] InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2

[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /home/u26212/temp/adi/iot-devcloud/benchmark_app/emotions.jpg
Progress: [....................] 100.00% done

[Step 2/8] Loading plugin
[ INFO ] 
	API version ............ 1.6
	Build .................. 22443
	Desc

In [None]:
print("Submitting job to Intel GPU...")
job_id_gpu32 = !qsub benchmark_app.sh -l nodes=1:tank-870:i5-6500te -F "$IMAGE FP32 GPU async" -N benchmark_gpu32
print(job_id_gpu32[0]) 
while True:
    var=job_id_gpu32[0].split(".")
    file="benchmark_gpu32.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

#### Submitting to an edge compute node with IEI Mustang-V100-MX8 (Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU))
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on an <a 
    href="https://www.ieiworld.com/mustang-v100/en/">IEI Mustang-V100-MX8</a> accelerator installed in this node.

In [None]:
print("Submitting job to Intel VPU...")
#Submit job to the queue
job_id_hddl = !qsub benchmark_app.sh -l nodes=1:tank-870:i5-6500te:iei-mustang-v100-mx8 -F "$IMAGE FP16 HDDL async" -N benchmark_hddl
print(job_id_hddl[0])
while True:
    var=job_id_hddl[0].split(".")
    file="benchmark_hddl.o"+var[0]
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)  # poll once per second instead of busy-waiting

### Check if the jobs are done

To check on the jobs that were submitted, use the `qstat` command.

We have created a custom Jupyter widget that shows live `qstat` updates.
Run the following cell to bring it up.

In [None]:
liveQstat()

You should see the jobs you have submitted (referenced by `Job ID` that gets displayed right after you submit the job).
There should also be an extra job in the queue "jupyterhub": this job runs your current Jupyter Notebook session.

The 'S' column shows the current status. 
- If it is in Q state, it is in the queue waiting for available resources. 
- If it is in R state, it is running. 
- If the job is no longer listed, it means it is completed.
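The three cases above can be expressed as a small lookup. In this sketch, `describe_job` is a hypothetical helper, and `listed_states` is assumed to be a dictionary you build yourself from the job IDs and 'S' column values shown by `qstat`; the code does not call `qstat`.

```python
STATE_DESCRIPTIONS = {
    "Q": "queued, waiting for available resources",
    "R": "running",
}


def describe_job(job_id, listed_states):
    """Map a job ID to a description, given {job_id: 'S' column value}."""
    if job_id not in listed_states:
        return "completed (no longer listed)"
    state = listed_states[job_id]
    return STATE_DESCRIPTIONS.get(state, f"other state: {state}")


# Example with made-up queue contents.
listed = {"56780.c003": "R", "56781.c003": "Q"}
for job in ("56780.c003", "56781.c003", "56779.c003"):
    print(job, "->", describe_job(job, listed))
```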

**Note**: Time spent in the queue depends on the number of users accessing the edge nodes. Once these jobs begin to run, they should take from 1 to 5 minutes to complete.