

# Optimizing Computer Vision Applications

This tutorial shows some techniques to get better performance for computer vision applications with the Intel® Distribution of OpenVINO™ toolkit.



## 1. Set up the environment variables, download model files, and import dependencies

In [1]:
from IPython.display import HTML
import matplotlib.pyplot as plt
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent))
from demoTools.demoutils import *
from openvino.inference_engine import IEPlugin, IENetwork
import cv2
# For labeling the image
from out_process import placeBoxes

In [2]:
!/opt/intel/openvino/bin/setupvars.sh

[setupvars.sh] OpenVINO environment initialized


In [4]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o models
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name ssd300 -o models
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name ssd512 -o models



###############|| Downloading topologies ||###############

... 100%, 28 KB, 64593 KB/s, 0 seconds passed

... 100%, 22605 KB, 22197 KB/s, 1 seconds passed


###############|| Post processing ||###############


###############|| Downloading topologies ||###############

... 100%, 95497 KB, 25761 KB/s, 3 seconds passed


###############|| Post processing ||###############


###############|| Downloading topologies ||###############

... 100%, 98624 KB, 20327 KB/s, 4 seconds passed


###############|| Post processing ||###############



In [7]:
!mkdir -p models/object_detection/SSD512/{FP16,FP32} 
!mkdir -p models/object_detection/SSD300/{FP16,FP32}

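### Run the Model Optimizer on the models to get IR files

The output directories for the SSD IR files were created in the previous cell. Now run the Model Optimizer to convert each Caffe model into Intermediate Representation (IR) files in both FP32 and FP16 precision.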

In [8]:
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o models/object_detection/common/mobilenet-ssd/FP32/ --scale 256 --mean_values [127,127,127]
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o models/object_detection/common/mobilenet-ssd/FP16/ --scale 256 --mean_values [127,127,127] --data_type FP16
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/ssd/300/caffe/ssd300.caffemodel -o models/object_detection/SSD300/FP32/
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/ssd/300/caffe/ssd300.caffemodel -o models/object_detection/SSD300/FP16/ --data_type FP16
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/ssd/512/caffe/ssd512.caffemodel -o models/object_detection/SSD512/FP32/
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/ssd/512/caffe/ssd512.caffemodel -o models/object_detection/SSD512/FP16/ --data_type FP16


Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel
	- Path for generated IR: 	/home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/common/mobilenet-ssd/FP32/
	- IR output name: 	mobilenet-ssd
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	[127,127,127]
	- Scale values: 	Not specified
	- Scale factor: 	256.0
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Enable resne


[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/SSD512/FP32/ssd512.xml
[ SUCCESS ] BIN file: /home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/SSD512/FP32/ssd512.bin
[ SUCCESS ] Total execution time: 10.59 seconds. 
Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/common/ssd/512/caffe/ssd512.caffemodel
	- Path for generated IR: 	/home/u26212/30/iot-devcloud/smart_video_workshop/optimization-tools-and-techniques/Python/models/object_detection/SSD512/FP16/
	- IR output name: 	ssd512
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the mod
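The six Model Optimizer calls above differ only in the input model, the output directory, and the precision. A minimal sketch, assuming the same directory layout and flags used in those cells, builds the argument lists in one place so that adding a model or a precision is a one-line change. `MODELS` and `mo_commands` are hypothetical names introduced here, not part of OpenVINO.

```python
# Paths and flags copied from the Model Optimizer cells above; adjust if your layout differs.
MO_SCRIPT = "/opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py"

# Hypothetical table: model name -> (input .caffemodel, output base dir, extra MO flags)
MODELS = {
    "mobilenet-ssd": (
        "models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel",
        "models/object_detection/common/mobilenet-ssd",
        ["--scale", "256", "--mean_values", "[127,127,127]"],
    ),
    "ssd300": (
        "models/object_detection/common/ssd/300/caffe/ssd300.caffemodel",
        "models/object_detection/SSD300",
        [],
    ),
    "ssd512": (
        "models/object_detection/common/ssd/512/caffe/ssd512.caffemodel",
        "models/object_detection/SSD512",
        [],
    ),
}


def mo_commands(precisions=("FP32", "FP16")):
    """Build one Model Optimizer argument list per (model, precision) pair."""
    commands = []
    for caffemodel, out_base, extra in MODELS.values():
        for precision in precisions:
            cmd = ["python3", MO_SCRIPT,
                   "--input_model", caffemodel,
                   "-o", f"{out_base}/{precision}/",
                   *extra]
            # FP32 is what the Model Optimizer produces by default, so only
            # other precisions need an explicit --data_type flag.
            if precision != "FP32":
                cmd += ["--data_type", precision]
            commands.append(cmd)
    return commands
```

To actually run the conversions, pass each list to `subprocess.run(cmd, check=True)`; with `check=True`, a failed conversion raises `CalledProcessError` instead of being silently skipped.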


## 2. Tune parameters - set batch size

In this section, we will see how changes in the batch size affect the performance. 
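With batch size N, each inference request carries N frames, so fixed per-request overhead is spread over more frames, at the cost of more memory and a longer wait per request. The sketch below shows one way frames could be packed into a single N×C×H×W blob with NumPy. It assumes a 300×300, 3-channel input (as in the `inputdims` printed below) and frames that are already resized, for example with `cv2.resize`. `make_batch` is a hypothetical helper; we have not checked how `tutorial1_batch.py` does this internally. In the 2019 Inference Engine Python API, the batch size of a network is typically changed by setting `net.batch_size = N` on the `IENetwork` before loading it to the plugin.

```python
import numpy as np


def make_batch(frames, height=300, width=300):
    """Pack a list of HxWxC frames into one NxCxHxW float32 input blob.

    Assumes every frame has already been resized to (height, width).
    """
    batch = np.empty((len(frames), 3, height, width), dtype=np.float32)
    for i, frame in enumerate(frames):
        if frame.shape[:2] != (height, width):
            raise ValueError(f"frame {i} has shape {frame.shape}, expected {(height, width)}")
        # HWC (OpenCV layout) -> CHW (network input layout)
        batch[i] = frame.transpose(2, 0, 1)
    return batch
```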

### Let us first look at the performance numbers for batch size 1.
The Model Optimizer keeps the batch size of the original model ("Batch: Not specified, inherited from the model" in its log), which is 1 here.

In [23]:
! python3 tutorial1_batch.py -i cars_1900.mp4 -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml -l $HOME/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so

[ INFO ] Initializing plugin for CPU device...
[ INFO ] Reading IR...
[ INFO ] Loading IR to the plugin...
inputdims= 300 300 3 1
outputdims= 7 100 1 1
SSD Mode
[ INFO ] Starting inference in async mode...
framenum:256

Preprocess: 22.98 ms/frame
Inference: 8.37 ms/frame 
Postprocess: 0.61 ms/frame



    
### Change the batch size to 2 and run the object detection example with the new batch size

In [9]:
! python3 tutorial1_batch.py -i cars_1900.mp4 -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml -l $HOME/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so  -b 2 

[ INFO ] Initializing plugin for CPU device...
[ INFO ] Reading IR...
[ INFO ] Loading IR to the plugin...
inputdims= 300 300 3 2
outputdims= 7 100 1 2
SSD Mode
[ INFO ] Starting inference in async mode...
framenum:256

Preprocess: 23.60 ms/frame
Inference: 6.58 ms/frame 
Postprocess: 0.77 ms/frame
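Inference time per frame drops from 8.37 ms to 6.58 ms (about 21% less, a 1.27× speedup), while preprocessing stays at roughly 23 ms/frame and still dominates. A small sketch that turns the numbers copied from the two runs into totals; `fps_if_serial` assumes the three stages run back to back, which the async mode used by the script may not do, so treat it as a rough lower bound on frame rate rather than a measurement.

```python
# Per-frame timings (ms) copied from the batch-size-1 and batch-size-2 runs above.
runs = {
    1: {"Preprocess": 22.98, "Inference": 8.37, "Postprocess": 0.61},
    2: {"Preprocess": 23.60, "Inference": 6.58, "Postprocess": 0.77},
}


def summarize(runs, baseline=1):
    """Return total ms/frame, serial-pipeline fps, and inference speedup vs. the baseline batch size."""
    base_inf = runs[baseline]["Inference"]
    summary = {}
    for batch, t in runs.items():
        total = sum(t.values())
        summary[batch] = {
            "total_ms": round(total, 2),
            "fps_if_serial": round(1000.0 / total, 1),
            "inference_speedup": round(base_inf / t["Inference"], 2),
        }
    return summary


for batch, s in summarize(runs).items():
    print(batch, s)
```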



    
### Run the example for different batch sizes

Try batch sizes of 8, 16, 32, 64, 128, and so on, and compare the inference time per frame with the results above.

## 3. Pick the right model based on application and hardware

Use or train a model with the right performance/accuracy trade-off. Performance differences between models can be larger than anything you gain by optimizing the inference application itself. Run the various SSD models from the Model Downloader in the car detection example used in the initial tutorial and observe the performance. We will run these tests on different hardware accelerators to see how application performance depends on both the model and the hardware.

All the models were already converted to IR by the Model Optimizer in step 1, so they are ready to use.

### Set environment variables

In [10]:
! /opt/intel/openvino/bin/setupvars.sh

[setupvars.sh] OpenVINO environment initialized


In [11]:
os.environ["VIDEO"] = "cars_1900.mp4"

### Create Job Script 

We will run the workload on several of DevCloud's edge compute nodes. We send work to the edge compute nodes by submitting jobs to a queue. For each job, we specify the type of edge compute server that must be allocated for the job.

To pass the specific variables to the Python code, we will use the following arguments:

* `-m`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;location of the optimized model's XML file
* `-i`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;location of the input video
* `-r`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output directory
* `-d`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;hardware device type (CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
* `-n`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;number of infer requests
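The job-submission cells in the rest of this section differ only in the device, model, and results directory. A sketch of a helper that assembles the same `qsub` call from a small table, assuming the node properties, precisions, and request counts used in this notebook. `DEVICES`, `MODEL_XML`, and `qsub_command` are hypothetical names, and the results layout `results/<device>/<model>` differs slightly from the directories used in the cells.

```python
# Node properties, precisions and request counts taken from the submission cells below.
DEVICES = {
    "CPU":    {"node": "idc001skl:i5-6500te",    "precision": "FP32", "nireq": 2},
    "GPU":    {"node": "idc001skl:intel-hd-530", "precision": "FP32", "nireq": 4},
    "MYRIAD": {"node": "idc004nc2:intel-ncs2",   "precision": "FP16", "nireq": 8},
}

MODEL_XML = {
    "mobilenet": "models/object_detection/common/mobilenet-ssd/{p}/mobilenet-ssd.xml",
    "ssd300":    "models/object_detection/SSD300/{p}/ssd300.xml",
    "ssd512":    "models/object_detection/SSD512/{p}/ssd512.xml",
}


def qsub_command(device, model, video, nireq=None, results_root="results", job_name=None):
    """Build the qsub argument list for one (device, model) job.

    The -r/-d/-i/-m/-n options inside -F are the ones object_detection.sh parses.
    """
    cfg = DEVICES[device]
    xml = MODEL_XML[model].format(p=cfg["precision"])
    n = nireq if nireq is not None else cfg["nireq"]
    job_args = (f"-r {results_root}/{device}/{model} -d {device} -i {video} "
                f"-m {xml} -n {n}")
    return ["qsub", "object_detection.sh",
            "-l", f"nodes=1:{cfg['node']}",
            "-F", job_args,
            "-N", job_name or f"obj_det_{device.lower()}"]
```

`subprocess.run(qsub_command("GPU", "ssd300", os.environ["VIDEO"]), capture_output=True, text=True).stdout` would submit the job and return the job ID that `qsub` prints. The notebook uses 2 inference requests for mobilenet-ssd on the NCS 2; pass `nireq=2` to match.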

The job file will be executed directly on the edge compute node.

In [27]:
%%writefile object_detection.sh

ME=`basename $0`

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

# Parse the arguments passed through "qsub -F":
#   -d device, -i input video, -r results directory, -n number of infer requests, -m model XML
while getopts 'd:f:i:r:n:m:x:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    m)
        MODEL_PATH=$OPTARG
        echo "$ME is using model $OPTARG"
      ;;
    esac
done

# The object detection script writes its output files inside a directory; make sure it exists.
RESULTS_PATH="${RESULTS_BASE}"
mkdir -p "$RESULTS_PATH"
echo "$ME is using results path $RESULTS_PATH"

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and bitstream programming for edge compute nodes with FPGAs
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/altera/aocl-pro-rte/aclrte-linux64/
    source /opt/fpga_support_files/setup_env.sh
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_MobileNet_Clamp.aocx
fi
    
# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
python3 tutorial1.py                        -m "$MODEL_PATH" \
                                            -i "$INPUT_FILE" \
                                            -o "$RESULTS_PATH" \
                                            -d "$DEVICE" \
                                            -nireq "$NUM_INFER_REQS" \
                                            -ce /opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_avx2.so

g++ -std=c++14 ROI_writer.cpp -o ROI_writer  -lopencv_core -lopencv_videoio -lopencv_imgproc -lopencv_highgui  -fopenmp -I/opt/intel/openvino/opencv/include/ -L/opt/intel/openvino/opencv/lib/
# Rendering the output video
SKIPFRAME=1
RESOLUTION=0.5
./ROI_writer $INPUT_FILE $RESULTS_PATH $SKIPFRAME $RESOLUTION

Overwriting object_detection.sh


    
### Run the object detection example with different models on different devices.

To keep the code simple and focus on the performance numbers, rendering the video with bounding boxes around detected objects is separated from the object detection example (tutorial1.py). The inference time differences between scenarios are shown by the progress bars after each job runs.


### a) CPU

#### - Inferencing using **mobilenet-ssd** model

In [28]:
#Submit job to the queue
job_id_core = !qsub object_detection.sh -l nodes=1:idc001skl:i5-6500te -F "-r results/Core/mobilenet -d CPU -i $VIDEO -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml -n 2" -N obj_det_core
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/mobilenet', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Core/mobilenet', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Core/mobilenet', 'post_progress.txt', "Rendering", 0, 100)

48054.c003



#### - Inferencing using **ssd300** model

In [29]:
#Submit job to the queue
job_id_core = !qsub object_detection.sh -l nodes=1:idc001skl:i5-6500te -F "-r results/Core/ssd300 -d CPU -i $VIDEO -n 2 -m models/object_detection/SSD300/FP32/ssd300.xml" -N obj_det_core
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd300', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Core/ssd300', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd300', 'post_progress.txt', "Rendering", 0, 100)

48055.c003



#### - Inferencing using **ssd512** model

In [37]:
#Submit job to the queue
job_id_core = !qsub object_detection.sh -l nodes=1:idc001skl:i5-6500te -F "-r results/Core/ssd512 -d CPU -i $VIDEO -n 2 -m models/object_detection/SSD512/FP32/ssd512.xml" -N obj_det_core
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd512', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Core/ssd512', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd512', 'post_progress.txt', "Rendering", 0, 100)

48063.c003




    
### b) GPU

#### - Inferencing using **mobilenet-ssd** model

In [38]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU/mobilenet -d GPU -i $VIDEO -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU/mobilenet', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU/mobilenet', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/mobilenet', 'post_progress.txt', "Rendering", 0, 100)

48064.c003



#### - Inferencing using **ssd300** model

In [39]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU/ssd300 -d GPU -i $VIDEO -m models/object_detection/SSD300/FP32/ssd300.xml -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU/ssd300', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU/ssd300', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/ssd300', 'post_progress.txt', "Rendering", 0, 100)

48065.c003



#### - Inferencing using **ssd512** model

In [40]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU/ssd512 -d GPU -i $VIDEO -m models/object_detection/SSD512/FP32/ssd512.xml -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU/ssd512', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU/ssd512', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/ssd512', 'post_progress.txt', "Rendering", 0, 100)

48066.c003




    
### c) Intel® Neural Compute Stick 2

#### - Inferencing using **mobilenet-ssd** model

In [35]:
#Submit job to the queue
job_id_ncs2 = !qsub object_detection.sh -l nodes=1:idc004nc2:intel-ncs2 -F "-r results/ncs2/mobilenet -d MYRIAD -i $VIDEO -m models/object_detection/common/mobilenet-ssd/FP16/mobilenet-ssd.xml -n 2" -N obj_det_ncs2
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('results/ncs2/mobilenet', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/ncs2/mobilenet', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/ncs2/mobilenet', 'post_progress.txt', "Rendering", 0, 100)

48061.c003



#### - Inferencing using **ssd300** model

In [12]:
#Submit job to the queue
job_id_ncs2 = !qsub object_detection.sh -l nodes=1:idc004nc2:intel-ncs2 -F "-r results/ncs2/ssd300 -d MYRIAD -i $VIDEO -m models/object_detection/SSD300/FP16/ssd300.xml -n 8" -N obj_det_ncs2
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('results/ncs2/ssd300', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/ncs2/ssd300', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/ncs2/ssd300', 'post_progress.txt', "Rendering", 0, 100)

48085.c003



#### - Inferencing using **ssd512** model

In [36]:
#Submit job to the queue
job_id_ncs2 = !qsub object_detection.sh -l nodes=1:idc004nc2:intel-ncs2 -F "-r results/ncs2/ssd512 -d MYRIAD -i $VIDEO -m models/object_detection/SSD512/FP16/ssd512.xml -n 8" -N obj_det_ncs2
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('results/ncs2/ssd512', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/ncs2/ssd512', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/ncs2/ssd512', 'post_progress.txt', "Rendering", 0, 100)

48062.c003



## 4. Use the right data type for your target hardware and accuracy needs

In this section, we will consider an example running on a GPU. On Intel integrated GPUs, FP16 typically runs faster than FP32, partly because it halves the memory traffic for weights and activations. We will run the object detection example with the mobilenet-ssd model in both FP32 and FP16 and observe the performance difference.
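The effect of `--data_type FP16` can be illustrated without OpenVINO. A minimal NumPy sketch with random stand-in weights (not the real model's) shows that half precision halves storage and memory traffic while rounding each value to about 11 significant bits. Whether that rounding affects detection accuracy depends on the model, which is why accuracy should be checked alongside speed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for one convolution layer's weights (64 filters, 32 channels, 3x3)
weights_fp32 = rng.normal(0.0, 0.05, size=(64, 32, 3, 3)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# 4 bytes vs 2 bytes per value
print(weights_fp32.nbytes, weights_fp16.nbytes)  # prints 73728 36864

# FP16 has an 11-bit significand: for values in its normal range the rounding
# error is at most |w| * 2**-11, so the worst case is bounded by max|w| * 2**-11.
abs_err = np.abs(weights_fp16.astype(np.float32) - weights_fp32)
print(f"max absolute rounding error: {abs_err.max():.2e}")
print(f"bound from largest weight:   {np.abs(weights_fp32).max() * 2**-11:.2e}")
```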

In [41]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU/3/FP32 -d GPU -i $VIDEO -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU/3/FP32', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU/3/FP32', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/3/FP32', 'post_progress.txt', "Rendering", 0, 100)

48067.c003



In [42]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU/3/FP16 -d GPU -i $VIDEO -m models/object_detection/common/mobilenet-ssd/FP16/mobilenet-ssd.xml -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU/3/FP16', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU/3/FP16', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/3/FP16', 'post_progress.txt', "Rendering", 0, 100)

48068.c003




    
Compare the inference progress of the two jobs: on this GPU, the FP16 model should give better performance than the FP32 model.

## 5. Use async

The async API can improve the overall frame rate of the application. While the accelerator is busy running inference, the application can continue encoding, decoding, or post-processing inference results on the host. For this section, we will use the object_detection_demo_ssd_async sample, which makes asynchronous requests to the Inference Engine. This hides the inference latency behind other work, so the overall frame rate is determined by MAX(detection time, input capture time) rather than SUM(detection time, input capture time).
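The MAX-versus-SUM argument can be seen in a toy model that uses no OpenVINO at all: two hypothetical stages, `capture` and `detect`, are simulated with `time.sleep`, and a single worker thread plays the role of the accelerator. This sketches the pipelining idea only; in the C++ async demo, the detection step is an infer request started with `StartAsync()` and collected with `Wait()`, not a thread pool. With n frames, the sync loop takes roughly n·(capture + detect), while the pipelined loop takes roughly capture + (n-1)·max(capture, detect) + detect.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CAPTURE_S = 0.01  # hypothetical time to read and preprocess one frame
DETECT_S = 0.02   # hypothetical inference time per frame


def capture(i):
    time.sleep(CAPTURE_S)
    return i


def detect(frame):
    time.sleep(DETECT_S)
    return frame


def run_sync(n):
    """Capture a frame, then detect on it, strictly one after the other."""
    start = time.perf_counter()
    for i in range(n):
        detect(capture(i))
    return time.perf_counter() - start


def run_async(n):
    """Detect on frame i-1 in the background while frame i is being captured."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        frame = capture(0)
        for i in range(1, n + 1):
            request = accelerator.submit(detect, frame)  # analogous to StartAsync()
            if i < n:
                frame = capture(i)                       # overlaps with detect()
            request.result()                             # analogous to Wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20
    print(f"sync:  {run_sync(n):.3f} s")
    print(f"async: {run_async(n):.3f} s")
```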



    
### a) Run the async example

In [8]:
!./object_detection_demo_ssd_async -i cars_1900.mp4 -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml

InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2
[ INFO ] Parsing input parameters
[ INFO ] Reading input
[ INFO ] Loading plugin

	API version ............ 1.6
	Build .................. 22443
	Description ....... MKLDNNPlugin
[ INFO ] Loading network files
[ INFO ] Batch size is forced to  1.
[ INFO ] Checking that the inputs are as the demo expects
[ INFO ] Checking that the outputs are as the demo expects
[ INFO ] Loading model to the plugin
[ INFO ] Start inference 
To close the application, press 'CTRL+C' or any key with focus on the output window
Encountered last video frame. Exiting..
Total Inference time: 286824
[ INFO ] Execution successful


Compare the frames per second (fps) reported in the sync and async videos below. More frames are processed per second in async mode than in sync mode.

There is an important performance caveat, though: tasks that run in parallel should avoid oversubscribing shared computing resources. For example, if inference runs on the FPGA and the CPU is essentially idle, it makes sense to run other tasks on the CPU in parallel; if inference already keeps the CPU busy, extra parallel CPU work competes with it instead.

In [1]:
videoHTML('Async Mode :',
           ['results/Core/output_async.mp4']
        )


In [None]:
!./object_detection_demo_ssd_sync -i cars_1900.mp4 -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml

InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2
[ INFO ] Parsing input parameters
[ INFO ] Reading input
[ INFO ] Loading plugin

	API version ............ 1.6
	Build .................. 22443
	Description ....... MKLDNNPlugin
[ INFO ] Loading network files
[ INFO ] Batch size is forced to  1.
[ INFO ] Checking that the inputs are as the demo expects
[ INFO ] Checking that the outputs are as the demo expects
[ INFO ] Loading model to the plugin
[ INFO ] Start inference 
To close the application, press 'CTRL+C' or any key with focus on the output window
Encountered last video frame. Exiting..
Total Inference time: 287257


In [6]:
videoHTML('Sync Mode :',
           ['results/Core/output_sync.mp4']
        )

Both the sync and async videos show the inference timing. In the async video, the timing is lower and the frames per second are higher than in the sync video.