# Optimizing Computer Vision Applications

This tutorial shows some techniques to get better performance for computer vision applications with the Intel® Distribution of OpenVINO™ toolkit.


## 1. Set up the environment variables, download model files, and import dependencies

In [None]:
from IPython.display import HTML
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent.parent))
from demoTools.demoutils import *

In [None]:
!/opt/intel/openvino/bin/setupvars.sh

In [None]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o models
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name ssd300 -o models
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name ssd512 -o models

In [None]:
!mkdir -p models/SSD512/{FP16,FP32}
!mkdir -p models/SSD300/{FP16,FP32}

### Run Model Optimizer on the models to get IR files

With the output directories in place, we run the Model Optimizer on each downloaded model to generate the IR files in both FP32 and FP16 precision.

In [None]:
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/mobilenet-ssd/mobilenet-ssd.caffemodel -o models/mobilenet-ssd/FP32/ --scale 256 --mean_values [127,127,127]
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/mobilenet-ssd/mobilenet-ssd.caffemodel -o models/mobilenet-ssd/FP16/ --scale 256 --mean_values [127,127,127] --data_type FP16
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.caffemodel --input_proto models/public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/deploy.prototxt  -o models/SSD300/FP32/
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.caffemodel --input_proto models/public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/deploy.prototxt -o models/SSD300/FP16/ --data_type FP16
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/ssd512/models/VGGNet/VOC0712Plus/SSD_512x512/VGG_VOC0712Plus_SSD_512x512_iter_240000.caffemodel --input_proto models/public/ssd512/models/VGGNet/VOC0712Plus/SSD_512x512/deploy.prototxt -o models/SSD512/FP32/
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/ssd512/models/VGGNet/VOC0712Plus/SSD_512x512/VGG_VOC0712Plus_SSD_512x512_iter_240000.caffemodel --input_proto models/public/ssd512/models/VGGNet/VOC0712Plus/SSD_512x512/deploy.prototxt -o models/SSD512/FP16/ --data_type FP16
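
The `--mean_values` and `--scale` flags used for mobilenet-ssd embed input normalization into the IR. As a rough illustration, assuming the mean is subtracted first and the result is then divided by the scale, the per-pixel arithmetic looks like the sketch below. The function name `normalize_pixel` is hypothetical and is not part of the toolkit.

```python
def normalize_pixel(pixel, mean_values=(127, 127, 127), scale=256):
    """Apply (value - mean) / scale to each channel of one pixel.

    Assumption: this mirrors the arithmetic implied by the
    --mean_values [127,127,127] and --scale 256 flags above.
    """
    return tuple((value - mean) / scale for value, mean in zip(pixel, mean_values))


# Channel values 0, 127, and 255 map to roughly -0.5, 0, and 0.5.
print(normalize_pixel((0, 127, 255)))  # (-0.49609375, 0.0, 0.5)
```

Because the normalization is part of the converted model, the application can feed raw pixel values to the network.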

## 2. Pick the right model based on application and hardware

Use or train a model with the right performance/accuracy tradeoff. Performance differences between models can be larger than anything you can gain by optimizing the inference application itself. Here we run several SSD models from the Model Downloader in the car detection example from the initial tutorial and observe the performance. We run these tests on different hardware accelerators to see how application performance depends on both the model and the hardware.

In the previous step, we converted all the models with the Model Optimizer, so they are ready to use.
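
When comparing models, it helps to reduce each run to a single throughput figure. The following is a minimal sketch, assuming you have noted the number of frames processed and the inference time for each run yourself. The helper names `frames_per_second` and `rank_models` are hypothetical and do not come from the demo code.

```python
def frames_per_second(frame_count, elapsed_seconds):
    """Return throughput in frames per second for one run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frame_count / elapsed_seconds


def rank_models(timings):
    """Rank runs from fastest to slowest.

    timings maps a model name to a (frame_count, elapsed_seconds) tuple.
    Returns a list of (model_name, fps) pairs, fastest first.
    """
    rates = {name: frames_per_second(*run) for name, run in timings.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Pass in the measurements from your own jobs. Throughput alone says nothing about accuracy, so weigh the ranking against the detection quality your application needs.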

### Link the input video

In [None]:
!ln -sf /data/reference-sample-data/object-detection-python/cars_1900.mp4 
videoHTML('Cars video', ['cars_1900.mp4'])

### Compile the code

The code in this demo is separated into two parts.
The first part reads the input stream and runs the object detection inference workload on it.
It outputs the Region Of Interest (ROI) coordinates for each frame.
The source code for this part is in [main.cpp](./main.cpp), and the executable is named "tutorial1".
The output ROIs are written to a text file, "ROIs.txt".

The second part reads the ROIs.txt file, and overlays boxes on each frame of the stream based on the coordinates.
Then the output video is written into a file. 
The source code for this step is in [ROI_writer.cpp](./ROI_writer.cpp).

We have provided a Makefile for compiling the examples. Run the following cell to compile the application.
(tip: use **ctrl+enter** to run the cell)

In [None]:
!make

### Command-line flags

The two executables, tutorial1 and ROI_writer, take a number of command-line arguments.

Run the following cells to see the list of the available arguments: 

In [None]:
! ./tutorial1 -h

In [None]:
!./ROI_writer -h

### Create Job Script 

We will run the workload on several edge compute nodes in the DevCloud. We send work to these nodes by submitting jobs to a queue. For each job, we specify the type of edge compute server that must be allocated.

The job file will be executed directly on the edge compute node.

In [None]:
%%writefile object_detection_job.sh

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR
OUTPUT_FILE=$1
DEVICE=$2
MODEL=$3
# Object detection script writes output to a file inside a directory. We make sure that this directory exists.
#  The output directory is the first argument of the bash script
mkdir -p $OUTPUT_FILE
ROIFILE=$OUTPUT_FILE/ROIs.txt
OVIDEO=$OUTPUT_FILE/output.mp4

# MODEL holds the path to the IR .xml file, so look for the precision inside the path.
case "$MODEL" in
    *FP32*) config_file="conf_fp32.txt" ;;
    *)      config_file="conf_fp16.txt" ;;
esac

# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
./tutorial1 -i /data/reference-sample-data/object-detection-python/cars_1900.mp4 \
            -m $MODEL \
            -d $DEVICE \
            -o $OUTPUT_FILE \
            -fr 3000

# Converting the text output to a video
./ROI_writer -i /data/reference-sample-data/object-detection-python/cars_1900.mp4 \
             -o $OUTPUT_FILE \
             -ROIfile $ROIFILE \
             -l pascal_voc_classes.txt \
             -r 2.0 # output in half res

## 3. Run the object detection example with different models on different devices.

To keep the code simple and the focus on the performance numbers, rendering the video with bounding boxes around detected objects is separated from the object detection executable (tutorial1). After each job is submitted, the progress bars show how inference time differs between scenarios.
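
The cells in this section repeat the same `qsub` pattern with a different node, results directory, device, and model each time. As a minimal sketch of that pattern, the hypothetical helper `build_qsub_command` below (not part of `demoutils`) assembles the command string from those parts. It assumes the job script takes the results directory, device, and model path as its three arguments, as in the job script above.

```python
import shlex


def build_qsub_command(node, results_dir, device, model_xml, job_name,
                       script="object_detection_job.sh"):
    """Assemble a qsub command line for one node/device/model combination."""
    # The three job-script arguments travel as one space-separated -F value.
    job_args = " ".join([results_dir, device, model_xml])
    parts = ["qsub", script, "-l", f"nodes=1:{node}", "-F", job_args, "-N", job_name]
    # shlex.quote wraps only the parts that need quoting (here, the -F value).
    return " ".join(shlex.quote(part) for part in parts)


print(build_qsub_command(
    "idc001skl:i5-6500te",
    "results/Core/ssd300",
    "CPU",
    "models/SSD300/FP32/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.xml",
    "obj_det_cpu",
))
```

In a notebook, the resulting string could be submitted with `job_id = !{cmd}`, which keeps each cell down to the values that actually change.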


### a) CPU

#### - Inferencing using **mobilenet-ssd** model

In [None]:
print("Submitting a job to an edge compute node with an Intel Core CPU...")
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:tank-870:i5-6500te -F "results/core/mobilenet CPU models/mobilenet-ssd/FP32/mobilenet-ssd.xml" -N obj_det_cpu
print(job_id_core[0])
#Progress indicators
if job_id_core:
    progressIndicator('results/core/mobilenet', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/core/mobilenet', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd300** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:i5-6500te -F "results/Core/ssd300 CPU models/SSD300/FP32/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.xml" -N obj_det_cpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd300', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd300', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd512** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:i5-6500te -F "results/Core/ssd512 CPU models/SSD512/FP32/VGG_VOC0712Plus_SSD_512x512_iter_240000.xml" -N obj_det_cpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd512', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd512', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

### b) GPU

#### - Inferencing using **mobilenet-ssd** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/Core/mobilenet GPU models/mobilenet-ssd/FP32/mobilenet-ssd.xml" -N obj_det_gpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/mobilenet', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core/mobilenet', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd300** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/Core/ssd300 GPU models/SSD300/FP32/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.xml" -N obj_det_gpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd300', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd300', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd512** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/Core/ssd512 GPU models/SSD512/FP32/VGG_VOC0712Plus_SSD_512x512_iter_240000.xml" -N obj_det_gpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core/ssd512', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core/ssd512', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

### c) Intel® Movidius™ Neural Compute Stick

#### - Inferencing using **mobilenet-ssd** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc004nc2:intel-ncs2 -F "results/ncs/mobilenet MYRIAD models/mobilenet-ssd/FP16/mobilenet-ssd.xml" -N obj_det_ncs2
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/ncs/mobilenet', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/ncs/mobilenet', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd300** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc004nc2:intel-ncs2 -F "results/ncs/ssd300 MYRIAD models/SSD300/FP16/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.xml" -N obj_det_ncs2
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/ncs/ssd300', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/ncs/ssd300', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

#### - Inferencing using **ssd512** model

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc004nc2:intel-ncs2 -F "results/ncs/ssd512 MYRIAD models/SSD512/FP16/VGG_VOC0712Plus_SSD_512x512_iter_240000.xml" -N obj_det_ncs2
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/ncs/ssd512', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/ncs/ssd512', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

## 4. Use the right data type for your target hardware and accuracy needs

In this section, we consider an example running on a GPU, where FP16 operations are generally better optimized than FP32 operations. We will run the object detection example with the mobilenet-ssd model in both FP32 and FP16 and observe the performance difference.

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/GPU/mobilenet GPU models/mobilenet-ssd/FP32/mobilenet-ssd.xml" -N obj_det_gpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/GPU/mobilenet', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/mobilenet', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/GPU/mobilenet GPU models/mobilenet-ssd/FP16/mobilenet-ssd.xml" -N obj_det_gpu
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/GPU/mobilenet', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/GPU/mobilenet', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

Comparing the two runs shows that the FP16 model performs better than the FP32 model on the GPU.
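
The speedup comes with a precision cost, which is the accuracy side of the tradeoff named in this section's title. The toy sketch below uses only Python's standard `struct` module and does not involve the toolkit. It compares the storage size and the round-trip error of a single value stored as FP16 and as FP32. The helper name `roundtrip` is hypothetical.

```python
import struct

FP16 = "<e"  # IEEE 754 half precision
FP32 = "<f"  # IEEE 754 single precision


def roundtrip(value, fmt):
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# FP16 needs half the bytes per value that FP32 does.
print(struct.calcsize(FP16), struct.calcsize(FP32))  # 2 4

# The same value is represented less precisely in FP16 than in FP32.
value = 0.1
print("FP16 error:", abs(roundtrip(value, FP16) - value))
print("FP32 error:", abs(roundtrip(value, FP32) - value))
```

Whether the reduced precision matters depends on the model and the task, so check the detection results of the FP16 model against the FP32 model before choosing it.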