
# Intel® Movidius™ Neural Compute Stick (NCS)

This lab shows how the Intel® Distribution of OpenVINO™ toolkit abstracts the underlying hardware, so that the sample object detection application built in the previous modules can run on the Intel® Movidius™ Neural Compute Stick.

###  Import dependencies, set the environment variables, download the model, and generate the IR files

In [None]:
from IPython.display import HTML
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent.parent))
from demoTools.demoutils import *
import matplotlib.pyplot as plt

In [None]:
! /opt/intel/openvino/bin/setupvars.sh

In [None]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o models

In [None]:
! mkdir -p models/mobilenet-ssd/FP16 models/mobilenet-ssd/FP32

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/mobilenet-ssd/mobilenet-ssd.caffemodel -o models/mobilenet-ssd/FP32 --scale 256 --mean_values [127,127,127] 
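
The `--scale 256` and `--mean_values [127,127,127]` options above embed input normalization in the generated IR. As I understand the Model Optimizer documentation, the mean is subtracted first and the result is then divided by the scale. A minimal sketch of that arithmetic for a single channel value follows; the helper `normalize_pixel` is hypothetical and not part of the toolkit.

```python
# Hypothetical helper: mimics the per-value arithmetic that --mean_values
# and --scale are assumed to bake into the IR (mean first, then scale).
def normalize_pixel(value, mean=127.0, scale=256.0):
    return (value - mean) / scale


# Darkest, mid-range, and brightest 8-bit channel values.
for pixel in (0, 127, 255):
    print(pixel, "->", normalize_pixel(pixel))
```

With these settings, 8-bit inputs end up roughly in the range -0.5 to 0.5.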

By default, the Model Optimizer generates FP32 IR files when no data type is specified.
Let's run the Model Optimizer again with the `--data_type FP16` flag to get IR files in FP16 format, which suits the Intel® Movidius™ NCS.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/mobilenet-ssd/mobilenet-ssd.caffemodel -o models/mobilenet-ssd/FP16/ --scale 256 --mean_values [127,127,127] --data_type FP16

Check that the .xml and .bin files were created in the `models/mobilenet-ssd/FP16` folder.

In [None]:
 ! cd models/mobilenet-ssd/FP16 && ls
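
The same check can be done from Python, which is handy when a later cell should fail early if the IR files are missing. This is only a sketch: `missing_ir_files` is a hypothetical helper, and it assumes the IR files are named after the model (`mobilenet-ssd.xml` and `mobilenet-ssd.bin`), as the job script below expects.

```python
from pathlib import Path


def missing_ir_files(model_dir, model_name="mobilenet-ssd"):
    """Return the IR files (.xml, .bin) that are absent from model_dir."""
    model_dir = Path(model_dir)
    expected = [model_dir / (model_name + ext) for ext in (".xml", ".bin")]
    return [str(p) for p in expected if not p.is_file()]


# Report the state of both precision folders used in this lab.
for precision in ("FP16", "FP32"):
    missing = missing_ir_files(Path("models/mobilenet-ssd") / precision)
    print(precision, "OK" if not missing else "missing: {}".format(missing))
```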


    
Now run the example application with these new IR files.


    
## Run the sample application on Intel® Movidius™ Neural Compute Stick (NCS)

#### Create Job Script 

We will run the workload on several of the DevCloud's edge compute nodes. We send work to the edge compute nodes by submitting jobs into a queue. For each job, we specify the type of edge compute server that must be allocated for it.

To pass job-specific variables through to the Python code, the job script accepts the following arguments:

* `-f`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Floating-point precision of the optimized model IR (FP16 or FP32)
* `-i`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Location of the input video
* `-r`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Output directory
* `-d`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Hardware device type (CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
* `-n`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of infer requests

The job file will be executed directly on the edge compute node. In this exercise, we will use `-d MYRIAD` to select the Intel® Movidius™ NCS as the target hardware.
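
When several jobs are submitted, it can help to assemble the argument string in one place rather than typing it by hand each time. The sketch below builds the string that `qsub -F` forwards to the job script; `build_job_args` is a hypothetical helper, not part of `demoutils`.

```python
def build_job_args(results_dir, device, precision, video, num_requests):
    """Assemble the option string forwarded to the job script via qsub -F."""
    options = [
        ("-r", results_dir),        # output directory
        ("-d", device),             # target device, e.g. MYRIAD
        ("-f", precision),          # IR precision folder, FP16 or FP32
        ("-i", video),              # input video
        ("-n", str(num_requests)),  # number of infer requests
    ]
    return " ".join("{} {}".format(flag, value) for flag, value in options)


print(build_job_args("results/ncs2", "MYRIAD", "FP16", "cars_1900.mp4", 2))
```

The helper does no quoting, so it assumes none of the values contain spaces.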

In [None]:
%%writefile object_detection_job_ex.sh

ME=`basename $0`

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

# Parse the options passed to this script through qsub's -F flag.
# The object detection script writes its output to files inside the directory given with -r;
# that directory is created once the options have been parsed.
while getopts 'd:f:i:r:n:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    f)
        FP_MODEL=$OPTARG
        echo "$ME is using floating point model $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    esac  
done

NN_MODEL="mobilenet-ssd.xml"
RESULTS_PATH="${RESULTS_BASE}"
mkdir -p $RESULTS_PATH
echo "$ME is using results path $RESULTS_PATH"

# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
python3 tutorial1.py                        -m models/mobilenet-ssd/${FP_MODEL}/${NN_MODEL}  \
                                            -i $INPUT_FILE \
                                            -o $RESULTS_PATH \
                                            -d $DEVICE \
                                            -nireq $NUM_INFER_REQS \
                                            -ce /opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_avx2.so

g++ -std=c++14 ROI_writer.cpp -o ROI_writer  -lopencv_core -lopencv_videoio -lopencv_imgproc -lopencv_highgui  -fopenmp -I/opt/intel/openvino/opencv/include/ -L/opt/intel/openvino/opencv/lib/
# Rendering the output video
SKIPFRAME=1
RESOLUTION=0.5
./ROI_writer $INPUT_FILE $RESULTS_PATH $SKIPFRAME $RESOLUTION

In [None]:
os.environ["VIDEO"] = "cars_1900.mp4"

#### Submitting to an edge compute node with Intel® Movidius™ Neural Compute Stick 2

By default, the Model Optimizer generates FP32 IR files when no data type is specified. Technically, Intel® Movidius™ devices support FP16 precision only. The Movidius plugin converts an FP32 model to FP16 at runtime and does not throw an error even if you provide an FP32 model.

The job below therefore uses the FP16 IR files generated earlier by running the Model Optimizer with `--data_type FP16`.
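
To get a feel for what moving from FP32 to FP16 means for individual values, the sketch below round-trips a few numbers through IEEE 754 half precision using only Python's standard `struct` module. It illustrates the number format only; it does not reproduce how the Model Optimizer or the plugin converts weights, and `to_half_and_back` is a hypothetical helper.

```python
import struct


def to_half_and_back(value):
    """Round-trip a Python float through IEEE 754 half precision (FP16)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# FP16 halves the storage per value compared with FP32.
print("bytes per FP32 value:", struct.calcsize("<f"))
print("bytes per FP16 value:", struct.calcsize("<e"))

# Values that are not exactly representable pick up a small rounding error.
for weight in (0.1, 0.5, 1234.5678):
    print(weight, "->", to_half_and_back(weight))
```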

In [None]:
#Submit job to the queue
job_id_ncs2 = !qsub object_detection_job_ex.sh -l nodes=1:idc004nc2:intel-ncs2 -F "-r results/ncs2 -d MYRIAD -f FP16 -i $VIDEO -n 2" -N obj_det_ncs2
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('results/ncs2', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/ncs2', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/ncs2', 'post_progress.txt', "Rendering", 0, 100)

In [None]:
videoHTML('IEI Tank (Intel  Movidius)', 
          ['results/ncs2/output.mp4'])


## Run the example on different hardware


#### Submitting to an edge compute node with CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job_ex.sh -l nodes=1:idc001skl:tank-870:i5-6500te -F "-r results/core -d CPU -f FP32 -i $VIDEO -n 2" -N obj_det_core
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/core', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/core', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/core', 'post_progress.txt', "Rendering", 0, 100)

In [None]:
videoHTML('IEI Tank (Intel  Core)', 
          ['results/core/output.mp4'])

#### Submitting to a node with Intel® Core CPU and using the onboard Intel GPU

In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 GPU integrated into the CPU.
    
    
Set the target hardware to GPU with `-d GPU`.

In [None]:
#Submit job to the queue
job_id_gpu = !qsub object_detection_job_ex.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/gpu -d GPU -f FP32 -i $VIDEO -n 2" -N obj_det_gpu
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/gpu', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/gpu', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/gpu', 'post_progress.txt', "Rendering", 0, 100)

In [None]:
videoHTML('IEI Tank (Intel  GPU)', 
          ['results/gpu/output.mp4'])

## Assess Performance

The running time of each inference task is recorded in `results/*/stats.txt`, where the subdirectory name corresponds to the architecture of the target edge compute node. Run the cell below to plot the results of all jobs side by side. For processing time, lower values mean better performance; for frames per second, higher values are better. Keep in mind that some architectures are optimized for the highest performance, others for low power or other metrics.

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('gpu', 'Intel Core\ni5-6500TE\nGPU'),
             ('ncs2', 'Intel\nNCS2')]
stats_list = []
for arch, a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/{arch}/stats.txt'.format(arch=arch), a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))
summaryPlot(stats_list, 'Architecture', 'Time, seconds', 'Inference Engine Processing Time', 'time', 0.4)
summaryPlot(stats_list, 'Architecture', 'Frames per second', 'Inference Engine FPS', 'fps', 0.4 )
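
If you prefer the numbers to the plots, the stats files can also be read directly. The sketch below is built on an assumption about the file layout that the source does not confirm: the first line of `stats.txt` holds the elapsed time in seconds and the second line holds the number of processed frames. Check the files your jobs actually produced before relying on it. `read_stats` is a hypothetical helper, not part of `demoutils`.

```python
from pathlib import Path


def read_stats(path):
    """Return (seconds, fps) from a stats file, or None if it does not exist.

    ASSUMPTION: line 1 is the elapsed time in seconds and line 2 is the
    number of processed frames.
    """
    path = Path(path)
    if not path.is_file():
        return None
    fields = path.read_text().split()
    seconds, frames = float(fields[0]), float(fields[1])
    return seconds, frames / seconds


# Print a one-line summary for each architecture used in this lab.
for arch in ("core", "gpu", "ncs2"):
    stats = read_stats("results/{}/stats.txt".format(arch))
    if stats is None:
        print(arch, "no results yet")
    else:
        print("{}: {:.1f} s, {:.1f} fps".format(arch, *stats))
```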