# Object Detection Demo: Car Detection

This sample reference implementation showcases object detection (cars, in this case) with a single-shot detector (SSD) and the Async API.
The Async API improves the overall frame rate of the application: instead of waiting for inference to complete, the host continues to do useful work while the inference accelerator is busy.
Specifically, this code keeps two inference requests in flight, processing the current frame while the next input frame is being captured. This essentially hides the latency of frame capture.

## Overview of how it works
At start-up the sample application reads the equivalent of command line arguments and loads a network and image from the video input to the Inference Engine (IE) plugin. 
A job is submitted to a hardware accelerator (Intel® Core CPU, Intel® HD Graphics GPU, Intel® Movidius™ Neural Compute Stick, Intel® Neural Compute Stick 2, Intel® HDDL-R, or Intel® HDDL-F).
After the inference is completed, the output videos are appropriately stored in the /results directory, which can then be viewed within the Jupyter Notebook instance.

## Demonstration objectives
* Video as input is supported using **OpenCV**
* Inference performed on edge hardware (rather than on the development node hosting this Jupyter notebook)
* **OpenCV** provides the bounding boxes, labels and other information
* Visualization of the resulting bounding boxes
* Demonstrate the Async API in action


## Step 0: Set Up

### 0.1: Import dependencies

Run the cell below to import the Python dependencies needed for displaying the results in this notebook
(tip: select the cell and use **Ctrl+enter** to run the cell)

In [None]:
from IPython.display import HTML
import matplotlib.pyplot as plt
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent))
from demoTools.demoutils import *
from openvino.inference_engine import IEPlugin, IENetwork
import cv2
# For labeling the image
from out_process import placeBoxes

We will start with running inference on a single image to see how the Intel® Distribution of OpenVINO™ toolkit works, then we will run inference on a video stream.

We will go over detecting cars with OpenVINO in several steps:

1. Create an intermediate representation (IR) of the model using the Model Optimizer.
2. Create IEPlugin for the device.
3. Read the model's IR using IENetwork.
4. Load the IENetwork instance into the plugin.
5. Preprocess image and run inference on it.
6. Create a job file to target different hardware types.
7. Submit jobs to the queue.
8. View the results and hardware performance comparison.

## Create an Intermediate Representation of the Model

The Model Optimizer creates the Intermediate Representation (IR) of the model, a device-agnostic, optimized form of the model. The Model Optimizer supports Caffe\*, TensorFlow\*, MXNet\*, ONNX\*, and Kaldi\* models.

We will use the **MobileNet-SSD** model. Download the model, specifying the name and output directory.

In [None]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o raw_models

Let's convert this model to the intermediate representation using Model Optimizer.

In [None]:
!/opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
--input_model raw_models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel \
--data_type FP32 \
--output_dir models/mobilenet-ssd/FP32 \
--scale 256 \
--mean_values [127,127,127] 
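
The `--mean_values` and `--scale` options build input normalization into the IR, so the application can feed raw 8-bit pixel values to the network. As a rough illustration, and assuming the Model Optimizer subtracts the mean first and then divides by the scale, each channel value is mapped as in the sketch below. The helper name `normalize_pixel` is made up for this example and is not part of OpenVINO.

```python
def normalize_pixel(value, mean=127.0, scale=256.0):
    """Map a raw 8-bit channel value, assuming (value - mean) / scale."""
    return (value - mean) / scale

# Darkest, mid-range, and brightest raw values
for raw in (0, 127, 255):
    print(raw, "->", normalize_pixel(raw))
```

With these settings, the inputs seen by the first layer fall roughly in the range -0.5 to 0.5.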

We will also need the FP16 IR version of the model for the calculations on the Neural Compute Stick 2 (NCS2) and the Vision Processing Unit (VPU); let's create it here.

In [None]:
!/opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
--input_model raw_models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel \
--data_type FP16 \
--output_dir models/mobilenet-ssd/FP16 \
--scale 256 \
--mean_values [127,127,127] 
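
To get a feel for what the FP16 conversion trades away, the sketch below rounds a value to half and single precision using only the Python standard library. The value `0.1` is an arbitrary example and is not taken from the model; the helper names are made up for this illustration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

weight = 0.1  # arbitrary example value
err16 = abs(to_fp16(weight) - weight)
err32 = abs(to_fp32(weight) - weight)
print("FP16:", to_fp16(weight), "rounding error:", err16)
print("FP32:", to_fp32(weight), "rounding error:", err32)
```

FP16 stores each weight in half the space at the cost of a larger rounding error, which is usually an acceptable trade for inference on the accelerators that require it.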

## Create Plugin

Let's create a function to construct a plugin.

In [None]:
def createPlugin(device, extension_list):
    # Plugin initialization for specified device. We will be targeting CPU initially.
    plugin = IEPlugin(device=device)

    # Loading each additional extension library; these apply only to the CPU plugin
    if 'CPU' in device:
        for extension in extension_list:
            plugin.add_cpu_extension(extension)
    
    return plugin

## Read the IR of the Model

Let's import the optimized model into our neural network using **`IENetwork`**. 


In [None]:
def createNetwork(model_xml, model_bin, plugin):
    # Importing network weights from IR models.
    net = IENetwork(model=model_xml, weights=model_bin)
    
    # Some layers in IR models may be unsupported by some plugins. 
    if "CPU" in plugin.device:
        supported_layers = plugin.get_supported_layers(net)
        not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
        if len(not_supported_layers) != 0:
            print("Following layers are not supported by the plugin for specified device {}:\n {}".
                      format(plugin.device, ', '.join(not_supported_layers)))
            print("Please try to specify cpu extensions library path in sample's command line parameters "
                  "using -l or --cpu_extension command line argument")
            return None
    return net

## Load the Network into the Plugin

Once we have the plugin and the network, we can load the network into the plugin.

In [None]:
def loadNetwork(plugin, net):
    # Loading IR model to the plugin.
    exec_net = plugin.load(network=net, num_requests=2)
    
    # Getting the input and outputs of the network
    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))
    return exec_net,input_blob,out_blob

## Preprocess Image

We need a function that loads the image using OpenCV and changes its shape to be compatible with our network.

In [None]:
def preprocessImage(img_path, net, input_blob):
    # Reading the frame from a jpeg file
    frame = cv2.imread(img_path)
    
    # Reshaping data
    n, c, h, w = net.inputs[input_blob].shape
    in_frame = cv2.resize(frame, (w, h))
    in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
    return in_frame.reshape((n, c, h, w)),frame
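
OpenCV returns frames in height-width-channel (HWC) order, while the network expects channel-height-width (CHW). The sketch below shows the same reordering on a tiny made-up image. It uses nested lists instead of NumPy arrays so that it runs without any dependencies; `in_frame.transpose((2, 0, 1))` performs the equivalent reordering on arrays. The helper name `hwc_to_chw` is hypothetical.

```python
def hwc_to_chw(image):
    """Convert nested lists indexed [row][col][channel] to [channel][row][col]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][c] for x in range(width)]
             for y in range(height)]
            for c in range(channels)]

# A made-up 2x2 image; each pixel is [blue, green, red], as OpenCV stores it.
tiny = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]

chw = hwc_to_chw(tiny)
print("blue plane:", chw[0])
print("red plane:", chw[2])
```

After the conversion, each colour channel is a contiguous 2-D plane, which is the layout the input blob uses.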

## Run Inference on a Single Frame

Now we are ready to run the inference workload using the plugin. We will run in **async_mode** by using the `start_async` method. In async mode, inference starts in parallel on either a separate thread or a separate device.
In other words, `start_async` is non-blocking, and the main process is free to do any additional processing needed.
In the next section, we will see an implementation of pipelining to mask the latency of loading and modifying images.

During asynchronous runs, each in-flight image is tracked by an integer `request_id`. Because we only have one image to process, we will just use 0.

In [None]:
# For labeling the image
from out_process import placeBoxes
import matplotlib.pyplot as plt

# Build the plugin and network, run one asynchronous request, and display the result
def runInference():
    plugin = createPlugin(device='CPU', extension_list=['/data/reference-sample-data/extension/libcpu_extension.so'])
    model_xml = "models/mobilenet-ssd/FP32/mobilenet-ssd.xml"
    model_bin = "models/mobilenet-ssd/FP32/mobilenet-ssd.bin"
    net = createNetwork(model_xml, model_bin, plugin)
    exec_net,input_blob,out_blob = loadNetwork(plugin, net)
    in_frame,original_frame = preprocessImage('cars_1900_first_frame.jpg', net, input_blob)
    
    my_request_id=0

    # Starting the inference in async mode, which starts the inference in parallel
    exec_net.start_async(request_id=my_request_id, inputs={input_blob: in_frame})
    # ... You can do additional processing or latency masking while we wait ...

    # Blocking wait for a particular request_id
    if exec_net.requests[my_request_id].wait(-1) == 0:
        # getting the result of the network
        res = exec_net.requests[my_request_id].outputs[out_blob]
        
        # Processing the output result and adding labels on the image. Implementation is not shown in
        #  this notebook; you can find it in object_detection_demo_ssd_async.py
        prob_threshold = 0.5  # 50% confidence needed for "detection"
        initial_w = original_frame.shape[1]
        initial_h = original_frame.shape[0]
        frame = placeBoxes(res, None, prob_threshold, original_frame, initial_w, initial_h, False, my_request_id, 0)
        fig = plt.figure(dpi=300)
        ax = fig.add_subplot(111)
        ax.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), interpolation='none')
        plt.axis("off")
        plt.show()
    else:
        print("There was an error with the request")

runInference()
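
The post-processing inside `placeBoxes` is not shown in this notebook. As a minimal sketch of what such a step typically does, and assuming the usual SSD detection output where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with coordinates normalized to the range 0 to 1, the rows can be filtered and scaled to pixel coordinates as follows. The function name `decode_detections` and the sample rows are made up for illustration.

```python
def decode_detections(rows, frame_width, frame_height, prob_threshold=0.5):
    """Keep rows above the confidence threshold and scale boxes to pixels."""
    boxes = []
    for image_id, label, conf, x_min, y_min, x_max, y_max in rows:
        if conf < prob_threshold:
            continue  # too uncertain to count as a detection
        boxes.append({
            "label": int(label),
            "confidence": conf,
            "top_left": (int(x_min * frame_width), int(y_min * frame_height)),
            "bottom_right": (int(x_max * frame_width), int(y_max * frame_height)),
        })
    return boxes

# Made-up rows for illustration only; they are not real network output.
rows = [
    [0, 7, 0.92, 0.25, 0.5, 0.75, 1.0],
    [0, 7, 0.30, 0.0, 0.0, 0.1, 0.1],
]
detections = decode_detections(rows, frame_width=640, frame_height=480)
print(detections)
```

Only the first row survives the 50% threshold, and its box is expressed in pixels of the original frame rather than of the resized network input.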

## Input Video

Run the following cell to create a symlink and view the input video.

In [None]:
!ln -sf /data/reference-sample-data/object-detection-python/cars_1900.mp4 
videoHTML('Cars video', ['cars_1900.mp4'])

## How Video Inferencing Works

Video inferencing is similar to image inferencing, but there are a few differences. Let's review them.

The following lines determine the source of the video. We use a pre-recorded input video file in this example, but we could also use a camera by setting the input argument to 'cam'.

```python
if args.input == 'cam':
    input_stream = 0
    out_file_name = 'cam'
else:
    input_stream = args.input
```

We capture frames from the video sample using **OpenCV VideoCapture** API.

```python
cap = cv2.VideoCapture(input_stream)
```

Finally, we have a latency masking scheme, where we post-process one frame while other frames are being processed by the Inference Engine. The user can control the number of inference requests running in parallel.

```python
current_inference = 0
required_inference_requests_were_executed = False
previous_inference = 1 - args.number_infer_requests
step = 0
steps_count = args.number_infer_requests - 1

while not required_inference_requests_were_executed or step < steps_count or cap.isOpened():
    # ... load the next frame from cap ...

    # start the next frame
    exec_net.start_async(request_id=current_inference, inputs={input_blob: in_frame})

    # see if the current frame is ready
    if previous_inference >= 0:
        status = infer_requests[previous_inference].wait()
    # ... post-processing current frame ...
    
    # manage request ids
    current_inference += 1
    if current_inference >= args.number_infer_requests:
        current_inference = 0
        required_inference_requests_were_executed = True

    previous_inference += 1
    if previous_inference >= args.number_infer_requests:
        previous_inference = 0

    step += 1
```
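
To see how the request ids rotate, the bookkeeping from the loop above can be replayed without any inference at all. The sketch below is a simplified stand-in, not the demo's code: the generator name `request_schedule` is made up, and it only reports which request would be started and which would be waited on at each step.

```python
def request_schedule(num_requests, num_steps):
    """Yield (start_id, wait_id) pairs following the loop's id bookkeeping."""
    current = 0
    previous = 1 - num_requests
    for _ in range(num_steps):
        # No request is old enough to wait on until the pipeline has filled up
        wait_id = previous if previous >= 0 else None
        yield current, wait_id
        current = (current + 1) % num_requests
        previous += 1
        if previous >= num_requests:
            previous = 0

for start_id, wait_id in request_schedule(num_requests=2, num_steps=4):
    print("start request", start_id, "| wait for request", wait_id)
```

With two requests, the first step only starts request 0; from then on, every step starts one request and waits for the other, so a frame is always in flight while the host post-processes the previous one. With a single request, the same bookkeeping waits on the request it has just started, which degenerates to synchronous inference.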

The Python code takes in command line arguments for video, model etc.

**Command line arguments options and how they are interpreted in the application source code**

```
SAMPLEPATH="/data/reference-sample-data"
python3 object_detection_demo_ssd_async.py -m ${SAMPLEPATH}/models/mobilenet-ssd/$3/mobilenet-ssd.xml \
                                           -i $INPUT_FILE \
                                           -o $RESULTS_PATH \
                                           -d $DEVICE \
                                           -nireq $NUM_INFER_REQS \
                                           -ce ${SAMPLEPATH}/extension/libcpu_extension.so
```

##### Description of the arguments accepted by the argument parser
* -m location of the **mobilenet-ssd** pre-trained model that has been pre-processed with the **Model Optimizer**.
   The path includes the precision directory, so the same argument supports both FP32 and FP16 models targeting different hardware.
   (**Note** we are using mobilenet-ssd in this example. However, OpenVINO's Inference Engine is compatible with other neural network architectures such as AlexNet*, GoogleNet*, SqueezeNet*, etc.)

* -i location of the input video stream (video/cars_1900.mp4)
* -o location where the output file with the inference results is stored (results/core, results/xeon, or results/gpu)
* -d type of hardware acceleration (CPU, GPU, MYRIAD, HDDL, or FPGA)
* -nireq number of inference requests running in parallel
* -ce absolute path to the CPU extension shared library, currently optimized for Core/Xeon (extension/libcpu_extension.so)
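
The parser in `object_detection_demo_ssd_async.py` is not reproduced in this notebook. The sketch below shows one way such options could be declared with `argparse`. The short flags and the attribute names `args.input` and `args.number_infer_requests` follow the text and code excerpts above; the other long option names and all default values are assumptions made for this illustration.

```python
import argparse

def build_parser():
    """Declare the demo's options (long names and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Argument sketch for the demo script")
    parser.add_argument("-m", "--model", required=True, help="path to the IR .xml file")
    parser.add_argument("-i", "--input", required=True, help="video file path or 'cam'")
    parser.add_argument("-o", "--output_dir", default="results", help="directory for results")
    parser.add_argument("-d", "--device", default="CPU",
                        help="CPU, GPU, MYRIAD, HDDL or FPGA")
    parser.add_argument("-nireq", "--number_infer_requests", type=int, default=2,
                        help="number of inference requests running in parallel")
    parser.add_argument("-ce", "--cpu_extension", default=None,
                        help="absolute path to the CPU extension library")
    return parser

# Parse an explicit list instead of sys.argv so the sketch runs inside a notebook
args = build_parser().parse_args(["-m", "mobilenet-ssd.xml", "-i", "cam", "-nireq", "4"])
print(args.input, args.device, args.number_infer_requests)
```

Declaring `type=int` for `-nireq` means the value arrives as an integer, ready to be used in the request id arithmetic.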


## Create a Job File

All the code up to this point has been run within the Jupyter Notebook instance running on a development node based on an Intel® Xeon® Scalable Processor, where the Notebook is allocated a single core. To run inference on the entire video, we need more compute power. We will run the workload on several DevCloud's edge compute nodes. We will send work to the edge compute nodes by submitting jobs into a queue. For each job, we will specify the type of the edge compute server that must be allocated for the job.

To pass the specific variables to the Python code, we will use the following arguments:

* `-m`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;location of the optimized **MobileNet-SSD** model's XML
* `-i`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;location of the input video
* `-o`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output directory
* `-d`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;hardware device type (CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
* `-nireq`&nbsp;&nbsp;number of inference requests running in parallel
* `-ce`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;path to the CPU extension library

The job file will be executed directly on the edge compute node.

In [None]:
%%writefile object_detection_job.sh

ME=`basename $0`

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

# The object detection script writes its output to a file inside the results directory, which we create below.
# Options: -d device, -f model precision (FP32 or FP16), -i input file, -r results directory, -n number of inference requests
while getopts 'd:f:i:r:n:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    f)
        FP_MODEL=$OPTARG
        echo "$ME is using floating point model $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    esac  
done

NN_MODEL="mobilenet-ssd.xml"
RESULTS_PATH="${RESULTS_BASE}"
mkdir -p $RESULTS_PATH
echo "$ME is using results path $RESULTS_PATH"

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables for edge compute nodes with FPGAs
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/altera/aocl-pro-rte/aclrte-linux64/
    # Set up the FPGA environment and program the device with the MobileNet bitstream
    source /opt/fpga_support_files/setup_env.sh
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_MobileNet_Clamp.aocx
fi
    
# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
python3 object_detection_demo_ssd_async.py  -m ${SAMPLEPATH}/models/mobilenet-ssd/${FP_MODEL}/${NN_MODEL} \
                                            -i $INPUT_FILE \
                                            -o $RESULTS_PATH \
                                            -d $DEVICE \
                                            -nireq $NUM_INFER_REQS \
                                            -ce /opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_avx2.so

# Compiling the ROI_writer tool that renders the output video
g++ -std=c++14 ROI_writer.cpp -o ROI_writer  -lopencv_core -lopencv_videoio -lopencv_imgproc -lopencv_highgui  -fopenmp -I/opt/intel/openvino/opencv/include/ -L/opt/intel/openvino/opencv/lib/
# Rendering the output video
SKIPFRAME=1
RESOLUTION=0.5
./ROI_writer $INPUT_FILE $RESULTS_PATH $SKIPFRAME $RESOLUTION

### Optional Change of Video Quality

Video rendering is a separate task performed by `ROI_writer` at the end of the job. To reduce rendering time, you can reduce the output video quality using the `SKIPFRAME` and `RESOLUTION` variables in the job file.

`SKIPFRAME=1` writes all processed video frames with bounding boxes into the output video. This is the slowest option, and it preserves all inference data in the output video stream. `SKIPFRAME=2` writes every other processed frame into the output video.

`RESOLUTION=1` produces the output video with the same resolution as the input video; this is the slowest option.
`RESOLUTION<1` reduces the output video resolution. Here we have `RESOLUTION=0.5` which sets the output video resolution in each dimension to 50% of the input video's resolution.
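
As a back-of-the-envelope check of how these two settings shrink the rendering work, the sketch below estimates the number of frames written and the output size. It assumes `ROI_writer` behaves exactly as described above; its source is not shown in this notebook. The function name and the 300-frame, 1280x720 input are made up for illustration.

```python
import math

def rendered_video_shape(frame_count, width, height, skip_frame=1, resolution=1.0):
    """Estimate (frames written, output width, output height)."""
    frames_written = math.ceil(frame_count / skip_frame)
    return frames_written, int(width * resolution), int(height * resolution)

# Hypothetical input: 300 frames at 1280x720
print(rendered_video_shape(300, 1280, 720, skip_frame=1, resolution=0.5))
print(rendered_video_shape(300, 1280, 720, skip_frame=2, resolution=0.5))
```

Halving the resolution in each dimension cuts the pixel count per frame to a quarter, and skipping every other frame halves the number of frames to encode.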

In [None]:
!pbsnodes | grep compnode | awk '{print $3}' | sort | uniq -c

In the output of this cell, the properties describe the node, and the number on the left is the number of available nodes of that architecture.

### 2.3 Job queue submission

Each cell below will submit a job to different edge compute nodes.
The output of the cell is the `JobID` of your job, which you can use to track progress of a job.

**Note** You can submit all the jobs at once or one at a time. 

After submission, they will go into a queue and run as soon as the requested compute resources become available. 
(tip: **shift+enter** will run the cell and automatically move you to the next cell. So you can hit **shift+enter** multiple times to quickly run multiple cells)

**Note** If you want to use your own video, change the environment variable 'VIDEO' in the following cell from "/data/reference-sample-data/object-detection-python/cars_1900.mp4" to the full path of your uploaded video.




In [None]:
os.environ["VIDEO"] = "/data/reference-sample-data/object-detection-python/cars_1900.mp4"

### Intel® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:idc001skl:i5-6500te -F "-r results/Core -d CPU -f FP32 -i $VIDEO -n 2" -N obj_det_core
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('results/Core', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Core', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Core', 'post_progress.txt', "Rendering", 0, 100)

### Intel® Xeon® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel 
    Xeon Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_xeon = !qsub object_detection_job.sh -l nodes=1:idc007xv5:e3-1268l-v5 -F "-r results/Xeon -d CPU -f FP32 -i $VIDEO -n 2" -N obj_det_xeon 
print(job_id_xeon[0]) 
#Progress indicators
if job_id_xeon:
    progressIndicator('results/Xeon', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Xeon', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Xeon', 'post_progress.txt', "Rendering", 0, 100)

### Intel® Core CPU with Intel® GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 card integrated with the CPU.

In [None]:
#Submit job to the queue
job_id_gpu = !qsub object_detection_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU -d GPU -f FP32 -i $VIDEO -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU', 'post_progress.txt', "Rendering", 0, 100)

### Intel® Arria® 10 FPGA
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500te CPU</a>. The inference workload will run on the <a href="https://www.ieiworld.com/mustang-f100/en/">IEI Mustang-F100-A10</a> card installed in this node.

In [None]:
#Submit job to the queue
job_id_fpga = !qsub object_detection_job.sh -l nodes=1:idc003a10:iei-mustang-f100-a10 -F "-r results/FPGA -d HETERO:FPGA,CPU -f FP32 -i $VIDEO -n 4" -N obj_det_fpga

print(job_id_fpga[0]) 
#Progress indicators
if job_id_fpga:
    progressIndicator('results/FPGA', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/FPGA', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/FPGA', 'post_progress.txt', "Rendering", 0, 100)

### Intel® Neural Compute Stick 2
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500te CPU</a>. The inference workload will run on an <a 
    href="https://software.intel.com/en-us/neural-compute-stick">Intel Neural Compute Stick 2</a> installed in this  node.

In [None]:
#Submit job to the queue
job_id_ncs2 = !qsub object_detection_job.sh -l nodes=1:idc004nc2:intel-ncs2 -F "-r results/NCS2 -d MYRIAD -f FP16 -i $VIDEO -n 8" -N obj_det_ncs2
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('results/NCS2', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/NCS2', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/NCS2', 'post_progress.txt', "Rendering", 0, 100)

### IEI Mustang-V100-MX8 (Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU))
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500te CPU</a>. The inference workload will run on an <a 
    href="https://www.ieiworld.com/mustang-v100/en/">IEI Mustang-V100-MX8 </a>accelerator installed in this node.

In [None]:
#Submit job to the queue
job_id_hddlr = !qsub object_detection_job.sh -l nodes=1:idc002mx8:iei-mustang-v100-mx8 -F "-r results/HDDLR -d HDDL -f FP16 -i $VIDEO -n 128" -N obj_det_hddlr
print(job_id_hddlr[0]) 
#Progress indicators
if job_id_hddlr:
    progressIndicator('results/HDDLR', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/HDDLR', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/HDDLR', 'post_progress.txt', "Rendering", 0, 100)

### UP Squared Grove IoT Development Kit
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/up-squared-grove-dev-kit">UP Squared Grove IoT Development Kit</a> edge node with an <a 
    href="https://ark.intel.com/products/96488/Intel-Atom-x7-E3950-Processor-2M-Cache-up-to-2-00-GHz-">Intel Atom® x7-E3950 Processor</a>. The inference  workload will run on the integrated Intel® HD Graphics 505 card.

In [None]:
#Submit job to the queue
job_id_up2 = !qsub object_detection_job.sh -l nodes=1:up-squared -F "-r results/UP2 -d GPU -f FP32 -i $VIDEO -n 2" -N obj_det_up2
print(job_id_up2[0]) 
#Progress indicators
if job_id_up2:
    progressIndicator('results/UP2', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/UP2', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/UP2', 'post_progress.txt', "Rendering", 0, 100)

### Check the Progress

Check the progress of the jobs. `Q` status stands for `queued`, `R` for `running`. How long a job stays queued depends on the number of users. Once it starts, a job should take up to 5 minutes to run. If the job is no longer listed, it is done. 

In [None]:
liveQstat()

You should see the jobs you have submitted (referenced by `Job ID` that gets displayed right after you submit the job in step 2.3).
There should also be an extra job in the queue "jupyterhub": this job runs your current Jupyter Notebook session.

The 'S' column shows the current status. 
- If it is in Q state, it is in the queue waiting for available resources. 
- If it is in R state, it is running. 
- If the job is no longer listed, it means it is completed.

**Note**: Time spent in the queue depends on the number of users accessing the edge nodes. Once these jobs begin to run, they should take from 1 to 5 minutes to complete. 

***Wait!***

Please wait for the inference jobs and video rendering to complete before proceeding to the next step.

## While you wait...

### Why do we have three progress indicators for every job? 

In a real-life inference application, the video processing pipeline may have these phases:

1. Preprocessing (such as video decoding, frame capture from camera, or fetching a frame from a network, resizing, cropping, contrast normalization, etc.) The performance of this phase will often depend on the CPU, storage, and networking speed.
2. Inference (running a neural network forward propagation procedure). The performance of inference depends mostly on the compute device (CPU or an accelerator).
3. Postprocessing (feeding data into the next pipeline stage, writing to file, sending it to a database, or even encoding a new video for a human viewer). This phase will often use the CPU, storage, or network, and will not use an accelerator.

In our demonstration samples, we separate these three phases into separate workloads. This separation allows you to independently judge the relative impact of each phase and draw conclusions for your applications. 

### What does it mean for my application?

Your application may have more or less pre/post-processing work depending on how you get your data (from storage, a sensor, or the network), the resolution of your raw data, whether it is encoded, whether you need to produce an output video, etc. Therefore, you need to understand how your processing pipeline relates to this demo before drawing conclusions on performance.

In this demo, we provide three performance numbers for each architecture. However, they are not hard limits on the solution's performance. You need to relate these numbers and code to what you are doing in your own project.

### The total time is not necessarily equal to the sum of the parts

It is important to realize that in a real-life application, you may want to fuse preprocessing, inference, and postprocessing, as opposed to separating them as we did here. Fusing these phases has several advantages:
* If inference is run asynchronously on the accelerator, the rest of the system is free for preprocessing future frames or postprocessing past frames. This allows you to mask the pre/post processing time behind inference time. 
* With fused preprocessing, inference and postprocessing, your pipeline has better data locality, allowing you to reuse data in the caches, in the application's memory, or in the hard drive cache. 
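
To make the masking effect concrete, here is a small idealized model of the two cases. It assumes a two-stage pipeline in which the host work for one frame can fully overlap with inference for another, and it ignores scheduling overhead; the function names and the timings are made up for illustration and are not measurements from this demo.

```python
def sequential_time(num_frames, host_ms, device_ms):
    """Total time when host work and inference for each frame run back to back."""
    return num_frames * (host_ms + device_ms)

def overlapped_time(num_frames, host_ms, device_ms):
    """Total time for an idealized two-stage pipeline (host work, then inference)."""
    if num_frames == 0:
        return 0
    # Fill the pipeline once, then the slower stage sets the pace
    return host_ms + (num_frames - 1) * max(host_ms, device_ms) + device_ms

# Made-up timings: 10 ms of host work and 30 ms of inference per frame
print("sequential:", sequential_time(100, 10, 30), "ms")
print("overlapped:", overlapped_time(100, 10, 30), "ms")
```

In this model, the overlapped total approaches the time of the slower stage alone, which is why the total time of a fused pipeline is not the sum of its parts.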



## Step 3: View Results

Once the jobs are completed, the queue system writes the `stdout` and `stderr` streams of each job into files named
`obj_det_{type}.o{JobID}` and `obj_det_{type}.e{JobID}`. Here, `obj_det_{type}` corresponds to the `-N` option of qsub; for example, `{type}` is `core` for the Core CPU target.

However, for this case, we may be more interested in the output video files. They are stored in mp4 format inside the `results/` directory.
We wrote a short utility script that displays these videos within the notebook.
Run the cells below to display them.
See `demoutils.py` if you are interested in understanding further how the results are displayed in the notebook.

In [None]:
videoHTML('IEI Tank (Intel Core CPU)', 
          ['results/Core/output.mp4'], 
          'results/Core/stats.txt')

In [None]:
videoHTML('IEI Tank Xeon (Intel Xeon CPU)',
           ['results/Xeon/output.mp4'], 
          'results/Xeon/stats.txt')

In [None]:
videoHTML('IEI Intel GPU (Intel Core + Onboard GPU)', 
          ['results/GPU/output.mp4'], 
          'results/GPU/stats.txt')

In [None]:
videoHTML('IEI Tank + IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)',
         ['results/FPGA/output.mp4'], 
          'results/FPGA/stats.txt')

In [None]:
videoHTML('IEI Tank + Intel CPU + Intel NCS2',
           ['results/NCS2/output.mp4'], 
          'results/NCS2/stats.txt')

In [None]:
videoHTML('IEI Tank + IEI Mustang-V100-MX8 (Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU))',
         ['results/HDDLR/output.mp4'], 
          'results/HDDLR/stats.txt')

In [None]:
videoHTML('UP Squared Grove IoT Development Kit (UP2)',
         ['results/UP2/output.mp4'], 
          'results/UP2/stats.txt')

## Performance Comparison

The running time of each inference task is recorded in a `stats.txt` file inside the corresponding subdirectory of the `results` folder (for example, `results/Core/stats.txt`), with one subdirectory per target edge compute node. Run the cell below to plot the results of all jobs side-by-side. Lower values for processing time mean better performance. Keep in mind that some architectures are optimized for the highest performance, others for low power or other metrics.

In [None]:
arch_list = [('core', 'Core', 'Intel Core\ni5-6500TE\nCPU'),
             ('xeon', 'Xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('gpu', 'GPU', ' Intel Core\ni5-6500TE\nGPU'),
             ('fpga', 'FPGA', ' IEI Mustang\nF100-A10\nFPGA'),
             ('ncs2', 'NCS2', 'Intel\nNCS2'),
             ('hddlr','HDDLR',  ' IEI Mustang\nV100-MX8\nVPU'),
             ('up2', 'UP2', 'Intel Atom\nx7-E3950\nUP2/GPU')]

stats_list = []
for arch, dir_,  a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/{}/stats.txt'.format(dir_), a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))

summaryPlot(stats_list, 'Architecture', 'Time, seconds', 'Inference Engine Processing Time', 'time' )
summaryPlot(stats_list, 'Architecture', 'Frames per second', 'Inference Engine FPS', 'fps' )