### Intel Neural Compute Stick - Movidius

Intel's Movidius™ Myriad™ 2 VPU is an industry-defining always-on vision processor and the second-generation VPU from Movidius™, an Intel® company. Myriad 2 can be found in millions of devices on the market today and continues to be used for some of the most ambitious AI, vision, and imaging applications where both performance and low power consumption are critical.

![](resources/myriad_vpu.png)

Standing at the intersection of low power and high performance, the Myriad 2 family of processors is transforming the capabilities of devices. Myriad 2 gives device makers industry-proven performance on AI, imaging, and computer vision tasks, all at an unbeatable performance/price proposition.

The Intel® Movidius™ Neural Compute Stick (Intel® Movidius™ NCS) enables rapid prototyping of deep neural networks (DNNs) with the Intel® Movidius™ Neural Compute SDK (NCSDK).

![](resources/ncs.jpg)

Images from: https://movidius.github.io/ncsdk/ 

The Myriad VPU includes 4Gbits of LPDDR3 DRAM, imaging and vision accelerators, and an array of 12 VLIW vector processors called SHAVE processors. 

These processors are used to accelerate neural networks by running parts of the neural networks in parallel.

The NCS is connected to a host machine through the USB interface on the VPU. The USB3 interface can operate in either Super Speed (5 Gbps) or High Speed (480 Mbps) mode.
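
To get a feel for what the two USB modes mean in practice, the sketch below estimates the ideal time needed to send one input tensor to the stick. It uses only the raw signalling rates quoted above and assumes a 300x300, 3-channel `float16` image as the payload (an assumption chosen to match the MobileNet-SSD example later in this document). Real throughput is lower because of encoding and protocol overhead, so treat the numbers as a lower bound.

```python
# Rough lower bound on USB transfer time for one input tensor.
# The rates are the raw signalling rates; real throughput is lower.

USB_RATES_BPS = {
    "High Speed (USB 2.0)": 480e6,
    "Super Speed (USB 3.0)": 5e9,
}


def transfer_time_ms(num_bytes, rate_bps):
    """Return the ideal transfer time in milliseconds."""
    return num_bytes * 8 / rate_bps * 1000.0


# Assumed example payload: a 300x300 3-channel image stored as float16
# (2 bytes per value).
tensor_bytes = 300 * 300 * 3 * 2

for mode, rate in USB_RATES_BPS.items():
    print("{}: {:.3f} ms".format(mode, transfer_time_ms(tensor_bytes, rate)))
```

Under these assumptions the transfer takes about 9 ms in High Speed mode and under 1 ms in Super Speed mode, which is why the USB mode matters for real-time pipelines.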

![](resources/ncs_architecture.jpg)

The VPU also has a SPARC microprocessor core that runs custom firmware. When the NCS is first plugged in, there is no firmware loaded onto it. The VPU boots from the internal ROM and connects to the host machine as a USB 2.0 device. 

Applications executing on the host machine communicate with the VPU SoC using the Neural Compute API (NCAPI). When the NCAPI initializes and opens a device, the firmware from the Neural Compute SDK (NCSDK) is loaded onto the NCS. At this point, the NCS resets and reconnects to the host machine as either a USB 2.0 or USB 3.0 device (depending on the host type). It is now ready to accept neural network graph files and instructions to execute inferences.

A graph file is loaded into the DRAM attached to the VPU via the NCAPI. A LEON processor coordinates receiving the graph file and images for inference via the USB connection. 

The LEON processor also parses the graph file and schedules kernels to the SHAVE neural compute accelerator engines. In addition, it monitors the die temperature and throttles processing on high-temperature alerts.

The output of the neural network and associated statistics are sent back to the host machine via the USB connection and are received by the host application via the NCAPI.

## Intel Neural Compute Stick SDK

The [Intel® Movidius™ Neural Compute SDK](https://github.com/movidius/ncsdk) provides tools for profiling, tuning, and compiling a deep neural network (DNN) model on a development computer (host system).

To develop applications for the Movidius NCS, install the NCSDK and its tools on the host system as described below.

The steps below were used on Ubuntu 16.04 and start by cloning the SDK from its GitHub repository. For the full installation instructions, see https://github.com/movidius/ncsdk

### NCSDK Installation Instructions

```shell

git clone https://github.com/movidius/ncsdk.git 

```
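Alternatively, clone version 2 from the `ncsdk2` branch; v2 is not backward compatible with v1. Whichever version you clone, run `make install` from the `ncsdk` directory afterwards, as in the last line below.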

```shell

git clone -b ncsdk2 https://github.com/movidius/ncsdk.git

cd ncsdk && make install

```

After installation, you can get a set of examples from the Neural Compute App Zoo:

```shell

git clone https://github.com/movidius/ncappzoo.git

```

or, for NCSDK v2:

```shell

git clone -b ncsdk2 https://github.com/movidius/ncappzoo.git

```

You will find a set of applications under the `ncappzoo/apps/` folder.

Navigate to any application folder and run `make`, which downloads the models and video resources required to run the application, as shown below.

```shell

cd ncappzoo/apps/video_objects

make 

python3 video_objects.py

```

## Start from Scratch with NCSDK

Like most dedicated accelerators, the Myriad VPU expects neural networks in its own format. The NCSDK therefore provides the `mvNCCompile` tool, which converts existing Caffe or TensorFlow models into the graph file that is loaded onto the Movidius device, controlled by a set of command-line parameters.

> mvNCCompile is a command line tool that compiles network and weights files for Caffe or TensorFlow* models into an Intel® Movidius™ graph file format that is compatible with the Intel® Movidius™ Neural Compute SDK (Intel® Movidius™ NCSDK) and Neural Compute API (NCAPI).

See the following documentation for details on all arguments: https://movidius.github.io/ncsdk/tools/compile.html

The required input is the neural network file:

- `.prototxt` (together with the `.caffemodel` weights) for Caffe models
- `.pb` or `.meta` for TensorFlow models

The options used in the samples below are:

- `-w`: the weights file, that is, the `.caffemodel` file for a Caffe model.
- `-s`: the number of SHAVEs to use. The number of available SHAVEs depends on your neural compute device. The device runtime code may use fewer SHAVEs for some layers where measurements have typically shown no inference performance degradation (and consequently show a power benefit) from using fewer SHAVEs.
- `-is`: the input dimensions, for networks that do not have dimension constraints on the input layer.
- `-o`: the name of the output graph file.

Caffe Model Conversion Sample
```shell
mvNCCompile dnn_models/MobileNetSSD_deploy.prototxt -w dnn_models/MobileNetSSD_deploy.caffemodel -s 12 -is 300 300 -o MobileNetSSD.graph
```
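
When several models have to be converted, it can be convenient to assemble the `mvNCCompile` command line from a script. The sketch below is one way to do that; `build_compile_command` is a hypothetical helper (not part of the NCSDK) and it only uses the `-w`, `-s`, `-is` and `-o` options already shown in this section.

```python
import shlex


def build_compile_command(network, weights=None, shaves=12,
                          input_size=None, output="graph"):
    """Assemble an mvNCCompile argument list (hypothetical helper)."""
    cmd = ["mvNCCompile", network]
    if weights:
        cmd += ["-w", weights]          # Caffe weights file
    cmd += ["-s", str(shaves)]          # number of SHAVEs to use
    if input_size:
        width, height = input_size
        cmd += ["-is", str(width), str(height)]
    cmd += ["-o", output]               # output graph file
    return cmd


cmd = build_compile_command(
    "dnn_models/MobileNetSSD_deploy.prototxt",
    weights="dnn_models/MobileNetSSD_deploy.caffemodel",
    shaves=12,
    input_size=(300, 300),
    output="MobileNetSSD.graph",
)
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run the conversion (requires the NCSDK tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a string avoids shell quoting problems when paths contain spaces.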

TensorFlow sample (not working yet; to be fixed later)
```shell
mvNCCompile inception-v1.meta -s 12 -in=input  -is 300 300 -o frozen.graph
```

#### mvNCCheck

`mvNCCheck` is a command line tool that checks the validity of a Caffe or TensorFlow* neural network on a neural compute device.

The check is done by running an inference both on the device and in software on the host computer, using the supplied network and the appropriate framework libraries. The results of the two inferences are compared to determine whether the network passes or fails. The top 5 inference results are provided as output. This tool works best with image classification networks.
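
The kind of comparison described above can be illustrated with a few lines of Python. This is only a rough sketch of the idea, not the algorithm `mvNCCheck` actually uses: the pass/fail rule, the `tolerance` value and the scores are all assumptions made up for the example.

```python
def top_k(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def compare_results(device_scores, host_scores, tolerance=0.02):
    """Hypothetical rule: same top-1 class and a small maximum difference."""
    same_top1 = top_k(device_scores, 1) == top_k(host_scores, 1)
    max_diff = max(abs(d - h) for d, h in zip(device_scores, host_scores))
    return same_top1 and max_diff <= tolerance


# Placeholder scores invented for illustration only.
host = [0.01, 0.70, 0.05, 0.10, 0.02, 0.12]
device = [0.01, 0.69, 0.05, 0.11, 0.02, 0.12]

print("host top-5:  ", top_k(host))
print("device top-5:", top_k(device))
print("match:", compare_results(device, host))
```

Small numeric differences between host and device are expected because the device computes in half precision, which is why a tolerance is used instead of an exact comparison.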

#### mvNCProfile

`mvNCProfile` is a command line tool that compiles a network for use with the Intel® Movidius™ Neural Compute SDK (Intel® Movidius™ NCSDK), runs the network on a connected neural compute device, and outputs text and HTML profile reports.

The profiling data contains layer-by-layer statistics about the performance of the network. This is helpful in determining how much time is spent on each layer to narrow down potential changes to the network to improve the total inference time.
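
Once per-layer timings are available, finding the layers worth optimizing is a simple ranking exercise. The sketch below assumes the timings have already been extracted into `(layer_name, milliseconds)` pairs; the helper name, the data structure and the numbers are placeholders invented for illustration and do not reflect the report format or any real measurement.

```python
def slowest_layers(layer_times_ms, top_n=3):
    """Return (name, ms, share_of_total) for the top_n slowest layers."""
    total = sum(ms for _, ms in layer_times_ms)
    ranked = sorted(layer_times_ms, key=lambda item: item[1], reverse=True)
    return [(name, ms, ms / total) for name, ms in ranked[:top_n]]


# Placeholder timings invented for illustration; not real measurements.
example_profile = [
    ("conv1", 4.0),
    ("conv2", 10.0),
    ("pool1", 1.0),
    ("fc1", 5.0),
]

for name, ms, share in slowest_layers(example_profile, top_n=2):
    print("{:<6} {:5.1f} ms  {:5.1f}%".format(name, ms, share * 100))
```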


#### NCSDK API 

A quick overview of the NCSDK API:
https://movidius.github.io/ncsdk/ncapi/ncapi2/c_api/readme.html 

## Object Detection with Movidius using the Python API

The sample below takes the converted Caffe model and loads it onto the Movidius device for object detection.

For a more comprehensive example, refer to the [RealTimeObjectDetection.py](RealTimeObjectDetection.py) code sample.

```python
import sys
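# With NCSDK v1, import the API directly:
# from mvnc import mvncapi as mvnc

# With NCSDK v2, use the v1-style shim from the App Zoo instead
# (adjust the path to your own ncappzoo checkout)
sys.path.insert(0, "/home/intel/Intel/ncappzoo/ncapi2_shim")
import mvnc_simple_api as mvnc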

import cv2 as cv
import numpy as np

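# get attached devices
devices = mvnc.EnumerateDevices()

print('Number of Attached Devices {} '.format(len(devices)))

if not devices:
    sys.exit('No Movidius device found')

# pick the first device and open it (this loads the firmware onto the NCS)
device = mvnc.Device(devices[0])
device.OpenDevice()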

labels = ("background", "aeroplane", "bicycle",
                "bird", "boat", "bottle", "bus",
                "car", "cat", "chair", "cow",
                "diningtable", "dog", "horse",
                "motorbike", "person", "pottedplant",
                "sheep", "sofa", "train", "tvmonitor")

label_colors = np.random.uniform(0, 255, (len(labels), 3))

with open('dnn_models/MobileNetSSD_Caffe.graph', mode='rb') as f:
    graph_data = f.read()
    
movidius_graph = device.AllocateGraph(graph_data) 

img = cv.imread('resources/street.jpg')

resized_image = cv.resize(img, (300, 300))

# transform values from range 0-255 to range -1.0 - 1.0
resized_image = resized_image - 127.5
resized_image = resized_image * 0.007843

movidius_graph.LoadTensor(resized_image.astype(np.float16), None)

output, userobj = movidius_graph.GetResult()

num_valid_boxes = int(output[0])
print('Number of Valid Detections {}'.format(num_valid_boxes))

# img is a NumPy array (rows, cols, channels), so read its size from shape
actual_frame_height, actual_frame_width = img.shape[:2]

for box_index in range(num_valid_boxes):
    base_index = 7 + box_index * 7
    if (not np.isfinite(output[base_index]) or
            not np.isfinite(output[base_index + 1]) or
            not np.isfinite(output[base_index + 2]) or
            not np.isfinite(output[base_index + 3]) or
            not np.isfinite(output[base_index + 4]) or
            not np.isfinite(output[base_index + 5]) or
            not np.isfinite(output[base_index + 6])):
        # boxes with non finite (inf, nan, etc) numbers must be ignored
        continue

    left = max(int(output[base_index + 3] * 300), 0)
    top = max(int(output[base_index + 4] * 300), 0)
    right = min(int(output[base_index + 5] * 300), 300 - 1)
    bottom = min(int(output[base_index + 6] * 300), 300 - 1)

    object_info = output[base_index:base_index + 7]

    base_index = 0

    class_id = int(object_info[base_index + 1])
    if class_id < 0:
        continue

    percentage = object_info[base_index + 2]

    if percentage >= 0.6:
        #print(percentage)
        # overlay boxes and labels on to the image
        # original image
        row_factor = actual_frame_height / 300.0
        col_factor = actual_frame_width / 300.0

        # Scale object detection to original image
        left = int(col_factor * left)
        top = int(row_factor * top)
        right = int(col_factor * right)
        bottom = int(row_factor * bottom)
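        # draw the label and the bounding box on the original image
        label_text = labels[class_id] + " " + str(round(percentage, 4))
        color = label_colors[class_id].tolist()

        cv.putText(img, label_text, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
        cv.rectangle(img, (left, top), (right, bottom), color, thickness=3)

# save the annotated image (the output file name is an arbitrary choice),
# then release the graph and the device
cv.imwrite('detections.jpg', img)
movidius_graph.DeallocateGraph()
device.CloseDevice()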
```
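
The box-decoding loop above can also be factored into a small function that is easy to test without a device. The sketch below is a restructured variant, not the code used by `RealTimeObjectDetection.py`. It assumes the output layout used in the sample: `output[0]` holds the number of boxes, and box `i` occupies seven values starting at index `7 + 7 * i`, ordered as an unused value, class id, score, left, top, right and bottom, with coordinates normalized to 0..1. The function name and the returned tuple format are hypothetical.

```python
import math


def parse_detections(output, frame_width, frame_height, min_score=0.6):
    """Decode a flat SSD output into (class_id, score, box) tuples.

    box is (left, top, right, bottom) in pixels of the original frame.
    """
    detections = []
    for i in range(int(output[0])):
        box = output[7 + 7 * i: 14 + 7 * i]
        # boxes with non-finite values (inf, nan) must be ignored
        if len(box) < 7 or any(not math.isfinite(v) for v in box):
            continue
        class_id, score = int(box[1]), float(box[2])
        if class_id < 0 or score < min_score:
            continue
        # scale normalized coordinates to the frame and clamp to its bounds
        left = max(int(box[3] * frame_width), 0)
        top = max(int(box[4] * frame_height), 0)
        right = min(int(box[5] * frame_width), frame_width - 1)
        bottom = min(int(box[6] * frame_height), frame_height - 1)
        detections.append((class_id, score, (left, top, right, bottom)))
    return detections


# Synthetic output with two boxes; the second one is below the threshold.
example_output = (
    [2.0] + [0.0] * 6
    + [0.0, 15.0, 0.9, 0.25, 0.25, 0.5, 0.75]
    + [0.0, 7.0, 0.3, 0.0, 0.0, 1.0, 1.0]
)
print(parse_detections(example_output, frame_width=640, frame_height=480))
```

Scaling directly to the frame size replaces the two-step scaling in the sample (first to 300x300, then to the frame); up to integer rounding, the resulting boxes are the same.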

### Real Time Object Detection

You can go on to run the `RealTimeObjectDetection.py` example with the following options to see how it performs with OpenCV and inference on Movidius:

1. Run with a USB webcam and try to infer every incoming frame using the MobileNetSSD Caffe model:

```shell
python RealTimeObjectDetection.py -d movidius -i live -f caffe --mconfig dnn_models/MobileNetSSD_deploy.prototxt --mweight dnn_models/MobileNetSSD_deploy.caffemodel --mlabels dnn_models/caffe_ssd_labels.txt --model_image_height 300 --model_image_width 300 -c 0.65
```

2. Run with a USB webcam and infer only 4 of the incoming frames per second using the MobileNetSSD Caffe model:

```shell
python RealTimeObjectDetection.py -d movidius -i live -f caffe --mconfig dnn_models/MobileNetSSD_deploy.prototxt --mweight dnn_models/MobileNetSSD_deploy.caffemodel --mlabels dnn_models/caffe_ssd_labels.txt --model_image_height 300 --model_image_width 300 -c 0.65 --infer_fc 4
```

3. Read frames from an `.mp4` file and infer 4 frames per second:

```shell
python RealTimeObjectDetection.py -d movidius -i offline -s resources/video.mp4 -f caffe --mconfig dnn_models/MobileNetSSD_deploy.prototxt --mweight dnn_models/MobileNetSSD_deploy.caffemodel --mlabels dnn_models/caffe_ssd_labels.txt --model_image_height 300 --model_image_width 300 -c 0.65 --infer_fc 4
```
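
Limiting inference to a few frames per second, as `--infer_fc 4` does, amounts to picking evenly spaced frames out of each second of video. The sketch below shows one simple way to do this; it is an assumption about how such a limit can be implemented, and `RealTimeObjectDetection.py` may do it differently. The function name `frames_to_infer` is hypothetical.

```python
def frames_to_infer(fps, infer_fc):
    """Return the indices (within one second of video) of frames to infer.

    fps is the capture frame rate, infer_fc the number of frames to infer
    per second. A non-positive or too large infer_fc means "infer all".
    """
    if infer_fc <= 0 or infer_fc >= fps:
        return list(range(fps))
    step = fps / infer_fc
    return [int(i * step) for i in range(infer_fc)]


# With a 30 fps source and 4 inferences per second:
selected = frames_to_infer(30, 4)
print(selected)

# In a capture loop, a frame would be sent to the device only if
# (frame_number % fps) is in the selected set; other frames are only shown.
```

Skipping frames this way keeps the display at the full camera rate while the stick only processes as many frames as it can handle.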