###  <font color='red'>  Important: </font> Before proceeding, run the following cell to check for code updates.

In [None]:
from qarpo.catalog import DemoCatalog
import os
status = DemoCatalog(os.getcwd(), "Demo").ShowRepositoryControls()

# MICCAI 2017 Robotic Instrument Segmentation

![Robotic Instrument Challenge](./figures/segmentation.gif)


The code here refers to the winning solution by Alexey Shvets, Alexander Rakhlin, Alexandr A. Kalinin, and Vladimir Iglovikov in the [MICCAI 2017 Robotic Instrument Segmentation Challenge](https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org/). This notebook has been modified from the original found on [GitHub](https://github.com/ternaus/robot-surgery-segmentation/blob/master/Demo.ipynb) which was provided with an [MIT license](https://github.com/ternaus/robot-surgery-segmentation/blob/master/LICENSE). The data files necessary to run this notebook are included in `/data/robotic-surgery-segmentation`.

![TernausNet](./figures/TernausNet.png)

## 0. Setup

Import dependencies for the notebook by running the following cells.

In [None]:
import os
import time
import sys
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from IPython.display import HTML
from qarpo.demoutils import *
from python.utils import create_script, mask_overlay

## 1. Inference with PyTorch (Optional)
Run the following cells to perform inference with the PyTorch model on an [Intel Core i5-6500TE](https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-) using the code in [pytorch_infer.py](python/pytorch_infer.py). **Wait for the progress bar to complete before running the following cells to ensure inference is complete.**

In [None]:
# Create script to run pytorch_infer.py
create_script("generated/pytorch_infer.sh",
              "python3 python/pytorch_infer.py")

# Run the script
job_id_infer = !qsub generated/pytorch_infer.sh -l nodes=1:idc001skl:tank-870:i5-6500te -N seg_core -e logs/ -o logs/
if job_id_infer:
    print(job_id_infer[0])
    progressIndicator('results/', job_id_infer[0]+'.txt', "Inference", 0, 100)

View the original image and the inference results.

In [None]:
image = cv2.imread("generated/input.png")
mask = cv2.imread("generated/mask.png")
plt.figure(1, figsize=(15, 15))
plt.subplot(121)
plt.axis('off')
plt.title("Input Image")
plt.imshow(image)
plt.subplot(122)
plt.axis('off')
plt.title("Segmentation")
plt.imshow(mask_overlay(image, mask));

## 2. Converting PyTorch to ONNX

The ONNX models must be generated from the original PyTorch models before they can be used with OpenVINO. Do this by running [pytorch_to_onnx.py](python/pytorch_to_onnx.py).

**Wait for the progress bar to complete before running the following cells to ensure all ONNX models have been generated.**

In [None]:
# Create script to run pytorch_to_onnx.py
create_script("generated/pytorch_to_onnx.sh",
              "python3 python/pytorch_to_onnx.py")

# Run the script
job_id_onnx = !qsub generated/pytorch_to_onnx.sh -l nodes=1:idc001skl:tank-870:i5-6500te -N seg_core -e logs/ -o logs/
if job_id_onnx:
    print(job_id_onnx[0])
    progressIndicator('results/', job_id_onnx[0]+'.txt', "ONNX conversion", 0, 100)

## 3. Converting ONNX Model to IR 

We will convert the pre-trained ONNX models into an intermediate representation (IR), which the OpenVINO Inference Engine will use in the next step.

Run the cell below to create binary segmentation model files.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_onnx.py \
    --input_model "models/onnx/surgical_tools.onnx" \
    --output_dir models/ov/FP16/ \
    --data_type FP16 \
    --move_to_preprocess \
    --scale_values "[0.229, 0.224, 0.225]" \
    --mean_values "[0.485, 0.456, 0.406]"

Run the cell below to create parts segmentation model files.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_onnx.py \
    --input_model "models/onnx/surgical_tools_parts.onnx" \
    --output_dir models/ov/FP16/ \
    --data_type FP16 \
    --move_to_preprocess \
    --scale_values "[0.229, 0.224, 0.225]" \
    --mean_values "[0.485, 0.456, 0.406]"
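The `--mean_values` and `--scale_values` flags ask the Model Optimizer to build per-channel normalization into the IR, so that each input value is transformed as `(value - mean) / scale`. The following is a minimal sketch of that arithmetic in plain Python. The function name `normalize_pixel` is hypothetical, and the sketch assumes channel values already scaled to the range [0, 1], which is what the magnitudes of the constants suggest.

```python
# Per-channel constants copied from the Model Optimizer commands above.
MEAN = (0.485, 0.456, 0.406)
SCALE = (0.229, 0.224, 0.225)


def normalize_pixel(pixel, mean=MEAN, scale=SCALE):
    """Apply (value - mean) / scale to each channel of one pixel.

    Hypothetical helper for illustration; assumes the channel values
    are already in the range [0, 1].
    """
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, scale))


# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel(MEAN))

# A pixel one scale unit above the mean maps to about 1.0 per channel.
print(normalize_pixel(tuple(m + s for m, s in zip(MEAN, SCALE))))
```

Because the normalization is stored with the model, the inference script does not need to repeat it, provided it feeds the model values in the range the constants assume.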

## 4. Inference on the edge

All the code up to this point has been run within the Jupyter Notebook instance running on a development node based on an Intel Xeon Scalable processor, where the Notebook is allocated a single core. We will run the workload on other edge compute nodes represented in the IoT DevCloud. We will send work to the edge compute nodes by submitting the corresponding non-interactive jobs into a queue. For each job, we will specify the type of the edge compute server that must be allocated for the job.

The job file is written in Bash and will be executed directly on the edge compute node. For this example, we have written the job file, `segmentation.sh`, for you in the notebook. It runs the segmentation script `python/segmentation_parts.py`.

In [None]:
%%writefile generated/segmentation.sh

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR


OUTPUT_FILE=$1
DEVICE=$2
FP_MODEL=$3
INPUT_FILE=$4

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs - Updated for OpenVINO 2020.1
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R4_PL1_FP11_ResNet_VGG.aocx
    export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3
fi

python3 python/segmentation_parts.py  -m ${FP_MODEL} \
                               -i data/${INPUT_FILE} \
                               -d ${DEVICE} \
                               -o ${OUTPUT_FILE}

### 4.1 How jobs are submitted into the queue

Now that we have the job script, we can submit the jobs to edge compute nodes. In the IoT DevCloud, you can do this using the `qsub` command.
We can submit the job to several different types of edge compute nodes simultaneously or to just one node at a time.

We use five options of the `qsub` command for this:
- `-l` : this option lets us select the number and the type of nodes using `nodes={node_count}:{property}`. 
- `-F` : this option lets us send arguments to the bash script. 
- `-N` : this option lets us name the job so that it is easier to distinguish between them.
- `-o` : this option lets us determine the path to be used for the standard output stream.
- `-e` : this option lets us determine the path to be used for the standard error stream.


The [segmentation.sh](generated/segmentation.sh) script takes four arguments, which are passed together as one quoted string through the `-F` flag:
1. the path to the directory for the output video and performance stats
2. the target device (e.g. CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
3. the floating-point precision to use for inference (e.g. FP16)
4. the name of the input video file, relative to the `data/` directory

The job scheduler passes the contents of the `-F` flag to the job script as its arguments.
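To make the role of each option concrete, here is a small sketch that assembles the same kind of `qsub` command line used later in this notebook. The helper `build_qsub_command` is hypothetical and is not part of `qarpo` or the DevCloud tooling; it only builds the argument list and does not submit anything.

```python
import shlex


def build_qsub_command(script, node_properties, script_args, job_name,
                       log_dir="logs/"):
    """Return a qsub command line as a list of arguments.

    Hypothetical helper for illustration only.
    node_properties: property names appended to nodes=1, joined with ':'
    script_args: values forwarded to the job script through -F
    """
    return [
        "qsub", script,
        "-l", "nodes=1:" + ":".join(node_properties),
        "-F", " ".join(script_args),
        "-N", job_name,
        "-e", log_dir,
        "-o", log_dir,
    ]


cmd = build_qsub_command(
    "generated/segmentation.sh",
    ["idc001skl", "tank-870", "i5-6500te"],
    ["results/", "CPU", "FP16", "short_source.mp4"],
    "seg_core",
)

# shlex.join quotes the -F value so that it stays a single argument.
print(shlex.join(cmd))
```

Keeping the four script arguments in one quoted `-F` string matters: the scheduler receives them as a single option value and hands them to the script as `$1` through `$4`.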

If you are curious to see the available types of nodes on the IoT DevCloud, run the following cell.

In [None]:
!pbsnodes | grep compnode | sort | uniq -c

Here, the properties describe the node, and the number on the left is the number of available nodes of that architecture.
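The shell pipeline above keeps the lines that mention `compnode` and counts how often each distinct line occurs. A rough Python equivalent is sketched below. The sample text is made up for illustration, and real `pbsnodes` output on the DevCloud will contain different node names and property strings.

```python
from collections import Counter

# Made-up sample of pbsnodes-style output; real property strings will differ.
SAMPLE = """\
node-a
     properties = compnode,tank-870,i5-6500te
node-b
     properties = compnode,tank-870,i5-6500te
node-c
     properties = compnode,tank-870,e3-1268l-v5
"""


def count_node_types(pbsnodes_output):
    """Count identical lines that mention 'compnode'.

    Mirrors `grep compnode | sort | uniq -c`, except that surrounding
    whitespace is stripped from each line before counting.
    """
    lines = (line.strip() for line in pbsnodes_output.splitlines())
    return Counter(line for line in lines if "compnode" in line)


for properties, count in sorted(count_node_types(SAMPLE).items()):
    print(count, properties)
```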

### 4.2 Job queue submission

Each cell in this section prints the `JobID` of the submitted job, which you can use to track the job's progress.

**Note**: You can submit all the jobs at once or one at a time.

After submission, the jobs go into a queue and run as soon as the requested compute resources become available.
(Tip: **Shift+Enter** runs the current cell and moves to the next one, so you can press it repeatedly to run several cells quickly.)


#### Submitting to an edge compute node with an Intel Core CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
job_id_core = !qsub generated/segmentation.sh -l nodes=1:idc001skl:tank-870:i5-6500te -F "results/ CPU FP16 short_source.mp4" -N seg_core -e logs/ -o logs/
# Print the job ID and show progress only if the submission returned one
if job_id_core:
    print(job_id_core[0])
    progressIndicator('results/', job_id_core[0]+'.txt', "Inference", 0, 100)

#### Submitting to an edge compute node with an 8th Generation Intel Core CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/8th-gen-core-dev-kit">UP Xtreme Edge Compute Enabling Kit
    </a> edge node with a low-power <a 
    href="https://ark.intel.com/content/www/us/en/ark/products/193554/intel-core-i7-8665ue-processor-8m-cache-up-to-4-40-ghz.html">Intel 
    Core i7-8665UE</a>. The inference workload will run on the CPU.


In [None]:
job_id_core2 = !qsub generated/segmentation.sh -l nodes=1:idc014upxa10fx1:upx-edgei7 -F "results/ CPU FP16 short_source.mp4" -N seg_core2 -e logs/ -o logs/
# Print the job ID and show progress only if the submission returned one
if job_id_core2:
    print(job_id_core2[0])
    progressIndicator('results/', job_id_core2[0]+'.txt', "Inference", 0, 100)

#### Submitting to an edge compute node with an Intel Xeon CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel 
    Xeon Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_xeon = !qsub generated/segmentation.sh -l nodes=1:tank-870:e3-1268l-v5 -F "results/ CPU FP16 short_source.mp4" -N seg_xeon -e logs/ -o logs/
# Print the job ID and show progress only if the submission returned one
if job_id_xeon:
    print(job_id_xeon[0])
    progressIndicator('results/', job_id_xeon[0]+'.txt', "Inference", 0, 100)

#### Submitting to an edge compute node with an IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on the <a href="https://www.ieiworld.com/mustang-f100/en/">IEI Mustang-F100-A10</a> card installed in this node.

In [None]:
#Submit job to the queue
job_id_fpga = !qsub generated/segmentation.sh -l nodes=1:idc003a10:iei-mustang-f100-a10 -F "results/ HETERO:FPGA,CPU FP16 short_source.mp4" -N seg_fpga -e logs/ -o logs/
# Print the job ID and show progress only if the submission returned one
if job_id_fpga:
    print(job_id_fpga[0])
    progressIndicator('results/', job_id_fpga[0]+'.txt', "Inference", 0, 100)

#### Generate example visualizations using an Intel Core CPU
In the cell below, we submit a job to run the script [figures.py](python/figures.py) to an [IEI Tank 870-Q170](https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core) edge node with an [Intel Core i5-6500TE](https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-). The inference workload will run on the CPU.

In [None]:
# Create script to run figures.py
create_script("generated/figures.sh",
              "python3 python/figures.py")

# Run the script
job_id_figures = !qsub generated/figures.sh -l nodes=1:idc001skl:tank-870:i5-6500te -N figures -e logs/ -o logs/
if job_id_figures:
    print(job_id_figures[0])
    progressIndicator('results/', job_id_figures[0]+'.txt', "Inference", 0, 100)

### 4.3 Check if the jobs are done

To check on the jobs that were submitted, use the `qstat` command.

We have created a custom Jupyter widget that shows live `qstat` updates.
Run the following cell to bring it up.

In [None]:
liveQstat()

You should see the jobs you have submitted (referenced by the `Job ID` that is displayed right after you submit each job in step 4.2).
There should also be an extra job in the queue named "jupyterhub": this job runs your current Jupyter Notebook session.

The 'S' column shows the current status. 
- If it is in Q state, it is in the queue waiting for available resources. 
- If it is in R state, it is running. 
- If the job is no longer listed, it means it is completed.
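The three rules above can be expressed as a short lookup. The sketch below is illustrative only: the sample text is invented, the helper `job_status` is hypothetical, and the column layout (job ID first, status letter second to last) is an assumption about `qstat`-style output rather than something this notebook specifies.

```python
# Status letters described in the list above.
STATES = {"Q": "queued, waiting for resources", "R": "running"}

# Invented sample of qstat-style output; real output may be laid out differently.
SAMPLE = """\
Job ID        Name       User   Time Use S Queue
------------- ---------- ------ -------- - -----
1234.v-qsvr-1 seg_core   u1234  00:00:05 R batch
1235.v-qsvr-1 seg_xeon   u1234  0        Q batch
"""


def job_status(qstat_output, job_id):
    """Describe a job's state; a job that is not listed counts as completed.

    Hypothetical helper. Jobs are matched on the numeric part of the ID
    (the text before the first '.').
    """
    wanted = job_id.split(".")[0]
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].split(".")[0] == wanted:
            return STATES.get(fields[-2], "state " + fields[-2])
    return "completed (no longer listed)"


print(job_status(SAMPLE, "1234.v-qsvr-1"))
print(job_status(SAMPLE, "1235"))
print(job_status(SAMPLE, "9999"))
```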

**Note**: Time spent in the queue depends on the number of users accessing the edge nodes. Once these jobs begin to run, they should take from 1 to 5 minutes to complete. 

***Wait!***

Please wait for the inference jobs to complete before proceeding to the next step.

### 4.4 View Results

Once the jobs are completed, the stdout and stderr streams of each job are saved into the `logs/` folder.

The output video and performance statistics of each job are saved in the `results/` folder, in files named after the job ID.
We view the results below.

#### Result on the Intel Core CPU 

In [None]:
videoHTML('IEI Tank (Intel Core CPU)', 
          ['results/output_'+job_id_core[0]+'.mp4'], 
          'results/stats_'+job_id_core[0]+'.txt')

#### Result on the 8th Generation Intel Core CPU

In [None]:
videoHTML('UP Xtreme Edge Compute Enabling Kit (Intel 8th Generation Core CPU)', 
          ['results/output_'+job_id_core2[0]+'.mp4'], 
          'results/stats_'+job_id_core2[0]+'.txt')

#### Result on the Intel Xeon CPU

In [None]:
videoHTML('IEI Tank Xeon (Intel Xeon CPU)',
          ['results/output_'+job_id_xeon[0]+'.mp4'],
          'results/stats_'+job_id_xeon[0]+'.txt')

#### Result on the IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)

In [None]:
videoHTML('IEI Mustang-F100-A10',
          ['results/output_'+job_id_fpga[0]+'.mp4'],
          'results/stats_'+job_id_fpga[0]+'.txt')

#### Visualize results on the Intel Core CPU

In [None]:
HTML("<img src='generated/predictions.png'>")

### 4.5 Assess Performance

The inference time of each job is recorded in `results/stats_{job_id}.txt`, where `{job_id}` is the ID printed when the job was submitted. Run the cell below to plot the results of all jobs side by side. For processing time, lower values mean better performance; for frames per second, higher values are better. Keep in mind that some architectures are optimized for the highest performance, others for low power or other metrics.

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('core2', 'Intel Core\ni7-8665UE\nCPU'),
             ('xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('fpga', 'Intel\nFPGA')]

stats_list = []

for arch, a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/stats_'+vars()['job_id_'+arch][0]+'.txt', a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))
summaryPlot(stats_list, 'Architecture', 'Time(s)', 'Inference Engine Processing Time', 'time' )
summaryPlot(stats_list, 'Architecture', 'Frames per second', 'Inference Engine FPS', 'fps' )
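The two plots show the same measurement from two angles: a shorter processing time for a fixed number of frames means a higher frame rate. The sketch below shows that relationship only. The function name is hypothetical, the numbers are invented, and how `summaryPlot` reads the stats files is not shown in this notebook.

```python
def frames_per_second(total_seconds, frame_count):
    """Convert a total processing time and a frame count into throughput."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return frame_count / total_seconds


# Invented numbers for illustration only, not measured results.
print(frames_per_second(total_seconds=20.0, frame_count=100))
```

Halving the processing time for the same video doubles the frames per second, which is why a device that ranks lowest in the first plot ranks highest in the second.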

## 5. Citation
    @inproceedings{shvets2018automatic,
    title={Automatic Instrument Segmentation in Robot-Assisted Surgery using Deep Learning},
    author={Shvets, Alexey A and Rakhlin, Alexander and Kalinin, Alexandr A and Iglovikov, Vladimir I},
    booktitle={2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)},
    pages={624--628},
    year={2018}
    }