###  <font color='red'>  Important: </font> Before proceeding, run the following cell to check for code updates.

In [None]:
from qarpo.catalog import DemoCatalog
import os
status = DemoCatalog(os.getcwd(), "Demo").ShowRepositoryControls()

# Brain Tumor Segmentation (BraTS) with OpenVINO™

In this code example, we apply the U-Net architecture to segment brain tumors from raw MRI scans. With relatively little data, we are able to train a U-Net model to accurately predict where tumors exist. 

Our model achieves a Dice coefficient (the standard metric for the BraTS dataset used in the study) of about 0.82-0.88. Menze et al. [reported](http://ieeexplore.ieee.org/document/6975210/) that expert neuroradiologists manually segmented these tumors with a cross-rater Dice score of 0.75-0.85, meaning that the model’s predictions are on par with those of expert physicians. The MRI brain scans below highlight brain tumor matter segmented using deep learning. 
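For reference, the Dice coefficient of a ground-truth mask A and a predicted mask B is 2|A ∩ B| / (|A| + |B|): 1.0 means perfect overlap and 0.0 means none. Below is a minimal NumPy sketch. The `smooth` term is an assumption, a common trick that avoids division by zero on empty masks. The notebook's `healthcare_openvino.py` may compute the score differently.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for binary masks A (truth) and B (prediction).

    `smooth` is added to numerator and denominator so that two empty masks
    score 1.0 instead of dividing by zero (an assumption, not from the notebook).
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


truth = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(dice_coefficient(truth, pred, smooth=0.0))  # 2*1 / (2 + 1), about 0.667
```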

<img src="images/figure1.png">

## Demonstration objectives
* Healthcare use-case demo
* **Model Optimizer** in action
* U-Net based segmentation on edge hardware
* Inferencing with OpenVINO™ Inference Engine
* Running inference across CPU, integrated GPU, VPU, and FPGA and comparing throughput and latency

## What is U-Net?
Since its introduction in 2015, the [U-Net](https://arxiv.org/pdf/1505.04597.pdf) architecture has been used to create deep learning models for segmenting [nerves](https://github.com/jocicmarko/ultrasound-nerve-segmentation) in ultrasound images, [lungs](https://www.kaggle.com/c/data-science-bowl-2017#tutorial) in CT scans, and even [interference](https://github.com/jakeret/tf_unet) in radio telescopes.

U-Net is designed like an [auto-encoder](https://en.wikipedia.org/wiki/Autoencoder). It has an encoding path (“contracting”) paired with a decoding path (“expanding”), which gives it the “U” shape. However, an autoencoder learns to reconstruct its input, and an image classifier assigns a single label to the image as a whole. U-Net does neither: it predicts a pixelwise segmentation map of the input image. For each pixel in the original image, it asks the question: “To which class does this pixel belong?” This flexibility allows U-Net to predict different parts of the tumor simultaneously.
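To make “To which class does this pixel belong?” concrete: a segmentation network outputs a score for each class at every pixel, and the predicted map takes the best-scoring class at each pixel. The NumPy sketch below uses made-up scores. This section does not say whether this notebook's model emits multi-class scores or a single tumor-probability channel, which would be thresholded instead, so both variants are shown as assumptions.

```python
import numpy as np

def pixelwise_classes(scores):
    """Map (H, W, num_classes) per-pixel scores to an (H, W) map of class indices."""
    return np.argmax(scores, axis=-1)

def binary_mask(probabilities, threshold=0.5):
    """Single-channel variant: mark a pixel as foreground if its probability exceeds the threshold."""
    return (np.asarray(probabilities) > threshold).astype(np.uint8)


# Hypothetical scores for a 2x3 image and 2 classes (0 = background, 1 = tumor).
scores = np.array([[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
                   [[0.3, 0.7], [0.5, 0.5], [0.1, 0.9]]])
print(pixelwise_classes(scores))  # ties go to the lower class index
```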

<img src="images/unet.png">
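The “U” shape can be traced by following the spatial size of the feature maps. The sketch below assumes a hypothetical four-level network with 'same'-padded convolutions. The original U-Net paper used unpadded convolutions, so its sizes also shrink slightly at every level. The sketch shows how each expanding-path level pairs with the contracting-path feature map of the same size through a skip connection.

```python
def unet_feature_map_sizes(input_size=144, depth=4):
    """Trace spatial sizes down the contracting path and back up the expanding path.

    Assumes 'same'-padded convolutions, 2x2 max pooling per encoder level and
    2x upsampling per decoder level; depth=4 is a hypothetical choice.
    """
    contracting = [input_size]
    for _ in range(depth):
        if contracting[-1] % 2:
            raise ValueError(f"size {contracting[-1]} is odd; 2x2 pooling would drop a row/column")
        contracting.append(contracting[-1] // 2)
    expanding = [contracting[-1] * 2 ** level for level in range(1, depth + 1)]
    # Each decoder level concatenates the encoder feature map of the same size (skip connection).
    skips = list(zip(expanding, reversed(contracting[:-1])))
    return contracting, expanding, skips


down, up, skips = unet_feature_map_sizes(144, depth=4)
print("contracting:", down)
print("expanding:  ", up)
```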

## Step 0: Set Up

### 0.1: Import dependencies

Run the cells below to import dependencies (select a cell and press **Ctrl+Enter** to run it). 

In [None]:
import sys, os 
import ipywidgets as widgets
from pathlib import Path
from qarpo.demoutils import *
from IPython.display import display

Note that the first time you run this notebook, this step may take a few minutes, because `keras` and `psutil` will be installed locally to your DevCloud instance.

In [None]:
# Install Keras only if it cannot be imported
try:
    import keras
except ImportError:
    print("Keras not installed")
    !{sys.executable} -m pip install keras

In [None]:
# Install psutil only if it cannot be imported
try:
    import psutil
except ImportError:
    print("psutil not installed")
    !{sys.executable} -m pip install psutil
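The two cells above can also be written as a single check. The sketch below uses the standard library's `importlib.util.find_spec` to list which modules are missing before installing anything. The helper names are hypothetical. Note that pip package names happen to match the module names only for these two packages.

```python
import importlib.util
import subprocess
import sys

def missing_packages(names):
    """Return the module names that are not importable in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def install(packages):
    """Install packages with the pip of the running interpreter (like `!{sys.executable} -m pip install`)."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


missing = missing_packages(["keras", "psutil"])
print("Missing:", missing)
# install(missing)  # uncomment to install whatever is missing
```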

## Step 1. Create an Intermediate Representation (IR) Model using the Model Optimizer by Intel

The Model Optimizer creates Intermediate Representation (IR) models that are optimized for different end-point target devices.

### 1.1: Generate FP32 Optimized Model
Now, let's convert the frozen TensorFlow model into an FP32 IR using the Model Optimizer. The job script below uses this FP32 IR for the CPU and integrated GPU targets.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py \
            --input_model /data/Healthcare_app/data/saved_model_frozen.pb \
            --input_shape=[1,144,144,4] \
            --data_type FP32  \
            --output_dir output/IR_models/FP32  \
            --model_name saved_model

**Note:** the command above is a single command-line input that spans several lines. The trailing backslash '\\' is the line-continuation character in Bash.

Here, the arguments are:
* --input_model : the original (frozen TensorFlow) model
* --input_shape : the shape of the model input, here [1,144,144,4]
* --data_type : Data type to use. One of {FP32, FP16, half, float}
* --output_dir : output directory (If this directory does not exist, it will be created for you.)
* --model_name : base name of the generated IR files

Run the script with `-h` to get the full list of arguments.
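The FP32 and FP16 conversions differ only in `--data_type` and `--output_dir`, so it can help to build the command programmatically. This sketch only reassembles the flags already used in this notebook. The helper `mo_command` is hypothetical. Passing the list to `subprocess.run(cmd, check=True)` would run the conversion on a machine where the Model Optimizer is installed at that path.

```python
import shlex

MO_SCRIPT = "/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py"

def mo_command(input_model, data_type, output_dir,
               input_shape=(1, 144, 144, 4), model_name="saved_model"):
    """Assemble the Model Optimizer call used above as an argument list."""
    shape = "[" + ",".join(str(dim) for dim in input_shape) + "]"
    return ["python3", MO_SCRIPT,
            "--input_model", input_model,
            f"--input_shape={shape}",
            "--data_type", data_type,
            "--output_dir", output_dir,
            "--model_name", model_name]


cmd = mo_command("/data/Healthcare_app/data/saved_model_frozen.pb", "FP16", "output/IR_models/FP16")
print(shlex.join(cmd))
```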

Running that command will produce two files:
```
output/IR_models/FP32/saved_model.xml
output/IR_models/FP32/saved_model.bin
```
These will be used later in the exercise.
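The `.xml` file holds the network topology and the `.bin` file holds the weights. Before submitting jobs, a quick check that both exist for each precision can save a wasted queue slot. This is a minimal sketch with hypothetical helper names, using only `pathlib`.

```python
from pathlib import Path

def ir_files(output_dir, model_name="saved_model"):
    """Return the (.xml topology, .bin weights) paths the Model Optimizer writes for one model."""
    base = Path(output_dir) / model_name
    return base.with_suffix(".xml"), base.with_suffix(".bin")

def missing_ir_files(output_dir, model_name="saved_model"):
    """List the IR files that do not exist yet, e.g. because a conversion failed."""
    return [str(path) for path in ir_files(output_dir, model_name) if not path.exists()]


for precision in ("FP32", "FP16"):
    print(precision, "missing:", missing_ir_files(f"output/IR_models/{precision}"))
```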

### 1.2: Generate FP16 Optimized Model

We also need an FP16 version of the model for the VPU (Intel® NCS 2), HDDL, and FPGA targets. The job script below selects the FP16 IR for those devices. Run the following command to create it.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py \
            --input_model /data/Healthcare_app/data/saved_model_frozen.pb  \
            --input_shape=[1,144,144,4] \
            --data_type FP16  \
            --output_dir output/IR_models/FP16  \
            --model_name saved_model
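FP16 stores each weight in 2 bytes instead of 4, so the FP16 weights file is roughly half the size. The cost is rounding error and a much smaller range: the largest finite FP16 value is 65504. The NumPy sketch below compares the two precisions on a few made-up values. It illustrates the number formats only and says nothing about this model's accuracy at FP16.

```python
import numpy as np

def precision_summary(values):
    """Compare storage size and rounding error of FP32 vs FP16 copies of the same values."""
    fp32 = np.asarray(values, dtype=np.float32)
    fp16 = fp32.astype(np.float16)
    max_abs_error = float(np.max(np.abs(fp32 - fp16.astype(np.float32))))
    return {"fp32_bytes": fp32.nbytes, "fp16_bytes": fp16.nbytes, "max_abs_error": max_abs_error}


print("largest finite FP16 value:", float(np.finfo(np.float16).max))
print(precision_summary([0.1, 1.0 / 3.0, 2.5]))
```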

In [None]:
%%writefile healthcare_job_openvino.sh

# Prevent error and output files from being saved to DevCloud
#PBS -e /dev/null

cd $PBS_O_WORKDIR
DEVICE=$1
RESULTS=$2

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs - Updated for OpenVINO 2020.1
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R4_PL1_FP11_ResNet_VGG.aocx
    export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3
fi

if [ "$DEVICE" = "HETERO:FPGA,CPU" ] || [ "$DEVICE" = "MYRIAD" ] || [ "$DEVICE" = "HDDL" ]; then
    FP_MODEL="FP16"
else
    FP_MODEL="FP32"
fi

    
# Run brain tumor segmentation inference with the selected device and IR precision
python3 healthcare_openvino.py -d "$DEVICE" \
                               -IR "output/IR_models/${FP_MODEL}/saved_model" \
                               -r "$RESULTS"
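Here is the device-to-precision rule from the script above, restated in Python for readers less familiar with Bash. This sketch mirrors the script's `if` tests, and the helper name is hypothetical. Because the match is on the exact string, a device string such as `HETERO:CPU,FPGA` would fall through to FP32.

```python
FP16_DEVICES = {"HETERO:FPGA,CPU", "MYRIAD", "HDDL"}

def ir_path_for(device, ir_root="output/IR_models", model_name="saved_model"):
    """Mirror the job script: FP16 IR for FPGA, MYRIAD (NCS 2) and HDDL; FP32 otherwise."""
    precision = "FP16" if device in FP16_DEVICES else "FP32"
    return f"{ir_root}/{precision}/{model_name}"


for device in ("CPU", "GPU", "MYRIAD", "HETERO:FPGA,CPU"):
    print(f"{device:>16} -> {ir_path_for(device)}")
```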

## Step 2. Inference Time!

Here we submit the job script created above to different edge compute nodes. The jobs go into a queue and run once the compute resources are available. 

#### Submitting to an edge compute node with an Intel® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core™ i5-6500TE processor</a>. The inference workload will run on the CPU.

In [None]:
job_id_core = !qsub healthcare_job_openvino.sh -l nodes=1:idc001skl:tank-870:i5-6500te -F "CPU results/"
# Only index the captured output once we know the submission returned a job ID
if job_id_core:
    print(job_id_core[0])
    progressIndicator('results/'+job_id_core[0], 'i_progress.txt', "Processing", 0, 100)
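Every submission cell follows the same pattern and differs only in the node properties and the device string. The sketch below spells out the pieces using a hypothetical helper; no job is submitted. On DevCloud, the list could be passed to `subprocess.run(..., capture_output=True, text=True)` and the job ID read from its standard output, which is what the `!qsub` capture does.

```python
import shlex

def qsub_command(script, node_properties, device, results_dir="results/"):
    """Rebuild the submission used in these cells as an argument list.

    -l nodes=1:<prop>:<prop> requests one node carrying every listed property;
    -F passes "<device> <results_dir>" to the job script, which reads them as $1 and $2.
    """
    return ["qsub", script,
            "-l", "nodes=1:" + ":".join(node_properties),
            "-F", f"{device} {results_dir}"]


cmd = qsub_command("healthcare_job_openvino.sh", ["idc004nc2", "intel-ncs2"], "MYRIAD")
print(shlex.join(cmd))
```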

#### Submitting to an edge compute node with an Intel® Xeon® CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel® 
    Xeon® Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
job_id_xeon = !qsub healthcare_job_openvino.sh -l nodes=1:idc007xv5:e3-1268l-v5 -F "CPU results/"
# Only index the captured output once we know the submission returned a job ID
if job_id_xeon:
    print(job_id_xeon[0])
    progressIndicator('results/'+job_id_xeon[0], 'i_progress.txt', "Processing", 0, 100)

#### Submitting to an edge compute node with an Intel® Core™ CPU and using the onboard Intel® GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank* 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 GPU integrated into the CPU.

In [None]:
# Submit job to the queue
job_id_gpu = !qsub healthcare_job_openvino.sh -l nodes=1:idc001skl:intel-hd-530 -F "GPU results/"
if job_id_gpu:
    print(job_id_gpu[0])
    progressIndicator('results/'+job_id_gpu[0], 'i_progress.txt', "Processing", 0, 100)

#### Submitting to an edge compute node with Intel® NCS 2 (Neural Compute Stick 2)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on an <a 
    href="https://software.intel.com/en-us/neural-compute-stick">Intel Neural Compute Stick 2</a> installed in this node.

In [None]:
# Submit job to the queue
job_id_ncs2 = !qsub healthcare_job_openvino.sh -l nodes=1:idc004nc2:intel-ncs2 -F "MYRIAD results/"
if job_id_ncs2:
    print(job_id_ncs2[0])
    progressIndicator('results/'+job_id_ncs2[0], 'i_progress.txt', "Processing", 0, 100)

#### Submitting to an edge compute node with IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core™ i5-6500TE CPU</a>. The inference workload will run on the <a href="https://www.ieiworld.com/mustang-f100/en/">IEI Mustang-F100-A10</a> card installed in this node.

In [None]:
job_id_fpga = !qsub healthcare_job_openvino.sh -l nodes=1:idc003a10:iei-mustang-f100-a10 -F "HETERO:FPGA,CPU results/"
if job_id_fpga:
    print(job_id_fpga[0])
    progressIndicator('results/'+job_id_fpga[0], 'i_progress.txt', "Processing", 0, 100)

You can view the status of the jobs below. 

In [None]:
liveQstat()
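The results cells below read files from `results/<job ID>/`, and the benchmark summary reads `stats.txt` from that directory. You may prefer to block until a job's output exists rather than watch the queue. A polling helper like the hypothetical one below works for that. It assumes `stats.txt` appears only after the job has finished writing its results, which this notebook does not guarantee.

```python
import time
from pathlib import Path

def wait_for_file(path, timeout_s=600.0, poll_s=5.0):
    """Poll until `path` exists or `timeout_s` elapses; return True if it appeared."""
    target = Path(path)
    deadline = time.monotonic() + timeout_s
    while True:
        if target.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Example (hypothetical): wait_for_file('results/' + job_id_core[0] + '/stats.txt')
```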

## Step 3. Results

### 3.1: Image Comparison
Here, we visualize our predictions. We can observe the frame rate, execution time, and Dice coefficient (a measure of overlap between the ground truth and the prediction, where 1.0 indicates perfect overlap). Note that it may take a few seconds to display the results. 

In [None]:
outputHTML('IEI Tank (Intel Core CPU)',
          'results/'+job_id_core[0], '.png')

In [None]:
outputHTML('IEI Tank Xeon (Intel Xeon CPU)',
          'results/'+job_id_xeon[0], '.png')

In [None]:
outputHTML('IEI Intel GPU (Intel Core + Onboard GPU)',
          'results/'+job_id_gpu[0], '.png')

In [None]:
outputHTML('IEI Tank + Intel CPU + Intel NCS2',
          'results/'+job_id_ncs2[0], '.png')

In [None]:
outputHTML('IEI Tank + IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)',
          'results/'+job_id_fpga[0], '.png')

### 3.2: Architecture Comparison

Finally, we benchmark the processing time and frames per second on different architectures.

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('gpu', 'Intel Core\ni5-6500TE\nGPU'),
             ('ncs2', 'Intel\nNCS2'),
             ('fpga', 'IEI Mustang\nF100-A10\nFPGA')
            ]

# Collect the stats file of each submitted job; use a placeholder for jobs that were not submitted
stats_list = []
for arch, a_name in arch_list:
    job_id = globals().get('job_id_' + arch)
    if job_id:
        stats_list.append(('results/' + job_id[0] + '/stats.txt', a_name))
    else:
        stats_list.append(('placeholder' + arch, a_name))

summaryPlot(stats_list, 'Architecture', 'Time, seconds', 'Inference Engine Processing Time', 'time' )

summaryPlot(stats_list, 'Architecture', 'Frames per second', 'Inference Engine FPS', 'fps' )
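The two plots are related. For a run that processes frames one at a time, FPS is the frame count divided by the total time, and the average per-frame latency is its reciprocal. The sketch below uses made-up numbers, since this notebook does not show the exact contents of `stats.txt`. With asynchronous or batched inference, throughput and per-frame latency no longer move in lockstep.

```python
def throughput_and_latency(num_frames, total_time_s):
    """Return (frames per second, average seconds per frame) for one inference run."""
    if num_frames <= 0 or total_time_s <= 0:
        raise ValueError("num_frames and total_time_s must be positive")
    return num_frames / total_time_s, total_time_s / num_frames


fps, latency = throughput_and_latency(num_frames=100, total_time_s=4.0)  # hypothetical numbers
print(f"{fps:.1f} FPS, {latency * 1000:.1f} ms per frame")
```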