###  <font color='red'>  Important: </font> Before proceeding, run the following cell to check for code updates.

In [None]:
from qarpo.catalog import DemoCatalog
import os
status = DemoCatalog(os.getcwd(), "Demo").ShowRepositoryControls()

# Pneumonia detection in chest X-ray images

Pneumonia is an inflammatory condition of the lung primarily affecting the small air sacs known as alveoli.
Typical symptoms include some combination of cough, chest pain, fever, and trouble breathing [1].
Pneumonia affects approximately 450 million people globally (7% of the population) and results in about 4 million deaths per year [2]. Chest X-ray imaging remains the best available tool for diagnosing this disease.

![](example-pneumonia.jpeg) *Chest X-ray image of a patient with Pneumonia*

In this tutorial, we use a model trained to distinguish patients with pneumonia from healthy cases based on their chest X-ray images. The topology is DenseNet-121. This architecture has been shown to be very effective for this problem: it was used in the first work to claim a classification rate better than that of practicing radiologists. The training dataset is "Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification" [3], released under a CC BY 4.0 license.
The trained model is provided as a frozen TensorFlow model (.pb).


**In this tutorial, we use the OpenVINO(TM) toolkit to run inference with this model on the validation dataset across multiple hardware targets, such as different Intel CPUs, an integrated Intel GPU, the Intel Neural Compute Stick 2 and the Intel® Arria® 10 FPGA.
We also visualize what the network has learned using Class Activation Maps.**

[1] Ashby B, Turkington C (2007). The encyclopedia of infectious diseases (3rd ed.). New York: Facts on File. p. 242. ISBN 978-0-8160-6397-0. 

[2] Ruuskanen O, Lahti E, Jennings LC, Murdoch DR (April 2011). "Viral pneumonia". Lancet. 377 (9773): 1264–75. 

[3] Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification, http://dx.doi.org/10.17632/rscbjbr9sj.2#file-41d542e7-7f91-47f6-9ff2-dd8e5a5a7861

## 1. Model Optimizer

First, we need to convert the model to the OpenVINO Intermediate Representation (IR) with the Model Optimizer (MO). The MO script is *mo_tf.py*, which takes as input:

- `-m`: path to the input model (.pb)
- `--input_shape`: dimensions of the input tensor (optional for most models)
- `--data_type`: data type of the IR model (FP32 for CPU; FP16 for GPU, Movidius and FPGA)
- `-o`: path where the IR is saved
- `--mean_values`: per-channel mean values used to pre-process the training images
- `--scale_values`: per-channel scale values used to pre-process the training images

The `--input_shape` argument is not needed for models with a fully defined input shape. However, TensorFlow models often have dynamic input shapes (typically with the batch dimension set to -1), which OpenVINO does not support.
The mean and scale values are also optional; however, they simplify inference later because the pre-processing (mean subtraction and scaling) is then embedded directly in the model. This model was trained on images normalized with the ImageNet pre-processing.
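To make the effect of `--mean_values` and `--scale_values` concrete, the NumPy sketch below reproduces the per-channel normalization that the Model Optimizer embeds in the IR: each input channel is transformed as $(x - \text{mean}) / \text{scale}$. The RGB channel order and the helper name `preprocess` are assumptions for illustration. Because the IR already contains this step, the inference code can feed raw [0, 255] pixel values.

```python
import numpy as np

# Values passed to mo_tf.py below (channel order assumed to be RGB)
MEAN = np.array([123.75, 116.28, 103.58], dtype=np.float32)
SCALE = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(image_hwc: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-channel (x - mean) / scale on an H x W x 3 image."""
    return (image_hwc.astype(np.float32) - MEAN) / SCALE

# Dummy 224x224 RGB image with pixel values in [0, 255]
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3))
normalized = preprocess(image)
print(normalized.shape)  # (224, 224, 3)
```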

We start by converting the model to an FP32 IR.

```python
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py -m model.pb --input_shape=[1,224,224,3] --data_type FP32 -o models/FP32 --mean_values [123.75,116.28,103.58] --scale_values [58.395,57.12,57.375] 
```

We also convert to FP16.

```python
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py -m model.pb --input_shape=[1,224,224,3] --data_type FP16 -o models/FP16/ --mean_values [123.75,116.28,103.58] --scale_values [58.395,57.12,57.375]
```

*Both command lines above have been commented out because the model has an issue with the OpenVINO release currently installed on the system. The issue is fixed in the next release.*

Let's visualize the resulting Intermediate Representation (.xml). You can observe the different layers with information such as name, operation type, precision, input shape and output shape.

In [None]:
# from bs4 import BeautifulSoup

# bs = BeautifulSoup(open('models/FP32/model.xml'), 'xml')
# print(bs.prettify())
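If BeautifulSoup is not available, Python's standard `xml.etree.ElementTree` module can also list the layers of an IR, which is handy for finding layer names such as **relu_1/Relu**. The sketch below assumes the usual IR layout (`<layer>` elements carrying `name` and `type` attributes) and uses a tiny hand-written, hypothetical snippet instead of the real `models/FP32/model.xml`.

```python
import xml.etree.ElementTree as ET

def list_ir_layers(root: ET.Element):
    """Return (name, type) for every <layer> element under an IR root element."""
    return [(layer.get("name"), layer.get("type")) for layer in root.iter("layer")]

# Hypothetical, heavily shortened IR; a real model.xml has many more layers and edges
SAMPLE_IR = """<net name="model" version="7">
  <layers>
    <layer id="0" name="input_1" type="Input"/>
    <layer id="1" name="relu_1/Relu" type="ReLU"/>
    <layer id="2" name="predictions_1/MatMul" type="FullyConnected"/>
  </layers>
</net>"""

for name, op_type in list_ir_layers(ET.fromstring(SAMPLE_IR)):
    print(f"{name:25s} {op_type}")
```

For the real file, `list_ir_layers(ET.parse('models/FP32/model.xml').getroot())` iterates over the converted model in the same way.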

At this point, we are able to perform inference with OpenVINO. 

## 2. Classification and Class Activation Maps using OpenVINO

In this part, we explain the sample that we will send to the edge devices in Section 3. The sample not only performs pneumonia classification but also computes Class Activation Maps.
The code sample can be found [here](classification_pneumonia.py); we break it down in the next subsections.
Another tutorial explains in detail how to use the Python API for classification, so we refer the developer to [this tutorial](../../Tutorials/classification/tutorial_classification_sample.ipynb) for a deep dive on that part. In this section, we focus on the Class Activation Maps.



### Visualize what the model has learned

Class Activation Maps (CAM) [4] are a simple technique to visualize the regions of an image that a Convolutional Neural Network relies on to identify a specific class.
The CAM $M(x,y)$ is computed from the $N$ feature maps $f_i(x,y)$ of the last convolutional layer. We take the weighted sum of those feature maps using the weights $w_i$ of the fully connected layer, which represent how important each feature map is for the classification output.

$$ M(x,y)=\sum_{i=1}^{N} w_i f_i(x,y) $$

[4] Zhou, Bolei, et al. "Learning deep features for discriminative localization." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
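The weighted sum above is a single tensor contraction over the channel axis. The NumPy sketch below computes $M(x,y)$ for feature maps of shape $(N, H, W)$ and weights of shape $(N,)$, and checks it against the explicit per-channel loop. The shapes (1024 feature maps of $7\times7$, as DenseNet-121 produces for a $224\times224$ input) and the random values are assumptions for illustration.

```python
import numpy as np

def cam_weighted_sum(feature_maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """M(x, y) = sum_i w_i * f_i(x, y); feature_maps is (N, H, W), weights is (N,)."""
    # Contract axis 0 of the weights with axis 0 of the feature maps -> (H, W)
    return np.tensordot(weights, feature_maps, axes=(0, 0))

rng = np.random.default_rng(0)
f = rng.random((1024, 7, 7))        # hypothetical feature maps f_i(x, y)
w = rng.standard_normal(1024)       # hypothetical FC weights w_i

cam = cam_weighted_sum(f, w)

# Same result with an explicit loop over the N feature maps
cam_loop = np.zeros((7, 7))
for i in range(len(w)):
    cam_loop += w[i] * f[i]
print(cam.shape)  # (7, 7)
```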

In OpenVINO, this can be implemented as follows:

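```python
import numpy as np

def class_activation_map_openvino(res, convb, fc, net, fp16):
    # Feature maps of the last convolution layer: (1, N, H, W) -> (N, H, W)
    conv_outputs = res[convb][0, :, :, :]
    # Weights of the fully connected layer (one weight per feature map)
    weights_fc = net.layers.get(fc).weights["weights"]
    cam = np.zeros(dtype=np.float32, shape=conv_outputs.shape[1:])

    # Weighted sum of the N feature maps
    for i, w in enumerate(weights_fc):
        feature_map = conv_outputs[i, :, :]
        if fp16:
            # FP16 IR: convert weight and feature map values to float
            # (float16_conversion* are helpers from classification_pneumonia.py)
            w = float16_conversion(w)
            feature_map = float16_conversion_array(feature_map)
        cam += w * feature_map
    return cam
```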

To compute Class Activation Maps in OpenVINO, we need access to the output feature maps of the last convolution layer (here, named **$convb$**) and to the weights of the fully connected layer (**$fc$**).
By default, only the final layer of the network is returned as an output. Therefore, to get the feature maps, we need to add the last convolution **$convb$** as an additional output.
This is done by calling *add_outputs(convb)* on the network before loading it into the plugin.
To find the layer names, we recommend Netron, which lets you visualize the model graphically.

In our pneumonia classification model, the name of the last convolutional layer **$convb$** is **relu_1/Relu**.
The name of the fully connected layer **$fc$** is **predictions_1/MatMul**.


To get the feature maps, inference must be performed first. This is why our function to compute the Class Activation Maps takes the inference output **$res$** (which includes the feature maps, since we added them as an additional output).
We access the FC layer weights with the call *net.layers.get(fc).weights["weights"]* on the network **$net$**.

Then, if we are using the FP16 model, we need to convert the weights and feature map values to float before computing the weighted sum.
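The `float16_conversion` helpers of the sample are not reproduced here. As an assumption, if the FP16 weights or feature maps arrive as raw 16-bit patterns (for example, a `uint16` array), NumPy can reinterpret the bits as half-precision values with `.view(np.float16)` and then widen them to `float32`. The hypothetical `fp16_bits_to_float32` helper below sketches one possible conversion; it is not the sample's actual implementation.

```python
import numpy as np

def fp16_bits_to_float32(raw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reinterpret raw IEEE 754 half-precision bit patterns as float32."""
    # view() reuses the same 16-bit memory as float16; astype() widens to float32
    return raw.astype(np.uint16).view(np.float16).astype(np.float32)

# Bit patterns of 1.0 (0x3C00), 0.5 (0x3800) and -2.0 (0xC000) in half precision
raw = np.array([0x3C00, 0x3800, 0xC000], dtype=np.uint16)
print(fp16_bits_to_float32(raw))
```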

Finally, we compute the weighted sum of the feature maps. The result of the function is an image of the same size as the feature maps (here, $7\times7$).
An example is shown below on the left.
We can then upsample the CAM image to the original input image size and overlay it on the input image. The regions with the highest values (here, in yellow) are the regions most relevant to the classification decision.
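The upsampling and overlay step can be sketched in NumPy as follows. This version assumes an exact integer upsampling factor ($224 / 7 = 32$), uses nearest-neighbour upsampling (`np.kron`) and blends the heatmap over a grayscale image in $[0, 1]$. The actual sample may use a different interpolation method and a colour map (which is what produces the yellow regions); the helper names are hypothetical.

```python
import numpy as np

def upsample_nearest(cam: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each CAM cell as a factor x factor block (nearest-neighbour upsampling)."""
    return np.kron(cam, np.ones((factor, factor), dtype=cam.dtype))

def normalize01(cam: np.ndarray) -> np.ndarray:
    """Rescale the CAM to [0, 1] so it can be displayed as a heatmap."""
    lo, hi = cam.min(), cam.max()
    return (cam - lo) / (hi - lo) if hi > lo else np.zeros_like(cam)

def overlay(gray_image: np.ndarray, heatmap: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Alpha-blend a [0, 1] heatmap over a [0, 1] grayscale image of the same shape."""
    return (1.0 - alpha) * gray_image + alpha * heatmap

rng = np.random.default_rng(0)
cam = rng.standard_normal((7, 7))   # hypothetical 7x7 CAM
xray = rng.random((224, 224))       # hypothetical grayscale input in [0, 1]

heatmap = normalize01(upsample_nearest(cam, 224 // 7))
blended = overlay(xray, heatmap)
print(heatmap.shape, blended.shape)  # (224, 224) (224, 224)
```

Nearest-neighbour upsampling keeps the sketch dependency-free; a smoother interpolation usually gives a more readable heatmap.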

![](example_CAM.png) *CAM image and overlay of CAM with input image*

## 3. Inference on the edge


We will run the workload on edge compute nodes represented in the IoT DevCloud. We will send work to the edge compute nodes by submitting the corresponding non-interactive jobs into a queue. For each job, we will specify the type of the edge compute server that must be allocated for the job.

The job file is written in Bash and is executed directly on the edge compute node.
For this example, we have written the job file for you in the notebook. You can inspect the Python script it runs in [classification_pneumonia.py](classification_pneumonia.py).

In [None]:
from IPython.display import HTML
import matplotlib.pyplot as plt
import os
import time
import sys
from qarpo.demoutils import *
from utils_image import show_results

In [None]:
%%writefile classification_pneumonia_job.sh

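cd $PBS_O_WORKDIR

# Arguments passed through the qsub -F option
OUTPUT_FILE=$1   # output directory for the results and performance stats
DEVICE=$2        # target device, e.g. CPU, GPU, MYRIAD or HETERO:FPGA,CPU
FP_MODEL=$3      # model precision: FP32 or FP16 (selects models/$FP_MODEL/model.xml)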

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs - Updated for OpenVINO 2020.1
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R4_PL1_FP11_AlexNet_GoogleNet_Generic.aocx
    export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3
fi

SAMPLEPATH=${PBS_O_WORKDIR}
echo ${1}
pip3 install Pillow
python3 classification_pneumonia.py  -m models/$FP_MODEL/model.xml  \
                                           -i /validation_images/PNEUMONIA/*.jpeg \
                                           -o $OUTPUT_FILE \
                                           -d $DEVICE
                                

### 3.1 Understand how jobs are submitted into the queue

Now that we have the job script, we can submit the jobs to edge compute nodes. In the IoT DevCloud, you can do this using the `qsub` command.
We can submit classification_pneumonia_job to 6 different types of edge compute nodes simultaneously or to one node at a time.

We use three options of the `qsub` command:
- `-l`: selects the number and the type of nodes using `nodes={node_count}:{property}`.
- `-F`: passes arguments to the bash script.
- `-N`: names the job so that it is easier to distinguish between jobs.

The `-F` flag is used to pass arguments to the job script.
The [classification_pneumonia_job.sh](classification_pneumonia_job.sh) script takes 3 arguments:
1. the path to the output directory for the results and performance stats
2. the target device (e.g. CPU, GPU, MYRIAD or HETERO:FPGA,CPU)
3. the floating-point precision of the model to use for inference (FP32 or FP16)

The job scheduler passes the contents of the `-F` flag as the arguments to the job script.
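To see how the three options fit together, the hypothetical helper below assembles the same kind of `qsub` command line used in the submission cells of Section 3.2. It only builds a string and does not contact the queue; `shlex.quote` keeps the three `-F` arguments together as a single shell word.

```python
import shlex

def build_qsub_command(job_script, node_properties, output_dir, device, precision, job_name):
    """Hypothetical helper: build a qsub command line from the -l, -F and -N options."""
    job_args = f"{output_dir} {device} {precision}"   # becomes $1 $2 $3 in the job script
    return " ".join([
        "qsub", job_script,
        "-l", f"nodes=1:{node_properties}",           # one node with the given properties
        "-F", shlex.quote(job_args),                  # arguments forwarded to the script
        "-N", job_name,                               # job name used in the output file names
    ])

cmd = build_qsub_command("classification_pneumonia_job.sh", "tank-870:i5-6500te",
                         "results/core", "CPU", "FP32", "obj_det_core")
print(cmd)
# qsub classification_pneumonia_job.sh -l nodes=1:tank-870:i5-6500te -F 'results/core CPU FP32' -N obj_det_core
```

In a notebook cell, IPython's shell-escape expansion should run the resulting string, e.g. `job_id = !{cmd}`.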

If you are curious to see the available types of nodes on the IoT DevCloud, run the following optional cell.

In [None]:
!pbsnodes | grep compnode | sort | uniq -c

Here, the properties describe each node type, and the number on the left is the number of available nodes with that architecture.
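The shell pipeline keeps the lines that contain `compnode`, sorts them and counts identical lines. A rough Python equivalent using `collections.Counter` is sketched below; the sample text is a hypothetical, shortened stand-in for real `pbsnodes` output, and leading whitespace is stripped before counting.

```python
from collections import Counter

def count_node_types(pbsnodes_output: str) -> Counter:
    """Count identical 'compnode' lines, like `grep compnode | sort | uniq -c`."""
    lines = (line.strip() for line in pbsnodes_output.splitlines())
    return Counter(line for line in lines if "compnode" in line)

# Hypothetical pbsnodes excerpt (real output lists many more attributes per node)
SAMPLE = """\
node1
     properties = compnode,tank-870,i5-6500te
node2
     properties = compnode,tank-870,i5-6500te
node3
     properties = compnode,up-squared
"""

for props, count in sorted(count_node_types(SAMPLE).items()):
    print(f"{count:7d} {props}")
```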

### 3.2 Job queue submission

Each submission cell prints the `JobID` of your job, which you can use to track its progress.

**Note:** You can submit all 6 jobs at once or one at a time.

After submission, they will go into a queue and run as soon as the requested compute resources become available. 
(tip: **shift+enter** will run the cell and automatically move you to the next cell. So you can hit **shift+enter** multiple times to quickly run multiple cells).

The time for each job includes not only inference but also rendering the results and writing them to the filesystem, which is actually the most time-consuming part.


#### Submitting to an edge compute node with an Intel Core CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_core = !qsub classification_pneumonia_job.sh -l nodes=1:tank-870:i5-6500te -F "results/core CPU FP32" -N obj_det_core
if job_id_core:
    print(job_id_core[0])
    # Progress indicator
    progressIndicator('results/core', 'progress'+job_id_core[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")

#### Submitting to an edge compute node with Intel Xeon CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel 
    Xeon Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
#Submit job to the queue
job_id_xeon = !qsub classification_pneumonia_job.sh -l nodes=1:tank-870:e3-1268l-v5 -F "results/xeon CPU FP32" -N obj_det_xeon
if job_id_xeon:
    print(job_id_xeon[0])
    # Progress indicator
    progressIndicator('results/xeon', 'progress'+job_id_xeon[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")

#### Submitting to an edge compute node with Intel Core CPU and using the onboard Intel GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 card integrated with the CPU.

In [None]:
#Submit job to the queue
job_id_gpu = !qsub classification_pneumonia_job.sh -l nodes=1:tank-870:i5-6500te:intel-hd-530 -F "results/gpu GPU FP16" -N obj_det_gpu
if job_id_gpu:
    print(job_id_gpu[0])
    # Progress indicator
    progressIndicator('results/gpu', 'progress'+job_id_gpu[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")

#### Submitting to an edge compute node with IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on the <a href="https://www.ieiworld.com/mustang-f100/en/">IEI Mustang-F100-A10</a> card installed in this node.

In [None]:
#Submit job to the queue
job_id_fpga = !qsub classification_pneumonia_job.sh -l nodes=1:tank-870:i5-6500te:iei-mustang-f100-a10 -F "results/fpga HETERO:FPGA,CPU FP16" -N obj_det_fpga
if job_id_fpga:
    print(job_id_fpga[0])
    # Progress indicator
    progressIndicator('results/fpga', 'progress'+job_id_fpga[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")

#### Submitting to an edge compute node with Intel NCS 2 (Neural Compute Stick 2)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on an <a 
    href="https://software.intel.com/en-us/neural-compute-stick">Intel Neural Compute Stick 2</a> installed in this node.

In [None]:
#Submit job to the queue
job_id_myriadx = !qsub classification_pneumonia_job.sh -l nodes=1:tank-870:i5-6500te:intel-ncs2 -F "results/myriadx MYRIAD FP16" -N obj_det_myriadx
if job_id_myriadx:
    print(job_id_myriadx[0])
    # Progress indicator
    progressIndicator('results/myriadx', 'progress'+job_id_myriadx[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")


#### Submitting to an edge compute node with UP Squared Grove IoT Development Kit (UP2)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/up-squared-grove-dev-kit">UP Squared Grove IoT Development Kit</a> edge node with an <a 
    href="https://ark.intel.com/products/96488/Intel-Atom-x7-E3950-Processor-2M-Cache-up-to-2-00-GHz-">Intel Atom® x7-E3950 Processor</a>. The inference workload will run on the integrated Intel® HD Graphics 505 card.

In [None]:
#Submit job to the queue
job_id_up2 = !qsub classification_pneumonia_job.sh -l nodes=1:up-squared -F "results/up2 GPU FP16" -N obj_det_up2
if job_id_up2:
    print(job_id_up2[0])
    # Progress indicator
    progressIndicator('results/up2', 'progress'+job_id_up2[0]+'.txt', "Progress", 0, 100)
else:
    print("Error in job submission.")

### 3.3 Check if the jobs are done

To check on the jobs that were submitted, use the `qstat` command.

We have created a custom Jupyter widget to get live qstat updates.
Run the following cell to bring it up. 

In [None]:
liveQstat()

You should see the jobs you have submitted (referenced by the `Job ID` displayed right after you submit each job in Section 3.2).
There should also be an extra job in the queue "jupyterhub": this job runs your current Jupyter Notebook session.

The 'S' column shows the current status. 
- If it is in Q state, it is in the queue waiting for available resources. 
- If it is in R state, it is running. 
- If the job is no longer listed, it means it is completed.

**Note**: Time spent in the queue depends on the number of users accessing the edge nodes. Once these jobs begin to run, they should take from 1 to 5 minutes to complete. 

***Wait!***

Please wait for the inference jobs to complete before proceeding to the next step.

### 3.4 View Results

Once the jobs are completed, the queue system outputs the stdout and stderr streams of each job into files with names of the form

`obj_det_{type}.o{JobID}`

`obj_det_{type}.e{JobID}`

(here, obj_det_{type} corresponds to the `-N` option of qsub).
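As a sketch of how these names are formed: by default, PBS/Torque names the files after the job name and the numeric sequence part of the job identifier. The hypothetical helper below assumes a `JobID` of the form `12345.server-name` (the server part shown is made up) and returns the stdout and stderr file names.

```python
def pbs_output_files(job_name: str, job_id: str) -> tuple:
    """Hypothetical helper: (stdout, stderr) file names for a finished PBS job."""
    sequence_number = job_id.split(".")[0]   # "12345.server-name" -> "12345"
    return f"{job_name}.o{sequence_number}", f"{job_name}.e{sequence_number}"

stdout_file, stderr_file = pbs_output_files("obj_det_core", "12345.server-name")
print(stdout_file, stderr_file)  # obj_det_core.o12345 obj_det_core.e12345
```

Once a job has finished, `print(open(stdout_file).read())` would display its standard output.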

The probability output and inference time for each input image are also saved in the `results/` folder, in one subfolder per architecture.
The results are shown below.



#### Result on the Intel Core CPU 

In [None]:
show_results(job_id_core[0], 'core')

#### Result on the Intel Xeon CPU

In [None]:
show_results(job_id_xeon[0], 'xeon')

#### Result on the Intel Integrated GPU

In [None]:
show_results(job_id_gpu[0], 'gpu')
result_file="results/gpu/result"+job_id_gpu[0]+".txt"

#### Result on the IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)

In [None]:
show_results(job_id_fpga[0], 'fpga')

#### Result on the Intel NCS2

In [None]:
show_results(job_id_myriadx[0], 'myriadx')

#### Result on the UP2

In [None]:
show_results(job_id_up2[0], 'up2')

### 3.5 Assess Performance

The total average time of each inference task is recorded in `results/{ARCH}/stats{JobID}.txt`, where the subdirectory name corresponds to the architecture of the target edge compute node. Run the cell below to plot the results of all jobs side-by-side. Lower values mean better performance. Keep in mind that some architectures are optimized for the highest performance, others for low power or other metrics.

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('gpu', ' Intel Core\ni5-6500TE\nGPU'),
             ('fpga', ' IEI Mustang\nF100-A10\nFPGA'),
             ('myriadx', 'Intel\nNCS2'),
             ('up2', 'Intel Atom\nx7-E3950\nUP2/GPU')]

stats_list = []
for arch, a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/'+arch+'/stats'+vars()['job_id_'+arch][0]+'.txt', a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))

#print(stats_list)

summaryPlot(stats_list, 'Architecture', 'Time, milliseconds', 'Inference Engine Processing Time', 'time' )