# Electrocardiogram (ECG) Arrhythmia Detection Demo (WIP)

Electrocardiograms are records of the electrical activity of the heart gathered from electrodes placed on the skin. They are commonly used for medical monitoring and diagnosis.

This ECG demo is based on the model developed by the [Stanford ML group](https://stanfordmlgroup.github.io/projects/ecg2/) using the [PhysioNet 2017 challenge dataset](https://www.physionet.org/content/challenge-2017/1.0.0/).  The [GitHub repository](https://github.com/awni/ecg) contains the original code and resources for training models.

![Example ECG](figures/A00150.gif) 

Awni Y Hannun, Pranav Rajpurkar, Masoumeh Haghpanahi, Geoffrey H Tison, Codie Bourn, Mintu P Turakhia, and Andrew Y Ng. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 25(1):65, 2019. https://www.nature.com/articles/s41591-018-0268-3


## Requirements and Imports
Running the inference requires Keras and several other packages to be installed. Run the following cells to ensure that all dependencies are satisfied.

In [None]:
print("Installing requirements")
!python3 -m pip --no-cache-dir install -r requirements.txt --user

In [None]:
import os
import sys
import subprocess
from pathlib import Path
sys.path.insert(0, os.path.join(Path.home(), 'Reference-samples/iot-devcloud'))
from demoTools.demoutils import *

import matplotlib.pyplot as plt
import matplotlib.animation as ani
import numpy as np
import scipy.io as sio
from IPython.display import HTML
from matplotlib.ticker import MultipleLocator, AutoMinorLocator

os.makedirs("results", exist_ok=True)
os.makedirs("logs", exist_ok=True)
os.makedirs("models", exist_ok=True)

## Description of Problem and Model

This network detects arrhythmias from ECG time series data. The model was trained on the PhysioNet Computing in Cardiology Challenge 2017 (CINC17) dataset, which has four classes:

N - normal sinus rhythm

A - atrial fibrillation (AF)

O - an alternative rhythm

~ - too noisy to be classified

Although this dataset has only four classes for the ECG records, the model could be trained to distinguish between specific types of arrhythmia if finer-grained labels were provided.

Both the Keras and the OpenVINO models will be run using a subset of the original test data that excludes any examples below a specified length. Although the Keras model can take inputs of variable size, we use the same input set as the OpenVINO examples for consistency. 
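
A minimal sketch of that input preparation, assuming a fixed length of 8960 samples (the value passed to `--input_shape` later in this notebook). The helper name `to_fixed_length` is hypothetical, and the demo's own `load.process_x` may apply further steps, such as normalization, that are not shown here.

```python
import numpy as np

# Assumed fixed input length, taken from --input_shape "[1,8960,1]"
INPUT_LEN = 8960

def to_fixed_length(records, input_len=INPUT_LEN):
    """Drop records shorter than input_len and truncate the rest.

    Hypothetical helper for illustration; not part of the demo code.
    """
    batch = []
    for rec in records:
        rec = np.asarray(rec, dtype=np.float32)
        if rec.shape[0] < input_len:
            continue  # too short for the fixed-size model, so exclude it
        # Keep the first input_len samples and add batch and channel axes
        batch.append(rec[:input_len].reshape(1, input_len, 1))
    return batch

# Toy records of different lengths (all-zero placeholders, not real ECGs)
records = [np.zeros(9000), np.zeros(5000), np.zeros(8960)]
inputs = to_fixed_length(records)
print(len(inputs), inputs[0].shape)
```

Only the two records with at least 8960 samples survive, and each comes out with the `(1, 8960, 1)` shape that the fixed-size model expects.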

## Visualizing the Information

In this section we will see examples of the four classes of ECG records and some of their distinguishing characteristics.

The code below will convert the time series data into short animations which illustrate what each class looks like. Note that processing can take some time (~1-2 mins).

In [None]:
def create_ecg_graph(filename, y):
    step_size = 12
    x_lim = 1800
    scale_factor = 7
    num_frames = x_lim // step_size
    x =  range(0,(x_lim*scale_factor),scale_factor)
    y_max = 1.11*max(np.amax(y), abs(np.amin(y)))
    
    plt.ioff()
    
    #set up the figure
    fig, ax = plt.subplots()
    line, = ax.plot([], [], color='k')
    
    ax.tick_params(axis="both", which="both", length=0.0, labelbottom=False, labelleft=False)
    for spine in ax.spines.values():
        spine.set_visible(False)

    # Create the grid with 4x4 grid squares
    aspect_ratio = 4200 / (2*y_max)
    ax.xaxis.set_major_locator(MultipleLocator(400))
    ax.yaxis.set_major_locator(MultipleLocator(400 / aspect_ratio))
    ax.xaxis.set_minor_locator(AutoMinorLocator(4))
    ax.yaxis.set_minor_locator(AutoMinorLocator(4))
    ax.grid(which='major', linestyle='-', axis='both')
    ax.grid(which='minor', linewidth='0.5', axis='both', color='lightgray')

    # Settings for figure size
    ax.set_xlim(1, x_lim*scale_factor)
    ax.set_ylim(-y_max, y_max)
    fig.set_figheight(4.5)
    fig.set_figwidth(13.5)
    fig.tight_layout()
    
    canvas_width, canvas_height = fig.canvas.get_width_height()

    def update(num):
        offset = int(num // (x_lim / step_size)) + 2
        index = int(num % (x_lim / step_size))
        line.set_data(x[:(step_size*index)], y[(x_lim*offset):(x_lim*offset+step_size*index)])

    # Open an ffmpeg process
    cmdstring = ('ffmpeg', 
                 '-y', '-r', '25', # 25fps
                 '-s', '%dx%d' % (canvas_width, canvas_height), # size of image string
                 '-pix_fmt', 'argb', # format
                 '-f', 'rawvideo',  '-i', '-', # tell ffmpeg to expect raw video from the pipe
                 '-preset', 'ultrafast',
                 '-vcodec', 'h264', 'figures/' + filename) # output encoding
    p = subprocess.Popen(cmdstring, stdin=subprocess.PIPE)

    # Draw frames and write to the pipe
    for frame in range(num_frames):
        # draw the frame
        update(frame)
        fig.canvas.draw()

        # extract the image as an ARGB string
        string = fig.canvas.tostring_argb()

        # write to pipe
        p.stdin.write(string)

    # Finish up
    p.communicate()


### Normal Sinus Rhythm

Normal ECG rhythms consist of four distinct sections: P wave, QRS complex, T wave, and U wave.

<figure>
<img src="figures/EKG_info.svg" height="40%" width="40%"/>
<figcaption style="text-align:center"><a href="https://commons.wikimedia.org/wiki/File:EKG_Complex_en.svg" title="via Wikimedia Commons">ECG Complex</a> [<a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA</a>]</figcaption>
</figure>

The P wave represents atrial depolarization.  
The QRS complex represents ventricular depolarization.  
The T wave represents ventricular repolarization.  
The U wave is thought to represent papillary muscle repolarization.  

In [None]:
ecg_normal = sio.loadmat('data/A00001.mat')['val'].squeeze()


create_ecg_graph('ecg_normal.mp4', ecg_normal)

HTML('''
    <video alt="test" controls autoplay loop>
        <source src="figures/ecg_normal.mp4" type="video/mp4">
    </video>
''')

### Atrial Fibrillation

This is usually marked by irregular intervals between heart beats, rapid heart rate, and lack of a P wave.
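
The irregularity of the beat-to-beat intervals can be put into numbers. The sketch below is purely illustrative and is not part of the demo: the network works on the raw signal, not on hand-made features. It assumes the R-peak positions are already known (the sample indices below are made up), uses the 300 Hz sampling rate of the CINC17 recordings, and the helper name `rr_irregularity` is hypothetical.

```python
import numpy as np

FS = 300  # sampling rate of the CINC17 recordings, in Hz

def rr_irregularity(r_peaks, fs=FS):
    """Return (mean heart rate in bpm, coefficient of variation of RR intervals).

    r_peaks holds the sample indices of successive R peaks (assumed given).
    """
    rr = np.diff(np.asarray(r_peaks, dtype=float)) / fs  # RR intervals in seconds
    heart_rate = 60.0 / rr.mean()
    cv = rr.std() / rr.mean()  # 0 for a perfectly regular rhythm
    return heart_rate, cv

# Made-up peak positions: evenly spaced beats versus unevenly spaced beats
regular_peaks = [0, 240, 480, 720, 960]
irregular_peaks = [0, 150, 420, 540, 870]

hr_reg, cv_reg = rr_irregularity(regular_peaks)
hr_irr, cv_irr = rr_irregularity(irregular_peaks)
print("regular:   {:.1f} bpm, CV {:.3f}".format(hr_reg, cv_reg))
print("irregular: {:.1f} bpm, CV {:.3f}".format(hr_irr, cv_irr))
```

A larger coefficient of variation means less regular spacing between beats, which is the pattern described above for atrial fibrillation.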

In [None]:
ecg_af = sio.loadmat('data/A00004.mat')['val'].squeeze()

create_ecg_graph("ecg_af.mp4", ecg_af)

HTML('''
    <video alt="test" controls autoplay loop>
        <source src="figures/ecg_af.mp4" type="video/mp4">
    </video>
''')

### Other Rhythm

For this dataset, all non-AF abnormal rhythms are classified as other rhythms. 

In [None]:
ecg_other = sio.loadmat('data/A00077.mat')['val'].squeeze()

create_ecg_graph("ecg_other.mp4", ecg_other)

HTML('''
    <video alt="test" controls autoplay loop>
        <source src="figures/ecg_other.mp4" type="video/mp4">
    </video>
''')

### Too noisy to be classified

This final classification includes data that has too much noise to have any distinguishable pattern.

In [None]:
ecg_undef = sio.loadmat('data/A01246.mat')['val'].squeeze()

create_ecg_graph("ecg_undef.mp4", ecg_undef)

HTML('''
    <video alt="test" controls autoplay loop>
        <source src="figures/ecg_undef.mp4" type="video/mp4">
    </video>
''')

## Running Prediction Using Keras

In this section we will run all of the sample data through Keras using a TensorFlow backend. After getting predictions from the model, we will compare them to the ground truth labels to measure accuracy. The inference is done by running the [keras_inference.py](./keras_inference.py) script, whose contents are shown in the cell below.

```python
import os
from time import time
from warnings import simplefilter 
simplefilter(action='ignore', category=FutureWarning)

import keras
import numpy as np
import scipy.stats as sst
import sklearn.metrics as skm
from keras.backend.tensorflow_backend import tf
from openvino.inference_engine import IENetwork, IECore
from tqdm import tqdm

import load

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

model_path = "/data/ecg/0.427-0.863-020-0.290-0.899.hdf5"
data_csv = "./data/reference.csv"

print("Loading Dataset")
ecgs, labels = load.load_dataset(data_csv)

print("Loading Model")
model = keras.models.load_model(model_path)

print("Starting Inference")
probs = []
total_time = 0
for x in tqdm(ecgs):
    x = load.process_x(x)
    start_time = time()
    probs.append(model.predict(x))
    total_time += (time() - start_time)
    
print("Keras took {} sec for inference".format(total_time))

# The class distribution of the overall dataset
prior = [[[0.15448743, 0.66301941, 0.34596848, 0.09691286]]]

# Determine the predicted class from the most commonly predicted class 
preds = []
for p in probs:
    preds.append(sst.mode(np.argmax(p / prior, axis=2).squeeze())[0][0])

# Generate a report with the precision, recall, and f-1 scores for each of the classes
report = skm.classification_report(labels, preds, target_names=['A','N','O','~'], digits=3)
scores = skm.precision_recall_fscore_support(labels, preds, average=None)

print(report)
print("CINC Average {:.3f}".format(np.mean(scores[2][:3])))
```
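
The prediction step in the script above can be illustrated on a toy example. This is a sketch, not the script itself: the probabilities are made up, the helper name `vote` is hypothetical, and `np.bincount` stands in for `scipy.stats.mode` when taking the majority vote. Each time step's softmax output is divided by the class prior before the argmax, so a class that is rare in the dataset needs less raw probability to win a time step.

```python
import numpy as np

CLASSES = ['A', 'N', 'O', '~']
# Prior values copied from the script above, shaped (1, 1, 4) for broadcasting
PRIOR = np.array([[[0.15448743, 0.66301941, 0.34596848, 0.09691286]]])

def vote(probs, prior=PRIOR):
    """Pick the class predicted most often across time steps.

    probs has shape (1, time_steps, n_classes), like the model output.
    """
    per_step = np.argmax(probs / prior, axis=2).ravel()  # one class per time step
    counts = np.bincount(per_step, minlength=prior.shape[-1])
    return int(counts.argmax())

# Made-up softmax outputs for three time steps
probs = np.array([[[0.20, 0.60, 0.15, 0.05],
                   [0.30, 0.50, 0.10, 0.10],
                   [0.05, 0.85, 0.05, 0.05]]])

raw = np.argmax(probs, axis=2).ravel()
print("raw argmax per step:", [CLASSES[i] for i in raw])
print("prior-adjusted vote:", CLASSES[vote(probs)])
```

Without the prior every step picks `N`; after dividing by the prior the first two steps pick `A`, so the majority vote for the record is `A`.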

In [None]:
!python3 python/keras_inference.py

## Converting Keras Model for OpenVINO

Next we will run inference using OpenVINO. To make the model usable by OpenVINO, we first convert it from its original Keras format to a TensorFlow frozen protobuf, which can then be run through the OpenVINO Model Optimizer to produce an Intermediate Representation (IR).


.hdf5 (keras) -> pb (tensorflow) -> IR (OpenVINO)

```python
import os
from warnings import simplefilter 
simplefilter(action='ignore', category=FutureWarning)

from keras.models import load_model
from keras import backend as K
from keras.backend.tensorflow_backend import tf
from tensorflow.python.framework import graph_io

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

K.clear_session()
K.set_learning_phase(0)

input_model = '/data/ecg/0.427-0.863-020-0.290-0.899.hdf5'
output_model = 'models/output_graph.pb'
num_output = 1 

model = load_model(input_model)
print(model.summary())

predictions = [None] * num_output
prediction_node_names = [None] * num_output

# Wrap each model output in an identity op so it has a known node name
for i in range(num_output):
    prediction_node_names[i] = 'output_node' + str(i)
    predictions[i] = tf.identity(model.outputs[i],
                                 name=prediction_node_names[i])

sess = K.get_session()

# Freeze variables into constants, keeping only what the output nodes need
constant_graph = tf.compat.v1.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), prediction_node_names)
infer_graph = tf.compat.v1.graph_util.remove_training_nodes(constant_graph) 

graph_io.write_graph(infer_graph, '.', output_model, as_text=False)
```

In [None]:
!python3 python/tensorflow_conversion.py

For conversion to the OpenVINO Intermediate Representation (IR) we need to specify the size of the input. We also specify the data type as FP16.

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py \
--input_model models/output_graph.pb                                   \
--output_dir models/                                                   \
--input_shape "[1,8960,1]"                                             \
--data_type FP16                                                      
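
To give a feel for what storing values as FP16 means, the sketch below casts some made-up FP32 values to half precision with NumPy and measures the rounding error. It is an illustration only: the random array is a hypothetical stand-in for model weights, and it says nothing about how the Model Optimizer performs the conversion internally.

```python
import numpy as np

# Hypothetical stand-in for a layer's weights (not taken from the real model)
rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.1, size=1000).astype(np.float32)

# Cast to half precision, then back to FP32 to measure what was lost
weights_fp16 = weights_fp32.astype(np.float16)
abs_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32))

print("bytes FP32:", weights_fp32.nbytes)
print("bytes FP16:", weights_fp16.nbytes)
print("max abs rounding error:", abs_err.max())
```

Half precision halves the storage at the cost of a small rounding error per value, which is why accuracy should be checked again after the conversion.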

## Running Inference Using OpenVINO

The OpenVINO model only accepts input of a fixed size, so we truncate every record that is longer than that size before feeding it into the model.

```python
import os
from time import time

import numpy as np
import scipy.stats as sst
import sklearn.metrics as skm
from openvino.inference_engine import IENetwork, IECore
from tqdm import tqdm

import load

data_csv = "./data/reference.csv"

print("Loading Dataset")
ecgs, labels = load.load_dataset(data_csv)

# Load network and add CPU extension
ie = IECore()
ie.add_extension('/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so',"CPU")
net = IENetwork(model = './models/output_graph.xml', weights = './models/output_graph.bin')
exec_net = ie.load_network(network=net, device_name='CPU')

print("Starting Inference")
probs_total = []
total_time = 0
for x in tqdm(ecgs):
    x = load.process_x(x)
    start_time = time()
    res = exec_net.infer(inputs={"inputs": x})
    total_time += (time() - start_time)
    probs = res["time_distributed_1/Reshape_1/Softmax"]
    probs_total.append(probs)

print("OpenVINO took {} sec for inference".format(total_time))

# The class distribution of the overall dataset
prior = [[[0.15448743, 0.66301941, 0.34596848, 0.09691286]]]

# Determine the predicted class from the most commonly predicted class 
preds = []
for p in probs_total:
    preds.append(sst.mode(np.argmax(p / prior, axis=2).squeeze())[0][0])
    
# Generate a report with the precision, recall, and f-1 scores for each of the classes
report = skm.classification_report(labels, preds, target_names=['A','N','O','~'], digits=3)
scores = skm.precision_recall_fscore_support(labels, preds, average=None)

print(report)
print("CINC Average {:.3f}".format(np.mean(scores[2][:3])))
```
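
The last line of the script averages the F1 scores of the first three classes (A, N, O) and leaves out the noise class. Below is a sketch of that calculation on made-up labels. It computes F1 by hand with NumPy instead of calling `sklearn.metrics`, and the helper name `per_class_f1` is hypothetical.

```python
import numpy as np

def per_class_f1(labels, preds, n_classes=4):
    """Return an array with the F1 score of each class."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    f1 = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((preds == c) & (labels == c))  # true positives
        fp = np.sum((preds == c) & (labels != c))  # false positives
        fn = np.sum((preds != c) & (labels == c))  # false negatives
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1

# Made-up ground truth and predictions (0=A, 1=N, 2=O, 3=~)
labels = [0, 0, 1, 1, 1, 2, 2, 3]
preds = [0, 1, 1, 1, 1, 2, 0, 3]

f1 = per_class_f1(labels, preds)
cinc_average = f1[:3].mean()  # average over A, N, O only
print("per-class F1:", np.round(f1, 3))
print("CINC Average {:.3f}".format(cinc_average))
```

The noise class still appears in the per-class report, but it does not contribute to the averaged score.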

In [None]:
!python3 python/openvino_inference.py

Everything below should follow the standard layout that all of the other notebooks share.

This still needs to be tested on the various hardware platforms to make sure that everything works.

## Inference on the edge

All the code up to this point has been run on a development node based on an Intel Xeon Scalable processor. We will run the workload on other edge compute nodes represented in the IoT DevCloud by submitting the corresponding non-interactive jobs into a queue. For each job, we will specify the type of the edge compute server that must be allocated for the job.

In [None]:
%%writefile prediction.sh

cd $PBS_O_WORKDIR

DEVICE=$1

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
  # Environment variables and compilation for edge compute nodes with FPGAs
  source /opt/intel/init_openvino.sh
  aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R3_PV_PL1_FP16_MobileNet_Clamp.aocx
fi

python3 python/inference.py -d ${DEVICE}


### View what nodes are available

In [None]:
!pbsnodes | grep compnode | awk '{print $3}' | sort | uniq -c

### Job queue submission
For this demo we only include the CPU and GPU nodes since the layers of the model are not currently supported by the FPGA, MYRIAD, and HDDL OpenVINO plugins. 

In [None]:
job_id_core = !qsub prediction.sh -l nodes=1:idc001skl:i5-6500te -F "CPU" -N arrhythmia_core -e logs/ -o logs/      
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('./logs', job_id_core[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_core[0]+'.txt', "Inference", 0, 100)

In [None]:
job_id_xeon = !qsub prediction.sh -l nodes=1:idc007xv5:intel-xeon -F "CPU" -N arrhythmia_xeon -e logs/ -o logs/
print(job_id_xeon[0]) 
#Progress indicators
if job_id_xeon:
    progressIndicator('./logs', job_id_xeon[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_xeon[0]+'.txt', "Inference", 0, 100)        

In [None]:
job_id_gpu = !qsub prediction.sh -l nodes=1:tank-870:i5-6500te:intel-hd-530 -F "GPU" -N arrhythmia_gpu -e logs/ -o logs/
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('./logs', job_id_gpu[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_gpu[0]+'.txt', "Inference", 0, 100)        

In [None]:
job_id_up2 = !qsub prediction.sh -l nodes=1:up-squared -F "GPU" -N arrhythmia_up2 -e logs/ -o logs/
print(job_id_up2[0]) 
#Progress indicators
if job_id_up2:
    progressIndicator('./logs', job_id_up2[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_up2[0]+'.txt', "Inference", 0, 100)    

In [None]:
job_id_fpga = !qsub prediction.sh -l nodes=1:idc003a10:iei-mustang-f100-a10 -F "HETERO:FPGA,CPU" -N arrhythmia_fpga -e logs/ -o logs/
print(job_id_fpga[0]) 
#Progress indicators
if job_id_fpga:
    progressIndicator('./logs', job_id_fpga[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_fpga[0]+'.txt', "Inference", 0, 100)    

In [None]:
job_id_ncs2 = !qsub prediction.sh -l nodes=1:tank-870:i5-6500te:intel-ncs2 -F "MYRIAD" -N arrhythmia_ncs2 -e logs/ -o logs/
print(job_id_ncs2[0]) 
#Progress indicators
if job_id_ncs2:
    progressIndicator('./logs', job_id_ncs2[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_ncs2[0]+'.txt', "Inference", 0, 100)    

In [None]:
job_id_hddlr = !qsub prediction.sh -l nodes=1:tank-870:iei-mustang-v100-mx8 -F "HDDL" -N arrhythmia_hddlr -e logs/ -o logs/
print(job_id_hddlr[0]) 
#Progress indicators
if job_id_hddlr:
    progressIndicator('./logs', job_id_hddlr[0] + '_load.txt', "Data Loading", 0, 100)
    progressIndicator('./logs', job_id_hddlr[0]+'.txt', "Inference", 0, 100)    

### View Progress

In [None]:
liveQstat()

## Compare Results

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('gpu', ' Intel Core\ni5-6500TE\nGPU'),
             ('up2', 'Intel Atom\nx7-E3950\nUP2/GPU'),
             ('fpga', ' IEI Mustang\nF100-A10\nFPGA'),
             ('hddlr', ' IEI Mustang\nV100-MX8\nVPU'),
             ('ncs2', 'Intel\nNCS2')]

stats_list = []
for arch, a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/stats_'+vars()['job_id_'+arch][0]+'.txt', a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))

summaryPlot(stats_list, 'Architecture', 'Time, milliseconds', 'Inference Engine Processing Time Per Sample', 'time' )