# **ECG HEARTBEAT CLASSIFICATION WITH ML - Mortara**

by Erica Brisigotti (2097202), Ekaterina Chueva (2072050), Sofia Pacheco Garcia (2070771), Nadillia Sahputra (2070770)

This project is part of the Laboratory of Computational Physics (mod. B) class, from the Physics of Data Master's Degree, held at the University of Padova during Academic Year 2022-2023. The project was supervised by professors Alberto Zucchetta,  Marco Zanetti and teaching assistant Federico Agostini.

The (ambitious) goal of the project is to test if ML can reliably classify heartbeats from a given ECG dataset. Commercial softwares can perform a preliminary analysis of anormal beats, but a very large fraction of them are false positive anomalies. ML could provide a better identication and classication of the anomalies, signicantly reducing
the false-positive ratio. 

The ECG dataset consists of 450 Holter ECGs collected in the past decade by the Cardiology Department of the University of Padova. Each ECG is a 12 leads, 24h recording of a single patient. As of executing the project, no ECGs from this datasets provided labels. 

We looked outwards for training datasets since we implemented supervised ML techniques. The closest dataset found was the [PhysioNet St Petersburg INCART 12-lead Arrhythmia Database dataset](https://physionet.org/content/incartdb/1.0.0/). This database consists of 75 annotated recordings extracted from 32 Holter records. Each record is 30 minutes long and contains 12 standard leads, each sampled at 257 Hz, with reference annotation files, totaling over 175000 beat annotations in all. [0]

The ML model was trained in the "ECG_Physionet" file: the trained model was then saved and uploaded in the "ECG_Mortara" file to predict the type of heartbeat.

A Plotly Dash interface (in the "ECG_Doctor" file) was prepared to eventually check the ML predictions with a Cardiologist.

In [1]:
# we import the packages needed for plotting with Plotly Dash
import pandas as pd
import numpy as np

from dash import Dash, html, dcc, callback, Output, Input, Patch
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [2]:
# we choose which file we want to import
# path='01'
# and import it a dataframe 
# df_ = pd.read_csv('output_files/'+'data'+path+'_filtered.csv')
# and also import the dataframe with information about all the heartbeats (of all patients)
# hb_ = pd.read_csv('output_files/'+'everything.csv')
hb_ = pd.read_csv('extended_output/'+'everything.csv')

# and take only its rows relevant to the chosen patient
# hb_ = hb_[hb_['patient']==int(path)]

In [3]:
# we choose the number of ill heartbeats to display
n_hb_ill=25
# we identify the ill heartbeats
hb_mask = hb_['ann_symbol']!='N'
hb_ill = hb_[hb_mask]
# ???? (we should be randomly sampling this as well)
hb_ill = hb_ill.iloc[:n_hb_ill]

# we choose the number of healthy heartbeats to display
n_hb_normal=25
# ???? (why do we reference ill for healthy)
hb_N = hb_[:hb_ill.index[-1]]
# we identify the healthy heartbeats
hb_mask_N = hb_N['ann_symbol']=='N'
hb_N = hb_N[hb_mask_N]
# and randomly sample based on the chosen number
hb_N = hb_N.sample(n_hb_normal)

# we save the heartbeats together
hb = pd.concat([hb_ill, hb_N])
# and scrumble them
hb = hb.sort_index()

# ????
# df = df_.iloc[0:hb['ann_index'].values[-1]+1]
# hb

In [4]:
df = pd.DataFrame({})
path = 'blah'
for beat in range(len(hb)):
    
    
    if path != str(hb["patient"].iloc[beat]):
        path = str(hb["patient"].iloc[beat])
#         data = pd.read_csv("output_files/data"+path+"_filtered.csv")
        data = pd.read_csv("extended_output/data"+path+"_filtered.csv")
        

    ds = data[hb['start'].iloc[beat] : hb['end'].iloc[beat]]
    
    df = pd.concat([df,ds], ignore_index=True)
# and import it a dataframe 
# df_ = pd.read_csv('output_files/'+'data'+path+'_filtered.csv')

In [5]:
length = (hb['end']-hb['start'])
l = length.shift(1)
l.iat[0] = 0
l = l.astype(int)

In [6]:
# we set the quantities needed for plotting
# - the indexes i.e. values for the x-axis
x = np.arange(0, len(df), 1)
# - the dataframe with the signal in all leads
df = df
# - the name of all leads
names = list(df.columns.values)

# - the colors to use to represent the signal in all leads
colors = ['salmon', 'darkorange', 
          'gold', 'khaki', 
          'lawngreen', 'limegreen', 
          'springgreen', 'aquamarine', 
          'mediumturquoise', 'lightskyblue',
          'orchid', 'pink']

# - the xlimits for each heartbeat
# starts = list(hb['start'].values)
# ends =  list(hb['end'].values)
lengths  = list(length)
starts = [0] + [sum(lengths[:i+1]) for i in range(len(lengths)-1)]
ends = [sum(lengths[:i+1]) - 1 for i in range(len(lengths))]
ploc = starts + (hb['ann_index'] -hb['start'])
# ????
heartbeats = {"all": "All"}
heartbeats.update({i: f"beat {i}" for i in range(len(starts))})

# - the ML predictions
df["ml"] = ""
ann_pos = list(hb['ann_index'].values)

df.loc[ploc, "ml"] = hb['ann_symbol'].values

In [7]:
fig_ecg = make_subplots(
    rows=6, cols=2,
    shared_xaxes=True,
    #row_titles=names,
    x_title="Time"
)

fig_ecg.update_layout(
    title_text="",
    height=1000,
    paper_bgcolor='white',
    plot_bgcolor='white'
)
  

for i, n in enumerate(names):
    if ((i+1)%2)==1:
        y = df[n]
        fig_ecg.append_trace(go.Scatter(x=x, y=df[n], marker_color=colors[i], name=n),row=int((i + 2)/2), col=1)
        fig_ecg['layout']['yaxis{}'.format(i+1)]['title']=n
    else: 
        y = df[n]
        fig_ecg.append_trace(go.Scatter(x=x, y=df[n], marker_color=colors[i], name=n), row=int((i + 1)/2), col=2)
        fig_ecg['layout']['yaxis{}'.format(i+1)]['title']=n
#fig_ecg['layout']['yaxis{}'.format(i)]['title']=n
    
    
fig_ecg.update_layout(title_text="", height=1000)
#fig_ecg.update_yaxes(showticklabels=False)

fig_ecg.update_yaxes(title_text=names, secondary_y=True, showticklabels=False)     

In [8]:
# %%
app = Dash(__name__)
app.layout = html.Div(
    [
        html.H1("ECG"),
        html.Div(
            [
                html.Div(
                    [
                        dcc.Dropdown(heartbeats, id="dropdown", value="all")
                    ],
                    style={'padding': 10, 'flex': 1}
                ),
                html.Div(
                    [
                        dcc.Checklist(
                            {"ML": "Show ML annotations"}, id="checkbox", value=[])
                    ],
                    style={'padding': 10, 'flex': 1}
                )
            ],
            style={'display': 'flex', 'flex-direction': 'row'}
        ),
        dcc.Graph(figure=fig_ecg, id="ecg"),
    ]
)

In [9]:
@callback(
    Output("ecg", "figure"),
    Input("dropdown", "value"),
    Input("checkbox", "value")
)
def update(dropdown, checkbox):
    # extract data from df within the heartbeat range
    if dropdown == "all":
        ddf = df
        range_x = x
        range_y = None  # If dropdown is 'all', there's no specific range_y for a beat
    else:
        dropdown = int(dropdown)
        mask = (x > starts[dropdown]) & (x < ends[dropdown])
        ddf = df.loc[mask]
        range = x[mask]

    # rest of your code
    # ML annotation
    mode = "lines+text" if checkbox else "lines"

    # redo the plot
    figure = make_subplots(
    rows=6, cols=2,
    shared_xaxes=True,
    #row_titles=names,
    x_title="Time")
    
    figure.update_layout(
    title_text="",
    height=1000,
    paper_bgcolor='white',
    plot_bgcolor='white'
)
    
        
    for i, n in enumerate(names):
        if ((i+1)%2)==1:

            y = df[n]
            figure.append_trace(go.Scatter(x=range, y=ddf[n], marker_color=colors[i], name=n, mode=mode, 
                                text=ddf["ml"]),
                                row=int((i + 2)/2), col=1)
            figure['layout']['yaxis{}'.format(i+1)]['title']=n
        else: 
            y = df[n]
            figure.append_trace(go.Scatter(x=range, y=ddf[n], marker_color=colors[i], name=n, mode=mode, text=ddf["ml"]), 
                                row=int((i + 1)/2), col=2)
            figure['layout']['yaxis{}'.format(i+1)]['title']=n



    figure.update_layout(title_text="", height=1000)
    figure.update_yaxes(title_text=names, secondary_y=True, showticklabels=False)     
#     figure.update_traces(textposition='middle', textfont_size=20)
    figure.update_traces(textfont_size=20)


    return figure


In [None]:
try: 
    if  __name__ == '__main__':
        app.run(debug=False,port=8000)
except:
    print("Exception occured!")
    from werkzeug.serving import run_simple
    run_simple('localhost', 10, app)

Dash is running on http://127.0.0.1:8000/

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:8000
Press CTRL+C to quit
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET /_dash-component-suites/dash/dcc/async-dropdown.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET /_dash-component-suites/dash/dcc/async-graph.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Jul/2023 12:57:52] "GET /_dash-component-suites/dash/dcc/async-plotlyjs.js HTTP/1.1[2023-07-06 12:57:52,545] ERROR in app: Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\nadil\anaconda3\envs\torch\lib\site-packages\flask\app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\nadil\anaconda3\envs\torch\lib\site-packages\flask\app.py", line 1825, in ful