# Processing of ECG signals

This notebook walks through the signal-processing steps used in this project and the reasoning behind each method. The script that runs the actual processing is at scripts/processing.py.



## 0. Logistics

### 0.1 Loading the files 
The following cell collects the paths of all records in the test directory into a list.

In [4]:
import os # for handling loading in the files

In [9]:
base_path = "../physionet.org/files/ecg-arrhythmia/1.0.0/WFDBRecords/01/010"
recordPath = []  # paths of all records, without file extension

for root, dirs, files in os.walk(base_path):
    files.sort()  # keep the records of each directory in a stable order
    for file in files:
        if file.endswith(".mat"):
            # wfdb expects the record name without its extension
            record_name = os.path.splitext(file)[0]
            recordPath.append(os.path.join(root, record_name))



### 0.2 Getting the signal data

The next step is to read the signal data. For this we use wfdb, a widely used Python package for reading and writing ECG records in the WFDB format.

In [10]:
from wfdb.io import rdrecord
from wfdb import Record

In [11]:
records: list[Record] = []
sf = 0  # common sampling frequency, taken from the first record
for r in recordPath:
    record = rdrecord(r)
    if sf == 0:
        sf = record.fs
    elif sf != record.fs:
        raise ValueError("Sampling frequencies of the records do not match")
    records.append(record)

### 0.3 Writing Processed signal files

The processed signals are written back to disk with the wfdb library as well. The details are in the script file.

### 0.4 Extracting comments from the file

The condition names are stored as SNOMED CT codes, so to create labels for our model a function converts each code to a condition name. The function is shown in the code block below. This step is not run here because it is performed when writing to the file, as can be seen in the script file.

In [12]:
import pandas as pd

SNOMED = pd.read_csv('../physionet-data/a-large-scale-12-lead-electrocardiogram-database-for-arrhythmia-study-1.0.0/ConditionNames_SNOMED-CT.csv')

# Map each SNOMED CT code (as a string) to its acronym
mapping = pd.Series(SNOMED['Acronym Name'].values, index=SNOMED['Snomed_CT'].astype(str)).to_dict()


def parseConditions(comments):
    """Return the condition names for the first "Dx:" comment, or [] if there is none."""
    for data in comments:
        if data.startswith("Dx:"):
            # Split on the first colon only and strip whitespace around each code
            dx_codes = [dx.strip() for dx in data.split(":", 1)[1].split(",")]
            return [mapping.get(dx, f"Unknown Dx: {dx}") for dx in dx_codes]
    return []
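To make the conversion concrete, here is a small self-contained sketch of the same idea applied to a made-up comment list. The function name `parse_dx_comment`, the sample comments, and the two-entry `sample_mapping` are illustrative assumptions; the real mapping is built from ConditionNames_SNOMED-CT.csv.

```python
# Illustrative subset of the code-to-acronym mapping (assumed values)
sample_mapping = {"164889003": "AFIB", "59118001": "RBBB"}


def parse_dx_comment(comments, mapping):
    """Return condition names for the first 'Dx:' comment, or [] if there is none."""
    for comment in comments:
        if comment.startswith("Dx:"):
            codes = [c.strip() for c in comment.split(":", 1)[1].split(",")]
            # Codes missing from the mapping are kept visible instead of dropped
            return [mapping.get(c, f"Unknown Dx: {c}") for c in codes if c]
    return []


# Made-up header comments in the "Key: value" style
sample_comments = ["Age: 59", "Sex: Female", "Dx: 164889003,59118001,123", "Rx: Unknown"]
labels = parse_dx_comment(sample_comments, sample_mapping)
print(labels)  # ['AFIB', 'RBBB', 'Unknown Dx: 123']
```

Keeping unknown codes as `Unknown Dx: <code>` makes gaps in the mapping easy to spot later on.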



## 1. Normalisation of signal

Before removing noise, the first step is to normalise the signal. wfdb has this built in, so we simply use its function.


In [21]:
record = records[0]  # first record for testing (a wfdb Record, converted to a DataFrame later)
fs = record.fs  # sampling frequency of the record

In [17]:
import wfdb.processing as wd


In [18]:
record.p_signal = wd.normalize_bound(record.p_signal)
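The sketch below illustrates the kind of min-max rescaling involved. It assumes that `normalize_bound`, with its default bounds of 0 and 1, rescales using the minimum and maximum of the whole array, so all leads share one scale. The functions `normalize_global` and `normalize_per_lead` are hypothetical helpers written for this illustration; they are not part of wfdb.

```python
import numpy as np


def normalize_global(sig, lb=0.0, ub=1.0):
    """Min-max scale using one minimum and maximum for the whole array."""
    lo, hi = np.min(sig), np.max(sig)
    return lb + (sig - lo) * (ub - lb) / (hi - lo)


def normalize_per_lead(sig, lb=0.0, ub=1.0):
    """Min-max scale each column (lead) independently."""
    lo = sig.min(axis=0, keepdims=True)
    hi = sig.max(axis=0, keepdims=True)
    return lb + (sig - lo) * (ub - lb) / (hi - lo)


# Two toy "leads" with very different amplitudes
demo = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
global_scaled = normalize_global(demo)
per_lead_scaled = normalize_per_lead(demo)
print(global_scaled)
print(per_lead_scaled)
```

With global scaling the relative amplitudes between leads are preserved, while per-lead scaling stretches every lead to the full range. Both variants divide by zero on a completely flat input, which would need separate handling.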


## 2. Baseline Wander Removal

The second step removes baseline wander from the signal. This is done with a first-order Butterworth high-pass filter with a 0.8 Hz cutoff, applied forwards and backwards so that no phase shift is introduced. The SciPy signal library is used for this.

In [2]:

from scipy import signal

In [19]:
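def removeBaselineWander(ecg_signal: pd.DataFrame, sf):
    """High-pass filter every lead of ecg_signal in place and return it."""
    sampling_rate = sf
    cutoff_frequency = 0.8  # Hz
    nyquist_rate = sampling_rate / 2

    index = ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']

    # First-order Butterworth high-pass; butter expects the cutoff as a fraction of Nyquist
    b, a = signal.butter(1, cutoff_frequency / nyquist_rate, btype='highpass')
    for i in index:
        # filtfilt runs the filter forwards and backwards, giving zero phase shift
        ecg_signal[i] = signal.filtfilt(b, a, ecg_signal[i])
    return ecg_signal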

In [22]:
recordDf = record.to_dataframe()  # converting for easier manipulation
filtered = removeBaselineWander(recordDf.copy(), fs)  # filter a copy so recordDf stays unchanged
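To check that such a filter behaves as intended, the sketch below applies the same filter design to a synthetic signal rather than a real record. The 10 Hz sine standing in for the ECG content, the 0.2 Hz drift, the 500 Hz sampling rate, and the helper name `highpass_zero_phase` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal


def highpass_zero_phase(x, fs, cutoff_hz=0.8, order=1):
    """Zero-phase Butterworth high-pass filter (same design as above, on a 1-D array)."""
    b, a = signal.butter(order, cutoff_hz / (fs / 2), btype="highpass")
    return signal.filtfilt(b, a, x)


def rms(v):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(v ** 2)))


fs_demo = 500                                # assumed sampling rate in Hz
t = np.arange(10 * fs_demo) / fs_demo        # 10 seconds of samples
clean = np.sin(2 * np.pi * 10 * t)           # stand-in for the ECG content
drift = 0.5 * np.sin(2 * np.pi * 0.2 * t)    # slow baseline wander
restored = highpass_zero_phase(clean + drift, fs_demo)

# Compare away from the edges, where filtfilt start-up effects are largest
mid = slice(2 * fs_demo, 8 * fs_demo)
err_before = rms(drift[mid])
err_after = rms((restored - clean)[mid])
print(f"RMS deviation from clean signal: before={err_before:.3f}, after={err_after:.3f}")
```

The deviation from the clean signal should drop substantially after filtering, while the 10 Hz component passes almost unchanged because it lies well above the cutoff.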