In [45]:
import pandas as pd
import numpy as np
import wfdb
import ast
import json

# Objective
Given a 12-lead ECG recording, classify the infarction stadium.
* Noise may or may not be present in the signal.

# Strategy
* First, train a neural network (NN) to identify the four types of noise ['baseline_drift', 'static_noise', 'burst_noise', 'electrodes_problems'].
* Next, train a neural network to identify the infarction stadium (IS) from the 12-lead signal, with an additional input layer that indicates which types of noise are present in each lead.
* In production, each lead is passed through the noise detection NN. Its output is then passed, together with the signals, to the IS neural network to make a classification.
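One possible way to realize the "additional layer" is to append each lead's noise flags as extra input channels next to that lead's samples. The sketch below assumes the signal comes as an `(n_samples, 12)` array (the layout `wfdb.rdsamp` returns) and that the noise information for a record is a `(12, 4)` matrix of 0/1 flags, one column per noise type. `build_is_input` is a hypothetical helper, not part of any library.

```python
import numpy as np

N_LEADS = 12
N_NOISE_TYPES = 4  # baseline_drift, static_noise, burst_noise, electrodes_problems


def build_is_input(signals, noise_flags):
    """Stack each lead's samples with that lead's noise indicators.

    signals:     (n_samples, 12) array, one column per lead
    noise_flags: (12, 4) array of 0/1 noise indicators per lead
    returns:     (n_samples, 12, 5) array; channel 0 is the raw signal and
                 channels 1-4 repeat the lead's noise flags at every time step
    """
    signals = np.asarray(signals, dtype=np.float32)
    noise_flags = np.asarray(noise_flags, dtype=np.float32)
    if signals.ndim != 2 or signals.shape[1] != N_LEADS:
        raise ValueError(f"signals must have shape (n_samples, {N_LEADS}), got {signals.shape}")
    if noise_flags.shape != (N_LEADS, N_NOISE_TYPES):
        raise ValueError(
            f"noise_flags must have shape ({N_LEADS}, {N_NOISE_TYPES}), got {noise_flags.shape}"
        )
    n_samples = signals.shape[0]
    # Repeat the per-lead flags along the time axis (a read-only view);
    # np.concatenate below produces the final, writable copy
    flags_over_time = np.broadcast_to(noise_flags, (n_samples, N_LEADS, N_NOISE_TYPES))
    return np.concatenate([signals[:, :, np.newaxis], flags_over_time], axis=2)
```

Repeating the flags at every time step keeps the input a single tensor that a convolutional or recurrent model can consume directly; an alternative design would feed the `(12, 4)` flag matrix through a separate dense branch and merge it later in the network.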

In [46]:
root_directory_path = '/home/lucas/Desktop/programming/classwork/Senior_project/project/data/'
sig_database = pd.read_csv(root_directory_path+'ptbxl_database.csv', index_col='ecg_id')
sig_database.columns


Index(['patient_id', 'age', 'sex', 'height', 'weight', 'nurse', 'site',
       'device', 'recording_date', 'report', 'scp_codes', 'heart_axis',
       'infarction_stadium1', 'infarction_stadium2', 'validated_by',
       'second_opinion', 'initial_autogenerated_report', 'validated_by_human',
       'baseline_drift', 'static_noise', 'burst_noise', 'electrodes_problems',
       'extra_beats', 'pacemaker', 'strat_fold', 'filename_lr', 'filename_hr'],
      dtype='object')

Drop the columns we don't need.

In [47]:
sig_database = sig_database.drop(
    ['patient_id', 'age', 'sex', 'height', 'weight', 'nurse', 'site', 'device', 'recording_date', 'validated_by', 'second_opinion', 'initial_autogenerated_report', 'validated_by_human','extra_beats', 'pacemaker'],
    axis=1
)
print(sig_database.columns)

Index(['report', 'scp_codes', 'heart_axis', 'infarction_stadium1',
       'infarction_stadium2', 'baseline_drift', 'static_noise', 'burst_noise',
       'electrodes_problems', 'strat_fold', 'filename_lr', 'filename_hr'],
      dtype='object')


# Looking at one Signal
* We will use the 100 Hz recordings (`filename_lr`).

In [48]:
def load_one_sample(path):
    data = wfdb.rdsamp(path)
    signals = data[0]
    meta = data[1]
    return signals, meta

In [49]:
one_sample = sig_database.iloc[0]
file_path = one_sample['filename_lr']
signals, meta = load_one_sample(root_directory_path + file_path)
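`wfdb.rdsamp` returns a tuple of the signal array, shaped `(n_samples, n_channels)`, and a dict of header fields that includes `fs` (sampling frequency) and `sig_name` (channel names). PTB-XL records are 10 s long, so a 100 Hz record should have 1000 samples per lead. The hypothetical helper below is a quick sanity check to run on a loaded record before feeding it to a model.

```python
import numpy as np


def summarize_record(signals, meta, expected_fs=100, expected_leads=12):
    """Sanity-check one record returned by wfdb.rdsamp and summarize it."""
    signals = np.asarray(signals)
    n_samples, n_leads = signals.shape
    if meta["fs"] != expected_fs:
        raise ValueError(f"expected {expected_fs} Hz, got {meta['fs']}")
    if n_leads != expected_leads:
        raise ValueError(f"expected {expected_leads} leads, got {n_leads}")
    return {
        "duration_s": n_samples / meta["fs"],  # samples / (samples per second)
        "leads": list(meta["sig_name"]),       # channel order used by the array columns
        "has_nan": bool(np.isnan(signals).any()),
    }
```

For the record loaded above, `summarize_record(signals, meta)` should report a duration of 10.0 s and the lead order, which is worth checking against any per-lead annotations.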


# Infarction Stadium Labels
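* I used the methodology described in this paper to consolidate the labels: https://arxiv.org/pdf/2306.15681. Intermediate stadia are merged into the earlier stadium, and missing or 'unknown' values become 'None'.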

In [50]:
infarction_labels = sig_database['infarction_stadium1']
print(set(infarction_labels))

{nan, 'Stadium II', 'Stadium III', 'Stadium I-II', 'unknown', 'Stadium II-III', 'Stadium I'}


In [51]:
def fix_infarction_label(label):
    if pd.isna(label) or label == "unknown":
        return "None"
    if label == "Stadium I-II":
        return "Stadium I"
    if label == "Stadium II-III":
        return "Stadium II"
    return label
    
infarction_labels = infarction_labels.apply(fix_infarction_label)
print(set(infarction_labels))

{'Stadium III', 'Stadium II', 'Stadium I', 'None'}


In [52]:
sig_database = sig_database.drop(
    ['infarction_stadium1', 'infarction_stadium2'],
    axis=1  # Drop columns
)
sig_database.columns

Index(['report', 'scp_codes', 'heart_axis', 'baseline_drift', 'static_noise',
       'burst_noise', 'electrodes_problems', 'strat_fold', 'filename_lr',
       'filename_hr'],
      dtype='object')

In [53]:
sig_database['infarction_labels'] = infarction_labels
print(sig_database.columns)

Index(['report', 'scp_codes', 'heart_axis', 'baseline_drift', 'static_noise',
       'burst_noise', 'electrodes_problems', 'strat_fold', 'filename_lr',
       'filename_hr', 'infarction_labels'],
      dtype='object')


In [54]:
codes = sig_database[['scp_codes']]
codes.head()

                                       scp_codes
ecg_id                                          
1       {'NORM': 100.0, 'LVOLT': 0.0, 'SR': 0.0}
2                   {'NORM': 80.0, 'SBRAD': 0.0}
3                     {'NORM': 100.0, 'SR': 0.0}
4                     {'NORM': 100.0, 'SR': 0.0}
5                     {'NORM': 100.0, 'SR': 0.0}


In [55]:
unique_codes = set()
for i in range(len(codes)):
    code_dict = ast.literal_eval(codes.iloc[i]['scp_codes'])
    for key in code_dict.keys():
        unique_codes.add(str(key))
unique_codes = sorted(list(unique_codes))

In [56]:
print(len(unique_codes))

71
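The set of 71 codes says nothing about how often each one occurs. A minimal sketch using `collections.Counter` counts how many records mention each code (each code at most once per record), parsing the stored strings with `ast.literal_eval` just like the loop above; `count_scp_codes` is a hypothetical helper name.

```python
import ast
from collections import Counter


def count_scp_codes(scp_code_strings):
    """Count how many records mention each SCP code."""
    counts = Counter()
    for s in scp_code_strings:
        # Each string is a dict literal such as "{'NORM': 100.0, 'SR': 0.0}"
        counts.update(ast.literal_eval(s).keys())
    return counts
```

`count_scp_codes(sig_database['scp_codes']).most_common(10)` would list the ten most frequent codes, and `sorted(count_scp_codes(...))` reproduces `unique_codes`.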


There are obviously a lot, so I asked GPT-4o to categorize and explain them to me:

The ECG codes can be grouped into the following categories of heart conditions and ECG findings:

### 1. **Heart Blocks:**
   - **1AVB**: First-Degree Atrioventricular Block
   - **2AVB**: Second-Degree Atrioventricular Block
   - **3AVB**: Third-Degree Atrioventricular Block
   - **CLBBB**: Complete Left Bundle Branch Block
   - **CRBBB**: Complete Right Bundle Branch Block
   - **ILBBB**: Incomplete Left Bundle Branch Block
   - **IRBBB**: Incomplete Right Bundle Branch Block
   - **IVCD**: Intraventricular Conduction Delay
   - **LAFB**: Left Anterior Fascicular Block
   - **LPFB**: Left Posterior Fascicular Block
   - **TAB_**: T-Wave Abnormality (a repolarization finding, not a conduction block)

### 2. **Arrhythmias:**
   - **AFIB**: Atrial Fibrillation
   - **AFLT**: Atrial Flutter
   - **BIGU**: Bigeminy
   - **PAC**: Premature Atrial Contractions
   - **PSVT**: Paroxysmal Supraventricular Tachycardia
   - **PVC**: Premature Ventricular Contractions
   - **SARRH**: Sinus Arrhythmia
   - **SBRAD**: Sinus Bradycardia
   - **SR**: Sinus Rhythm
   - **STACH**: Sinus Tachycardia
   - **SVARR**: Supraventricular Arrhythmia
   - **SVTAC**: Supraventricular Tachycardia
   - **TRIGU**: Trigeminy
   - **WPW**: Wolff-Parkinson-White Syndrome

### 3. **Myocardial Infarction (Heart Attack):**
   - **ALMI**: Anterolateral Myocardial Infarction
   - **AMI**: Anterior Myocardial Infarction
   - **ASMI**: Anterior Septal Myocardial Infarction
   - **ILMI**: Inferolateral Myocardial Infarction
   - **IMI**: Inferior Myocardial Infarction
   - **IPLMI**: Inferoposterolateral Myocardial Infarction
   - **IPMI**: Inferoposterior Myocardial Infarction
   - **LMI**: Lateral Myocardial Infarction
   - **PMI**: Posterior Myocardial Infarction

### 4. **Ischemia (Reduced Blood Flow):**
   - **ISCAL**: Ischemia Anterolateral
   - **ISCAN**: Ischemia Anterior
   - **ISCAS**: Ischemia Anterior Septal
   - **ISCIL**: Ischemia Inferolateral
   - **ISCIN**: Ischemia Inferior
   - **ISCLA**: Ischemia Lateral
   - **ISC_**: General Ischemia

### 5. **Injury Patterns:**
   - **INJAL**: Injury Anterolateral
   - **INJAS**: Injury Anterior Septal
   - **INJIL**: Injury Inferolateral
   - **INJIN**: Injury Inferior
   - **INJLA**: Injury Lateral

### 6. **Hypertrophy (Thickened Heart Muscle):**
   - **LVH**: Left Ventricular Hypertrophy
   - **RVH**: Right Ventricular Hypertrophy
   - **SEHYP**: Septal Hypertrophy
   - **VCLVH**: Voltage Criteria (QRS) for Left Ventricular Hypertrophy

### 7. **Atrial Abnormalities:**
   - **LAO/LAE**: Left Atrial Overload/Left Atrial Enlargement
   - **RAO/RAE**: Right Atrial Overload/Right Atrial Enlargement

### 8. **Voltage Abnormalities:**
   - **HVOLT**: High Voltage
   - **LVOLT**: Low Voltage

### 9. **PR and QT Interval Abnormalities:**
   - **LPR**: Prolonged PR Interval
   - **LNGQT**: Long QT Interval

### 10. **Medication and Electrolyte Effects:**
   - **DIG**: Digitalis Effect
   - **EL**: Electrolyte Imbalance

### 11. **T-Wave and ST-Segment Abnormalities:**
   - **INVT**: Inverted T-Wave
   - **LOWT**: Low T-Wave Amplitude
   - **NDT**: Non-Diagnostic T-Waves
   - **NT_**: Non-Specific T-Wave Changes
   - **NST_**: Non-Specific ST Changes
   - **STD_**: ST Depression
   - **STE_**: ST Elevation

### 12. **Pacing and Pacemaker-Related:**
   - **PACE**: Pacemaker Activity
   - **PRC(S)**: Premature Complex(es) (ectopic beats, not pacemaker-related)

### 13. **Other Findings:**
   - **ABQRS**: Abnormal QRS Complex
   - **ANEUR**: Aneurysm (ventricular aneurysm)
   - **QWAVE**: Pathological Q-Waves
   - **NORM**: Normal ECG

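These categories can be turned into code sets to see which statements a record carries. For example, the myocardial infarction group could be used to cross-check the consolidated `infarction_labels`. The sketch below hard-codes the MI codes from category 3; `MI_CODES` and `codes_in_category` are hypothetical names, and the optional likelihood threshold applies to the numbers stored in each `scp_codes` dict.

```python
import ast

# Myocardial infarction codes from category 3 of the list above
MI_CODES = {"ALMI", "AMI", "ASMI", "ILMI", "IMI", "IPLMI", "IPMI", "LMI", "PMI"}


def codes_in_category(scp_codes_str, category_codes, min_likelihood=0.0):
    """Return the codes in one scp_codes string that belong to category_codes."""
    code_dict = ast.literal_eval(scp_codes_str)
    return sorted(
        code
        for code, likelihood in code_dict.items()
        if code in category_codes and likelihood >= min_likelihood
    )
```

Applied column-wise, `sig_database['scp_codes'].apply(lambda s: codes_in_category(s, MI_CODES))` gives the MI statements per record, which can then be compared with `infarction_labels`.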

# Noise Identification Dataset

In [57]:
noise_information = sig_database[['baseline_drift', 'static_noise', 'burst_noise', 'electrodes_problems']]
print(noise_information.head(10))

        baseline_drift static_noise burst_noise electrodes_problems
ecg_id                                                             
1                  NaN    , I-V1,           NaN                 NaN
2                  NaN          NaN         NaN                 NaN
3                  NaN          NaN         NaN                 NaN
4         , II,III,AVF          NaN         NaN                 NaN
5        , III,AVR,AVF          NaN         NaN                 NaN
6                 , V1          NaN         NaN                 NaN
7                  NaN          NaN         NaN                 NaN
8             , II,AVF   , I-AVF,           NaN                 NaN
9                  NaN   , I-AVR,           NaN                 NaN
10                 NaN          NaN         NaN                 NaN
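To train the noise detection NN per lead, these free-text annotations need to become per-lead targets. The sketch below assumes the format seen in the rows above: comma-separated lead names, ranges such as `I-V1` or `I-AVF`, and stray leading or trailing commas. It also assumes the standard lead order I, II, III, aVR, aVL, aVF, V1-V6, which should be checked against `meta['sig_name']`. `parse_lead_annotation` and `noise_matrix` are hypothetical helpers, and unrecognized tokens are simply ignored.

```python
# Assumed 12-lead order; verify against meta['sig_name'] from wfdb.rdsamp
LEADS = ["I", "II", "III", "AVR", "AVL", "AVF", "V1", "V2", "V3", "V4", "V5", "V6"]
NOISE_COLUMNS = ["baseline_drift", "static_noise", "burst_noise", "electrodes_problems"]


def parse_lead_annotation(annotation):
    """Turn an annotation like ', I-V1,' into 12 0/1 flags in LEADS order."""
    flags = [0] * len(LEADS)
    if not isinstance(annotation, str):  # NaN / None: no noise annotated
        return flags
    for token in annotation.upper().replace(" ", "").split(","):
        if not token:
            continue
        if "-" in token:  # a range of consecutive leads, e.g. I-AVF
            start, end = token.split("-", 1)
            if start in LEADS and end in LEADS:
                i, j = LEADS.index(start), LEADS.index(end)
                for k in range(min(i, j), max(i, j) + 1):
                    flags[k] = 1
        elif token in LEADS:
            flags[LEADS.index(token)] = 1
        # any other token is ignored in this sketch
    return flags


def noise_matrix(record):
    """record: dict or pandas Series keyed by NOISE_COLUMNS. Returns 12 rows x 4 flags."""
    per_type = [parse_lead_annotation(record.get(col)) for col in NOISE_COLUMNS]
    # zip(*per_type) regroups the four 12-flag lists into one 4-flag row per lead
    return [list(lead_flags) for lead_flags in zip(*per_type)]
```

Because a pandas Series also has `.get`, `noise_information.apply(noise_matrix, axis=1)` should give one 12×4 list per `ecg_id`, matching the `(12, 4)` noise layout assumed for the IS network input.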
