## Data reading - First method

This is the first method implemented for reading the database. According to the database's own [documentation](https://physionet.org/content/ptb-xl/1.0.2/), it *"provides a minimal use example showing how to load waveform data and tags making use of the proposed train-test split."*

In the remaining notebooks, I refer to this way of loading data as the **"first data loading method"**.

The necessary libraries are imported and a random number generator is created.

In [1]:
import pandas as pd
import numpy as np
import ast
import wfdb
from wfdb import processing
import matplotlib.pyplot as plt
import heartpy as hp
import ecg_plot
import os
import glob

import warnings
warnings.filterwarnings('ignore')

# Create random generator with its seed
rng = np.random.default_rng(123)
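
As a small illustration of why the generator is seeded, the sketch below (not part of the original notebook) builds two generators from the same seed and checks that they produce identical draws. The names `rng_a`, `rng_b`, `draws_a`, `draws_b`, and `same_stream` are illustrative only.

```python
import numpy as np

# Two generators built from the same seed yield the same stream of draws.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)

# `high` is exclusive, so these draws fall in the range 1..49.
draws_a = rng_a.integers(low=1, high=50, size=5)
draws_b = rng_b.integers(low=1, high=50, size=5)

same_stream = bool(np.array_equal(draws_a, draws_b))
print(same_stream)
```

Re-running a cell that draws from `rng` without recreating the generator advances its internal state, so the values change between runs unless the seeding cell is executed again first.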

Functions for loading data. How they work is described in their docstrings.

In [2]:
def load_raw_data(df:pd.core.frame.DataFrame, sampling_rate:int, path:str):
    """Load data from DataFrame according to sampling rate

    Args:
        df (pandas.core.frame.DataFrame): DataFrame to read
        sampling_rate (int): Sampling rate of signal
        path (str): Path that contains signal data

    Returns:
        numpy.ndarray: Signal data
    """

    if sampling_rate == 100:
        data = [wfdb.rdsamp(path+f) for f in df.filename_lr]
    else:
        data = [wfdb.rdsamp(path+f) for f in df.filename_hr]
    
    data = np.array([signal for signal, meta in data])
    return data

def aggregate_diagnostic(y_dic:dict):
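    """Map the SCP codes of one record to diagnostic superclasses

    Relies on the global agg_df (the filtered scp_statements.csv),
    which must be defined before this function is called.

    Args:
        y_dic (dict): SCP codes of one record, as stored in scp_codes

    Returns:
        list: Unique diagnostic superclasses of the record
    """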
    tmp = []
    for key in y_dic.keys():
        if key in agg_df.index:
            tmp.append(agg_df.loc[key].diagnostic_class)
    return list(set(tmp))
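
To make the aggregation step concrete, here is a minimal sketch on toy data. It assumes a tiny hand-written lookup table (`toy_agg_df`) in place of the real `scp_statements.csv`, and a hypothetical variant `aggregate_with` that takes the table as an argument instead of reading the global `agg_df`. Neither name is part of the original notebook.

```python
import ast
import pandas as pd

# Hypothetical stand-in for the filtered scp_statements.csv table:
# the index holds SCP codes, diagnostic_class holds the superclass.
toy_agg_df = pd.DataFrame(
    {"diagnostic_class": ["NORM", "MI", "MI"]},
    index=["NORM", "IMI", "AMI"],
)

def aggregate_with(y_dic, agg_df):
    """Same logic as aggregate_diagnostic, with the lookup table passed in."""
    classes = {
        agg_df.loc[key].diagnostic_class
        for key in y_dic
        if key in agg_df.index  # codes missing from the table are skipped
    }
    return sorted(classes)

# scp_codes is stored as text in the CSV, so it is parsed into a dict first.
codes = ast.literal_eval("{'IMI': 100.0, 'AMI': 50.0, 'SR': 0.0}")
result = aggregate_with(codes, toy_agg_df)
print(result)
```

In this toy case both `IMI` and `AMI` map to the same superclass, so it appears only once, and `SR` is dropped because it is not in the lookup table. Passing the table explicitly is a design choice of this sketch only; it avoids depending on the order in which cells are run.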

The data are loaded following the scheme in the [documentation](https://physionet.org/content/ptb-xl/1.0.2/). The result is an array of signals `X` and a table of annotations `Y` with a `diagnostic_superclass` column, ready to be used to train an artificial intelligence model.

In [4]:
# Setup data variables
path = '../data/physionet.org/files/ptb-xl/1.0.2/'
sampling_rate = 100 # Sampling rate of the signal (100 or 500)

# Check that the folder exists
if not os.path.isdir(path):
    print("Folder " + path + " does not exist!")

# Load and convert annotation data
Y_aux = pd.read_csv(path+'ptbxl_database.csv', index_col='ecg_id')
Y_aux.scp_codes = Y_aux.scp_codes.apply(ast.literal_eval)

# Random subset of N records (low=1 so the subset is never empty)
n_data = rng.integers(low=1, high=50)
Y = Y_aux.sample(n=n_data)

# Load raw signal data
X = load_raw_data(Y, sampling_rate, path)

# Load scp_statements.csv for diagnostic aggregation
agg_df = pd.read_csv(path+'scp_statements.csv', index_col=0)
agg_df = agg_df[agg_df.diagnostic == 1]

# Apply diagnostic superclass
Y['diagnostic_superclass'] = Y.scp_codes.apply(aggregate_diagnostic)
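
The documentation quote above mentions a proposed train-test split, which the cell above does not apply. The sketch below shows one way it could look, using synthetic stand-ins (`X_toy`, `Y_toy`) rather than the real data. It assumes the annotation table carries a `strat_fold` column with folds numbered 1 to 10, and that fold 10 is held out for testing; check both against the dataset documentation before relying on them.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: 20 records, 1000 samples (10 s at 100 Hz), 12 leads.
n_records, n_samples, n_leads = 20, 1000, 12
X_toy = np.zeros((n_records, n_samples, n_leads))
Y_toy = pd.DataFrame({"strat_fold": np.arange(n_records) % 10 + 1})

test_fold = 10  # assumption: fold 10 is held out for testing
is_test = (Y_toy.strat_fold == test_fold).to_numpy()

# The same boolean mask selects matching rows of signals and annotations.
X_train, X_test = X_toy[~is_test], X_toy[is_test]
Y_train, Y_test = Y_toy[~is_test], Y_toy[is_test]
print(X_train.shape, X_test.shape)
```

With the real data, `X` and `Y` would take the place of the toy objects. Because `Y` here is a small random subset, some folds may be empty for a given draw.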