# QRS training data preprocessor

Data preprocessor to build the training and validation data for our neural network.

This notebook implements all the steps needed to create the training and test data files.

## Importing libraries

In [1]:
import wfdb
import numpy as np
import pickle as pkl
import matplotlib.pyplot as plt

## Creating the parabola function

This function creates a parabola around each spike to widen it, so that it can be detected more easily.

In [34]:
# auxiliary function
def parabola(a,n,r):
    """
    Creates a parabola around the position of a spike specified in 'a'
    Params:
        a - A vector specifying peak positions
        n - The length of the target vector to generate
        r - The radius of the parabola
    """
    assert n>2*r
    y = np.zeros(n, dtype = np.float32)
    x = np.arange(2, 2*r+1)
    for i in a:
        if i > r-1 and i <= n-r:
            y[i-r+1:i+r] = ((r+1)**2-(x-r-1)**2)/(r+1)**2
        elif i < r:
            y[:i+r] = ((r+1)**2-(x[r-i-1:]-r-1)**2)/(r+1)**2
        elif i<n:
            y[i-r+1:] = ((r+1)**2-(x[:r-1+(n-i)]-r-1)**2)/(r+1)**2
    return y
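
To make the shape concrete: for a peak that lies away from the edges, the function writes `1 - (k/(r+1))**2` at offset `k` from the peak, for `k` from `-(r-1)` to `r-1`. A minimal sketch that recomputes just this window for `r = 3` (the radius passed later in this notebook), independent of the function above:

```python
import numpy as np

r = 3  # radius, same value passed to parabola() later in the notebook
# Offsets relative to the peak position: -(r-1), ..., 0, ..., r-1
k = np.arange(-(r - 1), r)
# Same expression as in parabola(), rewritten in terms of the offset k
window = ((r + 1) ** 2 - k ** 2) / (r + 1) ** 2

print(window.tolist())  # [0.75, 0.9375, 1.0, 0.9375, 0.75]
```

The window has `2*r - 1` samples and reaches 1.0 exactly at the peak position.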

## Preprocessing the files

We iterate over all the files and, for each one, read channels II and V1.

After reading the channels, we split them into two separate arrays so that they can be combined into a single 1D array (the code below adds them element-wise).
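
One detail worth keeping in mind here: applying `+` to two NumPy arrays adds them sample by sample, whereas `np.concatenate` places one after the other. A small sketch with made-up sample values (not taken from the dataset) showing the difference:

```python
import numpy as np

# Hypothetical samples standing in for the two channels
signal_II = np.array([0.1, 0.2, 0.3])
signal_V1 = np.array([1.0, 2.0, 3.0])

summed = signal_II + signal_V1                   # element-wise sum, length 3
joined = np.concatenate([signal_II, signal_V1])  # end to end, length 6

print(summed.shape, joined.shape)  # (3,) (6,)
```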

We filter out undesired annotations, i.e., those whose symbol is not in the `qrs_symbs` list of QRS symbols.
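
The filtering step can be illustrated without reading any record. The positions and symbols below are made up for the example, and the symbol set is shortened; only the list comprehension mirrors the approach used in this notebook:

```python
# Hypothetical annotation data: sample indices and their symbols
symbol_positions = [10, 55, 120, 180]
symbol_list = ["N", "+", "V", "~"]

# Shortened set of beat symbols, for illustration only
qrs_symbs = {"N", "V"}

# Keep only the positions whose symbol marks a QRS complex
qrs_symbol_positions = [
    pos for pos, symb in zip(symbol_positions, symbol_list) if symb in qrs_symbs
]

print(qrs_symbol_positions)  # [10, 120]
```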

Once the annotations are filtered, we create a parabola around each labeled spike to make it easier to identify. This completes the preprocessing, and we are ready to build the output dictionary and serialize it to a file with `pickle`.

In [46]:
for i in range(1, 76):
    
    file_path = f"./data/I{i:02}"
    output_file_name = f"./processed_data/I{i:02}"
    try:
        # Reading the channels of interest
        signal, info = wfdb.rdsamp(file_path, channel_names = ["II", "V1"])

        # Separating the two signals so we can put them in one dimension
        signal_II = signal[:, 0]
        signal_V1 = signal[:, 1]


        # Reading the annotations
        annotations = wfdb.rdann(file_path, "atr")
        symbol_positions = annotations.sample
        symbol_list = annotations.symbol

        # Filtering out all the annotations that do not have a QRS symbol

        qrs_symbs = ['N','L','R','B','A','a','J','S','V','r','F','e','j', 'n', 'E', '/', 'f', 'Q', '?']

        qrs_symbol_positions = [symbol_positions[idx] for idx, symb in enumerate(symbol_list) if symb in qrs_symbs]

        target_vec = parabola(qrs_symbol_positions, len(signal), 3)

        output_dict = {
            "features": signal_II + signal_V1,
            "label": target_vec
        }

        # Write the dictionary and close the file handle when done
        with open(output_file_name, "wb") as f:
            pkl.dump(output_dict, f, protocol=pkl.HIGHEST_PROTOCOL)
    except Exception:
        print(f"Error on file {file_path}")

Error on file ./data/I04
Error on file ./data/I17
Error on file ./data/I35
Error on file ./data/I44
Error on file ./data/I57
Error on file ./data/I72
Error on file ./data/I74
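
The messages above do not say why a record failed. A minimal sketch of keeping the exception text per file; `load_record` is a hypothetical stand-in for the wfdb reads, and the failure it raises is simulated rather than the actual cause of the errors listed here:

```python
def load_record(path):
    """Hypothetical loader: pretends that record I04 cannot be read."""
    if path.endswith("I04"):
        raise FileNotFoundError(f"{path}.hea not found")
    return path

failures = {}
for i in range(1, 6):
    file_path = f"./data/I{i:02}"
    try:
        load_record(file_path)
    except Exception as exc:
        # Store the exception type and message so the cause can be inspected later
        failures[file_path] = f"{type(exc).__name__}: {exc}"

print(failures)
```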


Example of how to read a file

In [48]:
with open("./processed_data/I01", "rb") as f:
    data_dict = pkl.load(f)
data_dict["features"]

array([-1.94444444, -1.97385621, -1.95751634, ...,  4.44117647,
        4.41176471,  4.45751634])
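
A quick check after loading is that `features` and `label` have the same length, since each label value corresponds to one sample. A sketch using a synthetic dictionary and an in-memory pickle round trip instead of a file from `./processed_data`:

```python
import pickle as pkl
import numpy as np

# Synthetic stand-in for one processed record (values are made up)
output_dict = {
    "features": np.linspace(-1.0, 1.0, 20),
    "label": np.zeros(20, dtype=np.float32),
}

# Serialize and deserialize in memory, mirroring pkl.dump / pkl.load on a file
restored = pkl.loads(pkl.dumps(output_dict, protocol=pkl.HIGHEST_PROTOCOL))

assert restored["features"].shape == restored["label"].shape
print(restored["features"].shape)  # (20,)
```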