## Useful Links
* [Dataset location: select the PAF Prediction Challenge Database (afpdb)](https://physionet.org/cgi-bin/atm/ATM "PhysioBank ATM")
* [How data should be fed into the network for a similar kind of problem](https://towardsdatascience.com/human-activity-recognition-har-tutorial-with-keras-and-core-ml-part-1-8c05e365dfa0 "Human Activity Recognition (HAR) Tutorial with Keras and Core ML")
* [An implementation of the above using 1D convolution](https://blog.goodaudience.com/introduction-to-1d-convolutional-neural-networks-in-keras-for-time-sequences-3a7ff801a2cf "Introduction to 1D Convolutional Neural Networks in Keras for Time Sequences")
* [Choice of Optimizers available in Keras](https://keras.io/optimizers "Keras Optimizers")
* [Choice of Loss Functions available in Keras](https://keras.io/losses "Keras Loss Functions")

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
import os
warnings.filterwarnings("ignore")

In [7]:
# Set the default figure size to 30x15 inches
plt.rcParams['figure.figsize'] = 30, 15

In [8]:
# Declare the locations of the files
root_folder_train = 'C:\\Users\\animi\\Documents\\Dataset\\ECG Dataset\\Train\\'
root_folder_test = 'C:\\Users\\animi\\Documents\\Dataset\\ECG Dataset\\Test\\'
category = ['Normal', 'Abnormal']
destination = 'C:\\Users\\animi\\Documents\\Dataset\\ECG Dataset\\Customized_Data\\'
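
The folder locations above are hard-coded with Windows separators. As a minimal sketch, assuming a hypothetical base directory `ECG Dataset` relative to the working directory, the same Train/Test and Normal/Abnormal layout could be built with `pathlib`, which picks the right separator for the platform. The names `base_dir` and `category_dir` are illustrative and are not part of the original notebook.

```python
from pathlib import Path

# Hypothetical base directory; replace it with wherever the dataset actually lives.
base_dir = Path("ECG Dataset")
category = ["Normal", "Abnormal"]


def category_dir(split, cat, base=base_dir):
    """Return the folder for one split ('Train' or 'Test') and one category."""
    return base / split / cat


# Folders that would be scanned for the training set
train_dirs = [category_dir("Train", cat) for cat in category]
print(train_dirs)
```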

In [33]:
# Load the dataset and format it so that it can be fed into the Keras model
def load_data(root_folder):
    # Accumulators that are filled inside the loops below
    final_list = list()
    labels = list()
    # Iterate over both categories, Normal and Abnormal
    for cat in category:
        # Process each file in the category folder
        for filename in os.listdir(root_folder+cat):
            
            # Read each file for each category and drop the unnecessary columns
            path = root_folder + cat + '\\' + filename
            df = pd.read_csv(path) # Read the CSV with pandas
            df.drop(index=0, axis=0, inplace=True) # Drop the first row, which contains the units of measurement (not needed here)
            df.columns = ["time", "ECG0", "ECG1"] # Rename the columns for convenient access
            df.drop(['time'], axis=1, inplace=True) # Drop the time column; the row index serves as the time axis instead
            df.ECG0 = pd.to_numeric(df.ECG0) # The columns are read as object dtype, so convert them to numeric values
            df.ECG1 = pd.to_numeric(df.ECG1)
            
            print(filename, len(df))
            
            # Split each file into 30 parts; each part becomes one sample
            df_split = np.array_split(df, 30) # Split the file into 30 chunks. This is not mandatory, but it is suggested because the dataset is small
            for chunk in df_split:
                final_list.append(np.array(chunk)) # Collect every chunk in one list, which is later converted into a single 3-dimensional array
                # The following if-elif block creates the labels: '1' for AF ECG and '0' for Normal ECG
                # This is not the ideal way to create labels, but it is the simplest one for this situation
                if cat == 'Normal':
                    labels.append(0)
                elif cat == 'Abnormal':
                    labels.append(1)

    # Before returning, convert the lists to arrays and add a trailing dimension to the labels so they can be fed into the neural network
    return np.array(final_list), np.expand_dims(np.array(labels), axis=1)
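
The row counts printed when the data is loaded show that most files have 7680 rows while a few have 38400. Splitting every file into 30 parts therefore gives chunks of 256 rows for some files and 1280 rows for others, and chunks of different lengths may not stack into one regular 3-D array. One possible alternative, sketched here under the assumption that a fixed window of 256 rows is acceptable, is to cut every file into windows of the same length. The helper name `split_fixed_windows` and the zero-filled demo arrays are hypothetical and are not part of the original notebook.

```python
import numpy as np


def split_fixed_windows(signal, window_len=256):
    """Cut a (n_samples, n_channels) array into non-overlapping windows.

    Each window has window_len rows; a trailing remainder shorter than
    window_len is dropped.
    """
    signal = np.asarray(signal)
    n_windows = signal.shape[0] // window_len
    trimmed = signal[:n_windows * window_len]
    return trimmed.reshape(n_windows, window_len, signal.shape[1])


# Zero-filled stand-ins for a 7680-row file and a 38400-row file (2 ECG channels)
short = split_fixed_windows(np.zeros((7680, 2)))
long_ = split_fixed_windows(np.zeros((38400, 2)))

# Windows from both files share a shape, so they can be concatenated directly
stacked = np.concatenate([short, long_], axis=0)
print(short.shape, long_.shape, stacked.shape)
```

With this approach a longer recording simply contributes more samples, so the number of labels appended per file would have to match the number of windows that file produces.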

In [34]:
# Load the training and testing dataset separately by calling the function for each of their root folder locations
X_train, y_train = load_data(root_folder_train)
X_test, y_test = load_data(root_folder_test)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

n01.csv 7680
n02.csv 7680
n03.csv 38400
n04.csv 7680
n05.csv 7680
n06.csv 7680
n07.csv 7680
n08.csv 7680
n09.csv 7680
n10.csv 7680
n11.csv 7680
n12.csv 38400
n13.csv 38400
n14.csv 7680
n15.csv 7680
n16.csv 7680
n17.csv 7680
n18.csv 7680
n19.csv 7680
n20.csv 7680
n21.csv 7680
n22.csv 7680
n23.csv 7680
n24.csv 7680
n25.csv 7680
n26.csv 7680
n27.csv 7680
n28.csv 7680
n29.csv 7680
n30.csv 7680
n31.csv 7680
n32.csv 7680
n33.csv 7680
n34.csv 7680
n35.csv 7680
n36.csv 7680
n37.csv 7680
n38.csv 7680
n39.csv 7680
n40.csv 7680
n41.csv 7680
n42.csv 7680
n43.csv 7680
n44.csv 7680
n45.csv 7680
n46.csv 7680
n47.csv 7680
n48.csv 7680
n49.csv 7680
n50.csv 7680
p01.csv 7680
p02.csv 7680
p03.csv 7680
p04.csv 7680
p05.csv 7680
p06.csv 7680
p07.csv 7680
p08.csv 7680
p09.csv 7680
p10.csv 7680
p11.csv 7680
p12.csv 7680
p13.csv 7680
p14.csv 7680
p15.csv 7680
p16.csv 7680
p17.csv 7680
p18.csv 7680
p19.csv 7680
p20.csv 7680
p21.csv 7680
p22.csv 7680
p23.csv 7680
p24.csv 7680
p25.csv 7680
p26.csv 7680
p27.csv 7
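
Since the labels are generated per chunk inside the loop, a quick check that the samples and labels line up, and how many samples each class has, can catch mistakes early. The following is a minimal sketch on zero-filled stand-in arrays; the helper name `summarize_split` and the demo sizes are hypothetical and do not reflect the real dataset.

```python
import numpy as np


def summarize_split(X, y):
    """Check that X and y have the same number of samples and count each class."""
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): int(n) for c, n in zip(classes, counts)}


# Stand-in data shaped like the output of load_data: (samples, rows, channels)
X_demo = np.zeros((90, 256, 2))
y_demo = np.expand_dims(np.array([0] * 60 + [1] * 30), axis=1)
print(summarize_split(X_demo, y_demo))
```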

In [6]:
# Save the formatted data for easy access later. The saved arrays are then loaded and fed into the neural network
np.save('C:/Users/animi/Documents/Dataset/ECG Dataset/Customized_Data/X_train.npy', X_train)
np.save('C:/Users/animi/Documents/Dataset/ECG Dataset/Customized_Data/y_train.npy', y_train)
np.save('C:/Users/animi/Documents/Dataset/ECG Dataset/Customized_Data/X_test.npy', X_test)
np.save('C:/Users/animi/Documents/Dataset/ECG Dataset/Customized_Data/y_test.npy', y_test)
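
Reading the saved arrays back is the counterpart of the save step. As a minimal sketch, the round trip below writes stand-in arrays to a temporary folder and restores them with `np.load`; the file names `X_demo.npy` and `y_demo.npy` and the array sizes are placeholders, not the notebook's real data.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays with the same layout as X_train / y_train (values are dummies)
X_demo = np.zeros((60, 256, 2))
y_demo = np.expand_dims(np.array([0] * 30 + [1] * 30), axis=1)

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path = os.path.join(tmp_dir, "X_demo.npy")
    y_path = os.path.join(tmp_dir, "y_demo.npy")

    # Save, then load back from disk
    np.save(x_path, X_demo)
    np.save(y_path, y_demo)
    X_loaded = np.load(x_path)
    y_loaded = np.load(y_path)

print(X_loaded.shape, y_loaded.shape)
```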