# <center>PhysioNet/CinC Challenge 2020</center>
## <center>This study is part of the PhysioNet/Computing in Cardiology (CinC) Challenge 2020. Our objective was to classify 27 cardiac abnormalities based on a provided dataset of 43,101 12-lead ECG recordings. We developed a hybrid model combining a rule-based algorithm with different deep learning architectures.</center>

### Introduction
The electrocardiogram (ECG) reflects the electrical activity of the heart, and its interpretation can reveal numerous cardiac pathologies. An ECG is recorded with an electrocardiograph, and modern clinical devices usually include software that interprets the ECG automatically right after recording. Although automatic ECG interpretation dates back to the 1950s, it still has limitations. Because the software makes errors, doctors have to over-read the ECGs, which is time-consuming and requires a high degree of expertise. There is clearly a need for better ECG interpretation algorithms.

Recent years have seen rapid progress in the field of machine learning. Deep learning is a sub-field of machine learning in which more complex neural network architectures scale better, in terms of performance, with the amount of data. It has shown promising performance in many fields, including medicine, and in this study we explore the usefulness of deep learning for classifying 12-lead ECGs.

As a starting point for our model architecture, we chose the two best-performing Convolutional Neural Networks (CNN) used on ECG data in [Fawaz HI et al. 2019](https://link.springer.com/article/10.1007/s10618-019-00619-1?shared-article-renderer). They reported that Fully Convolutional Networks (FCN) outperformed the eight other CNN architectures in their comparison. We also tested the second-best architecture, an Encoder Network. Finally, we integrated a rule-based algorithm into these models to assess the performance of a hybrid CNN and rule-based classifier.

This study is a part of the PhysioNet/Computing in Cardiology Challenge 2020, where the aim was to develop an automated interpretation algorithm for identification of clinical diagnoses from 12-lead ECG recordings.

In [1]:
!pip install ecg-plot

import physionet_challenge_utility_script as pc
import ecg_plot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow_addons as tfa
import tensorflow as tf
from tensorflow import keras
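# Import from tensorflow.keras so that all layers and utilities come from the same Keras build.
from tensorflow.keras.utils import plot_model
from tensorflow.keras.preprocessing.sequence import pad_sequences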
%load_ext autoreload
%autoreload
%reload_ext autoreload

def plot_ecg(path):
    ecg_data = pc.load_challenge_data(path)
    ecg_plot.plot(ecg_data[0]/1000, sample_rate=500, title='')
    ecg_plot.show()
plot_ecg("/kaggle/input/china-12lead-ecg-challenge-database/Training_2/Q0948.mat")

### Methods


#### Data
The training data in this study contains 43,101 electrocardiographic recordings from four different sources:
1. Southeast University, China, including the data from the China Physiological Signal Challenge 2018 (2 datasets from this source).
2. St. Petersburg Institute of Cardiological Technics, St. Petersburg, Russia.
3. The Physikalisch-Technische Bundesanstalt, Brunswick, Germany (2 datasets from this source).
4. Georgia 12-Lead ECG Challenge Database, Emory University, Atlanta, Georgia, USA.

The data is provided in the WaveForm DataBase ([WFDB](https://wfdb.readthedocs.io/en/latest/)) format, which can be read from Python. The dataset contains two file types:


1.   Header files (.hea)
2.   Signal files (.mat)

There are 43,101 signal files, each with a corresponding header file. Each file is named with a record number, for example from ***A0001*** up to ***A6877*** within one of the datasets.
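
As a small illustration of what a header file carries, the sketch below splits the first line of a WFDB header into its leading fields (record name, number of leads, sampling rate, number of samples). The example line and the helper name `parse_header_first_line` are made up for illustration; real headers may contain additional fields.

```python
def parse_header_first_line(line):
    """Split the first line of a WFDB header into its leading fields."""
    fields = line.split()
    return {
        "record": fields[0],
        "n_leads": int(fields[1]),
        "sampling_rate_hz": int(fields[2]),
        "n_samples": int(fields[3]),
    }


# Hypothetical first header line, for illustration only.
example = "A0001 12 500 7500 05-Feb-2020 11:39:16"
info = parse_header_first_line(example)

# Recording duration in seconds follows from samples / sampling rate.
duration_s = info["n_samples"] / info["sampling_rate_hz"]
print(info["record"], info["n_leads"], duration_s)
```

The fourth field is the one the signal-length cell further down reads with `header_data[0].split()[3]`.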


In [None]:
gender, age, labels, ecg_filenames = pc.import_key_data("/kaggle/input/")
ecg_filenames = np.asarray(ecg_filenames)

#### From the figure below we can see that the signal lengths vary, but most of the signals are around 5000 samples long

In [None]:
import os
signal_length = []
for subdir, dirs, files in sorted(os.walk("/kaggle/input/")):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(".mat"):
            data, header_data = pc.load_challenge_data(filepath)
            # Field 3 of the first header line is the number of samples per lead.
            signal_length.append(header_data[0].split()[3])
signal_length_df = pd.DataFrame(signal_length)
signal_count = signal_length_df[0].value_counts()
plt.figure(figsize=(20, 10))
# Bar plot of the ten most common signal lengths (keyword arguments are required by recent seaborn).
top_counts = signal_count.head(10)
sns.barplot(x=top_counts.index, y=top_counts.values)

In [None]:
signal_count

In [None]:
pc.get_signal_lengths("/kaggle/input/", "Distribution of signal lengths of the ECGs")

#### From the header file we have access to the gender and age of each patient

In [None]:
age, gender = pc.import_gender_and_age(age, gender)

#### All diagnoses are encoded with SNOMED CT codes. We need a CSV file to decode them:

In [None]:
SNOMED_scored=pd.read_csv("/kaggle/input/physionet-snomed-mappings/SNOMED_mappings_scored.csv", sep=";")
SNOMED_unscored=pd.read_csv("/kaggle/input/physionet-snomed-mappings/SNOMED_mappings_unscored.csv", sep=";")
df_labels = pc.make_undefined_class(labels,SNOMED_unscored)

#### To be able to feed the labels to a neural network we need to one-hot encode them

In [None]:
y , snomed_classes = pc.onehot_encode(df_labels)
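
Because a recording can carry several diagnoses, the encoding is effectively "multi-hot": one column per class, with a 1 for every diagnosis present. The sketch below shows the idea on three made-up recordings. The helper `multi_hot_encode` is hypothetical and is not the implementation of `pc.onehot_encode`.

```python
import numpy as np


def multi_hot_encode(label_lists):
    """Return a (n_recordings, n_classes) 0/1 matrix and the class order."""
    classes = sorted({code for labels in label_lists for code in labels})
    column = {code: i for i, code in enumerate(classes)}
    encoded = np.zeros((len(label_lists), len(classes)), dtype=int)
    for row, labels in enumerate(label_lists):
        for code in labels:
            encoded[row, column[code]] = 1
    return encoded, np.asarray(classes)


# Three hypothetical recordings, each with a list of SNOMED CT codes.
records = [["426783006"], ["164889003", "59118001"], ["164889003"]]
y_demo, classes_demo = multi_hot_encode(records)
print(classes_demo)
print(y_demo)
```

Each row of `y_demo` can have more than one entry set, which is what distinguishes this from single-label one-hot encoding.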

In [None]:
snomed_abbr = []
for j in range(len(snomed_classes)):
    for i in range(len(SNOMED_scored.iloc[:,1])):
        if (str(SNOMED_scored.iloc[:,1][i]) == snomed_classes[j]):
            snomed_abbr.append(SNOMED_scored.iloc[:,2][i])
            
snomed_abbr = np.asarray(snomed_abbr)

#### The distribution of diagnoses across the dataset
In the figure below, the X-axis shows the SNOMED CT codes decoded into human-readable diagnoses, and the Y-axis shows how many times each diagnosis occurs in the dataset.

In [None]:
pc.plot_classes(snomed_classes, SNOMED_scored,y)

#### Since this is a multi-class, multi-label classification problem, many different combinations of the 27 diagnoses occur in the dataset

In [None]:
y_all_comb = pc.get_labels_for_all_combinations(y)
print("Total number of unique combinations of diagnosis: {}".format(len(np.unique(y_all_comb))))
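
One way to count label combinations is to treat each row of the encoded label matrix as a pattern and assign one ID per distinct pattern. The sketch below does this with `np.unique` on a toy matrix; it assumes this is roughly what `pc.get_labels_for_all_combinations` produces, which the notebook does not show.

```python
import numpy as np

# Toy multi-hot label matrix: 5 recordings, 3 classes.
y_demo = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
])

# Each distinct row is one combination of diagnoses; the inverse index
# gives every recording the ID of its combination.
unique_rows, combo_id = np.unique(y_demo, axis=0, return_inverse=True)
combo_id = np.asarray(combo_id).ravel()

print("Number of unique combinations:", len(unique_rows))
print("Combination ID per recording:", combo_id)
```

Such a combination ID can then serve as a single label when splitting the data, so that each fold sees a similar mix of combinations.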

#### We split the data using a 10-fold split with shuffling enabled and a random seed of 42
The distribution of training and validation data in each fold is shown below.
(In this study we only use the first fold for hold-out validation.)

In [None]:
folds = pc.split_data(labels, y_all_comb)
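
For readers without access to the utility script, a shuffled 10-fold split can be sketched with NumPy alone. This is an illustration, not `pc.split_data`: that function also receives the label combinations, which suggests some form of stratification that is not reproduced here. The helper name `shuffled_kfold_indices` is hypothetical.

```python
import numpy as np


def shuffled_kfold_indices(n_samples, n_splits=10, seed=42):
    """Return a list of (train_idx, val_idx) pairs from a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    val_chunks = np.array_split(order, n_splits)
    folds = []
    for k in range(n_splits):
        val_idx = val_chunks[k]
        train_idx = np.concatenate(
            [val_chunks[j] for j in range(n_splits) if j != k]
        )
        folds.append((train_idx, val_idx))
    return folds


demo_folds = shuffled_kfold_indices(n_samples=95)
train_idx, val_idx = demo_folds[0]  # same layout as folds[0][0] and folds[0][1]
print(len(demo_folds), len(train_idx), len(val_idx))
```

Every sample lands in exactly one validation fold, and the fixed seed makes the split reproducible.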

In [None]:
pc.plot_all_folds(folds,y,snomed_classes)

#### Make Batch generators
To feed the neural network with a dataset that is too large to fit in RAM, we need a batch generator that loads the data in batches.
We start by making an "order array" so that we can shuffle the order of the data during training.

In [None]:
order_array = folds[0][0]

In [None]:
def shuffle_batch_generator_demo(batch_size, gen_x,gen_y, gen_z): 
    np.random.shuffle(order_array)
    batch_features = np.zeros((batch_size,5000, 12))
    batch_labels = np.zeros((batch_size,snomed_classes.shape[0])) #drop undef class
    batch_demo_data = np.zeros((batch_size,2))
    while True:
        for i in range(batch_size):

            batch_features[i] = next(gen_x)
            batch_labels[i] = next(gen_y)
            batch_demo_data[i] = next(gen_z)

        X_combined = [batch_features, batch_demo_data]
        yield X_combined, batch_labels
        
def shuffle_batch_generator(batch_size, gen_x,gen_y): 
    np.random.shuffle(order_array)
    batch_features = np.zeros((batch_size,5000, 12))
    batch_labels = np.zeros((batch_size,snomed_classes.shape[0])) #drop undef class
    while True:
        for i in range(batch_size):

            batch_features[i] = next(gen_x)
            batch_labels[i] = next(gen_y)
            
        yield batch_features, batch_labels

def generate_y_shuffle(y_train):
    while True:
        for i in order_array:
            y_shuffled = y_train[i]
            yield y_shuffled


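def generate_X_shuffle(X_train):
    while True:
        for i in order_array:
            data, header_data = pc.load_challenge_data(X_train[i])
            # data has shape (12, n_samples); pad or truncate every lead to 5000 samples.
            X_train_new = pad_sequences(data, maxlen=5000, truncating='post', padding="post")
            # NOTE: reshape reinterprets the (12, 5000) buffer as (5000, 12) without
            # transposing. Kept as-is to match the validation pipeline (generate_validation_data).
            X_train_new = X_train_new.reshape(5000, 12)
            yield X_train_new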

def generate_z_shuffle(age_train, gender_train):
    while True:
        for i in order_array:
            gen_age = age_train[i]
            gen_gender = gender_train[i]
            z_train = [gen_age , gen_gender]
            yield z_train
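
The generator pattern can be illustrated on small in-memory arrays. The sketch below is a variation rather than a copy of the functions above: it reshuffles at the start of every pass and drops an incomplete final batch, and the name `shuffled_batches` is hypothetical.

```python
import numpy as np


def shuffled_batches(X, y, batch_size, seed=0):
    """Yield (features, labels) batches forever, reshuffling on every pass."""
    rng = np.random.default_rng(seed)
    order = np.arange(len(X))
    while True:
        rng.shuffle(order)  # new sample order for each pass over the data
        for start in range(0, len(order) - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            # Fancy indexing returns new arrays, so batches never share memory.
            yield X[idx], y[idx]


# Toy data: 10 "recordings" with 4 samples and 2 leads each.
X_demo = np.arange(10 * 4 * 2, dtype=float).reshape(10, 4, 2)
y_demo = np.arange(10)

gen = shuffled_batches(X_demo, y_demo, batch_size=5)
xb, yb = next(gen)
xb2, yb2 = next(gen)
print(xb.shape, yb, yb2)
```

Two consecutive batches of five cover all ten toy recordings exactly once before the order is reshuffled.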

#### Imbalanced data
To compensate for the imbalanced data, we calculate a weight for each label. The weight determines how much each label contributes to the loss, and therefore how much the neural network learns from it.

In [None]:
new_weights=pc.calculating_class_weights(y)

In [None]:
keys = np.arange(0,27,1)
weight_dictionary = dict(zip(keys, new_weights.T[1]))
weight_dictionary
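
A common way to derive such weights is the "balanced" heuristic, where the weight of a label's positive class is `n_samples / (2 * n_positives)`. The sketch below applies it to a toy label matrix. Whether `pc.calculating_class_weights` uses exactly this formula is an assumption, and `balanced_label_weights` is a hypothetical helper.

```python
import numpy as np


def balanced_label_weights(y):
    """Positive-class weight per label: n_samples / (2 * n_positives)."""
    n_samples = y.shape[0]
    positives = np.maximum(y.sum(axis=0), 1)  # avoid division by zero for absent labels
    return n_samples / (2.0 * positives)


# Toy labels: label 0 is present in every recording, label 1 in only one.
y_demo = np.array([
    [1, 0],
    [1, 0],
    [1, 1],
    [1, 0],
])
weights = balanced_label_weights(y_demo)
weight_dict_demo = dict(enumerate(weights))
print(weight_dict_demo)
```

The rare label receives the larger weight, so its errors count more during training.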

#### Learning rate reduction
We reduce the learning rate when the validation AUC stops improving, and use early stopping to prevent overfitting.

In [None]:
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_AUC', factor=0.1, patience=1, verbose=1, mode='max',
    min_delta=0.0001, cooldown=0, min_lr=0
)

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_AUC', mode='max', verbose=1, patience=2)
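
To make the effect of `factor=0.1` and `patience=1` concrete, the following simplified simulation tracks the learning rate over a made-up sequence of validation AUC values. It mimics the reduce-on-plateau idea in "max" mode and ignores details of the Keras callback such as `cooldown` and `min_lr`; the function name and the AUC values are hypothetical.

```python
def simulate_reduce_on_plateau(scores, lr=1e-3, factor=0.1, patience=1, min_delta=1e-4):
    """Return the learning rate in effect after each epoch's plateau check."""
    best = float("-inf")
    wait = 0
    history = []
    for score in scores:
        if score > best + min_delta:
            best = score  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # plateau: shrink the learning rate
                wait = 0
        history.append(lr)
    return history


val_auc = [0.80, 0.85, 0.85, 0.86, 0.86, 0.86]  # made-up validation AUC per epoch
lrs = simulate_reduce_on_plateau(val_auc)
print(lrs)
```

With a patience of 1, a single epoch without improvement is enough to divide the learning rate by ten.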

#### To find the optimal thresholds we use the downhill simplex (Nelder-Mead) method

In [None]:
from scipy import optimize
def thr_chall_metrics(thr, label, output_prob):
    return -pc.compute_challenge_metric_for_opt(label, np.array(output_prob>thr))
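
The simplex search needs a starting point. In the results cells it is taken from a coarse scan over `np.arange(0, 1, 0.05)`, using the best single threshold for all classes. The sketch below shows such a scan on toy data. Macro F1 stands in for the challenge metric here, and both helper functions are hypothetical rather than the code behind `pc.iterate_threshold`.

```python
import numpy as np


def macro_f1(labels, preds):
    """Stand-in metric (not the challenge metric): mean per-class F1."""
    tp = np.sum((labels == 1) & (preds == 1), axis=0)
    fp = np.sum((labels == 0) & (preds == 1), axis=0)
    fn = np.sum((labels == 1) & (preds == 0), axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


def scan_thresholds(labels, probs, candidates):
    """Score one shared threshold at a time across all classes."""
    return np.array(
        [macro_f1(labels, (probs > t).astype(int)) for t in candidates]
    )


# Toy validation labels and predicted probabilities (4 recordings, 2 classes).
labels_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
probs_demo = np.array([[0.9, 0.2], [0.42, 0.7], [0.65, 0.8], [0.3, 0.1]])

candidates = np.arange(0, 1, 0.05)
scores = scan_thresholds(labels_demo, probs_demo, candidates)
best_start = candidates[scores.argmax()]

# One starting threshold per class, to be refined by the simplex search.
x0 = best_start * np.ones(labels_demo.shape[1])
print(best_start, x0)
```

Starting the per-class optimisation from the best shared threshold gives the simplex search a reasonable initial point instead of an arbitrary one.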

### Results

#### Residual Neural Network

In [None]:
model = pc.residual_network_1d()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/resnet_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30
#model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=100, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data(ecg_filenames,y,folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data(ecg_filenames,y,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_resnet.png", dpi=100)

#### Encoder Network

In [None]:
model = pc.encoder_model()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/encoder_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=50, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data(ecg_filenames,y,folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data(ecg_filenames,y,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1])

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_encoder.png",dpi=100)

#### Fully Convolutional Network

In [None]:
model = pc.FCN()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/fcn_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=30, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data(ecg_filenames,y,folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data(ecg_filenames,y,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1])

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data(ecg_filenames,y,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes,snomed_abbr)
plt.savefig("confusion_matrix_fcn.png", dpi = 100)

#### ResNet + Gender and Age

In [None]:
model = pc.residual_network_1d_demo()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/resnet_gender_age_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#history = model.fit(x=shuffle_batch_generator_demo(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y), gen_z=generate_z_shuffle(age, gender)), epochs=50, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age, folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes,snomed_abbr)
plt.savefig("confusion_matrix_resnet_age_gender.png", dpi = 100)

#### Encoder + Gender and Age

In [None]:
model = pc.encoder_model_demo()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/encoder_gender_age_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#history = model.fit(x=shuffle_batch_generator_demo(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y), gen_z=generate_z_shuffle(age, gender)), epochs=50, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age, folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_encoder_age_gender.png", dpi=100)

#### FCN + Gender and Age

In [None]:
model = pc.FCN_demo()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/fcn_gender_age_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#history = model.fit(x=shuffle_batch_generator_demo(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y), gen_z=generate_z_shuffle(age, gender)), epochs=50, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age, folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_fcn_gender_age.png", dpi=100)

#### FCN and Encoder

In [None]:
model = pc.FCN_Encoder()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/fcn_and_encoder_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=5, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data(ecg_filenames,y,folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
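# This model takes only the ECG as input (see the training call above), so the
# validation data without age and gender is used here.
y_pred = model.predict(x=pc.generate_validation_data(ecg_filenames,y,folds[0][1])[0])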

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_fcn_and_encoder.png", dpi = 100)

#### FCN and Encoder + Rule-based model

In [None]:
binary_prediction = y_pred > new_best_thr
binary_prediction = binary_prediction * 1

In [None]:
rb_pred = pc.rule_based_predictions(ecg_filenames,folds[0][1],binary_prediction)

In [None]:
pc.plot_normalized_conf_matrix_rule(y,folds[0][1], binary_prediction, snomed_classes)

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],rb_pred))

#### FCN and Encoder + Gender and Age

In [None]:
model = pc.FCN_Encoder_demo()

Load a pre-trained model:

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/fcn_encoder_and_gender_age_model.h5")

##### Or train it yourself by uncommenting the code below

In [None]:
#batchsize = 30

#history = model.fit(x=shuffle_batch_generator_demo(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y), gen_z=generate_z_shuffle(age, gender)), epochs=30, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age, folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
y_pred = model.predict(x=pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_fcn_and_encoder_an_demo.png", dpi = 100)

#### FCN and Encoder + Gender and Age + Rule-based

In [None]:
binary_prediction = y_pred > new_best_thr
binary_prediction = binary_prediction * 1

In [None]:
rb_pred = pc.rule_based_predictions(ecg_filenames,folds[0][1],binary_prediction)

In [None]:
pc.plot_normalized_conf_matrix_rule(y,folds[0][1], binary_prediction, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_fcn_and_encoder_an_demo_rulebased.png", dpi = 100)

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],rb_pred))

### Further work:

#### Residual Network with Separable Convolution

In [None]:
def sepres():
    n_feature_maps = 64
    input_shape = (5000,12)
    input_layer = keras.layers.Input(input_shape)

        # BLOCK 1

    conv_x = keras.layers.Conv1D(filters=n_feature_maps, kernel_size=8, padding='same', data_format='channels_last')(input_layer)
    conv_x = keras.layers.BatchNormalization()(conv_x)
    conv_x = keras.layers.Activation('relu')(conv_x)

    conv_y = keras.layers.Conv1D(filters=n_feature_maps, kernel_size=5, padding='same', data_format='channels_last')(conv_x)
    conv_y = keras.layers.BatchNormalization()(conv_y)
    conv_y = keras.layers.Activation('relu')(conv_y)

    conv_z = keras.layers.Conv1D(filters=n_feature_maps, kernel_size=3, padding='same', data_format='channels_last')(conv_y)
    conv_z = keras.layers.BatchNormalization()(conv_z)

        # expand channels for the sum
    shortcut_y = keras.layers.SeparableConv1D(filters=n_feature_maps, kernel_size=1, padding='same',data_format='channels_last')(input_layer)
    shortcut_y = keras.layers.BatchNormalization()(shortcut_y)

    output_block_1 = keras.layers.add([shortcut_y, conv_z])
    output_block_1 = keras.layers.Activation('relu')(output_block_1)

        # BLOCK 2

    conv_x = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=8, padding='same', data_format='channels_last')(output_block_1)
    conv_x = keras.layers.BatchNormalization()(conv_x)
    conv_x = keras.layers.Activation('relu')(conv_x)

    conv_y = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=5, padding='same',data_format='channels_last')(conv_x)
    conv_y = keras.layers.BatchNormalization()(conv_y)
    conv_y = keras.layers.Activation('relu')(conv_y)

    conv_z = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=3, padding='same',data_format='channels_last')(conv_y)
    conv_z = keras.layers.BatchNormalization()(conv_z)

        # expand channels for the sum
    shortcut_y = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=1, padding='same', data_format='channels_last')(output_block_1)
    shortcut_y = keras.layers.BatchNormalization()(shortcut_y)

    output_block_2 = keras.layers.add([shortcut_y, conv_z])
    output_block_2 = keras.layers.Activation('relu')(output_block_2)

        # BLOCK 3

    conv_x = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=8, padding='same', data_format='channels_last')(output_block_2)
    conv_x = keras.layers.BatchNormalization()(conv_x)
    conv_x = keras.layers.Activation('relu')(conv_x)

    conv_y = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=5, padding='same', data_format='channels_last')(conv_x)
    conv_y = keras.layers.BatchNormalization()(conv_y)
    conv_y = keras.layers.Activation('relu')(conv_y)

    conv_z = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=3, padding='same', data_format='channels_last')(conv_y)
    conv_z = keras.layers.BatchNormalization()(conv_z)

    # channel counts already match, but a 1x1 separable convolution is still applied on the shortcut
    shortcut_y = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=1, padding='same',data_format='channels_last')(output_block_2)
    shortcut_y = keras.layers.BatchNormalization()(shortcut_y)

    output_block_3 = keras.layers.add([shortcut_y, conv_z])
    output_block_3 = keras.layers.Activation('relu')(output_block_3)


        # Block 4

    conv_x = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=8, padding='same',data_format='channels_last', depth_multiplier=12)(output_block_3)
    conv_x = keras.layers.BatchNormalization()(conv_x)
    conv_x = keras.layers.Activation('relu')(conv_x)

    conv_y = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=5, padding='same',data_format='channels_last')(conv_x)
    conv_y = keras.layers.BatchNormalization()(conv_y)
    conv_y = keras.layers.Activation('relu')(conv_y)

    conv_z = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=3, padding='same',data_format='channels_last')(conv_y)
    conv_z = keras.layers.BatchNormalization()(conv_z)

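    # expand channels for the sum
    # NOTE: this shortcut starts from output_block_1 (64 channels), not output_block_3,
    # so it skips over blocks 2 and 3.
    shortcut_y = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=1, padding='same',data_format='channels_last')(output_block_1)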
    shortcut_y = keras.layers.BatchNormalization()(shortcut_y)

    output_block_4 = keras.layers.add([shortcut_y, conv_z])
    output_block_4 = keras.layers.Activation('relu')(output_block_4)

          # BLOCK 5

    conv_x = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=8, padding='same', data_format='channels_last')(output_block_4)
    conv_x = keras.layers.BatchNormalization()(conv_x)
    conv_x = keras.layers.Activation('relu')(conv_x)

    conv_y = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=5, padding='same', data_format='channels_last')(conv_x)
    conv_y = keras.layers.BatchNormalization()(conv_y)
    conv_y = keras.layers.Activation('relu')(conv_y)

    conv_z = keras.layers.Conv1D(filters=n_feature_maps * 2, kernel_size=3, padding='same', data_format='channels_last')(conv_y)
    conv_z = keras.layers.BatchNormalization()(conv_z)

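    # channel counts already match, but a 1x1 separable convolution is still applied on the shortcut
    # NOTE: this shortcut starts from output_block_2, not output_block_4.
    shortcut_y = keras.layers.SeparableConv1D(filters=n_feature_maps * 2, kernel_size=1, padding='same',data_format='channels_last')(output_block_2)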
    shortcut_y = keras.layers.BatchNormalization()(shortcut_y)

    output_block_5 = keras.layers.add([shortcut_y, conv_z])
    output_block_5 = keras.layers.Activation('relu')(output_block_5)

        # FINAL

    gap_layer = keras.layers.GlobalAveragePooling1D()(output_block_5)

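    # NOTE: softmax makes the 27 outputs sum to 1, whereas multi-label targets with
    # binary cross-entropy are usually paired with 'sigmoid'. Left unchanged here,
    # since switching would change the predictions of the pre-trained weights loaded below.
    output_layer = keras.layers.Dense(27, activation='softmax')(gap_layer)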

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    model.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
        metrics=[
            tf.keras.metrics.BinaryAccuracy(name='accuracy', dtype=None, threshold=0.5),
            tf.keras.metrics.Recall(name='Recall'),
            tf.keras.metrics.Precision(name='Precision'),
            tf.keras.metrics.AUC(
                num_thresholds=200,
                curve="ROC",
                summation_method="interpolation",
                name="AUC",
                dtype=None,
                thresholds=None,
                multi_label=True,
                label_weights=None,
            ),
        ],
    )

    #@title Plot model for better visualization
    #plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
    return model

In [None]:
model = sepres()

In [None]:
#batchsize = 30

#model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=20, steps_per_epoch=(len(order_array)/batchsize), validation_data=pc.generate_validation_data(ecg_filenames,y,folds[0][1]), validation_freq=1, class_weight=weight_dictionary, callbacks=[reduce_lr,early_stop])

In [None]:
model.load_weights("/kaggle/input/physionet-challenge-models/sep_resnet_model.h5")

In [None]:
y_pred = model.predict(x=pc.generate_validation_data(ecg_filenames,y,folds[0][1])[0])

In [None]:
init_thresholds = np.arange(0,1,0.05)

In [None]:
all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[0][1] )

In [None]:
new_best_thr = optimize.fmin(thr_chall_metrics, args=(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))

In [None]:
print(pc.compute_challenge_metric_for_opt(pc.generate_validation_data_with_demo_data(ecg_filenames,y, gender, age,folds[0][1])[1],(y_pred>new_best_thr)*1))

Plot the confusion matrix:

In [None]:
from scipy.io import loadmat
import os

In [None]:
def load_challenge_data(filename):
    # Signal: the .mat file stores the leads under the key 'val'.
    x = loadmat(filename)
    data = np.asarray(x['val'], dtype=np.float64)
    # Header: same path with the .hea extension.
    input_header_file = filename.replace('.mat', '.hea')
    with open(input_header_file, 'r') as f:
        header_data = f.readlines()
    return data, header_data

In [None]:
def generate_validation_data(ecg_filenames, y,test_order_array):
    y_train_gridsearch=y[test_order_array]
    ecg_filenames_train_gridsearch=ecg_filenames[test_order_array]

    ecg_train_timeseries=[]
    for names in ecg_filenames_train_gridsearch:
        data, header_data = load_challenge_data(names)
        data = pad_sequences(data, maxlen=5000, truncating='post',padding="post")
        ecg_train_timeseries.append(data)
    X_train_gridsearch = np.asarray(ecg_train_timeseries)

    X_train_gridsearch = X_train_gridsearch.reshape(ecg_filenames_train_gridsearch.shape[0],5000,12)

    return X_train_gridsearch, y_train_gridsearch
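
One detail in the function above is worth checking. `pad_sequences` on a `(12, n_samples)` array returns `(12, 5000)`, and `reshape(..., 5000, 12)` then reinterprets that buffer rather than transposing it. The toy example below shows the difference between the two operations; whether it matters in practice depends on how the pre-trained models were fed during training, which is not verified here.

```python
import numpy as np

# Toy "recording": 2 leads x 4 samples (lead 0 = 1..4, lead 1 = 10..40).
leads_first = np.array([
    [1, 2, 3, 4],
    [10, 20, 30, 40],
])

reshaped = leads_first.reshape(4, 2)   # reinterprets the same buffer row by row
transposed = leads_first.T             # swaps axes: one column per lead

print(reshaped[:, 0])     # mixes values from both leads
print(transposed[:, 0])   # lead 0 as a time series
```

After `reshape`, a "column" no longer corresponds to a single lead, whereas after `.T` it does.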

In [None]:
def compute_modified_confusion_matrix(labels, outputs):
    # Compute a binary multi-class, multi-label confusion matrix, where the rows
    # are the labels and the columns are the outputs.
    num_recordings, num_classes = np.shape(labels)
    A = np.zeros((num_classes, num_classes))

    # Iterate over all of the recordings.
    for i in range(num_recordings):
        # Calculate the number of positive labels and/or outputs.
        normalization = float(max(np.sum(np.any((labels[i, :], outputs[i, :]), axis=0)), 1))
        # Iterate over all of the classes.
        for j in range(num_classes):
            # Assign full and/or partial credit for each positive class.
            if labels[i, j]:
                for k in range(num_classes):
                    if outputs[i, k]:
                        A[j, k] += 1.0/normalization

    return A
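
To make the partial-credit rule concrete, the sketch below computes the same matrix on two made-up recordings with three classes. The function `modified_confusion_matrix` is a vectorized rewrite written for this illustration, not part of the challenge code; it is intended to give the same result as the loop above for binary inputs.

```python
import numpy as np

def modified_confusion_matrix(labels, outputs):
    """Vectorized sketch of the partial-credit confusion matrix (rows: labels, columns: outputs)."""
    labels = np.asarray(labels, dtype=bool)
    outputs = np.asarray(outputs, dtype=bool)
    # Per recording: number of classes positive in the labels or the outputs (at least 1).
    normalization = np.maximum((labels | outputs).sum(axis=1), 1)
    # Each (label j, output k) pair of a recording contributes 1/normalization to A[j, k].
    return (labels / normalization[:, None]).T @ outputs.astype(float)

# Made-up example: 2 recordings, 3 classes.
labels = np.array([[1, 0, 0],
                   [1, 1, 0]])
outputs = np.array([[1, 1, 0],
                    [0, 1, 0]])

A = modified_confusion_matrix(labels, outputs)
print(A)
```

In both recordings two classes are positive in the labels or the outputs, so every (label, output) pair adds 0.5. The first recording adds 0.5 to `A[0, 0]` and `A[0, 1]`; the second adds 0.5 to `A[0, 1]` and `A[1, 1]`.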

In [None]:
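def plot_normalized_conf_matrix_dev(y_pred, ecg_filenames, y, val_fold, threshold, snomedclasses):
    # Labels of the validation fold (same values generate_validation_data returns, without reloading the ECGs).
    y_val = y[val_fold]
    # Binarize the predictions and build the modified (partial-credit) confusion matrix.
    df_cm = pd.DataFrame(compute_modified_confusion_matrix(y_val, (y_pred>threshold)*1), columns=snomedclasses, index = snomedclasses)
    df_cm = df_cm.fillna(0)
    df_cm.index.name = 'Actual'
    df_cm.columns.name = 'Predicted'
    # Standardize each column (z-score): cells show deviations from the column mean, not raw counts.
    df_norm_col=(df_cm-df_cm.mean())/df_cm.std()
    plt.figure(figsize = (36,14))
    sns.set(font_scale=1.4)  # label font size
    sns.heatmap(df_norm_col, cmap="Blues", annot=True,annot_kws={"size": 16},fmt=".2f",cbar=False)  # annot_kws sets the annotation font size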

In [None]:
plot_normalized_conf_matrix_dev(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes)

In [None]:
pc.plot_normalized_conf_matrix(y_pred, ecg_filenames, y, folds[0][1], new_best_thr, snomed_classes, snomed_abbr)
plt.savefig("confusion_matrix_separable_resnet.png", transparent=True,dpi = 500, bbox_inches="tight" )

#### 10-fold cross-validation

In [None]:
def scheduler(epoch, lr):
    if epoch < 6:
        lr = 0.001
        return lr
    else:
        return lr * 0.1


lr_schedule = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
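
Keras calls the schedule function at the start of every epoch with the optimizer's current learning rate, so the `lr * 0.1` branch compounds: the rate is divided by ten again on each epoch from epoch 6 onward. The sketch below traces this without TensorFlow by feeding the returned value back in by hand; the starting rate of 0.001 and the 10-epoch run are assumptions chosen to match the cells in this section.

```python
def scheduler(epoch, lr):
    """Same rule as above: hold 0.001 for the first six epochs, then divide by 10 each epoch."""
    if epoch < 6:
        return 0.001
    return lr * 0.1

lr = 0.001  # assumed initial optimizer learning rate
history = []
for epoch in range(10):
    lr = scheduler(epoch, lr)  # the callback passes the current rate back in every epoch
    history.append(lr)

for epoch, value in enumerate(history):
    print(f"epoch {epoch}: lr = {value:.0e}")
```

With these assumptions the rate stays at 0.001 through epoch 5 and then falls by a factor of ten per epoch, reaching roughly 1e-7 in the last of the 10 epochs.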

In [None]:
'''
score_array=[]
for i in range(len(folds)):
    order_array = folds[i][0]  # training indices for this fold
    model = pc.FCN()
    batchsize = 30
    # Load the held-out fold once and reuse it for validation, prediction and scoring.
    val_data = pc.generate_validation_data(ecg_filenames,y,folds[i][1])
    model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=10, steps_per_epoch=(len(order_array)/batchsize), validation_data=val_data, validation_freq=1, class_weight=weight_dictionary, callbacks=[lr_schedule])
    y_pred = model.predict(x=val_data[0])
    init_thresholds = np.arange(0,1,0.05)
    all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[i][1])
    new_best_thr = optimize.fmin(thr_chall_metrics, args=(val_data[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))
    fold_score = pc.compute_challenge_metric_for_opt(val_data[1],(y_pred>new_best_thr)*1)
    print(fold_score)
    score_array.append(fold_score)
score_array = np.asarray(score_array)
'''

In [None]:
#np.savetxt("10fold_score_FCN.txt",score_array, fmt="%f")

In [None]:
'''
score_array=[]
for i in range(len(folds)):
    order_array = folds[i][0]  # training indices for this fold
    model = pc.encoder_model()
    batchsize = 30
    # Load the held-out fold once and reuse it for validation, prediction and scoring.
    val_data = pc.generate_validation_data(ecg_filenames,y,folds[i][1])
    model.fit(x=shuffle_batch_generator(batch_size=batchsize, gen_x=generate_X_shuffle(ecg_filenames), gen_y=generate_y_shuffle(y)), epochs=10, steps_per_epoch=(len(order_array)/batchsize), validation_data=val_data, validation_freq=1, class_weight=weight_dictionary, callbacks=[lr_schedule])
    y_pred = model.predict(x=val_data[0])
    init_thresholds = np.arange(0,1,0.05)
    all_scores = pc.iterate_threshold(y_pred, ecg_filenames, y ,folds[i][1])
    new_best_thr = optimize.fmin(thr_chall_metrics, args=(val_data[1],y_pred), x0=init_thresholds[all_scores.argmax()]*np.ones(27))
    fold_score = pc.compute_challenge_metric_for_opt(val_data[1],(y_pred>new_best_thr)*1)
    print(fold_score)
    score_array.append(fold_score)
score_array = np.asarray(score_array)
'''

In [None]:
#np.savetxt("10fold_score_encoder.txt",score_array, fmt="%f")

In [None]:
fcn10fold = pd.read_csv('/kaggle/input/10fold-scores/10fold_score_FCN.txt', header = None)
fcn10fold.columns = ['FCN']

In [None]:
encoder10fold = pd.read_csv('/kaggle/input/10fold-scores/10fold_score_encoder.txt', header = None)
encoder10fold.columns = ['Encoder']

In [None]:
all_10folds = pd.concat([fcn10fold, encoder10fold], axis=1)


In [None]:
all_10folds

In [None]:
plt.style.use('ggplot')  # set the style before creating the figure so it applies to the whole plot
plt.figure(figsize=(20,8))
boxplot = all_10folds.boxplot(fontsize=20)

In [None]:
from zipfile import ZipFile
import os

In [None]:
zipObj = ZipFile('ConfusionMatrixes.zip', 'w')

In [None]:
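for filename in os.listdir("/kaggle/working"):
    if filename.endswith(".png"):
        # Use the full path so the loop does not depend on the current working directory.
        zipObj.write(os.path.join("/kaggle/working", filename), arcname=filename)
zipObj.close()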

# Citation
## Please cite [this article](https://ieeexplore.ieee.org/document/9344421) if you reuse some of this content:
### B.-J. Singstad and C. Tronstad, "Convolutional Neural Network and Rule-Based Algorithms for Classifying 12-lead ECGs," 2020 Computing in Cardiology, 2020, pp. 1-4, doi: 10.22489/CinC.2020.227.

### or

### Bibtex:
`
@INPROCEEDINGS{9344421,
  author={Singstad, Bjørn-Jostein and Tronstad, Christian},
  booktitle={2020 Computing in Cardiology}, 
  title={Convolutional Neural Network and Rule-Based Algorithms for Classifying 12-lead ECGs}, 
  year={2020},
  volume={},
  number={},
  pages={1-4},
  doi={10.22489/CinC.2020.227}}
 `