## Myocardial Infarction Classification 

### Introduction 
The purpose of this notebook is to design a model that classifies patients into two categories: Myocardial Infarction (MI) or Normal ECG (NORM). Since the dataset provides the likelihood of each diagnosis, we will evaluate how this variable affects the model's performance. To do so, we train two models. The first is trained on both low- and high-likelihood data. The second is trained on high-likelihood data only, where high likelihood means a likelihood of at least 50%. We then evaluate both models on low- and high-likelihood data.
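The likelihood split can be sketched in plain Python. This is a minimal illustration that assumes each record carries an `MI_lh` and a `NORM_lh` value in percent, as the `physioDataset` class below does. It also assumes a record counts as low likelihood when either value is below 50. The helper names and the toy records are hypothetical.

```python
from typing import Dict, List, Tuple

LH_THRESHOLD = 50.0  # percent; mirrors the 50.0 cut-off used in physioDataset


def is_low_likelihood(record: Dict[str, float]) -> bool:
    """A record is low likelihood if either diagnosis likelihood is below the threshold."""
    return record["MI_lh"] < LH_THRESHOLD or record["NORM_lh"] < LH_THRESHOLD


def split_by_likelihood(records: List[Dict[str, float]]) -> Tuple[list, list]:
    """Return (high, low); together they cover every record exactly once."""
    low = [r for r in records if is_low_likelihood(r)]
    high = [r for r in records if not is_low_likelihood(r)]
    return high, low


# Hypothetical toy records; the values are illustrative only
records = [
    {"id": 1, "MI_lh": 100.0, "NORM_lh": 100.0},
    {"id": 2, "MI_lh": 35.0, "NORM_lh": 100.0},
    {"id": 3, "MI_lh": 80.0, "NORM_lh": 50.0},
]
high, low = split_by_likelihood(records)
print([r["id"] for r in high], [r["id"] for r in low])  # [1, 3] [2]
```

Record 3 sits exactly at 50% and is therefore kept in the high-likelihood group.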

### Initialization 
We start by loading the libraries needed in the notebook.

In [49]:
import numpy as np 
import pandas as pd 
import wfdb 
import ast
import json
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt
from sklearn.preprocessing import normalize
from tqdm.notebook import tqdm
from pytorch_lightning import loggers as pl_loggers
import torch 
from pytorch_lightning.callbacks import TQDMProgressBar
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader 
import pytorch_lightning as pl
import torchmetrics
import torchvision.transforms as transforms


We load the options dictionary and the clinical data.

In [12]:
# Load options (file paths)
with open('opts.json', 'r') as inFile:
    opts = json.load(inFile)
    
# Load database
data_df = pd.read_csv(opts['file']['ptbxl_database_processed'], index_col=0)

We keep only the records that have been validated by a human, to ensure a clinically confirmed diagnosis. We then downsample both classes to the size of the smaller one.

In [13]:
# Select subjects that have been validated
data_df = data_df[data_df.validated_by_human == True]

# Select even sample of normal and MI subjects
min_n = np.minimum(data_df[data_df.MI == 1].shape[0], data_df[data_df.NORM==1].shape[0])
mi_data = data_df[data_df.MI == 1].sample(min_n)
norm_data = data_df[data_df.NORM == 1].sample(min_n)
data_model = pd.concat([mi_data, norm_data], axis=0)

# Save the balanced subset to disk
data_model.to_csv('data_model.csv')
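The balancing step above can be sketched without pandas. This minimal illustration assumes records are plain dicts with `validated_by_human`, `MI` and `NORM` fields. The helper `balanced_sample` and the toy data are hypothetical. A local seeded RNG makes the draw reproducible. In pandas, the equivalent is the `random_state` argument of `DataFrame.sample`.

```python
import random
from typing import Dict, List


def balanced_sample(records: List[Dict], seed: int = 1984) -> List[Dict]:
    """Keep human-validated records, then draw the same number of MI and NORM records."""
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    validated = [r for r in records if r["validated_by_human"]]
    mi = [r for r in validated if r["MI"] == 1]
    norm = [r for r in validated if r["NORM"] == 1]
    min_n = min(len(mi), len(norm))  # size of the smaller class
    return rng.sample(mi, min_n) + rng.sample(norm, min_n)


# Hypothetical toy data: 6 validated MI, 4 validated NORM, 2 unvalidated NORM records
toy = (
    [{"id": i, "validated_by_human": True, "MI": 1, "NORM": 0} for i in range(6)]
    + [{"id": 10 + i, "validated_by_human": True, "MI": 0, "NORM": 1} for i in range(4)]
    + [{"id": 20 + i, "validated_by_human": False, "MI": 0, "NORM": 1} for i in range(2)]
)
subset = balanced_sample(toy)
print(sum(r["MI"] for r in subset), sum(r["NORM"] for r in subset))  # 4 4
```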

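We define a `Dataset` class that we use to build the datasets and dataloaders for training and testing. Setting `likelihood=True` keeps only low-likelihood records, and setting `exclude_low_lh=True` removes them.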

In [37]:
# Create database class
class physioDataset(Dataset):
    
    def __init__(self, opts:dict, likelihood:bool=False, exclude_low_lh:bool=False, transform:transforms=None, filt:bool=False, normalize:bool=False):
        
        if exclude_low_lh and likelihood:
            raise RuntimeError("exclude_low_lh and likelihood arguments cannot be both True")
        
        self.opts = opts
        self.likelihood = likelihood
        self.transform = transform
        self.filt = filt
        self.normalize = normalize
        self.exclude_low_lh = exclude_low_lh
        
        self.data_df = pd.read_csv(opts['file']['data_model'], index_col=0)
        self.data_df_llh = self.data_df[(self.data_df.MI_lh < 50.0) | (self.data_df.NORM_lh < 50.0)]
        
        if exclude_low_lh:
            self.data_df = self.data_df[~self.data_df.index.isin(self.data_df_llh.index)]
        if likelihood:
            self.data_df = self.data_df[(self.data_df.MI_lh < 50.0) | (self.data_df.NORM_lh < 50.0)]

            
    def __len__(self):
        return self.data_df.shape[0]
    
    def __getitem__(self, idx):
        subj = self.data_df.iloc[idx]
        signal, _ = wfdb.rdsamp(self.opts['path']['physionet'] + subj.filename_lr)
        
        if self.filt:
            # 3rd-order Butterworth band-pass (0.5-49 Hz), applied forward and backward
            # (zero phase) to each lead; axis 0 is time
            b, a = butter(3, [0.5, 49], fs=100, btype='band', output='ba')
            signal = filtfilt(b, a, signal, axis=0)

        if self.normalize:
            # Scale each row (one time sample across all leads) to unit L2 norm
            signal = normalize(signal, axis=1)
        
        if self.transform:
            signal = self.transform(signal.astype(np.float32))

        return signal, subj.MI
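`sklearn.preprocessing.normalize(signal, axis=1)` rescales each *row* to unit L2 norm. `wfdb.rdsamp` returns the signal with shape `(samples, leads)`, so each row is one time instant across the leads. This step therefore normalizes across leads at every time point. Normalizing each lead over time would instead be `axis=0`. The pure-Python sketch below reproduces both behaviours on a toy array. The helper `l2_normalize` is hypothetical and only mimics the default `norm='l2'`.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(x: Matrix, axis: int = 1) -> Matrix:
    """Scale rows (axis=1) or columns (axis=0) of x to unit L2 norm; zero vectors are left unchanged."""
    if axis == 0:
        # Normalize columns by transposing, normalizing rows, and transposing back
        cols = [list(c) for c in zip(*x)]
        return [list(r) for r in zip(*l2_normalize(cols, axis=1))]
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


# Toy signal: 3 time samples x 2 leads, i.e. shape (samples, leads) as returned by wfdb.rdsamp
signal = [[3.0, 4.0], [0.0, 2.0], [6.0, 8.0]]
per_time = l2_normalize(signal, axis=1)  # behaviour of normalize(signal, axis=1)
per_lead = l2_normalize(signal, axis=0)  # per-lead alternative
print(per_time[0], per_time[2])  # [0.6, 0.8] [0.6, 0.8]
```

Rows 0 and 2 differ in amplitude by a factor of 2 but become identical after row-wise normalization. Only the relative amplitudes between leads at each instant survive.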

We design a simplified version of the [Xception](https://arxiv.org/abs/1610.02357) architecture. It replaces Xception's depthwise-separable convolutions with standard convolutional layers and uses a smaller backbone with residual connections. The loss is binary cross-entropy with logits (`BCEWithLogitsLoss`). This loss applies the sigmoid internally, so the classifier outputs raw logits without a sigmoid activation. We use the Adam optimizer with a learning rate of 1x10<sup>-5</sup>.
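`BCEWithLogitsLoss` folds the sigmoid into the loss. For a logit $x$ and a label $y \in \{0, 1\}$, the loss is $-[y \log \sigma(x) + (1-y)\log(1-\sigma(x))]$. This can be rewritten in the numerically stable form $\max(x, 0) - xy + \log(1 + e^{-|x|})$. The stdlib sketch below checks that both forms agree for moderate logits and that the stable form stays finite for an extreme one. It illustrates the formula and is not PyTorch's implementation.

```python
import math


def bce_naive(x: float, y: float) -> float:
    """Sigmoid followed by binary cross-entropy (can overflow or hit log(0))."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(x: float, y: float) -> float:
    """Same loss computed directly from the logit, in a numerically stable way."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


print(abs(bce_naive(2.0, 1.0) - bce_with_logits(2.0, 1.0)) < 1e-12)  # True
print(bce_with_logits(-800.0, 1.0))  # 800.0 (bce_naive raises OverflowError here)
```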

In [15]:
class Ellie(pl.LightningModule):
    def __init__(self, n_classes):
        super().__init__()
                
        # Note: entry_block is defined (and counted in the model summary) but is not called in forward()
        self.entry_block = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=2),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU()
        )
        
        self.block1 = nn.Sequential(
            nn.ReLU(),  # since entry_block is unused, this ReLU acts directly on the input signal
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=32), 
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=32),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        self.conv1_res = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=1, stride=2)
        
        self.block2 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=64), 
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=64),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        self.conv2_res = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1, stride=2)

        self.block3 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=128), 
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.AvgPool2d(kernel_size=3, stride=2, padding=1),
            nn.Dropout2d(p=0.5)
        )
    

        self.classifier = nn.Linear(in_features=128*125*2, out_features=n_classes)
        
        self.loss = nn.BCEWithLogitsLoss()
        self.train_accuracy = torchmetrics.Accuracy(task='binary')
        self.val_accuracy = torchmetrics.Accuracy(task='binary')
        self.conf_matrix = torchmetrics.ConfusionMatrix(task='binary')
            

    def forward(self, x):
        
        x_res = self.conv1_res(x)
        x = self.block1(x)
        x = torch.add(x, x_res)
        
        x_res = self.conv2_res(x)
        x = self.block2(x)
        x = torch.add(x, x_res)
        
        x = self.block3(x)
        x = torch.flatten(x, start_dim=1)
        x_out = self.classifier(x)
                
        return x_out
    
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-5)
        return optimizer
    
    def compute_step(self,batch):
        imgs, labels = batch
        label_logits = self.forward(imgs).flatten()
        label_logits = torch.nan_to_num(label_logits, nan=0)
        return self.loss(label_logits,labels.float()), labels, label_logits
    
    def training_step(self, train_batch, batch_idx):
        loss, labels, label_predictions = self.compute_step(train_batch)
        self.train_accuracy(label_predictions, labels)
        self.log_dict({"train/loss": loss, 'train/acc' : self.train_accuracy}, 
                  on_step=False, 
                  on_epoch=True, 
                  prog_bar=True)
        return loss
    
    def validation_step(self, val_batch, batch_idx):
        loss, labels, label_predictions = self.compute_step(val_batch)
        self.val_accuracy(label_predictions, labels)
        self.log_dict({"val/loss": loss, 'val/acc' : self.val_accuracy}, 
                  on_step=False, 
                  on_epoch=True, 
                  prog_bar=True)
        return loss
    

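    # Test metrics reuse the 'val/' keys, which is why the test tables below report val/acc and val/loss
    def test_step(self, val_batch, batch_idx):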
        loss, labels, label_predictions = self.compute_step(val_batch)
        self.val_accuracy(label_predictions, labels)
        self.log_dict({"val/loss": loss, 'val/acc' : self.val_accuracy}, 
                  on_step=False, 
                  on_epoch=True, 
                  prog_bar=True)
        return loss
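The classifier's `in_features=128*125*2` follows from the input shape. Assume each sample arrives as a `(1, 1000, 12)` tensor: 10 s at 100 Hz, 12 leads, with `ToTensor` adding the channel axis. In `forward()` the input goes straight into `block1`. The 3×3 convolutions with `padding=1` keep the spatial size, so only the stride-2 pooling layers shrink it. Each pooling layer follows the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below uses a hypothetical `out_size` helper to reproduce the 125 × 2 feature map.

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of a conv/pool layer along one axis (dilation 1, floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


h, w = 1000, 12  # (time samples, leads) of one input ECG
for block in ("block1", "block2", "block3"):
    # Convs (k=3, s=1, p=1) keep (h, w); the closing pool (k=3, s=2, p=1) roughly halves them
    h, w = out_size(h, 3, 2, 1), out_size(w, 3, 2, 1)
    print(block, (h, w))  # (500, 6), then (250, 3), then (125, 2)

# Residual path of block1: 1x1 conv, stride 2, no padding gives the same (500, 6)
print("conv1_res", (out_size(1000, 1, 2), out_size(12, 1, 2)))
print("classifier in_features:", 128 * h * w)  # 32000
```

The matching residual sizes are what make the `torch.add` calls in `forward()` valid. The 32,000 weights plus one bias also match the 32.0 K parameters reported for `classifier` in the model summary.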

### Training 

We proceed to train the models. First, we create the datasets and dataloaders.

In [51]:
# Create datasets and dataloaders
torch.manual_seed(1984)

# Low and high LH
mi_data = physioDataset(opts=opts, exclude_low_lh=False, likelihood=False, transform=transforms.ToTensor(), filt=True, normalize=True)
mi_train, mi_test = torch.utils.data.random_split(dataset=mi_data, lengths=[int(np.floor(len(mi_data)*0.8)), int(np.ceil(len(mi_data)*0.2))])
train_dataloader, test_dataloader = DataLoader(mi_train, batch_size=32, shuffle=True, num_workers=10), DataLoader(mi_test, batch_size=32, shuffle=False, num_workers=10)

# High LH
mi_hq_data = physioDataset(opts=opts, exclude_low_lh=True, likelihood=False, transform=transforms.ToTensor(), filt=True, normalize=True)
mi_hq_train, mi_hq_test = torch.utils.data.random_split(dataset=mi_hq_data, lengths=[int(np.floor(len(mi_hq_data)*0.8)), int(np.ceil(len(mi_hq_data)*0.2))])
train_hq_dataloader, test_hq_dataloader = DataLoader(mi_hq_train, batch_size=32, shuffle=True, num_workers=10), DataLoader(mi_hq_test, batch_size=32, shuffle=False, num_workers=10)

# Low LH (only for test)
mi_lq_data = physioDataset(opts=opts, exclude_low_lh=False, likelihood=True, transform=transforms.ToTensor(), filt=True, normalize=True)
dataloader_lq = DataLoader(mi_lq_data, batch_size=32, shuffle=False, num_workers=10)
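`mi_hq_data` is a subset of `mi_data`, but the two are split independently. As a result, records in `test_hq_dataloader` can also sit in the training split of the first model. Likewise, part of the low-likelihood data in `dataloader_lq` goes into `mi_train`. By contrast, `model_hq` never sees low-likelihood records during training. The stdlib sketch below illustrates this overlap with hypothetical record IDs. The helpers `split_ids` and `leaked` only mimic a seeded random split and are not `torch.utils.data.random_split` itself.

```python
import random
from typing import List, Sequence, Set, Tuple


def split_ids(ids: Sequence[int], train_frac: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle ids with a seeded RNG and cut them into train and test parts."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def leaked(train_ids: Sequence[int], eval_ids: Sequence[int]) -> Set[int]:
    """IDs that appear in both a training split and an evaluation set."""
    return set(train_ids) & set(eval_ids)


# Hypothetical IDs: 0-99 are all records, 0-69 high likelihood, 70-99 low likelihood
all_ids = list(range(100))
hq_ids, lq_ids = all_ids[:70], all_ids[70:]

train_all, test_all = split_ids(all_ids, 0.8, seed=1984)  # like mi_train / mi_test
train_hq, test_hq = split_ids(hq_ids, 0.8, seed=1984)     # like mi_hq_train / mi_hq_test

print(len(leaked(train_all, test_hq)), "of", len(test_hq), "high-LH test IDs were in the all-data training split")
print(len(leaked(train_all, lq_ids)), "of", len(lq_ids), "low-LH IDs were in the all-data training split")
print(len(leaked(train_hq, lq_ids)), "low-LH IDs were in the high-LH training split")  # 0
```

With the real data, the same check could be run on the `indices` of the `Subset` objects returned by `random_split` after mapping them back to record identifiers. For example, `set(mi_data.data_df.index[mi_train.indices])` gives the record IDs in `mi_train`.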



We now create two models: one trained on both low- and high-likelihood data, and one trained on high-likelihood data only.

In [17]:
model = Ellie(n_classes=1)
model_hq = Ellie(n_classes=1)

##### All data training
First, we train the model on all the data.

In [19]:
# Train model in all data
tb_logger = pl_loggers.TensorBoardLogger("/home/vicente/Documents/Idoven-Data-Scientist/logs", log_graph=True)
trainer = pl.Trainer(accelerator='gpu', devices=1, max_epochs=10, auto_lr_find=False, logger=tb_logger, callbacks=[TQDMProgressBar(refresh_rate=10)])
trainer.fit(model, train_dataloader, test_dataloader)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

   | Name           | Type                  | Params
----------------------------------------------------------
0  | entry_block    | Sequential            | 384   
1  | block1         | Sequential            | 9.7 K 
2  | conv1_res      | Conv2d                | 64    
3  | block2         | Sequential            | 55.7 K
4  | conv2_res      | Conv2d                | 2.1 K 
5  | block3         | Sequential            | 221 K 
6  | classifier     | Linear                | 32.0 K
7  | loss           | BCEWithLogitsLoss     | 0     
8  | train_accuracy | BinaryAccuracy        | 0     
9  | val_accuracy   | BinaryAccuracy        | 0     
10 | conf_matrix    | BinaryConfusionMatrix | 0     
----------------------------------------------------------
321 K     Trainable params
0         Non-trainable params

Epoch 9: 100%|██████████| 219/219 [00:20<00:00, 10.78it/s, loss=0.254, v_num=2, val/loss=0.253, val/acc=0.890, train/loss=0.242, train/acc=0.905]

`Trainer.fit` stopped: `max_epochs=10` reached.




We empty the GPU cache.

In [20]:
torch.cuda.empty_cache()

Let's have a look at the performance. The training and validation curves look well-behaved. The final training and validation losses are close (0.242 and 0.253), and the final validation accuracy is ~89%.

In [22]:
%load_ext tensorboard 
%tensorboard --logdir 'logs/lightning_logs/version_2/'


##### High likelihood data training
We now train the model with only high-likelihood data.

In [23]:
# Train model
tb_logger_hq = pl_loggers.TensorBoardLogger("/home/vicente/Documents/Idoven-Data-Scientist/logs_hq")
trainer_hq = pl.Trainer(accelerator='gpu', devices=1, max_epochs=10, auto_lr_find=False, logger=tb_logger_hq)
trainer_hq.fit(model_hq, train_hq_dataloader, test_hq_dataloader)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

   | Name           | Type                  | Params
----------------------------------------------------------
0  | entry_block    | Sequential            | 384   
1  | block1         | Sequential            | 9.7 K 
2  | conv1_res      | Conv2d                | 64    
3  | block2         | Sequential            | 55.7 K
4  | conv2_res      | Conv2d                | 2.1 K 
5  | block3         | Sequential            | 221 K 
6  | classifier     | Linear                | 32.0 K
7  | loss           | BCEWithLogitsLoss     | 0     
8  | train_accuracy | BinaryAccuracy        | 0     
9  | val_accuracy   | BinaryAccuracy        | 0     
10 | conf_matrix    | BinaryConfusionMatrix | 0     
----------------------------------------------------------
321 K     Trainable params

Epoch 9: 100%|██████████| 187/187 [00:20<00:00,  9.33it/s, loss=0.158, v_num=1, val/loss=0.194, val/acc=0.922, train/loss=0.172, train/acc=0.932]

`Trainer.fit` stopped: `max_epochs=10` reached.




We clear the GPU cache again.

In [24]:
torch.cuda.empty_cache()

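We inspect the training curves of this model. They again look well-behaved, with a final validation accuracy of ~92%. This is about 3 percentage points higher than the model trained on all data (~89%). However, the two validation sets differ, so this comparison is only indicative.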

In [26]:
%load_ext tensorboard 
%tensorboard --logdir 'logs_hq/lightning_logs/version_1/'


### Evaluation on low- and high-likelihood data

We now check how each model performs and compare its accuracy on high- and low-likelihood data. We begin with the first model, starting with the test split used for validation during training.
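Both models output logits, so a logit above 0 corresponds to a predicted MI probability above 0.5. According to the torchmetrics documentation, binary accuracy applies a sigmoid itself when the floating-point predictions fall outside $[0, 1]$, then thresholds at 0.5 by default. The stdlib sketch below shows the equivalent computation directly on logits. The helper names and toy values are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def accuracy_from_logits(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of correct predictions when a logit > 0 (probability > 0.5) means MI."""
    preds = [1 if x > 0 else 0 for x in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


logits = [2.3, -0.7, 0.4, -3.1]  # hypothetical model outputs
labels = [1, 0, 0, 0]
# Thresholding the logit at 0 agrees with thresholding the sigmoid at 0.5
assert all((x > 0) == (sigmoid(x) > 0.5) for x in logits)
print(accuracy_from_logits(logits, labels))  # 0.75
```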

In [27]:
# Evaluate on the all-data test split
trainer.test(model=model, dataloaders=test_dataloader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing DataLoader 0: 100%|██████████| 44/44 [00:02<00:00, 17.40it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         val/acc            0.8901650905609131
        val/loss            0.2525832951068878
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'val/loss': 0.2525832951068878, 'val/acc': 0.8901650905609131}]

Now, let's see how it does on high-likelihood data only (likelihood of at least 50%).

In [50]:
trainer.test(model=model, dataloaders=test_hq_dataloader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: 30it [00:03,  9.42it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         val/acc            0.9437447786331177
        val/loss            0.15542574226856232
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'val/loss': 0.15542574226856232, 'val/acc': 0.9437447786331177}]

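Accuracy rises to ~94%, about 5 percentage points above the all-data test split and past the 90% mark. This figure is probably optimistic, however. `mi_hq_data` was split independently of `mi_data`, so a large share of these high-likelihood test records was likely also in the model's training split. We proceed to check the low-likelihood data.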

In [52]:
trainer.test(model=model, dataloaders=dataloader_lq)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing DataLoader 0: 100%|██████████| 32/32 [00:01<00:00, 21.82it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         val/acc            0.7318768501281738
        val/loss             0.537009060382843
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'val/loss': 0.537009060382843, 'val/acc': 0.7318768501281738}]

On low-likelihood data, accuracy drops to ~73%, about 16 percentage points below the all-data test split. This drop occurs even though part of the low-likelihood data was used to train this model.

We now look at how the model trained on high-likelihood data performs. We begin with the validation set used during training.

In [53]:
trainer_hq.test(model=model_hq, dataloaders=test_hq_dataloader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing DataLoader 0: 100%|██████████| 38/38 [00:01<00:00, 19.27it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         val/acc            0.9219143390655518
        val/loss            0.1935616284608841
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'val/loss': 0.1935616284608841, 'val/acc': 0.9219143390655518}]

And now, let's evaluate the model on low-likelihood data.

In [54]:
trainer_hq.test(model=model_hq, dataloaders=dataloader_lq)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing DataLoader 0: 100%|██████████| 32/32 [00:01<00:00, 18.95it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         val/acc            0.7507447600364685
        val/loss            0.5656914710998535
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'val/loss': 0.5656914710998535, 'val/acc': 0.7507447600364685}]

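Performance is markedly lower on low-likelihood data: accuracy falls from ~92% to ~75%, a drop of about 17 percentage points. This model never saw low-likelihood records during training, so none of this data overlaps with its training set. Its accuracy here is even slightly higher than that of the model trained on all data (~73%).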
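The accuracy drops quoted in this section can be recomputed from the `val/acc` values printed by the `trainer.test` runs above. The dictionary keys are only labels for those runs.

```python
# val/acc values copied from the trainer.test outputs above
results = {
    ("model", "all-data test split"): 0.8901650905609131,
    ("model", "high-LH test split"): 0.9437447786331177,
    ("model", "low-LH data"): 0.7318768501281738,
    ("model_hq", "high-LH test split"): 0.9219143390655518,
    ("model_hq", "low-LH data"): 0.7507447600364685,
}


def drop_pp(model: str, reference: str) -> float:
    """Accuracy drop, in percentage points, from a reference split to the low-LH data."""
    return 100 * (results[(model, reference)] - results[(model, "low-LH data")])


pairs = [("model", "all-data test split"), ("model", "high-LH test split"), ("model_hq", "high-LH test split")]
for model, reference in pairs:
    print(f"{model}: {reference} -> low-LH data: -{drop_pp(model, reference):.1f} pp")
```

The drops come out at roughly 15.8, 21.2 and 17.1 percentage points, respectively.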

## Conclusions

The purpose of the notebook was to evaluate the impact of diagnostic likelihood on a classification model's performance. For this purpose, we trained two models: one on data with both low and high diagnostic likelihood, and one on high-likelihood data exclusively. After assessing the classification accuracy of both models, we conclude that low-likelihood data reduce accuracy. The drop is roughly 16 to 21 percentage points, to ~73% to 75%, regardless of whether the model was trained with low-likelihood data or not.

Diagnostic likelihood is rarely included in medical datasets, despite being a reality for practitioners: doctors cannot always be certain of a diagnosis. This diagnostic uncertainty influences model performance and adds label noise to the data. It is therefore a variable worth requesting.