# Load your data

Before finetuning one of the pretrained models from the experiments in our repository (or from the precomputed results provided [here](https://datacloud.hhi.fraunhofer.de/nextcloud/s/NCjYws3mamLrkKq)), load your custom 12-lead ECG signal data, sampled at 100 Hz, as a numpy array `X` of shape `[N,L,12]` in millivolts (mV), and the multi-hot encoded labels as a numpy array `y` of shape `[N,C]`, where `N` is the total number of samples in the dataset, `L` the number of time steps per sample and `C` the number of classes. Although PTB-XL comes with a fixed `L=1000` (i.e. 10 seconds), the length does not have to be fixed, **but** the shortest sample must be longer than the `input_size` of the specific model (e.g. 2.5 seconds for our fastai models).
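The shape and length requirements above can be checked before any training starts. The following is a minimal sketch, not part of the repository: the helper name `check_ecg_arrays` and its parameters are hypothetical, and the default window of 2.5 seconds is taken from the fastai example mentioned above.

```python
import numpy as np

def check_ecg_arrays(X, y, sampling_frequency=100, input_seconds=2.5):
    """Validate ECG signals and multi-hot labels before finetuning.

    X: array of shape [N, L, 12], or a sequence of N arrays of shape [L_i, 12].
    y: multi-hot label array of shape [N, C].
    Returns the length (in time steps) of the shortest sample.
    """
    y = np.asarray(y)
    if y.ndim != 2 or not np.isin(y, (0, 1)).all():
        raise ValueError("y must be a multi-hot array of shape [N, C]")
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")

    lengths = []
    for x in X:
        x = np.asarray(x)
        if x.ndim != 2 or x.shape[1] != 12:
            raise ValueError("each sample must have shape [L, 12]")
        lengths.append(x.shape[0])

    # Model input window in time steps, e.g. 2.5 s * 100 Hz = 250
    input_size = int(input_seconds * sampling_frequency)
    shortest = min(lengths)
    if shortest <= input_size:
        raise ValueError(
            f"shortest sample has {shortest} steps, need more than {input_size}"
        )
    return shortest

# Demo with placeholder data shaped like PTB-XL at 100 Hz
X_demo = np.zeros((4, 1000, 12), dtype=np.float32)
y_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
print(check_ecg_arrays(X_demo, y_demo))  # prints 1000
```

Because the helper iterates over samples, it also accepts a list of recordings with different lengths.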

For proper finetuning, split your data into four numpy arrays: `X_train`, `y_train`, `X_val` and `y_val`.
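If your labels are stored as one list of class names per recording, they first have to be turned into the multi-hot matrix `y`. The sketch below shows one way to do this with plain numpy. The helper `to_multi_hot` is hypothetical, and the class list uses the five PTB-XL diagnostic superclasses purely as an illustration; substitute your own class names.

```python
import numpy as np

# Illustrative class names (the five PTB-XL diagnostic superclasses).
# The column order is a choice of this sketch; keep it identical for
# the training and validation labels.
CLASSES = ["CD", "HYP", "MI", "NORM", "STTC"]

def to_multi_hot(label_lists, classes):
    """Encode a list of label lists as a multi-hot array of shape [N, C]."""
    column = {name: i for i, name in enumerate(classes)}
    Y = np.zeros((len(label_lists), len(classes)), dtype=np.int64)
    for row, labels in enumerate(label_lists):
        for label in labels:
            Y[row, column[label]] = 1
    return Y

# Three recordings: one label, two labels, no label
y_demo_labels = to_multi_hot([["NORM"], ["MI", "STTC"], []], CLASSES)
print(y_demo_labels.shape)  # prints (3, 5)
```

A recording may carry several labels or none at all, which is why each row is multi-hot rather than one-hot.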

### Example: finetune a model trained on `all` (71 classes) on `superdiagnostic` (5 classes)
Below we provide an example for loading [PTB-XL](https://physionet.org/content/ptb-xl/1.0.1/) aggregated at the `superdiagnostic` level, where we use the provided folds for train-validation-split:

In [None]:
!pip install wget wfdb

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Collecting wfdb
  Downloading wfdb-4.1.0-py3-none-any.whl (159 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9674 sha256=9ee2ac4220cf0050bbe1ff61078358ae5d397aff8bc8735d071f55a5e1653f46
  Stored in directory: /root/.cache/pip/wheels/bd/a8/c3/3cf2c14a1837a4e04bd98631724e81f33f462d86a1d895fae0
Successfully built wget
Installing collected packages: wget, wfdb
Successfully installed wfdb-4.1.0 wget-3.2


In [None]:
import wget
import numpy as np
import os
import zipfile
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix
import pickle

In [None]:
!wget https://physionet.org/static/published-projects/ptb-xl/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1.zip

# Create the target folder; do nothing if it already exists
os.makedirs("./data/", exist_ok=True)


with zipfile.ZipFile("./ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1.zip", 'r') as zip_ref:
    zip_ref.extractall("./data/")

--2023-01-04 13:54:56--  https://physionet.org/static/published-projects/ptb-xl/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1.zip
Resolving physionet.org (physionet.org)... 18.18.42.54
Connecting to physionet.org (physionet.org)|18.18.42.54|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1842722380 (1.7G) [application/zip]
Saving to: ‘ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1.zip’


2023-01-04 13:57:26 (11.8 MB/s) - ‘ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1.zip’ saved [1842722380/1842722380]



In [None]:
!pip install GitPython
from git import Repo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting GitPython
  Downloading GitPython-3.1.30-py3-none-any.whl (184 kB)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.10-py3-none-any.whl (62 kB)
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Installing collected packages: smmap, gitdb, GitPython
Successfully installed GitPython-3.1.30 gitdb-4.0.10 smmap-5.0.0


In [None]:
HTTPS_REMOTE_URL = 'https://github.com/Bsingstad/ecg_ptbxl_benchmarking.git'
DEST_NAME = 'github_repo'

In [None]:
Repo.clone_from(HTTPS_REMOTE_URL, DEST_NAME)

<git.repo.base.Repo '/content/github_repo/.git'>

In [14]:
from github_repo.code import *
%matplotlib inline
%load_ext autoreload

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [10]:
from github_repo.code.utils import utils

sampling_frequency=100
datafolder='./data/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.1/'
task='superdiagnostic'
outputfolder='./github_repo/output/'


# Load PTB-XL data
data, raw_labels = utils.load_dataset(datafolder, sampling_frequency)
# Preprocess label data
labels = utils.compute_label_aggregations(raw_labels, datafolder, task)
# Select relevant data and convert the labels to multi-hot encoding
data, labels, Y, _ = utils.select_data(data, labels, task, min_samples=0, outputfolder=outputfolder)

# Folds 1-9 for training
X_train = data[labels.strat_fold < 10]
y_train = Y[labels.strat_fold < 10]
# Fold 10 for validation
X_val = data[labels.strat_fold == 10]
y_val = Y[labels.strat_fold == 10]

num_classes = 5         # <=== number of classes in the finetuning dataset
input_shape = [1000,12] # <=== shape of samples, [None, 12] in case of different lengths

X_train.shape, y_train.shape, X_val.shape, y_val.shape

((19267, 1000, 12), (19267, 5), (2163, 1000, 12), (2163, 5))

# Train or download models
There are two possibilities:
   1. Run the experiments as described in the README. Afterwards, you will find the trained models in `output/expX/models/`.
   2. Download the precomputed `output` folder with all experiments and models from [here](https://datacloud.hhi.fraunhofer.de/nextcloud/s/NCjYws3mamLrkKq).

# Load pretrained model

For loading a pretrained model:
   1. specify `modelname`, which can be looked up in `code/configs/` (e.g. `modelname='fastai_xresnet1d101'`)
   2. provide `experiment` to build the path `pretrainedfolder` (here: `exp0` refers to the experiment with `all` 71 SCP-statements)

This returns the pretrained model with its classification head replaced by a randomly initialized head that has as many outputs as there are classes in the finetuning dataset.
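The folder layout `output/expX/models/<modelname>/` described above can be assembled and checked up front, so that a missing download or a misspelled model name is noticed before the model is built. This is a small sketch using only the standard library. The helper `pretrained_model_folder` is hypothetical, and the root `../output` is an assumption that depends on where your notebook is located.

```python
from pathlib import Path

def pretrained_model_folder(output_root, experiment, modelname):
    """Return output_root/<experiment>/models/<modelname> as a Path."""
    return Path(output_root) / experiment / "models" / modelname

folder = pretrained_model_folder("../output", "exp0", "fastai_xresnet1d101")
print(folder.as_posix())  # prints ../output/exp0/models/fastai_xresnet1d101

# Report a missing folder early instead of failing later during loading
if not folder.is_dir():
    print(f"Pretrained model folder not found: {folder}")
```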

In [16]:
from github_repo.code.models.your_model import inception_time_model

experiment = 'exp0'
modelname = 'fastai_xresnet1d101'
pretrainedfolder = '../output/'+experiment+'/models/'+modelname+'/'
mpath='../output/' # <=== path where the finetuned model will be stored
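n_classes_pretrained = 71 # <=== fixed by the experiment: exp0 was trained on all 71 classes

# Note: this call does not receive `pretrainedfolder` or `n_classes_pretrained`;
# both are only used by the commented-out fastai_model call below.
model = inception_time_model("tf_inception", num_classes, sampling_frequency, mpath, input_shape)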

#model = fastai_model(
#    modelname, 
#    num_classes, 
#    sampling_frequency, 
#    mpath, 
#    input_shape=input_shape, 
#    pretrainedfolder=pretrainedfolder,
#    n_classes_pretrained=n_classes_pretrained, 
#    pretrained=True,
#    epochs_finetuning=2,
#)

Inception model built.


# Preprocess data with pretrained Standardizer

Since the models were trained on inputs standardized to zero mean and unit variance, your custom data needs to be standardized with the same mean and variance. The fitted scaler is provided in the respective experiment folder as `output/expX/data/standard_scaler.pkl`.

In [18]:
import pickle
from github_repo.code.utils import utils

# Load the scaler that was fitted during the pretraining experiment
with open('./github_repo/output/'+experiment+'/data/standard_scaler.pkl', "rb") as f:
    standard_scaler = pickle.load(f)

X_train = utils.apply_standardizer(X_train, standard_scaler)
X_val = utils.apply_standardizer(X_val, standard_scaler)
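To make the standardization step concrete, here is a numpy-only sketch of the idea. It assumes a single global mean and standard deviation computed over all values of the training signals; the pickled scaler and `utils.apply_standardizer` in the repository may differ in detail. The function names `fit_global_scaler` and `apply_global_scaler` are hypothetical.

```python
import numpy as np

def fit_global_scaler(X_train):
    """Compute one mean and one standard deviation over all training values."""
    values = np.concatenate([np.asarray(x).ravel() for x in X_train])
    return float(values.mean()), float(values.std())

def apply_global_scaler(X, mean, std):
    """Standardize every sample with the given statistics, keeping its shape."""
    return [(np.asarray(x) - mean) / std for x in X]

# Synthetic stand-in data with non-zero mean and non-unit variance
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=0.5, scale=2.0, size=(8, 1000, 12))
X_va = rng.normal(loc=0.5, scale=2.0, size=(2, 1000, 12))

# Statistics come from the training data only and are reused for validation
mean, std = fit_global_scaler(X_tr)
X_tr_std = np.stack(apply_global_scaler(X_tr, mean, std))
X_va_std = np.stack(apply_global_scaler(X_va, mean, std))
```

When finetuning, the statistics should come from the pretraining experiment rather than from your own data, which is why the notebook loads the stored scaler instead of fitting a new one.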

# Finetune model

Calling `model.fit` on a model with `pretrained=True` performs finetuning as proposed in our work, i.e. **gradual unfreezing and discriminative learning rates**.

In [None]:
model.fit(X_train, y_train, X_val, y_val)


Epoch 1: LearningRateScheduler setting learning rate to 0.0010000000474974513.
Epoch 1/30

Epoch 2: LearningRateScheduler setting learning rate to 0.0010000000474974513.
Epoch 2/30
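The two finetuning techniques named in this section, gradual unfreezing and discriminative learning rates, can be illustrated without any deep learning framework. The sketch below only computes the schedules. The number of layer groups, the decay factor and both function names are illustrative assumptions, not the values or code used in the repository.

```python
def discriminative_lrs(max_lr, n_groups, factor=2.6):
    """Learning rate per layer group, from earliest group to head.

    The head trains with max_lr; each earlier group gets a rate that is
    smaller by `factor` (an illustrative default).
    """
    return [max_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

def gradual_unfreezing(n_groups):
    """One trainable-mask per stage: stage 0 trains only the head,
    every later stage unfreezes one more group from the end."""
    return [
        [i >= n_groups - 1 - stage for i in range(n_groups)]
        for stage in range(n_groups)
    ]

lrs = discriminative_lrs(1e-2, n_groups=3, factor=10.0)
stages = gradual_unfreezing(3)
for stage, mask in enumerate(stages):
    print(stage, mask)
# prints:
# 0 [False, False, True]
# 1 [False, True, True]
# 2 [True, True, True]
```

With a factor of 10 and three groups, the learning rates are roughly 1e-4, 1e-3 and 1e-2, so the early, more generic layers change least.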

# Evaluate model on validation data

In [None]:
y_val_pred = model.predict(X_val)
utils.evaluate_experiment(y_val, y_val_pred)

aggregating predictions...


|   | macro_auc | Fmax |
|---|-----------|------|
| 0 | 0.931458 | 0.827961 |
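For readers who want to see what the `macro_auc` value reported above measures, the following sketch computes a macro-averaged ROC AUC with plain numpy: the AUC is computed separately for each class and then averaged. This is an illustration under that assumption, not the repository's implementation; the function names are hypothetical, and `Fmax` is not covered here.

```python
import numpy as np

def binary_auc(y_true, y_score):
    """ROC AUC for one class: the fraction of (positive, negative) pairs
    in which the positive sample scores higher; ties count as one half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def macro_auc(Y_true, Y_score):
    """Unweighted mean of the per-class AUCs for multi-hot labels."""
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score)
    per_class = [
        binary_auc(Y_true[:, c], Y_score[:, c]) for c in range(Y_true.shape[1])
    ]
    return float(np.mean(per_class))

# Four samples, two classes
Y_true_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_score_demo = np.array([[0.9, 0.4], [0.1, 0.8], [0.8, 0.3], [0.2, 0.1]])
print(macro_auc(Y_true_demo, Y_score_demo))  # prints 0.875
```

In the demo, the first class is ranked perfectly (AUC 1.0) and the second has one misordered pair out of four (AUC 0.75), which gives a macro average of 0.875. Each class needs at least one positive and one negative sample for its AUC to be defined.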
