# Basic Info

## Team 1
- Ted Hsu (thhsu4@illinois.edu)
- Myles Iribarne (mylesai2@illinois.edu)
- Daniel Xu (dhxu2@illinois.edu)

## Paper

Our paper is _Transfer learning for ECG classification_ by Weimann and Conrad [1]. The project code is available on GitHub [2].



# Introduction
## Background of the problem

- **What type of problem**:

  The problem is to detect atrial fibrillation (AF) in electrocardiogram (ECG) recordings.
- **What is the importance/meaning of solving the problem**:

  - A solution to the problem is a tool that will assist physicians in analyzing large amounts of patient ECG data in an automated and time efficient manner.
  - Early detection of AF events may lead to better patient outcomes.
  
- **What is the difficulty of the problem**:

  - Devices for recording patient ECG data output a _huge_ amount of raw data, which is challenging and expensive to annotate for effective deep learning training.
  - Large class imbalance, because the cardiovascular events of interest are rare.
  - Low ECG signal quality due to limited sampling frequency and single-lead recording.

- **The state of the art methods and effectiveness**:

  - Transfer learning using 1-D residual networks [3]
  - Representation learning using encoder-decoder architectures
    - Stacked Denoising AEs [4]
    - Seq2Seq model [5]

## Paper Explanation
- **What did the paper propose**:

  - Use Transfer learning to build better ECG classifiers.
  - Pre-train 1-D CNNs on the largest publicly available ECG dataset (_Icentia11K_) on several pre-training tasks:
    - Beat Classification
    - Rhythm Classification
    - Heart Rate Classification
    - Future Prediction
  - Fine-tune the pre-trained 1-D CNNs on a _different_ task and a _different_ dataset (_PhysioNet/CinC Challenge 2017_): classify AF events.

- **What is/are the innovations of the method**:

  - Demonstration of successful large-scale pre-training of 1-D CNNs on the largest publicly available ECG dataset to date.
  - Demonstration of contrastive pre-training (unsupervised representation learning) improving 1-D CNN performance on target task.
  - Novel usage of heart rate classification task for pre-training. Note that in this task, the labels can be automatically generated without manual intervention.

- **How well the proposed method work (in its own metrics)**:

  - The paper compares AF classifier performance across five pre-training configurations (random initialization, beat classification, rhythm classification, heart rate classification, and future prediction). The performance metric is the average macro F1 score of the AF classifier on the PhysioNet test set.
  - The macro F1 score with random initialization is 0.731. The F1 scores reported for the four proposed pre-training tasks range from 0.758 to 0.779.

- **What is the contribution to the research regime (referring the Background above, how important the paper is to the problem)**:

  - Pre-training the 1-D CNN model improves performance on the target task (i.e. AF classification), effectively reducing the amount of labeled data required to match the performance of 1-D CNNs that are not pre-trained.
  - Unsupervised pre-training (i.e. future prediction) on ECG data is a viable method for improving the performance on the target task and will become more relevant, since labeling ECG data is expensive.

# Scope of Reproducibility

## Hypothesis 1
Pre-training 1-D CNN models with an extremely large dataset of relatively inexpensively labeled data can improve performance of classification based on a smaller set of labeled data with a different classification objective (i.e. AF).

## Hypothesis 2
The paper does not explore how strongly the size of the pre-training dataset affects the final results. We hypothesize that the size of the pre-training dataset affects performance on the target task (i.e. AF classification).


## Verification
We will verify the hypotheses by attempting to reproduce results for a specific model and the following hyperparameter combination with 10% and 20% of the pre-training data used in the paper:

- Model: 1-D ResNet-18v2
- Pre-training Objective: Beat Classification
- Frame Size: 4096 (samples)
- Sample Rate: 250 Hz
- Fine-tuning objective: Atrial Fibrillation

The results will be compared with the performance of a randomly initialized ResNet-18v2.

# Ablation (Hypothesis 3)
 The original paper is entirely based on 1-D CNNs and the raw ECG signal. To extend the paper's results, we aim to pre-process the raw signals using Fourier transforms to represent the data as a spectrogram, a frequency versus time representation of ECG signals. Using this representation of the input, we will train a 2-D CNN model (i.e. 2-D ResNet-18) and compare the performance of pre-trained and randomly initialized models. Additionally, we will compare the 2-D model performance to the 1-D models originally used by the authors.

 This extension is motivated by a study on ECG arrhythmia classification that demonstrates the effectiveness of CNNs trained on spectrograms [6]. By converting ECG data to spectrogram features and then using the spectrograms to pre-train a 2-D ResNet, we intend to illustrate the adaptability of the original paper's transfer learning framework across diverse model architectures.



# Methodology

## Environment
All of the project's code, data, and files are in a shared Google Drive folder. Users of this notebook must mount the shared drive in a Google Colab notebook.

1. Create a shortcut to the shared Google drive from your own Google Drive.
2. Modify the `PROJECT_ROOT` variable below accordingly.

The link to the project's shared Google drive: https://drive.google.com/drive/folders/1vlUILM7cToH5CoX1x0kWRpe55MbBogS-?usp=drive_link

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# modify based on shortcut to shared Google drive
PROJECT_ROOT = '/content/drive/MyDrive/UIUC MCS/CS598 Deep Learning for Healthcare/Project'
DATA_DIR = PROJECT_ROOT + '/data'

Install required packages. Note that the `%%capture` cell magic is used to suppress the output of `pip install` for brevity.

In [None]:
%%capture
%cd $PROJECT_ROOT
! pip install -r requirements.txt

##  Data

### Pre-training Dataset
 The training data is the “Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset,” which is freely available online.
 * Source of the data
  * https://physionet.org/content/icentia11k-continuous-ecg/1.0/ (raw)
  * https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272 (compressed)
 * Statistics
  * 11,000 patients
  * Each patient has up to two weeks of ECG recordings at a 250 Hz sampling rate.
  * Each ECG recording is accompanied by beat and rhythm labels, marked by the ECG signal collection device and by specialists, respectively.
  * Both beat and rhythm labels are assigned to positions in the signal at irregular intervals.
  * The original paper uses 95% of the patients for pre-training and the remaining 5% for validation.
 * Data downloading:
   * See [this notebook](https://github.com/myles-i/DLH_TransferLearning/blob/master/jupyter_notebooks/Download_Icentia11k_Data.ipynb), written by our group, which uses the Python libtorrent library to download the compressed version of the data from academictorrents.com.


### Fine-tuning Dataset
The fine-tuning dataset is “AF Classification from a Short Single Lead ECG Recording: The PhysioNet/Computing in Cardiology Challenge 2017,” which is freely available online for download.
 * Source of the data
  * https://physionet.org/content/challenge-2017/1.0.0/
 * Statistics
  * 8528 short ECG recordings
  * Each ECG recording duration is 9 to 60 seconds with 300Hz sampling rate
  * Each ECG recording is labeled with one of the following classes: AF, Normal, Other or Noise (too noisy to classify).
 * Data process (TODO: add more detail)



##   Model used by author
In this project, the CNN model of choice is ResNet-18. The 1D ResNet-18 implemented in the paper's GitHub repository [2] is used to reproduce the paper's results, and a 2D ResNet-18 is used for the ablation task.

### 1D ResNet-18
  * Model architecture
    * 18 layers
    * The input layer consists of a convolution layer with 64 filters, kernel size=3 and stride=2. The output of the convolution layer passes through batch norm, ReLU and a max pooling layer sequentially.
    * The output layer is a classifier consisting of a densely-connected layer followed by a softmax function.
    * The middle 16 layers consist of 8 residual blocks. A residual block consists of the following two components and outputs the sum of the two components' outputs.
      1. two convolution layers, each followed by batch norm and ReLU
      2. a shortcut that passes the input through a convolution layer followed by batch norm.
    * Configurations of the residual blocks
      * 1st and 2nd: 64 filters, kernel size=7, strides=2 and 1, respectively
      * 3rd and 4th: 128 filters, kernel size=5, strides=2 and 1, respectively
      * 5th and 6th: 256 filters, kernel size=5, strides=2 and 1, respectively
      * 7th and 8th: 512 filters, kernel size=3, strides=2 and 1, respectively
    * Detail: https://github.com/myles-i/DLH_TransferLearning/blob/master/transplant/modules/resnet1d.py

  * Training objectives
    * loss function: `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`
    * optimizer: `tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.98, epsilon=1e-9)`
    * metric: `tf.keras.metrics.SparseCategoricalAccuracy(name='acc')`
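As a rough check on the block configuration listed above, the sketch below traces how many time steps remain after each stage for a 4096-sample frame. It is not the authors' implementation: it assumes "same" padding (so a stride-s layer maps n steps to ceil(n / s)) and a max pooling stride of 2, neither of which is stated here, and `trace_lengths` is a hypothetical helper name.

```python
import math

# (filters, kernel_size, stride) for the 8 residual blocks, as listed above.
BLOCKS = [
    (64, 7, 2), (64, 7, 1),
    (128, 5, 2), (128, 5, 1),
    (256, 5, 2), (256, 5, 1),
    (512, 3, 2), (512, 3, 1),
]


def trace_lengths(frame_size, stem_stride=2, pool_stride=2):
    """Return (stage name, time steps, channels) after each stage.

    Assumes 'same' padding, so a stride-s layer maps n steps to ceil(n / s).
    The pooling stride of 2 is an assumption, not taken from the report.
    """
    stages = []
    n = math.ceil(frame_size / stem_stride)
    stages.append(("stem conv", n, 64))
    n = math.ceil(n / pool_stride)
    stages.append(("max pool", n, 64))
    for i, (filters, _kernel, stride) in enumerate(BLOCKS, start=1):
        n = math.ceil(n / stride)
        stages.append((f"block {i}", n, filters))
    return stages


stages = trace_lengths(4096)
for name, steps, channels in stages:
    print(f"{name:10s} steps={steps:5d} channels={channels}")
```

Under these assumptions the temporal resolution shrinks by a factor of 64 before the classifier, while the channel count grows from 64 to 512.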


## Training

### Computational Requirements

We use Google Colab Pro with V100 GPUs.

### Implementation code

For both pre-training and fine-tuning, the paper's code provides an API to run the entire process with parameters of choice. The following is a high-level description of how the API works.

#### Pre-training
  1. Train and validation data generators are created based on the patient IDs and the number of samples per patient, both specified when calling the API.
  2. A model is generated based on the model architecture and pre-training task specified by the API user.
  3. Weights of the model are initialized. They can also be loaded from a weights file. For all pre-training in this project, we do not load weights.
  4. A checkpoint function is created based on the training metric. For pre-training, we use `loss` as the training metric.
  5. The model is fit to the training data. At the end of each training epoch, the checkpoint function is called to evaluate the model and save its weights.

#### Fine-tuning
  1. The train and test sets are already separated from the PhysioNet 2017 dataset with an 80%/20% split and are passed to the API. The validation set is further separated from the train set based on user input.
  2. A CNN model is generated based on the model architecture specified by the API user. A binary classifier is attached to the output of the CNN model as the output layer.
  3. Weights of the model are initialized. They can also be loaded from a weights file.
  4. A checkpoint function is created based on the training metric. For fine-tuning, we use `f1` as the training metric.
  5. The model is fit to the training data. At the end of each training epoch, the checkpoint function is called to evaluate the model and save its weights.
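The checkpoint step in both procedures can be pictured with the following minimal sketch. It is not the API's code: `BestCheckpoint` is a hypothetical helper, the per-epoch values are made up, and it assumes weights are kept only when the monitored metric improves (lower is better for `loss`, higher is better for `f1`).

```python
class BestCheckpoint:
    """Track the best value of a monitored metric across epochs (hypothetical helper)."""

    def __init__(self, mode):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best = None
        self.best_epoch = None

    def update(self, epoch, value):
        """Return True when `value` improves on the best value seen so far."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            # A real callback would write the model weights to disk here.
            self.best, self.best_epoch = value, epoch
        return improved


# Made-up per-epoch metric values, for illustration only.
loss_ckpt = BestCheckpoint("min")   # pre-training monitors `loss`
loss_saves = [loss_ckpt.update(e, v) for e, v in enumerate([0.9, 0.7, 0.8, 0.6])]

f1_ckpt = BestCheckpoint("max")     # fine-tuning monitors `f1`
f1_saves = [f1_ckpt.update(e, v) for e, v in enumerate([0.60, 0.72, 0.70, 0.75])]

print(loss_saves, loss_ckpt.best_epoch)
print(f1_saves, f1_ckpt.best_epoch)
```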



## Evaluation

### Metrics Descriptions

We evaluate with the macro F1 score, the same metric the paper reports for the AF classifier.
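For reference, macro F1 is the unweighted mean of the per-class F1 scores, where each class's F1 is 2TP / (2TP + FP + FN). The sketch below computes it in plain Python on toy labels. It is an illustration only, not the evaluation code used by the API, and the label names are placeholders.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that never appears in labels or predictions scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not project data).
y_true = ["N", "N", "N", "N", "AF", "AF", "O", "O"]
y_pred = ["N", "N", "N", "AF", "AF", "O", "O", "O"]
score = macro_f1(y_true, y_pred, classes=["N", "AF", "O"])
print(round(score, 3))
```

Because every class contributes equally, a rare class such as AF influences the score as much as the majority class, which suits the class imbalance described in the introduction.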

### Implementation code

#### Pre-training
The paper uses 95% of the patients' ECG data. On average, the paper samples 4096 ECG frames per patient, which amounts to 42.8 million (11000 x 0.95 x 4096) training samples over the course of pre-training. For pre-training with 20% of the data used in the paper, we use ECG data from 2048 patients and sample 4096 ECG frames per patient, resulting in roughly 8.4 million (2048 x 4096) training samples.
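The sample counts above can be re-derived with a few lines of arithmetic. This only restates the numbers in the paragraph; the variable names are ours.

```python
TOTAL_PATIENTS = 11_000
TRAIN_FRACTION = 0.95
FRAMES_PER_PATIENT = 4096

# Paper setup: 95% of the patients, 4096 frames per patient on average.
paper_patients = round(TOTAL_PATIENTS * TRAIN_FRACTION)
paper_samples = paper_patients * FRAMES_PER_PATIENT

# Our reduced setup: 2048 patients, 4096 frames per patient.
our_samples = 2048 * FRAMES_PER_PATIENT

fraction = our_samples / paper_samples
print(f"paper: {paper_samples:,} samples")
print(f"ours:  {our_samples:,} samples ({fraction:.1%} of the paper's)")
```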

In [None]:
TRAIN_DATASET = DATA_DIR + '/icentia11k'

The following cell uses the API to run pre-training with 20% of the data used in the paper. Uncomment the cell to run.

* `--job-dir`: output directory, where check points and weights are saved

* `--task`: pre-training task, "beat" for Beat classification

* `--train`: training dataset directory

* `--arch`: CNN architecture

* `--patient-ids`: IDs of the patients whose ECG data is used in pre-training

* `--frame-size`: number of ECG samples, at a 250 Hz sampling rate, in an ECG frame

* To use all data, choose the values so that number of patients x `samples-per-patient` = `epochs` x `batch-size` x `steps-per-epoch`.
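The budget identity in the last bullet can be solved for the steps per epoch. The helper below is a small sketch with a hypothetical name, `steps_per_epoch`; it raises an error when the chosen epochs and batch size do not divide the sample budget evenly.

```python
def steps_per_epoch(num_patients, samples_per_patient, epochs, batch_size):
    """Solve patients * samples_per_patient == epochs * batch_size * steps for steps."""
    total_samples = num_patients * samples_per_patient
    steps, remainder = divmod(total_samples, epochs * batch_size)
    if remainder:
        raise ValueError("epochs * batch_size does not divide the sample budget evenly")
    return steps


# Values used for the 20% pre-training run in this report.
steps = steps_per_epoch(num_patients=2048, samples_per_patient=4096,
                        epochs=16, batch_size=512)
print(steps)
```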

In [None]:
# !time python -m pretraining.trainer \
# --job-dir "jobs/beat_classification_16epochs_to_20percent_round3" \
# --task "beat" \
# --train $TRAIN_DATASET \
# --arch "resnet18" \
# --epochs 16 \
# --patient-ids `seq 0 2047 | paste -sd, -` \
# --steps-per-epoch 1024 \
# --samples-per-patient 4096 \
# --batch-size 512 \
# --frame-size 4096

#### Fine-tuning

In [None]:
JOB_DIR = PROJECT_ROOT + '/jobs/fine_tune_random_resnet18'
FINETUNE_TRAIN = DATA_DIR + '/physionet_finetune/physionet_train.pkl'
FINETUNE_TEST = DATA_DIR + '/physionet_finetune/physionet_test.pkl'

The following cell uses the API to run fine-tuning with random initialization. Uncomment to run.
* `--weights-file $WEIGHTS_FILE`: Path to pre-trained weights or a checkpoint of the model to be used for model initialization. The model is randomly initialized if this is not specified.

* `--val-size 0.0625`: The fraction of the train set to set aside for the validation set. Note that the PhysioNet data was already split 80-20 train-test. The paper uses 5 percent of the full dataset for validation. We get this via 0.0625 x 0.8 = 0.05.

* `--val-metric "f1"`: Use macro F1 score to evaluate performance on validation set and to find the best model at each epoch.
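The `--val-size` value can be checked with exact fractions. This sketch only restates the 80/20 split and the 0.0625 value from the bullets above; the variable names are ours.

```python
from fractions import Fraction

train_share = Fraction(4, 5)   # 80% of the full dataset is the train split
val_size = Fraction(1, 16)     # --val-size 0.0625, as a fraction of the train split

val_share = train_share * val_size             # share of the full dataset used for validation
final_train_share = train_share - val_share    # share left for actual training
test_share = 1 - train_share                   # held-out test share

print(float(val_share), float(final_train_share), float(test_share))
```

The validation set therefore takes 5% of the full dataset, leaving 75% for training and 20% for testing.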

In [None]:
# %%time
# !python -m finetuning.trainer \
# --job-dir $JOB_DIR \
# --train $FINETUNE_TRAIN \
# --test $FINETUNE_TEST \
# --val-size 0.0625 \
# --val-metric "f1" \
# --arch "resnet18" \
# --batch-size 128 \
# --epochs 200 \
# --seed 2024 \
# --verbose

The following cells use the API to run fine-tuning with pre-trained weights. Uncomment to run.

In [None]:
JOB_DIR = PROJECT_ROOT + '/jobs/fine_tune_pre_trained_20_resnet18'
WEIGHTS_FILE = PROJECT_ROOT + '/jobs/beat_classification/pre_trained_20_resnet18.weights'

In [None]:
# %%time
# !python -m finetuning.trainer \
# --job-dir $JOB_DIR \
# --train $FINETUNE_TRAIN \
# --test $FINETUNE_TEST \
# --weights-file $WEIGHTS_FILE \
# --val-size 0.0625 \
# --val-metric "f1" \
# --arch "resnet18" \
# --batch-size 128 \
# --epochs 200 \
# --seed 2024 \
# --verbose

# Reproduction Results
We compare the validation F1 of the model fine-tuned from random initialization with that of the model fine-tuned from pre-trained weights.



In [None]:
import pandas as pd
from matplotlib import pyplot as plt


random_result = pd.read_csv(PROJECT_ROOT + '/jobs/finetune_baseline_65sec/history.csv')
pretrain_20_result = pd.read_csv(PROJECT_ROOT + '/jobs/finetune_pretrain_20_weights_65sec/history.csv')

plt.plot(random_result['epoch'], random_result['f1'], color='tab:red', label='random')
plt.plot(pretrain_20_result['epoch'], pretrain_20_result['f1'], color='blue', label='pretrain 20')
plt.xlabel('Epoch')
plt.ylabel('Validation F1')
ax = plt.gca()
ax.set_xlim([0, 80])
ax.set_ylim([0.5, 1.0])
plt.legend(loc='upper right')

## Reproduction Analysis
With pre-trained weights, the F1 score stabilizes much sooner than with random initialization. The model with pre-trained weights also outperforms the randomly initialized model for most epochs.



## Plan
The paper runs fine-tuning 10 times for each weight initialization method and then plots F1 versus epoch. We ran fine-tuning only once for each weight initialization method. We plan to run fine-tuning more times and compare again with Figure 3 (a) in the paper.

# Ablation Study (Hypothesis 3)
For the ablation study, we pre-process the data to produce a spectrogram for each ECG signal and pass the result into a 2D ResNet. We then follow the same pre-training and fine-tuning process as the paper. The goal is to test how well the paper's ideas extend to other models. The work-in-progress notebook linked below includes the Keras model, the data pre-processing function, the Keras data generators, and a run of two epochs of pre-training. Fine-tuning is not yet complete:

https://github.com/myles-i/DLH_TransferLearning/blob/master/jupyter_notebooks/explore_spectogram.ipynb

Below, we explain the preprocessing and model in more detail.

### Ablation Preprocessing - spectrogram
For the spectrogram, we chose the following parameters:
- window_size: 256 (~1 second)
- stride: 64 (~0.25 seconds)
- window type: Hann (`hanning`), a taper applied to each segment that smooths the FFT of each spectrogram slice

After the spectrogram for a data sample is computed, only the lower 32 frequency components (from 0-62.5Hz) are selected to reduce the input size. This was chosen empirically and validated by reconstructing the original signal from the filtered spectrogram.
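The preprocessing described above can be sketched with NumPy as follows. This is not the code from our notebook: `ecg_spectrogram` is a hypothetical name, the input is a synthetic 5 Hz sine rather than real ECG, and the use of raw magnitudes with no padding or log scaling is an assumption.

```python
import numpy as np


def ecg_spectrogram(signal, window_size=256, stride=64, n_freqs=32):
    """Magnitude spectrogram with a Hann window; returns shape (frames, n_freqs)."""
    window = np.hanning(window_size)
    # Number of full windows that fit in the signal (no padding, an assumption).
    n_frames = 1 + (len(signal) - window_size) // stride
    frames = np.stack([
        signal[i * stride: i * stride + window_size] * window
        for i in range(n_frames)
    ])
    # rfft returns window_size // 2 + 1 non-negative frequency bins per frame.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Keep only the lowest n_freqs bins to reduce the input size.
    return spectrum[:, :n_freqs]


fs = 250                                  # sampling rate in Hz
t = np.arange(4096) / fs                  # one 4096-sample frame
signal = np.sin(2 * np.pi * 5.0 * t)      # synthetic 5 Hz tone, not real ECG
spec = ecg_spectrogram(signal)
print(spec.shape)
```

With a 4096-sample frame, a 256-sample window and a stride of 64, this yields 61 time slices per frame, each with 32 frequency components.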

An example input sample and its spectrogram are shown below:

![sample_signal](https://github.com/myles-i/DLH_TransferLearning/blob/master/report/images/sample_signal.PNG?raw=1)

![sample_spectogram](https://github.com/myles-i/DLH_TransferLearning/blob/master/report/images/sample_spectogram.PNG?raw=1)


### Ablation Model: 2D ResNet-18
The model chosen for the ablation study using spectrograms is similar to the original model, but is a 2D ResNet. It is presented here:
  * Model architecture
    * 18 layers
    * The input layer consists of a convolution layer with 64 filters, kernel size=7x7 and stride=2. The output of the convolution layer passes through batch norm, ReLU and a max pooling layer sequentially.

    * The middle 16 layers consist of 8 residual blocks. A residual block consists of the following two components and outputs the sum of the two components' outputs.
      1. two convolution layers, each followed by batch norm and ReLU
      2. a shortcut that passes the input through a convolution layer followed by batch norm.
    * The output layer is a classifier consisting of a densely-connected layer followed by a softmax or sigmoid function.
    * Configurations of the residual blocks
      * 1st and 2nd: 64 filters, kernel size=3x3, strides=2 and 1, respectively
      * 3rd and 4th: 128 filters, kernel size=3x3, strides=2 and 1, respectively
      * 5th and 6th: 256 filters, kernel size=3x3, strides=2 and 1, respectively
      * 7th and 8th: 512 filters, kernel size=3x3, strides=2 and 1, respectively

  * Training objectives
    * loss function: `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`
    * optimizer: `tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.98, epsilon=1e-9)`
    * metric: `tf.keras.metrics.SparseCategoricalAccuracy(name='acc')`
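To make the training objective concrete, the NumPy sketch below mirrors what sparse categorical cross-entropy on logits and sparse categorical accuracy compute: integer class labels, raw (pre-softmax) model outputs, and an average over the batch. It is a re-implementation for illustration, not the Keras code, and the logits are made up.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean negative log-softmax probability of the true class."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def sparse_categorical_accuracy(labels, logits):
    """Fraction of rows whose highest logit matches the integer label."""
    return float((np.argmax(logits, axis=1) == np.asarray(labels)).mean())


# Made-up logits for a batch of two samples and three classes.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

loss = sparse_categorical_crossentropy_from_logits(labels, logits)
acc = sparse_categorical_accuracy(labels, logits)
print(loss, acc)
```

Passing `from_logits=True` means the model's final dense layer can output raw scores, and the softmax is folded into the loss.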


