# Basic Info

## Team 1
- Ted Hsu (thhsu4@illinois.edu)
- Myles Iribarne (mylesai2@illinois.edu)
- Daniel Xu (dhxu2@illinois.edu)

## Paper

Our paper is _Transfer learning for ECG classification_ by Weimann and Conrad [1]. The project code is available on GitHub [2].



# Introduction
## Background of the problem

- **What type of problem**:

  The problem is to classify Atrial Fibrillation (AF) on electrocardiogram (ECG) recordings.
- **What is the importance/meaning of solving the problem**:

  - A solution to the problem would help physicians analyze large amounts of patient ECG data in an automated and time-efficient manner.
  - Early detection of AF events may lead to better patient outcomes.
  
- **What is the difficulty of the problem**:

  - ECG recording devices output a _huge_ amount of raw data, which is challenging and expensive to annotate for effective deep learning training.
  - Large class imbalance, because the cardiovascular events of interest are rare.
  - Low ECG signal quality due to limited sampling frequency and single-lead ECG probes.

- **The state of the art methods and effectiveness**:

  - Transfer learning using 1-D residual networks [3]
  - Representation learning using encoder-decoder architectures
    - Stacked Denoising AEs [4]
    - Seq2Seq model [5]

## Paper Explanation
- **What did the paper propose**:

  - Use transfer learning to build better ECG classifiers.
  - Pre-train 1-D CNNs on the largest publicly available ECG dataset (_Icentia11K_) on several pre-training tasks:
    - Beat Classification
    - Rhythm Classification
    - Heart Rate Classification
    - Future Prediction
  - Fine-tune the pre-trained 1-D CNNs on a _different_ task and a _different_ dataset (_PhysioNet/CinC Challenge 2017_): classify AF events.

- **What is/are the innovations of the method**:

  - Demonstration of successful large-scale pre-training of 1-D CNNs on the largest publicly available ECG dataset to date.
  - Demonstration of contrastive pre-training (unsupervised representation learning) improving 1-D CNN performance on target task.
  - Novel usage of a heart rate classification task for pre-training. Note that in this task, the labels can be generated automatically without manual intervention.

- **How well the proposed method works (in its own metrics)**:

  - The paper compares AF classifier performance across five pre-training configurations: random initialization (no pre-training), beat classification, rhythm classification, heart rate classification, and future prediction. The performance metric is the average macro F1 score of the AF classifier on the test set.
  - With random initialization, the macro F1 score is 0.731. The four pre-training configurations score between 0.742 and 0.779.

- **What is the contribution to the research regime (referring the Background above, how important the paper is to the problem)**:

  - Pre-training the 1-D CNN model improves performance on the target task (i.e. AF classification), effectively reducing the amount of labeled data required to match the performance of 1-D CNNs that are not pre-trained.
  - Unsupervised pre-training (i.e. future prediction) on ECG data is a viable method for improving the performance on the target task and will become more relevant, since labeling ECG data is expensive.
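
The heart rate pre-training task mentioned above relies on labels that can be derived automatically from beat positions. The following is a minimal sketch of that idea, not the paper's implementation: the function name `heart_rate_label` and the 60/100 bpm thresholds are assumptions chosen for illustration and may differ from the class boundaries the authors use.

```python
def heart_rate_label(beat_positions, fs=250.0, low_bpm=60.0, high_bpm=100.0):
    """Derive a coarse heart rate class from beat positions (in samples).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if len(beat_positions) < 2:
        # Not enough beats to measure an interval.
        return "undefined"
    # Intervals between consecutive beats, converted to seconds.
    intervals = [(b - a) / fs for a, b in zip(beat_positions, beat_positions[1:])]
    mean_interval = sum(intervals) / len(intervals)
    bpm = 60.0 / mean_interval
    if bpm < low_bpm:
        return "bradycardia"
    if bpm > high_bpm:
        return "tachycardia"
    return "normal"


# Beats every 125 samples at 250 Hz correspond to 120 beats per minute.
print(heart_rate_label([0, 125, 250, 375]))
```

Because the label depends only on beat timing, no manual annotation step is needed once beat positions are available.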

# Scope of Reproducibility

## Hypothesis 1
Pre-training 1-D CNN models on an extremely large dataset of relatively inexpensively labeled data can improve the performance of classification on a smaller set of labeled data with a different classification objective (i.e. AF).

## Hypothesis 2
The paper does not explore how strongly the size of the pre-training dataset affects the final results. We hypothesize that the size of the pre-training dataset affects performance on the target task (i.e. AF classification).


## Verification
We will verify the hypotheses by attempting to reproduce results for a specific model and the following hyperparameter combination with 10% and 20% of the pre-training data:

- Model: 1-D ResNet-18v2
- Pre-training Objective: Beat Classification
- Frame Size: 4096 (samples)
- Sample Rate: 250 Hz
- Fine-tuning objective: Atrial Fibrillation

The results will be compared with the performance of random initialization.



# Ablation (TODO: remove from draft)
 The original paper is entirely based on 1-D CNNs and the raw ECG signal. To extend the paper's results, we aim to pre-process the raw signals using Fourier transforms to represent the data as a spectrogram, a frequency versus time representation of ECG signals. Using this representation of the input, we will train a 2-D CNN model (i.e. 2-D ResNet) and compare the performance of pre-trained and randomly initialized models, as well as compare the performance to the 1-D models originally used by the authors.

 This extension is motivated by a study on ECG arrhythmia classification that demonstrates the effectiveness of CNNs trained on spectrograms [6]. By converting ECG data to spectrogram features and then using the spectrograms to pre-train a 2-D ResNet, we intend to illustrate the adaptability of the transfer-learning framework in the original paper across diverse model architectures.
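
To make the spectrogram representation concrete, here is a minimal sketch of a short-time Fourier transform built with NumPy only. The function name `stft_spectrogram`, the Hann window, and the window and hop sizes are assumptions for illustration; the planned ablation may use different settings or a library routine.

```python
import numpy as np


def stft_spectrogram(signal, fs=250.0, window_size=128, hop=64):
    """Return (freqs, times, power) with power shaped (freq bins, time frames)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + window_size] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame (real-input FFT keeps non-negative frequencies).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window_size, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + window_size / 2) / fs
    return freqs, times, power.T


# Synthetic stand-in for an ECG frame: a 10 Hz sine sampled at 250 Hz.
t = np.arange(4096) / 250.0
freqs, times, power = stft_spectrogram(np.sin(2 * np.pi * 10.0 * t))
print(power.shape)
```

The resulting 2-D array can be treated like a single-channel image, which is what makes a 2-D ResNet applicable.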

# Methodology

This cell clones the repository (https://github.com/myles-i/DLH_TransferLearning) and downloads a subset of the relevant data so that the rest of the notebook can run.

In [None]:
# !pip install gdown
import gdown
import os
REPO = '/tmp/ecg_transfer_learning'
DATA_DIR = '/tmp/ecg_transfer_learning/data'
PRETRAINIG_DATA_DIR =  DATA_DIR + "/icentia11k_mini_subset"
FINETUNIING_DATA_DIR = DATA_DIR + "/physionet_finetuning"


# clone git repo
!git clone https://github.com/myles-i/DLH_TransferLearning.git "{REPO}"
%cd "{REPO}"

# download subset of pretraining data
url = "https://drive.google.com/drive/folders/1pz8V78Qog9nQzfDCjHOrifkDQWg_drIK?usp=sharing"
os.makedirs(PRETRAINIG_DATA_DIR, exist_ok=True)
gdown.download_folder(url, output = PRETRAINIG_DATA_DIR)

# download finetuning data
url = "https://drive.google.com/drive/folders/10D7orBB6SWB3JRUK9GFw8w_QzDu509D7?usp=sharing"
os.makedirs(FINETUNIING_DATA_DIR, exist_ok=True)
gdown.download_folder(url, output = FINETUNIING_DATA_DIR)

Install the required packages.

In [None]:
! pip install -r requirements.txt

##  Data

### Pre-training Dataset
 The training data is the “Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset,” which is freely available online.
 * Source of the data
  * https://physionet.org/content/icentia11k-continuous-ecg/1.0/ (raw)
  * https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272 (271GB compressed)
 * Statistics
  * 11,000 patients
  * Each patient has up to two weeks of ECG recordings at a 250 Hz sampling rate.
  * Each ECG recording is accompanied by beat and rhythm labels marked by the ECG signal collection device and specialists, respectively.
  * Both beat and rhythm labels are assigned to positions in the signal at irregular intervals.
  * The original paper uses 95% of the patients for pre-training and the remaining 5% for validation.
 * Data downloading:
   * See [this notebook](https://github.com/myles-i/DLH_TransferLearning/blob/master/jupyter_notebooks/Download_Icentia11k_Data.ipynb) which uses the python libtorrent library to download the compressed version of the data from academictorrents.com
   




### Fine-tuning Dataset
The fine-tuning dataset is the “AF Classification from a Short Single Lead ECG Recording: The PhysioNet/Computing in Cardiology Challenge 2017” dataset, which is freely available online for download.
 * Source of the data
  * https://physionet.org/content/challenge-2017/1.0.0/
 * Statistics
  * 8528 short ECG recordings
  * Each ECG recording lasts 9 to 60 seconds and is sampled at 300 Hz
  * Each ECG recording is labeled with one of the following classes: AF, Normal, Other or Noise (too noisy to classify).
 * Data process





In [None]:
from transplant.datasets.icentia11k import load_patient_data
import matplotlib.pyplot as plt
import numpy as np

# Load one patient's data and plot an ECG signal
patient_id = 1
data_idx = 1
(ecg_signal, labels) = load_patient_data(PRETRAINIG_DATA_DIR, patient_id, include_labels=True, unzipped=False)

# Explore the data a bit
print("Patient id: ", patient_id)
print("Number of ECG signals: ", len(ecg_signal))
print(f"Length of ECG at index {data_idx}: ", len(ecg_signal[data_idx]))

# Plot the first N samples of the selected signal against time
fsample = 250.0  # sampling rate in Hz
T_x = 1.0 / fsample  # sampling period in seconds
N = 500
t_x = np.arange(N) * T_x
selected_signal = ecg_signal[data_idx][:N]
plt.plot(t_x, selected_signal)
plt.xlabel("Time (s)")
title = f"ECG signal for patient {patient_id} data index {data_idx} of length {len(selected_signal)} "
plt.title(title)
plt.show()

##   Model
In this project, the CNN model of choice is ResNet-18. The 1D ResNet-18 implemented in the paper's GitHub repository [2] is used to reproduce the paper's results, and a 2D ResNet-18 is used for the ablation task.

### 1D ResNet-18
  * Model architecture
    * 18 layers
    * The input layer consists of a convolution layer with 64 filters, kernel size=3 and stride=2. Its output passes through batch norm, ReLU and max pooling layers sequentially.
    * The output layer is a classifier consisting of a densely-connected layer followed by a softmax function.
    * The middle 16 layers consist of 8 residual blocks. A residual block consists of the following two components and outputs the sum of the two components' outputs.
      1. two convolution layers, each followed by batch norm and ReLU
      2. a shortcut that passes the input through a convolution layer followed by batch norm.
    * Configurations of the residual blocks
      * 1st and 2nd: 64 filters, kernel size=7, strides=2 and 1, respectively
      * 3rd and 4th: 128 filters, kernel size=5, strides=2 and 1, respectively
      * 5th and 6th: 256 filters, kernel size=5, strides=2 and 1, respectively
      * 7th and 8th: 512 filters, kernel size=3, strides=2 and 1, respectively
    * Detail: https://github.com/myles-i/DLH_TransferLearning/blob/master/transplant/modules/resnet1d.py

  * Training objectives
    * optimizer: `tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.98, epsilon=1e-9)`
    * loss function: `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`
    * metric: `tf.keras.metrics.SparseCategoricalAccuracy(name='acc')`
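
As a rough check of the architecture description above, the sketch below computes how much the listed strides shorten a 4096-sample frame. It assumes 'same' padding (each strided layer maps a length to `ceil(length / stride)`), which is an assumption rather than something verified against `resnet1d.py`. The max pooling stride is not stated above, so it is left as an optional argument.

```python
import math

# Strides as listed in the architecture description above.
STEM_CONV_STRIDE = 2
BLOCK_STRIDES = [2, 1, 2, 1, 2, 1, 2, 1]


def output_length(input_length, strides, extra_strides=()):
    """Sequence length after a chain of strided layers, assuming 'same' padding."""
    length = input_length
    for stride in list(strides) + list(extra_strides):
        length = math.ceil(length / stride)
    return length


frame_size = 4096
length_after_convs = output_length(frame_size, [STEM_CONV_STRIDE] + BLOCK_STRIDES)
print(length_after_convs)  # 128 (before accounting for max pooling)
```

Any pooling stride greater than 1 would shorten the sequence further, for example `output_length(4096, [2] + BLOCK_STRIDES, extra_strides=[2])` for a hypothetical pooling stride of 2.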

### 2D ResNet-18 (TODO in final report)




## Training

### Computational Requirements

We use Google Colab Pro, V100 GPUs.

### Implementation code

For both pre-training and fine-tuning, the paper's repository provides an API to run the entire process with parameters of choice. The following is a high-level description of how the API works.

#### Pre-training
  1. Train and validation data generators are created based on the patient IDs and the number of samples per patient, both specified when calling the API.
  2. A model is generated based on the model architecture and pre-training task specified by the API user.
  3. Weights of the model are initialized. They can also be loaded from a weights file. For all pre-training in the project, we don't load weights.
  4. A checkpoint function is created based on the training metric. For pre-training, we use `loss` as the training metric.
  5. The model is fit to the training data. At the end of each training epoch, the checkpoint function is called to evaluate the model and save its weights.

#### Fine tuning
  1. The train and test datasets are already separated from the PhysioNet 2017 dataset with an 80%/20% split and are passed to the API. The validation dataset is further separated from the train dataset based on user input.
  2. A CNN model is generated based on the model architecture specified by the API user. A binary classifier is attached to the output of the CNN model as the output layer.
  3. Weights of the model are initialized. They can also be loaded from a weights file.
  4. A checkpoint function is created based on the training metric. For fine-tuning, we use `f1` as the training metric.
  5. The model is fit to the training data. At the end of each training epoch, the checkpoint function is called to evaluate the model and save its weights.
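
Steps 4 and 5 of both procedures amount to tracking the best value of a monitored metric across epochs. The sketch below illustrates that logic in plain Python. The class name `BestCheckpoint` is hypothetical and this is not the repository's checkpoint code; it only shows why `loss` is minimized while `f1` is maximized.

```python
class BestCheckpoint:
    """Track the best value of a monitored metric across epochs."""

    def __init__(self, mode):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best_value = None
        self.best_epoch = None

    def update(self, epoch, value):
        """Return True when `value` improves on the best seen so far."""
        if self.best_value is None:
            improved = True
        elif self.mode == "min":
            improved = value < self.best_value
        else:
            improved = value > self.best_value
        if improved:
            # A real training loop would save the model weights here.
            self.best_value = value
            self.best_epoch = epoch
        return improved


# Placeholder metric values for illustration only, not experimental results.
f1_checkpoint = BestCheckpoint(mode="max")
for epoch, f1 in enumerate([0.60, 0.72, 0.70, 0.75]):
    f1_checkpoint.update(epoch, f1)
print(f1_checkpoint.best_epoch, f1_checkpoint.best_value)
```

For pre-training the same tracker would be created with `mode="min"` to monitor `loss`.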



## Evaluation

### Metrics Descriptions

F1 metric
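
For reference, the macro F1 score is the unweighted mean of the per-class F1 scores. The sketch below computes it in plain Python. The function name `macro_f1` and the convention of scoring a class with no true or predicted instances as 0 are assumptions, and the toy labels are made up for illustration.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
classes = ["AF", "Normal", "Other", "Noise"]
y_true = ["AF", "AF", "Normal", "Normal", "Other", "Noise"]
y_pred = ["AF", "Normal", "Normal", "Normal", "Other", "Other"]
print(round(macro_f1(y_true, y_pred, classes), 4))  # 0.5333
```

Because every class contributes equally, a rare class that is classified poorly lowers the score as much as a common one, which matters for an imbalanced dataset.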

### Implementation code

#### Pre-training
The paper uses 95% of the patients' ECG data. On average, the paper samples 4096 ECG frames per patient, which amounts to 42.8 million (11000x0.95x4096) training samples over the course of pre-training. For pre-training with 20% of the data used in the paper, we use ECG data from 2048 patients and sample 4096 ECG frames per patient, resulting in roughly 8.4 million (2048x4096) training samples.
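
The sample counts above can be checked with a few lines of arithmetic. The constant names below are ours, introduced only for this sketch.

```python
TOTAL_PATIENTS = 11_000
PAPER_TRAIN_FRACTION = 0.95
FRAMES_PER_PATIENT = 4096
OUR_PATIENTS = 2048

# Number of training samples used by the paper and by our reduced run.
paper_samples = round(TOTAL_PATIENTS * PAPER_TRAIN_FRACTION) * FRAMES_PER_PATIENT
our_samples = OUR_PATIENTS * FRAMES_PER_PATIENT

print(f"paper: {paper_samples:,} samples")
print(f"ours:  {our_samples:,} samples")
print(f"ratio: {our_samples / paper_samples:.1%}")
```

The ratio comes out slightly under 20%, which is why we describe this run as using roughly 20% of the paper's pre-training data.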


In [None]:
TRAIN_DATASET = DATA_DIR + '/icentia11k'

The following cell uses the API to run pre-training with 20% of the data used in the paper. Uncomment the cell to run.

* `--job-dir`: output directory, where check points and weights are saved

* `--task`: pre-training task, "beat" for Beat classification

* `--train`: training dataset directory

* `--arch`: CNN architecture

* `--patient-ids`: IDs of the patients whose ECG data are used in pre-training

* `--frame-size`: number of ECG samples, at a 250 Hz sampling rate, in an ECG frame

* To use all of the data exactly once, choose the values so that number of patients x `--samples-per-patient` = `--epochs` x `--batch-size` x `--steps-per-epoch`.
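
The relation in the last bullet can be solved for the number of steps per epoch. The helper below is a small sketch with a hypothetical name (`steps_per_epoch`); it is not part of the repository's API.

```python
def steps_per_epoch(num_patients, samples_per_patient, epochs, batch_size):
    """Steps per epoch needed to consume every sample exactly once."""
    total_samples = num_patients * samples_per_patient
    samples_per_step_budget = epochs * batch_size
    if total_samples % samples_per_step_budget != 0:
        raise ValueError("total samples is not divisible by epochs * batch_size")
    return total_samples // samples_per_step_budget


# Values from the pre-training command in this section.
print(steps_per_epoch(num_patients=2048, samples_per_patient=4096, epochs=16, batch_size=512))  # 1024
```

This matches the `--steps-per-epoch 1024` setting used with 16 epochs and a batch size of 512.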

In [None]:
# !time python -m pretraining.trainer \
# --job-dir "jobs/beat_classification_16epochs_to_20percent_round3" \
# --task "beat" \
# --train $TRAIN_DATASET \
# --arch "resnet18" \
# --epochs  16\
# --patient-ids `seq 0 2047 | paste -sd, -` \
# --steps-per-epoch 1024 \
# --samples-per-patient 4096 \
# --batch-size 512 \
# --frame-size 4096

#### Fine Tuning

In [None]:
JOB_DIR = PROJECT_ROOT + '/jobs/fine_tune_random_resnet18'
FINETUNE_TRAIN = DATA_DIR + '/physionet_finetune/physionet_train.pkl'
FINETUNE_TEST = DATA_DIR + '/physionet_finetune/physionet_test.pkl'

The following cell uses the API to run fine-tuning with random initialization. Uncomment to run.
* `--weights-file $WEIGHTS_FILE`: Path to pre-trained weights or a checkpoint of the model to be used for model initialization. The model is randomly initialized if this is not specified.

* `--val-size 0.0625`: This is the fraction of the train set to set aside for the validation set. Note that the PhysioNet data was already split 80-20 train-test. The paper uses 5 percent of the full dataset for validation. We get this via 0.0625 x 0.8 = 0.05.

* `--val-metric "f1"`: Use the macro F1 score to evaluate performance on the validation set and to find the best model at each epoch.
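
To see what the `--val-size` setting implies in absolute numbers, the sketch below applies the 80/20 split and the 0.0625 validation fraction to the 8528 recordings. The helper `split_sizes` is hypothetical and uses simple rounding, so the counts produced by the actual trainer may differ by a recording or two.

```python
TOTAL_RECORDINGS = 8528
TEST_FRACTION = 0.2
VAL_SIZE = 0.0625


def split_sizes(total, test_fraction, val_size):
    """Approximate train/validation/test sizes using simple rounding."""
    n_test = round(total * test_fraction)
    n_train_full = total - n_test
    n_val = round(n_train_full * val_size)
    n_train = n_train_full - n_val
    return n_train, n_val, n_test


n_train, n_val, n_test = split_sizes(TOTAL_RECORDINGS, TEST_FRACTION, VAL_SIZE)
print(n_train, n_val, n_test)
print(f"validation share of full dataset: {n_val / TOTAL_RECORDINGS:.3f}")
```

The validation share comes out at about 5% of the full dataset, in line with the paper's setup.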

In [None]:
# %%time
# !python -m finetuning.trainer \
# --job-dir $JOB_DIR \
# --train $FINETUNE_TRAIN \
# --test $FINETUNE_TEST \
# --val-size 0.0625 \
# --val-metric "f1" \
# --arch "resnet18" \
# --batch-size 128 \
# --epochs 200 \
# --seed 2024 \
# --verbose

The following cell uses the API to run fine-tuning with pre-trained weights. Uncomment to run.

In [None]:
JOB_DIR = PROJECT_ROOT + '/jobs/fine_tune_pre_trained_20_resnet18'
WEIGHTS_FILE = PROJECT_ROOT + '/jobs/beat_classification/pre_trained_20_resnet18.weights'

In [None]:
# %%time
# !python -m finetuning.trainer \
# --job-dir $JOB_DIR \
# --train $FINETUNE_TRAIN \
# --test $FINETUNE_TEST \
# --weights-file $WEIGHTS_FILE \
# --val-size 0.0625 \
# --val-metric "f1" \
# --arch "resnet18" \
# --batch-size 128 \
# --epochs 200 \
# --seed 2024 \
# --verbose

# Results
We compare the validation F1 of the model with randomly initialized weights against that of the model with pre-trained weights.





In [None]:
import pandas as pd
from matplotlib import pyplot as plt


random_result = pd.read_csv(PROJECT_ROOT + '/jobs/finetune_baseline_65sec/history.csv')
pretrain_20_result = pd.read_csv(PROJECT_ROOT + '/jobs/finetune_pretrain_20_weights_65sec/history.csv')

plt.plot(random_result['epoch'], random_result['f1'], color='tab:red', label='random')
plt.plot(pretrain_20_result['epoch'], pretrain_20_result['f1'], color='blue', label='pretrain 20')
plt.xlabel('Epoch')
plt.ylabel('Validation F1')
ax = plt.gca()
ax.set_xlim([0, 80])
ax.set_ylim([0.5, 1.0])
plt.legend(loc='upper right')

## Analysis
With pre-trained weights, the F1 score stabilizes much sooner than with random initialization. The model with pre-trained weights also outperforms the randomly initialized model for most of the training.
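
One way to make "stabilizes sooner" measurable is to find the first epoch after which the F1 score stays within a tolerance of its final value. The sketch below is one possible definition, not a metric from the paper; the function name, the tolerance, and the two curves are placeholders for illustration, not our results.

```python
def stabilization_epoch(scores, tolerance=0.02):
    """First epoch from which every later score stays within `tolerance` of the final score."""
    final = scores[-1]
    epoch = len(scores) - 1
    # Walk backwards until a score falls outside the tolerance band.
    for i in range(len(scores) - 1, -1, -1):
        if abs(scores[i] - final) > tolerance:
            break
        epoch = i
    return epoch


# Placeholder curves for illustration only.
fast_curve = [0.50, 0.70, 0.75, 0.76, 0.75, 0.76]
slow_curve = [0.40, 0.50, 0.60, 0.68, 0.73, 0.74]
print(stabilization_epoch(fast_curve), stabilization_epoch(slow_curve))  # 2 4
```

Applied to the `f1` column of each `history.csv`, a lower value for the pre-trained model would support the observation above.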

## Plan
The paper runs fine-tuning 10 times for each weight initialization method and then plots the F1 versus epoch graph. We ran fine-tuning only once for each weight initialization method. We can run fine-tuning more times and then compare against Figure 3 (a) in the paper again.
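
Once several fine-tuning runs are available, their F1 curves can be summarized per epoch as a mean and a standard deviation. The sketch below shows one way to do that with the standard library. The function name `aggregate_runs` is hypothetical and the three curves are placeholders, not results.

```python
from statistics import mean, stdev


def aggregate_runs(runs):
    """Per-epoch (mean, standard deviation) across runs, truncated to the shortest run."""
    n_epochs = min(len(run) for run in runs)
    summary = []
    for epoch in range(n_epochs):
        values = [run[epoch] for run in runs]
        summary.append((mean(values), stdev(values)))
    return summary


# Placeholder F1 curves from three hypothetical runs.
runs = [
    [0.5, 0.6, 0.7],
    [0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6],
]
for epoch, (avg, spread) in enumerate(aggregate_runs(runs)):
    print(f"epoch {epoch}: mean F1 = {avg:.3f}, std = {spread:.3f}")
```

Plotting the mean with a band of one standard deviation would give a figure that is directly comparable to the averaged curves in the paper.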

## Model comparison

In [None]:
# compare your model with others
# you don't need to re-run all other experiments, instead, you can directly refer the metrics/numbers in the paper

# Discussion

In this section, you should discuss your work and make a future plan. The discussion should address the following questions:
  * Assess whether the paper is reproducible.
  * If your results are negative, explain why the paper is not reproducible.
  * Describe “What was easy” and “What was difficult” during the reproduction.
  * Make suggestions to the authors or other reproducers on how to improve reproducibility.
  * Describe what you will do in the next phase.




In [None]:
# no code is required for this section
'''
if you want to use an image outside this notebook for explanation,
you can read and plot it here like the Scope of Reproducibility
'''

# Public GitHub Repo

Under construction, not required for draft

# References

1. Weimann, K., Conrad, T.O.F. Transfer learning for ECG classification. Sci Rep 11, 5251 (2021). https://doi.org/10.1038/s41598-021-84374-8
2. https://github.com/kweimann/ecg-transfer-learning
3. Kachuee, M., Fazeli, S., & Sarrafzadeh, M. ECG heartbeat classification: a deep transferable representation. In 2018 IEEE International Conference on Healthcare Informatics (ICHI). https://doi.org/10.1109/ichi.2018.00092 (2018).
4. Rahhal, M. A. et al. Deep learning approach for active classification of electrocardiogram signals. Inf. Sci. 345, 340–354. https://doi.org/10.1016/j.ins.2016.01.082 (2016).
5. Rajan, D., Beymer, D., & Narayan, G. Generalization Studies of Neural Network Models for Cardiac Disease Detection Using Limited Channel ECG (2019). arXiv:1901.03295.
6. J. Huang, B. Chen, B. Yao and W. He, “ECG Arrhythmia Classification Using STFT-Based Spectrogram and Convolutional Neural Network,” in IEEE Access, vol. 7

