Instructions to run the `finetune_runner.py` script to generate additional fine-tuning results for each of the 3 weights scenarios:

1. `random`
2. `10` (10% pre-train weights)
3. `20` (20% pre-train weights)

The percentages refer to the share of patients in the _Icentia11K_ dataset used for pre-training.

## Instructions

Prerequisites:

- Google Colab Pro
- V100 GPU


### Select instance

Select V100 GPU runtime.

### Clone Git repo

In [18]:
%cd /root
! git clone https://github.com/myles-i/DLH_TransferLearning.git
%cd DLH_TransferLearning

/root
/root/DLH_TransferLearning


### Install dependencies



In [24]:
%%capture
! pip install -r requirements.txt

### Mount Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Determine path to data and job directories on Drive

Example for me:

`/content/drive/MyDrive/DLHProject`

where `DLHProject` is the name of my shortcut for the shared folder.

In [2]:
PROJECT_DIR = '/content/drive/MyDrive/DLHProject'
DATA_DIR = PROJECT_DIR + '/data'
JOB_DIR = PROJECT_DIR + '/jobs'

Verify that the `PROJECT_DIR` is correct for you.

In [4]:
!ls $DATA_DIR

icentia11k		    icentia11k_subset_corrupted		physionet_data.zip
icentia11k_corrupted	    icentia11k_subset_unzipped		physionet_finetune
icentia11k_mini_subset	    icentia11k_subset.zip		physionet_preread
icentia11k_mini_subset.zip  physionet				session_checkpoint.dat
icentia11k_subset	    physionet_250hz_15000pad_norm_True	temp.torrent


In [5]:
!ls $JOB_DIR

beat_classification		    finetune_pretrain_20_weights_65sec
draft_demo			    finetune_random_cnn_original_data
finetune_baseline_65sec		    finetune_random_cnn_original_data_with_f1
finetune_pretrain_10_weights_65sec
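The same check can be done in Python before moving on. This is only a sketch: `missing_dirs` is a hypothetical helper, not part of the repo, and it simply reports which of the given paths are not existing directories.

```python
import os


def missing_dirs(paths):
    """Return the subset of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]
```

With the variables defined above, `missing_dirs([PROJECT_DIR, DATA_DIR, JOB_DIR])` should return an empty list once Drive is mounted and `PROJECT_DIR` is correct for you.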


Set up the paths to the input PhysioNet fine-tuning dataset (65 second).

I've also provided the path to the 60 second dataset in case we cover that for our project.

In [10]:
# This is 65 second long sampling
INPUT_DATA_DIR_NAME = 'physionet_finetune'
# This is 60 second long sampling, if we follow paper strictly.
# INPUT_DATA_DIR_NAME = 'physionet_250hz_15000pad_norm_True'
PHYSIONET_DATA_DIR = DATA_DIR + '/' + INPUT_DATA_DIR_NAME
PHYSIONET_TRAIN = PHYSIONET_DATA_DIR + "/physionet_train.pkl"
PHYSIONET_TEST = PHYSIONET_DATA_DIR + "/physionet_test.pkl"
!ls -lh $PHYSIONET_TRAIN
!ls -lh $PHYSIONET_TEST

-rw------- 1 root root 408M Mar 31 10:32 /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_train.pkl
-rw------- 1 root root 102M Mar 31 10:33 /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_test.pkl


### Run in Colab cell

Sample command for running the fine-tuning trainer.

You must be in the repo root directory.

Run the cell below, copy its output into a new cell, and fill in the bracketed placeholders as needed.

In [31]:
terminal_command = (
    f'%%time\n! python finetune_runner.py \\\n'
    f' --weights-type [fill in] \\\n'
    f' --job-base-dir {JOB_DIR} \\\n'
    f' --train {PHYSIONET_TRAIN} \\\n'
    f' --test {PHYSIONET_TEST} \\\n'
    f' --weights-file [do not provide if --weights-type is random] \\\n'
    # Optimal for V100 GPU
    f' --batch-size 128 \\\n'
    f' --epochs 200 \\\n'
    f' --seed [fill in, see the Experiments document] \\\n'
    f' --dryrun'
)
print(terminal_command)


%%time
! python finetune_runner.py \
 --weights-type [fill in] \
 --job-base-dir /content/drive/MyDrive/DLHProject/jobs \
 --train /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_train.pkl \
 --test /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_test.pkl \
 --weights-file [do not provide if --weights-type is random] \
 --batch-size 128 \
 --epochs 200 \
 --seed [fill in, see the Experiments document] \
 --dryrun
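The template above can also be assembled programmatically instead of edited by hand. The following is a minimal sketch, not part of the repo: `build_finetune_command` is a hypothetical helper that uses only the flags shown above, adds `--weights-file` only when one is given, and rejects it when the weights type is `random`.

```python
import shlex


def build_finetune_command(weights_type, seed, job_base_dir, train, test,
                           weights_file=None, batch_size=128, epochs=200,
                           dryrun=True):
    """Return a finetune_runner.py command line as one shell-quoted string.

    Hypothetical helper: it only mirrors the flags shown in the template.
    """
    # The template says not to pass --weights-file for random weights.
    if str(weights_type) == "random" and weights_file is not None:
        raise ValueError("--weights-file must be omitted for random weights")

    args = [
        "python", "finetune_runner.py",
        "--weights-type", str(weights_type),
        "--job-base-dir", job_base_dir,
        "--train", train,
        "--test", test,
    ]
    if weights_file is not None:
        args += ["--weights-file", weights_file]
    args += [
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--seed", str(seed),
    ]
    if dryrun:
        args.append("--dryrun")
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)
```

For example, `print(build_finetune_command('random', 1, JOB_DIR, PHYSIONET_TRAIN, PHYSIONET_TEST))` prints a dry-run command that can be pasted into a cell after `! `. Defaulting `dryrun` to `True` is a deliberate choice here, so that a real run has to be requested explicitly.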


**Strongly recommended:** run this command with `--dryrun` first as a sanity check. It prints the configured command and exits without training.

In [26]:
%%time
! python finetune_runner.py \
 --weights-type random \
 --job-base-dir /content/drive/MyDrive/DLHProject/jobs \
 --train /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_train.pkl \
 --test /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_test.pkl \
 --batch-size 128 \
 --epochs 200 \
 --seed 1 \
 --dryrun

Finetuning output job dir: /content/drive/MyDrive/DLHProject/jobs/finetune__random_seed1
Configured command:
python -u -m finetuning.trainer --job-dir /content/drive/MyDrive/DLHProject/jobs/finetune__random_seed1 --train /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_train.pkl --test /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_test.pkl --val-size 0.0625 --val-metric f1 --arch resnet18 --batch-size 128 --epochs 200
Dryrun -- Exiting.
CPU times: user 6.42 ms, sys: 1.08 ms, total: 7.5 ms
Wall time: 110 ms


It's possible to loop this command over several seeds. Make sure the dry run for a
single seed works before proceeding. The command below is also a dry run.

Here `seq 10 10 100` produces the 10 agreed-upon seeds (10, 20, ..., 100).

`-P 1` makes xargs run one job at a time; we can't run fine-tuning jobs in
parallel because the batch size was chosen to fill the V100's GPU memory.

`-I {}` tells xargs to replace each occurrence of `{}` in the command that
follows with an item read from standard input, here one seed per invocation.
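For those who prefer to stay in Python, the `seq | xargs -P 1` loop can be approximated as follows. This is a sketch under assumptions: `seed_sequence` and `run_sequentially` are hypothetical helpers, not part of the repo, and each command is expected to be an argument list such as `['python', 'finetune_runner.py', ...]`.

```python
import subprocess


def seed_sequence(first=10, step=10, last=100):
    """Mirror `seq first step last` for a positive step (inclusive of `last`)."""
    return list(range(first, last + 1, step))


def run_sequentially(commands, runner=subprocess.run):
    """Run each argument list to completion before starting the next one.

    This matches `xargs -P 1`: only one job is alive at a time.
    Returns the list of return codes, in the order the commands were run.
    """
    return_codes = []
    for argv in commands:
        completed = runner(argv)  # blocks until the process exits
        return_codes.append(completed.returncode)
    return return_codes
```

Because `subprocess.run` blocks until the child process exits, the jobs never overlap on the GPU. The `runner` parameter exists only so the loop can be exercised with a stand-in function before launching real jobs.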

In [None]:
! seq 10 10 100 | xargs -P 1 -I {} python finetune_runner.py \
 --weights-type random \
 --job-base-dir /content/drive/MyDrive/DLHProject/jobs \
 --train /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_train.pkl \
 --test /content/drive/MyDrive/DLHProject/data/physionet_finetune/physionet_test.pkl \
 --batch-size 128 \
 --epochs 200 \
 --seed '{}' \
 --dryrun

If the output above looks OK, remove (or comment out) the `--dryrun` flag and run it!

Note: this will produce a lot of output. The `finetune_runner.py` script saves
the total runtime to a file in the job's output directory, so there is no need
to search the output for the wall time.

To suppress the output, consider adding the `%%capture` cell magic at the top of the cell, e.g.

```
%%capture
! seq 10 10 100 | xargs ...
```