Instructions for running the `finetune_runner.py` script to generate additional fine-tuning results for each of the three weight scenarios:

1. `random`
2. `10` (10% pre-train weights)
3. `20` (20% pre-train weights)

The percentages refer to the share of patients in the _Icentia11K_ dataset used for pre-training.

## Instructions

Prerequisites:

- Google Colab Pro
- V100 GPU


### Select instance

Select V100 GPU runtime.
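
Before cloning the repo, it can help to confirm which GPU the runtime actually received. The following is a minimal sketch, assuming the `nvidia-smi` tool is on the path (as it normally is on Colab GPU runtimes); the helper name `detect_gpu_name` is hypothetical and not part of the repo.

```python
import shutil
import subprocess


def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    # Hypothetical helper, not part of DLH_TransferLearning.
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


gpu_name = detect_gpu_name()
print("GPU:", gpu_name if gpu_name else "none detected")
```

If the printed name does not mention V100, change the runtime type before continuing.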

### Clone Git repo

In [1]:
%cd /root
! git clone https://github.com/myles-i/DLH_TransferLearning.git
%cd DLH_TransferLearning

/root
Cloning into 'DLH_TransferLearning'...
remote: Enumerating objects: 578, done.
remote: Counting objects: 100% (51/51), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 578 (delta 25), reused 27 (delta 11), pack-reused 527
Receiving objects: 100% (578/578), 4.33 MiB | 20.81 MiB/s, done.
Resolving deltas: 100% (334/334), done.
/root/DLH_TransferLearning


### Install dependencies



In [2]:
%%capture
! pip install -r requirements.txt

### Mount Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Determine path to data and job directories on Drive

For example, mine is:

`/content/drive/MyDrive/DLHProject`

where `DLHProject` is the name of my shortcut for the shared folder.

In [5]:
# setup top level directory paths
PROJECT_DIR = '/content/drive/MyDrive/DLHProject'
DATA_DIR = PROJECT_DIR + '/data'
JOB_DIR = PROJECT_DIR + '/jobs'
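
A quick check that these directories are visible after mounting can catch a wrong shortcut name early. This is a sketch only: `missing_dirs` is a hypothetical helper, and the paths repeat the example values from the cell above.

```python
import os


def missing_dirs(paths):
    """Return the subset of paths that are not existing directories."""
    # Hypothetical helper, not part of the repo.
    return [p for p in paths if not os.path.isdir(p)]


# Same layout as the cell above; adjust PROJECT_DIR to your own shortcut name.
PROJECT_DIR = '/content/drive/MyDrive/DLHProject'
DATA_DIR = PROJECT_DIR + '/data'
JOB_DIR = PROJECT_DIR + '/jobs'

missing = missing_dirs([PROJECT_DIR, DATA_DIR, JOB_DIR])
if missing:
    print("Missing directories:", missing)
else:
    print("All project directories found.")
```

An empty result means Drive is mounted and the shared folder shortcut resolves as expected.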

In [35]:
import subprocess
def execute_and_print(command):
    """Run a shell command, stream its output line by line, and return its exit code."""
    # Merge stderr into stdout so errors are shown and an unread stderr pipe cannot block the process
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True, text=True)

    # Continuous output handling
    while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
            break
        if output:
            # rstrip keeps leading indentation in the streamed output
            print(output.rstrip())
    rc = process.poll()
    return rc

In [36]:
import os
dryrun = True # change this to False once you have verified everything looks good

# finetune/pretraining data paths (can vary depending on model used)
arch = "resnet18_2d"
PHYSIONET_TRAIN = DATA_DIR + "/physionet_finetune_spectrogram/physionet_train.pkl"
PHYSIONET_TEST = DATA_DIR + "/physionet_finetune_spectrogram/physionet_test.pkl"
PRETRAIN_BASE = JOB_DIR + "/spectrogram/pretraining/min_normalization_16epochs_to_20percent"
FINETUNE_BASE = JOB_DIR + "/spectrogram/finetuning/min_normalization_16epochs_to_20percent"
os.makedirs(FINETUNE_BASE, exist_ok=True)

# define which combinations of runs to do
weight_types = ["random",
                "10",
                "20"]

weight_files = ["",
                PRETRAIN_BASE + "/epoch_08/model.weights.index",
                PRETRAIN_BASE + "/epoch_16/model.weights.index"]

# Loop through each weight type and corresponding weight file
for weight_type, weight_file in zip(weight_types, weight_files):
    # Assert that the required datasets exist
    assert os.path.exists(PHYSIONET_TRAIN), f"PHYSIONET_TRAIN does not exist: {PHYSIONET_TRAIN}"
    assert os.path.exists(PHYSIONET_TEST), f"PHYSIONET_TEST does not exist: {PHYSIONET_TEST}"
    if weight_type != "random":
        assert os.path.exists(weight_file), f"weight_file does not exist: {weight_file}"

    # Define the command to run
    command = (
        f"seq 10 10 100 | xargs -P 1 -I {{}} python finetune_runner.py \\\n"
        f"--arch {arch} \\\n"
        f"--weights-type {weight_type} \\\n"
        f"--weights-file {weight_file} \\\n"
        # derive the job base dir from JOB_DIR instead of hard-coding the Drive path
        f"--job-base-dir {JOB_DIR}/spectrogram \\\n"
        f"--train {PHYSIONET_TRAIN} \\\n"
        f"--test {PHYSIONET_TEST} \\\n"
        f"--batch-size 128 \\\n"
        f"--epochs 200 \\\n"
        f"--seed '{{}}' \\\n"
        f"--dryrun {dryrun}"
    )

    # Execute the command and stream the output here
    print(command)

    # make sure output is printed to screen immediately
    execute_and_print(command)



seq 10 10 100 | xargs -P 1 -I {} python finetune_runner.py \
--arch resnet18_2d \
--weights-type random \
--weights-file  \
--job-base-dir /content/drive/MyDrive/DLHProject/jobs/spectrogram \
--train /content/drive/MyDrive/DLHProject/data/physionet_finetune_spectrogram/physionet_train.pkl \
--test /content/drive/MyDrive/DLHProject/data/physionet_finetune_spectrogram/physionet_test.pkl \
--batch-size 128 \
--epochs 200 \
--seed '{}' \
--dryrun True
Finetuning output job dir: /content/drive/MyDrive/DLHProject/jobs/spectrogram/finetune__random_seed10
Configured command:
python -u -m finetuning.trainer --job-dir /content/drive/MyDrive/DLHProject/jobs/spectrogram/finetune__random_seed10 --train /content/drive/MyDrive/DLHProject/data/physionet_finetune_spectrogram/physionet_train.pkl --test /content/drive/MyDrive/DLHProject/data/physionet_finetune_spectrogram/physionet_test.pkl --val-size 0.0625 --val-metric f1 --arch resnet18_2d --batch-size 128 --epochs 200 --seed 10
Dryrun -- Exiting.
Fin
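
The `seq 10 10 100 | xargs -P 1 -I {}` pipeline runs `finetune_runner.py` once per seed (10, 20, ..., 100), one run at a time. A rough Python equivalent of that expansion is sketched below to make the sweep explicit. `build_finetune_commands` is a hypothetical helper, and leaving the `--weights-file` value off when it is empty is an assumption that mirrors what the shell passes for the `random` case.

```python
import shlex


def build_finetune_commands(arch, weight_type, weight_file, job_base_dir,
                            train, test, seeds=range(10, 101, 10),
                            batch_size=128, epochs=200, dryrun=True):
    """Build one finetune_runner.py argument list per seed (hypothetical helper)."""
    commands = []
    for seed in seeds:
        cmd = ["python", "finetune_runner.py",
               "--arch", arch,
               "--weights-type", weight_type,
               "--weights-file"]
        if weight_file:
            # Assumption: an empty weight file means the flag gets no value,
            # as in the shell command for the "random" scenario.
            cmd.append(weight_file)
        cmd += ["--job-base-dir", job_base_dir,
                "--train", train,
                "--test", test,
                "--batch-size", str(batch_size),
                "--epochs", str(epochs),
                "--seed", str(seed),
                "--dryrun", str(dryrun)]
        commands.append(cmd)
    return commands


# Example values copied from the dry-run output of the "random" scenario.
base = "/content/drive/MyDrive/DLHProject"
commands = build_finetune_commands(
    arch="resnet18_2d",
    weight_type="random",
    weight_file="",
    job_base_dir=base + "/jobs/spectrogram",
    train=base + "/data/physionet_finetune_spectrogram/physionet_train.pkl",
    test=base + "/data/physionet_finetune_spectrogram/physionet_test.pkl",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line could be passed to `execute_and_print` in turn, which keeps the runs sequential just as `-P 1` does.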