# Compare our model with nnUNet
[nnUNet](https://github.com/MIC-DKFZ/nnUNet) requires that images have the same geometry (same shape, spacing, etc.). This is not the case for the UKBB data, where the geometry differs between sections. Hence, I will split the data sectionwise and train a new model for each section.

The final models will most likely be patch-based and (hopefully) share similar parameters. Using those common parameters, I will design a patch-based, geometry-independent architecture that I can train on data from all sections.
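
The sectionwise split described above amounts to grouping images by their geometry. The following is a minimal sketch, assuming the shape and voxel spacing of each image have already been read from the NIfTI headers into plain tuples. The record layout, the helper name `group_by_geometry`, and the toy geometries are hypothetical and not part of the original pipeline.

```python
from collections import defaultdict


def group_by_geometry(records, decimals=3):
    """Group image records that share the same shape and (rounded) spacing.

    `records` is an iterable of (filename, shape, spacing) tuples; this layout
    is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for filename, shape, spacing in records:
        # round the spacing so that floating point noise does not split a group
        key = (tuple(shape), tuple(round(s, decimals) for s in spacing))
        groups[key].append(filename)
    return dict(groups)


# toy example with made-up geometries
records = [
    ("a.nii.gz", (224, 168, 64), (2.232, 2.232, 3.0)),
    ("b.nii.gz", (224, 174, 44), (2.232, 2.232, 4.5)),
    ("c.nii.gz", (224, 168, 64), (2.2320001, 2.232, 3.0)),
]
for geometry, files in group_by_geometry(records).items():
    print(geometry, files)
```

Each resulting group could then become its own nnUNet dataset, one per section.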

In [1]:
import pandas as pd
import os
import json

# private libraries
import sys

if "../scripts" not in sys.path:
    sys.path.insert(1, "../scripts")
import config

## Prepare dataset folder structure
A guide to the required folder structure can be found [here](https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/dataset_format.md).

To avoid keeping multiple copies of the same files, I will replicate the folder structure with soft links (symlinks).
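
One practical caveat with soft links is that `os.symlink` raises `FileExistsError` when the destination already exists, so re-running a linking cell fails. A minimal sketch of a re-runnable variant follows; the helper name `force_symlink` is hypothetical, and the demo assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import tempfile


def force_symlink(src, dst):
    """Create dst as a symlink to src, replacing an existing link at dst."""
    if os.path.islink(dst):
        os.remove(dst)  # drop a stale or dangling link from an earlier run
    os.symlink(src, dst)


# demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "image.nii.gz")
    open(target, "w").close()
    link = os.path.join(tmp, "UKBB_0001_0000.nii.gz")
    force_symlink(target, link)
    force_symlink(target, link)  # second call does not raise FileExistsError
    demo_ok = os.path.islink(link) and os.path.realpath(link) == os.path.realpath(target)
print(demo_ok)
```

The helper only replaces existing links. If a regular file sits at the destination, `os.symlink` still raises, which is deliberate so that real data is never deleted.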

### Folders for Self-supervised Training on Presegmented Labels

In [2]:
train = pd.read_csv(config.ukbb + "train.csv")
valid = pd.read_csv(config.ukbb + "valid.csv")
data = pd.concat((train, valid)).reset_index(drop=True)

In [4]:
PATH = config.ukbb_cache + "nnUNet/nnUNet_raw/"

# create directories (missing parents are created; existing folders are kept)
_dir = PATH + "Dataset004_allSections"
os.makedirs(_dir + "/imagesTr", exist_ok=True)
os.makedirs(_dir + "/labelsTr", exist_ok=True)

# link images and labels
for i, row in data.reset_index().iterrows():
    _id = str(i + 1).zfill(4)
    # image
    os.symlink(
        src=config.ukbb + "nifti/" + row["image"],
        dst=_dir + "/imagesTr/" + f"UKBB_{_id}_0000.nii.gz",
    )

    # label
    os.symlink(
        src=config.ukbb + "preds_combined2/" + row["label"],
        dst=_dir + "/labelsTr/" + f"UKBB_{_id}.nii.gz",
    )

# dataset.json: channel names, label map (background + 40 classes), case count, file ending
dataset_json = {
    "channel_names": {"0": "MR"},
    "labels": {"background": 0} | {str(i): i for i in range(1, 41)},
    "numTraining": len(data),
    "file_ending": ".nii.gz",
}
with open(_dir + "/dataset.json", "w") as f:
    json.dump(dataset_json, f)
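
In this layout, every label `CASE.nii.gz` is paired with an image `CASE_0000.nii.gz`. Before handing the folder to nnUNet, a quick consistency check of the links can catch missing pairs early. This is a rough sketch; the helper name `find_unpaired_cases` is hypothetical and it only compares file names, not file contents.

```python
import os
import tempfile


def find_unpaired_cases(dataset_dir, file_ending=".nii.gz", channel="0000"):
    """Return (images without label, labels without image) as sorted lists."""
    suffix = f"_{channel}{file_ending}"
    images = {
        f[: -len(suffix)]
        for f in os.listdir(os.path.join(dataset_dir, "imagesTr"))
        if f.endswith(suffix)
    }
    labels = {
        f[: -len(file_ending)]
        for f in os.listdir(os.path.join(dataset_dir, "labelsTr"))
        if f.endswith(file_ending)
    }
    return sorted(images - labels), sorted(labels - images)


# demo with empty placeholder files in a temporary dataset folder
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "imagesTr"))
    os.makedirs(os.path.join(tmp, "labelsTr"))
    for name in ("imagesTr/UKBB_0001_0000.nii.gz",
                 "imagesTr/UKBB_0002_0000.nii.gz",
                 "labelsTr/UKBB_0001.nii.gz"):
        open(os.path.join(tmp, name), "w").close()
    unpaired = find_unpaired_cases(tmp)
print(unpaired)
```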

### Folders for Fine-tuning on Annotated Data

In [5]:
PATH = config.ukbb_cache + "nnUNet/nnUNet_raw/"

# create directories (missing parents are created; existing folders are kept)
_dir = PATH + "Dataset005_finetuning"
os.makedirs(_dir + "/imagesTr", exist_ok=True)
os.makedirs(_dir + "/labelsTr", exist_ok=True)

# link images and labels
for i, row in data.reset_index().iterrows():
    _id = str(i + 1).zfill(4)
    # image
    os.symlink(
        src=row["image"],
        dst=_dir + "/imagesTr/" + f"UKBB_{_id}_0000.nii.gz",
    )

    # label
    os.symlink(
        src=row["label"],
        dst=_dir + "/labelsTr/" + f"UKBB_{_id}.nii.gz",
    )

# dataset.json: channel names, label map (background + 40 classes), case count, file ending
dataset_json = {
    "channel_names": {"0": "MR"},
    "labels": {"background": 0} | {str(i): i for i in range(1, 41)},
    "numTraining": len(data),
    "file_ending": ".nii.gz",
}
with open(_dir + "/dataset.json", "w") as f:
    json.dump(dataset_json, f)

## Execution Pipeline
### 1. Create virtual environment
```bash
cd $HOME
mamba create -n nnUNet python=3.11
mamba activate nnUNet
mamba install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
.conda/envs/nnUNet/bin/pip install nnunetv2
```

### 2. Export environment variables
```bash
export nnUnet_root="..."
export nnUNet_raw="$nnUnet_root/nnUNet_raw"
export nnUNet_preprocessed="$nnUnet_root/nnUNet_preprocessed"
export nnUNet_results="$nnUnet_root/nnUNet_results"
```
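
Since the nnUNet commands rely on these three variables, it can help to confirm that they are set before starting a long run. The following is a small sketch; the helper name `missing_nnunet_variables` and the example path are hypothetical.

```python
import os

REQUIRED = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def missing_nnunet_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


# demo with an explicit mapping instead of the real environment
print(missing_nnunet_variables({"nnUNet_raw": "/tmp/nnUNet_raw"}))
```

Calling `missing_nnunet_variables()` without arguments checks the environment of the current process.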

### 3. Fingerprint extraction, experiment planning and preprocessing
```bash
nnUNetv2_plan_and_preprocess -d 4 --verify_dataset_integrity
```

### 4. Start (self-supervised) Training
```bash
TARGET_DATASET=4
CUDA_VISIBLE_DEVICES=0 nnUNetv2_train $TARGET_DATASET 3d_fullres 0 & # train on GPU 0
CUDA_VISIBLE_DEVICES=1 nnUNetv2_train $TARGET_DATASET 3d_fullres 1 & # train on GPU 1
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train $TARGET_DATASET 3d_fullres 2 & # train on GPU 2
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train $TARGET_DATASET 3d_fullres 3 & # train on GPU 3
CUDA_VISIBLE_DEVICES=4 nnUNetv2_train $TARGET_DATASET 3d_fullres 4 & # train on GPU 4
wait
```
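
The five lines above follow one pattern: fold `k` runs on GPU `k`. If the runs are launched from Python instead of the shell, the same pattern might be generated as sketched below. The helper name `build_fold_commands` is hypothetical; it only builds the commands and does not start any training.

```python
import shlex


def build_fold_commands(dataset_id, configuration="3d_fullres", folds=range(5)):
    """Return one (env, argv) pair per fold, pinning fold k to GPU k."""
    jobs = []
    for fold in folds:
        env = {"CUDA_VISIBLE_DEVICES": str(fold)}
        argv = ["nnUNetv2_train", str(dataset_id), configuration, str(fold)]
        jobs.append((env, argv))
    return jobs


# print the equivalent shell lines for dataset 4
for env, argv in build_fold_commands(4):
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", shlex.join(argv))
```

Each pair could then be passed to `subprocess.Popen(argv, env={**os.environ, **env})`, followed by a `wait()` on every process.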

### 5. Transfer Plans for Fine-Tuning
```bash
TARGET_DATASET=5
SOURCE_DATASET=4
TARGET_PLANS_IDENTIFIER=nnUNetPlans_finetuning
SOURCE_PLANS_IDENTIFIER=nnUNetPlans

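# the target dataset needs its own fingerprint before the transferred plans can be used
nnUNetv2_extract_fingerprint -d $TARGET_DATASET
nnUNetv2_move_plans_between_datasets -t $TARGET_DATASET -s $SOURCE_DATASET -tp $TARGET_PLANS_IDENTIFIER -sp $SOURCE_PLANS_IDENTIFIER
nnUNetv2_preprocess -d $TARGET_DATASET -plans_name $TARGET_PLANS_IDENTIFIER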
```

### 6. Start (fine-tuning) Training
```bash
# Use a new virtual environment in which nnUNet uses a lowered learning rate
mamba activate nnUNet_finetuning

TARGET_DATASET=5
TARGET_PLANS_IDENTIFIER=nnUNetPlans_finetuning  # plans transferred in step 5
PATH_TO_CHECKPOINT="$nnUnet_root/nnUNet_results/Dataset004_allSections/nnUNetTrainer__nnUNetPlans__3d_fullres/fold_0/checkpoint_best.pth"

CUDA_VISIBLE_DEVICES=0 nnUNetv2_train $TARGET_DATASET 3d_fullres 0 -p $TARGET_PLANS_IDENTIFIER -pretrained_weights "$PATH_TO_CHECKPOINT" & # train on GPU 0
CUDA_VISIBLE_DEVICES=1 nnUNetv2_train $TARGET_DATASET 3d_fullres 1 -p $TARGET_PLANS_IDENTIFIER -pretrained_weights "$PATH_TO_CHECKPOINT" & # train on GPU 1
CUDA_VISIBLE_DEVICES=2 nnUNetv2_train $TARGET_DATASET 3d_fullres 2 -p $TARGET_PLANS_IDENTIFIER -pretrained_weights "$PATH_TO_CHECKPOINT" & # train on GPU 2
CUDA_VISIBLE_DEVICES=3 nnUNetv2_train $TARGET_DATASET 3d_fullres 3 -p $TARGET_PLANS_IDENTIFIER -pretrained_weights "$PATH_TO_CHECKPOINT" & # train on GPU 3
CUDA_VISIBLE_DEVICES=4 nnUNetv2_train $TARGET_DATASET 3d_fullres 4 -p $TARGET_PLANS_IDENTIFIER -pretrained_weights "$PATH_TO_CHECKPOINT" & # train on GPU 4
wait
mamba deactivate
```
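
The checkpoint path used above follows the pattern `<results>/<dataset>/<trainer>__<plans>__<configuration>/fold_<k>/checkpoint_best.pth`. The sketch below assembles such a path from its parts. The helper name `checkpoint_path` and the root `/data/nnUNet_results` are hypothetical placeholders; the pattern itself is taken from the path in this notebook.

```python
from pathlib import PurePosixPath


def checkpoint_path(results_root, dataset_name, fold, trainer="nnUNetTrainer",
                    plans="nnUNetPlans", configuration="3d_fullres",
                    checkpoint="checkpoint_best.pth"):
    """Assemble the path of a checkpoint inside the results folder."""
    run = f"{trainer}__{plans}__{configuration}"
    return PurePosixPath(results_root) / dataset_name / run / f"fold_{fold}" / checkpoint


# placeholder root; substitute the value of $nnUNet_results
print(checkpoint_path("/data/nnUNet_results", "Dataset004_allSections", 0))
```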

### 7. Inference
```bash
INPUT_FOLDER="$nnUnet_root/data/images"
OUTPUT_FOLDER="$nnUnet_root/data/preds_ft"
DATASET_NAME_OR_ID=5
CONFIGURATION="3d_fullres"
PLANS_IDENTIFIER=nnUNetPlans_finetuning  # must match the plans used for fine-tuning

nnUNetv2_predict -i "$INPUT_FOLDER" -o "$OUTPUT_FOLDER" -d $DATASET_NAME_OR_ID -c $CONFIGURATION -p $PLANS_IDENTIFIER
```