# Get Started for GDP-HMM Challenge

This tutorial offers a quick start for training a 3D dose prediction model. Participants are encouraged to bring more advanced techniques to improve on the baseline. 

If you prefer not to work in a Jupyter notebook, you can run [train.py](train.py) directly from the command line as below (after installing the necessary packages): 

```
python train.py config_files/config.yaml
```

or 

```
python train_lightning.py config_files/config.yaml
```

where [config.yaml](config_files/config.yaml) summarizes all the important hyperparameters. The Lightning version supports multi-process and multi-GPU training directly. 

After training has finished, set the pre-trained model path in the `config_infer.yaml` file, then run the command below.

```
python inference.py config_files/config_infer.yaml
```

Want more details? Please read on. 



## 0. Before You Start

**Step 1: Register for the challenge**. 

Go to the <a href="https://qtim-challenges.southcentralus.cloudapp.azure.com/competitions/38/" target="_blank">challenge platform</a>: 

1.1 Create an account on the platform; 

1.2 Go to "My Submissions", read the terms carefully, and register for the challenge.

**Step 2: Download data/model resources**. 

2.1 Download the data (and pre-trained models) from Hugging Face (you need to submit your registration on the challenge platform first). 

[Data](https://huggingface.co/datasets/Jungle15/GDP-HMM_Challenge)

[Model](https://huggingface.co/Jungle15/GDP-HMM_baseline)

2.2 [Optional] Data and prediction samples can be downloaded from [OneDrive](https://1drv.ms/f/c/347c1b40c8c6e5ec/Ej5OQVE_APpOnNuP-ZXpnZcBnr_-ix5W-twQcYIJ-dvW2A?e=YcBSPF); put them into the `data` and `results` folders, respectively. This is not the whole dataset for the challenge. 

2.3 Change the `npz_path` entries in `meta_files/meta_data.csv` to match the data path on your local machine.  
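
Step 2.3 can be scripted rather than done by hand. The following is a minimal sketch, assuming the CSV has an `npz_path` column (as named above) and that only a leading directory prefix needs to change. The helper name `rebase_npz_paths`, the `other_column` field, and both example roots are hypothetical and only serve the illustration.

```python
import csv
import io


def rebase_npz_paths(csv_text, old_root, new_root, column="npz_path"):
    """Return CSV text in which a leading `old_root` in `column` is replaced by `new_root`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        path = row[column]
        # Only touch paths that really start with the old prefix.
        if path.startswith(old_root):
            row[column] = new_root + path[len(old_root):]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-row example; the real meta file may contain other columns.
example = "npz_path,other_column\n/old/root/a.npz,1\n/old/root/sub/b.npz,2\n"
print(rebase_npz_paths(example, "/old/root", "/data/gdp_hmm"))
```

To apply this to the real file, read `meta_files/meta_data.csv` as text, pass it through the function, and write the result back after keeping a backup copy.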

## 1. Python Environment

The baseline has been tested with Python 3.10, PyTorch 2.1.2, and MONAI 1.4.0. Similar versions should work but have not been tested by the organizers. 
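
To compare a local environment against the tested versions, a small check like the one below may help. It is a sketch that assumes the packages are installed under their usual distribution names `torch` and `monai`; the helper `installed_version` is a hypothetical name introduced here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions the baseline was tested with, taken from the text above.
tested = {"torch": "2.1.2", "monai": "1.4.0"}
for name, wanted in tested.items():
    print(f"{name}: installed={installed_version(name)}, tested={wanted}")
```

A mismatch is not necessarily a problem, since similar versions are expected to work.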

## 2. Install MedNeXt as the network backbone

In the baseline, we choose [MedNeXt](https://github.com/MIC-DKFZ/MedNeXt) as the backbone. One major reason is that MedNeXt has achieved top performance in recently released **external** testing benchmarks, including [TouchStone (NeurIPS 2024)](https://github.com/MrGiovanni/Touchstone) and [nnUnet revisited (MICCAI 2024)](https://arxiv.org/abs/2404.09556). MedNeXt is still a CNN-based architecture, yet in these external testing benchmarks it has consistently beaten all the other Transformer and Mamba architectures, sometimes by a large margin. 

Please follow the [MedNeXt official instructions](https://github.com/MIC-DKFZ/MedNeXt) to install and use it. They are detailed and easy to follow. For example, you can install it with the commands below: 

```
git clone https://github.com/MIC-DKFZ/MedNeXt.git mednext
cd mednext
pip install -e .
```

## 3. Import necessary packages and hyperparameters

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import yaml


from nnunet_mednext import create_mednext_v1

# Import data_loader from the current directory
import sys
sys.path.append('.')
import data_loader

# Load the hyperparameters; the `with` block closes the file afterwards
with open('config_files/config_dummy.yaml') as f:
    cfig = yaml.load(f, Loader=yaml.FullLoader)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


The config has two major parts, `loader_params` and `model_params`. Their contents are shown below. 

In [2]:
cfig['loader_params']

{'train_bs': 1,
 'val_bs': 4,
 'csv_root': 'meta_files/meta_data_dummy.csv',
 'scale_dose_dict': 'meta_files/PTV_DICT.json',
 'pat_obj_dict': 'meta_files/Pat_Obj_DICT.json',
 'num_workers': 2,
 'down_HU': -1000,
 'up_HU': 1000,
 'denom_norm_HU': 500,
 'in_size': [96, 128, 144],
 'out_size': [96, 128, 144],
 'norm_oar': True,
 'CatStructures': False,
 'dose_div_factor': 10}

In [3]:
cfig['model_params']

{'num_input_channels': 8,
 'out_channels': 1,
 'model_id': 'B',
 'kernel_size': 3,
 'deep_supervision': False}
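
Before launching a long run, it can be useful to sanity-check an edited config. The sketch below uses only key names that appear in the outputs above. The function `check_config` and the specific checks are assumptions made for illustration, not rules enforced by the baseline code.

```python
def check_config(cfig):
    """Return a list of problems found in a config dict shaped like the outputs above."""
    problems = []
    for section in ("loader_params", "model_params"):
        if section not in cfig:
            problems.append(f"missing section: {section}")

    loader = cfig.get("loader_params", {})
    for key in ("in_size", "out_size"):
        size = loader.get(key)
        ok = (
            isinstance(size, list)
            and len(size) == 3
            and all(isinstance(s, int) and s > 0 for s in size)
        )
        if not ok:
            problems.append(f"loader_params.{key} should be three positive integers")

    down, up = loader.get("down_HU"), loader.get("up_HU")
    if down is None or up is None or down >= up:
        problems.append("loader_params.down_HU must be below up_HU")
    return problems


# Values copied from the outputs above (subset of keys).
example = {
    "loader_params": {"in_size": [96, 128, 144], "out_size": [96, 128, 144],
                      "down_HU": -1000, "up_HU": 1000},
    "model_params": {"num_input_channels": 8, "out_channels": 1},
}
print(check_config(example) or "config looks consistent")
```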

## 3. Data loader for this challenge

When getting started, the data loader may be the most difficult part for most participants. Do not worry, we will help you here! 

We include a complete data loader script in [data_loader.py](data_loader.py), with an explanation of each input and parameter. You can test the data loader on its own by running 

```
python data_loader.py
```

If you want to visualize the 3D data and Dose-Volume Histograms (DVHs) with Python, we provide a Jupyter notebook [here](data_visual_understand.ipynb). 

If you want to know more about the data preprocessing and adjust it if needed, we provide code [here](geometry_creation.ipynb). 

To load the data in a deep learning framework, you can use the code below: 

In [4]:
loaders = data_loader.GetLoader(cfig = cfig['loader_params'])
train_loader = loaders.train_dataloader()
val_loader = loaders.val_dataloader()

## 4. Network structure

As mentioned earlier, we use MedNeXt as the backbone. Please follow the MedNeXt official instructions to adjust the structure. The example we use is shown below: 

In [5]:
model = create_mednext_v1(
    num_input_channels=cfig['model_params']['num_input_channels'],
    num_classes=cfig['model_params']['out_channels'],
    model_id=cfig['model_params']['model_id'],          # S, B, M and L are valid model ids
    kernel_size=cfig['model_params']['kernel_size'],    # 3x3x3 and 5x5x5 were tested in publication
    deep_supervision=cfig['model_params']['deep_supervision'],
).to(device)

## 5. Define loss function and optimizer

In [6]:
optimizer = optim.Adam(model.parameters(), lr=cfig['lr'])
criterion = nn.L1Loss()

## 6. Training 

You are now ready to run the training loop.

In [7]:
for epoch in range(cfig['num_epochs']):
    model.train()
    for i, data_dict in enumerate(train_loader):
        # Forward pass
        outputs = model(data_dict['data'].to(device))
        loss = criterion(outputs, data_dict['label'].to(device))
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"Epoch [{epoch+1}/{cfig['num_epochs']}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}")


Epoch [1/10], Step [1/2], Loss: 21.6373
Epoch [1/10], Step [2/2], Loss: 4.7551
Epoch [2/10], Step [1/2], Loss: 4.4043
Epoch [2/10], Step [2/2], Loss: 26.0072
Epoch [3/10], Step [1/2], Loss: 23.5479
Epoch [3/10], Step [2/2], Loss: 4.1487
Epoch [4/10], Step [1/2], Loss: 20.6044
Epoch [4/10], Step [2/2], Loss: 3.8255
Epoch [5/10], Step [1/2], Loss: 3.8469
Epoch [5/10], Step [2/2], Loss: 19.6087
Epoch [6/10], Step [1/2], Loss: 3.7697
Epoch [6/10], Step [2/2], Loss: 19.5941
Epoch [7/10], Step [1/2], Loss: 22.0368
Epoch [7/10], Step [2/2], Loss: 3.2463
Epoch [8/10], Step [1/2], Loss: 3.3352
Epoch [8/10], Step [2/2], Loss: 20.9835
Epoch [9/10], Step [1/2], Loss: 19.2807
Epoch [9/10], Step [2/2], Loss: 3.0964
Epoch [10/10], Step [1/2], Loss: 18.6419
Epoch [10/10], Step [2/2], Loss: 3.0771
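
The per-step loss in the log above swings between roughly 3 and 26 from one step to the next. With `train_bs` set to 1 and only two steps per epoch, this probably reflects differences between the individual samples rather than unstable training. Averaging per epoch gives a curve that is easier to read. The sketch below shows one way to do that; the function name `epoch_means` is hypothetical.

```python
from collections import defaultdict


def epoch_means(step_losses):
    """Average per-step losses within each epoch.

    `step_losses` is an iterable of (epoch, loss) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for epoch, loss in step_losses:
        totals[epoch] += loss
        counts[epoch] += 1
    return {epoch: totals[epoch] / counts[epoch] for epoch in totals}


# First two epochs copied from the log above.
log = [(1, 21.6373), (1, 4.7551), (2, 4.4043), (2, 26.0072)]
for epoch, mean in epoch_means(log).items():
    print(f"Epoch {epoch}: mean L1 loss {mean:.4f}")
```

In the training loop, the pairs could be collected by appending `(epoch + 1, loss.item())` after each step.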


## 7. Timing for inference of the deep learning module

In this challenge, we impose a time constraint specifically on the deep learning module in dose prediction, reflecting its clinical application nature. Data preprocessing, however, is outside the scope of this challenge and can be optimized using C++/CUDA for significantly faster performance.  

The MedNeXt baseline we provided comprises approximately 10 million parameters and achieves an inference time of just 0.13 seconds. While we allow a relatively lenient inference time constraint of **3 seconds**, caution is advised when employing diffusion models, particularly if acceleration techniques are not utilized. For example, the default DDPM requires 1000 steps to generate results, which can easily exceed the time constraint. 
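
The arithmetic behind the diffusion warning can be made explicit. The sketch below assumes, purely for illustration, that each sampling step costs about as much as one forward pass of the baseline (0.13 seconds); a real denoising network may be faster or slower. The function names are hypothetical.

```python
import math


def total_time(num_passes, seconds_per_pass):
    """Total time if `num_passes` network evaluations run one after another."""
    return num_passes * seconds_per_pass


def max_passes(budget_seconds, seconds_per_pass):
    """Largest number of sequential passes that fits in the budget."""
    return math.floor(budget_seconds / seconds_per_pass)


per_pass = 0.13  # baseline inference time quoted above, in seconds
budget = 3.0     # challenge time limit, in seconds

print(f"1000 sequential passes: about {total_time(1000, per_pass):.0f} s")
print(f"passes that fit in {budget:.0f} s: {max_passes(budget, per_pass)}")
```

Under this assumption, a 1000-step sampler would take roughly 130 seconds, while only about 23 sequential passes fit within the limit.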

Please check the code below to get a sense of how inference time is calculated. Also, the peak inference GPU memory cannot exceed **24 GB** (the baseline is ~5.7 GB). 

***Solutions that exceed either the time constraint or the GPU memory constraint will be rejected!***



In [8]:
import time, os
model.eval()
data_dict = next(iter(train_loader))  # this is a dummy test, so it does not matter whether the train or test loader is used

print(f"the total parameters of the model is {sum(p.numel() for p in model.parameters())}")

with torch.no_grad():
    torch.cuda.empty_cache()
    print('----- skip first 20 times, to avoid delay because of running start ----')
    # Warm-up: 20 untimed forward passes
    for i in range(20):
        outputs = model(data_dict['data'].to(device))
    os.system('nvidia-smi')  # show current GPU memory usage (needs the NVIDIA driver tools)
    # Timed section: average over 20 forward passes
    start = time.time()
    for i in range(20):
        outputs = model(data_dict['data'].to(device))
    end = time.time()
    print(f"Time taken for average forward pass: {(end-start) / 20:.4f} seconds")
    assert (end-start) / 20 < 2  # stricter than the 3-second limit stated above
    

    

the total parameters of the model is 10526498
----- skip first 20 times, to avoid delay because of running start ----


sh: nvidia-smi: command not found


Time taken for average forward pass: 6.1510 seconds


AssertionError: 
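
The measurement pattern in the cell above (untimed warm-up calls followed by an average over timed calls) can be wrapped in a small reusable helper. The sketch below is framework-agnostic and adds an optional `sync` hook: CUDA kernels are launched asynchronously, so reading the clock without synchronizing can under-report GPU time. Whether the organizers' official measurement synchronizes is not stated here, so treat this as an illustration rather than the official procedure. The name `average_seconds` is hypothetical.

```python
import time


def average_seconds(fn, warmup=20, runs=20, sync=None):
    """Average wall-clock seconds per call of `fn`, after `warmup` untimed calls.

    `sync` is an optional zero-argument callable invoked before each clock read;
    on a GPU, `torch.cuda.synchronize` is the usual choice.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / runs


# Stand-in workload so the example runs without a model or GPU.
elapsed = average_seconds(lambda: sum(range(10_000)), warmup=2, runs=5)
print(f"average: {elapsed:.6f} s per call")
```

With the model from this tutorial, `fn` could be a lambda that calls `model` on an input already moved to `device`, run inside `torch.no_grad()`, with `sync=torch.cuda.synchronize` when a GPU is used.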

## 8. Training Regulations

To strengthen the challenge's objectives, participants are required to develop a generalizable model rather than separate models tailored to individual contexts. To ensure compliance, top-performing participants must submit their training and inference code for review by the organizers.

**Prohibited approaches include (but are not limited to):**

- Training separate models for different treatment techniques, such as one for IMRT and another for VMAT.
- Training separate models for different treatment sites, such as one for head-and-neck cancers and another for lung cancers.

**Rationale for this regulation:**

In real-world applications, many other contexts exist, including diverse treatment sites (e.g., prostate, breast, cervical, esophageal, and bladder cancers) and varying treatment geometries (e.g., combinations of IMRT and VMAT, such as RapidArc Dynamic). The goal is to develop a generalizable model capable of adapting to new contexts as more training data become available, rather than creating multiple context-specific models.

## 9. Start your development 

Congratulations! You have reached the end of the tutorial and should now have a sense of what the task involves. 

Here we provide just an example to help you get started. Some of the parameters are not optimal, and only a few examples are included in the csv file. 

Now, it is time for you to include more data from the challenge and use your AI expertise to get better results. 

We wish you a great experience with this challenge and the research beyond!

