# MONet-FL Tutorial 

## Federated Learning with MONet Bundle and NVIDIA Flare

This tutorial will guide you through the process of running an nnUNet experiment in a Federated Learning context, using the [MONet Bundle](https://github.com/SimoneBendazzoli93/MONet-Bundle) and the NVIDIA Flare API. The goal is to train a model on a dataset that is distributed across multiple sites, while ensuring data privacy and security.

Training is carried out with the nnUNet framework, while the Federated Learning capabilities of the NVIDIA Flare API coordinate the sites so that the data remains on the client side.

To showcase all the functionalities of the MONet Bundle in Federated Learning, we will use the Decathlon Spleen dataset and consider the Spleen segmentation as our task of reference. 

The Spleen dataset (Task09_Spleen) was obtained from the Medical Segmentation Decathlon challenge [(Simpson et al., 2019)](https://arxiv.org/abs/1902.09063).


### Model Training

In detail, the following steps will be performed:

0. Dataset Preparation: The dataset in the different sites will be prepared for training, harmonizing the data and creating the necessary files for training according to the nnUNet framework.
1. nnUNet Experiment Planning and Preprocessing: One of the sites will be selected as the main site, where the nnUNet experiment will be planned and the data will be preprocessed. This step has to be done only on one site, as the nnUNet plans will be shared with the other sites.
2. nnUNet Preprocessing: The data will be preprocessed according to the nnUNet plan in all the other sites.
3. nnUNet Training: The model will be trained on the data of all the sites, using the nnUNet framework and aggregating the local gradients from the different sites.

![](./images/Workflow.png)

### Single-Site Training
Single-Site training is also made available for the clients, to test the training process and evaluate the model at each individual site, establishing a baseline before moving on to the Federated Learning phase.
In this phase, Data Preparation, Experiment Planning and Preprocessing, and Training will be done on each site separately.

### Cross-Site Validation
After the training is completed, a cross-site validation will be performed to evaluate the model's performance across different sites. This step will ensure that the model is robust and generalizes well to data from different sites. The validation will be done by using a trained model from one site and evaluating it on the validation data from the other sites. In this step, we perform the inference from a trained model with MONAI Deploy, and then compute the metrics to evaluate the model's performance on external data.
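The tutorial does not name the metrics computed during cross-site validation. As an illustrative assumption only, the sketch below computes the Dice coefficient, a common choice for segmentation tasks, on two flattened binary masks. The function name `dice_score` and the toy masks are hypothetical and not part of the MONet Bundle.

```python
def dice_score(prediction, reference):
    """Dice coefficient between two equally sized binary masks (flat sequences of 0/1)."""
    if len(prediction) != len(reference):
        raise ValueError("Masks must have the same number of voxels")
    # Voxels labeled as foreground in both masks
    intersection = sum(1 for p, r in zip(prediction, reference) if p and r)
    total = sum(1 for p in prediction if p) + sum(1 for r in reference if r)
    # Convention chosen for this sketch: two empty masks count as a perfect match
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (made up for illustration): 2 overlapping voxels, 3 foreground voxels each
pred = [0, 1, 1, 1, 0, 0]
ref = [0, 1, 1, 0, 0, 1]
print(round(dice_score(pred, ref), 3))
```

In practice the masks would come from the NIfTI predictions and the site's reference labels; reading those files is outside the scope of this sketch.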

### Model Deployment
In addition, we will perform some specific steps to prepare the model for deployment:
- Convert the trained model to a TorchScript format, supported by MONAI Deploy.
- Package the model into a MONAI bundle.
- Upload the trained model to MLFlow.

### Prerequisites
This tutorial assumes that you have already installed the NVIDIA Flare API and have access to a Federated Learning cluster, with ``Lead`` role.

Additionally, all the sites should have the necessary data for training ready in a known folder location (not necessarily the same location across all sites).

To install the NVIDIA Flare API and the required PyTorch version, you can use the following commands:

```bash
pip install nvflare==2.4.0rc6 light-the-torch
ltt install torch==2.6.0
pip install cryptography==42 # Required version for NVFlare
pip install "monai[all]"
pip install fire monai-nvflare==0.2.4 odict pyhocon

pip install --no-deps git+https://github.com/SimoneBendazzoli93/MONAI.git@dev
pip install git+https://github.com/SimoneBendazzoli93/nnUNet.git
pip install git+https://github.com/SimoneBendazzoli93/monai-deploy-app-sdk.git@nifti-support
pip install highdicom

pip install batchgenerators==0.25 # Required version for nnUNet
```

The tutorial is designed to be executed within the [MAIA Platform](https://maia.app.cloud.cbh.kth.se). If you are running it somewhere else, you will need to adapt the paths accordingly.

## Configure the Federation in POC Mode

Before starting the tutorial, we need to configure the federation in NVFlare POC (Proof of Concept) mode. This mode allows the federation to be configured within a single site, which is useful for testing and development purposes.

In [None]:
%%bash
export PATH=$PATH:$HOME/.local/bin

#export NVFLARE_POC_WORKSPACE=$(pwd)/"NVFlare_POC"
export NVFLARE_POC_WORKSPACE="/home/maia-user/Data/NVFlare_POC"
nvflare poc prepare

In [None]:
%%bash

#export NVFLARE_POC_WORKSPACE=$(pwd)/"NVFlare_POC"
export NVFLARE_POC_WORKSPACE="/home/maia-user/Data/NVFlare_POC"

nvflare poc start

In [None]:
import os 
os.environ["NVFLARE_POC_WORKSPACE"]="/home/maia-user/Data/NVFlare_POC"

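Alternatively, the POC federation can be prepared and started directly from Python. The functions below are private NVFlare helpers (note the leading underscore), so their signatures may change between versions: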

In [None]:
from nvflare.tool.poc.poc_commands import _prepare_poc, _start_poc, _stop_poc, _clean_poc

_prepare_poc(
    ["site-1","site-2"],
    2,
    "NVFlare_POC",
)
_start_poc("NVFlare_POC",[0])

## Split the Data Across Sites

After starting the federation in POC mode, we need to split the data across the two sites. In this tutorial, we will randomly split the Decathlon Spleen dataset into two parts, one for each site.


In [None]:
def subfiles(folder_path, prefix=None, suffix=None, join=True):
    import os
    if prefix is None:
        prefix = ""
    if suffix is None:
        suffix = ""
    files = []
    for root, dirs, file in os.walk(folder_path):
        for f in file:
            if f.startswith(prefix) and f.endswith(suffix):
                if join:
                    files.append(os.path.join(root, f))
                else:
                    files.append(f)
    return files

In [None]:
import os
import tempfile
from monai.apps import DecathlonDataset

In [None]:
os.environ["MONAI_DATA_DIRECTORY"] = "/home/maia-user/Data"

In [None]:
directory = os.environ.get("MONAI_DATA_DIRECTORY")
if directory is not None:
    os.makedirs(directory, exist_ok=True)
root_dir = tempfile.mkdtemp() if directory is None else directory
print(root_dir)

In [None]:
DecathlonDataset(root_dir=root_dir, task="Task09_Spleen", section="training", download=True, cache_num=1)

In [None]:
from random import shuffle
import shutil
from pathlib import Path
image_dir = os.environ["MONAI_DATA_DIRECTORY"] + "/Task09_Spleen/imagesTr"
label_dir = os.environ["MONAI_DATA_DIRECTORY"] + "/Task09_Spleen/labelsTr"

# Site 1 will have the "Decathlon" dataset format
Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath("data/site-1/imagesTr").mkdir(parents=True, exist_ok=True)
Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath("data/site-1/labelsTr").mkdir(parents=True, exist_ok=True)

# Site 2 will have the "Subfolders" dataset format
Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath("data/site-2").mkdir(parents=True, exist_ok=True)

images = subfiles(image_dir, prefix="spleen", suffix=".nii.gz")
labels = subfiles(label_dir, prefix="spleen", suffix=".nii.gz")

print(f"Images: {len(images)}")
print(f"Labels: {len(labels)}")

shuffle(images)

for image in images[:len(images)//2]:
    shutil.copy(image, Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath("data/site-1/imagesTr"))
    shutil.copy(image.replace("imagesTr", "labelsTr"), Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath("data/site-1/labelsTr"))

for image in images[len(images)//2:]:
    id = image.split("/")[-1].split(".")[0]
    Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath(f"data/site-2/{id}").mkdir(parents=True, exist_ok=True)
    shutil.copy(image, Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath(f"data/site-2/{id}/{id}_CT.nii.gz"))
    shutil.copy(image.replace("imagesTr", "labelsTr"), Path(os.environ["NVFLARE_POC_WORKSPACE"]).joinpath(f"data/site-2/{id}/{id}_label.nii.gz"))
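The split above relies on an unseeded `shuffle`, so each run assigns different cases to each site. If a reproducible split is preferred, one option is to shuffle with a fixed seed. The following is a minimal sketch under that assumption; `split_cases` and the case IDs are hypothetical names used only for illustration.

```python
import random


def split_cases(case_ids, n_sites=2, seed=0):
    """Shuffle case IDs with a fixed seed and deal them into n_sites near-equal groups."""
    # Sort first so the result does not depend on the directory listing order
    ordered = sorted(case_ids)
    random.Random(seed).shuffle(ordered)
    # Site i receives every n_sites-th case, starting at position i
    return [ordered[i::n_sites] for i in range(n_sites)]


# Made-up case IDs, used only to exercise the function
cases = [f"spleen_{i}" for i in range(1, 8)]
site_1, site_2 = split_cases(cases, n_sites=2, seed=42)
print(len(site_1), len(site_2))
```

The same seed always yields the same assignment, which makes single-site baselines and federated runs easier to compare.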


## Download MONet Bundle

Next, we download the MONet Bundle, one per client:

In [None]:
%%bash

mkdir -p ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-1
mkdir -p ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2

wget  https://raw.githubusercontent.com/minnelab/MONet-Bundle/main/MONetBundle.zip -O ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-1/MONetBundle.zip
unzip -o ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-1/MONetBundle.zip -d ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-1
rm ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-1/MONetBundle.zip

wget  https://raw.githubusercontent.com/minnelab/MONet-Bundle/main/MONetBundle.zip -O ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2/MONetBundle.zip
unzip -o ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2/MONetBundle.zip -d ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2
rm ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2/MONetBundle.zip

cp -r ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/site-2/MONetBundle/src /home/maia-user/shared/src/Spleen/

## FL Cluster Authentication

In [None]:
from nvflare.fuel.flare_api.flare_api import new_secure_session
sess = new_secure_session(
    "admin@nvidia.com",
    "/home/maia-user/Data/NVFlare_POC/example_project/prod_00/admin@nvidia.com"
)

In [None]:
print(sess.get_system_info())

## List Jobs

In [None]:
jobs = sess.list_jobs()

In [None]:
jobs[-1]

In [None]:
job_id = jobs[-1]['job_id']

## Terminate Jobs

In [None]:
sess.abort_job(job_id)

## Job Preparation

The first step in the Federated Learning process is to prepare the job configuration files. The job configuration files contain the necessary information to run the job on the Federated Learning cluster, such as the job type, the resources required, and the parameters for the job execution.

The job configuration files are automatically generated by the script `nvflare_generate_job_configs`, which is installed together with this package. The script takes as input the client-specific configurations, together with the experiment-specific configuration, and generates the job configuration files for each client.


### Client Configuration
The client-specific configuration file should be in the following format:


```yaml
data_dir: "<DATASET_FOLDER>"
modality_dict:
  ct: "<CT_SUFFIX>"
  label: "<SEG_MASK_SUFFIX>"
dataset_format: "<DATASET_FORMAT>"
patient_id_in_file_identifier: True
nnunet_root_folder: "<NNUNET_ROOT_FOLDER>"
client_name: "<CLIENT_NAME>"
subfolder_suffix: "<SUBFOLDER_SUFFIX>"  # optional
bundle_root: "<BUNDLE_ROOT>"  # optional
# Optional parameters for MONAI Deploy Inference, we discuss them in the Model Deployment section
app_path: ""
app_model_path: ""
app_output_path: ""
model_name: ""
```

where:

`dataset_format` must be one of the following three formats, depending on how `data_dir` is organized:
1. `subfolders`: The dataset is organized in subfolders, where each subfolder corresponds to a subject and contains the images and labels for that subject.
```plaintext
  [Dataset_folder]
        [Subject_0]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        [Subject_1]
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
        ...

```
2. `decathlon`: The dataset is organized in the format of the Medical Decathlon challenge, where the images and labels are stored in separate folders.
```plaintext
  [Dataset_folder]
        [imagesTr]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
        [labelsTr]
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```

3. `nnunet`: The dataset has been already prepared according to the nnUNet framework, with the images and labels stored in separate folders.
```plaintext
  [nnUNet_raw]
      [DatasetXYZ_TaskName]  # THIS IS THE DATASET FOLDER
          dataset.json
          [imagesTr]
              - Subject_0_image0.nii.gz    # Subject_0 modality 0
              - Subject_0_image1.nii.gz    # Subject_0 modality 1
              - Subject_1_image0.nii.gz    # Subject_1 modality 0
              - Subject_1_image1.nii.gz    # Subject_1 modality 1
          [labelsTr]
              - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
              - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```

`nnunet_root_folder` should refer to the root folder used by the nnUNet framework, where the nnUNet experiments are stored.
For the `subfolders` and `decathlon` dataset formats, this folder is created during the dataset preparation step. 
For the `nnunet` dataset format, this folder should contain the nnUNet experiments, with the following structure:
```plaintext
  [nnunet_root_folder]
      [nnUNet_raw_data_base]
          [DatasetXYZ_TaskName]
              dataset.json
              [imagesTr]
                  - Subject_0_image0.nii.gz    # Subject_0 modality 0
                  - Subject_0_image1.nii.gz    # Subject_0 modality 1
                  - Subject_1_image0.nii.gz    # Subject_1 modality 0
                  - Subject_1_image1.nii.gz    # Subject_1 modality 1
              [labelsTr]
                  - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
                  - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
            ...

```

`modality_dict` is a dictionary that maps the modality names to the file suffixes. The suffixes are used to identify the files that correspond to the different modalities in the dataset. For example, if the CT images have the suffix `_CT.nii.gz`, the entry in the `modality_dict` should be `ct: "_CT.nii.gz"`.
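To illustrate how suffix-based matching could work, here is a minimal sketch. It is not the MONet Bundle implementation: the helper `match_modality` is hypothetical, and trying longer suffixes first is a design choice of this sketch so that a specific suffix such as `_label.nii.gz` is preferred over a generic `.nii.gz`.

```python
def match_modality(filename, modality_dict):
    """Return the modality whose suffix matches the file name, or None if none matches."""
    # Try longer suffixes first so the most specific suffix wins
    by_length = sorted(modality_dict.items(), key=lambda item: len(item[1]), reverse=True)
    for modality, suffix in by_length:
        if filename.endswith(suffix):
            return modality
    return None


modality_dict = {"ct": "_CT.nii.gz", "label": "_label.nii.gz"}
print(match_modality("spleen_10_CT.nii.gz", modality_dict))
print(match_modality("spleen_10_label.nii.gz", modality_dict))
```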


`patient_id_in_file_identifier` is a flag that specifies whether the patient ID is part of the file name. If set to `True`, the patient ID is extracted from the file name. If set to `False`, the patient ID is extracted from the file path, and the file name should contain only the modality suffix.

`client_name` is a unique identifier for the client.

`subfolder_suffix` is an optional parameter that specifies the suffix of the subfolders that contain the images and labels for each subject. This parameter is used when the dataset is organized in subfolders, and the subfolders have a specific suffix that needs to be removed to extract the patient ID.
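The interplay between `patient_id_in_file_identifier` and `subfolder_suffix` can be sketched as follows. This is an interpretation of the descriptions above, not the package's actual code: the function `patient_id`, its signature, and the example paths are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def patient_id(file_path, modality_suffix, patient_id_in_file_identifier=True, subfolder_suffix=""):
    """Derive a patient ID from a file path, following the rules described in the text."""
    path = PurePosixPath(file_path)
    if patient_id_in_file_identifier:
        # The ID is the file name with the modality suffix stripped
        name = path.name
        if modality_suffix and name.endswith(modality_suffix):
            return name[: -len(modality_suffix)]
        return name
    # Otherwise the ID is the parent folder name, minus the optional subfolder suffix
    folder = path.parent.name
    if subfolder_suffix and folder.endswith(subfolder_suffix):
        folder = folder[: -len(subfolder_suffix)]
    return folder


# Hypothetical paths used only to exercise both branches
print(patient_id("data/site-2/spleen_10/spleen_10_CT.nii.gz", "_CT.nii.gz"))
print(patient_id("data/spleen_10_scan/CT.nii.gz", "CT.nii.gz", False, "_scan"))
```

Both calls yield the same ID, `spleen_10`, from two different naming schemes.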

`bundle_root` is an optional parameter that specifies the root folder where the MONet Bundle is stored.

`app_path` is an optional parameter that specifies the path to the MONAI Deploy application to be used for inference. This parameter is used when the model is deployed using MONAI Deploy.

`app_model_path` is an optional parameter that specifies the path to the TorchScript model file to be used by the MONAI Deploy application. This parameter is used when the model is deployed using MONAI Deploy.

`app_output_path` is an optional parameter that specifies the path where the output of the MONAI Deploy Inference will be stored. This parameter is used when the model is deployed using MONAI Deploy.

`model_name` is an optional parameter that specifies the name of the model to be used by the MONAI Deploy application. This parameter is used when the model is deployed using MONAI Deploy.

### Experiment Configuration

The experiment-specific configuration file should be in the following format:

```yaml
dataset_name_or_id: "<DATASET_NAME_OR_ID>"
experiment_name: "<EXPERIMENT_NAME>"
tracking_uri: "<TRACKING_URI>"
mlflow_token: "<MLFLOW_TOKEN>"  # optional
nnunet_trainer: "<NNUNET_TRAINER>"  # optional
num_rounds: "<NUM_ROUNDS>"
start_round: "<START_ROUND>"
local_epochs: "<LOCAL_EPOCHS>"
server_bundle_root: "<SERVER_BUNDLE_ROOT>"  # optional
modality_list:
  - "<MODALITY_1>"
  - "<MODALITY_2>"
label_dict:
  class1: 1
# Extra parameters for the MONet Bundle
#bundle_extra_config:
#  resume_epoch: "latest"  # Optional, used to resume training from a specific epoch
#  region_based: True # Optional, used to enable region-based training
#  is_federated: True # Optional, used to enable federated training when preparing the bundle. Set to False if you want to prepare the bundle for single-site training
```

where:

`dataset_name_or_id` and `experiment_name` refer to the nnUNet Dataset ID and Experiment Name, respectively. These values are used to identify the nnUNet experiment in the nnUNet framework. `experiment_name` is also used to identify the experiment in the MLFlow server, and to generate the zipped nnUNet model file (as `<experiment_name>.zip`).


`mlflow_token` and `tracking_uri` are used to connect to the MLFlow server, where the experiments are logged, and the trained models are uploaded.

`nnunet_trainer` is an optional parameter that specifies the nnUNet trainer to be used for training. If this parameter is not specified, the default nnUNet trainer will be used.

`num_rounds` is the number of rounds to be executed in the Federated Learning process. This parameter is used to control the number of rounds of training that will be performed on the Federated Learning cluster.

`start_round` is the round number from which to start the training. This parameter is used to control the starting point of the training process.

`local_epochs` is the number of local epochs to be executed on each client. This parameter is used to control the number of epochs that will run on each client before the local model is sent to the server for aggregation.
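As a quick arithmetic check, the total number of epochs each client trains is the product of `num_rounds` and `local_epochs`, assuming the client takes part in every round. The helper name below is hypothetical.

```python
def local_epochs_per_client(num_rounds, local_epochs):
    """Total epochs a client trains if it participates in every round."""
    if num_rounds < 0 or local_epochs < 0:
        raise ValueError("num_rounds and local_epochs must be non-negative")
    return num_rounds * local_epochs


# Values from the Spleen experiment configuration used later in this tutorial
print(local_epochs_per_client(num_rounds=10, local_epochs=1))
```

With 10 rounds of 1 local epoch, each client trains for 10 epochs in total.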

`server_bundle_root` is the root folder where the MONAI Bundle is stored. This parameter is used to specify the location of the MONAI Bundle that will be used for model deployment.

`modality_list` is a list of the modalities that will be used for training. This parameter is used to specify the modalities that will be used in the nnUNet experiment.


`label_dict` is a dictionary that maps the class names to the class IDs. The class IDs are used to identify the different classes in the dataset. For example, if the dataset has two classes, `Class1` and `Class2`, with IDs 1 and 2, respectively, the entry can be written in YAML flow style as `label_dict: {Class1: 1, Class2: 2}`.
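The nnUNet `dataset.json` file expects a `labels` mapping that includes a `background` class with ID 0, and a `channel_names` mapping indexed by channel number. How the package derives these from `label_dict` and `modality_list` is not shown in this tutorial; the sketch below is one plausible mapping, with hypothetical helper names.

```python
def nnunet_labels(label_dict):
    """Prepend the background class (ID 0) to a class-name -> class-ID mapping."""
    if 0 in label_dict.values():
        raise ValueError("Class ID 0 is reserved for the background")
    return {"background": 0, **label_dict}


def channel_names(modality_list):
    """Map channel indices (as strings) to modality names, in list order."""
    return {str(index): modality for index, modality in enumerate(modality_list)}


print(nnunet_labels({"Spleen": 1}))
print(channel_names(["CT"]))
```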

### Prepare the Job Configuration Files

To prepare the job configuration files, call the following function:

```python
monai.nvflare.nvflare_generate_job_configs.generate_configs(client_files, experiment_file, script_dir, job_dir)
```
where:
- `client_files` is the list of client-specific configuration files, one per client.
- `experiment_file` is the experiment-specific configuration file.
- `script_dir` is the directory where the job scripts and Python files are stored.
- `job_dir` is the directory where the job configuration files will be stored.
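Before generating the job configurations, it can help to sanity-check each client configuration. The sketch below works on an already-loaded dictionary and is not part of the package. Treating every key not marked optional in the template as required is an assumption, and `validate_client_config` is a hypothetical helper.

```python
# Keys not marked optional in the client template (assumed to be required)
REQUIRED_KEYS = {
    "data_dir",
    "modality_dict",
    "dataset_format",
    "patient_id_in_file_identifier",
    "nnunet_root_folder",
    "client_name",
}
# The three dataset formats described in the Client Configuration section
DATASET_FORMATS = {"subfolders", "decathlon", "nnunet"}


def validate_client_config(config):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    dataset_format = config.get("dataset_format")
    if dataset_format is not None and dataset_format not in DATASET_FORMATS:
        problems.append(f"unknown dataset_format: {dataset_format!r}")
    if "modality_dict" in config and not isinstance(config["modality_dict"], dict):
        problems.append("modality_dict must map modality names to file suffixes")
    return problems


site_1 = {
    "data_dir": "/home/maia-user/Data/NVFlare_POC/data/site-1",
    "modality_dict": {"ct": ".nii.gz", "label": ".nii.gz"},
    "dataset_format": "decathlon",
    "patient_id_in_file_identifier": True,
    "nnunet_root_folder": "/home/maia-user/Data/NVFlare_POC/nnUNet",
    "client_name": "site-1",
}
print(validate_client_config(site_1))
```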

In [None]:
import sys
from pathlib import Path
 
ROOT_FOLDER = "/home/maia-user/shared"
sys.path.append(ROOT_FOLDER)

Path(ROOT_FOLDER).joinpath("Experiments").mkdir(parents=True, exist_ok=True)
Path(ROOT_FOLDER).joinpath("Clients").mkdir(parents=True, exist_ok=True)

In [None]:
%%writefile /home/maia-user/shared/Experiments/Spleen.yaml
dataset_name_or_id: 
    site-1: 
        id: "009"
        name: "Task09_Spleen-1"
    site-2:
        id: "010"
        name: "Task09_Spleen-2"
experiment_name: "Task09_Spleen"
tracking_uri: "http://localhost:5000"
nnunet_trainer: "nnUNetTrainer_10epochs"
num_rounds: 10
start_round: 0
local_epochs: 1
server_bundle_root: "/workspace/FedSpleen_Bundle"
modality_list:
- "CT"
label_dict:
    Spleen: 1
bundle_extra_config:
  is_federated: False
  resume_epoch: '"latest"'

In [None]:
%%writefile /home/maia-user/shared/Clients/site-1.yaml
data_dir: "/home/maia-user/Data/NVFlare_POC/data/site-1"
modality_dict:
  ct: ".nii.gz"
  label: ".nii.gz"
dataset_format: "decathlon"
patient_id_in_file_identifier: True
nnunet_root_folder: "/home/maia-user/Data/NVFlare_POC/nnUNet"
client_name: "site-1"
bundle_root: "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle"
app_path: ""
app_model_path: ""
app_output_path: ""
model_name: ""

In [None]:
%%writefile /home/maia-user/shared/Clients/site-2.yaml
data_dir: "/home/maia-user/Data/NVFlare_POC/data/site-2"
modality_dict:
  ct: "_CT.nii.gz"
  label: "_label.nii.gz"
dataset_format: "subfolders"
patient_id_in_file_identifier: True
nnunet_root_folder: "/home/maia-user/Data/NVFlare_POC/nnUNet"
client_name: "site-2"
bundle_root: "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-2/MONetBundle"
app_path: ""
app_model_path: ""
app_output_path: ""
model_name: ""

In [None]:
experiment = "Spleen"
clients = [
    "site-1",
    "site-2"
]

Path(ROOT_FOLDER).joinpath("src").joinpath(experiment).mkdir(parents=True, exist_ok=True)


In [None]:
%%bash

# -f: do not fail when the folder does not exist yet (e.g. on the first run)
rm -rf /home/maia-user/shared/Jobs/Spleen/

In [None]:
from monai.nvflare.nvflare_generate_job_configs import generate_configs

generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    tasks = ["prepare_bundle","train"]
    
)

In [None]:
JOB_DIR=str(Path(ROOT_FOLDER).joinpath("Jobs",experiment))

## 0.0 Check Python Packages

This initial step is to check that the required Python packages are installed and available in the environment. This is done by importing the necessary packages and checking their versions.

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "check_client_packages"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")))

In [None]:
sess.monitor_job(job_id)

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","package_report","package_report.json"),"r") as f:
    package_report = json.load(f)

In [None]:
print(package_report)
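A similar check can be run locally with the standard library. The sketch below is not how the `check_client_packages` task is implemented; it only shows one way to collect installed versions with `importlib.metadata`. The helper name and the package list are illustrative.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Package names taken from the installation commands in the Prerequisites section
print(package_versions(["nvflare", "monai", "torch", "batchgenerators"]))
```

A `None` entry indicates a package that still needs to be installed in the current environment.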

## 0.1 Prepare Dataset

In this step, the dataset in the different sites will be prepared for training, harmonizing the data structures and creating the necessary files for training according to the nnUNet framework.

This step internally calls the `nnUNetV2Runner` from MONAI and runs `runner.convert_dataset()`.


Before running this step, you can start a local MLFlow server, to log the task and the FL experiments:
```bash
mkdir MLFlow
cd MLFlow
mlflow server
```


### Submit the Job

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "prepare"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")))

In [None]:
sess.monitor_job(job_id)

In [None]:
client_id = "site-1"

To monitor the job, or print the logs from either the server or the client side, you can use the following commands:

In [None]:
#print(sess.api.do_command(f"cat server log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

### Download the Job Results
When the job is completed, you can download the results. For the Prepare Dataset job, they contain the dataset.json file, which describes the dataset: for each client, it lists the dataset files and whether each of them is valid.

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","prepare","data_dict.json"),"r") as f:
    dataset_dict = json.load(f)

To inspect the dataset, run the following command:

In [None]:
client_id = "site-1"

In [None]:
verified = True

for case in dataset_dict[client_id]["training"]:
    for key in case:
        if key.endswith("_is_file") and not case[key]:
            file = case[key[:-len("_is_file")]]
            print(f"Error: {file} is not a valid file!")
            verified = False
if verified:
    print(f"Dataset successfully verified for client {client_id}")

In [None]:
print(len(dataset_dict[client_id]["training"]))
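The check above inspects one client at a time. A minimal sketch that applies the same `_is_file` convention to every client at once is shown below. The helper `missing_files` and the toy dictionary are hypothetical; only the dictionary layout is taken from the cell above.

```python
def missing_files(dataset_dict):
    """Map each client to the list of file paths flagged as not being valid files."""
    report = {}
    for client, content in dataset_dict.items():
        missing = []
        for case in content.get("training", []):
            for key, value in case.items():
                # "<name>_is_file" holds the validity flag for the path stored under "<name>"
                if key.endswith("_is_file") and not value:
                    missing.append(case[key[: -len("_is_file")]])
        report[client] = missing
    return report


# Toy structure mirroring the layout used above (file names are made up)
example = {
    "site-1": {
        "training": [
            {"image": "a.nii.gz", "image_is_file": True, "label": "a_seg.nii.gz", "label_is_file": False}
        ]
    },
    "site-2": {"training": []},
}
print(missing_files(example))
```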

## 1. Plan and Preprocess

After the dataset has been prepared, the nnUNet experiment has to be planned and the data preprocessed. This step has to be done only on one site, as the nnUNet plans will be shared with the other sites.

The steps to plan and preprocess the nnUNet experiment are the following:
1. Run the `plan_and_preprocess` job on the chosen site.
2. Extract the nnUNet plans from the job results.
3. Share the nnUNet plans with the other sites.
4. Run the `preprocess` job on the other sites.


This step internally calls the `nnUNetV2Runner` from MONAI and runs `runner.plan_and_process()`.
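The division of work between the planning site and the remaining sites can be sketched as a small helper that builds the site selection for each task. The `{"site-name": ""}` shape mirrors the dictionaries passed to `run_job` in this tutorial; the helper `site_assignments` itself is hypothetical.

```python
def site_assignments(clients, planning_site):
    """Build the per-task site selection: one site plans, all the others only preprocess."""
    if planning_site not in clients:
        raise ValueError(f"{planning_site!r} is not one of the clients")
    return {
        "plan_and_preprocess": {planning_site: ""},
        "preprocess": {client: "" for client in clients if client != planning_site},
    }


assignments = site_assignments(["site-1", "site-2"], planning_site="site-1")
print(assignments["plan_and_preprocess"])
print(assignments["preprocess"])
```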

### Submit the Job

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "plan_and_preprocess"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 {"site-1":""} # Select only site-1 for this task
                 )

In [None]:
sess.monitor_job(job_id)

In [None]:
client_id = "site-1"
#print(sess.api.do_command(f"tail server log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

In [None]:
job_dir = sess.download_job_result(job_id)

### Inspect nnUNetPlans.json

The nnUNet plans are stored in the `nnUNetPlans.json` file, which contains the configuration for the nnUNet experiment. The file can be found in the `workspace/nnUNet_preprocessing` folder of the job results.

In [None]:
import json

with open(Path(job_dir).joinpath("workspace","nnUNet_preprocessing","nnUNetPlans.json"),"r") as f:
    nnunet_plans = json.load(f)

In [None]:
print(json.dumps(nnunet_plans["site-1"],indent=4))

### Copy nnUNetPlans into Transfer Folder

To share the nnUNet plans with the other sites, copy the `nnUNetPlans.json` file into the `src/Spleen` folder.

In [None]:
with open(Path("/home/maia-user/shared/src/Spleen/nnUNetPlans.json"),"w") as f:
    json.dump(nnunet_plans["site-1"],f)

And regenerate the job configuration files, so that the other sites can use the nnUNet plans for preprocessing:

In [None]:
generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    
)

## 2. Preprocess

After the nnUNet plans have been shared with the other sites, the data has to be preprocessed according to the nnUNet plans. This step has to be done on all the sites, except the one where the nnUNet experiment has been planned.

### Submit Job

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "preprocess"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 {"site-2":""} # Select only site-2 for this task
                 )

In [None]:
sess.monitor_job(job_id)

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
#print(sess.api.do_command(f"cat server log.txt")['data'][0]['data'])
# The preprocess job ran on site-2, so read the log of that client
print(sess.api.do_command(f"cat site-2 {job_id}/log.txt")['data'][0]['data'])

In [None]:
import json

with open(Path(job_dir).joinpath("workspace","nnUNet_preprocessing","nnUNetPlans.json"),"r") as f:
    nnunet_plans = json.load(f)

In [None]:
print(json.dumps(nnunet_plans["site-2"],indent=4))

## 3.0 Single-Site nnUNet Training

Once the dataset has been prepared and preprocessed following the nnUNet plans, the nnUNet training is ready to begin. In this phase, we will train a simple nnUNet model on a single site using the MONet Bundle.

First, prepare the MONet Bundle and verify its configuration for the training:

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "prepare_bundle"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 {"site-1":""} # Select only site-1 for this task
                 )

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","nnUNet_prepare_bundle","bundle_config.json"),"r") as f:
    bundle_config = json.load(f)

In [None]:
print(json.dumps(bundle_config["site-1"], indent=4))

Then, you can start the training job:

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "train"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 {"site-1":""} # Select only site-1 for this task
                 )

You can monitor the training on MLFlow, or:

In [None]:
sess.monitor_job(job_id)

In [None]:
# Optional: run this cell only to stop the training before it completes
sess.abort_job(job_id)

Once the training is completed, the validation will be performed on the validation set, and the results will be logged in MLFlow.

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
from pathlib import Path
import json

with open(Path(job_dir).joinpath("workspace","nnUNet_train","val_summary.json"),"r") as f:
    print(json.load(f))

To check the training logs, run the following command:

In [None]:
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

### Convert the trained MONet Bundle to a TorchScript model

This step is needed to run the Cross-Site Validation and the Model Deployment steps, as the MONAI Deploy Inference requires a TorchScript model.

IMPORTANT! The native `dynamic-network-architectures` package, used by nnUNet, is not compatible with the TorchScript format.

Install the following fork instead:

```bash
pip install --force-reinstall --no-deps git+https://github.com/SimoneBendazzoli93/dynamic-network-architectures.git
```

In [None]:
%%bash

python convert_ckpt_to_ts.py --bundle_root /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle \
                             --checkpoint_name checkpoint_epoch=10.pt \
                             --fold 0 \
                             --nnunet_trainer_name nnUNetTrainer_10epochs

## Cross-Site Evaluation

To run the cross-site evaluation, we will use the trained model from one site and evaluate it on the validation data from the other sites. 
As a requirement, you need to have the MONAI Deploy package installed, including the `ai_spleen_nifti_nnunet_seg_app`, which is used to run the inference on the validation NIfTI data.

Update the site-specific configuration files to include the MONAI Deploy application path and the TorchScript model path:

In [None]:
%%bash
# Clone the repo shallowly


cd /home/maia-user/Data/NVFlare_POC/example_project/prod_00/site-2

git clone --depth 1 --filter=blob:none --sparse --branch nifti-support https://github.com/SimoneBendazzoli93/monai-deploy-app-sdk.git
cd monai-deploy-app-sdk

# Enable sparse checkout and set the folder
git sparse-checkout set examples/apps/ai_spleen_nifti_nnunet_seg_app


In [None]:
%%bash
mkdir -p /home/maia-user/Data/CrossSite_Validation/spleen_site-2/spleen_site-1_predictions

In [None]:
%%writefile /home/maia-user/shared/Clients/site-2.yaml
data_dir: "/home/maia-user/Data/NVFlare_POC/data/site-2"
modality_dict:
  ct: ".nii.gz"
  label: ".nii.gz"
dataset_format: "decathlon"
patient_id_in_file_identifier: True
nnunet_root_folder: "/home/maia-user/Data/NVFlare_POC/nnUNet"
client_name: "site-2"
bundle_root: "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-2/MONetBundle"
app_path: "monai-deploy-app-sdk/examples/apps/ai_spleen_nifti_nnunet_seg_app"
app_model_path: "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle/models/fold_0/model.ts"
app_output_path: "/home/maia-user/Data/CrossSite_Validation/spleen_site-2/spleen_site-1_predictions"
model_name: "spleen_site-1"
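
Before regenerating the job configurations, it can help to confirm that the client configuration actually contains the cross-site entries written above. The following is a minimal sketch, not part of the MONet Bundle API: the helper `missing_cross_site_keys` and the constant `CROSS_SITE_KEYS` are hypothetical names, and the dictionary simply mirrors the `site-2.yaml` content (it could equally be loaded from the file with a YAML parser).

```python
from typing import Dict, List

# Keys added to the client configuration for the cross-site evaluation
# (taken from the site-2.yaml example above).
CROSS_SITE_KEYS = ("app_path", "app_model_path", "app_output_path", "model_name")


def missing_cross_site_keys(client_config: Dict[str, object]) -> List[str]:
    """Return the cross-site keys that are absent or empty (hypothetical helper)."""
    return [key for key in CROSS_SITE_KEYS if not client_config.get(key)]


# Dictionary mirroring the cross-site part of site-2.yaml.
site_2 = {
    "client_name": "site-2",
    "app_path": "monai-deploy-app-sdk/examples/apps/ai_spleen_nifti_nnunet_seg_app",
    "app_model_path": "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle/models/fold_0/model.ts",
    "app_output_path": "/home/maia-user/Data/CrossSite_Validation/spleen_site-2/spleen_site-1_predictions",
    "model_name": "spleen_site-1",
}

print(missing_cross_site_keys(site_2))
```

An empty list means that all four entries are present; any key that is returned should be added to the client YAML file before calling `generate_configs`.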

In [None]:
generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    tasks=["cross_site_validation"]
    
)

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "cross_site_validation"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 {"site-2":""} # Select only site-2 for this task
                 )

## 3.1 Federated Learning Training

The Federated Learning training aggregates the local gradients from the different sites to update the global model. Training proceeds in rounds, each consisting of a number of local epochs. The data remain on the client side throughout, and only the local gradients are shared with the server, which keeps the process secure and privacy-preserving.
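
To make the aggregation step concrete, here is a minimal sketch of one server-side aggregation in plain Python. It assumes a FedAvg-style average weighted by the number of training cases per site; the tutorial does not state which aggregation rule the NVIDIA Flare job actually uses, so this is an illustration only. The function `federated_average`, the parameter name `conv.weight`, and the toy values are all hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(updates: List[Weights], num_samples: List[int]) -> Weights:
    """Average per-site updates, weighting each site by its number of samples."""
    if not updates or len(updates) != len(num_samples):
        raise ValueError("updates and num_samples must be non-empty and of equal length")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("the total number of samples must be positive")
    averaged: Weights = {}
    for name in updates[0]:
        size = len(updates[0][name])
        averaged[name] = [
            sum(update[name][i] * n for update, n in zip(updates, num_samples)) / total
            for i in range(size)
        ]
    return averaged


# Toy example: two sites, one parameter tensor flattened to a list.
site_1 = {"conv.weight": [1.0, 2.0]}
site_2 = {"conv.weight": [3.0, 6.0]}
global_update = federated_average([site_1, site_2], num_samples=[1, 3])
print(global_update)  # {'conv.weight': [2.5, 5.0]}
```

Only the per-site updates and sample counts enter the computation; the images and labels never leave the clients.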

Prior to starting the FL training, follow the steps below to correctly configure the federation.


On the Server Side:

1. Install the required python packages and download the MONet Bundle to the server.

2. Upload the `plans.json` and `dataset.json` files to the server (in `<BUNDLE_ROOT>/models`).
`dataset.json` can be in the form:

```json
{
    "task": "Dataset109_Task09_Spleen",
    "dim": 3,
    "test_labels": true,
    "tensorImageSize": "4D",
    "channel_names": {"0": "ct"},
    "labels": {"background": 0, "Spleen": 1},
    "numTraining": 0,
    "numTest": 0,
    "training": [],
    "test": [],
    "file_ending": ".nii.gz"
}
```
3. Specify `bundle_root` in the server train bundle configuration file (`<BUNDLE_ROOT>/configs/train.yaml`), so that the `network_def_fl` entry can locate the plans and dataset files:

```yaml
network_def_fl:
    _target_: $monai.apps.nnunet.nnunet_bundle.get_network_from_nnunet_plans
    plans_file: "$@bundle_root+'/models/plans.json'"
    dataset_file: "$@bundle_root+'/models/dataset.json'"
    configuration: '@nnunet_configuration'
```

4. Optionally, change `nnunet_trainer_class_name` and `nnunet_plans_identifier` in the train bundle configuration file (`<BUNDLE_ROOT>/configs/train.yaml`).
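
If no preprocessed `dataset.json` is at hand, the placeholder shown in step 2 can also be generated programmatically. The sketch below reproduces the fields of that example. The helper names `build_server_dataset_json` and `write_server_dataset_json` are hypothetical, and the set of fields is assumed to be sufficient only because it matches the example above.

```python
import json
from pathlib import Path
from typing import Dict


def build_server_dataset_json(
    task: str,
    channel_names: Dict[str, str],
    labels: Dict[str, int],
    file_ending: str = ".nii.gz",
) -> dict:
    """Build a server-side dataset.json with empty training and test lists."""
    return {
        "task": task,
        "dim": 3,
        "test_labels": True,
        "tensorImageSize": "4D",
        "channel_names": channel_names,
        "labels": labels,
        "numTraining": 0,
        "numTest": 0,
        "training": [],
        "test": [],
        "file_ending": file_ending,
    }


def write_server_dataset_json(models_dir, dataset: dict) -> Path:
    """Write dataset.json into the given models directory and return its path."""
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / "dataset.json"
    target.write_text(json.dumps(dataset, indent=4))
    return target


spleen = build_server_dataset_json(
    task="Dataset109_Task09_Spleen",
    channel_names={"0": "ct"},
    labels={"background": 0, "Spleen": 1},
)
print(json.dumps(spleen, indent=4))
```

Calling `write_server_dataset_json("<BUNDLE_ROOT>/models", spleen)` with the actual bundle path would then place the file where step 2 expects it.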

In [None]:
%%bash

export NVFLARE_POC_WORKSPACE="/home/maia-user/Data/NVFlare_POC"

mkdir -p ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server

wget  https://raw.githubusercontent.com/minnelab/MONet-Bundle/main/MONetBundle.zip -O ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server/MONetBundle.zip
unzip -o ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server/MONetBundle.zip -d ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server
rm ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server/MONetBundle.zip

In [None]:
%%bash

export NVFLARE_POC_WORKSPACE="/home/maia-user/Data/NVFlare_POC"

cp -r ${NVFLARE_POC_WORKSPACE}/nnUNet/nnUNet_preprocessed/Dataset009_site-1/nnUNetPlans.json ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server/MONetBundle/models/plans.json
cp -r ${NVFLARE_POC_WORKSPACE}/nnUNet/nnUNet_preprocessed/Dataset009_site-1/dataset.json ${NVFLARE_POC_WORKSPACE}/MONet-Bundles/server/MONetBundle/models/dataset.json
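
After copying, a quick check that both files landed in `<BUNDLE_ROOT>/models` can catch path mistakes before the job is submitted. This is a minimal sketch using only the standard library; `missing_server_files` and `REQUIRED_MODEL_FILES` are hypothetical names, not part of the MONet Bundle.

```python
from pathlib import Path
from typing import List

# Files the server bundle needs in <BUNDLE_ROOT>/models, as listed above.
REQUIRED_MODEL_FILES = ("plans.json", "dataset.json")


def missing_server_files(bundle_root) -> List[str]:
    """Return the required files that are not present under <bundle_root>/models."""
    models_dir = Path(bundle_root) / "models"
    return [name for name in REQUIRED_MODEL_FILES if not (models_dir / name).is_file()]
```

For the layout used in this tutorial, `missing_server_files("/home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle")` should return an empty list once both copies have succeeded.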

On the Client Side:

1. Specify the following parameters in the `Experiment` configuration:
   
```yaml
num_rounds: 100
server_bundle_root: "<SERVER_BUNDLE_ROOT>"
start_round: 0
local_epochs: 10
bundle_extra_config:
  is_federated: True
```


2. Re-execute the `generate_job_configs` script to update the job configurations.

In [None]:
%%writefile /home/maia-user/shared/Experiments/Spleen.yaml
dataset_name_or_id: 
    site-1: 
        id: "009"
        name: "Task09_Spleen-1"
    site-2:
        id: "010"
        name: "Task09_Spleen-2"
experiment_name: "Task09_Spleen"
tracking_uri: "http://localhost:5000"
nnunet_trainer: "nnUNetTrainer_10epochs"
num_rounds: 10
start_round: 0
local_epochs: 1
server_bundle_root: "/home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle"
modality_list:
- "CT"
label_dict:
    Spleen: 1
bundle_extra_config:
  is_federated: True
#  resume_epoch: 5  # start_round * local_epochs





In [None]:
from pathlib import Path
from monai.nvflare.nvflare_generate_job_configs import generate_configs

generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    tasks = ["prepare_bundle","train_fl_monet_bundle"]
    
)

### Prepare Bundle for FL Training

The MONet Bundle has to be prepared for the Federated Learning training. In this step, the train and evaluation bundle parameters will be overwritten to match the Federated Learning training configuration.

To prepare the MONet Bundle for the Federated Learning training, run the following command:


In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "prepare_bundle"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 
                 )

In [None]:
sess.monitor_job(job_id)

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json

# Load the bundle configuration produced by the prepare_bundle job
with open(Path(job_dir).joinpath("workspace","nnUNet_prepare_bundle","bundle_config.json"),"r") as f:
    bundle_config = json.load(f)

In [None]:
print(json.dumps(bundle_config["site-1"], indent=4))

### Run FL Training

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "train_fl_monet_bundle"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 
                 )

In [None]:
sess.monitor_job(job_id)

In [None]:
# Optional: abort the job if it needs to be stopped before completion
sess.abort_job(job_id)

To download the job result, which includes the trained global model, from the server, run the following command:

In [None]:
job_dir = sess.download_job_result(job_id)

## Resume FL Training from Checkpoint

To resume the Federated Learning training from an existing checkpoint:

1. Download the global model from the server (`sess.download_job_result(job_id)`) and upload it to the MONAI Bundle on the server (`<BUNDLE_ROOT>/models/fold_<id>`).

2. Add the following parameters to the server configuration file:

```yaml
network_def_fl:
  _target_: $monai.apps.nnunet.nnunet_bundle.get_network_from_nnunet_plans
  plans_file: "$@bundle_root+'/models/plans.json'"
  dataset_file: "$@bundle_root+'/models/dataset.json'"
  configuration: '@nnunet_configuration'
  model_ckpt: "$@ckpt_dir+'/FL_global_model.pt'"
```

3. Update the Experiment configuration file with the new `start_round`, `num_rounds`, and `resume_epoch` values:

```yaml
start_round: <START_ROUND>
num_rounds: <INITIAL_ROUNDS - START_ROUND>
bundle_extra_config:
  resume_epoch: 20  # start_round * local_epochs
```
4. Re-execute the `generate_job_configs` script to update the job configurations.

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_generate_job_configs import generate_configs

generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    tasks = ["prepare_bundle","train_fl_monet_bundle"]
    
)
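
The values for `start_round`, `num_rounds`, and `resume_epoch` in step 3 follow directly from the original experiment settings. The sketch below spells out that arithmetic, assuming every completed round ran exactly `local_epochs` epochs; the helper name `resume_settings` is hypothetical.

```python
def resume_settings(initial_rounds: int, start_round: int, local_epochs: int) -> dict:
    """Compute the Experiment entries needed to resume FL training at start_round."""
    if not 0 <= start_round < initial_rounds:
        raise ValueError("start_round must be in the range [0, initial_rounds)")
    if local_epochs < 1:
        raise ValueError("local_epochs must be at least 1")
    return {
        "start_round": start_round,
        # Remaining rounds: <INITIAL_ROUNDS - START_ROUND>
        "num_rounds": initial_rounds - start_round,
        # Epochs already completed: start_round * local_epochs
        "bundle_extra_config": {"resume_epoch": start_round * local_epochs},
    }


# Values from the Spleen experiment: 10 rounds of 1 local epoch, resuming at round 5.
settings = resume_settings(initial_rounds=10, start_round=5, local_epochs=1)
print(settings)
# {'start_round': 5, 'num_rounds': 5, 'bundle_extra_config': {'resume_epoch': 5}}
```

The returned entries are meant to be merged into the existing Experiment YAML file, keeping other `bundle_extra_config` entries such as `is_federated` unchanged.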

## Export Federated Model as MONet Bundle

To export the trained Federated Learning model as a MONet Bundle, you need to download the global trained model from the server and prepare the MONet Bundle for deployment:


In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
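import os

FL_global_model = Path(job_dir).joinpath("workspace","app_server","FL_global_model.pt")
# Export the path so that the %%bash cell below can read it as $FL_global_model_path
os.environ["FL_global_model_path"] = str(FL_global_model)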

In [None]:
%%bash
cp $FL_global_model_path \
   /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle/models/fold_0/FL_global_model.pt

cp /home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle/models/dataset.json \
    /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle/models/dataset.json

cp /home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle/models/plans.json \
    /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-1/MONetBundle/models/plans.json
    
cp $FL_global_model_path \
   /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-2/MONetBundle/models/fold_0/FL_global_model.pt

cp /home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle/models/dataset.json \
    /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-2/MONetBundle/models/dataset.json

cp /home/maia-user/Data/NVFlare_POC/MONet-Bundles/server/MONetBundle/models/plans.json \
    /home/maia-user/Data/NVFlare_POC/MONet-Bundles/site-2/MONetBundle/models/plans.json
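
The same copy step can be written in Python, which avoids passing the model path between the Python and Bash cells and scales to more than two sites. This is a minimal sketch with the standard library only; `distribute_global_model` is a hypothetical helper, and unlike the `cp` commands it creates the `fold_<id>` directory if it does not exist yet.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def distribute_global_model(
    global_model, server_bundle, site_bundles: Iterable, fold: int = 0
) -> List[Path]:
    """Copy the global model, dataset.json and plans.json into each site bundle."""
    global_model = Path(global_model)
    server_models = Path(server_bundle) / "models"
    copied: List[Path] = []
    for bundle in site_bundles:
        models_dir = Path(bundle) / "models"
        fold_dir = models_dir / f"fold_{fold}"
        # Assumption: creating the fold directory is acceptable if it is missing.
        fold_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(global_model, fold_dir / "FL_global_model.pt")))
        for name in ("dataset.json", "plans.json"):
            copied.append(Path(shutil.copy2(server_models / name, models_dir / name)))
    return copied
```

With the paths of this tutorial, the call would take `FL_global_model` as the first argument, the server `MONetBundle` folder as the second, and the `site-1` and `site-2` `MONetBundle` folders as the list of site bundles.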

Next, we can run the validation of the global Federated model on the individual clients. At the end of the validation, the metrics will be logged in MLFlow.

In [None]:
from pathlib import Path
from monai.nvflare.nvflare_nnunet import run_job

task_name = "finalize"

job_id = run_job(sess, task_name, str(Path(JOB_DIR).joinpath("jobs")),
                 
                 )

### References

Simpson, A. L., Antonelli, M., Bakas, S., Bilello, M., Farahani, K., van Ginneken, B., ... & Menze, B. H. (2019).  *A large annotated medical image dataset for the development and evaluation of segmentation algorithms.*  arXiv preprint [arXiv:1902.09063](https://arxiv.org/abs/1902.09063).

Roth, H. R., Cheng, Y., Wen, Y., Yang, I., Xu, Z., Hsieh, Y.-T., … Feng, A. (2022). *NVIDIA FLARE: Federated Learning from Simulation to Real-World.* doi:10.48550/arXiv.2210.13291

Cardoso, M. J., Li, W., Brown, R., Ma, N., Kerfoot, E., Wang, Y., … Feng, A. (2022). *MONAI: An open-source framework for deep learning in healthcare.*