Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

MONAI example adapted from https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/monai_101.ipynb

Copyright (c) MONAI Consortium  
Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  
&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  
Unless required by applicable law or agreed to in writing, software  
distributed under the License is distributed on an "AS IS" BASIS,  
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the License for the specific language governing permissions and  
limitations under the License.

# MONAI 101 tutorial with Federated Learning

In this example, the **server** uses the [`FedAvg`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/fedavg.py) controller, which performs the following steps:
1. Initialize the global model. This is done by the `load_model()` method
  of the base class
  [`ModelController`](https://github.com/NVIDIA/NVFlare/blob/fa4d00f76848fe4eb356dcde417c136047eeab36/nvflare/app_common/workflows/model_controller.py#L292),
  which relies on the
  [`ModelPersistor`](https://nvflare.readthedocs.io/en/main/glossary.html#persistor).
2. In each training round, the global model is sent to the
  participating clients to perform a training task. Under the hood, this
  uses the
  [`send_model()`](https://github.com/NVIDIA/NVFlare/blob/d6827bca96d332adb3402ceceb4b67e876146067/nvflare/app_common/workflows/model_controller.py#L99)
  method of the `ModelController` base class. Once
  the clients finish their local training, their results are collected
  and sent back to the server as [`FLModel`](https://nvflare.readthedocs.io/en/main/programming_guide/fl_model.html#flmodel) objects.
3. The results sent by the clients are aggregated by the
  [`WeightedAggregationHelper`](https://github.com/NVIDIA/NVFlare/blob/fa4d00f76848fe4eb356dcde417c136047eeab36/nvflare/app_common/aggregators/weighted_aggregation_helper.py#L20),
  which weighs each client's contribution by its number
  of local training samples. The aggregated result is
  returned as a new `FLModel`.
4. After the aggregated results are obtained, the global model is [updated](https://github.com/NVIDIA/NVFlare/blob/724140e7dc9081eca7a912a818817f89aadfef5d/nvflare/app_common/workflows/fedavg.py#L63).
5. Finally, the updated global model is saved, again through
  the [`ModelPersistor`](https://nvflare.readthedocs.io/en/main/glossary.html#persistor).
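The aggregation step amounts to a sample-weighted average of the client parameters, $w = \sum_k n_k w_k / \sum_k n_k$, where $n_k$ is the number of local training samples of client $k$. The following is a minimal plain-Python sketch of the FedAvg loop under simplifying assumptions: each client result is a hypothetical `(params, num_samples)` tuple instead of an `FLModel`, clients return full weights (NVFlare can also exchange weight differences), and `weighted_average` and `run_fedavg` are illustrative helpers, not NVFlare's `WeightedAggregationHelper` or `FedAvg` code.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a client result: (parameters, number of local training samples).
ClientResult = Tuple[Dict[str, Any], int]


def weighted_average(results: List[ClientResult]) -> Dict[str, Any]:
    """Average each parameter across clients, weighted by local sample count.

    Works for any parameter values that support `value * float` and `+`
    (for example Python floats or NumPy arrays).
    """
    if not results:
        raise ValueError("need at least one client result")
    total = sum(n for _, n in results)
    if total <= 0:
        raise ValueError("total number of training samples must be positive")
    keys = results[0][0].keys()
    return {k: sum(params[k] * (n / total) for params, n in results) for k in keys}


def run_fedavg(
    global_params: Dict[str, Any],
    client_train_fns: List[Callable[[Dict[str, Any]], ClientResult]],
    num_rounds: int,
) -> Dict[str, Any]:
    """Hypothetical FedAvg driver: send, train locally, aggregate, update."""
    for _ in range(num_rounds):
        # Send + local training: each client starts from its own (shallow) copy
        # of the global model; train functions should not modify arrays in place.
        results = [train(dict(global_params)) for train in client_train_fns]
        # Aggregate the client results and use them as the new global model.
        global_params = weighted_average(results)
    return global_params
```

In the actual job, persisting the model after the last round is handled by the `ModelPersistor`; the sketch simply returns the final parameters.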

The **clients** implement the local training logic in
[`monai_mednist_train.py`](./code/monai_mednist_train.py) using NVFlare's [Client
API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type.html#client-api).
The Client API lets users turn a typical centralized training script into a
federated client-side local training script with minimal
`nvflare`-specific code.
1. In each FL round, each client receives a copy of the global
  model from the server through the `flare.receive()` API. The received
  global model is an instance of `FLModel`.
2. The client first validates the received model locally and streams the validation metrics
  (accuracy and precision) to the server using the
  [`SummaryWriter`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.client.tracking.html#nvflare.client.tracking.SummaryWriter). The
  streamed metrics can be loaded and visualized with [TensorBoard](https://www.tensorflow.org/tensorboard) or [MLflow](https://mlflow.org/).
3. The client then performs local training as in the non-federated training [notebook](./monai_101.ipynb). At the end of each FL round, it sends the computed results (always in
  `FLModel` format) back to the server for aggregation using the `flare.send()`
  API.
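The client-side round trip described above can be sketched without an NVFlare runtime by passing the messaging and metric functions in as arguments. This is a hypothetical, dependency-free illustration: `SimpleFLModel` is a stand-in for `FLModel`, the `num_samples` meta key and the convention that `receive()` returns `None` when there is nothing left to do are assumptions, and in the actual training script the loop is driven by the Client API (`flare.init()`, `while flare.is_running():`, `flare.receive()`, `flare.send()`) with metrics logged through `SummaryWriter`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class SimpleFLModel:
    """Hypothetical stand-in for nvflare's FLModel (params + metrics + meta)."""
    params: Dict[str, Any]
    metrics: Dict[str, float] = field(default_factory=dict)
    meta: Dict[str, Any] = field(default_factory=dict)


def client_loop(
    receive: Callable[[], Optional[SimpleFLModel]],
    send: Callable[[SimpleFLModel], None],
    validate: Callable[[Dict[str, Any]], Dict[str, float]],
    train: Callable[[Dict[str, Any]], Tuple[Dict[str, Any], int]],
    log_metric: Callable[[str, float, int], None],
) -> int:
    """Run receive -> validate -> train -> send until receive() returns None.

    Returns the number of completed rounds.
    """
    rounds = 0
    while True:
        global_model = receive()  # plays the role of flare.receive()
        if global_model is None:  # assumed "no more tasks" signal
            break
        # Validate the received global model and stream the metrics.
        metrics = validate(global_model.params)
        for name, value in metrics.items():
            log_metric(name, value, rounds)  # e.g. SummaryWriter.add_scalar
        # Train locally, starting from the global weights.
        new_params, num_samples = train(global_model.params)
        # Return the local result to the server (plays the role of flare.send()).
        send(SimpleFLModel(params=new_params, metrics=metrics, meta={"num_samples": num_samples}))
        rounds += 1
    return rounds
```

Keeping `receive` and `send` as parameters makes the per-round logic testable with plain lists, as in a quick local check before wiring in the real Client API.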

This tutorial requires about 7 GB of GPU memory and takes about 10 minutes to run.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/NVFlare/blob/main/integration/monai/examples/mednist/monai_101_fl.ipynb)

## Setup environment

In [None]:
!python -c "import monai" || pip install -q "monai-weekly[ignite, tqdm]"
!pip install -r requirements.txt
# install NVFlare from the local source tree in editable mode (-e)
!pip install -e ../../../..

## Configure the NVFlare job templates folder

In [4]:
!nvflare config -jt ./job_templates
!nvflare job list_templates


The following job templates are available: 

------------------------------------------------------------------------------------------------------------------------
  name                 Description                                                  Controller Type   Execution API Type     
------------------------------------------------------------------------------------------------------------------------
  fedavg_mednist       FedAvg using MONAI with in_process Client API                server            client_api             
------------------------------------------------------------------------------------------------------------------------


## Create job folder

We use the in-process Client API. For this, we prepared a job template ([fedavg_mednist](./job_templates/fedavg_mednist)) based on the [sag_pt in_proc job template](../../../job_templates/sag_pt_in_proc). Run the following command to create the job from it.
The `-f` option overrides values in the template's config files, for example the training script to run on the clients, the arguments used to construct the global model, and the number of FL rounds.

In [5]:
!nvflare job create -force -j ./jobs/fedavg_mednist -w fedavg_mednist -sd ./code/. \
    -f config_fed_client.conf app_script=monai_mednist_train.py \
    -f config_fed_server.conf model_class_path=monai.networks.nets.densenet121 \
    -f config_fed_server.conf spatial_dims=2 \
    -f config_fed_server.conf in_channels=1 \
    -f config_fed_server.conf out_channels=6 \
    -f config_fed_server.conf num_rounds=5


The following are the variables you can change in the template

---------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                       
  job folder: ./jobs/fedavg_mednist                                                                                                      
                                                                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------
  file_name                      var_name                       value                               component                          
---------------------------------------------------------------------------------------------------------------------
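The server-side overrides above pass a dotted class path (`model_class_path`) plus constructor arguments (`spatial_dims`, `in_channels`, `out_channels`). A minimal sketch of how such a configuration can be resolved into an object, assuming a simple import-then-call scheme; `instantiate_from_path` is a hypothetical helper for illustration, not NVFlare's internal component builder.

```python
import importlib
from typing import Any


def instantiate_from_path(class_path: str, **kwargs: Any) -> Any:
    """Hypothetical helper: import 'pkg.module.Name' and return Name(**kwargs)."""
    module_name, _, attr_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'pkg.module.Name', got {class_path!r}")
    module = importlib.import_module(module_name)  # e.g. "monai.networks.nets"
    factory = getattr(module, attr_name)           # e.g. the densenet121 class alias
    return factory(**kwargs)
```

With MONAI installed, `instantiate_from_path("monai.networks.nets.densenet121", spatial_dims=2, in_channels=1, out_channels=6)` builds a 2D DenseNet-121 with one input channel and six outputs, matching the values passed with `-f config_fed_server.conf` (MedNIST images are grayscale and have six classes).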

## Run FL experiment
We can then run the job with the NVFlare Simulator using `n=2` clients on `t=2` threads in parallel:

In [6]:
!nvflare simulator -n 2 -t 2 ./jobs/fedavg_mednist -w fedavg_workspace

2024-05-01 12:51:25,269 - SimulatorRunner - INFO - Create the Simulator Server.
2024-05-01 12:51:25,284 - CoreCell - INFO - server: creating listener on tcp://0:43515
2024-05-01 12:51:25,298 - CoreCell - INFO - server: created backbone external listener for tcp://0:43515
2024-05-01 12:51:25,298 - ConnectorManager - INFO - 2900298: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2024-05-01 12:51:25,299 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:43791] is starting
2024-05-01 12:51:25,800 - CoreCell - INFO - server: created backbone internal listener for tcp://localhost:43791
2024-05-01 12:51:25,801 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE tcp://0:43515] is starting
2024-05-01 12:51:25,880 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 44941
2024-05-01 12:51:25,880 - SimulatorRunner - INFO - Deploy the Apps.
2024-05-01 12:51:25,883 - SimulatorRunner - INFO - Crea
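Outside a notebook there is no `!` shell escape, so the same simulator run can be launched from a Python script with `subprocess`. A small sketch, assuming the `nvflare` CLI is on `PATH`; `simulator_command` and `run_simulator` are hypothetical helpers that only reuse the flags from the cell above.

```python
import shlex
import subprocess
from typing import List


def simulator_command(job_folder: str, workspace: str, n_clients: int = 2, threads: int = 2) -> List[str]:
    """Build the argv for `nvflare simulator`, mirroring the notebook cell."""
    return [
        "nvflare", "simulator",
        "-n", str(n_clients),  # number of simulated clients
        "-t", str(threads),    # number of threads used to run the clients in parallel
        job_folder,
        "-w", workspace,       # simulator workspace for logs and outputs
    ]


def run_simulator(job_folder: str, workspace: str, n_clients: int = 2, threads: int = 2) -> None:
    """Print and run the simulator command; raises CalledProcessError on a non-zero exit."""
    cmd = simulator_command(job_folder, workspace, n_clients, threads)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run_simulator("./jobs/fedavg_mednist", "fedavg_workspace")` issues the same command as the notebook cell.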

## Visualize the streamed metrics
The validation metrics streamed to the server during training can be visualized using either

1. TensorBoard

In [None]:
!tensorboard --logdir fedavg_workspace

or 

2. MLflow

In [None]:
!mlflow ui --backend-store-uri=/tmp/nvflare/mlruns
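If TensorBoard shows no data, it can help to confirm that event files were actually written under the workspace. A minimal sketch, assuming only the standard `events.out.tfevents.*` file naming and no particular folder layout inside the workspace; `find_event_files` and `summarize_event_dirs` are hypothetical helpers.

```python
from collections import Counter
from pathlib import Path
from typing import List


def find_event_files(root: str) -> List[Path]:
    """Return all TensorBoard event files (events.out.tfevents.*) under `root`, sorted."""
    return sorted(p for p in Path(root).rglob("events.out.tfevents.*") if p.is_file())


def summarize_event_dirs(root: str) -> Counter:
    """Count event files per directory; TensorBoard shows each such directory as a run."""
    return Counter(str(p.parent.relative_to(root)) for p in find_event_files(root))
```

For example, `summarize_event_dirs("fedavg_workspace")` lists the directories that `tensorboard --logdir fedavg_workspace` will pick up as runs.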