# Federated 2d image classification with MONAI

## Introduction

This tutorial shows how to deploy in Fed-BioMed the 2d image classification example provided by the MONAI project (https://monai.io/):

https://github.com/Project-MONAI/tutorials/blob/master/2d_classification/mednist_tutorial.ipynb

Since MONAI is based on PyTorch, deploying it within Fed-BioMed follows the same general structure as any other PyTorch model.

Following the MONAI example, this tutorial is based on the MedNIST dataset:

https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/mednist_tutorial.ipynb

## Creating MedNIST nodes

MedNIST provides an artificial 2d classification dataset created by gathering different medical imaging datasets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license.

To proceed with the tutorial, we created an iid partitioning of the MedNIST dataset across 3 clients. Each client has 3000 image samples per class. The training partitions are available at the following link:

https://drive.google.com/file/d/1vLIcBdtdAhh6K-vrgCFy_0Y55dxOWZwf/view
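
The partitions linked above are ready to use. For readers who want to build a similar split from their own copy of MedNIST, the following is a minimal sketch of a class-balanced iid partition. The function name `iid_partition`, its parameters, and the seed are illustrative assumptions; this is not the script used to produce the archive.

```python
import random


def iid_partition(files_by_class, n_clients=3, per_class=3000, seed=0):
    """Split files into class-balanced, disjoint client partitions.

    files_by_class: dict mapping a class name to a list of file paths.
    Returns a list of n_clients dicts with the same keys.
    (Hypothetical helper, not part of Fed-BioMed or MONAI.)
    """
    rng = random.Random(seed)
    clients = [{} for _ in range(n_clients)]
    for name, files in files_by_class.items():
        if len(files) < n_clients * per_class:
            raise ValueError(f"not enough images in class {name!r}")
        shuffled = list(files)
        rng.shuffle(shuffled)  # random order makes each slice an iid sample
        for k in range(n_clients):
            clients[k][name] = shuffled[k * per_class:(k + 1) * per_class]
    return clients


# Toy usage with fake file names and a small per-class size
toy = {"Hand": [f"hand_{i}.jpeg" for i in range(12)],
       "CXR": [f"cxr_{i}.jpeg" for i in range(12)]}
parts = iid_partition(toy, n_clients=3, per_class=4)
print([len(p["Hand"]) for p in parts])
```

Shuffling each class separately keeps the class proportions identical across clients, which is what makes the split iid with respect to the labels.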

The dataset owned by each client has the following structure:

    └── client_*/
        ├── AbdomenCT/
        ├── BreastMRI/
        ├── CXR/
        ├── ChestCT/
        ├── Hand/
        └── HeadCT/
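
Before adding a folder to a node, it can be useful to check that it has the expected layout. The following is a small sketch using only the standard library. The helper name `count_images_per_class` is an assumption made for illustration, and the demo builds a throwaway folder so that the snippet runs without the real data.

```python
import os
import tempfile


def count_images_per_class(client_dir):
    """Return {class_folder: number_of_files} for one client folder."""
    counts = {}
    for name in sorted(os.listdir(client_dir)):
        class_dir = os.path.join(client_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


# Demo on a temporary folder that mimics the client_* layout
expected = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]
with tempfile.TemporaryDirectory() as client_dir:
    for name in expected:
        os.mkdir(os.path.join(client_dir, name))
        open(os.path.join(client_dir, name, "000000.jpeg"), "w").close()
    counts = count_images_per_class(client_dir)

print(counts)
```

On a real partition, the same call would be made with the path of a `client_*` folder in place of the temporary one.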

To create the federated dataset, we follow the standard Fed-BioMed procedure for creating and populating nodes. 
After activating the Fed-BioMed network with the commands

`source ./scripts/fedbiomed_environment network`

and 

`./scripts/fedbiomed_run network`

we create a first node by using the commands

`source ./scripts/fedbiomed_environment node`

`./scripts/fedbiomed_run node start`

We then populate the node with the data of the first client:

`./scripts/fedbiomed_run node add`

We select option 3 (images) to add the MedNIST partition of client 1 by picking the folder of client 1. 
We can then check that the data has been added by executing `./scripts/fedbiomed_run node list`.

Following the same procedure, we create the other two nodes with the datasets of client 2 and client 3, respectively.


## Running Fed-BioMed Researcher

We are now ready to start the researcher environment with the command `source ./scripts/fedbiomed_environment researcher`, and open the Jupyter notebook. 

In [1]:
%load_ext autoreload
%autoreload 2

We can first query the network for the MedNIST dataset. In this case, the nodes share their respective partitions using the same tag `mednist`:

In [2]:
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)

2021-12-27 23:30:10,754 fedbiomed INFO - Component environment:
2021-12-27 23:30:10,755 fedbiomed INFO - - type = ComponentType.RESEARCHER
2021-12-27 23:30:13,232 fedbiomed INFO - Messaging researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x1069dd820>
2021-12-27 23:30:13,317 fedbiomed INFO - Listing available datasets in all nodes... 
2021-12-27 23:30:13,332 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'command': 'list'}
2021-12-27 23:30:13,345 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'command': 'list'}
2021-12-27 23:30:13,346 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Message received: {'researcher_id': 'res

{'node_5ef29a9f-9647-4c43-b45a-37a67ce9b237': [{'name': 'mednist',
   'data_type': 'images',
   'tags': ['mednist'],
   'description': 'bla',
   'shape': [18000, 3, 64, 64]}],
 'node_9261632d-ca98-4d57-81a1-8c109560d8bd': [{'name': 'mednist',
   'data_type': 'images',
   'tags': ['mednist'],
   'description': 'bla',
   'shape': [16954, 3, 64, 64]}],
 'node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a': [{'name': 'mednist',
   'data_type': 'images',
   'tags': ['mednist'],
   'description': 'bla',
   'shape': [18000, 3, 64, 64]}]}
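
The listing can be summarised to see how many samples each node holds. The sketch below works on a literal copy of the dictionary printed above (node ids shortened for readability, unused fields omitted), so it runs without a Fed-BioMed network. Whether `req.list` returns exactly this structure in other Fed-BioMed versions is an assumption.

```python
# Copy of the listing above, with node ids shortened for readability
datasets = {
    "node_5ef29a9f": [{"name": "mednist", "tags": ["mednist"],
                       "shape": [18000, 3, 64, 64]}],
    "node_9261632d": [{"name": "mednist", "tags": ["mednist"],
                       "shape": [16954, 3, 64, 64]}],
    "node_84ef4966": [{"name": "mednist", "tags": ["mednist"],
                       "shape": [18000, 3, 64, 64]}],
}

# The first entry of 'shape' is the number of samples in the dataset
samples_per_node = {
    node: sum(d["shape"][0] for d in entries if "mednist" in d["tags"])
    for node, entries in datasets.items()
}
total_samples = sum(samples_per_node.values())

print(samples_per_node)
print(total_samples)  # 52954
```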

## Create an experiment to train a model on the data found

The network and data loader code of the MONAI tutorial can now be deployed in Fed-BioMed.
We first import the necessary modules from the `fedbiomed` and `monai` libraries:

In [3]:
from fedbiomed.researcher.environ import environ
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+'/')
model_file = tmp_dir_model.name + '/class_export_mnist.py'

In [4]:
from monai.apps import download_and_extract
from monai.config import print_config
from monai.data import decollate_batch
from monai.metrics import ROCAUCMetric
from monai.networks.nets import DenseNet121
from monai.transforms import (
    Activations,
    AddChannel,
    AsDiscrete,
    Compose,
    LoadImage,
    RandFlip,
    RandRotate,
    RandZoom,
    ScaleIntensity,
    EnsureType,
)
from monai.utils import set_determinism

We can now define the training plan. Note that we can simply use the standard `TorchTrainingPlan` natively provided in Fed-BioMed. We reuse the `MedNISTDataset` class defined in the original MONAI tutorial. The method `training_data` parses the data found at the node's `dataset_path`, wraps it in a `MedNISTDataset`, and returns the corresponding data loader. Following the MONAI tutorial, the model is a `DenseNet121`.

In [5]:
%%writefile "$model_file"

import os
import numpy as np
import torch
import torch.nn as nn
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from monai.apps import download_and_extract
from monai.config import print_config
from monai.data import decollate_batch
from monai.metrics import ROCAUCMetric
from monai.networks.nets import DenseNet121
from monai.transforms import (
    Activations,
    AddChannel,
    AsDiscrete,
    Compose,
    LoadImage,
    RandFlip,
    RandRotate,
    RandZoom,
    ScaleIntensity,
    EnsureType,
)
from monai.utils import set_determinism


# Here we define the training plan to be used. 
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, kwargs):
        super(MyTrainingPlan, self).__init__()
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader class
        # Since we will train on MedNIST, we also need the MONAI network, transforms and utilities
        deps = ["import numpy as np",
                "import os",
                "from torch.utils.data import DataLoader",
                "from monai.apps import download_and_extract",
                "from monai.config import print_config",
                "from monai.data import decollate_batch",
                "from monai.metrics import ROCAUCMetric",
                "from monai.networks.nets import DenseNet121",
                "from monai.transforms import ( Activations, AddChannel, AsDiscrete, Compose, LoadImage, RandFlip, RandRotate, RandZoom, ScaleIntensity, EnsureType, )",
                "from monai.utils import set_determinism",]
        self.add_dependency(deps)
         
        self.num_class =  kwargs['num_class']  
        self.model = DenseNet121(spatial_dims=2, in_channels=1,
                    out_channels = self.num_class)
        
        self.loss_function = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    class MedNISTDataset(torch.utils.data.Dataset):
        def __init__(self, image_files, labels, transforms):
            self.image_files = image_files
            self.labels = labels
            self.transforms = transforms

        def __len__(self):
            return len(self.image_files)

        def __getitem__(self, index):
            return self.transforms(self.image_files[index]), self.labels[index]
    
    def parse_data(self, path):
        print(self.dataset_path)
        class_names = sorted(x for x in os.listdir(path)
                     if os.path.isdir(os.path.join(path, x)))
        num_class = len(class_names)
        image_files = [
                        [
                            os.path.join(path, class_names[i], x)
                            for x in os.listdir(os.path.join(path, class_names[i]))
                        ]
                        for i in range(num_class)
                      ]
        
        return image_files, num_class
    
    def training_data(self, batch_size = 48):
        self.image_files, num_class = self.parse_data(self.dataset_path)
        
        if self.num_class != num_class:
            raise Exception('number of available classes does not match declared classes')
        
        num_each = [len(self.image_files[i]) for i in range(self.num_class)]
        image_files_list = []
        image_class = []
        
        for i in range(self.num_class):
            image_files_list.extend(self.image_files[i])
            image_class.extend([i] * num_each[i])
        num_total = len(image_class)
        
        
        length = len(image_files_list)
        indices = np.arange(length)
        np.random.shuffle(indices)

        val_split = int(1. * length) 
        train_indices = indices[:val_split]

        train_x = [image_files_list[i] for i in train_indices]
        train_y = [image_class[i] for i in train_indices]


        train_transforms = Compose(
            [
                LoadImage(image_only=True),
                AddChannel(),
                ScaleIntensity(),
                RandRotate(range_x=np.pi / 12, prob=0.5, keep_size=True),
                RandFlip(spatial_axis=0, prob=0.5),
                RandZoom(min_zoom=0.9, max_zoom=1.1, prob=0.5),
                EnsureType(),
            ]
        )

        val_transforms = Compose(
            [LoadImage(image_only=True), AddChannel(), ScaleIntensity(), EnsureType()])

        y_pred_trans = Compose([EnsureType(), Activations(softmax=True)])
        y_trans = Compose([EnsureType(), AsDiscrete(to_onehot=num_class)])

        print(
            f"Training count: {len(train_x)}")
        
        
        train_ds = self.MedNISTDataset(train_x, train_y, train_transforms)
        train_loader = torch.utils.data.DataLoader(
            train_ds, batch_size, shuffle=True)
        
        return train_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = self.loss_function(output, target)
        return loss


Writing /Users/mlorenzi/works/temp/fedbiomed/var/tmp/tmphi8wza9t/class_export_mnist.py


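We now set the model and training parameters. Note that we use only 1 epoch for this experiment and cap local training at `batch_maxnum` batches of `batch_size` samples, i.e. 250 × 20 = 5000 samples per round. This is roughly 28% of the 18000 images reported by two of the nodes, and about 29% of the 16954 images reported by the third.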

In [6]:
model_args = {'num_class':6,}

training_args = {
    'batch_size': 20, 
    'lr': 1e-5, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum':250 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
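
The effect of `batch_maxnum` can be checked with a little arithmetic. The sketch below assumes the node sizes reported by the dataset listing (18000 and 16954 images) and that training stops after `batch_maxnum` batches, as the node logs indicate.

```python
batch_size = 20
batch_maxnum = 250

# Upper bound on the samples a node processes in one epoch
samples_per_epoch = batch_size * batch_maxnum

for n_local in (18000, 16954):
    fraction = samples_per_epoch / n_local
    print(f"{samples_per_epoch} of {n_local} samples: {fraction:.1%}")
```

Raising `batch_maxnum`, or removing the cap if the Fed-BioMed version in use allows it, would let each node see more of its local data per round at the cost of longer rounds.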

The experiment can now be defined by providing the `mednist` tag and running the local training on the nodes with the model defined in `model_path`, the standard `aggregator` (FedAvg), and the default `node_selection_strategy` (all nodes used). Federated learning is performed over 3 optimization rounds.

In [7]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['mednist']
rounds = 3

exp = Experiment(tags=tags,
                 #clients=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None
                )

2021-12-27 23:46:02,136 fedbiomed INFO - Searching dataset with data tags: ['mednist'] for all nodes
2021-12-27 23:46:02,148 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'tags': ['mednist'], 'command': 'search'}
2021-12-27 23:46:02,151 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'tags': ['mednist'], 'command': 'search'}
2021-12-27 23:46:02,153 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'tags': ['mednist'], 'command': 'search'}
2021-12-27 23:46:12,141 fedbiomed INFO - Node selected for training -> node_5ef29a9f-9647-4c43-b45a-37a67ce9b237
2021-12-27 23:46:12,142 fedbiomed INFO - Node selected for training -> node_9261632d-ca98-
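
For intuition, the standard FedAvg rule averages the parameters returned by the nodes, weighting each node by the number of samples it trained on. The following is a minimal plain-Python sketch of that rule, with parameters stored as lists of floats. The function `fedavg` and the toy values are illustrative assumptions and not Fed-BioMed's `FedAverage` implementation.

```python
def fedavg(node_params, node_sizes):
    """Sample-weighted average of per-node parameter dictionaries.

    node_params: list of dicts mapping a layer name to a list of floats.
    node_sizes: number of training samples used by each node.
    """
    total = sum(node_sizes)
    weights = [n / total for n in node_sizes]
    averaged = {}
    for key in node_params[0]:
        length = len(node_params[0][key])
        averaged[key] = [
            sum(w * params[key][j] for w, params in zip(weights, node_params))
            for j in range(length)
        ]
    return averaged


# Toy example: three nodes, one two-element parameter vector each
params = [{"w": [1.0, 0.0]}, {"w": [2.0, 3.0]}, {"w": [3.0, 6.0]}]
sizes = [5000, 5000, 5000]  # equal sizes reduce to a plain mean
print(fedavg(params, sizes))
```

With equal node sizes, as in the toy example, the weighted average coincides with the element-wise mean of the node parameters.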

Let's start the experiment.

By default, this function doesn't stop until all the `rounds` are done for all the nodes.

In [8]:
exp.run()

2021-12-27 23:47:13,582 fedbiomed INFO - Sampled nodes in round 0 ['node_5ef29a9f-9647-4c43-b45a-37a67ce9b237', 'node_9261632d-ca98-4d57-81a1-8c109560d8bd', 'node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a']
2021-12-27 23:47:13,584 fedbiomed INFO - Send message to node node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 - {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'job_id': '98eecdb1-ff88-4cbc-ba53-54b4dffef130', 'training_args': {'batch_size': 20, 'lr': 1e-05, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 250}, 'model_args': {'num_class': 6}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2021/12/27/my_model_888c15bc-65b5-44c1-ba00-39ee4bd10208.py', 'params_url': 'http://localhost:8844/media/uploads/2021/12/27/my_model_e087b56f-3868-4353-bb0e-5873114129a9.pt', 'model_class': 'MyTrainingPlan', 'training_data': {'node_5ef29a9f-9647-4c43-b45a-37a67ce9b237': ['dataset_f07755c7-cde0-474d-8a87-7ae542957c4b']}}
2021-12-27 23:47:13,584 fedbiomed DEBUG - resea

2021-12-27 23:47:16,655 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Dataset_path/Users/mlorenzi/works/temp/MedNIST/client_1


2021-12-27 23:52:07,257 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-27 23:52:07,866 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-27 23:52:08,032 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-27 23:52:15,146 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / INFO - results uploaded successfully 
2021-12-27 23:52:15,313 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / INFO - results uploaded successfully 
2021-12-27 23:52:15,718 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / INFO - results uploaded successfully 
2021-12-27 23:52:23,762 fedbiomed INFO - Downloading model params after training on node_9261632d-ca98-4d57-81a1-8c109

2021-12-27 23:52:29,627 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - [TASKS QUEUE] Item:{'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'job_id': '98eecdb1-ff88-4cbc-ba53-54b4dffef130', 'params_url': 'http://localhost:8844/media/uploads/2021/12/27/researcher_params_ffa275b2-59f1-4b92-9d6f-7771c51e69e5.pt', 'training_args': {'batch_size': 20, 'lr': 1e-05, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 250}, 'training_data': {'node_9261632d-ca98-4d57-81a1-8c109560d8bd': ['dataset_132eb8e2-aa42-45db-b35e-cfc490157683']}, 'model_args': {'num_class': 6}, 'model_url': 'http://localhost:8844/media/uploads/2021/12/27/my_model_888c15bc-65b5-44c1-ba00-39ee4bd10208.py', 'model_class': 'MyTrainingPlan', 'command': 'train'}
2021-12-27 23:52:29,632 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Message received: {'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'job_id': '98eecdb1-ff88-4cbc-ba53

2021-12-27 23:57:17,948 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - Reached 250 batches for this epoch, ignore remaining data


2021-12-27 23:57:18,811 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-27 23:57:19,471 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-27 23:57:21,543 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / INFO - results uploaded successfully 
2021-12-27 23:57:22,179 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / INFO - results uploaded successfully 
2021-12-27 23:57:22,842 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / INFO - results uploaded successfully 
2021-12-27 23:57:29,800 fedbiomed INFO - Downloading model params after training on node_9261632d-ca98-4d57-81a1-8c109560d8bd - from http://localhost:8844/media/uploads/2021/12/27/node_params_99adbdaf-542b-4231-a098-488e449b40c4.pt
2021-12-27 23:57:30,657 fedbiomed INFO - Downlo

2021-12-27 23:57:35,279 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - [TASKS QUEUE] Item:{'researcher_id': 'researcher_3f89ab24-0886-42c3-b625-d0ade13b2a44', 'job_id': '98eecdb1-ff88-4cbc-ba53-54b4dffef130', 'params_url': 'http://localhost:8844/media/uploads/2021/12/27/researcher_params_c75073d3-0568-4a64-9b43-59c1e05c7ab2.pt', 'training_args': {'batch_size': 20, 'lr': 1e-05, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 250}, 'training_data': {'node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a': ['dataset_b6fecfe7-8211-4319-b669-7a8abad28173']}, 'model_args': {'num_class': 6}, 'model_url': 'http://localhost:8844/media/uploads/2021/12/27/my_model_888c15bc-65b5-44c1-ba00-39ee4bd10208.py', 'model_class': 'MyTrainingPlan', 'command': 'train'}
2021-12-27 23:57:37,179 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / INFO - {'monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x1370e0a30>, 'batch_size': 20, 'lr': 1e-05, 'epochs'

2021-12-28 00:02:24,244 fedbiomed INFO - log from: node_5ef29a9f-9647-4c43-b45a-37a67ce9b237 / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-28 00:02:25,768 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-28 00:02:28,643 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / DEBUG - Reached 250 batches for this epoch, ignore remaining data
2021-12-28 00:02:29,925 fedbiomed INFO - log from: node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a / INFO - results uploaded successfully 
2021-12-28 00:02:31,596 fedbiomed INFO - log from: node_9261632d-ca98-4d57-81a1-8c109560d8bd / INFO - results uploaded successfully 
2021-12-28 00:02:40,411 fedbiomed INFO - Downloading model params after training on node_84ef4966-1dae-4d55-aff3-d2bf17c3d68a - from http://localhost:8844/media/uploads/2021/12/27/node_params_127e9825-3d79-4f91-ba48-bc22b779363d.pt
2021-12-28 00:02:56

## Testing


Once the federated model is obtained, it is possible to test it locally on an independent testing partition.
The test dataset is available at this link:

https://drive.google.com/file/d/1YbwA0WitMoucoIa_Qao7IC1haPfDp-XD/

In [27]:
import os
import shutil
import tempfile
import PIL
import torch
import numpy as np
from sklearn.metrics import classification_report

from monai.config import print_config
from monai.data import decollate_batch
from monai.metrics import ROCAUCMetric
from monai.networks.nets import DenseNet121
import zipfile
from monai.transforms import (
    Activations,
    AddChannel,
    AsDiscrete,
    Compose,
    LoadImage,
    RandFlip,
    RandRotate,
    RandZoom,
    ScaleIntensity,
    EnsureType,
)
from monai.utils import set_determinism

print_config()

MONAI version: 0.8.0
Numpy version: 1.19.1
Pytorch version: 1.10.0
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 714d00dffe6653e21260160666c4c201ab66511b

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: NOT INSTALLED or UNKNOWN VERSION.
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.4.0
Tensorboard version: 2.7.0
gdown version: 4.2.0
TorchVision version: 0.11.1
tqdm version: 4.62.3
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: 1.3.4
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies



Download the testing dataset to the local temporary folder.

In [46]:
import gdown
import zipfile

resource = "https://drive.google.com/uc?id=1YbwA0WitMoucoIa_Qao7IC1haPfDp-XD"
base_dir = tmp_dir_model.name 
test_file = base_dir  + '/MedNIST_testing.zip'

data_dir = tmp_dir_model.name + '/MedNIST_testing'

if not os.path.exists(data_dir):
    os.mkdir(data_dir)

gdown.download(resource, test_file, quiet=False)

zf = zipfile.ZipFile(test_file)

for file in zf.infolist():
    zf.extract(file, data_dir)

Downloading...
From: https://drive.google.com/uc?id=1YbwA0WitMoucoIa_Qao7IC1haPfDp-XD
To: /Users/mlorenzi/works/temp/fedbiomed/var/tmp/tmphi8wza9t/MedNIST_testing.zip
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.50M/9.50M [00:33<00:00, 283kB/s]
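
The extraction loop above can also be written with `ZipFile.extractall` inside a context manager, which closes the archive automatically. The sketch below demonstrates this on a tiny archive created on the fly. The file names are placeholders, not the contents of the real testing archive.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as base_dir:
    archive = os.path.join(base_dir, "demo.zip")
    target = os.path.join(base_dir, "extracted")

    # Build a small stand-in archive with one file per class folder
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("Hand/000000.jpeg", b"fake image bytes")
        zf.writestr("CXR/000000.jpeg", b"fake image bytes")

    # makedirs with exist_ok avoids the explicit os.path.exists check
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

    extracted = sorted(os.listdir(target))

print(extracted)  # ['CXR', 'Hand']
```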


Parse the data and create the testing data loader:

In [60]:
class_names = sorted(x for x in os.listdir(data_dir)
                     if os.path.isdir(os.path.join(data_dir, x)))
num_class = len(class_names)
image_files = [
    [
        os.path.join(data_dir, class_names[i], x)
        for x in os.listdir(os.path.join(data_dir, class_names[i]))
    ]
    for i in range(num_class)
]

num_each = [len(image_files[i]) for i in range(num_class)]
image_files_list = []

image_class = []
for i in range(num_class):
    image_files_list.extend(image_files[i])
    image_class.extend([i] * num_each[i])
num_total = len(image_class)
image_width, image_height = PIL.Image.open(image_files_list[0]).size

print(f"Total image count: {num_total}")
print(f"Image dimensions: {image_width} x {image_height}")
print(f"Label names: {class_names}")
print(f"Label counts: {num_each}")

Total image count: 6000
Image dimensions: 64 x 64
Label names: ['AbdomenCT', 'BreastMRI', 'CXR', 'ChestCT', 'Hand', 'HeadCT']
Label counts: [1000, 1000, 1000, 1000, 1000, 1000]


In [61]:
length = len(image_files_list)
indices = np.arange(length)
np.random.shuffle(indices)


test_split = int(0.1 * length)
test_indices = indices[:test_split]

test_x = [image_files_list[i] for i in test_indices]
test_y = [image_class[i] for i in test_indices]

val_transforms = Compose(
    [LoadImage(image_only=True), AddChannel(), ScaleIntensity(), EnsureType()])

y_pred_trans = Compose([EnsureType(), Activations(softmax=True)])
y_trans = Compose([EnsureType(), AsDiscrete(to_onehot=num_class)])
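
Because the shuffle above is not seeded, a different 10% subset is drawn at every run. If a reproducible test subset is preferred, one option is to use a seeded generator. The sketch below uses the standard library and a hypothetical helper `sample_indices`; it is not part of the original notebook.

```python
import random


def sample_indices(length, fraction=0.1, seed=0):
    """Return a reproducible random subset of range(length)."""
    rng = random.Random(seed)
    indices = list(range(length))
    rng.shuffle(indices)
    return indices[:int(fraction * length)]


first = sample_indices(6000)
second = sample_indices(6000)
print(len(first), first == second)
```

Calling the helper twice with the same seed returns the same indices, so the reported test metrics can be compared across runs.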

In [62]:
class MedNISTDataset(torch.utils.data.Dataset):
    def __init__(self, image_files, labels, transforms):
        self.image_files = image_files
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        return self.transforms(self.image_files[index]), self.labels[index]


test_ds = MedNISTDataset(test_x, test_y, val_transforms)
test_loader = torch.utils.data.DataLoader(
    test_ds, batch_size=300)

Define testing metric:

In [63]:
auc_metric = ROCAUCMetric()

To test the federated model we need to create a model instance and assign to it the model parameters estimated at the last federated optimization round.

In [64]:
model = exp.model_instance
model.load_state_dict(exp.aggregated_params[rounds - 1]['params'])

<All keys matched successfully>

Compute the testing performance:

In [66]:
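device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Move the model to the same device as the data and switch to inference mode
model.to(device)
model.eval()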

y_true = []
y_pred = []
with torch.no_grad():
    for test_data in test_loader:
        test_images, test_labels = (
            test_data[0].to(device),
            test_data[1].to(device),
        )
        pred = model(test_images).argmax(dim=1)
        for i in range(len(pred)):
            y_true.append(test_labels[i].item())
            y_pred.append(pred[i].item())


In [67]:
print(classification_report(
    y_true, y_pred, target_names=class_names, digits=4))

              precision    recall  f1-score   support

   AbdomenCT     1.0000    1.0000    1.0000        97
   BreastMRI     1.0000    1.0000    1.0000       108
         CXR     1.0000    0.9899    0.9949        99
     ChestCT     1.0000    1.0000    1.0000        92
        Hand     0.9904    1.0000    0.9952       103
      HeadCT     1.0000    1.0000    1.0000       101

    accuracy                         0.9983       600
   macro avg     0.9984    0.9983    0.9983       600
weighted avg     0.9983    0.9983    0.9983       600
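
The figures in the report follow directly from the per-class counts. As a check, the sketch below rebuilds label lists that are consistent with the report (599 correct predictions and a single CXR image predicted as Hand, which is what the CXR recall and Hand precision imply) and recomputes the metrics in plain Python. The label lists are reconstructed for illustration and are not the notebook's actual `y_true` and `y_pred`.

```python
class_names = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]
support = [97, 108, 99, 92, 103, 101]

# Labels consistent with the report: everything correct except one CXR -> Hand
y_true, y_pred = [], []
for label, n in enumerate(support):
    y_true.extend([label] * n)
    y_pred.extend([label] * n)
cxr, hand = class_names.index("CXR"), class_names.index("Hand")
y_pred[y_true.index(cxr)] = hand


def precision_recall(label):
    """Precision and recall of one class from the two label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    return tp / predicted, tp / actual


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.4f}")
for name in ("CXR", "Hand"):
    prec, rec = precision_recall(class_names.index(name))
    print(f"{name}: precision {prec:.4f}, recall {rec:.4f}")
```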





Despite the relatively short training performed on the data shared across the 3 nodes, the federated model reaches 99.8% accuracy on the 600 held-out test images. Well done! 