# Fed-BioMed Image Classifier with Differential Privacy on CIFAR10

This tutorial shows how to train an image classifier with Fed-BioMed using the Opacus (https://opacus.ai/) differential privacy engine. For background, see the Opacus tutorial available here:
https://opacus.ai/tutorials/building_image_classifier

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)

2022-01-20 09:06:19,048 fedbiomed INFO - Component environment:
2022-01-20 09:06:19,051 fedbiomed INFO - - type = ComponentType.RESEARCHER
2022-01-20 09:06:20,296 fedbiomed INFO - Messaging researcher_120908d2-a069-4a24-838f-3907e0f50ca4 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x10cc10af0>
2022-01-20 09:06:20,334 fedbiomed INFO - Listing available datasets in all nodes... 
2022-01-20 09:06:20,344 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'command': 'list'}
2022-01-20 09:06:20,371 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'command': 'list'}
2022-01-20 09:06:30,393 fedbiomed INFO - 
 Node: node_648c36e8-b7f1-47af-8749-f562e4abc8fb | Number of Datasets: 1 
+---------+-------------+--

{'node_648c36e8-b7f1-47af-8749-f562e4abc8fb': [{'name': 'CIFAR10',
   'data_type': 'default',
   'tags': ['#CIFAR10', '#dataset'],
   'description': '50.0 percent of CIFAR10 database',
   'shape': [25000, 3, 32, 32]}],
 'node_c7f1c8d4-9744-4555-b494-d7fdca428181': [{'name': 'CIFAR10',
   'data_type': 'default',
   'tags': ['#CIFAR10', '#dataset'],
   'description': '50.0 percent of CIFAR10 database',
   'shape': [25000, 3, 32, 32]}]}

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running this notebook:
1. Run `./scripts/fedbiomed_run node add`
  * Select option 2 (default), and type CIFAR10 to add CIFAR10 to the node through `torchvision.datasets.CIFAR10`
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where CIFAR10 is downloaded
  * The data should now be added (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you see `Starting task manager`, which means the node is online.

## Create an experiment to train a model on the data found

Declare a `TorchTrainingPlan` subclass (here `CIFAR10DPPlan`) to send to the nodes for training.

In [3]:
from fedbiomed.researcher.environ import environ
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+'/')
model_file = tmp_dir_model.name + '/Cifar_opacus.py'

In the cell below, we define the model and use Opacus for differential privacy. This example uses the `ModuleValidator` class from `opacus.validators` to validate and, where needed, fix the model so that it is compatible with the Opacus engine, and the `make_private_with_epsilon` method of the `PrivacyEngine` class.

Training a model with `make_private_with_epsilon` from the Opacus library involves three groups of privacy-specific hyper-parameters that affect performance:

* `max_grad_norm`: The maximum L2 norm of per-sample gradients before they are aggregated by the averaging step.
* `noise_multiplier`: The amount of noise sampled and added to the average of the gradients in a batch. With `make_private_with_epsilon` it is not set by hand: Opacus derives it from the target privacy budget and the number of epochs.
* `target_epsilon` and `target_delta`: The target ϵ and δ of the (ϵ,δ)-differential privacy guarantee.
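
To make the roles of `max_grad_norm` and `noise_multiplier` concrete, here is a minimal pure-Python sketch of one differentially private gradient averaging step: clip each per-sample gradient, sum, add Gaussian noise, then divide by the batch size. It illustrates the general DP-SGD idea and is not the Opacus implementation; the function name `dp_average_gradient` and the toy gradients are assumptions made for this example.

```python
import math
import random


def dp_average_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only the gradients whose L2 norm exceeds max_grad_norm.
        scale = min(1.0, max_grad_norm / (norm + 1e-6))
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # The noise standard deviation is noise_multiplier * max_grad_norm.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
toy_grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms: 5.0 and 0.5 (toy values)
no_noise = dp_average_gradient(toy_grads, max_grad_norm=1.0,
                               noise_multiplier=0.0, rng=rng)
with_noise = dp_average_gradient(toy_grads, max_grad_norm=1.0,
                                 noise_multiplier=1.0, rng=rng)
print(no_noise, with_noise)
```

With `noise_multiplier=0.0` only the clipping is visible: the first gradient is scaled down to norm 1.0, while the second is left unchanged. A larger `noise_multiplier` gives stronger privacy at the cost of a noisier update.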

Note that to use the Opacus `PrivacyEngine` class, the training plan must properly define a `model`, a `dataloader` and an `optimizer`.

In [4]:
%%writefile "$model_file"

import torch
import torch.nn as nn
from fedbiomed.common.torchnn import TorchTrainingPlan
from fedbiomed.common.logger import logger
from torch.utils.data import DataLoader
import torch.optim as optim
from torchvision import datasets, transforms, models
from opacus import PrivacyEngine 
from opacus.validators import ModuleValidator
from opacus.utils.batch_memory_manager import BatchMemoryManager
from typing import Union, List
from tqdm import tqdm

# Here we define the training plan to be used.
# You can use any class name (here 'CIFAR10DPPlan')
class CIFAR10DPPlan(TorchTrainingPlan):
    def __init__(self, model_args):
        super(CIFAR10DPPlan, self).__init__()
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on CIFAR10, we need datasets and transforms from torchvision
        deps = ["from torchvision import datasets, transforms, models",
                "from torch.utils.data import DataLoader",
                "import torch.optim as optim",
                "from fedbiomed.common.logger import logger",
                "from typing import Union, List",
                "from tqdm import tqdm",
                "from opacus import PrivacyEngine",
                "from opacus.validators import ModuleValidator",
                "from opacus.utils.batch_memory_manager import BatchMemoryManager",]
        self.add_dependency(deps)
        
        self.model = models.resnet18(num_classes=model_args['num_classes'])
        self.model = ModuleValidator.fix(self.model)
        ModuleValidator.validate(self.model, strict=False)
        
        self.loss = nn.CrossEntropyLoss()
        
        self.max_grad_norm = model_args['max_grad_norm']
        self.epsilon = model_args['target_epsilon']
        self.delta = model_args['target_delta']
        self.max_physical_batch_size = model_args['max_physical_batch_size']

    def forward(self, x):
        return self.model(x)

    def training_data(self, batch_size = 48):
        CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
        CIFAR10_STD_DEV = (0.2023, 0.1994, 0.2010)
        # Custom torch Dataloader for CIFAR data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD_DEV),
                                       ])
        dataset1 = datasets.CIFAR10(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = self.loss(output, target)
        return loss
    
    def training_routine(self,
                         epochs: int = 2,
                         log_interval: int = 10,
                         lr: Union[int, float] = 1e-3,
                         batch_size: int = 48,
                         batch_maxnum: int = 0,
                         dry_run: bool = False,
                         monitor=None):
        
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        
        self.optimizer = optim.RMSprop(self.model.parameters(), lr=lr)
        
        training_data = self.training_data(batch_size=batch_size)
        
        # enter PrivacyEngine
        privacy_engine = PrivacyEngine()
        self.model, self.optimizer, training_data = privacy_engine.make_private_with_epsilon(
                                                                    module=self.model,
                                                                    optimizer=self.optimizer,
                                                                    data_loader=training_data,
                                                                    epochs=epochs,
                                                                    target_epsilon=self.epsilon,
                                                                    target_delta=self.delta,
                                                                    max_grad_norm=self.max_grad_norm,
                                                                )

        for epoch in range(1, epochs + 1):
            self.model.train()
            # (below) sampling data (with `training_data` method defined on
            # researcher's notebook). BatchMemoryManager splits each logical
            # batch into physical batches of at most `max_physical_batch_size`
            # samples to limit memory use.
            with BatchMemoryManager(data_loader=training_data, 
                                    max_physical_batch_size=self.max_physical_batch_size, 
                                    optimizer=self.optimizer
                                    ) as memory_safe_data_loader:

                for batch_idx, (data, target) in enumerate(memory_safe_data_loader):
            
                    #self.model.train()  # model training
                    data, target = data.to(self.device), target.to(self.device)
                    self.optimizer.zero_grad()
                    # (below) calling method `training_step` defined on
                    # researcher's notebook
                    res = self.training_step(data, target)
                    res.backward()
                    self.optimizer.step()

                    # do not take into account more than batch_maxnum
                    # batches from the dataset
                    if (batch_maxnum > 0) and (batch_idx >= batch_maxnum):
                        #print('Reached {} batches for this epoch, ignore remaining data'.format(batch_maxnum))
                        logger.debug('Reached {} batches for this epoch, ignore remaining data'.format(batch_maxnum))
                        break

                    if batch_idx % log_interval == 0:
                        logger.info('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                            epoch,
                            batch_idx * len(data),
                            len(training_data.dataset),
                            100 * batch_idx / len(training_data),
                            res.item()))
                        eps = privacy_engine.get_epsilon(self.delta)
                        logger.info('Epsilon={:.2f}, Delta={}'.format(eps,self.delta))

                        # Send scalar values via general/feedback topic
                        if monitor is not None:
                            monitor.add_scalar('Loss', res.item(), batch_idx, epoch)

                        if dry_run:
                            return

    def save(self, filename, params: dict = None) -> None:
        if params is not None:

            # params keys are changed by the privacy engine (as _module.param_key): should be re-named
            params_keys = list(params.keys())
            for key in params_keys:
                if '_module' in key:
                    newkey = key.replace('_module.', '')
                    params[newkey] = params.pop(key)
                    
            return torch.save(params, filename)
        else:
            return torch.save(self.state_dict(), filename)

Writing /Users/balelli/ownCloud/INRIA_EPIONE/FedBioMed/fedbiomed/var/tmp/tmpj7dulov8/Cifar_opacus.py
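
The `save` method above renames parameter keys because the privacy engine wraps the model and prefixes them with `_module.`. The same idea can be sketched on a plain dictionary, without torch. The helper name `strip_module_prefix` and the sample keys are hypothetical. Unlike the method above, this variant builds a new dictionary and only strips a leading prefix.

```python
from collections import OrderedDict


def strip_module_prefix(params, prefix="_module."):
    """Return a copy of `params` with a leading `prefix` removed from each key."""
    renamed = OrderedDict()
    for key, value in params.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        renamed[new_key] = value
    return renamed


# Hypothetical keys, shaped like those of a wrapped model's state dict.
wrapped = OrderedDict([("_module.conv1.weight", [0.1]),
                       ("_module.fc.bias", [0.2])])
clean = strip_module_prefix(wrapped)
print(list(clean.keys()))
```

Building a new dictionary avoids mutating `params` while iterating over its keys and keeps the original key order.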


The two groups of arguments are:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side. For instance, the privacy parameters should be passed here.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓
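
As a rough illustration of why typos raise an error, the sketch below assumes that the training arguments are forwarded to the training routine as keyword arguments. This is an assumption about the framework's behaviour; the stand-in function and the name `example_args` exist only for this example.

```python
def training_routine(epochs=2, log_interval=10, lr=1e-3, batch_size=48,
                     batch_maxnum=0, dry_run=False, monitor=None):
    """Stand-in with the same keyword parameters as the plan's training routine."""
    return {"epochs": epochs, "lr": lr, "batch_size": batch_size}


example_args = {"batch_size": 50, "lr": 1e-3, "epochs": 3}
result = training_routine(**example_args)

# A misspelled key is rejected by Python itself with a TypeError.
try:
    training_routine(**{"batch_sise": 50})
    typo_error = None
except TypeError as err:
    typo_error = str(err)

print(result)
print(typo_error)
```

Checking the keys against the routine's signature before launching an experiment can save a full round trip to the nodes.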

In [5]:
model_args = {'num_classes': 10, 'max_grad_norm': 1.2, 
              'target_epsilon': 50.0, 'target_delta': 1e-5,
              'max_physical_batch_size': 100}

training_args = {
    'batch_size': 50, 
    'lr': 1e-3, 
    'epochs': 3, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

# Train the federated model

Define an experiment that will:
- search nodes serving data for these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on nodes with the model defined in `model_path`, followed by federation with `aggregator`
- run for `rounds` rounds, applying the `node_selection_strategy` between the rounds
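
The federation step can be pictured with a small federated averaging sketch on plain Python lists. It is not the `FedAverage` implementation. Weighting each node by its number of samples is the usual FedAvg formulation and is assumed here. The function name, parameter names and values are hypothetical.

```python
def federated_average(node_params, node_sizes):
    """Average per-node parameter dicts, weighting each node by its sample count."""
    total = sum(node_sizes)
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(node_params, node_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical nodes holding 25000 samples each, as in this tutorial's setup.
node_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
node_b = {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}
global_params = federated_average([node_a, node_b], node_sizes=[25000, 25000])
print(global_params)
```

With equal dataset sizes the weighted average reduces to a plain mean of the node parameters.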

In [6]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#CIFAR10', '#dataset']
rounds = 3

exp = Experiment(tags=tags,
                 #nodes=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='CIFAR10DPPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-01-20 09:06:53,684 fedbiomed INFO - Searching dataset with data tags: ['#CIFAR10', '#dataset'] for all nodes
2022-01-20 09:06:53,696 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'tags': ['#CIFAR10', '#dataset'], 'command': 'search'}
2022-01-20 09:06:53,700 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'tags': ['#CIFAR10', '#dataset'], 'command': 'search'}
2022-01-20 09:07:03,701 fedbiomed INFO - Node selected for training -> node_c7f1c8d4-9744-4555-b494-d7fdca428181
2022-01-20 09:07:03,702 fedbiomed INFO - Node selected for training -> node_648c36e8-b7f1-47af-8749-f562e4abc8fb
2022-01-20 09:07:03,705 fedbiomed INFO - Checking data quality of federated datasets...
2022-01-20 09:07:04,705 fedbiomed DEBUG - torchnn saved model filename: /Users/bal

Let's start the experiment.

By default, `exp.run()` doesn't return until all the `rounds` are done for all the nodes.

In [7]:
exp.run()

2022-01-20 09:07:16,427 fedbiomed INFO - Sampled nodes in round 0 ['node_c7f1c8d4-9744-4555-b494-d7fdca428181', 'node_648c36e8-b7f1-47af-8749-f562e4abc8fb']
01/20/2022 09:07:16:INFO:Sampled nodes in round 0 ['node_c7f1c8d4-9744-4555-b494-d7fdca428181', 'node_648c36e8-b7f1-47af-8749-f562e4abc8fb']
2022-01-20 09:07:16,429 fedbiomed INFO - Send message to node node_c7f1c8d4-9744-4555-b494-d7fdca428181 - {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'job_id': '4fb36a8b-ab85-4e01-a732-061e0cf7ce7e', 'training_args': {'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {'num_classes': 10, 'max_grad_norm': 1.2, 'target_epsilon': 50.0, 'target_delta': 1e-05, 'max_physical_batch_size': 100}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/01/20/my_model_36efa04e-f4a5-4e66-a669-c00a1d18eb54.py', 'params_url': 'http://localhost:8844/media/uploads/2022/01/20/aggregated_params_init_97b980f3-d285-4f6b-b5e

01/20/2022 09:07:16:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'job_id': '4fb36a8b-ab85-4e01-a732-061e0cf7ce7e', 'training_args': {'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {'num_classes': 10, 'max_grad_norm': 1.2, 'target_epsilon': 50.0, 'target_delta': 1e-05, 'max_physical_batch_size': 100}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/01/20/my_model_36efa04e-f4a5-4e66-a669-c00a1d18eb54.py', 'params_url': 'http://localhost:8844/media/uploads/2022/01/20/aggregated_params_init_97b980f3-d285-4f6b-b5e9-9976b42af1b4.pt', 'model_class': 'CIFAR10DPPlan', 'training_data': {'node_648c36e8-b7f1-47af-8749-f562e4abc8fb': ['dataset_b56c6286-3a54-42f3-86fc-eed3ce78b4ee']}}
2022-01-20 09:07:16,505 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - [TASKS QUEUE] Item:{'researche

01/20/2022 09:12:50:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=18.19, Delta=1e-05
2022-01-20 09:13:52,088 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=19.19, Delta=1e-05
01/20/2022 09:13:52:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=19.19, Delta=1e-05
2022-01-20 09:14:01,778 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=19.19, Delta=1e-05
01/20/2022 09:14:01:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=19.19, Delta=1e-05
2022-01-20 09:15:15,407 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=19.62, Delta=1e-05
01/20/2022 09:15:15:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=19.62, Delta=1e-05
2022-01-20 09:15:19,138 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=19.62, Delta=1e-05
01/20/2022 09:15:19:INFO:log fro

2022-01-20 09:23:00,559 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=22.25, Delta=1e-05
01/20/2022 09:23:00:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=22.25, Delta=1e-05
2022-01-20 09:23:18,287 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=22.25, Delta=1e-05
01/20/2022 09:23:18:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=22.25, Delta=1e-05
2022-01-20 09:24:43,224 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=22.68, Delta=1e-05
01/20/2022 09:24:43:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=22.68, Delta=1e-05
2022-01-20 09:25:05,761 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=22.68, Delta=1e-05
01/20/2022 09:25:05:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=22.68, Delta=1e-05
2022-01-20 09:26:07,237 fedbiome

2022-01-20 09:39:54,322 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.85, Delta=1e-05
01/20/2022 09:39:54:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.85, Delta=1e-05
2022-01-20 09:40:14,180 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=24.85, Delta=1e-05
01/20/2022 09:40:14:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=24.85, Delta=1e-05
2022-01-20 09:41:08,529 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Reached 100 batches for this epoch, ignore remaining data
01/20/2022 09:41:08:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Reached 100 batches for this epoch, ignore remaining data
2022-01-20 09:41:14,322 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.06, Delta=1e-05
01/20/2022 09:41:14:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / IN

2022-01-20 09:49:10,864 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.23, Delta=1e-05
01/20/2022 09:49:10:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.23, Delta=1e-05
2022-01-20 09:49:30,990 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.23, Delta=1e-05
01/20/2022 09:49:30:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.23, Delta=1e-05
2022-01-20 09:50:25,860 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.42, Delta=1e-05
01/20/2022 09:50:25:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.42, Delta=1e-05
2022-01-20 09:50:42,608 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.42, Delta=1e-05
01/20/2022 09:50:42:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.42, Delta=1e-05
2022-01-20 09:51:44,206 fedbiome

2022-01-20 09:55:18,709 fedbiomed DEBUG - researcher_120908d2-a069-4a24-838f-3907e0f50ca4
01/20/2022 09:55:18:DEBUG:researcher_120908d2-a069-4a24-838f-3907e0f50ca4
2022-01-20 09:55:18,717 fedbiomed INFO - Send message to node node_648c36e8-b7f1-47af-8749-f562e4abc8fb - {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'job_id': '4fb36a8b-ab85-4e01-a732-061e0cf7ce7e', 'training_args': {'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {'num_classes': 10, 'max_grad_norm': 1.2, 'target_epsilon': 50.0, 'target_delta': 1e-05, 'max_physical_batch_size': 100}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/01/20/my_model_36efa04e-f4a5-4e66-a669-c00a1d18eb54.py', 'params_url': 'http://localhost:8844/media/uploads/2022/01/20/aggregated_params_2d69aa0a-1a5b-4610-b230-299b015f43dd.pt', 'model_class': 'CIFAR10DPPlan', 'training_data': {'node_648c36e8-b7f1-47af-8749-f562e4abc8fb': ['dataset_b56c6286-3a54-

2022-01-20 09:55:24,942 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - {'monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x12a99e2b0>, 'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}
01/20/2022 09:55:24:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - {'monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x12a99e2b0>, 'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}
2022-01-20 09:55:24,956 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Dataset_path/Users/balelli/data
01/20/2022 09:55:24:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Dataset_path/Users/balelli/data
2022-01-20 09:55:25,733 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - {'monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x12a76c280>, 'batch_size': 50, 'lr': 0.001, 'epochs': 

01/20/2022 10:03:58:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=20.05, Delta=1e-05
2022-01-20 10:04:05,656 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.05, Delta=1e-05
01/20/2022 10:04:05:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.05, Delta=1e-05
2022-01-20 10:05:26,800 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=20.48, Delta=1e-05
01/20/2022 10:05:26:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=20.48, Delta=1e-05
2022-01-20 10:05:37,684 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.48, Delta=1e-05
01/20/2022 10:05:37:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.48, Delta=1e-05
2022-01-20 10:06:30,776 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=20.92, Delta=1e-05
01/20/2022 10:06:30:INFO:log fro

2022-01-20 10:12:14,467 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.11, Delta=1e-05
01/20/2022 10:12:14:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.11, Delta=1e-05
2022-01-20 10:13:26,111 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=23.55, Delta=1e-05
01/20/2022 10:13:26:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=23.55, Delta=1e-05
2022-01-20 10:13:28,157 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.55, Delta=1e-05
01/20/2022 10:13:28:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.55, Delta=1e-05
2022-01-20 10:14:41,159 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.98, Delta=1e-05
01/20/2022 10:14:41:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.98, Delta=1e-05
2022-01-20 10:14:41,714 fedbiome

01/20/2022 10:21:35:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.06, Delta=1e-05
2022-01-20 10:22:56,414 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.26, Delta=1e-05
01/20/2022 10:22:56:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.26, Delta=1e-05
2022-01-20 10:23:01,909 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.26, Delta=1e-05
01/20/2022 10:23:01:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.26, Delta=1e-05
2022-01-20 10:24:01,068 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.45, Delta=1e-05
01/20/2022 10:24:01:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.45, Delta=1e-05
2022-01-20 10:24:05,397 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.45, Delta=1e-05
01/20/2022 10:24:05:INFO:log fro

2022-01-20 10:32:46,281 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.62, Delta=1e-05
01/20/2022 10:32:46:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.62, Delta=1e-05
2022-01-20 10:33:59,080 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.81, Delta=1e-05
01/20/2022 10:33:59:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=26.81, Delta=1e-05
2022-01-20 10:34:03,166 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.81, Delta=1e-05
01/20/2022 10:34:03:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=26.81, Delta=1e-05
2022-01-20 10:35:33,636 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / DEBUG - Reached 100 batches for this epoch, ignore remaining data
01/20/2022 10:35:33:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / DEBUG - Reached 100 batches for th

01/20/2022 10:36:10:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Message received: {'researcher_id': 'researcher_120908d2-a069-4a24-838f-3907e0f50ca4', 'job_id': '4fb36a8b-ab85-4e01-a732-061e0cf7ce7e', 'training_args': {'batch_size': 50, 'lr': 0.001, 'epochs': 3, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {'num_classes': 10, 'max_grad_norm': 1.2, 'target_epsilon': 50.0, 'target_delta': 1e-05, 'max_physical_batch_size': 100}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/01/20/my_model_36efa04e-f4a5-4e66-a669-c00a1d18eb54.py', 'params_url': 'http://localhost:8844/media/uploads/2022/01/20/aggregated_params_03c4a20c-0435-44d0-b791-a17d732cf9ef.pt', 'model_class': 'CIFAR10DPPlan', 'training_data': {'node_648c36e8-b7f1-47af-8749-f562e4abc8fb': ['dataset_b56c6286-3a54-42f3-86fc-eed3ce78b4ee']}}
2022-01-20 10:36:10,878 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / DEBUG - Message received: {'researcher_id':

2022-01-20 10:36:34,549 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=11.98, Delta=1e-05
01/20/2022 10:36:34:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=11.98, Delta=1e-05
2022-01-20 10:38:09,282 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=16.01, Delta=1e-05
01/20/2022 10:38:09:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=16.01, Delta=1e-05
2022-01-20 10:38:29,472 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=16.01, Delta=1e-05
01/20/2022 10:38:29:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=16.01, Delta=1e-05
2022-01-20 10:39:45,989 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=17.10, Delta=1e-05
01/20/2022 10:39:45:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=17.10, Delta=1e-05
2022-01-20 10:40:05,612 fedbiome

2022-01-20 10:49:36,056 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.92, Delta=1e-05
01/20/2022 10:49:36:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=20.92, Delta=1e-05
2022-01-20 10:50:55,711 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=21.35, Delta=1e-05
01/20/2022 10:50:55:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=21.35, Delta=1e-05
2022-01-20 10:51:17,251 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=21.35, Delta=1e-05
01/20/2022 10:51:17:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=21.35, Delta=1e-05
2022-01-20 10:52:09,516 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Reached 100 batches for this epoch, ignore remaining data
01/20/2022 10:52:09:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / DEBUG - Reached 100 batches for th

2022-01-20 11:00:06,680 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.98, Delta=1e-05
01/20/2022 11:00:06:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=23.98, Delta=1e-05
2022-01-20 11:02:00,237 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.26, Delta=1e-05
01/20/2022 11:02:00:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.26, Delta=1e-05
2022-01-20 11:02:15,416 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=24.26, Delta=1e-05
01/20/2022 11:02:15:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=24.26, Delta=1e-05
2022-01-20 11:03:45,097 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.46, Delta=1e-05
01/20/2022 11:03:45:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=24.46, Delta=1e-05
2022-01-20 11:04:08,114 fedbiome

2022-01-20 11:13:40,630 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.64, Delta=1e-05
01/20/2022 11:13:40:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.64, Delta=1e-05
2022-01-20 11:13:55,610 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.64, Delta=1e-05
01/20/2022 11:13:55:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.64, Delta=1e-05
2022-01-20 11:15:27,719 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.84, Delta=1e-05
01/20/2022 11:15:27:INFO:log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / INFO - Epsilon=25.84, Delta=1e-05
2022-01-20 11:15:31,644 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.84, Delta=1e-05
01/20/2022 11:15:31:INFO:log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - Epsilon=25.84, Delta=1e-05
2022-01-20 11:16:31,939 fedbiome

2022-01-20 11:24:10,516 fedbiomed INFO - Downloading model params after training on node_648c36e8-b7f1-47af-8749-f562e4abc8fb - from http://localhost:8844/media/uploads/2022/01/20/node_params_b809881a-dfa5-4bfc-8639-dd3e910d698b.pt
2022-01-20 11:24:16,691 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / INFO - results uploaded successfully 
2022-01-20 11:24:25,203 fedbiomed INFO - Downloading model params after training on node_c7f1c8d4-9744-4555-b494-d7fdca428181 - from http://localhost:8844/media/uploads/2022/01/20/node_params_863b1e1a-ab1b-4b3d-b978-4523238e5780.pt
01/20/2022 11:24:25:INFO:Downloading model params after training on no

The following timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training`: real time (clock time) spent in the training function on the node
- `ptime_training`: process time (user and system CPU) spent in the training function on the node
- `rtime_total`: real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
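
The difference between real time and process time can be illustrated with Python's standard `time` module. This is only a sketch: the helper `timed` is hypothetical, and whether Fed-BioMed measures its timings with these exact functions is an assumption.

```python
import time

def timed(func, *args, **kwargs):
    """Hypothetical helper: run func and return (result, real_seconds, process_seconds)."""
    real_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args, **kwargs)
    real_elapsed = time.perf_counter() - real_start
    cpu_elapsed = time.process_time() - cpu_start
    return result, real_elapsed, cpu_elapsed

# Sleeping consumes wall-clock time but almost no CPU time,
# so the real time is much larger than the process time here.
_, rtime, ptime = timed(time.sleep, 0.2)
print(f"rtime={rtime:.2f} seconds, ptime={ptime:.2f} seconds")
```

For a CPU-bound training loop the two values are usually close, as in the timings printed below; a large gap would suggest time spent waiting (I/O, network, other processes).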

In [8]:
print("\nList the training rounds : ", exp.training_replies.keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies[rounds - 1].data
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')


List the training rounds :  dict_keys([0, 1, 2])

List the nodes for the last training round and their timings : 
	- node_648c36e8-b7f1-47af-8749-f562e4abc8fb :    
		rtime_training=2854.59 seconds    
		ptime_training=2754.99 seconds    
		rtime_total=2879.60 seconds
	- node_c7f1c8d4-9744-4555-b494-d7fdca428181 :    
		rtime_training=2867.90 seconds    
		ptime_training=2776.26 seconds    
		rtime_total=2894.30 seconds




# Test Model

We define a small testing routine to compute accuracy metrics on the test dataset.
## Important
The model is tested this way because the data can be accessed in a development environment.  
In production, the data on the nodes won't be accessible, so a test dataset must be available on the server or accessible from it.

In [9]:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

def testing_Accuracy(model, data_loader):
    """Return (average loss, accuracy in %) computed on roughly 10% of the batches."""
    model.eval()
    test_loss = 0
    correct = 0
    n_samples = 0  # number of samples actually evaluated

    device = "cpu"

    loader_size = len(data_loader)
    with torch.no_grad():
        for idx, (data, target) in enumerate(data_loader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            # nll_loss expects the model to output log-probabilities
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
            n_samples += target.size(0)

            # only uses 10% of the dataset, results are similar but faster
            if idx >= loader_size / 10:
                break

    # normalize by the number of evaluated samples, not by the full dataset size
    test_loss /= n_samples
    accuracy = 100 * correct / n_samples

    return test_loss, accuracy
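
To see how much data the early `break` in this routine actually visits, the loop can be replayed without any model. The sketch below assumes the 10000-image CIFAR10 test set and the batch size of 48 used in this notebook; `evaluated_subset` is a hypothetical helper written only for this illustration.

```python
import math

def evaluated_subset(n_samples, batch_size, divisor=10):
    """Hypothetical helper: count the batches and samples visited when the
    loop stops as soon as idx >= n_batches / divisor."""
    n_batches = math.ceil(n_samples / batch_size)
    batches_seen = 0
    samples_seen = 0
    for idx in range(n_batches):
        # every batch is full except possibly the last one
        samples_seen += min(batch_size, n_samples - idx * batch_size)
        batches_seen += 1
        if idx >= n_batches / divisor:
            break
    return batches_seen, samples_seen

batches, samples = evaluated_subset(n_samples=10000, batch_size=48)
print(batches, samples)
```

The loop stops at `idx = 21`, so `idx + 1 = 22` batches are visited. Any loss or accuracy computed on the subset has to be divided by the number of visited samples rather than by the size of the full dataset.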

We download the CIFAR10 test dataset and wrap it in a data loader:

In [10]:
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
import os

# These values, specific to the CIFAR10 dataset, are assumed to be known.
# If necessary, they can be computed with a modest privacy budget.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD_DEV = (0.2023, 0.1994, 0.2010)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD_DEV),
])

base_dir = tmp_dir_model.name
test_data_dir = os.path.join(base_dir, "cifar10")
# create the target directory if it does not exist yet
os.makedirs(test_data_dir, exist_ok=True)

test_dataset = CIFAR10(
    root=test_data_dir, train=False, download=True, transform=transform)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=48,
    shuffle=False,
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /Users/balelli/ownCloud/INRIA_EPIONE/FedBioMed/fedbiomed/var/tmp/tmpj7dulov8/cifar10/cifar-10-python.tar.gz



Extracting /Users/balelli/ownCloud/INRIA_EPIONE/FedBioMed/fedbiomed/var/tmp/tmpj7dulov8/cifar10/cifar-10-python.tar.gz to /Users/balelli/ownCloud/INRIA_EPIONE/FedBioMed/fedbiomed/var/tmp/tmpj7dulov8/cifar10
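
The `Normalize` transform used for the test set applies `(value - mean) / std` independently to each channel. A minimal pure-Python sketch of that arithmetic on a single RGB pixel is given here; `normalize_pixel` is a hypothetical helper and not part of torchvision.

```python
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD_DEV = (0.2023, 0.1994, 0.2010)

def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD_DEV):
    """Hypothetical helper: apply (value - mean) / std to each channel
    of one RGB pixel whose values are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# A mid-grey pixel, scaled to [0, 1] as ToTensor() would do.
print(normalize_pixel((0.5, 0.5, 0.5)))

# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(CIFAR10_MEAN))
```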


We define a utility function to calculate the accuracy:

In [11]:
def accuracy(preds, labels):
    return (preds == labels).mean()

We define the model, and we assign to it the model parameters estimated at the last federated optimization round.

In [12]:
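from torchvision import models
from opacus.validators import ModuleValidator

# Build a ResNet18 and let Opacus replace the layers that are incompatible
# with differential privacy (e.g. BatchNorm); validate() returns the list of
# remaining issues.
model = models.resnet18(num_classes=10)
model = ModuleValidator.fix(model)
ModuleValidator.validate(model, strict=False)

# Rebind `model` to the experiment's model instance and load the parameters
# aggregated at the last round.
model = exp.model_instance
model.load_state_dict(exp.aggregated_params[rounds - 1]['params'])

# Run the evaluation on GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)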

We define a function to validate our model on our test dataset.

In [13]:
def test(model, test_loader, device):
    model.eval()
    criterion = nn.CrossEntropyLoss()
    losses = []
    top1_acc = []

    with torch.no_grad():
        for images, target in test_loader:
            images = images.to(device)
            target = target.to(device)

            output = model(images)
            loss = criterion(output, target)
            preds = np.argmax(output.detach().cpu().numpy(), axis=1)
            labels = target.detach().cpu().numpy()
            acc = accuracy(preds, labels)

            losses.append(loss.item())
            top1_acc.append(acc)

    top1_avg = np.mean(top1_acc)

    print(
        f"\tTest set:"
        f"Loss: {np.mean(losses):.6f} "
        f"Acc: {top1_avg * 100:.6f} "
    )
    return np.mean(top1_acc)
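
Note that `test` averages the per-batch accuracies. With 10000 test images and a batch size of 48, the last batch holds only 16 images, so this unweighted mean can differ slightly from the fraction of correctly classified samples. The sketch below uses made-up counts to show the effect; both helper functions are hypothetical.

```python
def batch_mean_accuracy(correct_per_batch, batch_sizes):
    """Unweighted mean of per-batch accuracies (what averaging a list of batch accuracies does)."""
    accs = [c / n for c, n in zip(correct_per_batch, batch_sizes)]
    return sum(accs) / len(accs)

def sample_weighted_accuracy(correct_per_batch, batch_sizes):
    """Fraction of correctly classified samples over all batches."""
    return sum(correct_per_batch) / sum(batch_sizes)

# Made-up counts: two full batches of 48 and a final batch of 16.
batch_sizes = [48, 48, 16]
correct = [24, 24, 16]

bm = batch_mean_accuracy(correct, batch_sizes)
sw = sample_weighted_accuracy(correct, batch_sizes)
print(f"batch mean: {bm:.4f}, sample weighted: {sw:.4f}")
```

With 209 batches and a single short one, the difference on the real test set is expected to be small, which is why the simpler batch mean is often acceptable.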

And we finally test our model!

In [14]:
top1_acc = test(model, test_loader, device)

	Test set:Loss: 2.041063 Acc: 28.797847 


2022-01-20 11:28:26,201 fedbiomed INFO - log from: node_c7f1c8d4-9744-4555-b494-d7fdca428181 / CRITICAL - Node stopped in signal_handler, probably by user decision (Ctrl C)
2022-01-20 11:28:28,203 fedbiomed INFO - log from: node_648c36e8-b7f1-47af-8749-f562e4abc8fb / CRITICAL - Node stopped in signal_handler, probably by user decision (Ctrl C)
