# Fed-BioMed Researcher base example

Use for developing (autoreloads changes made across packages)

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
It is necessary to previously configure a node:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due torch issue https://github.com/pytorch/vision/issues/3549)
  * Data must have been added (if you get a warning saying that data must be unique is because it's been already added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`. it means you are online.

## Define an experiment model and parameters


Declare a torch.nn MyTrainingPlan class to send for training on the node

In [1]:
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the model to be used. 
# You can use any class name (here 'Net')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlan, self).__init__(model_args)
        
        
        self.model = self.make_model()
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms"]
        
        self.add_dependency(deps)

    def make_model(self):
        model = nn.Sequential(nn.Conv2d(1, 32, 3, 1),
                                  nn.ReLU(),
                                  nn.Conv2d(32, 64, 3, 1),
                                  nn.ReLU(),
                                  nn.MaxPool2d(2),
                                  nn.Dropout(0.25),
                                  nn.Flatten(),
                                  nn.Linear(9216, 128),
                                  nn.ReLU(),
                                  nn.Dropout(0.5),
                                  nn.Linear(128, 10),
                                  nn.LogSoftmax(dim=1))
        return model
        
        
    def forward(self, x):

        return self.model(x)

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


This group of arguments correspond respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or lack of positional (required) arguments will raise error. 🤓

In [2]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100, # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filter on a list of node ID with `nodes`
- run a round of local training on nodes with model defined in `model_path` + federation with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-05-03 13:01:52,855 fedbiomed INFO - Component environment:
05/03/2022 13:01:52:INFO:Component environment:
2022-05-03 13:01:52,857 fedbiomed INFO - type = ComponentType.RESEARCHER
05/03/2022 13:01:52:INFO:type = ComponentType.RESEARCHER
2022-05-03 13:01:52,940 fedbiomed INFO - Messaging researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fd283f123a0>
05/03/2022 13:01:52:INFO:Messaging researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fd283f123a0>
2022-05-03 13:01:53,004 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
05/03/2022 13:01:53:INFO:Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-05-03 13:02:03,016 fedbiomed INFO - Node selected for training -> node_23891420-db2f-446f-aeb1-e825ec90b63f
05/03/2022 13:02:

Let's start the experiment.

By default, this function doesn't stop until all the `round_limit` rounds are done for all the nodes

In [4]:
exp.run_once(increase=True)

2022-05-03 13:02:03,418 fedbiomed INFO - Sampled nodes in round 0 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
05/03/2022 13:02:03:INFO:Sampled nodes in round 0 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
2022-05-03 13:02:03,421 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364', 'job_id': 'ccc48fca-e4be-40b1-bbea-8c74d7b681d4', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'batch_size': 48, 'lr': 0.001, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'training': True, 'model_args': {}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/03/my_model_6a939341-e6a7-442b-9e93-667ec09c2f09.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/03/aggregated_params_init_

2022-05-03 13:02:08,833 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
05/03/2022 13:02:08:INFO:[1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
2022-05-03 13:02:18,442 fedbiomed INFO - Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_81d7eba7-ec7e-4dfc-af5f-87ddcc8bceeb.pt
05/03/2022 13:02:18:INFO:Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_81d7eba7-ec7e-4dfc-af5f-87ddcc8bceeb.pt
2022-05-03 13:02:18,471 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_58

1

In [5]:
exp.run(rounds=8, increase=True)

2022-05-03 13:02:18,630 fedbiomed DEBUG - Auto increasing total rounds for experiment from 2 to 9
05/03/2022 13:02:18:DEBUG:Auto increasing total rounds for experiment from 2 to 9
2022-05-03 13:02:18,632 fedbiomed INFO - Sampled nodes in round 1 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
05/03/2022 13:02:18:INFO:Sampled nodes in round 1 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
2022-05-03 13:02:18,633 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364', 'job_id': 'ccc48fca-e4be-40b1-bbea-8c74d7b681d4', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'batch_size': 48, 'lr': 0.001, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'training': True, 'model_args': {}, 'command': 'train', 'model_url': 'htt

2022-05-03 13:02:24,181 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
05/03/2022 13:02:24:INFO:[1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
2022-05-03 13:02:33,654 fedbiomed INFO - Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_3ddb3174-8ca5-49ac-87c8-59fb26d6dfff.pt
05/03/2022 13:02:33:INFO:Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_3ddb3174-8ca5-49ac-87c8-59fb26d6dfff.pt
2022-05-03 13:02:33,708 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_be

2022-05-03 13:02:37,184 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 2880/60000 (5%) 
 					 Loss: [1m0.136210[0m 
					 ---------
05/03/2022 13:02:37:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 2880/60000 (5%) 
 					 Loss: [1m0.136210[0m 
					 ---------
2022-05-03 13:02:37,772 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3360/60000 (6%) 
 					 Loss: [1m0.213740[0m 
					 ---------
05/03/2022 13:02:37:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3360/60000 (6%) 
 					 Loss: [1m0.213740[0m 
					 ---------
2022-05-03 13:02:38,351 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3840/60000 (6%) 
 					 Loss: [1m0.090399[0m 
			

2022-05-03 13:02:49,740 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 480/60000 (1%) 
 					 Loss: [1m0.305031[0m 
					 ---------
05/03/2022 13:02:49:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 480/60000 (1%) 
 					 Loss: [1m0.305031[0m 
					 ---------
2022-05-03 13:02:50,321 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 960/60000 (2%) 
 					 Loss: [1m0.126702[0m 
					 ---------
05/03/2022 13:02:50:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 960/60000 (2%) 
 					 Loss: [1m0.126702[0m 
					 ---------
2022-05-03 13:02:50,924 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 1440/60000 (2%) 
 					 Loss: [1m0.367814[0m 
					 -

2022-05-03 13:03:04,380 fedbiomed DEBUG - researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364
05/03/2022 13:03:04:DEBUG:researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m There is no test activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for testing will be ignored[0m
-----------------------------------------------------------------
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m There is no test activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for testing will be ignored[0m
-----------------------------------------------------------------
2022-05-03 13:03:04,435 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m training with arguments {'history_monitor': <fedbiomed

2022-05-03 13:03:19,573 fedbiomed INFO - Sampled nodes in round 5 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
05/03/2022 13:03:19:INFO:Sampled nodes in round 5 ['node_23891420-db2f-446f-aeb1-e825ec90b63f']
2022-05-03 13:03:19,574 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364', 'job_id': 'ccc48fca-e4be-40b1-bbea-8c74d7b681d4', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'batch_size': 48, 'lr': 0.001, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'training': True, 'model_args': {}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/03/my_model_6a939341-e6a7-442b-9e93-667ec09c2f09.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/03/aggregated_params_e63a8

2022-05-03 13:03:24,821 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
05/03/2022 13:03:24:INFO:[1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
2022-05-03 13:03:34,593 fedbiomed INFO - Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_1fc9c324-9912-4e77-b3ad-92b35ad320cf.pt
05/03/2022 13:03:34:INFO:Downloading model params after training on node_23891420-db2f-446f-aeb1-e825ec90b63f - from http://localhost:8844/media/uploads/2022/05/03/node_params_1fc9c324-9912-4e77-b3ad-92b35ad320cf.pt
2022-05-03 13:03:34,629 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_89

2022-05-03 13:03:38,028 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 2880/60000 (5%) 
 					 Loss: [1m0.055199[0m 
					 ---------
05/03/2022 13:03:38:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 2880/60000 (5%) 
 					 Loss: [1m0.055199[0m 
					 ---------
2022-05-03 13:03:38,621 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3360/60000 (6%) 
 					 Loss: [1m0.093898[0m 
					 ---------
05/03/2022 13:03:38:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3360/60000 (6%) 
 					 Loss: [1m0.093898[0m 
					 ---------
2022-05-03 13:03:39,227 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 3840/60000 (6%) 
 					 Loss: [1m0.024994[0m 
			

2022-05-03 13:03:50,622 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 480/60000 (1%) 
 					 Loss: [1m0.003510[0m 
					 ---------
05/03/2022 13:03:50:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 480/60000 (1%) 
 					 Loss: [1m0.003510[0m 
					 ---------
2022-05-03 13:03:51,157 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 960/60000 (2%) 
 					 Loss: [1m0.044693[0m 
					 ---------
05/03/2022 13:03:51:INFO:[1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 960/60000 (2%) 
 					 Loss: [1m0.044693[0m 
					 ---------
2022-05-03 13:03:51,675 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_23891420-db2f-446f-aeb1-e825ec90b63f 
					 Epoch: 1 | Completed: 1440/60000 (2%) 
 					 Loss: [1m0.028476[0m 
					 -

2022-05-03 13:04:05,242 fedbiomed DEBUG - researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364
05/03/2022 13:04:05:DEBUG:researcher_88ec0705-c7ee-44c4-b5f4-7a4682687364
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m There is no test activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for testing will be ignored[0m
-----------------------------------------------------------------
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m There is no test activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for testing will be ignored[0m
-----------------------------------------------------------------
2022-05-03 13:04:05,290 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_23891420-db2f-446f-aeb1-e825ec90b63f
					[1m MESSAGE:[0m training with arguments {'history_monitor': <fedbiomed

8

Local training results for each round and each node are available via `exp.training_replies()` (index 0 to (`rounds` - 1) ).

For example you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round :
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer

In [6]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()


List the training rounds :  dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])

List the nodes for the last training round and their timings : 
	- node_23891420-db2f-446f-aeb1-e825ec90b63f :    
		rtime_training=5.35 seconds    
		ptime_training=21.22 seconds    
		rtime_total=15.02 seconds




Unnamed: 0,success,msg,dataset_id,node_id,params_path,params,timing
0,True,,dataset_57a5e69c-773a-49ff-a666-033a2e0c7816,node_23891420-db2f-446f-aeb1-e825ec90b63f,/home/ybouilla/fedbiomed/var/experiments/Exper...,"{'model.0.weight': [[tensor([[ 0.2718, 0.1161...","{'rtime_training': 5.345576226000048, 'ptime_t..."


Federated parameters for each round are available via `exp.aggregated_params()` (index 0 to (`rounds` - 1) ).

For example you can view the federated parameters for the last round of the experiment :

In [7]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())



List the training rounds :  dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])

Access the federated params for the last training round :
	- params_path:  /home/ybouilla/fedbiomed/var/experiments/Experiment_0005/aggregated_params_2480d0c6-6a57-43fd-bd92-8d559d3a2cd1.pt
	- parameter data:  odict_keys(['model.0.weight', 'model.0.bias', 'model.2.weight', 'model.2.bias', 'model.7.weight', 'model.7.bias', 'model.10.weight', 'model.10.bias'])


Feel free to run other sample notebooks or try your own models :D