# Fedbiomed Researcher base example

Use this cell during development (it autoreloads changes made across packages)

In [1]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running the experiment:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this is due to the torch issue https://github.com/pytorch/vision/issues/3549)
  * The data must be added to the node (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a torch.nn MyTrainingPlan class to send for training on the node

In [2]:
from fedbiomed.researcher.environ import environ
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+'/')
model_file = tmp_dir_model.name + '/class_export_mnist.py'

2022-01-21 08:46:19,056 fedbiomed INFO - Component environment:
2022-01-21 08:46:19,057 fedbiomed INFO - - type = ComponentType.RESEARCHER


Note: write **only** the code to export in the following cell.

In [3]:
%%writefile "$model_file"

import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu, F.max_pool2d and F.log_softmax in forward()
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used. 
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


Writing /home/scansiz/Desktop/Inria/development/fedbiomed/var/tmp/tmpmxpasos3/class_export_mnist.py
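
The input size `9216` of `fc1` follows from the layer shapes in `MyTrainingPlan`. The sketch below reproduces that arithmetic, assuming 28x28 single-channel MNIST images and the usual output-size formula for unpadded convolution and pooling layers. The helper name `conv_out` is introduced here for illustration only and is not part of Fed-BioMed or torch.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                   # assumption: MNIST images are 28x28
side = conv_out(side, kernel=3)             # conv1 (kernel 3, stride 1): 28 -> 26
side = conv_out(side, kernel=3)             # conv2 (kernel 3, stride 1): 26 -> 24
side = conv_out(side, kernel=2, stride=2)   # max_pool2d(x, 2): 24 -> 12

channels = 64                               # output channels of conv2
flattened = channels * side * side          # size seen by fc1 after torch.flatten(x, 1)
print(flattened)
```

If the kernel sizes or the input resolution change, the first argument of `nn.Linear` in `fc1` has to be updated to match.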


## Declaring an Experiment by Providing All the Necessary Arguments


In [6]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

exp = Experiment(tags=tags,
                 #nodes=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-01-21 08:46:25,537 fedbiomed INFO - Messaging researcher_420cfc13-37cb-447c-af20-f7ac5cb2b6ab successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fe86804daf0>
2022-01-21 08:46:25,575 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-01-21 08:46:25,578 fedbiomed INFO - log from: node_f290cd48-a70a-4e55-9262-81f802f9c95c / DEBUG - Message received: {'researcher_id': 'researcher_420cfc13-37cb-447c-af20-f7ac5cb2b6ab', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2022-01-21 08:46:35,586 fedbiomed INFO - Node selected for training -> node_f290cd48-a70a-4e55-9262-81f802f9c95c

In [7]:
print(exp)

Let's start the experiment.

By default, this function does not return until all the `rounds` are done for all the nodes.

In [None]:
exp.run()

## Declaring an Experiment Step by Step 
### Building Empty Experiment

In [8]:
from fedbiomed.researcher.experiment import Experiment
exp2 = Experiment()

### Setting Tags 

Tags should be a list of strings, or a single string containing one tag. 

---
<div class="note">
    <p>If the provided tags are not of the correct type, <code>.set_tags</code> will raise a <code>TypeError</code>.</p>
</div>

In [9]:
tags = ["#MNIST", "#dataset"]
exp2.set_tags(tags = tags)

The following cell will raise a `TypeError`: 

In [None]:
tags = True
exp2.set_tags(tags = tags)
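
The type rule described in this section can be approximated in plain Python. This is a minimal sketch, not the Fed-BioMed implementation: `normalize_tags` is a hypothetical helper, and wrapping a single string into a list is an assumption made here for illustration.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return tags as a list of strings or raise TypeError."""
    if isinstance(tags, str):
        # Assumption: a single tag is wrapped into a one-element list.
        return [tags]
    if isinstance(tags, list) and all(isinstance(t, str) for t in tags):
        return tags
    raise TypeError(f"tags must be a str or a list of str, got {type(tags).__name__}")


print(normalize_tags("#MNIST"))
print(normalize_tags(["#MNIST", "#dataset"]))

try:
    normalize_tags(True)  # same kind of invalid input as in the cell above
except TypeError as err:
    print("TypeError:", err)
```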

### Setting Model Path and Model Class

In [None]:
exp2.set_model_path(model_path = model_file)
exp2.set_model_class(model_class = 'MyTrainingPlan')

### Setting Model Arguments and Training Arguments

In [None]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100
}

exp2.set_model_args(model_args = model_args)
exp2.set_training_args(training_args = training_args)
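
The `batch_maxnum` entry caps how much data is used during training, at `batch_maxnum * batch_size` samples. The arithmetic below is a rough sketch of that cap, assuming the node holds the full 60,000-image MNIST training split; the variable names `samples_cap` and `samples_used` are illustrative only.

```python
training_args = {'batch_size': 48, 'batch_maxnum': 100}

# Assumption: the node holds the standard MNIST training split (60,000 images).
mnist_train_size = 60000

# Upper bound on the samples used: batch_maxnum batches of batch_size samples.
samples_cap = training_args['batch_size'] * training_args['batch_maxnum']
samples_used = min(samples_cap, mnist_train_size)

print(samples_used)
print(f"{samples_used / mnist_train_size:.0%} of the training split")
```

With these values only a small fraction of the dataset is visited, which is why the setting suits quick development runs.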

### Setting Training Data

The method `set_training_data` takes three arguments: 

- `tags`: List of tags (strings) for the search request. If it is not provided, the method will try to use the `tags` attribute of the object. 
- `nodes`: List of node ids to which the search request will be sent. If this argument is not provided, the search request will be sent to all active nodes.  
- `training_data`: A dictionary or a `FederatedDataset` object. If `training_data` is provided, the search request with `tags` and `nodes` will be ignored.
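
The precedence between these three arguments can be sketched in plain Python. This is an illustration of the rules listed above, not Fed-BioMed code: `plan_search`, its `default_tags` parameter (standing in for the `tags` attribute of the experiment) and the returned dictionary layout are all hypothetical.

```python
from typing import Dict, List, Optional


def plan_search(tags: Optional[List[str]] = None,
                nodes: Optional[List[str]] = None,
                training_data: Optional[Dict] = None,
                default_tags: Optional[List[str]] = None) -> Dict:
    """Hypothetical illustration of the argument precedence."""
    if training_data is not None:
        # Explicit training data takes priority: tags and nodes are ignored.
        return {'search': False, 'training_data': training_data}
    # Fall back to the tags already stored on the experiment.
    effective_tags = tags if tags is not None else default_tags
    # nodes=None stands for "send the request to all active nodes".
    return {'search': True, 'tags': effective_tags, 'nodes': nodes}


# No arguments: reuse the stored tags and query every active node.
print(plan_search(default_tags=['#MNIST', '#dataset']))

# Placeholder training data: the search request is skipped.
print(plan_search(tags=['#other'], training_data={'node-a': []}))
```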

In [None]:
exp2.set_training_data()

### Setting Job 

Setting the job prepares all the necessary assets to run a round. Therefore, the `Job` should be set before running the experiment.  

To set the `Job`, you must already have set the arguments `model_path`, `model_class`, and `training_data`. Otherwise, `set_job()` will raise an exception. 

In [None]:
exp2.set_job()

In [None]:
exp2.set_node_selection_strategy()

In [None]:
exp2.run_once()

In [None]:
exp2.run_once()

### Changing Experiment Parameters with Setters after All the Arguments Are Already Set
If the `Job` is already initialized and the model-related arguments are modified, the `Job` should be reinitialized with the method `.set_job()`. The Experiment also reports this after the model file is set.  
  
    
    
<div class="note">
    <p>Changing the model after running the experiment might have some consequences.</p>
</div>

In [None]:
exp2.set_model_path(model_file)
exp2.set_model_class('MyTrainingPlan')

In [None]:
exp2.set_job()

#### Changing Aggregator

The aggregator should be an instance of `fedbiomed.researcher.aggregators.aggregator.Aggregator`. Otherwise, `set_aggregator` will raise an exception. The aggregator can be passed as a `Callable` class or as an already built object.

The following cell will raise an exception:

In [None]:
exp2.set_aggregator('ThisIsNotAnAggregator')

Correct usage: 

In [None]:
from fedbiomed.researcher.aggregators.fedavg import FedAverage
# Can be passed as a Callable class
exp2.set_aggregator(FedAverage)

# Can be passed as an already built object
fedavg = FedAverage()
exp2.set_aggregator(fedavg)
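
Accepting either a class or an instance is a common pattern that can be sketched without Fed-BioMed. In the example below, `Aggregator` and `ToyAggregator` are stand-in classes and `resolve_aggregator` is a hypothetical helper. Raising `TypeError` is an assumption based on the `FB419` message shown earlier in this notebook, not a statement about how `set_aggregator` is implemented.

```python
class Aggregator:
    """Stand-in base class for this sketch; not the Fed-BioMed class."""


class ToyAggregator(Aggregator):
    """Stand-in subclass used only for illustration."""


def resolve_aggregator(aggregator):
    """Hypothetical helper: accept an Aggregator subclass or instance."""
    if isinstance(aggregator, type) and issubclass(aggregator, Aggregator):
        return aggregator()   # a class was given: build the object
    if isinstance(aggregator, Aggregator):
        return aggregator     # an instance was given: use it as-is
    raise TypeError(
        f"expected an Aggregator class or instance, got {type(aggregator)}"
    )


print(type(resolve_aggregator(ToyAggregator)).__name__)    # class form
print(type(resolve_aggregator(ToyAggregator())).__name__)  # instance form

try:
    resolve_aggregator('ThisIsNotAnAggregator')
except TypeError as err:
    print("TypeError:", err)
```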

Federated parameters for each round are available in `exp.aggregated_params` (indexed from 0 to `rounds` - 1).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params.keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D