# Fedbiomed Researcher base example

Use this cell during development: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running this notebook:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due to torch issue https://github.com/pytorch/vision/issues/3549)
  * The data must be added successfully (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a `torch.nn`-based `MyTrainingPlan` class to send to the node for training.

In [None]:
from fedbiomed.researcher.environ import environ
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+'/')
model_file = tmp_dir_model.name + '/class_export_mnist.py'

Note: write **only** the code to export in the following cell.

In [None]:
%%writefile "$model_file"

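import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu, F.max_pool2d and F.log_softmax in forward()
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan'), as long as it matches `model_class`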
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
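
The input size of `fc1` (9216) follows from the image size and the layers that precede it. A minimal sketch of that arithmetic, assuming 28x28 single-channel MNIST inputs and the default zero padding and stride of the layers; `conv_out` is a helper introduced here for illustration only:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                  # MNIST images are 28x28
side = conv_out(side, kernel=3)            # conv1 (3x3, stride 1): 28 -> 26
side = conv_out(side, kernel=3)            # conv2 (3x3, stride 1): 26 -> 24
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12

channels = 64                              # output channels of conv2
flat_features = channels * side * side     # size after torch.flatten(x, 1)
print(flat_features)
```

If you change the kernel sizes or the number of channels, the first argument of `fc1` has to be updated accordingly.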


## Declaring an Experiment by Providing all the Arguments


In [None]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 3

model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

exp = Experiment(tags=tags,
                 #nodes=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage,
                 node_selection_strategy=None)
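
As the comment on `batch_maxnum` notes, training is limited to `batch_maxnum * batch_size` samples. A quick worked example with the values above, assuming the node holds the full 60,000-image MNIST training split (that size is an assumption about the node's dataset):

```python
batch_size = 48
batch_maxnum = 100
mnist_train_size = 60_000  # assumption: full MNIST training split on the node

# Upper bound on the samples seen, capped by the dataset size
samples_used = min(batch_maxnum * batch_size, mnist_train_size)
fraction = samples_used / mnist_train_size

print(samples_used, f"{fraction:.0%}")
```

This is why the setting is a fast pass for development: only a small part of the dataset is used.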

In [None]:
exp.run_once()

In [None]:
exp.run(1)

In [None]:
exp.run()

In [None]:
exp.run_once()
exp.run()

In [None]:
exp.set_rounds(4)

In [None]:
exp.run()

In [None]:
exp.run_once(True)

In [None]:
exp.run(1, True)

In [None]:
print('Number of rounds that have been run  : ' , exp.round_current())
print('Round number for starting next round : ' , exp.round_current() + 1)
print('Round Index                          : ' , list(range(exp.round_current())))

## Declaring an Experiment Step by Step 
### Building Empty Experiment

In [None]:
tags =  ['#MNIST', '#dataset']
from fedbiomed.researcher.requests import Requests
reqs = Requests()
training_data = reqs.search(tags)

In [None]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
#strategy= DefaultStrategy(training_data)
from fedbiomed.researcher.aggregators.fedavg import FedAverage
strategy = FedAverage()
exp = Experiment(training_data=training_data, node_selection_strategy=strategy)

In [None]:
from fedbiomed.researcher.experiment import Experiment
exp = Experiment()

### Setting Tags 

Tags should be a list of strings, or a single string containing one tag.

---
<div class="note">
    <p>If the provided tags are not of the correct type, <code>.set_tags</code> will raise <code>TypeError</code>.</p>
</div>
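
The kind of type check described above can be sketched as follows. `normalize_tags` is a hypothetical helper written for this example, not the Fed-BioMed implementation; it only illustrates accepting either form and rejecting anything else:

```python
def normalize_tags(tags):
    """Hypothetical helper: accept one tag or a list of tags, return a list."""
    if isinstance(tags, str):
        return [tags]
    if isinstance(tags, list) and all(isinstance(t, str) for t in tags):
        return tags
    raise TypeError(
        f"tags must be a str or a list of str, got {type(tags).__name__}"
    )


print(normalize_tags("#MNIST"))
print(normalize_tags(["#MNIST", "#dataset"]))

try:
    normalize_tags(42)
except TypeError as err:
    print("TypeError:", err)
```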

In [None]:
tags = ["#MNIST", "#dataset"]
exp.set_tags(tags = tags)

### Setting Model Path and Model Class

In [None]:
exp.set_model_path(model_path = model_file)
exp.set_model_class(model_class = 'MyTrainingPlan')

### Setting Model Arguments and Training Arguments

In [None]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100
}

exp.set_model_args(model_args = model_args)
exp.set_training_args(training_args = training_args)

### Setting Training Data

The method `set_training_data` takes three arguments:

- `tags`: List of tags (strings) for the search request. If it is not provided, the method will try to use the `tags` attribute of the object.
- `nodes`: List of node ids to which the search request will be sent. If this argument is not provided, the search request will be sent to all active nodes.
- `training_data`: A dictionary or `FederatedDataset` object. If `training_data` is provided, the search request with `tags` and `nodes` will be ignored.
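
The precedence between these three arguments can be sketched in plain Python. Everything below is hypothetical and written for this example: `resolve_training_data`, `fake_search`, the node names, and the `ValueError` raised when no tags are available are assumptions, not Fed-BioMed code.

```python
def resolve_training_data(search, tags=None, nodes=None,
                          training_data=None, default_tags=None):
    """Hypothetical sketch of the argument precedence; not Fed-BioMed code."""
    # 1. Explicit training data wins: tags and nodes are ignored.
    if training_data is not None:
        return training_data
    # 2. Otherwise fall back to the tags already stored on the experiment.
    tags = tags if tags is not None else default_tags
    if tags is None:
        # Assumption: the real method may handle this case differently.
        raise ValueError("no tags given and none set on the experiment")
    # 3. nodes=None means "ask every active node".
    return search(tags, nodes)


def fake_search(tags, nodes):
    """Stand-in for a network search; returns a made-up result."""
    targets = nodes if nodes is not None else ["all-active-nodes"]
    return {node: {"tags": tags} for node in targets}


print(resolve_training_data(fake_search, default_tags=["#MNIST", "#dataset"]))
print(resolve_training_data(fake_search, tags=["#MNIST"], nodes=["node-1"]))
print(resolve_training_data(fake_search, training_data={"node-2": "preset"}))
```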

In [None]:
exp.set_training_data()


### Setting Job 

Setting the job prepares all the necessary assets for running a round. Therefore, `Job` should be set before running the experiment.

To be able to set `Job`, you should have already set the arguments `model_path`, `model_class` and `training_data`. Otherwise `set_job()` will raise an Exception.
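
A pre-flight check for these prerequisites could look like the sketch below. `REQUIRED_FOR_JOB`, `missing_job_arguments` and the `settings` dictionary are hypothetical names used for illustration; the real `set_job()` performs its own validation.

```python
REQUIRED_FOR_JOB = ("model_path", "model_class", "training_data")


def missing_job_arguments(settings):
    """Hypothetical helper: names of required arguments that are still unset."""
    return [name for name in REQUIRED_FOR_JOB if settings.get(name) is None]


# Made-up state: training data has not been set yet
settings = {
    "model_path": "/tmp/class_export_mnist.py",
    "model_class": "MyTrainingPlan",
    "training_data": None,
}

missing = missing_job_arguments(settings)
if missing:
    print("Set these before calling set_job():", ", ".join(missing))
```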

In [None]:
exp.set_job()

In [None]:
exp.set_node_selection_strategy()

Parameters of The Experiment

In [None]:
print('Rounds              :', exp.rounds())
print('Tags                :', exp.tags())
print('Model Path          :', exp.model_path())
print('Model Class         :', exp.model_class())
print('Model Arguments     :', exp.model_args())
print('Training Arguments  :', exp.training_args())
print('Job                 :', exp.job())
print('Training Data       :', exp.training_data())
print('Nodes               :', exp.nodes()) # Returns selected nodes after search request
print('Aggregator          :', exp.aggregator())
print('N.S. Strategy       :', exp.node_selection_strategy())
print('Breakpoint State    :', exp.breakpoint())
print('Exp  folder         :', exp.experimentation_folder())
print('Exp  path           :', exp.experimentation_path())



In [None]:
exp.info()

In [None]:
exp.run_once()

In [None]:
print('Number of rounds initial             : ' , exp.rounds())
print('Number of rounds that have been run  : ' , exp.round_current())
print('Round number for starting next round : ' , exp.round_current() + 1)
print('Round Indexes                        : ' , list(range(exp.round_current())))

In [None]:
exp.run_once()

Check the current round and the round that will be run next.

In [None]:
print('Number of rounds initial             : ' , exp.rounds())
print('Number of rounds that have been run  : ' , exp.round_current())
print('Round number for starting next round : ' , exp.round_current() + 1)
print('Round Indexes                        : ' , list(range(exp.round_current())))

Running multiple rounds:

In [None]:
exp.run(rounds=3)

In [None]:
exp.run_once()

Setting rounds to a higher value:

In [None]:
new_rounds = exp.rounds() + 1
exp.set_rounds(new_rounds)
exp.run_once()

In [None]:
print('Number of rounds initial             : ' , exp.rounds())
print('Number of rounds that have been run  : ' , exp.round_current())
print('Round number for starting next round : ' , exp.round_current() + 1)
print('Round Indexes                        : ' , list(range(exp.round_current())))

In [None]:
rounds = exp.round_current()

print("\nList the training rounds : ", exp.training_replies().keys())
print("\nList the nodes for each training round and their timings : ")
for r in exp.training_replies().keys():
    round_data = exp.training_replies()[r].data
    print('\n\t Round %s' % str(r+1))
    for c in range(len(round_data)):
        print("\t\t- {id} :\
        \n\t\t\trtime_training={rtraining:.2f} seconds\
        \n\t\t\tptime_training={ptraining:.2f} seconds\
        \n\t\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
            rtraining = round_data[c]['timing']['rtime_training'],
            ptraining = round_data[c]['timing']['ptime_training'],
            rtotal = round_data[c]['timing']['rtime_total']))
print('\n')

### Run the Same Experiment with Multiple Rounds

In [None]:
exp.run(rounds=2)

### Changing Experiment Parameters with Setters after All the Arguments Are Already Set
If the `Job` is already initialized and arguments related to the model are modified, the `Job` should be reinitialized with the method `.set_job()`. The Experiment also reports this after the model file is set.

<div class="note">
    <p>Changing the model after running the experiment might have consequences.</p>
</div>

In [None]:
exp.set_model_path(model_file)
exp.set_model_class('MyTrainingPlan')

In [None]:
exp.set_job()

#### Changing Aggregator

The aggregator should be an instance of `fedbiomed.researcher.aggregators.aggregator.Aggregator`; otherwise `set_aggregator` will raise an Exception. It can be passed either as a `Callable` class or as an already built object.

The following cell will raise an Exception:

In [None]:
exp.set_aggregator('ThisIsNotAnAggregator')

Correct usage: 

In [None]:
from fedbiomed.researcher.aggregators.fedavg import FedAverage
# Can be passed as a Callable class
exp.set_aggregator(FedAverage)

# Can be passed as an already built object
fedavg = FedAverage()
exp.set_aggregator(fedavg)
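
Accepting either a class or an already built object is a common pattern. The sketch below shows one way it can be done; `AggregatorStub`, `MeanStub` and `resolve_aggregator` are stand-ins defined only for this example and are not the Fed-BioMed classes.

```python
class AggregatorStub:
    """Stand-in base class for this sketch (not the Fed-BioMed Aggregator)."""

    def aggregate(self, values):
        raise NotImplementedError


class MeanStub(AggregatorStub):
    """Toy aggregator that returns the arithmetic mean."""

    def aggregate(self, values):
        return sum(values) / len(values)


def resolve_aggregator(aggregator):
    """Return an aggregator instance from either a class or an instance."""
    # A class is instantiated with its default arguments
    if isinstance(aggregator, type) and issubclass(aggregator, AggregatorStub):
        return aggregator()
    # An already built object is used as-is
    if isinstance(aggregator, AggregatorStub):
        return aggregator
    raise TypeError(f"expected an aggregator class or instance, got {aggregator!r}")


print(type(resolve_aggregator(MeanStub)).__name__)
print(type(resolve_aggregator(MeanStub())).__name__)

try:
    resolve_aggregator("ThisIsNotAnAggregator")
except TypeError as err:
    print("TypeError:", err)
```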

Federated parameters for each round are available via `exp.aggregated_params()` (indices 0 to `rounds` - 1).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())
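
As a rough illustration of what federated averaging computes, the sketch below takes a sample-weighted mean of per-node parameters using plain Python lists. It follows the standard FedAvg formulation and is not the Fed-BioMed `FedAverage` implementation; the node names, parameter values, dataset sizes and the `federated_average` helper are all made up for this example.

```python
def federated_average(node_params, node_sizes):
    """Sample-weighted mean of per-node parameter vectors (plain lists)."""
    total = sum(node_sizes.values())
    first = next(iter(node_params.values()))
    averaged = {}
    for name, values in first.items():
        # Weight each node's value by its share of the total sample count
        averaged[name] = [
            sum(node_params[node][name][i] * node_sizes[node] / total
                for node in node_params)
            for i in range(len(values))
        ]
    return averaged


# Made-up parameters from two nodes for a single two-element tensor
node_params = {
    "node_a": {"fc2.bias": [1.0, 2.0]},
    "node_b": {"fc2.bias": [3.0, 6.0]},
}
node_sizes = {"node_a": 100, "node_b": 300}  # made-up dataset sizes

print(federated_average(node_params, node_sizes))
```

Nodes with more samples pull the average towards their parameters; with equal sizes this reduces to a plain mean.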


Feel free to run other sample notebooks or try your own models :D