# Fedbiomed Researcher

Use this during development (it automatically reloads changes made across packages).

In [None]:
%load_ext autoreload
%autoreload 2

## Setting the node up
A node must be configured beforehand:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this is due to a torch issue: https://github.com/pytorch/vision/issues/3549)
  * The data should now be added (if you get a warning saying that the data must be unique, it is because it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Connected with result code 0`, which means the node is online.

## Create an experiment to train a model on the data found

Declare a torch.nn MyTrainingPlan class to send for training on the node

In [1]:
from fedbiomed.researcher.environ import environ
import os
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR'])
model_file = os.path.join(tmp_dir_model.name, 'class_export.py')

Note: write **only** the code to be exported in the following cell, since the whole cell is written to `model_file`.

In [2]:
%%writefile "$model_file"

import torch
import torch.nn as nn
from fedbiomed.common.torchnn import TorchTrainingPlan
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# you can use any class name, e.g.:
# class NetAlter(TorchTrainingPlan):
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


Writing /Users/jls/Development/fedbiomed/fedbiomed/var/tmp/tmpwd4bldvg/class_export.py
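The input size of `fc1` (`nn.Linear(9216, 128)`) follows from the spatial arithmetic of the layers applied in `forward`. A short worked check, assuming MNIST's 28x28 single-channel images and the unpadded convolutions declared above; `conv_out` is a hypothetical helper, not part of any library:

```python
# Hypothetical helper: output length of one spatial dimension for a 2D
# convolution or pooling layer with no padding and no dilation.
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    return (size - kernel) // stride + 1


side = 28                                  # MNIST images are 28x28 with 1 channel
side = conv_out(side, kernel=3, stride=1)  # conv1 -> 26x26, 32 channels
side = conv_out(side, kernel=3, stride=1)  # conv2 -> 24x24, 64 channels
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(x, 2) -> 12x12
flat = 64 * side * side                    # torch.flatten(x, 1) keeps the batch dim
print(side, flat)  # 12 9216
```

If you change the kernel sizes or add a layer, recomputing `flat` this way gives the new `in_features` for `fc1`.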


These two groups of arguments correspond, respectively, to:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing required arguments will raise an error. 🤓
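Typos in these dictionaries only surface once the arguments reach the node, so a quick check on the researcher side can catch them earlier. The sketch below is hypothetical (it is not part of the Fed-BioMed API), and the sets of required and allowed keys are assumptions chosen to match the `training_args` used in this notebook:

```python
# Hypothetical pre-flight check, not a Fed-BioMed function.
def check_training_args(args: dict, required: set, allowed: set) -> list:
    """Return a list of human-readable problems (empty if args look valid)."""
    problems = []
    for key in sorted(required - set(args)):
        problems.append(f"missing required argument: {key}")
    for key in sorted(set(args) - allowed):
        problems.append(f"unknown argument (typo?): {key}")
    return problems


ALLOWED = {"batch_size", "lr", "epochs", "dry_run", "batch_maxnum"}  # assumed key set
REQUIRED = {"epochs"}                                                # assumed key set

ok = check_training_args({"batch_size": 48, "lr": 1e-3, "epochs": 1}, REQUIRED, ALLOWED)
bad = check_training_args({"batch_size": 48, "lerning_rate": 1e-3}, REQUIRED, ALLOWED)
print(ok)   # []
print(bad)  # ['missing required argument: epochs', 'unknown argument (typo?): lerning_rate']
```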

In [3]:
training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 200  # fast pass for development: only use (batch_maxnum * batch_size) samples
}
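To see how much `batch_maxnum` shortens a development run, here is the arithmetic for the MNIST training set (60000 images, as the dataset summary further down reports). This sketch assumes that `batch_maxnum` caps the number of batches per epoch and that a non-positive value disables the cap; check the Fed-BioMed documentation for the exact semantics. `samples_per_epoch` is a hypothetical helper:

```python
import math


# Hypothetical helper; assumes batch_maxnum caps batches per epoch (<= 0: no cap).
def samples_per_epoch(dataset_size: int, batch_size: int, batch_maxnum: int) -> int:
    n_batches = math.ceil(dataset_size / batch_size)
    if batch_maxnum > 0:
        n_batches = min(n_batches, batch_maxnum)
    # The last batch of a full pass may be partial, so never exceed dataset_size
    return min(n_batches * batch_size, dataset_size)


print(samples_per_epoch(60000, 48, 200))  # 9600
print(samples_per_epoch(60000, 48, 0))    # 60000
```

With the settings above, each epoch therefore touches about 16% of the training images.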

# Train the federated model

Define an experiment:
- search for nodes serving data with these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on the nodes with the model defined in `model_path`, then federate the results with `aggregator`
- run for `rounds` rounds, applying the `node_selection_strategy` between the rounds
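The experiment loop described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Fed-BioMed's implementation: parameters are lists of floats instead of tensors, each "node" is a hypothetical function returning locally trained parameters, and aggregation is a FedAvg-style mean weighted by each node's (assumed) sample count. The exact weighting used by `FedAverage` is not shown in this notebook.

```python
from typing import Callable, Dict, List

Params = Dict[str, List[float]]


def fed_average(updates: List[Params], weights: List[int]) -> Params:
    """Weighted mean of each parameter across node updates."""
    total = sum(weights)
    return {
        name: [sum(w * u[name][i] for u, w in zip(updates, weights)) / total
               for i in range(len(updates[0][name]))]
        for name in updates[0]
    }


def run_experiment(init: Params, nodes: List[Callable[[Params], Params]],
                   sizes: List[int], rounds: int) -> List[Params]:
    params, history = init, []
    for _ in range(rounds):
        updates = [train(params) for train in nodes]  # local training on every node
        params = fed_average(updates, sizes)          # federation step
        history.append(params)                        # one entry per round
    return history


def shift(delta: float) -> Callable[[Params], Params]:
    """Toy 'node': pretend local training adds `delta` to every parameter."""
    return lambda p: {k: [v + delta for v in vals] for k, vals in p.items()}


history = run_experiment({"w": [0.0, 1.0]}, [shift(1.0), shift(3.0)], sizes=[1, 3], rounds=2)
print(history[-1])  # {'w': [5.0, 6.0]}
```

The last entry of `history` plays the same role as `exp.aggregated_params[rounds - 1]['params']`, which is used to load the federated model after training.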

In [4]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_path=model_file,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

Let's start the experiment.

By default, this function does not return until all the `rounds` have been completed on all the nodes.

In [5]:
exp.run()

Retrieve the federated model parameters

In [6]:
fed_model = exp.model_instance
fed_model.load_state_dict(exp.aggregated_params[rounds - 1]['params'])

<All keys matched successfully>

In [7]:
fed_model

MyTrainingPlan(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (dropout1): Dropout(p=0.25, inplace=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=9216, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)

# Local model

Here we load the local MNIST dataset

In [8]:
from torchvision import datasets, transforms
from fedbiomed.researcher.environ import environ

transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])

datasets.MNIST(root = os.path.join(environ['TMP_DIR'], 'local_mnist.tmp'), download = True, train = True, transform = transform)


Dataset MNIST
    Number of datapoints: 60000
    Root location: /Users/jls/Development/fedbiomed/fedbiomed/var/tmp/local_mnist.tmp
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )

We create a `localJob` object, which mimics the functionality of the `Job` class, to run the same model on the local dataset.

In [10]:
# The class local job mimics the class job used in the experiment
from fedbiomed.researcher.job import localJob
from fedbiomed.researcher.environ import environ

# local train on same amount of data as federated with 1 node
training_args['epochs'] *= rounds

local_job = localJob( dataset_path = os.path.join(environ['TMP_DIR'], 'local_mnist.tmp'),
          model_class='MyTrainingPlan',
          model_path=model_file,
          training_args=training_args)
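The line `training_args['epochs'] *= rounds` balances the data budget between the two setups. A quick check of the arithmetic, assuming a single node, that every round runs `epochs` local epochs, and that `batch_maxnum` caps the batches per epoch (`samples_seen` is a hypothetical helper):

```python
# Hypothetical helper: training samples processed, assuming every epoch is
# capped at batch_maxnum full batches.
def samples_seen(rounds: int, epochs: int, batch_size: int, batch_maxnum: int) -> int:
    return rounds * epochs * batch_maxnum * batch_size


federated = samples_seen(rounds=2, epochs=1, batch_size=48, batch_maxnum=200)  # 1 node
local = samples_seen(rounds=1, epochs=2, batch_size=48, batch_maxnum=200)      # epochs *= rounds
print(federated, local)  # 19200 19200
```

With several nodes, the federated run would process this amount on each node, so the equality only holds in the single-node case the code comment mentions.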


Running the localJob

In [11]:
local_job.start_training()
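`start_training()` trains the model on the local dataset. The routine below is a hypothetical sketch of what such a local loop typically looks like, reusing the `training_step` interface of `MyTrainingPlan`. The Adam optimizer and the per-epoch `batch_maxnum` cap are assumptions, not a description of `localJob` internals. It runs on a tiny synthetic dataset with a stand-in model so that it is self-contained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def train_locally(plan, loader, epochs=1, lr=1e-3, batch_maxnum=0):
    """Hypothetical local training loop; returns (optimizer steps, last loss)."""
    optimizer = torch.optim.Adam(plan.parameters(), lr=lr)  # assumed optimizer
    plan.train()
    steps, last_loss = 0, float("nan")
    for _ in range(epochs):
        for batch_idx, (data, target) in enumerate(loader):
            if batch_maxnum > 0 and batch_idx >= batch_maxnum:
                break  # assumed semantics: cap the number of batches per epoch
            optimizer.zero_grad()
            loss = plan.training_step(data, target)  # same interface as MyTrainingPlan
            loss.backward()
            optimizer.step()
            steps, last_loss = steps + 1, loss.item()
    return steps, last_loss


class ToyPlan(nn.Module):
    """Tiny stand-in with the forward/training_step shape of MyTrainingPlan."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        return F.log_softmax(self.fc(x), dim=1)

    def training_step(self, data, target):
        return F.nll_loss(self.forward(data), target)


torch.manual_seed(0)
x, y = torch.randn(64, 4), torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)
steps, final_loss = train_locally(ToyPlan(), loader, epochs=2, lr=1e-2, batch_maxnum=2)
print(steps)  # 4
```

With 64 samples and a batch size of 16 there are 4 batches per epoch, but the cap of 2 batches over 2 epochs yields 4 optimizer steps.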

We retrieve the locally trained model

In [12]:
local_model = local_job.model_instance

# Comparison

We define a small testing routine that computes the average loss and the accuracy on the test dataset

In [13]:
import torch
import torch.nn.functional as F


def testing_Accuracy(model, data_loader):
    """Return (average NLL loss, accuracy in %) of `model` on `data_loader`."""
    device = 'cpu'
    model.to(device)
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(data_loader.dataset)
    accuracy = 100 * correct / len(data_loader.dataset)

    return test_loss, accuracy

Loading the testing dataset and computing accuracy metrics for local and federated models

In [14]:
test_set = datasets.MNIST(root = os.path.join(environ['TMP_DIR'], 'local_mnist_testing.tmp'), download = True, train = False, transform = transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True)

acc_local = testing_Accuracy(local_model, test_loader)
acc_federated = testing_Accuracy(fed_model, test_loader)

In [15]:
print('\nAccuracy local training: {:.4f}, \nAccuracy federated training:  {:.4f}\nDifference: {:.4f}'.format(
             acc_local[1], acc_federated[1], acc_local[1]-acc_federated[1]))

print('\nError local training: {:.4f}, \nError federated training:  {:.4f}\nDifference: {:.4f}'.format(
             acc_local[0], acc_federated[0], acc_local[0]-acc_federated[0]))


Accuracy local training: 97.8700, 
Accuracy federated training:  98.0500
Difference: -0.1800

Error local training: 0.0649, 
Error federated training:  0.0604
Difference: 0.0045
