# Fed-BioMed Researcher

This example demonstrates using a convolutional model in PyTorch for recognition of smiling faces, with a CelebA dataset split over 2 nodes.

## Setting the node up

Install the CelebA dataset by following the [`README.md`](data/Celeba/README.md) inside the [`notebooks/data` folder](./data/Celeba).
The script creates the data for 3 nodes, each with its own subset. The dataset of node 3 is used in this notebook as a testing set,  
so it is not necessary to create or run node 3.  

Before using the scripts, make sure the correct environment is set up. For the node environment, run: `source ./scripts/fedbiomed_environment node`  
For the sake of testing the resulting model, this notebook uses the data from nodes 1 and 2 for training and the data from node 3 for testing.
You can create multiple nodes by adding a `config` parameter to the command controlling nodes. For example:  
Creating 2 nodes for training:  
 - `./scripts/fedbiomed_run node config node1.ini start`
 - `./scripts/fedbiomed_run node config node2.ini start`  
 
Adding data to each node:  
 - `./scripts/fedbiomed_run node config node1.ini add`
 - `./scripts/fedbiomed_run node config node2.ini add`

At least 1 node must be configured beforehand:
1. `./scripts/fedbiomed_run node config (ini file) add`
  * Select option 4 (images) to add an image dataset to the node
  * Add a name (e.g. 'celeba') and the tag for the dataset (the tag should contain `#celeba`, as it is the tag used for this training), and finally add the description
  * Pick a data folder from the 3 generated inside `data/Celeba/celeba_preprocessed` (e.g. `data_node_1`)
  * The data must have been added (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node config (ini file) list`
3. Run the node using `./scripts/fedbiomed_run node config (ini file) start`. Wait until you get `Starting task manager`, which means the node is online.
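
Before adding a folder to a node, it can help to confirm that it has the layout the training plan in this notebook reads from: a `target.csv` file and a `data/` directory. The sketch below is only a convenience check under that assumption; `check_node_folder` is a hypothetical helper, not part of Fed-BioMed.

```python
from pathlib import Path


def check_node_folder(folder):
    """Return a list of problems found in a preprocessed CelebA node folder.

    Hypothetical helper: it only checks for the two entries this notebook
    reads later on (target.csv and data/).
    """
    folder = Path(folder)
    problems = []
    if not (folder / "target.csv").is_file():
        problems.append("missing target.csv")
    if not (folder / "data").is_dir():
        problems.append("missing data/ directory")
    return problems


# Check the three folders generated by the preprocessing script.
base = Path("data/Celeba/celeba_preprocessed")
for i in (1, 2, 3):
    issues = check_node_folder(base / f"data_node_{i}")
    print(f"data_node_{i}:", "OK" if not issues else issues)
```

An empty list means the folder has the two entries that the dataset class expects; it does not validate the content of the CSV file or the images.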

For the sake of testing the resulting model, only nodes 1 and 2 were started during training; the data from node 3 is used to test the model.

## Create an experiment to train a model on the data found

Declare a TorchTrainingPlan Net class to send for training on the node

In [12]:
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.data import Dataset
from fedbiomed.common.data import DataManager
import pandas as pd
import numpy as np
from PIL import Image
import os


class CelebaTrainingPlan(TorchTrainingPlan):
         
    # Defines model 
    def init_model(self):
        model = self.Net()
        return model 
    
    # Here we define the custom dependencies that will be needed by our custom Dataloader
    def init_dependencies(self):
        deps = ["from torch.utils.data import Dataset",
                "from torchvision import transforms",
                "import pandas as pd",
                "from PIL import Image",
                "import os",
                "import numpy as np"]
        return deps

    # Torch modules class
    class Net(nn.Module):
        
        def __init__(self):
            super().__init__()
            #convolution layers
            self.conv1 = nn.Conv2d(3, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 32, 3, 1)
            self.conv3 = nn.Conv2d(32, 32, 3, 1)
            self.conv4 = nn.Conv2d(32, 32, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            # classifier
            self.fc1 = nn.Linear(3168, 128)
            self.fc2 = nn.Linear(128, 2)

        def forward(self, x):
            x = self.conv1(x)
            x = F.max_pool2d(x, 2)
            x = F.relu(x)

            x = self.conv2(x)
            x = F.max_pool2d(x, 2)
            x = F.relu(x)

            x = self.conv3(x)
            x = F.max_pool2d(x, 2)
            x = F.relu(x)

            x = self.conv4(x)
            x = F.max_pool2d(x, 2)
            x = F.relu(x)

            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)

            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output


    class CelebaDataset(Dataset):
        """Custom Dataset for loading CelebA face images"""
        
        # We don't load all the image data up front; each image is read on demand in __getitem__.
        # In our case, each image is 218*178 pixels * 3 colors and there are 67533 images, which takes at least 7 GB of RAM.
        # Loading images when needed takes more time during training, but it uses far less RAM than loading everything.
        def __init__(self, txt_path, img_dir, transform=None):
            df = pd.read_csv(txt_path, sep="\t", index_col=0)
            self.img_dir = img_dir
            self.txt_path = txt_path
            self.img_names = df.index.values
            self.y = df['Smiling'].values
            self.transform = transform
            print("celeba dataset finished")

        def __getitem__(self, index):
            img = np.asarray(Image.open(os.path.join(self.img_dir,
                                        self.img_names[index])))
            img = transforms.ToTensor()(img)
            label = self.y[index]
            return img, label

        def __len__(self):
            return self.y.shape[0]
    
    # The training_data creates the Dataloader to be used for training in the 
    # general class Torchnn of fedbiomed
    def training_data(self):
        dataset = self.CelebaDataset(self.dataset_path + "/target.csv", self.dataset_path + "/data/")
        train_kwargs = {'shuffle': True}
        return DataManager(dataset, **train_kwargs)
    
    # This function must return the loss so that it can be backpropagated
    def training_step(self, data, target):
        
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


These arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓

In [13]:
training_args = {
    'loader_args': { 'batch_size': 32, }, 
    'optimizer_args': {
        'lr': 1e-3
    }, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
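
The comment on `batch_maxnum` can be checked with a little arithmetic. The sketch below only reuses the two values from `training_args` and assumes, as that comment says, that a node processes at most `batch_maxnum * batch_size` samples per epoch; the helper name is hypothetical.

```python
# Values copied from training_args above.
batch_size = 32
batch_maxnum = 100

# Upper bound on the samples a node processes per epoch when batch_maxnum is set.
samples_per_epoch = batch_size * batch_maxnum


def samples_after(iteration):
    """Samples seen after a given number of full batches (hypothetical helper)."""
    return iteration * batch_size


print(samples_per_epoch)   # 3200
print(samples_after(30))   # 960
```

These figures match the progress counter in the training log (`Iteration: 30/100 ... Samples: 960/3200`).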

# Train the federated model

Define an experiment
- search nodes serving data for these `tags`, optionally filter on a list of node IDs with `nodes`
- run a round of local training on nodes with the model defined in `training_plan_class` + federation with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds
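
To give an intuition for the federation step, here is a minimal sketch of the general federated averaging idea: each node's parameters are averaged, weighted by the number of samples the node trained on. It is written in plain Python with made-up values and is not Fed-BioMed's `FedAverage` implementation, which may weight or store parameters differently; `federated_average` and the two node dictionaries are hypothetical.

```python
def federated_average(node_params, node_samples):
    """Weighted average of per-node parameters.

    node_params: list of dicts mapping a parameter name to a list of floats.
    node_samples: number of training samples each node contributed.
    """
    total = sum(node_samples)
    weights = [n / total for n in node_samples]
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, node_params))
            for i in range(length)
        ]
    return averaged


# Two hypothetical nodes; the second one holds three times more samples.
node_1 = {"fc2.bias": [1.0, 2.0]}
node_2 = {"fc2.bias": [3.0, 4.0]}
print(federated_average([node_1, node_2], [1, 3]))  # {'fc2.bias': [2.5, 3.5]}
```

The node with more samples pulls the average towards its own values, which is why the result is closer to `node_2` than to `node_1`.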

In [14]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#celeba']
rounds = 3

exp = Experiment(tags=tags,
                 training_plan_class=CelebaTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2023-09-18 11:42:24,529 fedbiomed INFO - Searching dataset with data tags: ['#celeba'] for all nodes
2023-09-18 11:42:34,555 fedbiomed INFO - Node selected for training -> node_22a09353-4d82-49b1-a6b2-ec27849f7d58
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-09-18 11:42:34,583 fedbiomed DEBUG - using native torch optimizer
2023-09-18 11:42:34,587 fedbiomed DEBUG - Model file has been saved: /Users/edemairy/Developpement/fedbiomed/var/experiments/Experiment_0083/my_model_a05d0554-90ff-4ed8-8952-97082c991b8d.py
2023-09-18 11:42:34,669 fedbiomed DEBUG - HTTP POST request of file /Users/edemairy/Developpement/fedbiomed/var/experiments/Experiment_0083/my_model_a05d0554-90ff-4ed8-8952-97082c991b8d.py successful, with status code 201
2023-09-18 11:42:34,827 fedbiomed DEBUG - HTTP POST request of file /Users/edemairy

Let's start the experiment.

By default, this function doesn't stop until all the `round_limit` rounds are done for all the nodes

In [15]:
exp.run()

2023-09-18 11:42:34,853 fedbiomed INFO - Sampled nodes in round 0 ['node_22a09353-4d82-49b1-a6b2-ec27849f7d58']
2023-09-18 11:42:34,871 fedbiomed INFO - Sending request 
					 To: node_22a09353-4d82-49b1-a6b2-ec27849f7d58 
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_ad35dfc5-da2c-4bf9-bb84-e61b3b9434b3', 'job_id': '25334363-0bad-486a-aca8-1b1a4a0acefc', 'training_args': {'loader_args': {'batch_size': 32}, 'optimizer_args': {'lr': 0.001}, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100, 'num_updates': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None, 'share_persistent_buffers': True, 'random_seed': None}, 'training': True, 'model_args': {}, 'round': 0, 'secagg_servkey_id': None, 'secagg_biprime_id': None, 'secagg_random': None, 'secagg_clipping_range': None, 'comm

2023-09-18 11:45:40,093 fedbiomed INFO - TRAINING 
					 NODE_ID: node_22a09353-4d82-49b1-a6b2-ec27849f7d58 
					 Round 2 Epoch: 1 | Iteration: 30/100 (30%) | Samples: 960/3200
 					 Loss: 0.618258 
					 ---------
2023-09-18 11:45:52,005 fedbiomed INFO - TRAINING 
					 NODE_ID: node_22a09353-4d82-49b1-a6b2-ec27849f7d58 
					 Round 2 Epoch: 1 | Iteration: 40/100 (40%) | Samples: 1280/3200
 					 Loss: 0.661937 
					 ---------
2023-09-18 11:46:03,810 fedbiomed INFO - TRAINING 
					 NODE_ID: node_22a09353-4d82-49b1-a6b2-ec27849f7d58 
					 Round 2 Epoch: 1 | Iteration: 50/100 (50%) | Samples: 1600/3200
 					 Loss: 0.645756 
					 ---------
2023-09-18 11:46:15,482 fedbiomed INFO - TRAINING 
					 NODE_ID: node_22a09353-4d82-49b1-a6b2-ec27849f7d58 
					 Round 2 Epoch: 1 | Iteration: 60/100 (60%) | Samples: 1920/3200
 					 Loss: 0.533509 
					 ---------
2023-09-18 11:46:27,407 fedbiomed INFO - TRAINING 
					 

2023-09-18 11:49:15,530 fedbiomed DEBUG - HTTP POST request of file /Users/edemairy/Developpement/fedbiomed/var/experiments/Experiment_0083/aggregated_params_ed6a33f4-c160-45ff-8d4b-9925352cd606.mpk successful, with status code 201
2023-09-18 11:49:15,531 fedbiomed INFO - Saved aggregated params for round 2 in /Users/edemairy/Developpement/fedbiomed/var/experiments/Experiment_0083/aggregated_params_ed6a33f4-c160-45ff-8d4b-9925352cd606.mpk


3

Retrieve the federated model parameters

In [16]:
fed_model = exp.training_plan().model()
fed_model.load_state_dict(exp.aggregated_params()[rounds - 1]['params'])

<All keys matched successfully>

In [17]:
fed_model

Net(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
  (dropout1): Dropout(p=0.25, inplace=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=3168, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=2, bias=True)
)
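
The `in_features=3168` of `fc1` follows from the input image size and the four convolution + pooling stages. The sketch below redoes that arithmetic in plain Python, assuming 218*178 input images (as stated in the dataset comments), unpadded 3x3 convolutions with stride 1, and 2x2 max-pooling that rounds down; `conv_then_pool` is a hypothetical helper.

```python
def conv_then_pool(size, kernel=3, pool=2):
    """Spatial size after an unpadded stride-1 convolution and a max-pool."""
    after_conv = size - kernel + 1
    return after_conv // pool  # max_pool2d rounds down by default


height, width = 218, 178  # image size stated in the CelebaDataset comments
for _ in range(4):        # conv1..conv4, each followed by max_pool2d(x, 2)
    height, width = conv_then_pool(height), conv_then_pool(width)

channels = 32             # output channels of conv4
flat_features = channels * height * width
print(height, width, flat_features)  # 11 9 3168
```

If the input images had a different size, `fc1` would need a different `in_features` value, which this calculation gives directly.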

# Test Model

We define a little testing routine to extract the accuracy metrics on the testing dataset.

## Important
This is done to test the model because the data can be accessed in a development environment.  
In production, the data on the nodes won't be accessible, so we will need a test dataset on the server or accessible from the server.

In [18]:

import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import pandas as pd
import numpy as np
from PIL import Image
import os

def testing_Accuracy(model, data_loader):
    """Return (mean NLL loss, accuracy in %) computed on about 10% of the batches."""
    model.eval()
    test_loss = 0
    correct = 0
    seen = 0  # number of samples actually evaluated

    device = "cpu"

    loader_size = len(data_loader)
    with torch.no_grad():
        for idx, (data, target) in enumerate(data_loader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
            seen += target.size(0)

            # only uses 10% of the dataset, results are similar but faster
            if idx >= loader_size / 10:
                break

    # normalise by the samples evaluated, not by the full dataset size
    test_loss /= seen
    accuracy = 100 * correct / seen

    return (test_loss, accuracy)

The test dataset is the data from the third node

In [20]:

test_dataset_path = "./data/Celeba/celeba_preprocessed/data_node_3"

class CelebaDataset(Dataset):
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, txt_path, img_dir, transform=None):
        df = pd.read_csv(txt_path, sep="\t", index_col=0)
        self.img_dir = img_dir
        self.txt_path = txt_path
        self.img_names = df.index.values
        self.y = df['Smiling'].values
        self.transform = transform
        print("celeba dataset finished")

    def __getitem__(self, index):
        img = np.asarray(Image.open(os.path.join(self.img_dir,
                                        self.img_names[index])))
        img = transforms.ToTensor()(img)
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]
    

dataset = CelebaDataset(test_dataset_path + "/target.csv", test_dataset_path + "/data/")
train_kwargs = { 'shuffle': True}
data_loader = DataLoader(dataset, **train_kwargs)

celeba dataset finished


Load the testing dataset and compute the accuracy metrics for the federated model

In [21]:
acc_federated = testing_Accuracy(fed_model, data_loader)

In [22]:
acc_federated[1]

81.96624222682854