# Fed-BioMed to train a federated SGD regressor model

## Data 


This tutorial shows how to use Fed-BioMed to solve a federated regression problem with scikit-learn.

In this tutorial we use the Fed-BioMed wrapper for the scikit-learn SGD regressor (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html).
The goal of the notebook is to train a model on a realistic dataset of (synthetic) medical information mimicking the ADNI dataset (http://adni.loni.usc.edu/). 

## Creating nodes

To proceed with the tutorial, we create 3 clients, each with a dataframe of clinical information in .csv format. Each client has 300 data points composed of several features corresponding to clinical and medical imaging information. **The data is entirely synthetic and randomly sampled to mimic the variability of the real ADNI dataset**. The training partitions are available at the following link:

https://drive.google.com/file/d/1R39Ir60oQi8ZnmHoPz5CoGCrVIglcO9l/view?usp=sharing

The federated task we aim to solve is to predict a clinical variable (the mini-mental state examination, MMSE) from a combination of demographic and imaging features. The regressor variables are the following features:

['SEX', 'AGE', 'PTEDUCAT', 'WholeBrain.bl', 'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']

and the target variable is:

['MMSE.bl']
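
As a reading aid, the regression task can be written down explicitly. The formulation below is the generic linear model with a squared-error loss (the loss reported in the training logs of this notebook). The symbols $\mathbf{w}$, $b$, $\mathbf{x}_i$, $y_i$ and $n_k$ are notation introduced here for illustration and do not come from Fed-BioMed: $\mathbf{x}_i$ is the feature vector of subject $i$, $y_i$ the corresponding `MMSE.bl` value, and $n_k$ the number of training samples on node $k$.

$$
\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b,
\qquad
\mathcal{L}_k(\mathbf{w}, b) = \frac{1}{n_k} \sum_{i=1}^{n_k} \left( \hat{y}_i - y_i \right)^2
$$

Each node minimizes its own $\mathcal{L}_k$ locally, and the researcher combines the resulting $(\mathbf{w}, b)$ across nodes at the end of every round.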
    

To create the federated dataset, we follow the standard procedure for node creation/population of Fed-BioMed. 
After activating the fedbiomed network with the commands

`source ./scripts/fedbiomed_environment network`

and 

`./scripts/fedbiomed_run network`

we create a first node by using the commands

`source ./scripts/fedbiomed_environment node`

`./scripts/fedbiomed_run node start`

We then populate the node with the data of the first client:

`./scripts/fedbiomed_run node config conf.ini add`

Then, we select option 1 (csv dataset) to add the .csv partition of client 1 by picking the corresponding .csv file. We use `adni` as the tag for the selected dataset. We can check that the data has been added by executing `./scripts/fedbiomed_run node list`.

Following the same procedure, we create the other two nodes with the datasets of client 2 and client 3 respectively. To do so, we add and launch a `Node` using other configuration files.

## Fed-BioMed Researcher

We are now ready to start the researcher environment with the command `source ./scripts/fedbiomed_environment researcher`, and open the Jupyter notebook with `./scripts/fedbiomed_run researcher start`. 

We can first query the network for the `adni` dataset. In this case, the nodes share their respective partitions using the same tag `adni`:

In [None]:
%load_ext autoreload
%autoreload 2

In [1]:
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)

2023-03-08 16:18:01,160 fedbiomed INFO - Messaging researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fadec94cc10>
2023-03-08 16:18:01,188 fedbiomed INFO - Listing available datasets in all nodes... 
2023-03-08 16:18:11,205 fedbiomed INFO - 
 Node: node_f2b2d532-f811-424d-966f-4b21c0bfd618 | Number of Datasets: 3 
+--------+-------------+------------------------+----------------+--------------------+----------------------------------------------+----------------------+
| name   | data_type   | tags                   | description    | shape              | dataset_id                                   | dataset_parameters   |
| MNIST  | default     | ['#MNIST', '#dataset'] | MNIST database | [60000, 1, 28, 28] | dataset_5eee3fc2-f1d2-47ac-874f-cf50900f0963 |                      |
+--------+-------------+------------------------+----------------+--------------------+--------------------

{'node_f2b2d532-f811-424d-966f-4b21c0bfd618': [{'name': 'MNIST',
   'data_type': 'default',
   'tags': ['#MNIST', '#dataset'],
   'description': 'MNIST database',
   'shape': [60000, 1, 28, 28],
   'dataset_id': 'dataset_5eee3fc2-f1d2-47ac-874f-cf50900f0963',
   'dataset_parameters': None},
  {'name': '',
   'data_type': 'csv',
   'tags': ['perp'],
   'description': '',
   'shape': [100, 21],
   'dataset_id': 'dataset_cb869f51-15d9-4b6a-8bf6-542c27c45192',
   'dataset_parameters': None},
  {'name': '',
   'data_type': 'csv',
   'tags': ['adni'],
   'description': '',
   'shape': [300, 20],
   'dataset_id': 'dataset_15b693a0-2e14-43ae-b7dc-611975719fea',
   'dataset_parameters': None}],
 'node_57b462ee-4d35-4dd9-b3cb-ea964a889e92': [{'name': 'MNIST',
   'data_type': 'default',
   'tags': ['#MNIST', '#dataset'],
   'description': 'MNIST database',
   'shape': [60000, 1, 28, 28],
   'dataset_id': 'dataset_8fe79723-3516-4134-a3d1-16a2eaa366c8',
   'dataset_parameters': None},
  {'name': ''
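
The dictionary returned by `req.list()` maps each node ID to a list of dataset entries, as the output above shows. A minimal sketch of how one could pick out the nodes exposing a given tag follows. The helper `nodes_with_tag` and the shortened node IDs are hypothetical and not part of Fed-BioMed; the toy listing only mirrors the structure of the real output.

```python
def nodes_with_tag(listing, tag):
    """Return {node_id: [dataset entries carrying `tag`]} for nodes that have any."""
    selected = {}
    for node_id, datasets in listing.items():
        matches = [d for d in datasets if tag in d.get("tags", [])]
        if matches:
            selected[node_id] = matches
    return selected


# Toy listing with the same structure as the output of req.list()
# (node IDs are shortened, hypothetical placeholders).
listing = {
    "node_A": [
        {"name": "MNIST", "data_type": "default",
         "tags": ["#MNIST", "#dataset"], "shape": [60000, 1, 28, 28]},
        {"name": "", "data_type": "csv", "tags": ["adni"], "shape": [300, 20]},
    ],
    "node_B": [
        {"name": "MNIST", "data_type": "default",
         "tags": ["#MNIST", "#dataset"], "shape": [60000, 1, 28, 28]},
    ],
}

adni_nodes = nodes_with_tag(listing, "adni")
for node_id, datasets in adni_nodes.items():
    print(node_id, [d["shape"] for d in datasets])
```

Such a check is a quick way to confirm, before building the experiment, that every expected node actually shares a dataset under the `adni` tag.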

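The training plan for the scikit-learn SGDRegressor, including its data loader, can now be defined and deployed in Fed-BioMed.
We first import the necessary class `FedSGDRegressor` from `fedbiomed.common.training_plans` and subclass it with the following methods:

**init_dependencies** : returns the import statements that the nodes need to run the training plan.
       
**training_data** : must return a `DataManager` built from the data and the targets, which must be of types accepted by the `partial_fit` method of the model.

**init_optimizer** : returns the optimizer, here a declearn `Optimizer`.

Note that this training plan applies a common standardization across the federated datasets: every node centers and scales its features **with respect to the same parameters**.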
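
To illustrate why shared scaling parameters matter, here is a small numpy sketch on synthetic data. The two "clients", their distributions, and the helper `standardize` are hypothetical and chosen only for illustration. With shared parameters, every client applies the same affine map, so the result equals standardizing the pooled data. With per-client parameters, the offset between clients is erased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic clients whose feature distributions differ (hypothetical data).
client_a = rng.normal(loc=70.0, scale=5.0, size=(300, 1))
client_b = rng.normal(loc=80.0, scale=5.0, size=(300, 1))


def standardize(x, mean, sd):
    """Center and scale x with the given parameters."""
    return (x - mean) / sd


# Shared parameters: computed once over the pooled data, then used by every client.
pooled = np.vstack([client_a, client_b])
shared_mean = pooled.mean(axis=0)
shared_sd = pooled.std(axis=0)
shared = np.vstack([standardize(client_a, shared_mean, shared_sd),
                    standardize(client_b, shared_mean, shared_sd)])

# Per-client parameters: each client removes its own mean and scale.
local_a = standardize(client_a, client_a.mean(axis=0), client_a.std(axis=0))
local_b = standardize(client_b, client_b.mean(axis=0), client_b.std(axis=0))

print("shared scaling, client means:", shared[:300].mean(), shared[300:].mean())
print("local scaling, client means: ", local_a.mean(), local_b.mean())
```

In the shared case the two clients keep distinct means after scaling, so a single set of regression coefficients has the same meaning on every node.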

In [1]:
from fedbiomed.common.training_plans import FedSGDRegressor
from fedbiomed.common.data import DataManager

from declearn.optimizer import Optimizer
from declearn.optimizer.modules import AdamModule
from declearn.optimizer.regularizers import LassoRegularizer

class SGDRegressorTrainingPlan(FedSGDRegressor):
    # Declare and return the import statements needed on the nodes
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from declearn.optimizer import Optimizer",
                "from declearn.optimizer.modules import AdamModule",
                "from declearn.optimizer.regularizers import LassoRegularizer"]
        return deps

    def training_data(self, batch_size):
        dataset = pd.read_csv(self.dataset_path, delimiter=',')
        regressors_col = ['AGE', 'WholeBrain.bl',
                          'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']
        target_col = ['MMSE.bl']
        
        # mean and standard deviation for normalizing dataset
        # it has been computed over the whole dataset
        scaling_mean = np.array([72.3, 0.7, 0.0, 0.0, 0.0, 0.0])
        scaling_sd = np.array([7.3e+00, 5.0e-02, 1.1e-02, 1.0e-03, 2.0e-03, 1.0e-03])
        
        X = (dataset[regressors_col].values-scaling_mean)/scaling_sd
        y = dataset[target_col]
        return DataManager(dataset=X, target=y.values.ravel(), batch_size=batch_size, shuffle=True)

    # Define and return a declearn optimizer (Adam updates with Lasso regularization)
    def init_optimizer(self, optimizer_args):
        return Optimizer(lrate=.1, modules=[AdamModule()], regularizers=[LassoRegularizer()])
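
The optimizer above combines a base learning rate, an Adam module, and a Lasso regularizer. The numpy sketch below illustrates the general idea of one such update: an L1 subgradient is added to the raw gradient, the result is rescaled with Adam moment estimates, and a step of size `lrate` is taken. It is a textbook illustration, not declearn's implementation. The function `adam_lasso_step`, the value of `alpha`, and the defaults for `beta_1`, `beta_2` and `eps` are assumptions made here and may differ from what declearn uses.

```python
import numpy as np


def adam_lasso_step(w, grad, state, lrate=0.1, alpha=0.01,
                    beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One illustrative update: L1 subgradient, then Adam rescaling, then the step."""
    # 1. Lasso term: subgradient of alpha * ||w||_1
    grad = grad + alpha * np.sign(w)
    # 2. Adam first and second moment estimates, with bias correction
    state["t"] += 1
    state["m"] = beta_1 * state["m"] + (1 - beta_1) * grad
    state["v"] = beta_2 * state["v"] + (1 - beta_2) * grad ** 2
    m_hat = state["m"] / (1 - beta_1 ** state["t"])
    v_hat = state["v"] / (1 - beta_2 ** state["t"])
    # 3. Gradient step scaled by the base learning rate
    return w - lrate * m_hat / (np.sqrt(v_hat) + eps)


# Toy least-squares problem with 6 features (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([2.0, 0.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_w

w = np.zeros(6)
state = {"t": 0, "m": np.zeros(6), "v": np.zeros(6)}
initial_loss = np.mean((X @ w - y) ** 2)
for _ in range(300):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w = adam_lasso_step(w, grad, state)
final_loss = np.mean((X @ w - y) ** 2)

print("loss before:", initial_loss, "loss after:", final_loss)
```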

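**model_args** is a dictionary containing the model arguments; for SGDRegressor these include `max_iter` and `tol`. `n_features` is provided to correctly initialize the `coef_` array of the SGDRegressor; it is 6 here because the training plan uses six regressors. Note that, as the node messages in the training logs below report, `eta0` is disabled when a declearn optimizer is used: the learning rate is then the one set in `init_optimizer`.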

**training_args** is a dictionary with parameters related to Federated Learning. 

In [2]:
from fedbiomed.common.metrics import MetricTypes
RANDOM_SEED = 1234


model_args = {
    'max_iter':2000,
    'tol': 1e-5,
    'eta0':0.05,
    'n_features': 6,
    'random_state': RANDOM_SEED
}

training_args = {
    'epochs': 5,
    'batch_size': 32,
    'test_ratio':.3,
    'test_metric': MetricTypes.MEAN_SQUARE_ERROR,
    'test_on_local_updates': True,
    'test_on_global_updates': True
}
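
With these values, the per-round bookkeeping can be worked out by hand. The sketch below assumes that each node holds 300 samples (as stated at the beginning of this tutorial), that `test_ratio` is applied to the node's dataset before training, and that the last batch of an epoch may be smaller than `batch_size`. These assumptions are consistent with the counts reported in the training logs of this notebook (210 training samples, 90 validation samples, 7 iterations per epoch), but the exact splitting logic of Fed-BioMed is not reproduced here.

```python
import math

n_samples = 300   # rows per node, as stated in the tutorial
test_ratio = 0.3  # fraction held out for validation
batch_size = 32
epochs = 5

n_test = round(n_samples * test_ratio)
n_train = n_samples - n_test
iters_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (iters_per_epoch - 1) * batch_size
updates_per_round = epochs * iters_per_epoch

print(f"train/test samples: {n_train}/{n_test}")
print(f"iterations per epoch: {iters_per_epoch} (last batch: {last_batch} samples)")
print(f"model updates per round and node: {updates_per_round}")
```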

The experiment can now be defined by providing the `adni` tag and running the local training on the nodes with the training plan passed as `training_plan_class`, the standard `aggregator` (FedAvg), and the default `node_selection_strategy` (all nodes used). Federated learning is performed over the number of optimization rounds set by `rounds` (5 here).
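
As a rough picture of what FedAvg does at the end of a round, the sketch below averages per-node parameters with weights proportional to the number of training samples on each node. The function `fed_average` and the toy parameter values are hypothetical; this is a simplified illustration, not the code of Fed-BioMed's `FedAverage`.

```python
import numpy as np


def fed_average(node_params, node_sizes):
    """Weighted average of per-node parameter dicts (weights ~ sample counts)."""
    weights = np.asarray(node_sizes, dtype=float)
    weights = weights / weights.sum()
    keys = node_params[0].keys()
    return {k: sum(w * p[k] for w, p in zip(weights, node_params)) for k in keys}


# Toy parameters for two nodes (hypothetical values).
node_params = [
    {"coef_": np.array([1.0, 2.0]), "intercept_": np.array([0.0])},
    {"coef_": np.array([3.0, 4.0]), "intercept_": np.array([1.0])},
]

# Equal sample counts give a plain mean of the two nodes.
global_params = fed_average(node_params, node_sizes=[210, 210])
print(global_params)
```

When the nodes hold different amounts of data, the node with more samples pulls the average toward its own parameters.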

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['adni']

# Add more rounds for results with better accuracy
#
#rounds = 40
rounds = 5

# select nodes participating in this experiment
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=SGDRegressorTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2023-03-08 16:37:25,890 fedbiomed INFO - Messaging researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f964a1cd070>
2023-03-08 16:37:25,917 fedbiomed INFO - Searching dataset with data tags: ['adni'] for all nodes
2023-03-08 16:37:35,933 fedbiomed INFO - Node selected for training -> node_f2b2d532-f811-424d-966f-4b21c0bfd618
2023-03-08 16:37:35,936 fedbiomed INFO - Node selected for training -> node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
2023-03-08 16:37:35,941 fedbiomed INFO - Checking data quality of federated datasets...
2023-03-08 16:37:35,943 fedbiomed DEBUG - using a declearn Optimizer
2023-03-08 16:37:35,947 fedbiomed DEBUG - Model file has been saved: /home/ybouilla/fedbiomed_2/fedbiomed/var/experiments/Experiment_0040/my_model_2968120a-9c4a-4e9d-989d-dbeca4869e57.py
2023-03-08 16:37:35,970 fedbiomed DEBUG - upload (HTTP POST request) of file /home/ybouilla/fedbiomed_2/fedbiomed/

In [4]:
# start federated training
exp.run()

2023-03-08 16:37:36,007 fedbiomed INFO - Sampled nodes in round 0 ['node_f2b2d532-f811-424d-966f-4b21c0bfd618', 'node_57b462ee-4d35-4dd9-b3cb-ea964a889e92']
2023-03-08 16:37:36,010 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29', 'job_id': '60e597ba-5d8c-4183-b4ac-1a871cdf3b3c', 'training_args': {'epochs': 5, 'batch_size': 32, 'test_ratio': 0.3, 'test_metric': 'MEAN_SQUARE_ERROR', 'test_on_local_updates': True, 'test_on_global_updates': True, 'optimizer_args': {}, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None}, 'training': True, 'model_args': {'max_iter': 2000, 'tol': 1e-05, 'eta0': 0.05, 'n_features': 6, 'random_state': 1234, 'verbose': 1}, 'command': 'train', 'training_plan_url': 'http://loc

2023-03-08 16:37:36,711 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 1 Epoch: 3 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squared_error: [1m9247786843574697761701888.000000[0m 
					 ---------
2023-03-08 16:37:36,739 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 3 | Iteration: 6/7 (86%) | Samples: 192/224
 					 Loss squared_error: [1m11500920694924274865012736.000000[0m 
					 ---------
2023-03-08 16:37:36,761 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 3 | Iteration: 7/7 (100%) | Samples: 210/210
 					 Loss squared_error: [1m14431423340693340676947968.000000[0m 
					 ---------
2023-03-08 16:37:36,785 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 4 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squa

2023-03-08 16:37:46,102 fedbiomed DEBUG - researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29
					[1m NODE[0m node_f2b2d532-f811-424d-966f-4b21c0bfd618
					[1m MESSAGE:[0m The following parameter(s) has(ve) been detected in the model_args but will be disabled when using a declearn Optimizer: please specify those values in the training_args or in the init_optimizer methodeta0[0m
-----------------------------------------------------------------
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m The following parameter(s) has(ve) been detected in the model_args but will be disabled when using a declearn Optimizer: please specify those values in the training_args or in the init_optimizer methodeta0[0m
-----------------------------------------------------------------
2023-03-08 16:37:46,132 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_f2b2d532-f811-424d-966f-4b21c0bfd618
					[1m MESSAGE:[0m NPDataLoader expanding 1-dimensional target to becom

2023-03-08 16:37:46,933 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 5 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squared_error: [1m8580732272329988253941760.000000[0m 
					 ---------
2023-03-08 16:37:46,967 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 5 | Iteration: 2/7 (29%) | Samples: 64/224
 					 Loss squared_error: [1m9670176292235517368467456.000000[0m 
					 ---------
2023-03-08 16:37:47,039 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 2 Epoch: 5 | Iteration: 7/7 (100%) | Samples: 210/210
 					 Loss squared_error: [1m8240100986991228965879808.000000[0m 
					 ---------
2023-03-08 16:37:47,042 fedbiomed INFO - [1mVALIDATION ON LOCAL UPDATES[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 2 | Iteration: 1/1 (100%) | Samples: 90/90
 					 ME

2023-03-08 16:37:56,254 fedbiomed INFO - [1mVALIDATION ON GLOBAL UPDATES[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 3 | Iteration: 1/1 (100%) | Samples: 90/90
 					 MEAN_SQUARE_ERROR: [1m485901504959294319099904.000000[0m 
					 ---------
2023-03-08 16:37:56,256 fedbiomed INFO - [1mVALIDATION ON GLOBAL UPDATES[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 3 | Iteration: 1/1 (100%) | Samples: 90/90
 					 MEAN_SQUARE_ERROR: [1m686658775575017792995328.000000[0m 
					 ---------
2023-03-08 16:37:56,287 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 3 Epoch: 1 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squared_error: [1m7940555568452612902617088.000000[0m 
					 ---------
2023-03-08 16:37:56,295 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 3 Epoch: 1 | Iteration: 1/7 (14%) | Samples: 32/224
 		

2023-03-08 16:37:57,076 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m results uploaded successfully [0m
-----------------------------------------------------------------
2023-03-08 16:38:06,218 fedbiomed INFO - Downloading model params after training on node_f2b2d532-f811-424d-966f-4b21c0bfd618 - from http://localhost:8844/media/uploads/2023/03/08/node_params_12ac34be-978b-46c8-900b-1c4958cebb48.pt
2023-03-08 16:38:06,237 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_6c78b1dd-3f9e-46be-a651-cf28ece21289.pt successful, with status code 200
2023-03-08 16:38:06,245 fedbiomed INFO - Downloading model params after training on node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 - from http://localhost:8844/media/uploads/2023/03/08/node_params_5427c141-9c30-416e-bdff-e1713162beaf.pt
2023-03-08 16:38:06,258 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_8d6c00e0-1a3b-49ee-9762-4b142a047822.pt successf

2023-03-08 16:38:06,530 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 4 Epoch: 2 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squared_error: [1m11183415067082950675791872.000000[0m 
					 ---------
2023-03-08 16:38:06,578 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 4 Epoch: 2 | Iteration: 3/7 (43%) | Samples: 96/224
 					 Loss squared_error: [1m6959124016593447975649280.000000[0m 
					 ---------
2023-03-08 16:38:06,582 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 4 Epoch: 2 | Iteration: 3/7 (43%) | Samples: 96/224
 					 Loss squared_error: [1m8431533427375037858447360.000000[0m 
					 ---------
2023-03-08 16:38:06,675 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 4 Epoch: 2 | Iteration: 7/7 (100%) | Samples: 210/210
 					 Loss square

2023-03-08 16:38:16,379 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29', 'job_id': '60e597ba-5d8c-4183-b4ac-1a871cdf3b3c', 'training_args': {'epochs': 5, 'batch_size': 32, 'test_ratio': 0.3, 'test_metric': 'MEAN_SQUARE_ERROR', 'test_on_local_updates': True, 'test_on_global_updates': True, 'optimizer_args': {}, 'num_updates': None, 'dry_run': False, 'batch_maxnum': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None}, 'training': True, 'model_args': {'max_iter': 2000, 'tol': 1e-05, 'eta0': 0.05, 'n_features': 6, 'random_state': 1234, 'verbose': 1}, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/03/08/my_model_2968120a-9c4a-4e9d-989d-dbeca4869e57.py', 'params_url': 'http://localhost:8844/media/uploads/2023/03/08/aggrega

2023-03-08 16:38:16,814 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 5 Epoch: 3 | Iteration: 6/7 (86%) | Samples: 192/224
 					 Loss squared_error: [1m8410554116469641845932032.000000[0m 
					 ---------
2023-03-08 16:38:16,823 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 5 Epoch: 3 | Iteration: 7/7 (100%) | Samples: 210/210
 					 Loss squared_error: [1m8468177776651358728880128.000000[0m 
					 ---------
2023-03-08 16:38:16,838 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 5 Epoch: 3 | Iteration: 6/7 (86%) | Samples: 192/224
 					 Loss squared_error: [1m8357956356855376113565696.000000[0m 
					 ---------
2023-03-08 16:38:16,845 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 5 Epoch: 4 | Iteration: 1/7 (14%) | Samples: 32/224
 					 Loss squar

2023-03-08 16:38:26,519 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m NPDataLoader expanding 1-dimensional target to become 2-dimensional.[0m
-----------------------------------------------------------------
2023-03-08 16:38:26,521 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m NPDataLoader expanding 1-dimensional target to become 2-dimensional.[0m
-----------------------------------------------------------------
2023-03-08 16:38:26,523 fedbiomed INFO - [1mVALIDATION ON GLOBAL UPDATES[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 6 | Iteration: 1/1 (100%) | Samples: 90/90
 					 MEAN_SQUARE_ERROR: [1m4808537418400034967257088.000000[0m 
					 ---------


5


# Declearn Optimizers with Scikit learn Perceptron Classifier

In [1]:
from fedbiomed.common.training_plans import FedPerceptron
from fedbiomed.common.data import DataManager
import numpy as np

from declearn.optimizer import Optimizer
from declearn.optimizer.modules import AdamModule
from declearn.optimizer.regularizers import LassoRegularizer

class SkLearnClassifierTrainingPlan(FedPerceptron):
    def init_dependencies(self):
        """Define additional dependencies.
        
        In this case, we rely on torchvision functions for preprocessing the images.
        """
        return ["from torchvision import datasets, transforms",
                "from declearn.optimizer import Optimizer",
                "from declearn.optimizer.modules import AdamModule",
                "from declearn.optimizer.regularizers import LassoRegularizer",]

    def training_data(self, batch_size):
        """Prepare data for training.
        
        This function loads a MNIST dataset from the node's filesystem, applies some
        preprocessing and converts the full dataset to a numpy array. 
        Finally, it returns a DataManager created with these numpy arrays.
        """
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        
        # Note: `dataset.data` holds the raw pixel values. The transform above is only
        # applied when samples are accessed by index, so it does not affect X_train.
        X_train = dataset.data.numpy()
        X_train = X_train.reshape(-1, 28*28)
        Y_train = dataset.targets.numpy()
        return DataManager(dataset=X_train, target=Y_train, batch_size=batch_size, shuffle=False)
    
    # Defines and return a declearn optimizer
    def init_optimizer(self, optimizer_args):
        return Optimizer(lrate=.1 ,modules=[AdamModule()], regularizers=[LassoRegularizer()])
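
The `training_data` method above flattens each 28x28 image into a row of 784 features, which is the 2-D layout scikit-learn models expect. The numpy sketch below reproduces that reshaping on fake images and shows a manual counterpart of the `ToTensor()` and `Normalize((0.1307,), (0.3081,))` steps. The random images are stand-ins for `dataset.data.numpy()`, and the manual normalization is an optional illustration, not something the training plan above performs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for `dataset.data.numpy()`: 5 fake grayscale images with uint8 pixels.
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image into one row of 28*28 = 784 features.
X = images.reshape(-1, 28 * 28)

# Manual counterpart of ToTensor() (scale to [0, 1]) followed by
# Normalize((0.1307,), (0.3081,)) (center and scale).
X_norm = (X.astype(np.float32) / 255.0 - 0.1307) / 0.3081

print(X.shape, X.dtype, X_norm.dtype)
```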

In [2]:
model_args = {'n_features': 28*28,
              'n_classes' : 10,
              'eta0':1e-6,
              'random_state':1234,
              'alpha':0.1 }

training_args = {
    'epochs': 3, 
    'batch_maxnum': 20,  # can be used for debugging, to limit the number of batches per epoch
    'optimizer_args': {
        "lr" : 1e-3
    },
#    'log_interval': 1,  # output a logging message every log_interval batches
    'batch_size': 4
}

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 3

# select nodes participating in this experiment
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=SkLearnClassifierTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2023-03-08 15:07:51,586 fedbiomed INFO - Messaging researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7efc7d3ada00>
2023-03-08 15:07:51,633 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2023-03-08 15:08:01,646 fedbiomed INFO - Node selected for training -> node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
2023-03-08 15:08:01,648 fedbiomed INFO - Node selected for training -> node_f2b2d532-f811-424d-966f-4b21c0bfd618
2023-03-08 15:08:01,652 fedbiomed INFO - Checking data quality of federated datasets...
2023-03-08 15:08:01,656 fedbiomed DEBUG - using a declearn Optimizer
2023-03-08 15:08:01,661 fedbiomed DEBUG - Model file has been saved: /home/ybouilla/fedbiomed_2/fedbiomed/var/experiments/Experiment_0034/my_model_39dae24e-eb33-4339-a1ce-f2c4f17c5884.py
2023-03-08 15:08:01,682 fedbiomed DEBUG - upload (HTTP POST request) of file /home/ybouilla/fedbiome

In [4]:
exp.run(increase=True)

2023-03-08 15:08:01,710 fedbiomed INFO - Sampled nodes in round 0 ['node_57b462ee-4d35-4dd9-b3cb-ea964a889e92', 'node_f2b2d532-f811-424d-966f-4b21c0bfd618']
2023-03-08 15:08:01,711 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29', 'job_id': 'c2547501-cfbe-4815-9391-fb82ce6bee27', 'training_args': {'epochs': 3, 'batch_maxnum': 20, 'optimizer_args': {'lr': 0.001}, 'batch_size': 4, 'num_updates': None, 'dry_run': False, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None}, 'training': True, 'model_args': {'n_features': 784, 'n_classes': 10, 'eta0': 1e-06, 'random_state': 1234, 'alpha': 0.1, 'loss': 'perceptron', 'verbose': 1}, 'command': 'train', 'training_pla

2023-03-08 15:08:02,569 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 2 | Iteration: 20/20 (100%) | Samples: 80/80
 					 Loss perceptron: [1m2585.546683[0m 
					 ---------
2023-03-08 15:08:02,602 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 3 | Iteration: 1/20 (5%) | Samples: 4/80
 					 Loss perceptron: [1m0.000000[0m 
					 ---------
2023-03-08 15:08:02,790 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 1 Epoch: 3 | Iteration: 10/20 (50%) | Samples: 40/80
 					 Loss perceptron: [1m5400.453603[0m 
					 ---------
2023-03-08 15:08:02,848 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 3 | Iteration: 10/20 (50%) | Samples: 40/80
 					 Loss perceptron: [1m5400.453603[0m 
					 ---------
2023-03-08 15:08:03,004 fedbiomed

2023-03-08 15:08:11,898 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_f2b2d532-f811-424d-966f-4b21c0bfd618
					[1m MESSAGE:[0m NPDataLoader expanding 1-dimensional target to become 2-dimensional.[0m
-----------------------------------------------------------------
2023-03-08 15:08:11,904 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 1 | Iteration: 1/20 (5%) | Samples: 4/80
 					 Loss perceptron: [1m0.000000[0m 
					 ---------
2023-03-08 15:08:11,938 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 2 Epoch: 1 | Iteration: 1/20 (5%) | Samples: 4/80
 					 Loss perceptron: [1m0.000000[0m 
					 ---------
2023-03-08 15:08:12,075 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 1 | Iteration: 10/20 (50%) | Samples: 40/80
 					 Loss perceptron: [1m0.000000[0m 
					 ---------
2023-03-0

2023-03-08 15:08:21,838 fedbiomed DEBUG - researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m There is no validation activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for validation will be ignored[0m
-----------------------------------------------------------------
					[1m NODE[0m node_f2b2d532-f811-424d-966f-4b21c0bfd618
					[1m MESSAGE:[0m There is no validation activated for the round. Please set flag for `test_on_global_updates`, `test_on_local_updates`, or both. Splitting dataset for validation will be ignored[0m
-----------------------------------------------------------------
2023-03-08 15:08:21,917 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					[1m MESSAGE:[0m NPDataLoader expanding 1-dimensional target to become 2-dimensional.[0m
-------------------------------------

2023-03-08 15:08:31,901 fedbiomed DEBUG - upload (HTTP POST request) of file /home/ybouilla/fedbiomed_2/fedbiomed/var/experiments/Experiment_0034/aggregated_params8e97f64f-1efd-412d-8036-4e85627e5f1c.pt successful, with status code 201
2023-03-08 15:08:31,902 fedbiomed INFO - Saved aggregated params for round 2 in /home/ybouilla/fedbiomed_2/fedbiomed/var/experiments/Experiment_0034/aggregated_params8e97f64f-1efd-412d-8036-4e85627e5f1c.pt


3


##  Testing

Once the federated model is obtained, it is possible to test it locally on an independent testing partition.
The test dataset is available at this link:

https://drive.google.com/file/d/1zNUGp6TMn6WSKYVC8FQiQ9lJAUdasxk1/

In [None]:
!pip install matplotlib
!pip install gdown

Download the testing dataset to a local temporary folder.

In [None]:
import os
import gdown
import tempfile
import zipfile
import pandas as pd
import numpy as np

from fedbiomed.common.constants import ComponentType
from fedbiomed.researcher.environ import environ


resource = "https://drive.google.com/uc?id=19kxuI146WA2fhcOU2_AvF8dy-ppJkzW7"

tmpdir = tempfile.TemporaryDirectory(dir=environ['TMP_DIR'])
base_dir = tmpdir.name

test_file = os.path.join(base_dir, "test_data.zip")
gdown.download(resource, test_file, quiet=False)

zf = zipfile.ZipFile(test_file)

for file in zf.infolist():
    zf.extract(file, base_dir)

# loading testing dataset
test_data = pd.read_csv(os.path.join(base_dir,'adni_validation.csv'))

In [None]:
from sklearn.linear_model import SGDRegressor
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline

Here we extract the relevant regressors and target from the testing data 

In [None]:
regressors_col = ['AGE', 'WholeBrain.bl', 'Ventricles.bl',
                  'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']
target_col = ['MMSE.bl']
X_test = test_data[regressors_col].values
# flatten the target to 1-D so that it matches the shape of the model predictions
y_test = test_data[target_col].values.ravel()

To inspect the model evolution across FL rounds, we use `exp.aggregated_params()`, which contains the model parameters collected at the end of each round. The MSE (Mean Squared Error) is expected to decrease as the federated parameters are updated round after round.

In [None]:
scaling_mean = np.array([72.3, 0.7, 0.0, 0.0, 0.0, 0.0])
scaling_sd = np.array([7.3e+00, 5.0e-02, 1.1e-02, 1.0e-03, 2.0e-03, 1.0e-03])

testing_error = []


# we reuse the scikit-learn model held by the training plan and overwrite its
# parameters with the aggregated ones obtained at each round
fed_model = exp.training_plan().model()
regressor_args = {key: model_args[key] for key in model_args.keys() if key in fed_model.get_params().keys()}

for i in range(rounds):
    fed_model.coef_ = exp.aggregated_params()[i]['params']['coef_'].copy()
    fed_model.intercept_ = exp.aggregated_params()[i]['params']['intercept_'].copy()  
    mse = np.mean((fed_model.predict((X_test-scaling_mean)/scaling_sd) - y_test)**2)
    testing_error.append(mse)

plt.plot(testing_error)
plt.title('FL testing loss')
plt.xlabel('FL round')
plt.ylabel('testing loss (MSE)')
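
When the MSE is computed by hand as above, the shapes of the predictions and of the targets must match. The `predict` method of a scikit-learn regressor returns a 1-D array, whereas selecting a list of columns from a DataFrame and calling `.values` yields a 2-D column. Subtracting the two broadcasts to a full matrix and gives a misleading average. The toy arrays below are hypothetical and only illustrate the effect.

```python
import numpy as np

y_pred = np.array([1.0, 2.0, 3.0])       # shape (3,), like the output of predict()
y_col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like DataFrame[[col]].values

# Mismatched shapes: the difference is broadcast to a (3, 3) matrix.
wrong = np.mean((y_pred - y_col) ** 2)

# Matching shapes: element-wise difference, as intended.
right = np.mean((y_pred - y_col.ravel()) ** 2)

print("difference shape with (3,) and (3, 1):", (y_pred - y_col).shape)
print("MSE with mismatched shapes:", wrong)
print("MSE with matching shapes:  ", right)
```

Here the predictions are perfect, so the correct MSE is zero, while the broadcast version reports a non-zero error.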

We finally inspect the predictions of the final federated model on the testing data.

In [None]:
y_predicted = fed_model.predict((X_test-scaling_mean)/scaling_sd)
plt.scatter(y_predicted, y_test, label='model prediction')
plt.xlabel('predicted')
plt.ylabel('target')
plt.title('Federated model testing prediction')

first_diag = np.arange(np.min(y_test.flatten()),
                       np.max(y_test.flatten()+1))
plt.scatter(first_diag, first_diag, label='correct Target')
plt.legend()

In [None]:
a = X_test / scaling_sd
a.shape

In [None]:
X_test.shape

In [None]:
X_test[:,1] / scaling_sd[1] - a[:,1]