# Fedbiomed Researcher to train a federated scikit learn model.

## Perceptron
Binary Classification
### Purpose of the exercise :
Three datasets `c1.csv` , `c2.csv` and `c3.csv` has been generated with a target column of 2 different classes.
We will fit a Perceptron (classifier) using Federated Learning.

### Get the data 

We use the make_classification dataset from sklearn datasets

In [17]:
from sklearn import datasets
import numpy as np

In [18]:
X,y = datasets.make_classification(n_samples=300, n_features=20,n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,shuffle=True, random_state=123)
X.shape,y.shape

((300, 20), (300,))

In [19]:
np.unique(y)

array([0, 1])

In [11]:
C1 = X[:150,:]
C2 = X[150:250,:]
C3 = X[250:300,:]

y1 = y[:150].reshape([150,1])
y2 = y[150:250].reshape([100,1])
y3 = y[250:300].reshape([50,1])

C1.shape ,C2.shape , C3.shape , y1.shape, y2.shape, y3.shape

((150, 20), (100, 20), (50, 20), (150, 1), (100, 1), (50, 1))

In [12]:
n1 = np.concatenate((C1, y1), axis=1)
np.savetxt('== local path to c1.csv',n1,delimiter=',')

n2 = np.concatenate((C2, y2), axis=1)
np.savetxt('== local path to c2.csv',n2,delimiter=',')

n3 = np.concatenate((C3, y3), axis=1)
np.savetxt('== local path to c3.csv',n3,delimiter=',')

### Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

### Setting the node up
Before running this notebook you need to configure 2 nodes: <br/>
* **Node 1 :** `./scripts/fedbiomed_run node add`
  * Select option 1 to add a csv file to the node
  * Choose the name, tags and description of the dataset (you can write 'perp' always and it will be good)
  * Pick the c1.csv file in your machine.
  * Check that your data has been added in node 1 by executing `./scripts/fedbiomed_run node list`
  * Run the node using `./scripts/fedbiomed_run node start`. <br/>

* **Node 2 :** Open a second terminal and run ./scripts/fedbiomed_run node add config n2.ini
  * Select option 1 to add a csv file to the node
  * Choose the name, tags and description of the dataset (you can write 'perp' always and it will be good)
  * Pick the c2.csv file in your machine.
  * Check that your data has been added in node 2 by executing `./scripts/fedbiomed_run node config n2.ini list `
  * Run the node using `./scripts/fedbiomed_run node config n2.ini start`.
 

 Wait until you get `Starting task manager`. it means node is online.


In [1]:
%load_ext autoreload
%autoreload 2

**model_args** is a dictionnary containing your model arguments, in case of SGDRegressor this will be max_iter and tol.

**training_args** is a dictionnary with parameters , related to Federated Learning. 

In [2]:
n_features = 20
n_classes = 2

model_args = {'max_iter':1000, 'tol': 1e-3 , 
               'n_features' : n_features, 'n_classes' : n_classes}

training_args = {   
    'epochs': 5, 
}

Hereafter the template of the class you should provide to Fedbiomed :
    
**training_data** : you must return here the (X,y) that must be of the same type of 
your method partial_fit parameters. 

In [3]:
from fedbiomed.common.training_plans import FedPerceptron
from fedbiomed.common.data import DataManager
class PerceptronTraining(FedPerceptron):
    def __init__(self, model_args: dict = {}):
        super().__init__(model_args)
        self.add_dependency(["from fedbiomed.common.training_plans import FedPerceptron",
                             "from sklearn.linear_model import Perceptron"])
    
    def training_data(self):
        NUMBER_COLS = 20
        dataset = pd.read_csv(self.dataset_path,header=None,delimiter=',')
        X = dataset.iloc[:,0:NUMBER_COLS].values
        y = dataset.iloc[:,NUMBER_COLS]       
        return DataManager(dataset=X,target=y.values)

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['perp']
rounds = 2

# search for corresponding datasets across nodes datasets
exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=PerceptronTraining,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)


2022-05-06 17:34:56,592 fedbiomed INFO - Component environment:
2022-05-06 17:34:56,593 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-05-06 17:34:56,658 fedbiomed INFO - Messaging researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f9657290d00>
2022-05-06 17:34:56,684 fedbiomed INFO - Searching dataset with data tags: ['perp'] for all nodes
2022-05-06 17:35:06,697 fedbiomed INFO - Node selected for training -> node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
2022-05-06 17:35:06,699 fedbiomed INFO - Node selected for training -> node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
2022-05-06 17:35:06,702 fedbiomed INFO - Checking data quality of federated datasets...
2022-05-06 17:35:06,706 fedbiomed DEBUG - Model file has been saved: /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0115/my_model_d76c4e63-4bd3-4815-b203-8e6f443f0d01.py
2022-05-06 17:35:06,718 fedbiomed DEBUG -

model id sklearntarining plan 140283806599872
sklearn models perceptron model get param {'alpha': 0.0001, 'class_weight': None, 'early_stopping': False, 'eta0': 1.0, 'fit_intercept': True, 'l1_ratio': 0.15, 'max_iter': 1000, 'n_iter_no_change': 5, 'n_jobs': None, 'penalty': None, 'random_state': 0, 'shuffle': True, 'tol': 0.001, 'validation_fraction': 0.1, 'verbose': 1, 'warm_start': False}
perceptron model id  140283806599872


In [4]:
exp.run()

2022-05-06 17:35:06,732 fedbiomed INFO - Sampled nodes in round 0 ['node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6', 'node_658efe2a-5e4b-4daf-9931-32eb6ecefc32']
2022-05-06 17:35:06,733 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe', 'job_id': '40ecd359-2e35-404d-8126-61ae68b12c13', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'epochs': 5}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.001, 'n_features': 20, 'n_classes': 2, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/06/my_model_d76c4e63-4bd3-4815-b203-8e6f443f0d01.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/06/aggregated_params_init_19a3e1fc-acb1-4f51-b945-57869584eaf7.pt', '

2022-05-06 17:35:16,776 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe', 'job_id': '40ecd359-2e35-404d-8126-61ae68b12c13', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'epochs': 5}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.001, 'n_features': 20, 'n_classes': 2, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/06/my_model_d76c4e63-4bd3-4815-b203-8e6f443f0d01.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/06/aggregated_params_99ee4cb0-3ae1-4b48-9a08-4b9cec2f1086.pt', 'model_class': 'PerceptronTraining', 'training_data': {'node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6': ['dataset_85dbe3b2-92c0-42ed-bf2c-4be87275e95d']}} 
 ----------

functional, avg_params {'intercept_': array([5.]), 'coef_': array([[-0.12474042,  5.66732564, -6.3416912 , -4.51930135,  0.75957523,
         4.55437519, -2.73479259,  4.6809859 , 10.86244747, -2.80533604,
         0.14944813,  6.32690381,  6.0994531 ,  0.4334556 , 17.52621439,
         3.02164415,  1.45598664,  1.90231433,  0.88189201,  3.38628669]])}
model_params [{'intercept_': array([5.]), 'coef_': array([[-0.12474042,  5.66732564, -6.3416912 , -4.51930135,  0.75957523,
         4.55437519, -2.73479259,  4.6809859 , 10.86244747, -2.80533604,
         0.14944813,  6.32690381,  6.0994531 ,  0.4334556 , 17.52621439,
         3.02164415,  1.45598664,  1.90231433,  0.88189201,  3.38628669]])}, {'intercept_': array([1.]), 'coef_': array([[-2.16376827,  1.0088073 , -3.6794123 ,  3.49096956,  5.79394287,
        -0.31781383, -6.59000541, -1.16107907,  8.60159037, -1.96699011,
         1.14419943,  7.66078309, -0.19299558, -3.97413981, 17.6455732 ,
         4.86684449, -3.46780145, -0.60764

2022-05-06 17:35:26,790 fedbiomed INFO - Downloading model params after training on node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 - from http://localhost:8844/media/uploads/2022/05/06/node_params_4237f2f1-7001-4014-aacb-665d843b084a.pt
2022-05-06 17:35:26,804 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_340c6efd-340f-481f-a7ad-14917ab7bd6a.pt successful, with status code 200
2022-05-06 17:35:26,808 fedbiomed INFO - Downloading model params after training on node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 - from http://localhost:8844/media/uploads/2022/05/06/node_params_06619bab-5611-418d-82ab-bf8665bbadf2.pt
2022-05-06 17:35:26,822 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_0bb19f09-09c3-417b-8773-91a33b4a86cf.pt successful, with status code 200
2022-05-06 17:35:26,830 fedbiomed INFO - Nodes that successfully reply in round 1 ['node_658efe2a-5e4b-4daf-9931-32eb6ecefc32', 'node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6']
2022-05-06 17:35:26,859 fedbiomed DEBUG -

functional, avg_params {'intercept_': array([2.6]), 'coef_': array([[-4.23933644e-01, -5.74644172e-04, -9.34072679e-01,
         4.93367363e+00,  2.61774435e+00, -3.80790588e-01,
        -7.58171075e+00, -1.84849395e+00,  1.23454891e+01,
        -3.87859801e+00, -4.80808399e-01,  7.55290974e+00,
        -1.71337036e+00, -3.57203845e+00,  2.04338065e+01,
         2.67574587e+00, -5.36194682e-01,  1.49016823e+00,
        -3.36832151e+00,  3.14003880e+00]])}
model_params [{'intercept_': array([2.6]), 'coef_': array([[-4.23933644e-01, -5.74644172e-04, -9.34072679e-01,
         4.93367363e+00,  2.61774435e+00, -3.80790588e-01,
        -7.58171075e+00, -1.84849395e+00,  1.23454891e+01,
        -3.87859801e+00, -4.80808399e-01,  7.55290974e+00,
        -1.71337036e+00, -3.57203845e+00,  2.04338065e+01,
         2.67574587e+00, -5.36194682e-01,  1.49016823e+00,
        -3.36832151e+00,  3.14003880e+00]])}, {'intercept_': array([2.6]), 'coef_': array([[ 2.3350742 ,  5.50478077, -7.53124499, -4.

2

## Lets validate the trained model with the test dataset c3.csv.

In [25]:
import pandas as pd

In [26]:
data = pd.read_csv('== local path to c3.csv')

In [27]:
X_test = data.iloc[:,:n_features]
y_test = data.iloc[:,n_features]

F1 score computed with federated algorithm :

For that, we are exporting `exp.aggregated_params()` containing models parameters collected at the end of each round

In [28]:
from sklearn.metrics import f1_score
loss_metric = f1_score
    
testing_error = []

for i in range(rounds):
    fed_model = exp.model_instance().get_model()
    fed_model.coef_ = exp.aggregated_params()[i]['params']['coef_']
    fed_model.intercept_ = exp.aggregated_params()[i]['params']['intercept_']
    metric = loss_metric(fed_model.predict(X_test),y_test.ravel())
    print('F1 score metric: ', metric, )
    testing_error.append(metric)

F1 score metric:  0.8727272727272727
F1 score metric:  0.8627450980392156


##  SGD regressor

### Data 


This tutorial shows how to deploy in Fed-BioMed to solve a federated regression problem with scikit-learn.

In this tutorial we are using the wrapper of Fed-BioMed for the SGD regressor (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html).
The goal of the notebook is to train a model on a realistic dataset of (synthetic) medical information mimicking the ADNI dataset (http://adni.loni.usc.edu/). 

### Creating nodes

To proceed with the tutorial, we create 3 clients with corresponding dataframes of clinical information in .csv format. Each client has 300 data points composed by several features corresponding to clinical and medical imaging informations. **The data is entirely synthetic and randomly sampled to mimick the variability of the real ADNI dataset**. The training partitions are availables at the following link:

https://drive.google.com/file/d/1R39Ir60oQi8ZnmHoPz5CoGCrVIglcO9l/view?usp=sharing

The federated task we aim at solve is to predict a clinical variable (the mini-mental state examination, MMSE) from a combination of demographic and imaging features. The regressors variables are the following features:

['SEX', 'AGE', 'PTEDUCAT', 'WholeBrain.bl', 'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']

and the target variable is:

['MMSE.bl']
    

To create the federated dataset, we follow the standard procedure for node creation/population of Fed-BioMed. 
After activating the fedbiomed network with the commands

`source ./scripts/fedbiomed_environment network`

and 

`./scripts/fedbiomed_run network`

we create a first node by using the commands

`source ./scripts/fedbiomed_environment node`

`./scripts/fedbiomed_run node start`

We then poulate the node with the data of first client:

`./scripts/fedbiomed_run node config conf.ini add`

Thn, we select option 1 (csv dataset) to add the .csv partition of client 1, by just picking the .csv of client 1. We use `adni` as tag to save the selected dataset. We can further check that the data has been added by executing `./scripts/fedbiomed_run node list`

Following the same procedure, we create the other two nodes with the datasets of client 2 and client 3 respectively. To do so, we add and launch a `Node`using others configuration files

### Fed-BioMed Researcher

We are now ready to start the reseracher enviroment with the command `source ./scripts/fedbiomed_environment researcher`, and open the Jupyter notebook with `./scripts/fedbiomed_run researcher`. 

We can first query the network for the adni dataset. In this case, the nodes are sharing the respective partitions unsing the same tag `adni`:

In [2]:
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)

2022-05-06 13:49:25,495 fedbiomed INFO - Component environment:
2022-05-06 13:49:25,496 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-05-06 13:49:25,641 fedbiomed INFO - Messaging researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f2d4e4e3c40>
2022-05-06 13:49:25,709 fedbiomed INFO - Listing available datasets in all nodes... 
2022-05-06 13:49:35,725 fedbiomed INFO - 
 Node: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 | Number of Datasets: 5 
+---------+-------------+--------------------------+-----------------+---------------------+
| name    | data_type   | tags                     | description     | shape               |
| MNIST   | default     | ['#MNIST', '#dataset']   | MNIST database  | [60000, 1, 28, 28]  |
+---------+-------------+--------------------------+-----------------+---------------------+
| bb      | images      | ['#bb']                  | bb              | [

{'node_658efe2a-5e4b-4daf-9931-32eb6ecefc32': [{'name': 'MNIST',
   'data_type': 'default',
   'tags': ['#MNIST', '#dataset'],
   'description': 'MNIST database',
   'shape': [60000, 1, 28, 28]},
  {'name': 'bb',
   'data_type': 'images',
   'tags': ['#bb'],
   'description': 'bb',
   'shape': [111909, 3, 64, 64]},
  {'name': 'MEDNIST',
   'data_type': 'mednist',
   'tags': ['#MEDNIST', '#dataset'],
   'description': 'MEDNIST dataset',
   'shape': [58954, 3, 64, 64]},
  {'name': 'perp',
   'data_type': 'csv',
   'tags': ['perp'],
   'description': 'perp',
   'shape': [150, 21]},
  {'name': 'adni',
   'data_type': 'csv',
   'tags': ['adni'],
   'description': 'adni',
   'shape': [300, 20]}]}

The code for network and data loader of the sklearn SGDRegressor can now be deployed in Fed-BioMed.
We first import the necessary module `SGDSkLearnModel` from `fedbiomed`:

**__init__** : we add here the needed sklearn libraries
       
**training_data** : you must return here a tuple (data,targets) that must be of the same type of 
your method partial_fit parameters. 

We note that this model performs a common standardization across federated datasets by **centering with respect to the same parameters**.

In [32]:
from fedbiomed.common.training_plans import FedSGDRegressor
from fedbiomed.common.data import DataManager

class SGDRegressorTrainingPlan(FedSGDRegressor):
    def __init__(self, model_args: dict = {}):
        super().__init__(model_args)
        self.add_dependency(["from fedbiomed.common.training_plans import FedSGDRegressor",
                             "from sklearn.linear_model import SGDRegressor"])
    
    def training_data(self):
        dataset = pd.read_csv(self.dataset_path,delimiter=',')
        regressors_col = ['SEX', 'AGE', 'PTEDUCAT', 'WholeBrain.bl',
                          'Ventricles.bl', 'Hippocampus.bl', 'MidTemp.bl', 'Entorhinal.bl']
        target_col = ['MMSE.bl']
        
        # mean and standard deviation for normalizing dataset
        # it has been computed over the whole dataset
        scaling_mean = np.array([0.8, 72.3, 16.2, 0.7, 0.0, 0.0, 0.0, 0.0])
        scaling_sd = np.array([3.5e-01, 7.3e+00, 2.7e+00, 5.0e-02, 1.1e-02, 1.0e-03, 2.0e-03, 1.0e-03])
        
        X = (dataset[regressors_col].values-scaling_mean)/scaling_sd
        y = dataset[target_col]
        return DataManager(dataset=X, target=y.values.ravel())
    

**model_args** is a dictionary containing your model arguments, in case of SGDRegressor this will be max_iter and tol. n_features is provided to correctly initialize the SGDRegressor coef_ array.

**training_args** is a dictionary with parameters related to Federated Learning. 

In [33]:
from fedbiomed.common.metrics import MetricTypes
RANDOM_SEED = 1234


model_args = {
    'max_iter':2000,
    'tol': 1e-5,
    'eta0':0.05,
    'n_features': 8,
    'random_state': RANDOM_SEED
}

training_args = {
    'epochs': 5,
    'test_ratio':.3,
    'test_metric': MetricTypes.MEAN_SQUARE_ERROR,
    'test_on_local_updates': True,
    'test_on_global_updates': True
}

The experiment can be now defined, by providing the `adni` tag, and running the local training on nodes with model defined in `model_path`, standard `aggregator` (FedAvg) and `client_selection_strategy` (all nodes used). Federated learning is going to be perfomed through 10 optimization rounds.

In [34]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['adni']

# Add more rounds for results with better accuracy
#
#rounds = 40
rounds = 2

# select nodes participating to this experiment
exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=SGDRegressorTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-05-06 16:04:15,162 fedbiomed INFO - Searching dataset with data tags: ['adni'] for all nodes
2022-05-06 16:04:25,175 fedbiomed INFO - Node selected for training -> node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
2022-05-06 16:04:25,181 fedbiomed DEBUG - Model file has been saved: /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0092/my_model_a5c47a11-6285-4a54-9bf1-f093bfa76f28.py
2022-05-06 16:04:25,197 fedbiomed DEBUG - upload (HTTP POST request) of file /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0092/my_model_a5c47a11-6285-4a54-9bf1-f093bfa76f28.py successful, with status code 201
2022-05-06 16:04:25,208 fedbiomed DEBUG - upload (HTTP POST request) of file /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0092/aggregated_params_init_6c2c579f-2c56-4648-bfd2-ceb78179074c.pt successful, with status code 201


model id sklearntarining plan 140662511542176


In [35]:
# start federated training
exp.run()

2022-05-06 16:04:31,627 fedbiomed INFO - Sampled nodes in round 0 ['node_658efe2a-5e4b-4daf-9931-32eb6ecefc32']
2022-05-06 16:04:31,628 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe', 'job_id': 'c88b28bb-85bb-4501-a168-db93ec017c4e', 'training_args': {'test_ratio': 0.3, 'test_on_local_updates': True, 'test_on_global_updates': True, 'test_metric': <MetricTypes.MEAN_SQUARE_ERROR: (4, <_MetricCategory.REGRESSION: 2>)>, 'test_metric_args': {}, 'epochs': 5}, 'training': True, 'model_args': {'max_iter': 2000, 'tol': 1e-05, 'eta0': 0.05, 'n_features': 8, 'random_state': 1234, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/06/my_model_a5c47a11-6285-4a54-9bf1-f093bfa76f28.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/06/aggregated_params_init_6c2c57

2022-05-06 16:04:41,708 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					 Epoch: 4 | Completed: 210/210 (100%) 
 					 Loss squared_loss: [1m12.000877[0m 
					 ---------
2022-05-06 16:04:41,709 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
					[1m MESSAGE:[0m No `testing_step` method found in TrainingPlan: using defined metric MEAN_SQUARE_ERROR for model evaluation.[0m
-----------------------------------------------------------------
2022-05-06 16:04:41,710 fedbiomed INFO - [1mTESTING ON LOCAL UPDATES[0m 
					 NODE_ID: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					 Completed: 90/90 (100%) 
 					 MEAN_SQUARE_ERROR: [1m53.378930[0m 
					 ---------
2022-05-06 16:04:41,721 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
					[1m MESSAGE:[0m results uploaded successfully [0m
---------------------------------------------------------------

2

## SGDClassifier
### Purpose of the exercise :

Three datasets `c1_3class.csv` , `c2_3class.csv` and `c3_3class.csv` has been generated with a target column of 3 different classes.
We will fit a SGCClassifier (classifier) using Federated Learning.

### Get the data 

We use the make_classification dataset from sklearn datasets

In [13]:
from sklearn import datasets
import numpy as np

In [14]:
X,y = datasets.make_classification(n_samples=300, n_features=20,n_informative = 3, n_classes=3,n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0,shuffle=True, random_state=123)
X.shape,y.shape

((300, 20), (300,))

In [15]:
C1 = X[:150,:]
C2 = X[150:250,:]
C3 = X[250:300,:]

y1 = y[:150].reshape([150,1])
y2 = y[150:250].reshape([100,1])
y3 = y[250:300].reshape([50,1])

C1.shape ,C2.shape , C3.shape , y1.shape, y2.shape, y3.shape

((150, 20), (100, 20), (50, 20), (150, 1), (100, 1), (50, 1))

In [16]:
n1 = np.concatenate((C1, y1), axis=1)
np.savetxt('== local path to c1_3class.csv',n1,delimiter=',')

n2 = np.concatenate((C2, y2), axis=1)
np.savetxt('== local path to c2_3class.csv',n2,delimiter=',')

n3 = np.concatenate((C3, y3), axis=1)
np.savetxt('== local path to c3_3class.csv',n3,delimiter=',')

### Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

### Setting the node up
Before running this notebook you need to configure 2 nodes: <br/>
* **Node 1 :** `./scripts/fedbiomed_run node add`
  * Select option 1 to add a csv file to the node
  * Choose the name, tags and description of the dataset (you can write **'perp1'** always and it will be good)
  * Pick the c1.csv file in your machine.
  * Check that your data has been added in node 1 by executing `./scripts/fedbiomed_run node list`
  * Run the node using `./scripts/fedbiomed_run node start`. <br/>

* **Node 2 :** Open a second terminal and run ./scripts/fedbiomed_run node add config n2.ini
  * Select option 1 to add a csv file to the node
  * Choose the name, tags and description of the dataset (you can write **'perp1'** always and it will be good)
  * Pick the c2.csv file in your machine.
  * Check that your data has been added in node 2 by executing `./scripts/fedbiomed_run node config n2.ini list `
  * Run the node using `./scripts/fedbiomed_run node config n2.ini start`.
 

 Wait until you get `Starting task manager`. it means node is online.


In [1]:
%load_ext autoreload
%autoreload 2

**model_args** is a dictionnary containing your model arguments, in case of SGDRegressor this will be max_iter and tol.

**training_args** is a dictionnary with parameters , related to Federated Learning. 

In [2]:
n_features = 20
n_classes = 3

model_args = {'max_iter':1000, 'tol': 1e-3 , 
               'n_features' : n_features, 'n_classes' : n_classes}

training_args = {   
    'epochs': 5, 
}

Hereafter the template of the class you should provide to Fedbiomed :
    
**training_data** : you must return here the (X,y) that must be of the same type of 
your method partial_fit parameters. 

In [3]:
from fedbiomed.common.training_plans import FedSGDClassifier
from fedbiomed.common.data import DataManager
class SGDClassifierTrainingPlan(FedSGDClassifier):
    def __init__(self, model_args: dict = {}):
        super().__init__(model_args)
        self.add_dependency(["from fedbiomed.common.training_plans import FedSGDClassifier",
                             "from sklearn.linear_model import SGDClassifier"])
    
    def training_data(self):
        NUMBER_COLS = 20
        dataset = pd.read_csv(self.dataset_path,header=None,delimiter=',')
        X = dataset.iloc[:,0:NUMBER_COLS].values
        y = dataset.iloc[:,NUMBER_COLS]       
        return DataManager(dataset=X,target=y.values)

In [4]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['perp1']
rounds = 2

# search for corresponding datasets across nodes datasets
exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=SGDClassifierTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)


2022-05-06 17:05:12,179 fedbiomed INFO - Component environment:
2022-05-06 17:05:12,179 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-05-06 17:05:12,260 fedbiomed INFO - Messaging researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f25670f22b0>
2022-05-06 17:05:12,312 fedbiomed INFO - Searching dataset with data tags: ['perp1'] for all nodes
2022-05-06 17:05:22,322 fedbiomed INFO - Node selected for training -> node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
2022-05-06 17:05:22,324 fedbiomed INFO - Node selected for training -> node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
2022-05-06 17:05:22,327 fedbiomed INFO - Checking data quality of federated datasets...
2022-05-06 17:05:22,328 fedbiomed DEBUG - Model file has been saved: /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0106/my_model_1f7305ac-47a9-4669-95a4-bbbbb7ad74b2.py
2022-05-06 17:05:22,339 fedbiomed DEBUG 

model id sklearntarining plan 139798741976400


In [5]:
exp.run()

2022-05-06 17:05:27,761 fedbiomed INFO - Sampled nodes in round 0 ['node_658efe2a-5e4b-4daf-9931-32eb6ecefc32', 'node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6']
2022-05-06 17:05:27,762 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe', 'job_id': '66d9cbac-3185-4d08-b8e5-ebf872edfb49', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'epochs': 5}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.001, 'n_features': 20, 'n_classes': 3, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/06/my_model_1f7305ac-47a9-4669-95a4-bbbbb7ad74b2.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/06/aggregated_params_init_031978c3-8744-4d8a-b1c5-bda1190e486b.pt', '

2022-05-06 17:05:27,819 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 
					 Epoch: 2 | Completed: 150/150 (100%) 
 					 Loss hinge: [1m30.200886[0m 
					 ---------
					[1m NODE[0m node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
					[1m MESSAGE:[0m Loss plot displayed on Tensorboard may be inaccurate (due to some plain SGD scikit learn limitations)[0m
-----------------------------------------------------------------
2022-05-06 17:05:27,822 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 
					 Epoch: 3 | Completed: 150/150 (100%) 
 					 Loss hinge: [1m24.707545[0m 
					 ---------
					[1m NODE[0m node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
					[1m MESSAGE:[0m Loss plot displayed on Tensorboard may be inaccurate (due to some plain SGD scikit learn limitations)[0m
-----------------------------------------------------------------
2022-05-06 17:05:27,826 fedbiomed INFO - [1mTRAINING[0m 

2022-05-06 17:05:37,848 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					 Epoch: 2 | Completed: 100/100 (100%) 
 					 Loss hinge: [1m18.682613[0m 
					 ---------
					[1m NODE[0m node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
					[1m MESSAGE:[0m Loss plot displayed on Tensorboard may be inaccurate (due to some plain SGD scikit learn limitations)[0m
-----------------------------------------------------------------
2022-05-06 17:05:37,850 fedbiomed INFO - [1mTRAINING[0m 
					 NODE_ID: node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6 
					 Epoch: 0 | Completed: 150/150 (100%) 
 					 Loss hinge: [1m23.668177[0m 
					 ---------
					[1m NODE[0m node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
					[1m MESSAGE:[0m Loss plot displayed on Tensorboard may be inaccurate (due to some plain SGD scikit learn limitations)[0m
-----------------------------------------------------------------
2022-05-06 17:05:37,851 fedbiomed INFO - [1mTRAINING[0m 

2

## BernoulliNB
Binary Classification
### Description :

Perfom binary classification on datasets of tag **'perp'** explained in previous sections. 
We will fit a BernoulliNB (classifier) using Federated Learning.

**model_args** is a dictionnary containing your model arguments, in case of SGDRegressor this will be max_iter and tol.

**training_args** is a dictionnary with parameters , related to Federated Learning. 

In [1]:
n_features = 20
n_classes = 2

model_args = {'max_iter':1000, 'tol': 1e-3 , 
               'n_features' : n_features, 'n_classes' : n_classes}

training_args = {   
    'epochs': 5, 
}

Hereafter the template of the class you should provide to Fedbiomed :
    
**training_data** : you must return here the (X,y) that must be of the same type of 
your method partial_fit parameters. 

In [2]:
from fedbiomed.common.training_plans import FedBernoulliNB
from fedbiomed.common.data import DataManager
class BernoulliNBTrainingPlan(FedBernoulliNB):
    def __init__(self, model_args: dict = {}):
        super().__init__(model_args)
        self.add_dependency(["from fedbiomed.common.training_plans import FedBernoulliNB",
                             "from sklearn.naive_bayes import BernoulliNB"])
    
    def training_data(self):
        NUMBER_COLS = 20
        dataset = pd.read_csv(self.dataset_path,header=None,delimiter=',')
        X = dataset.iloc[:,0:NUMBER_COLS].values
        y = dataset.iloc[:,NUMBER_COLS]       
        return DataManager(dataset=X,target=y.values)

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['perp']
rounds = 2

# search for corresponding datasets across nodes datasets
exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=BernoulliNBTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)


2022-05-06 17:18:13,421 fedbiomed INFO - Component environment:
2022-05-06 17:18:13,422 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-05-06 17:18:13,503 fedbiomed INFO - Messaging researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f67164a37f0>
2022-05-06 17:18:13,550 fedbiomed INFO - Searching dataset with data tags: ['perp'] for all nodes
2022-05-06 17:18:23,562 fedbiomed INFO - Node selected for training -> node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
2022-05-06 17:18:23,564 fedbiomed INFO - Node selected for training -> node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
2022-05-06 17:18:23,568 fedbiomed INFO - Checking data quality of federated datasets...
2022-05-06 17:18:23,574 fedbiomed DEBUG - Model file has been saved: /home/gentoo/Projects/Fedbiomed/fedbiomed/var/experiments/Experiment_0111/my_model_5b132a3c-db88-4874-a745-29379500e7e8.py
2022-05-06 17:18:23,604 fedbiomed DEBUG -

model id sklearntarining plan 140080854781280


In [4]:
exp.run()

2022-05-06 17:18:30,299 fedbiomed INFO - Sampled nodes in round 0 ['node_658efe2a-5e4b-4daf-9931-32eb6ecefc32', 'node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6']
2022-05-06 17:18:30,300 fedbiomed INFO - [1mSending request[0m 
					[1m To[0m: node_658efe2a-5e4b-4daf-9931-32eb6ecefc32 
					[1m Request: [0m: Perform training with the arguments: {'researcher_id': 'researcher_994d2281-2f1b-4f9e-84fe-61a703e1bdfe', 'job_id': 'fa864698-6077-4132-9e77-54530a098e65', 'training_args': {'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'epochs': 5}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.001, 'n_features': 20, 'n_classes': 2}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/05/06/my_model_5b132a3c-db88-4874-a745-29379500e7e8.py', 'params_url': 'http://localhost:8844/media/uploads/2022/05/06/aggregated_params_init_015fcac7-e7d8-4a22-95aa-c9db8553ff8c.pt', 'model_class': 


--------------------
Fed-BioMed researcher stopped due to unknown error:
local variable 't' referenced before assignment
More details in the backtrace extract below
--------------------
Traceback (most recent call last):
  File "/home/gentoo/Projects/Fedbiomed/fedbiomed/fedbiomed/researcher/experiment.py", line 61, in payload
    ret = function(*args, **kwargs)
  File "/home/gentoo/Projects/Fedbiomed/fedbiomed/fedbiomed/researcher/experiment.py", line 1382, in run_once
    aggregated_params = self._aggregator.aggregate(model_params,
  File "/home/gentoo/Projects/Fedbiomed/fedbiomed/fedbiomed/researcher/aggregators/fedavg.py", line 36, in aggregate
    return federated_averaging(model_params, weights)
  File "/home/gentoo/Projects/Fedbiomed/fedbiomed/fedbiomed/researcher/aggregators/functional.py", line 57, in federated_averaging
    if t == 'tensor':
UnboundLocalError: local variable 't' referenced before assignment
--------------------


2022-05-06 17:24:16,527 fedbiomed INFO - [1mCRITICAL[0m
					[1m NODE[0m node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
					[1m MESSAGE:[0m Node stopped in signal_handler, probably by user decision (Ctrl C)[0m
-----------------------------------------------------------------
2022-05-06 17:24:22,269 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_92294303-6bcf-4018-9a03-8a1fcdcbd1c6
					[1m MESSAGE:[0m Starting task manager[0m
-----------------------------------------------------------------
2022-05-06 17:24:22,919 fedbiomed INFO - [1mCRITICAL[0m
					[1m NODE[0m node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
					[1m MESSAGE:[0m Node stopped in signal_handler, probably by user decision (Ctrl C)[0m
-----------------------------------------------------------------
2022-05-06 17:24:27,819 fedbiomed INFO - [1mINFO[0m
					[1m NODE[0m node_658efe2a-5e4b-4daf-9931-32eb6ecefc32
					[1m MESSAGE:[0m Starting task manager[0m
------------------------------------------------