## Asynchronous clients

This example illustrates how to work with asynchronous clients using FEDn.

In [None]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from client.entrypoint import compile_model, load_parameters, make_data

from fedn import APIClient
import uuid
import json
import matplotlib.pyplot as plt
import copy

import warnings
warnings.filterwarnings("ignore")

### ML model

As a centralized baseline, we generate synthetic data for a classification problem with 4 features. We train an MLPClassifier on 80k training points and test on 20k, using ReLU activation and the Adam optimizer (the scikit-learn defaults) for a maximum of 1000 epochs.

In [None]:
X, y = make_classification(n_samples=100000, n_features=4, n_informative=4, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We train a centralized baseline model for a maximum of 1000 epochs.  

In [None]:
clf = MLPClassifier(max_iter=1000)
clf.fit(X_train, y_train)
central_test_acc = accuracy_score(y_test, clf.predict(X_test))

print("Training accuracy: ", accuracy_score(y_train, clf.predict(X_train)))
print("Test accuracy: ", accuracy_score(y_test, clf.predict(X_test)))

Next, we simulate the FL training procedure of a single FL client. In each iteration (one simulated global round), the client draws a random number of data points in the interval (n_min, n_max) from (X_train, y_train) and performs 'n_epochs' partial fits on the sampled dataset. After each global round, we test on the centralized test set (X_test, y_test). In this experiment we simulate 600 global rounds, with 10 local epochs per round.

In [None]:
clf = compile_model()

n_global_rounds=600
n_epochs = 10
central_acc_one_client = []
for i in range(n_global_rounds):
    x,y,_,_ = make_data(n_min=10,n_max=100)
    for j in range(n_epochs):
        clf.partial_fit(x, y)
    central_acc_one_client.append(accuracy_score(y_test, clf.predict(X_test)))

plt.plot(range(n_global_rounds),[central_test_acc]*n_global_rounds)
plt.plot(range(n_global_rounds), central_acc_one_client)
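The `make_data` helper is imported from `client.entrypoint` and is not shown in this notebook. A minimal sketch of the sampling step described above follows. It uses a hypothetical helper `sample_batch`, which is not the example's actual implementation, and it assumes the batch size is drawn uniformly from the half-open interval [n_min, n_max) and that rows are sampled without replacement:

```python
import numpy as np

def sample_batch(X, y, n_min=10, n_max=100, rng=None):
    """Hypothetical stand-in for the data-sampling step; not FEDn's make_data."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw the batch size uniformly from [n_min, n_max) (an assumption).
    n = int(rng.integers(n_min, n_max))
    # Pick n distinct row indices.
    idx = rng.choice(len(X), size=n, replace=False)
    return X[idx], y[idx]

# Toy data standing in for (X_train, y_train).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 4))
y_toy = rng.integers(0, 2, size=1000)

x_batch, y_batch = sample_batch(X_toy, y_toy, n_min=10, n_max=100, rng=rng)
print(x_batch.shape, y_batch.shape)
```

Because the batch size changes from round to round, each simulated global round sees a different amount of data.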

Here we simulate a scenario in which 'n_clients' clients send locally collected/sampled datasets to a central server (modeled by scaling n_min and n_max by n_clients). The server then performs incremental learning on the collected data batches, which are therefore larger than in the experiment above.

In [None]:
clf = compile_model()

n_global_rounds=600
n_epochs = 10
n_clients = 10
central_acc_all_clients = []
for i in range(n_global_rounds):
    x,y,_,_ = make_data(n_min=n_clients*10, n_max=n_clients*100)
    for j in range(n_epochs):
        clf.partial_fit(x, y)
    central_acc_all_clients.append(accuracy_score(y_test, clf.predict(X_test)))

plt.plot(range(n_global_rounds),[central_test_acc]*n_global_rounds)
plt.plot(range(n_global_rounds), central_acc_one_client)
plt.plot(range(n_global_rounds), central_acc_all_clients)
plt.legend(['Central baseline, all data','Incremental learning, one client','Incremental learning, all clients'])

### Federated learning

Now we run federated learning experiments over a FEDn network. For this we use the script 'run_clients.py' to start clients running in subprocesses on the host machine. In a separate terminal, navigate to this example folder and start the script. Edit the configuration in the script as needed to test different scenarios. Once the clients are up and running, proceed below and execute the experiments. Note that these runs can take a long time to complete, depending on the number of global rounds.

We make a client connection to the FEDn API service. Here we assume that FEDn is deployed locally in pseudo-distributed mode with default ports.

In [None]:
DISCOVER_HOST = '127.0.0.1'
DISCOVER_PORT = 8092
client = APIClient(DISCOVER_HOST, DISCOVER_PORT)

Initialize FEDn with the compute package and seed model

In [None]:
client.set_active_package('package.tgz', 'numpyhelper')
client.set_active_model('seed.npz')

In [None]:
session_config = {
    "helper": "numpyhelper",
    "id": "run_fedavg",
    "aggregator": "fedavg",
    "round_timeout": 20,
    "rounds": 600,
    "validate": False,
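    # NOTE: `initial_model` is not defined anywhere in this notebook; set it to
    # the ID of the seed model uploaded above before running this cell.
    "model_id": initial_model,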
}

session = client.start_session(**session_config)
if session['success'] is False:
    print(session['message'])
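The session above selects the `fedavg` aggregator. As a rough illustration of the idea behind federated averaging (a sketch, not FEDn's implementation), the server combines client updates by averaging each parameter array, weighted by the number of local training examples. The function name `fedavg` and the toy updates below are hypothetical:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter lists (illustrative only)."""
    total = float(sum(client_sizes))
    n_arrays = len(client_params[0])
    averaged = []
    for k in range(n_arrays):
        # Weight each client's k-th array by its share of the training data.
        acc = sum(p[k] * (n / total) for p, n in zip(client_params, client_sizes))
        averaged.append(acc)
    return averaged

# Two toy clients, each holding one weight matrix and one bias vector.
client_a = [np.ones((2, 2)), np.zeros(2)]
client_b = [3.0 * np.ones((2, 2)), np.ones(2)]

global_params = fedavg([client_a, client_b], client_sizes=[30, 10])
print(global_params[0])
print(global_params[1])
```

Weighting by dataset size means a client that trained on more points has proportionally more influence on the global model.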

Next, we retrieve the global models for this session and score them on the central test set.

In [None]:
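def load_fedn_model(model_id):
    """Download a global model from FEDn and load it into an MLPClassifier."""
    client.download_model(model_id, 'temp.npz')
    parameters = load_parameters('temp.npz')
    model = compile_model()
    # The first half of the parameter list holds the layer weight matrices,
    # the second half holds the corresponding bias vectors.
    n = len(parameters)//2
    model.coefs_ = parameters[:n]
    model.intercepts_ = parameters[n:]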
    return model
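`load_fedn_model` assumes that the parameter list stores all weight matrices first, followed by all bias vectors. A small sketch of that layout, using hypothetical helpers `flatten_params` and `split_params` and toy arrays in place of a trained MLPClassifier:

```python
import numpy as np

def flatten_params(coefs, intercepts):
    """Concatenate weights and biases into one list: [W1, ..., Wk, b1, ..., bk]."""
    return list(coefs) + list(intercepts)

def split_params(parameters):
    """Inverse of flatten_params: first half are weights, second half are biases."""
    n = len(parameters) // 2
    return parameters[:n], parameters[n:]

# Toy network: 4 inputs -> 8 hidden units -> 1 output.
coefs = [np.zeros((4, 8)), np.zeros((8, 1))]
intercepts = [np.zeros(8), np.zeros(1)]

params = flatten_params(coefs, intercepts)
w, b = split_params(params)
print([a.shape for a in w], [a.shape for a in b])
```

The split only works if every layer contributes exactly one weight matrix and one bias vector, so the list length is even.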

Traverse the model trail and compute the test accuracy of each global model on the central test set.

In [None]:
model_trail_fedavg = client.get_model_trail()

acc_fedavg = []
for entry in model_trail_fedavg:
    model = load_fedn_model(entry['id'])
    acc_fedavg.append(accuracy_score(y_test, model.predict(X_test)))

Plot the result.

In [None]:
# Use the same zero-based round index for the baseline and the FL curve.
x = range(len(acc_fedavg))

plt.plot(x, [central_test_acc]*len(x))
plt.plot(range(n_global_rounds), central_acc_one_client)
plt.plot(range(n_global_rounds), central_acc_all_clients)
plt.plot(x, acc_fedavg)
plt.legend(['Centralized baseline', 'Incremental learning, one client','Incremental learning, all clients', 'FL'])