# Federated Learning

Federated learning is a machine learning paradigm that enables decentralized training of a shared model by multiple clients while preserving data privacy. The main idea behind this new paradigm is that each client trains a local model on its own data and then sends only the model updates to a central server, rather than sending the raw data. This allows the model to be trained on a large amount of data without compromising data privacy.

Federated learning was first proposed by Google in 2016 (McMahan et al., 2016) and has since been applied in various fields, such as healthcare (Hard et al., 2018), finance (Yoon et al., 2018), and natural language processing (Li et al., 2020). For example, federated learning could be used to train a model that makes personalized recommendations for each user without requiring each user's raw data to be shared with a central server. The model can then be trained on a large amount of data while preserving data privacy.

Federated learning has several key advantages over traditional centralized machine learning methods. First, it allows training on a much larger amount of data, as the data remains distributed across the clients. Second, it preserves data privacy by not requiring the raw data to be sent to the central server. And third, it enables collaboration among multiple clients, allowing them to jointly train a shared model without sharing their data.

In the federated learning process, a central server sends a machine learning model to multiple devices. Each device trains the model on its local data and then sends the updated model back to the central server. The central server aggregates the updates from each device and uses them to improve the global model. This process is repeated until the model has converged and can make accurate predictions on new data. The main concepts involved in this process are:

* Client: a device or edge node that has a local dataset and participates in the training of the federated model
* Server: a central server that coordinates the training of the federated model and receives the model updates from the clients
* Federated dataset: the collection of decentralized datasets from different clients
* Federated model: a machine learning model that is trained on the federated dataset using federated learning
* Federated optimization: the process of training the federated model using the decentralized data and model updates from the clients.
* Aggregation: the process of combining the client models, for example through a weighted average or an alternative approach.
* Rounds: the number of times the federated model is distributed among the clients, each distribution following an aggregation.
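
As a rough illustration of the aggregation concept, the sketch below averages the parameters of two toy clients, weighting each one by the size of its local dataset. It uses only NumPy. The function name `aggregate_weighted` and the toy values are assumptions made for this example; they are not part of any library.

```python
import numpy as np


def aggregate_weighted(client_params, client_sizes):
    """Weighted average of per-client parameter lists (hypothetical helper).

    client_params: one entry per client; each entry is a list of NumPy
                   arrays (one array per model layer).
    client_sizes:  number of local training examples held by each client.
    """
    total = sum(client_sizes)
    n_layers = len(client_params[0])
    aggregated = []
    for layer in range(n_layers):
        # Sum each client's layer, scaled by its share of the total data
        weighted_sum = sum(
            params[layer] * (size / total)
            for params, size in zip(client_params, client_sizes)
        )
        aggregated.append(weighted_sum)
    return aggregated


# Two toy clients, each holding a one-layer "model"
client_a = [np.array([1.0, 1.0])]
client_b = [np.array([3.0, 5.0])]
global_params = aggregate_weighted([client_a, client_b], client_sizes=[100, 300])
print(global_params[0])
```

Because the second client holds three times as much data, the aggregated parameters end up closer to its values than to those of the first client.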

It is a relatively new approach, so there are not yet many libraries that support it. The main ones are TensorFlow Federated, PySyft (developed by OpenMined), and Flower. TensorFlow Federated certainly deserves a mention, although at the time of writing it is mostly a research tool: it only simulates the federated setting and does not support deploying the solution to real devices. Flower, on the other hand, does support that kind of distributed deployment, although adapting existing code requires a little more work. For that reason Flower has been chosen for this tutorial: it offers a gentler approach and the result can be reused later in a real deployment.


# Introduction to Flower (FLWR)

Flower is a Python library that provides tools for implementing the communication and coordination of federated learning. It was designed to be easy to use and scalable. Flower is not a learning framework itself, so it wraps models from other frameworks, such as TensorFlow, PyTorch, or scikit-learn, and takes care of the communication.

To use Flower for federated learning, you will need to install the library:

In [None]:
!pip install flwr[simulation]

The `[simulation]` extra in the command above makes sure that the dependencies of the simulation environment are installed as well. However, if you plan to use Flower in a distributed way, the line should be `!pip install flwr`, executed on both the server and the client devices. Once you have installed the `flwr` package, you can import it in your Python code using the following statement:

In [None]:
import flwr as fl
import tensorflow as tf

FLWR offers a number of classes and functions that you can use to set up a federated learning environment, train and evaluate a model, and implement regular updates to the model. For more information, you can refer to the FLWR [documentation](https://flower.dev/docs/quickstart-tensorflow.html). First we are going to define a model that can be used with the system. It should be borne in mind that the model has to be serializable in order to travel through the network, so not all models are suitable for a federated learning approach. For this example an Artificial Neural Network (ANN) is going to be used, based on TensorFlow and more specifically on Keras.

In [None]:
# Define a simple model using TensorFlow
def generate_ann():
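    model = tf.keras.Sequential([
        # Declaring the input shape builds the model, so its weights exist before training
        tf.keras.Input(shape=(28, 28)),
        # Flatten each 28x28 MNIST image into a vector of 784 values
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])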

    model.compile(
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        optimizer=tf.keras.optimizers.Adam(),
        metrics=['accuracy']
    )
    return model
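
To make the serialization requirement more concrete: a Keras model exposes its weights as a list of NumPy arrays through `get_weights()`, and such a list can be turned into bytes and back. The sketch below shows that round trip with stand-in arrays and NumPy's own `np.save`/`np.load`. It only illustrates the idea and should not be read as the exact wire format used by Flower; the helper names `arrays_to_bytes` and `bytes_to_arrays` are hypothetical.

```python
import io

import numpy as np


def arrays_to_bytes(arrays):
    """Serialize a list of NumPy arrays into a list of byte strings."""
    payload = []
    for array in arrays:
        buffer = io.BytesIO()
        np.save(buffer, array, allow_pickle=False)
        payload.append(buffer.getvalue())
    return payload


def bytes_to_arrays(payload):
    """Rebuild the list of NumPy arrays from the byte strings."""
    return [np.load(io.BytesIO(blob), allow_pickle=False) for blob in payload]


# Stand-in for model.get_weights(): kernel and bias of a Dense(64) layer on 784 inputs
weights = [
    np.random.rand(784, 64).astype(np.float32),
    np.zeros(64, dtype=np.float32),
]
restored = bytes_to_arrays(arrays_to_bytes(weights))
print([w.shape for w in restored])
```

A model whose state cannot be expressed in a form like this (for example, one that depends on objects that cannot be converted to arrays) would be hard to exchange between the server and the clients.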

Since we are going to use a Deep Learning model defined in TensorFlow, it is convenient to load the data into a `Dataset` object so that the framework can take advantage of GPU hardware acceleration on the nodes, if available. To do that, the following lines load the MNIST dataset and generate a dataset with batches of 32, a size that most machines can handle nowadays.

In [None]:
# Load the dataset that is present on each device
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(10000).batch(32)
# from_tensor_slices expects a single tuple; the test set is batched for evaluation
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
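
In the cell above every client that runs the code loads the whole MNIST training set. In a simulation it is often more realistic for each client to hold its own disjoint shard. The following is a minimal sketch of such a split, assuming an IID partition and NumPy arrays shaped like MNIST. The function `partition_data` and its parameters are hypothetical names, and stand-in arrays are used so that nothing has to be downloaded.

```python
import numpy as np


def partition_data(x, y, num_clients, seed=0):
    """Split (x, y) into `num_clients` disjoint, shuffled shards."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    shards = np.array_split(indices, num_clients)
    return [(x[shard], y[shard]) for shard in shards]


# Stand-in arrays with the same layout as a small slice of MNIST
x_demo = np.zeros((1000, 28, 28), dtype=np.uint8)
y_demo = np.arange(1000) % 10
partitions = partition_data(x_demo, y_demo, num_clients=4)
print([len(px) for px, _ in partitions])
```

Each `(x, y)` pair could then be wrapped with `tf.data.Dataset.from_tensor_slices` and batched, in the same way as the full dataset.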


Now let's introduce the two pieces of the puzzle, the `Client` and the `Server`. Flower starts a `Server` to coordinate the client devices and orchestrate the training of the model. The server interacts with clients through an interface called `Client`. When the server selects a particular client for training, it sends training instructions over the network. The client receives those instructions and calls one of the `Client` methods to run your code (i.e., to train the neural network we defined earlier).

Flower provides a convenience class called `NumPyClient`, which makes it easier to implement the `Client` interface when your workload uses Keras. The `NumPyClient` interface defines three methods, which can be implemented in the following way:

In [None]:
# Create a class that holds the details of the client and acts as the interface
class MyClient(fl.client.NumPyClient):
    def __init__(self, train_dataset, test_dataset):
        self.net = generate_ann()
        self.trainloader = train_dataset
        self.valloader = test_dataset

    def get_parameters(self, config):
        return self.net.get_weights()

    def fit(self, parameters, config):
        self.net.set_weights(parameters)
        # The dataset is already batched, so batch_size is not passed again
        self.net.fit(self.trainloader, epochs=1, steps_per_epoch=3)
        return self.net.get_weights(), len(x_train), {}

    def evaluate(self, parameters, config):
        self.net.set_weights(parameters)
        loss, accuracy = self.net.evaluate(self.valloader)
        return loss, len(x_test), {"accuracy": float(accuracy)}

In the code above, we have defined the client methods that are required in this particular case. From this point, we can start a client with the following code:

In [None]:
# Start the client
fl.client.start_numpy_client(
    server_address="[::]:8080",
    client=MyClient(train_dataset, test_dataset),
)

The string `[::]:8080` tells the client which server to connect to. In this particular case, the code runs on the same machine as the server. For a truly federated workload, all that needs to change is the `server_address` the client points at.

The other piece of the puzzle is the script that contains the server, which goes in a separate file, for example `server.py`. Its content should be something like:

In [None]:
import flwr as fl

fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3))

In this particular case, we can run two clients and a server in separate terminals on the same machine. It should be as simple as executing the command `python client.py` in two different terminals and `python server.py` in a third one.

In the server terminal we should receive an output similar to:

```shell
INFO flower 2022-11-28 11:15:46,741 | app.py:76 | Flower server running (insecure, 3 rounds)
INFO flower 2022-11-28 11:15:46,742 | server.py:72 | Getting initial parameters
INFO flower 2022-11-28 11:16:01,770 | server.py:74 | Evaluating initial parameters
INFO flower 2022-11-28 11:16:01,770 | server.py:87 | [TIME] FL starting
DEBUG flower 2022-11-28 11:16:12,341 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-11-28 11:21:17,235 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2022-11-28 11:21:17,512 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2022-11-28 11:21:29,628 | server.py:149 | evaluate received 2 results and 0 failures
DEBUG flower 2022-11-28 11:21:29,696 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-11-28 11:25:59,917 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2022-11-28 11:26:00,227 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2022-11-28 11:26:11,457 | server.py:149 | evaluate received 2 results and 0 failures
DEBUG flower 2022-11-28 11:26:11,530 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-11-28 11:30:43,389 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2022-11-28 11:30:43,630 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2022-11-28 11:30:53,384 | server.py:149 | evaluate received 2 results and 0 failures
INFO flower 2022-11-28 11:30:53,384 | server.py:122 | [TIME] FL finished in 891.6143046000007
INFO flower 2022-11-28 11:30:53,385 | app.py:109 | app_fit: losses_distributed [(1, 2.3196680545806885), (2, 2.3202896118164062), (3, 2.1818180084228516)]
INFO flower 2022-11-28 11:30:53,385 | app.py:110 | app_fit: accuracies_distributed []
INFO flower 2022-11-28 11:30:53,385 | app.py:111 | app_fit: losses_centralized []
INFO flower 2022-11-28 11:30:53,385 | app.py:112 | app_fit: accuracies_centralized []
DEBUG flower 2022-11-28 11:30:53,442 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2022-11-28 11:31:02,848 | server.py:149 | evaluate received 2 results and 0 failures
INFO flower 2022-11-28 11:31:02,848 | app.py:121 | app_evaluate: federated loss: 2.1818180084228516
INFO flower 2022-11-28 11:31:02,848 | app.py:125 | app_evaluate: results [('ipv4:127.0.0.1:31539', EvaluateRes(loss=2.1818180084228516, num_examples=10000, accuracy=0.0, metrics={'accuracy': 0.21610000729560852})), ('ipv4:127.0.0.1:31540', EvaluateRes(loss=2.1818180084228516, num_examples=10000, accuracy=0.0, metrics={'accuracy': 0.21610000729560852}))]
INFO flower 2022-11-28 11:31:02,848 | app.py:127 | app_evaluate: failures []
```

With that, the first federated approach is completed. As can be seen, the system goes through 3 rounds of fitting and evaluating on all clients; in each round the results are retrieved by the server, aggregated, and redistributed.

### Exercise
Implement the Client and Server code in two separate files and compare the results with the ones shown here. Was your result similar?

`Answer here`:

# Updating parameters
So, the key element in this kind of approach is that the server sends the global model parameters to the client, and the client updates its local model with the parameters received from the server. The client then trains the model on the local data (which changes the model parameters locally) and sends the updated parameters back to the server (or, alternatively, it sends back only the gradients rather than the full model parameters).

In `flwr`, this communication is basically handled by two helper functions that load and retrieve the local parameters: `set_parameters` and `get_parameters`. This requirement blends extremely well with stateless approaches such as **PyTorch** or **JAX**, although, as the previous example has shown, it can also be used with **TensorFlow** or even **scikit-learn**.

Therefore, any client in this library follows the same basic structure:

In [None]:
class FlowerClient(fl.client.NumPyClient):
    def __init__(self, net, trainloader, valloader):
        self.net = net
        self.trainloader = trainloader # Dataset for train
        self.valloader = valloader # Dataset to validate

    def get_parameters(self, config):
        return get_parameters(self.net) # To be implemented specific for the framework

    def fit(self, parameters, config):
        set_parameters(self.net, parameters) # also to be implemented specificly for the framework
        train(self.net, self.trainloader, epochs=1)
        return get_parameters(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        set_parameters(self.net, parameters)
        loss, accuracy = test(self.net, self.valloader)
        return float(loss), len(self.valloader), {"accuracy": float(accuracy)}

In Flower, clients can be created by extending the classes `flwr.client.Client` or `flwr.client.NumPyClient`. In the previous example `NumPyClient` was used because it is easier to implement and requires less boilerplate code. Apart from extending the class, there are three main methods to implement:

* get_parameters: Return the current local model parameters

* fit: Receive model parameters from the server, train the model parameters on the local data, and return the (updated) model parameters to the server

* evaluate: Receive model parameters from the server, evaluate the model parameters on the local data, and return the evaluation result to the server

As you can see, the `MyClient` class implemented in the previous example follows this very same structure.
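
The template leaves `get_parameters`, `set_parameters`, and `train` to be implemented for each framework. As a framework-agnostic sketch, the block below implements them for a tiny linear model written in NumPy and runs a few rounds of the parameter exchange by hand, without Flower. The `LinearModel` class, the learning rate, the number of rounds, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


class LinearModel:
    """Tiny stand-in model: y = x @ w + b."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = np.zeros(1)


def get_parameters(net):
    # Return copies so the caller cannot modify the model by accident
    return [net.w.copy(), net.b.copy()]


def set_parameters(net, parameters):
    net.w, net.b = parameters[0].copy(), parameters[1].copy()


def train(net, data, epochs=1, lr=0.1):
    x, y = data
    for _ in range(epochs):
        error = x @ net.w + net.b - y
        net.w -= lr * (x.T @ error) / len(x)  # gradient step on the weights
        net.b -= lr * error.mean()            # gradient step on the bias


def mse(net, data):
    x, y = data
    return float(np.mean((x @ net.w + net.b - y) ** 2))


# Three clients with private samples drawn from the same underlying relation
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    x = rng.normal(size=(50, 2))
    clients.append((x, x @ true_w + 0.5))
all_data = (
    np.vstack([x for x, _ in clients]),
    np.concatenate([y for _, y in clients]),
)

global_params = get_parameters(LinearModel(2))
probe = LinearModel(2)
set_parameters(probe, global_params)
loss_before = mse(probe, all_data)

for _ in range(20):  # federated rounds
    updates = []
    for data in clients:
        local = LinearModel(2)
        set_parameters(local, global_params)   # server -> client
        train(local, data, epochs=5)           # local training
        updates.append(get_parameters(local))  # client -> server
    # All clients hold the same amount of data, so a plain mean is enough
    global_params = [np.mean(layer, axis=0) for layer in zip(*updates)]

set_parameters(probe, global_params)
loss_after = mse(probe, all_data)
print(loss_before, loss_after)
```

The only things that cross the client boundary are the lists returned by `get_parameters`; the data tuples never leave the loop iteration that owns them, which mirrors what `fit` does in the template.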

#### Be aware: 
Sometimes, especially when we are simulating several *Clients* on a single device, it can be useful to use a function that creates the client when it is required. This is particularly important in stateless frameworks, such as PyTorch, which allow a cheaper implementation that only creates the clients when they are required to train or evaluate. For example, the following code loads different examples for each client, and the client is discarded after use:

In [None]:
def client_fn(cid: str) -> FlowerClient:
    """Create a Flower client representing a single organization."""

    # Load model
    net = Net().to(DEVICE)

    # Load data (CIFAR-10)
    # Note: each client gets a different trainloader/valloader, so each client
    # will train and evaluate on their own unique data
    trainloader = trainloaders[int(cid)]
    valloader = valloaders[int(cid)]

    # Create a  single Flower client representing a single organization
    return FlowerClient(net, trainloader, valloader)

Note that `MyClient` cannot be used in this same way because of the state it keeps internally: it builds its own model through the function `generate_ann`. However, if the model creation is taken out of the class, it can be used in the same way.

So, the clients are already set up to load, fit, and evaluate. However, we still lack a way to integrate the results from the different clients. In Flower terms, this is called a Strategy, such as *Federated Averaging (FedAvg)*. As a first approach we can use the built-in implementations of the framework, although custom ones can also be used. Let's see an example:

In [None]:
# Create FedAvg strategy
strategy = fl.server.strategy.FedAvg(
        fraction_fit=1.0,  # Sample 100% of available clients for training
        fraction_evaluate=0.5,  # Sample 50% of available clients for evaluation
        min_fit_clients=10,  # Never sample less than 10 clients for training
        min_evaluate_clients=5,  # Never sample less than 5 clients for evaluation
        min_available_clients=10,  # Wait until all 10 clients are available
)

# Start simulation
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)

This code corresponds to the script running on the server. It uses the simulation function to test this kind of approach on a single device, with the previously mentioned optimization so as not to overload the device running it. It tells the framework to generate 10 clients, select all of them (`fraction_fit = 1.0`), and train the model on each one. After receiving the updates from the clients, it performs the aggregation strategy before returning the global model to the clients for the next of the 5 rounds.
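
To make the interplay between the `fraction_*` and `min_*_clients` settings more concrete, the following sketch computes how many clients would be sampled in a round. It assumes that the strategy takes the configured fraction of the available clients but never fewer than the configured minimum, which is how the comments in the example describe it. The function `clients_to_sample` is a hypothetical helper and not part of Flower.

```python
def clients_to_sample(num_available, fraction, min_clients):
    """Number of clients picked for a round (hypothetical helper)."""
    return max(int(num_available * fraction), min_clients)


# Settings taken from the strategy in the example, with 10 clients available
fit_clients = clients_to_sample(10, fraction=1.0, min_clients=10)
eval_clients = clients_to_sample(10, fraction=0.5, min_clients=5)
print(fit_clients, eval_clients)
```

Under that assumption, all 10 clients take part in training and 5 of them take part in each evaluation step.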

One point to highlight is that, on its own, the framework only aggregates the loss (`losses_distributed`) and none of the other metrics. Because these measures can be treated in many different ways, the framework cannot automatically know how to aggregate them. Users need to tell the framework how to handle and aggregate these custom metrics. The strategy will then call these functions whenever it receives fit or evaluate metrics from clients. The two possible functions are `fit_metrics_aggregation_fn` and `evaluate_metrics_aggregation_fn`. For example, the following code computes a weighted average of the accuracy, and the previous example can be adapted as follows:

In [None]:
from typing import List, Tuple

from flwr.common import Metrics


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Multiply accuracy of each client by number of examples used
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]

    # Aggregate and return custom metric (weighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}

# Create FedAvg strategy
strategy = fl.server.strategy.FedAvg(
        fraction_fit=1.0,
        fraction_evaluate=0.5,
        min_fit_clients=10,
        min_evaluate_clients=5,
        min_available_clients=10,
        evaluate_metrics_aggregation_fn=weighted_average,  # put the metric aggregation for the evaluation
)

# Start simulation
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)
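
As a small worked example of what the metric aggregation function computes, the sketch below calls `weighted_average` directly on hand-made client results. So that it runs without Flower installed, `Metrics` is defined here as a plain dictionary alias, which is an assumption standing in for `flwr.common.Metrics`; the client results are invented values.

```python
from typing import Dict, List, Tuple

# Stand-in for flwr.common.Metrics so the example runs without Flower
Metrics = Dict[str, float]


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Multiply the accuracy of each client by the number of examples used
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    # Aggregate and return the custom metric (weighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}


# Invented results: (number of evaluation examples, metrics) per client
client_results = [(100, {"accuracy": 0.9}), (300, {"accuracy": 0.5})]
aggregated = weighted_average(client_results)
print(aggregated)
```

The aggregated accuracy is (100 × 0.9 + 300 × 0.5) / 400 = 0.6, which is closer to the result of the client that evaluated on more examples than a plain mean of 0.7 would be.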


We will revisit the definition of custom strategies in the following Unit, where we will define our own in order to minimize some of the problems that federated learning has to address.

### Exercise

Implement the simulation and test it with the CIFAR-10 dataset in a simulation environment.

In [None]:
import tensorflow as tf

# Code to load the dataset
def load_datasets(n_clients):
    # Download and transform CIFAR-10 (train and test)
    cifar10 = tf.keras.datasets.cifar10
 
    # Distribute it to train and test set
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    #Normalize data
    #TODO
    
    #Prepare the datasets
    train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))


    # Split the data into n_clients partitions to simulate the individual datasets
    train_partition_size = len(x_train) // n_clients
    test_partition_size = len(x_test) // n_clients
    
    # Randomize the datasets once; without reshuffling, the partitions stay disjoint
    train_dataset = train_dataset.shuffle(10_000, reshuffle_each_iteration=False)
    test_dataset = test_dataset.shuffle(2_000, reshuffle_each_iteration=False)

    # Take one partition per client
    train_ds = []
    test_ds = []
    for _ in range(n_clients):
        train_ds.append(train_dataset.take(train_partition_size))    
        train_dataset = train_dataset.skip(train_partition_size)
        test_ds.append(test_dataset.take(test_partition_size))
        test_dataset = test_dataset.skip(test_partition_size)
    return train_ds, test_ds

train_ds, test_ds = load_datasets(n_clients=10)

#TODO Client, client_fn and simulation

# Aggregation

To close this lesson, let's take a closer look at the key point of those strategies, i.e., the aggregation algorithm. Aggregation algorithms are responsible for combining the updates from the clients in order to generate the global model, and they are defined in the Strategies, as we have seen. Generally speaking, there are several types of aggregation that can be used in federated learning (Reddi et al., 2020), including:

* Federated averaging (`flwr.server.strategy.FedAvg`): In this approach, each device computes an update to the model parameters based on its local data, and these updates are then averaged together to create the global model. This approach is simple and effective, but it can be sensitive to the size of the updates and the quality of the data on each device.

* Federated weighted averaging: This approach is similar to federated averaging, but each device's update is given a different weight based on the size of its data set or the quality of its data. This can help to give more influence to devices with larger or higher-quality data.

* Federated averaging with momentum (`flwr.server.strategy.FedAvgM`): This approach is similar to federated averaging, but it incorporates a momentum term in order to smooth out the updates and help the model converge more quickly.

* Federated Adagrad (`flwr.server.strategy.FedAdagrad`): In this approach, the clients train locally as in federated averaging, but the server treats the averaged client update as a pseudo-gradient and applies the Adagrad optimizer to it, which adapts the step size of each parameter (Reddi et al., 2020).

* Federated Adam (`flwr.server.strategy.FedAdam`): This approach works like Federated Adagrad, but the server uses the Adam optimization algorithm, which adaptively adjusts the learning rate based on first and second moment estimates of the pseudo-gradient.

With the exception of federated weighted averaging as a separate strategy, all of the above are implemented in the framework and can be used through the different strategies, together with other less common ones. Note that `FedAvg` in Flower already weights each client's update by the number of examples that the client reports from `fit`. The choice of aggregation method will depend on the specific characteristics of the data and the requirements of the task.
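
To give a feeling for how these aggregation rules differ on the server side, the sketch below applies plain averaging, server-side momentum, and an Adam-style update to the same averaged client result. It follows the general form of adaptive federated optimization described by Reddi et al. (2020) and works on a single flat parameter vector. The function names and hyperparameter values are assumptions for illustration; they are not Flower's implementation or its defaults.

```python
import numpy as np


def fedavg_step(x, avg_client_params):
    """Plain averaging: the global model becomes the client average."""
    return avg_client_params


def fedavgm_step(x, avg_client_params, velocity, server_lr=1.0, momentum=0.9):
    """Server-side momentum applied to the pseudo-gradient."""
    pseudo_grad = x - avg_client_params
    velocity = momentum * velocity + pseudo_grad
    return x - server_lr * velocity, velocity


def fedadam_step(x, avg_client_params, m, v, server_lr=0.1,
                 beta_1=0.9, beta_2=0.99, tau=1e-3):
    """Adam-style server update using first and second moment estimates."""
    delta = avg_client_params - x
    m = beta_1 * m + (1 - beta_1) * delta
    v = beta_2 * v + (1 - beta_2) * delta ** 2
    return x + server_lr * m / (np.sqrt(v) + tau), m, v


x = np.zeros(2)                # current global parameters
avg = np.array([1.0, -2.0])    # average of the parameters returned by the clients

x_avg = fedavg_step(x, avg)
x_mom, velocity = fedavgm_step(x, avg, velocity=np.zeros(2))
x_adam, m, v = fedadam_step(x, avg, m=np.zeros(2), v=np.zeros(2))
print(x_avg, x_mom, x_adam)
```

In the first round, with zero initial velocity and a server learning rate of 1, the momentum variant coincides with plain averaging; the differences appear in later rounds as the velocity accumulates. The Adam-style update moves in the same direction but takes a step whose size is normalized per parameter.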


#### References
* Hard, A., Konečný, J., McMahan, H. B., Richemond-Barakat, C., Sivek, J. S., & Talwar, K. (2018). Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1812.02903.
* Li, Y., Bonawitz, K., & Talwar, K. (2020). Fedprox: An optimizer for communication-efficient federated learning. arXiv preprint arXiv:2002.04283.
* McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2016). Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629.
* Yoon, J., Hard, A., Konečný, J., McMahan, H. B., & Sohl-Dickstein, J. (2018). Federal regression: A simple and scalable method for heterogeneous federated learning. arXiv preprint arXiv:1812.03862.
* Reddi, S., Charles, Z., Zaheer, M., Garrett, Z., Rush, K., Konečný, J., Kumar, S., & McMahan, H. B. (2020). Adaptive federated optimization. arXiv preprint arXiv:2003.00295.