# Revisiting the Strategies in Federated Learning

As mentioned in the previous unit, Strategies are at the core of federated learning. They determine how clients are selected, which updates are used, and how the new changes are aggregated.

In this unit, we will focus on custom Strategies. To begin, we need to set up the environment for this notebook's development.

### Exercise
Since you are already familiar with at least one deep learning framework, you can implement the following part in the one you prefer: PyTorch, TensorFlow, or JAX.


In [None]:
import numpy as np
import tensorflow as tf
from typing import List, Dict, Optional, Tuple, Union

import flwr as fl
from flwr.client import Client
from flwr.common import Context


# Load CIFAR-10 and split it into NUM_CLIENTS subsets for training and testing, as in the previous unit

NUM_CLIENTS = 10


# Code to load the dataset
def load_datasets(num_clients: int):
    # Distribute it to train and test set
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    # Normalize data
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0

    x_train, y_train = x_train[:10_000], y_train[:10_000]
    x_test, y_test = x_test[:1000], y_test[:1000]

    # Randomize the datasets
    #TODO

    # Split training set into 'num_clients' partitions to simulate the individual dataset
    #TODO

    # Split each partition
    #TODO
    
    return train_ds, val_ds, test_ds


trainloaders, valloaders, testloader = load_datasets(NUM_CLIENTS)

# Define the model to be used in the clients

# The part to adjust for each framework
def generate_ann():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        optimizer=tf.keras.optimizers.Adam(),
        metrics=['accuracy']
    )
    return model


def get_parameters(net) -> List[np.ndarray]:
    return net.get_weights()


def set_parameters(net, parameters: List[np.ndarray]):
    #TODO
    return net


def train(net, trainloader, epochs: int):
    #TODO
    return net


def test(net, testloader):
    #TODO
    return loss, accuracy

# Class to contain a Client
class FlowerClient(fl.client.NumPyClient):
    #TODO

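def client_fn(context: Context) -> Client:
    #TODO


# Wrap client_fn in a ClientApp; the simulation calls below expect `client_app`
client_app = fl.client.ClientApp(client_fn=client_fn)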

Given the previous code, there are several options for the Strategy object, such as `FedAvg` or `FedAdagrad`, as seen in the previous unit. For example, the following code creates a `FedAvg` strategy.

In [None]:
# Create an instance of the model and get the parameters
model = generate_ann()
params = get_parameters(model)

In [None]:
from flwr.server import ServerApp, ServerAppComponents

num_rounds = 3

def server_fn(context: Context):
    #TODO...


    # Create FedAvg strategy
    strategy = fl.server.strategy.FedAvg(
            fraction_fit=0.3,  
            fraction_evaluate=0.3,  
            min_fit_clients=3,
            min_evaluate_clients=2,
            min_available_clients=NUM_CLIENTS, 
            initial_parameters=fl.common.ndarrays_to_parameters(params), # Initial parameters
    )

    # Define ServerConfig
    #TODO

    # Return the configuration and strategy for this server
    return ServerAppComponents(strategy=strategy, config=config)

# Create Server
server_app = ServerApp(server_fn=server_fn)

# Start the simulation
fl.simulation.run_simulation(
    server_app=server_app, client_app=client_app, num_supernodes=NUM_CLIENTS
)

By default, Flower initializes the global model by asking one random client for its parameters and then distributing them to the remaining clients. Sometimes more control is required, for example when fine-tuning a pre-trained model. In such situations we use server-side initialization: the `initial_parameters` argument holds the initial version of the model for all clients. Note that this argument must be a serialized `Parameters` object rather than a list of arrays, which is why the utility function `ndarrays_to_parameters` is handy here.
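
To make the idea of a serialized model more concrete, the following sketch mimics the round trip in plain Python. `ToyParameters`, `to_toy_parameters`, and `from_toy_parameters` are hypothetical names used only for illustration, and plain lists of floats stand in for NumPy arrays; in Flower itself the conversion is done by `ndarrays_to_parameters`, whose internal format is not reproduced here.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class ToyParameters:
    """Hypothetical stand-in for a serialized model: raw bytes plus a type tag."""
    tensors: List[bytes]
    tensor_type: str


def to_toy_parameters(layers: List[List[float]]) -> ToyParameters:
    """Serialize each layer (a flat list of floats) to little-endian float32 bytes."""
    tensors = [struct.pack(f"<{len(layer)}f", *layer) for layer in layers]
    return ToyParameters(tensors=tensors, tensor_type="float32-le")


def from_toy_parameters(params: ToyParameters) -> List[List[float]]:
    """Deserialize the bytes back into lists of floats (4 bytes per float32)."""
    return [list(struct.unpack(f"<{len(blob) // 4}f", blob)) for blob in params.tensors]


# Two made-up "layers" whose values are exactly representable in float32
initial_layers = [[0.5, -1.25, 2.0], [0.0, 1.0]]
initial = to_toy_parameters(initial_layers)
restored = from_toy_parameters(initial)
print(restored == initial_layers)
```

The server only ever handles the byte representation; the clients convert it back into framework-specific weights before training.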

Now, let's move on to customizing the type of evaluation performed on the models. Broadly speaking, there are two possibilities: server-side evaluation and client-side evaluation.

**Centralized evaluation** (server-side) is similar to traditional machine learning, where the server holds a partition solely for evaluating the aggregated model. This approach reduces communication and is suitable for situations with limited bandwidth. There is no need to send the model to the clients for evaluation, and the entire evaluation dataset is available at all times.

**Federated evaluation** (client-side) is more complex, but it usually represents real-world scenarios more accurately. In this approach, the evaluation dataset is distributed among the clients, which means that we can leverage a larger dataset spread among the resources of the clients. However, this approach comes with a cost. Since we don't have a central dataset, we should be aware that our evaluation dataset can change over consecutive rounds of learning if some clients are not always available. Moreover, the dataset held by each client can also change over consecutive rounds. This can lead to evaluation results that are not stable, so even if we don't change the model, we can see our evaluation results fluctuate over consecutive rounds. Additionally, this approach can significantly increase the number of communications because the models have to be distributed among the clients and retrieved for evaluation.
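
The instability described above can be reproduced with a few lines of arithmetic. The sketch below assumes a single, fixed model and made-up per-client results (`CLIENT_RESULTS`); the availability pattern is hard-coded so that the example stays deterministic. It is an illustration of the effect, not Flower code.

```python
from typing import Dict, List, Tuple

# Hypothetical results of ONE fixed model: (local accuracy, number of examples)
CLIENT_RESULTS: Dict[str, Tuple[float, int]] = {
    "a": (0.90, 100),
    "b": (0.60, 300),
    "c": (0.75, 200),
    "d": (0.50, 400),
}


def federated_accuracy(available: List[str]) -> float:
    """Example-weighted accuracy over the clients that answered this round."""
    correct = sum(CLIENT_RESULTS[c][0] * CLIENT_RESULTS[c][1] for c in available)
    total = sum(CLIENT_RESULTS[c][1] for c in available)
    return correct / total


# Which clients happen to be reachable in each round (assumed for illustration)
availability = [["a", "b"], ["c", "d"], ["a", "d"]]
per_round = [federated_accuracy(clients) for clients in availability]
all_clients = federated_accuracy(list(CLIENT_RESULTS))

for round_number, accuracy in enumerate(per_round, start=1):
    print(f"round {round_number}: accuracy {accuracy:.3f}")
print(f"all clients: accuracy {all_clients:.3f}")
```

Although the model never changes, the reported accuracy moves between rounds simply because a different subset of clients takes part in the evaluation.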

The previous code snippet is an example of Flower performing federated evaluation: the `evaluate` method runs on each `Client`, and the results are sent to the server and aggregated there. Centralized evaluation can be set up in a similar way, as shown in the following code snippet:


In [None]:
def get_evaluate_fn(input_dataset):
    # Flower calls the returned `evaluate_fn` after every round
    def evaluate_fn(
        server_round: int, parameters: fl.common.NDArrays, 
        config: Dict[str, fl.common.Scalar]) -> Optional[Tuple[float, Dict[str, fl.common.Scalar]]]:
        #Load the test data
        dataset = input_dataset
        
        # Update the model with the latest parameters
        set_parameters(model, parameters)
    
        # Evaluate the model on the test dataset
        loss, accuracy = test(model, dataset)
    
        # Log the evaluation results
        print(f"Server-side evaluation round {server_round} with loss {loss} / accuracy {accuracy}")
        
        return loss, {"accuracy": accuracy}

    return evaluate_fn

def server_fn(context: Context):
    #TODO ... 


    # Create FedAvg strategy
    strategy = fl.server.strategy.FedAvg(
            fraction_fit=0.3,  
            fraction_evaluate=0.3,  
            min_fit_clients=3,
            min_evaluate_clients=2,
            min_available_clients=NUM_CLIENTS, 
            initial_parameters=fl.common.ndarrays_to_parameters(params),
            evaluate_fn=get_evaluate_fn(testloader[0]),  # Pass the evaluation function
    )

    # Define ServerConfig
    #TODO ...

    # Return the configuration and strategy for this server
    return ServerAppComponents(strategy=strategy, config=config)

# Create Server
server_app = ServerApp(server_fn=server_fn)

#Start the simulation
fl.simulation.run_simulation(
    server_app=server_app, client_app=client_app, num_supernodes=NUM_CLIENTS
)


Additionally, it is possible to implement a custom strategy from scratch by extending `flwr.server.strategy.Strategy` and implementing its abstract methods:
* `initialize_parameters`: returns the initial global model parameters, or `None` to request them from a client.
* `configure_fit`: selects the clients for the next round of training and prepares the instructions sent to them.
* `aggregate_fit`: called with the training results returned by the selected clients. This method should update the global model based on the returned models.
* `configure_evaluate`: selects the clients for the next round of federated evaluation and prepares the instructions sent to them.
* `aggregate_evaluate`: called with the evaluation results returned by the selected clients. This method should aggregate the evaluation results.
* `evaluate`: evaluates the current global model on the server side.


You can see a schema of the methods and an example at the following [link](https://flower.ai/docs/framework/tutorial-series-build-a-strategy-from-scratch-pytorch.html#Build-a-Strategy-from-scratch).
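
As a rough, dependency-free sketch of how a server loop typically drives a strategy, the toy class below exposes the hook names of Flower's `Strategy` interface (`initialize_parameters`, `configure_fit`, `aggregate_fit`, `evaluate`, `configure_evaluate`, `aggregate_evaluate`) and records the order in which they are called. The signatures are deliberately simplified and are an assumption of this sketch: the real methods work with `ClientManager`, `ClientProxy`, and `Parameters` objects. `LoggingToyStrategy` and `run_toy_round` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class LoggingToyStrategy:
    """Toy object with simplified versions of the strategy hooks."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def initialize_parameters(self) -> List[float]:
        self.calls.append("initialize_parameters")
        return [0.0]  # a "model" with a single weight

    def configure_fit(self, server_round: int, parameters: List[float],
                      clients: List[str]) -> List[Tuple[str, Dict[str, int]]]:
        self.calls.append("configure_fit")
        return [(cid, {"server_round": server_round}) for cid in clients]

    def aggregate_fit(self, server_round: int,
                      results: List[Tuple[List[float], int]]) -> List[float]:
        self.calls.append("aggregate_fit")
        total = sum(n for _, n in results)
        return [sum(p[0] * n for p, n in results) / total]

    def evaluate(self, server_round: int,
                 parameters: List[float]) -> Optional[Tuple[float, Dict[str, float]]]:
        self.calls.append("evaluate")
        return None  # this toy has no server-side dataset

    def configure_evaluate(self, server_round: int, parameters: List[float],
                           clients: List[str]) -> List[Tuple[str, Dict[str, int]]]:
        self.calls.append("configure_evaluate")
        return [(cid, {}) for cid in clients]

    def aggregate_evaluate(self, server_round: int,
                           results: List[Tuple[float, int]]) -> float:
        self.calls.append("aggregate_evaluate")
        total = sum(n for _, n in results)
        return sum(loss * n for loss, n in results) / total


def run_toy_round(strategy: LoggingToyStrategy, parameters: List[float],
                  clients: List[str], server_round: int) -> Tuple[List[float], float]:
    """One round: fit, centralized evaluation, then federated evaluation."""
    fit_ins = strategy.configure_fit(server_round, parameters, clients)
    # Pretend every client moves the weight halfway towards 1.0 using 10 examples
    fit_results = [([parameters[0] + 0.5 * (1.0 - parameters[0])], 10) for _ in fit_ins]
    parameters = strategy.aggregate_fit(server_round, fit_results)
    strategy.evaluate(server_round, parameters)
    eval_ins = strategy.configure_evaluate(server_round, parameters, clients)
    eval_results = [((1.0 - parameters[0]) ** 2, 10) for _ in eval_ins]
    loss = strategy.aggregate_evaluate(server_round, eval_results)
    return parameters, loss


toy_strategy = LoggingToyStrategy()
toy_params = toy_strategy.initialize_parameters()
toy_params, toy_loss = run_toy_round(toy_strategy, toy_params, ["c0", "c1"], server_round=1)
print(toy_strategy.calls)
```

The recorded call order gives a feeling for where each hook fits in a round; a real strategy would replace the pretend client behaviour with actual messages to and from the clients.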

# Challenges for Federated Learning

While federated learning addresses issues that traditional centralized machine learning struggles with, such as data privacy and the need for powerful central hardware, it also presents its own challenges. In this section, we will cover some of these challenges, including the non-IID (not independent and identically distributed) nature of the data, the heterogeneous nature of devices, and the limited communication bandwidth.

## Non-IID data
The assumption that data is independent and identically distributed (i.i.d.) is commonly made in machine learning and statistical analysis. It means that each data point is independent of all other data points and that all data points are drawn from the same distribution.

Non-i.i.d. data, on the other hand, violates one or both of these assumptions. This can occur for a variety of reasons. For example, data may be collected in a way that introduces dependencies between data points, such as when data is collected over time or in a specific order. Additionally, the distribution of the data may vary across different subgroups or regions, making it non-i.i.d. Non-i.i.d. data is a common challenge in federated learning because the data is distributed across many devices, and each device may have a different distribution of data due to variations in data collection methods or data sources. As a result, traditional machine learning algorithms may not perform well on non-i.i.d. data.

In a non-IID data problem (see Figure 1(a)), "non-IIDness" (see Figure 1(c)) refers to the presence of couplings (such as co-occurrence, neighborhood, dependency, linkage, correlation, and causality) and heterogeneities within and between two or more aspects, such as entities, entity classes, entity properties (variables), processes, facts, and states of affairs, or other types of entities or properties (such as learners and learned results) that appear or are produced prior to, during, and after a target process (such as a learning task). Conversely, IIDness ignores or simplifies these relationships, as shown in Figure 1(b).

![Diagram with IID and non-IID data](https://datasciences.org/wp-content/themes/dslabNew/images/datasciences/IIDness.png)
Credit: [Source of the image](https://datasciences.org/non-iid-learning/)

Non-i.i.d. data can be more challenging to work with than i.i.d. data because standard statistical assumptions and techniques may not be applicable. Therefore, special techniques may need to be employed to analyze non-i.i.d. data, which may include techniques that take into account the dependencies between data points or the varying data distributions.

In this context, non-i.i.d. data refers to the fact that the data on each device may differ in terms of distribution, characteristics, and relevance to the task at hand. For instance, the data on one device may comprise mainly images of dogs, while the data on another device may consist mainly of images of cats. This can pose a challenge in training a model that performs well on all the devices because the data on each device can vary significantly from the data on the other devices.
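
The dogs-versus-cats situation can be simulated with a simple label-skewed split. The sketch below is a minimal illustration with made-up labels; `iid_partition` and `label_skew_partition` are hypothetical helper names, and sorting by label is only one of several ways to create such a skew.

```python
import random
from collections import Counter
from typing import List


def iid_partition(labels: List[str], num_clients: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the samples, then deal them out in equal contiguous chunks."""
    shuffled = labels[:]
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // num_clients
    return [shuffled[k * size:(k + 1) * size] for k in range(num_clients)]


def label_skew_partition(labels: List[str], num_clients: int) -> List[List[str]]:
    """Sort by label first, so each client only sees a narrow slice of classes."""
    ordered = sorted(labels)
    size = len(ordered) // num_clients
    return [ordered[k * size:(k + 1) * size] for k in range(num_clients)]


labels = ["dog"] * 100 + ["cat"] * 100
iid_clients = iid_partition(labels, num_clients=2)
skewed_clients = label_skew_partition(labels, num_clients=2)

print("IID:    ", [dict(Counter(client)) for client in iid_clients])
print("non-IID:", [dict(Counter(client)) for client in skewed_clients])
```

With the shuffled split both clients see a mix of the two classes, whereas with the sorted split one client holds only cats and the other only dogs.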

To address non-i.i.d. data in federated learning, special techniques are often employed to weigh the contributions of each device's data to the overall model, or to adjust the model's parameters in a way that considers the differences in the data. Furthermore, techniques such as data augmentation and transfer learning could help to generalize the model beyond the device's data.
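
One common way to weigh the contribution of each device is to average the client parameters in proportion to the number of local examples, which is the idea behind `FedAvg`. The sketch below uses plain lists instead of NumPy arrays, and `weighted_average` is a hypothetical helper written for this illustration rather than a Flower function.

```python
from typing import List, Tuple


def weighted_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average parameter vectors, weighting each client by its number of examples."""
    total_examples = sum(num_examples for _, num_examples in updates)
    dimension = len(updates[0][0])
    return [
        sum(params[i] * num_examples for params, num_examples in updates) / total_examples
        for i in range(dimension)
    ]


# Made-up updates: client A trained on 100 examples, client B on 300
client_updates = [
    ([1.0, 2.0], 100),
    ([3.0, 6.0], 300),
]
global_parameters = weighted_average(client_updates)
print(global_parameters)
```

Because client B holds three times as many examples, the result lies three times closer to its parameters than to those of client A. Other weighting schemes are possible; this is only the simplest one.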

In Flower, one way to approach this problem is to [implement](https://flower.ai/docs/framework/how-to-implement-strategies.html) a custom strategy, similar to the following example, that uses a custom aggregation of the results.


In [None]:
from flwr.common import EvaluateRes, FitRes, Scalar
from flwr.server.client_proxy import ClientProxy

class AggregateCustomMetricStrategy(fl.server.strategy.FedAvg):
    #aggregate_evaluate is responsible for aggregating the results 
    #returned by the clients that were selected and asked to evaluate in configure_evaluate.
    def aggregate_evaluate(
        self,
        server_round: int,
        results: List[Tuple[ClientProxy, EvaluateRes]],
        failures: List[Union[Tuple[ClientProxy, EvaluateRes], BaseException]],
    ) -> Tuple[Optional[float], Dict[str, Scalar]]:
        """Aggregate evaluation accuracy using weighted average."""

        if not results:
            return None, {}

        # Call aggregate_evaluate from base class (FedAvg) to aggregate loss and metrics
        aggregated_loss, aggregated_metrics = super().aggregate_evaluate(server_round, results, failures)

        # Weigh accuracy of each client by number of examples used
        accuracies = [r.metrics["accuracy"] * r.num_examples for _, r in results]
        examples = [r.num_examples for _, r in results]

        # Aggregate and print custom metric
        aggregated_accuracy = sum(accuracies) / sum(examples)
        print(f"Round {server_round} accuracy aggregated from client results: {aggregated_accuracy}")

        # Return aggregated loss and metrics (i.e., aggregated accuracy)
        return aggregated_loss, {"accuracy": aggregated_accuracy}


def server_fn(context: Context):
    # instantiate the model
    #TODO

    # Create strategy and run server
    strategy = AggregateCustomMetricStrategy(
    #TODO
    )
    
    # Define ServerConfig
    #TODO ...

    # Wrap everything into a `ServerAppComponents` object
    return ServerAppComponents(strategy=strategy, config=config)


# Create your ServerApp
server_app = ServerApp(server_fn=server_fn)

#Start the simulation
fl.simulation.run_simulation(
    server_app=server_app, client_app=client_app, num_supernodes=NUM_CLIENTS
)


## Heterogeneity of the devices

One of the challenges of federated learning is the heterogeneity of the devices in the network: they may have different hardware and software configurations and may be running different versions of the operating system. This can lead to several problems, including:

* Inefficient communication: Different devices may have varying network speeds and bandwidth, which can make it difficult to transmit model updates between devices in a timely manner.

* Incompatible updates: If different devices are running different versions of the operating system, they may not be able to exchange model updates due to compatibility issues.

* Data heterogeneity: The data on different devices may differ in terms of quality, quantity, and format, making it challenging to train a model that generalizes well across all devices.



To mitigate the impact of heterogeneous devices in federated learning, researchers are developing techniques such as device-aware aggregation algorithms and communication optimization. These techniques aim to address issues such as inefficient communication and incompatible updates resulting from differences in network speeds, bandwidth, operating system versions, and data heterogeneity across the devices.


Consider a network of five devices (A, B, C, D, and E) that are participating in federated learning to train a global model. Each device has its own data and trains a local model on that data. The local models are then transmitted back to a central server, where they are aggregated and used to update the global model.


In the above scenario, the participating devices (A, B, C, D, and E) in the federated learning network are heterogeneous in nature, meaning they possess different hardware and software configurations. For instance, Device A and Device B may be running distinct versions of the operating system, and Device C may have a slower network connection in comparison to the other devices.


This heterogeneity in the devices can create challenges in the federated learning process. For instance, the update produced by Device A may not be compatible with the one produced by Device B because the two devices run different software versions, and Device C may take longer to transmit its update because of its slower network connection.


To overcome the challenges posed by heterogeneous devices in federated learning, researchers are developing techniques to mitigate their impact. These techniques may include device-aware aggregation algorithms, which take into account the different hardware and software configurations of the devices, and communication optimization techniques such as data compression and intelligent routing. By adapting the way that data is aggregated and transmitted, these techniques can help to ensure that all devices are able to contribute effectively to the global model, regardless of their individual characteristics.
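
As one possible illustration of the data compression mentioned above, the following sketch applies uniform 8-bit quantization to a model update before it is sent. This is a generic technique chosen for the example, not something Flower does by default, and `quantize` and `dequantize` are hypothetical helper names.

```python
from typing import List, Tuple


def quantize(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**num_bits - 1] on a uniform grid."""
    low, high = min(values), max(values)
    levels = (1 << num_bits) - 1
    # Guard against a constant vector, where the range would be zero
    step = (high - low) / levels if high > low else 1.0
    codes = [round((value - low) / step) for value in values]
    return codes, low, step


def dequantize(codes: List[int], low: float, step: float) -> List[float]:
    """Reconstruct approximate floats from the integer codes."""
    return [low + code * step for code in codes]


update = [-1.0, -0.25, 0.0, 0.5, 1.0]  # made-up model update
codes, low, step = quantize(update)
restored_update = dequantize(codes, low, step)
max_error = max(abs(a - b) for a, b in zip(update, restored_update))
print(codes, f"max error {max_error:.5f}")
```

Each value is now sent as one byte instead of a 32-bit float, at the price of a reconstruction error of at most half a quantization step. Whether that error is acceptable depends on the model and has to be checked experimentally.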



It is also worth mentioning that a local configuration can be provided to the clients by means of the `config` parameter of the `fit` and `evaluate` methods of the `FlowerClient`. This parameter is a Python `Dict` whose values can be used internally for different purposes, such as limiting the number of epochs on certain clients or passing the current round number.


In this case, the strategy needs the `on_fit_config_fn` parameter, which indicates the function that returns the configuration for each round.


```python

...

def fit_config(server_round: int):
    """Return training configuration dict for each round.

    Use one local epoch in the first round and two local epochs
    afterwards.
    """
    config = {
        "server_round": server_round,  # The current round of federated learning
        "local_epochs": 1 if server_round < 2 else 2,
    }
    return config

...

strategy = fl.server.strategy.FedAvg(
    fraction_fit=0.3,
    fraction_evaluate=0.3,
    min_fit_clients=3,
    min_evaluate_clients=3,
    min_available_clients=10,
    initial_parameters=fl.common.ndarrays_to_parameters(get_parameters(model)),
    evaluate_fn=evaluate,
    on_fit_config_fn=fit_config,  # Pass the fit_config function
)

...
```
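
On the client side, the values returned by `fit_config` arrive in the `config` argument. The following sketch shows how they might be consumed; `toy_client_fit` is a hypothetical stand-in for the `fit` method of a `FlowerClient`, and the "training" step and the dataset size are placeholders.

```python
from typing import Dict, List, Tuple, Union

Scalar = Union[bool, bytes, float, int, str]


def fit_config(server_round: int) -> Dict[str, Scalar]:
    """Same rule as above: one local epoch in round 1, two afterwards."""
    return {
        "server_round": server_round,
        "local_epochs": 1 if server_round < 2 else 2,
    }


def toy_client_fit(
    parameters: List[float], config: Dict[str, Scalar]
) -> Tuple[List[float], int, Dict[str, Scalar]]:
    """Hypothetical stand-in for `FlowerClient.fit` that reads its settings from `config`."""
    local_epochs = int(config.get("local_epochs", 1))
    # Placeholder "training": add 0.1 to every weight once per local epoch
    for _ in range(local_epochs):
        parameters = [weight + 0.1 for weight in parameters]
    num_examples = 50  # made-up local dataset size
    return parameters, num_examples, {"epochs_run": local_epochs}


for server_round in (1, 2, 3):
    _, _, metrics = toy_client_fit([0.0], fit_config(server_round))
    print(server_round, metrics)
```

Reading the value with a default, as in `config.get("local_epochs", 1)`, keeps the client usable even when the strategy sends no configuration.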

However, sometimes limiting the number of rounds or the number of epochs for each client is not enough, especially when the number of clients is too large to handle. In such cases, it may be necessary to reduce the number of clients used for training and evaluation. For instance, consider a scenario where there are 1000 clients, each with only 50 samples for training and 10 for evaluation. Although the amount of data in each client is limited, the communication overhead can still be overwhelming. In such cases, it is better to train for a longer time with a smaller number of clients in each round.
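
To get a feeling for the numbers in such a scenario, the sketch below estimates how many clients are selected per round and how much model traffic that implies. It assumes that the number of selected clients is the larger of `int(fraction * available)` and the configured minimum, uses the small dense model defined at the beginning of this notebook, and ignores evaluation traffic and protocol overhead. The fraction and minimum are example values, and `sampled_clients` and `fit_traffic_bytes` are hypothetical helper names.

```python
def sampled_clients(num_available: int, fraction: float, min_clients: int) -> int:
    """Clients selected per round, assuming max(int(fraction * available), minimum)."""
    return max(int(num_available * fraction), min_clients)


# Dense model from this notebook: 3072 -> 64 -> 64 -> 10, weights plus biases
NUM_PARAMETERS = (3072 * 64 + 64) + (64 * 64 + 64) + (64 * 10 + 10)
MODEL_BYTES = NUM_PARAMETERS * 4  # 4 bytes per float32 parameter


def fit_traffic_bytes(num_selected: int) -> int:
    """Download plus upload of the full model for every selected client."""
    return 2 * num_selected * MODEL_BYTES


fit_clients = sampled_clients(1000, fraction=0.025, min_clients=20)
print(f"{fit_clients} clients: {fit_traffic_bytes(fit_clients) / 1e6:.1f} MB per round")
print(f"1000 clients: {fit_traffic_bytes(1000) / 1e6:.1f} MB per round")
```

Even for this small model, training on every client in every round would move roughly forty times more data than training on a 2.5% sample.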

In [None]:
NUM_CLIENTS = 1000

trainloaders, valloaders, testloader = load_datasets(NUM_CLIENTS)

def fit_config(server_round: int):
    config = {
        "server_round": server_round,
        "local_epochs": 3,
    }
    return config

def server_fn(context: Context):
    # instantiate the model
    #TODO

    # Create strategy and run server
    strategy = fl.server.strategy.FedAvg(
        fraction_fit=0.025,  # Train on 25 clients (each round)
        fraction_evaluate=0.05,  # Evaluate on 50 clients (each round)
        min_fit_clients=20,
        min_evaluate_clients=40,
        min_available_clients=NUM_CLIENTS,
        initial_parameters=fl.common.ndarrays_to_parameters(params),
        on_fit_config_fn=fit_config
    )
   
    # Define ServerConfig
    #TODO ...

    # Wrap everything into a `ServerAppComponents` object
    return ServerAppComponents(strategy=strategy, config=config)

# Create your ServerApp
server_app = ServerApp(server_fn=server_fn)

#Start the simulation
fl.simulation.run_simulation(
    server_app=server_app, client_app=client_app, num_supernodes=NUM_CLIENTS
)

The limitation of resources, including bandwidth, storage, and computation power, is one of the main challenges of federated learning. In addition to the techniques mentioned earlier, federated transfer learning, secure aggregation, and data augmentation are other approaches that can help when scaling a federated learning system.


### Exercise

As the previous results show, the outcomes are not remarkable, mainly because of the limited number of samples available to each client. Suggest an alternative architecture for the network and experiment with at least four different configurations of `fraction_fit` and `fraction_evaluate`. Then analyze the results and draw conclusions from your findings.