# Get Started with Flower Framework

Welcome to this federated learning tutorial using the Flower framework. Flower is a unified approach to federated learning, analytics, and evaluation. It is open source, written in Python, and easy to learn and customize.

https://flower.ai/

## Step 0: Preparation

Before we begin with any actual code, let's make sure that we have everything we need.

### Installing dependencies
First, we install the necessary packages:

In [9]:
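# Linux (pandas, scikit-learn and requests are needed by the imports below)
!pip install -q "flwr[simulation]" "flwr_datasets[vision]" matplotlib pandas scikit-learn requests

# macOS
#!pip3 install -U 'flwr[simulation]' torch torchvision scipy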

Now that we have all dependencies installed, we can import everything we need for this tutorial:

In [10]:
from collections import OrderedDict
from typing import List, Tuple

import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np
from datasets.utils.logging import disable_progress_bar

import flwr as fl
from flwr.common import Metrics

from sklearn.model_selection import train_test_split


disable_progress_bar()

The model in this tutorial is a scikit-learn classifier that trains on the CPU, so GPU acceleration is not required. If you adapt the tutorial to a PyTorch model, you can switch to a runtime that has GPU acceleration enabled (on Google Colab: Runtime > Change runtime type > Hardware accelerator: GPU > Save). Note, however, that Google Colab is not always able to offer GPU acceleration. If you see an error related to GPU availability, consider switching back to CPU-based execution by setting DEVICE = torch.device("cpu"). In that setup, printing the selected device shows Training on cuda when GPU acceleration is enabled and Training on cpu otherwise.

### Loading the data

Federated learning can be applied to many different types of tasks across different domains. In this tutorial, we introduce federated learning by training a simple linear model on the popular Abalone dataset. Abalone can be used for classification and regression tasks. It has 9 columns: Sex, Length, Diameter, Height, Whole_weight, Shucked_weight, Viscera_weight, Shell_weight, and Rings. Rings is the usual prediction target, and the other eight columns are the features.

We simulate having multiple datasets from multiple organizations (also called the "cross-silo" setting in federated learning) by splitting the original Abalone dataset into multiple partitions. Each partition will represent the data from a single organization. We're doing this purely for experimentation purposes.

Each organization will act as a client in the federated learning system. So having 3 organizations participate in a federation means having 3 clients connected to the federated learning server.
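
As a rough sketch of this partitioning idea, the snippet below splits row positions into contiguous, near-equal blocks, one per organization. The helper name `partition_indices` is hypothetical and only illustrates the arithmetic; 4177 is the number of rows in the Abalone dataset.

```python
from typing import List


def partition_indices(num_rows: int, num_clients: int) -> List[range]:
    """Split row positions 0..num_rows-1 into contiguous, near-equal blocks."""
    base, extra = divmod(num_rows, num_clients)
    partitions = []
    start = 0
    for client_id in range(num_clients):
        # The first `extra` clients receive one additional row
        size = base + (1 if client_id < extra else 0)
        partitions.append(range(start, start + size))
        start += size
    return partitions


parts = partition_indices(4177, 3)
print([len(p) for p in parts])  # [1393, 1392, 1392]
```

Contiguous blocks keep the example simple. Shuffling the rows first would be a reasonable alternative if the file were sorted by some feature.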


Let's now download the Abalone dataset and partition it with pandas and NumPy. We create a small training set and test set for each client and keep them in two lists, one entry per client:

In [12]:
NUM_CLIENTS = 3
BATCH_SIZE = 32

def load_datasets():
  # URL of the Abalone dataset at UCI
  url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"

  # Column names of the Abalone dataset
  columns = ["Sex", "Length", "Diameter", "Height", "Whole_weight", "Shucked_weight",
            "Viscera_weight", "Shell_weight", "Rings"]

  # Download the dataset
  response = requests.get(url)
  response.raise_for_status()  # Check that the request succeeded

  # Save the content to a local file
  with open("abalone.data", "wb") as file:
      file.write(response.content)

  # Load the local copy into a Pandas DataFrame (avoids a second download)
  df = pd.read_csv("abalone.data", header=None, names=columns)

  # Split the row positions into NUM_CLIENTS contiguous partitions
  partitions = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), NUM_CLIENTS)]

  trainloaders = []
  testloaders = []
  # Split each partition into training and test sets, keeping one pair per client
  for part in partitions:
    train, test = train_test_split(part, test_size=0.2, random_state=42)  # 80% train, 20% test
    trainloaders.append(train)
    testloaders.append(test)

  return trainloaders, testloaders


trainloaders, testloaders = load_datasets()
# Display the train/test split of the last client
trainloaders[-1], testloaders[-1]

(     Sex  Length  Diameter  Height  Whole_weight  Shucked_weight  \
 3528   I   0.350     0.265   0.085        0.1735          0.0775   
 3247   F   0.610     0.495   0.190        1.2130          0.4640   
 3769   F   0.560     0.430   0.145        0.8980          0.3895   
 3462   F   0.625     0.470   0.170        1.2550          0.5250   
 3946   M   0.525     0.410   0.165        0.8005          0.2635   
 ...   ..     ...       ...     ...           ...             ...   
 3880   I   0.380     0.300   0.100        0.2860          0.1305   
 3915   I   0.560     0.445   0.165        1.0285          0.4535   
 4079   M   0.550     0.385   0.130        0.7275          0.3430   
 3645   I   0.475     0.335   0.100        0.4425          0.1895   
 3911   I   0.355     0.270   0.100        0.2160          0.0830   
 
       Viscera_weight  Shell_weight  Rings  
 3528          0.0340         0.056      6  
 3247          0.3060         0.365     15  
 3769          0.2325         0.245

### Implementing Flower Client

In [None]:
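from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

class FlowerClient(fl.client.NumPyClient):
    def __init__(self, X_train, y_train, X_test, y_test):
        self.model = LogisticRegression(max_iter=100, warm_start=True)
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        # Initialize the weights with zeros so they can be exchanged before the first fit.
        # Assumption: every client sees the same set of classes; otherwise the
        # coefficient shapes differ between clients and cannot be averaged.
        classes = np.unique(y_train)
        self.model.classes_ = classes
        self.model.coef_ = np.zeros((len(classes), X_train.shape[1]))
        self.model.intercept_ = np.zeros(len(classes))

    def get_parameters(self, config):
        # Exchange the learned weights, not the hyperparameters from get_params()
        return [self.model.coef_, self.model.intercept_]

    def set_parameters(self, parameters):
        self.model.coef_ = parameters[0]
        self.model.intercept_ = parameters[1]

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        self.model.fit(self.X_train, self.y_train)
        return self.get_parameters(config), len(self.X_train), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        y_pred = self.model.predict(self.X_test)
        accuracy = accuracy_score(self.y_test, y_pred)
        loss = 1 - accuracy
        # Report accuracy as a metric so the server can aggregate it
        return float(loss), len(self.X_test), {"accuracy": float(accuracy)}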
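
The tuple returned by `fit` includes the number of training examples because FedAvg weights each client's parameters by that count. Below is a minimal pure-Python sketch of that aggregation step, assuming each client's parameters are a flat list of floats. The helper `fedavg` and the client values are hypothetical; this is an illustration of the idea, not Flower's implementation.

```python
from typing import List, Tuple


def fedavg(results: List[Tuple[List[float], int]]) -> List[float]:
    """Average client parameter vectors, weighted by number of examples."""
    total_examples = sum(num_examples for _, num_examples in results)
    num_params = len(results[0][0])
    aggregated = [0.0] * num_params
    for params, num_examples in results:
        # Clients with more data contribute proportionally more
        weight = num_examples / total_examples
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Two hypothetical clients: the second has three times as much data
client_results = [([1.0, 2.0], 100), ([5.0, 6.0], 300)]
print(fedavg(client_results))  # [4.0, 5.0]
```

The result sits closer to the second client's parameters because it holds 75% of the examples.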

### Using the Virtual Client Engine

In [None]:
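def client_fn(cid: str) -> fl.client.Client:
    """Create a Flower client representing a single organization."""

    # Load data (Abalone)
    # Note: each client gets a different train/test partition, so each client
    # will train and evaluate on their own unique data
    train_df = trainloaders[int(cid)]
    test_df = testloaders[int(cid)]

    def to_xy(df):
        # Encode the categorical Sex column as integers; Rings is the target
        X = df.drop(columns=["Rings"]).assign(Sex=df["Sex"].map({"M": 0, "F": 1, "I": 2}))
        return X.to_numpy(dtype=float), df["Rings"].to_numpy()

    X_train, y_train = to_xy(train_df)
    X_test, y_test = to_xy(test_df)

    # Create a single Flower client representing a single organization
    return FlowerClient(X_train, y_train, X_test, y_test).to_client()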

### Start the training

In [None]:
# Create FedAvg strategy
strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,  # Sample 100% of available clients for training
    fraction_evaluate=0.5,  # Sample 50% of available clients for evaluation
    min_fit_clients=NUM_CLIENTS,  # Never sample fewer than NUM_CLIENTS clients for training
    min_evaluate_clients=2,  # Never sample fewer than 2 clients for evaluation
    min_available_clients=NUM_CLIENTS,  # Wait until all NUM_CLIENTS clients are available
)

# Specify the resources each of your clients need. The scikit-learn model
# trains on the CPU, so each client is allocated 1x CPU and 0x GPUs
client_resources = {"num_cpus": 1, "num_gpus": 0.0}
# Refer to our documentation for more details about Flower Simulations
# and how to setup these `client_resources`.

# Start simulation
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=NUM_CLIENTS,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
    client_resources=client_resources,
)


# Alternative to the simulation: run a standalone server that clients connect to
def start_server():
    strategy = fl.server.strategy.FedAvg(
        fraction_fit=1.0,
        fraction_evaluate=1.0,
        min_fit_clients=2,
        min_evaluate_clients=2,
        min_available_clients=2,
    )
    fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3), strategy=strategy)
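
To see how the fraction and minimum settings interact, here is a small sketch of the sampling arithmetic. It assumes that the number of sampled clients is the larger of `int(fraction * available)` and the configured minimum; the function name is hypothetical, and the exact rule should be checked against the strategy documentation for your Flower version.

```python
def num_sampled_clients(num_available: int, fraction: float, min_clients: int) -> int:
    """Assumed rule: sample a fraction of the available clients, but never fewer than the minimum."""
    return max(int(num_available * fraction), min_clients)


# With 3 available clients (NUM_CLIENTS in this tutorial)
print(num_sampled_clients(3, fraction=1.0, min_clients=3))  # 3
print(num_sampled_clients(3, fraction=0.5, min_clients=2))  # 2
# A minimum above the number of clients asks for more clients than exist
print(num_sampled_clients(3, fraction=1.0, min_clients=10))  # 10
```

The last call shows why the minimum values should not exceed the number of clients in the simulation: the strategy would wait for clients that never connect.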


### The Accuracy

In [None]:
def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Multiply accuracy of each client by number of examples used
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]

    # Aggregate and return custom metric (weighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}
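
A quick usage example may help here. The snippet repeats the function so it runs on its own, uses a plain dictionary type as a stand-in for `flwr.common.Metrics`, and feeds it two made-up client results.

```python
from typing import Dict, List, Tuple

# Stand-in for flwr.common.Metrics so the example runs without Flower installed
Metrics = Dict[str, float]


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Multiply accuracy of each client by number of examples used
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    # Aggregate and return custom metric (weighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}


# Hypothetical evaluation results from two clients: (num_examples, metrics)
results = [(10, {"accuracy": 0.5}), (30, {"accuracy": 0.9})]
print(weighted_average(results))  # {'accuracy': 0.8}
```

A plain mean of the two accuracies would give 0.7. The weighted version gives 0.8 because the second client evaluated three times as many examples.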

## Step 2: Simulation

In [None]:
# Create FedAvg strategy
strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,
    fraction_evaluate=0.5,
    min_fit_clients=NUM_CLIENTS,  # must not exceed the number of simulated clients
    min_evaluate_clients=2,
    min_available_clients=NUM_CLIENTS,
    evaluate_metrics_aggregation_fn=weighted_average,  # <-- pass the metric aggregation function
)

# Start simulation
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=NUM_CLIENTS,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
    client_resources=client_resources,
)