# Get Started with Flower Framework

Welcome to Federated Learning Tutorial using Flower Framework. Flower is a unified approach to federated learning, analytics, and evaluation. Open source, python, and easy to learn and personalize.

https://flower.ai/

## Step 0: Preparation

Before we begin with any actual code, let's make sure that we have everything we need.

### Instaling dependencies
First, we should install the necessary packages

In [1]:
# Linux
!pip install protobuf==4.25.3
!pip install -q flwr[simulation] matplotlib
#!pip install --upgrade tensorflow-metadata

# MacOs
#!pip3 install -U 'flwr[simulation]' torch torchvision scipy



Now that we have all dependencies installed, we can import everything we need for this tutorial:

In [1]:
from collections import OrderedDict
from typing import List, Tuple

import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np

import flwr as fl
from flwr.common import Metrics

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

It is possible to switch to a runtime that has GPU acceleration enabled (on Google Colab: Runtime > Change runtime type > Hardware accelerator: GPU > Save). Note, however, that Google Colab is not always able to offer GPU acceleration. If you see an error related to GPU availability in one of the following sections, consider switching back to CPU-based execution by setting DEVICE = torch.device("cpu"). If the runtime has GPU acceleration enabled, you should see the output Training on cuda, otherwise it'll say Training on cpu.

### Loading the data

Federated learning can be applied to many different types of tasks across different domains. In this tutorial, we introduce federated learning by training a simple Linear Regression on the popular Abalone dataset. Abalone can be used in classification and regression tasks using 9 features: Sex, Length, Diameter, Height, Whole_weight, Shucked_weight, Viscera_weight, Shell_weight, and Rings.

We simulate having multiple datasets from multiple organizations (also called the "cross-silo" setting in federated learning) by splitting the original Abalone dataset into multiple partitions. Each partition will represent the data from a single organization. We're doing this purely for experimentation purposes.

Each organization will act as a client in the federated learning system. So having 3 organizations participate in a federation means having 3 clients connected to the federated learning server.


Let's now create the Federated Dataset abstraction that from flwr-datasets that partitions the Abalone. We will create small training and test set for each edge device and wrap each of them into a PyTorch DataLoader:

In [3]:
NUM_CLIENTS = 2

# def load_datasets():
#   # URL dataset Abalone at UCI
#   url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"

#   # Name of columns in the Abalone dataset
#   columns = ["Sex", "Length", "Diameter", "Height", "Whole_weight", "Shucked_weight",
#             "Viscera_weight", "Shell_weight", "Rings"]

#   # Downloading the dataset
#   response = requests.get(url)
#   response.raise_for_status()  # Verifica se a requisição foi bem sucedida

#   # Saving content to a local file (optional)
#   with open("abalone.data", "wb") as file:
#       file.write(response.content)

#   # Loading the dataset into a Pandas DataFrame
#   df = pd.read_csv(url, header=None, names=columns)

#   # Using numpy array_split to split DataFrame into NUM_CLIENTS parts
#   partition = np.array_split(df, NUM_CLIENTS)

#   trainloaders = []
#   testloaders = []
#   # Splitting each partition into training and testing sets
#   for i, part in enumerate(partition):
#     trainloaders, testloaders = train_test_split(part, test_size=0.2, random_state=42)  # 80% train, 20% test

#   return trainloaders, testloaders


# # trainloaders, valloaders, testloader = load_datasets()
# load_datasets()

def load_data():
    data = load_iris()
    X, y = data.data, data.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test

def partition_data(X, y, num_clients):
    partition_size = len(X) // num_clients
    partitions = [(X[i * partition_size:(i + 1) * partition_size], y[i * partition_size:(i + 1) * partition_size])
                  for i in range(num_clients)]
    return partitions


### Implementing Flower Client

In [21]:
from collections import OrderedDict
from typing import Dict, List, Tuple
from sklearn.metrics import log_loss
from flwr.common import NDArrays, Scalar

class SklearnClient(fl.client.NumPyClient):
    def __init__(self, X_train, y_train, X_test, y_test):
      #super().__init__()
      self.model = LogisticRegression(penalty="l2", max_iter=100, warm_start=True)
      self.X_train = X_train
      self.y_train = y_train
      self.X_test = X_test
      self.y_test = y_test
      self.initial_fit()

    def initial_fit(self):
        # Ensure that the model is fitted with at least one sample from each class
        unique_classes = np.unique(self.y_train)
        X_init = []
        y_init = []
        for cls in unique_classes:
            idx = np.where(self.y_train == cls)[0][0]
            X_init.append(self.X_train[idx])
            y_init.append(self.y_train[idx])
        X_init = np.array(X_init)
        y_init = np.array(y_init)
        self.model.fit(X_init, y_init)

    def get_parameters(self, config):
      print(self.X_train)
      print(self.y_train)
      if not hasattr(self.model, "coef_"):
        self.model.fit(self.X_train[:1], self.y_train[:1])

      if self.model.fit_intercept:
        params = [
            self.model.coef_,
            self.model.intercept_
        ]
      else:
        params = [
            self.model.coef_
        ]
      print("****** params *******")
      print(params)
      print("++++++ config +++++++")
      print(config)
      return [param.astype(np.float32) for param in params]

    def set_parameters(self, parameters):
      if self.model.fit_intercept:
        self.model.coef_ = parameters[0]
        self.model.intercept_ = parameters[1]
      else:
        self.model.coef_ = parameters[0]
      return self.model

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        self.model.fit(self.X_train, self.y_train)
        print(f"Training finished for round {config}")
        return self.get_parameters(), len(self.X_train), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        y_pred = self.model.predict(self.X_test)
        loss = 1 - accuracy_score(self.y_test, y_pred)
        # loss = log_loss(self.y_test, model.predict_proba(X_test))
        accuracy = self.model.score(self.X_test, self.y_test)
        return loss, len(self.X_test), {"accuracy": accuracy}
        #return 0, 10, {}


  and should_run_async(code)


### Using the Virtual Client Engine

In [16]:
def client_fn(cid: str) -> fl.client.Client:
  X_train, X_test, y_train, y_test = load_data()
  num_clients = NUM_CLIENTS  # Number of clients should match the number of partitions
  partitions = partition_data(X_train, y_train, num_clients)

  partition_id = int(cid)

  print("##### Partition id #######" + cid)

  X_train_cid, y_train_cid = partitions[partition_id]

  return SklearnClient(X_train_cid, y_train_cid, X_test, y_test).to_client()

### Start the training

In [19]:
def fit_config(server_round) -> Dict:
    """Send round number to client."""
    config = {
        "server_round": server_round
    }
    return config

# Define the strategy
strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,
    fraction_evaluate=1.0,
    min_fit_clients=1,
    min_evaluate_clients=1,
    min_available_clients=1,
    on_fit_config_fn=fit_config
)

# Simulation configuration
client_resources = {"num_cpus": 1}
num_clients = NUM_CLIENTS
num_rounds = 5


  and should_run_async(code)


In [20]:
history = fl.simulation.start_simulation(
 # strategy=strategy, # the strategy that will construct a client
  client_fn=client_fn, # a function to construct a client
  num_clients=num_clients, # total number of clients in the experiment
  config=fl.server.ServerConfig(num_rounds=1), #let's run for 5 rounds
  client_resources=client_resources,
)


[92mINFO [0m:      Starting Flower simulation, config: num_rounds=1, no round_timeout
INFO:flwr:Starting Flower simulation, config: num_rounds=1, no round_timeout
  self.pid = _posixsubprocess.fork_exec(
2024-06-05 16:15:02,837	INFO worker.py:1621 -- Started a local Ray instance.
[92mINFO [0m:      Flower VCE: Ray initialized with resources: {'memory': 7795256526.0, 'object_store_memory': 3897628262.0, 'node:172.28.0.12': 1.0, 'node:__internal_head__': 1.0, 'CPU': 2.0}
INFO:flwr:Flower VCE: Ray initialized with resources: {'memory': 7795256526.0, 'object_store_memory': 3897628262.0, 'node:172.28.0.12': 1.0, 'node:__internal_head__': 1.0, 'CPU': 2.0}
[92mINFO [0m:      Optimize your simulation with Flower VCE: https://flower.ai/docs/framework/how-to-run-simulations.html
INFO:flwr:Optimize your simulation with Flower VCE: https://flower.ai/docs/framework/how-to-run-simulations.html
[92mINFO [0m:      Flower VCE: Resources for each Virtual Client: {'num_cpus': 1}
INFO:flwr:Flower 

[2m[36m(ClientAppActor pid=56086)[0m ##### Partition id #######0
[2m[36m(ClientAppActor pid=56086)[0m [[4.6 3.6 1.  0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [5.7 4.4 1.5 0.4]
[2m[36m(ClientAppActor pid=56086)[0m  [6.7 3.1 4.4 1.4]
[2m[36m(ClientAppActor pid=56086)[0m  [4.8 3.4 1.6 0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [4.4 3.2 1.3 0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [6.3 2.5 5.  1.9]
[2m[36m(ClientAppActor pid=56086)[0m  [6.4 3.2 4.5 1.5]
[2m[36m(ClientAppActor pid=56086)[0m  [5.2 3.5 1.5 0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [5.  3.6 1.4 0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [5.2 4.1 1.5 0.1]
[2m[36m(ClientAppActor pid=56086)[0m  [5.8 2.7 5.1 1.9]
[2m[36m(ClientAppActor pid=56086)[0m  [6.  3.4 4.5 1.6]
[2m[36m(ClientAppActor pid=56086)[0m  [6.7 3.1 4.7 1.5]
[2m[36m(ClientAppActor pid=56086)[0m  [5.4 3.9 1.3 0.4]
[2m[36m(ClientAppActor pid=56086)[0m  [5.4 3.7 1.5 0.2]
[2m[36m(ClientAppActor pid=56086)[0m  [5.5 2

[91mERROR [0m:     Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flwr/simulation/ray_transport/ray_client_proxy.py", line 73, in _submit_job
    out_mssg, updated_context = self.actor_pool.get_client_result(
  File "/usr/local/lib/python3.10/dist-packages/flwr/simulation/ray_transport/ray_actor.py", line 399, in get_client_result
    return self._fetch_future_result(cid)
  File "/usr/local/lib/python3.10/dist-packages/flwr/simulation/ray_transport/ray_actor.py", line 280, in _fetch_future_result
    res_cid, out_mssg, updated_context = ray.get(
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ra