# Get Started with Flower Framework

Welcome to this federated learning tutorial using the Flower framework. Flower is a unified approach to federated learning, analytics, and evaluation. It is open source, written in Python, and easy to learn and customize.

https://flower.ai/

## Step 0: Preparation

Before we begin with any actual code, let's make sure that we have everything we need.

### Installing dependencies
First, we install the necessary packages:

In [3]:
# Linux
!pip install -q "flwr[simulation]" matplotlib scikit-learn pandas requests

# macOS
#!pip3 install -U 'flwr[simulation]' matplotlib scikit-learn pandas requests

Now that we have all dependencies installed, we can import everything we need for this tutorial:

In [4]:
from collections import OrderedDict
from typing import List, Tuple

import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np

import flwr as fl
from flwr.common import Metrics

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

This tutorial trains a scikit-learn model, which runs on the CPU, so no GPU acceleration is required. On Google Colab you can keep the default CPU runtime (Runtime > Change runtime type > Hardware accelerator).

### Loading the data

Federated learning can be applied to many different types of tasks across different domains. In this tutorial, we introduce federated learning by training a simple logistic regression classifier with scikit-learn. The code that actually runs below uses the Iris dataset (150 samples, 4 features, 3 classes); a loader for the popular Abalone dataset is kept in commented-out form as an alternative. Abalone can be used in classification and regression tasks and has 9 columns: Sex, Length, Diameter, Height, Whole_weight, Shucked_weight, Viscera_weight, Shell_weight, and Rings, where Rings is usually the prediction target.

We simulate having multiple datasets from multiple organizations (also called the "cross-silo" setting in federated learning) by splitting the original dataset into multiple partitions. Each partition will represent the data from a single organization. We're doing this purely for experimentation purposes.

Each organization will act as a client in the federated learning system. So having 3 organizations participate in a federation means having 3 clients connected to the federated learning server.


Let's now define helper functions that load the data and split the training set into one partition per client. Each client trains on its own partition, and all clients evaluate on the same held-out test set:

In [5]:
NUM_CLIENTS = 3
BATCH_SIZE = 32

# def load_datasets():
#   # URL dataset Abalone at UCI
#   url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"

#   # Name of columns in the Abalone dataset
#   columns = ["Sex", "Length", "Diameter", "Height", "Whole_weight", "Shucked_weight",
#             "Viscera_weight", "Shell_weight", "Rings"]

#   # Downloading the dataset
#   response = requests.get(url)
#   response.raise_for_status()  # Check that the request succeeded

#   # Saving content to a local file (optional)
#   with open("abalone.data", "wb") as file:
#       file.write(response.content)

#   # Loading the dataset into a Pandas DataFrame
#   df = pd.read_csv(url, header=None, names=columns)

#   # Using numpy array_split to split DataFrame into NUM_CLIENTS parts
#   partition = np.array_split(df, NUM_CLIENTS)

#   trainloaders = []
#   testloaders = []
#   # Splitting each partition into training and testing sets
#   for part in partition:
#     train_part, test_part = train_test_split(part, test_size=0.2, random_state=42)  # 80% train, 20% test
#     trainloaders.append(train_part)
#     testloaders.append(test_part)

#   return trainloaders, testloaders


# # trainloaders, valloaders, testloader = load_datasets()
# load_datasets()

def load_data():
    data = load_iris()
    X, y = data.data, data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test

def partition_data(X, y, num_clients):
    partition_size = len(X) // num_clients
    partitions = [(X[i * partition_size:(i + 1) * partition_size],
                   y[i * partition_size:(i + 1) * partition_size])
                  for i in range(num_clients)]
    return partitions
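
Note that `partition_data` uses integer division, so when the number of samples is not a multiple of `num_clients`, the leftover samples are dropped. A minimal sketch of an alternative, assuming it is acceptable for partitions to differ in size by one sample, uses `np.array_split`. The name `partition_data_keep_all` is hypothetical and not part of the original notebook.

```python
import numpy as np


def partition_data_keep_all(X, y, num_clients):
    """Split X and y into num_clients partitions without dropping samples.

    Hypothetical helper for illustration: np.array_split gives the first
    len(X) % num_clients partitions one extra sample each.
    """
    X_parts = np.array_split(X, num_clients)
    y_parts = np.array_split(y, num_clients)
    return list(zip(X_parts, y_parts))


# Toy data: 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

demo_partitions = partition_data_keep_all(X_demo, y_demo, 3)
partition_sizes = [len(X_part) for X_part, _ in demo_partitions]
print(partition_sizes)
```

With the Iris training split used here (120 samples, 3 clients), both approaches give 40 samples per client.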


### Implementing Flower Client

In [6]:
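class SklearnClient(fl.client.NumPyClient):
    def __init__(self, X_train, y_train, X_test, y_test):
        # warm_start=True makes fit() continue from the weights sent by the server
        self.model = LogisticRegression(max_iter=100, warm_start=True)
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        # Start from zero weights so the model has parameters before the
        # first call to fit()
        n_classes = 3  # Iris has three classes
        n_features = X_train.shape[1]
        self.model.classes_ = np.arange(n_classes)
        self.model.coef_ = np.zeros((n_classes, n_features))
        self.model.intercept_ = np.zeros(n_classes)

    def get_parameters(self, config):
        # Share the learned weights; get_params() would only return hyperparameters
        return [self.model.coef_, self.model.intercept_]

    def set_parameters(self, parameters):
        self.model.coef_, self.model.intercept_ = parameters

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        self.model.fit(self.X_train, self.y_train)
        return self.get_parameters(config), len(self.X_train), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        y_pred = self.model.predict(self.X_test)
        # Report the error rate (1 - accuracy) as the loss
        loss = float(1 - accuracy_score(self.y_test, y_pred))
        return loss, len(self.X_test), {}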
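
In scikit-learn, `get_params()` returns an estimator's hyperparameters, while the weights learned by `LogisticRegression` are stored in `coef_` and `intercept_`. Those arrays are what a federated client needs to exchange with the server. The following sketch is an illustration rather than part of the original notebook: it copies the learned weights from one model into a fresh one and checks that both make the same predictions. The variable names are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_iris, y_iris = load_iris(return_X_y=True)

# Train a model the way a client would during fit()
source_model = LogisticRegression(max_iter=200)
source_model.fit(X_iris, y_iris)

# The learned weights as a list of NumPy arrays
weights = [source_model.coef_, source_model.intercept_]

# Load the weights into a model that was never fitted
target_model = LogisticRegression(max_iter=200)
target_model.classes_ = source_model.classes_  # needed by predict()
target_model.coef_, target_model.intercept_ = weights

same_predictions = np.array_equal(
    source_model.predict(X_iris), target_model.predict(X_iris)
)
print(weights[0].shape, weights[1].shape, same_predictions)
```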


### Using the Virtual Client Engine

In [7]:
def client_fn(cid: str) -> SklearnClient:
    X_train, X_test, y_train, y_test = load_data()
    num_clients = NUM_CLIENTS  # Number of partitions must match the number of clients
    partitions = partition_data(X_train, y_train, num_clients)

    partition_id = int(cid)
    X_train_cid, y_train_cid = partitions[partition_id]

    return SklearnClient(X_train_cid, y_train_cid, X_test, y_test)


### Start the training

In [8]:
# Define the strategy
strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,
    fraction_evaluate=1.0,
    min_fit_clients=2,
    min_evaluate_clients=2,
    min_available_clients=2,
)

# Simulation configuration
client_resources = {"num_cpus": 1}
num_clients = 3
num_rounds = 5
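
`FedAvg` combines the parameters returned by the clients into a weighted average, where each client's weight is the number of training examples it reports from `fit`. The sketch below reproduces that arithmetic with plain NumPy for illustration. `fedavg_aggregate` is a hypothetical helper, not Flower's internal implementation.

```python
import numpy as np


def fedavg_aggregate(results):
    """Weighted average of client parameters.

    results is a list of (parameters, num_examples) pairs, where
    parameters is a list of NumPy arrays with the same shapes on every client.
    """
    total_examples = sum(num_examples for _, num_examples in results)
    num_arrays = len(results[0][0])
    return [
        sum(params[i] * num_examples for params, num_examples in results)
        / total_examples
        for i in range(num_arrays)
    ]


# Two toy clients: the second one has three times as much data
client_a = ([np.array([1.0, 1.0]), np.array([0.0])], 10)
client_b = ([np.array([3.0, 5.0]), np.array([4.0])], 30)

aggregated = fedavg_aggregate([client_a, client_b])
print(aggregated)
```

Because the second client holds more examples, the aggregated values sit closer to its parameters than to the first client's.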


In [None]:
fl.simulation.start_simulation(
  strategy=strategy,
  client_fn=client_fn,
  num_clients=num_clients,
  config=fl.server.ServerConfig(num_rounds=num_rounds, round_timeout=None),
  client_resources=client_resources,
)
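
The imports at the top include `Metrics`, `List`, and `Tuple`, which are typically used to aggregate client-side evaluation metrics. A possible aggregation function is sketched below, assuming each client returns an `accuracy` entry in the metrics dictionary of `evaluate` (the client in this notebook returns an empty dictionary, so it would need to be extended first). The `Metrics` alias is a stand-in so that the block runs without Flower installed.

```python
from typing import Dict, List, Tuple

# Stand-in for flwr.common.Metrics so this sketch runs on its own
Metrics = Dict[str, float]


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    """Average client accuracies, weighted by each client's example count."""
    total_examples = sum(num_examples for num_examples, _ in metrics)
    weighted_sum = sum(num_examples * m["accuracy"] for num_examples, m in metrics)
    return {"accuracy": weighted_sum / total_examples}


# Hypothetical results from three clients: (num_examples, metrics)
demo_metrics = [
    (10, {"accuracy": 1.0}),
    (10, {"accuracy": 0.5}),
    (20, {"accuracy": 0.75}),
]
aggregated_metrics = weighted_average(demo_metrics)
print(aggregated_metrics)
```

Such a function can be passed to the strategy through the `evaluate_metrics_aggregation_fn` argument of `FedAvg`.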
