# Handbook
## Decentralized Federated Learning: Enhancing Reliability with Blockchain

### Introduction

This handbook supplements the [tutorial of the same name at DSN 2025](https://dsn2025.github.io/cptutorials.html). 
It puts the concepts explained there into practice in Python, while staying at a high level by using [mocks](https://en.wikipedia.org/wiki/Mock_object).

This handbook is a [Jupyter notebook](https://jupyter.org/). If you are not familiar with it, a Jupyter notebook is an interactive environment for writing and running code. 
It combines code and text in a single document, making it ideal for exploratory programming and sharing results.
You will only need to run some predefined cells (with Shift+Enter when the cell is focused) and to write some code in blank cells, just as you would in your preferred IDE.


We will follow the same structure as the tutorial.
If you have any questions or find any mistakes, feel free to tell us.

We use "Components" and "Systems" (combinations of components), each of which has specific properties. Your goal is to combine the right components to build systems with the desired properties.

To set up the Python project, please run:

In [2]:
# %load "https://raw.githubusercontent.com/maxper4/Tuto-DSN-2025/refs/heads/master/src/system_base.py"
class SystemBase:
    required_components_names = set([])
    intrinsic_properties = []

    def __init__(self, name, components):
        if not isinstance(name, str):
            raise TypeError("Name should be a string")
        self.name = name
        if not isinstance(components, dict):
            raise TypeError("Components should be a dictionary")
        if self.required_components_names != components.keys():
            raise ValueError(f"Components for {self.name} should be: {self.required_components_names}")
        if not all(isinstance(c, Component) or isinstance(c, SystemBase) for c in components.values()):
            raise TypeError("All components should be instances of Component")
        
        self.components = components
        self.properties = self._merge_properties()

    def refresh(self):
        self.properties = self._merge_properties()

    def _merge_properties(self):
        # Union of the components' properties and the system's intrinsic ones.
        # The local name avoids shadowing the built-in all().
        merged = list(set([a for c in self.components.values() for a in c.properties] + self.intrinsic_properties))
        # Combination rules: some pairs of properties merge into a new one.
        if "property 1" in merged and "property 2" in merged:
            merged.remove("property 1")
            merged.remove("property 2")
            merged.append("combination property 1 and 2")
        if "untrusted" in merged and self.__class__.__name__ == "FederatedLearning":
            merged.append("unscalable")
        if "corrupted" in merged and "privacy-preserving" in merged:
            merged.remove("privacy-preserving")
            merged.remove("corrupted")
            merged.append("privacy-leaking")
        # Conflict rules: one property of the pair wins over the other.
        if "not cool" in merged and "cool" in merged:
            merged.remove("not cool")
        if "untrusted" in merged and "trusted" in merged:
            merged.remove("trusted")
        if "centralized" in merged and "decentralized" in merged:
            merged.remove("decentralized")
        if "data-available" in merged and "unavailable" in merged:
            merged.remove("unavailable")
            merged.remove("data-available")
            merged.append("available")
        return merged
    
    def display(self):
        print(f"{self.name}\nProperties: {', '.join(self.properties)}")
        print("Components:")
        for n, component in self.components.items():
            print(f"{n}: {component.info()}")
        print()

    def info(self):
        return f"{self.name} ({', '.join(self.properties)})"

class Component():
    def __init__(self, name, properties):
        if not isinstance(name, str):
            raise TypeError("Name should be a string")
        self.name = name
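        # Reject None first: after the isinstance check this branch would be unreachable.
        # An empty list is allowed (see the "Empty" component below).
        if properties is None:
            raise ValueError("Properties cannot be None")
        if not isinstance(properties, list):
            raise TypeError("Properties should be a list")
        if not all(isinstance(attr, str) for attr in properties):
            raise TypeError("All properties should be strings")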
        self.properties = properties

    def info(self):
        return f"{self.name} ({', '.join(self.properties)})"

class ExampleSystem(SystemBase):
    required_components_names = set(["Component 1", "Component 2"])
    intrinsic_properties = ["intrinsic property"]

class Blockchain(SystemBase):
    required_components_names = set(["Consensus", "Transactions"]) 
    intrinsic_properties = ["trusted"]

class FederatedLearning(SystemBase):
    required_components_names = set(["Aggregation", "Models Storage"])
    intrinsic_properties = ["privacy-preserving"]

class DistributedStorage(Component):
    def __init__(self):
        super().__init__("Distributed Storage", ["scalable", "unavailable"])

class CentralizedServer(Component):
    def __init__(self):
        super().__init__("Centralized Server", ["untrusted"])
        
    def corrupt(self):
        self.properties.append("corrupted")
        print("Centralized Server has been corrupted! It will now try to mislead the system.")

In [3]:
empty = Component("Empty", [])

We show in the next cell an example component and combine it in a system.

In [4]:
example_component = Component("Example Name", ["property 1", "property 2"])
example_system = ExampleSystem("Example System", {
        "Component 1": example_component,
        "Component 2": example_component
    })
example_system.display()

Example System
Properties: intrinsic property, combination property 1 and 2
Components:
Component 1: Example Name (property 1, property 2)
Component 2: Example Name (property 1, property 2)



### 1. Federated Learning


#### Properties

We start by building a basic Federated Learning system using a centralized server component. As we explained in the tutorial, we cannot trust it.

In [5]:
from src.system_base import FederatedLearning, CentralizedServer

centralized_server = CentralizedServer()
centralized_server.info()

'Centralized Server (untrusted)'

A Federated Learning system is composed of two main parts: the aggregation, combining the models from the clients, and the storage of the aggregated model.
We first use a centralized server component both to aggregate the models from the clients and to store the aggregated model. Please replace the empty component with the appropriate one in the next cell.

In [6]:
centralized_federated_learning = FederatedLearning("Centralized Federated Learning", {
        "Aggregation": centralized_server,
        "Models Storage": centralized_server
    })
centralized_federated_learning.display()

Centralized Federated Learning
Properties: untrusted, privacy-preserving, unscalable
Components:
Aggregation: Centralized Server (untrusted)
Models Storage: Centralized Server (untrusted)



You should obtain the following properties: "untrusted", "privacy-preserving" and "unscalable".

#### Implementation

Let's take a closer look at how it works in practice.
In the next cell, we will code the client side of one step of Federated Learning.
You are provided with some (private) data, a function `train` that produces a new model, and a function `send` that communicates with the server.


In [7]:
def client_federated_learning(data, train, send):
    model = train(data)
    send(model)

Now, we will implement the server side of one step of Federated Learning. You are provided with a function `receive` to get the models from the clients, a function `aggregate` to combine them into a new model, and a function `send` to send that new model back to the clients.

In [8]:
def server_federated_learning(aggregate, receive, send):
    ""

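As a hedged illustration of one possible answer, the sketch below runs a single round with mocks. It assumes that `receive()` returns the list of all client models for the round, that `send` broadcasts one model to every client, and that a "model" is just a float. The names `sketch_server_federated_learning`, `average`, `inbox` and `broadcast` are hypothetical and not part of the tutorial code.

```python
def sketch_server_federated_learning(aggregate, receive, send):
    """One server-side step: collect client models, aggregate, broadcast."""
    models = receive()             # assumption: returns a list of client models
    new_model = aggregate(models)  # combine them into a single model
    send(new_model)                # assumption: broadcasts to all clients
    return new_model


def train(data):
    # Mock training: the "model" is simply the mean of the private data.
    return sum(data) / len(data)


def average(models):
    # Mock aggregation: plain unweighted average of the client models.
    return sum(models) / len(models)


client_datasets = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]
inbox = []      # models in flight from the clients to the server
broadcast = []  # models sent back by the server

# Client side, as in the previous cell: train locally, then send the model.
for data in client_datasets:
    inbox.append(train(data))

new_model = sketch_server_federated_learning(
    average, lambda: list(inbox), broadcast.append
)
print(new_model)
```

Only the trained models leave the clients here; the raw data in `client_datasets` is never passed to the server function.
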
#### Limitations

We imagine that we have a full implementation of a Federated Learning system using a centralized server. 
Now let's see what happens if the centralized server is compromised.

In [18]:
centralized_server.corrupt()

Centralized Server has been corrupted! It will now try to mislead the system.


The repercussions of a compromised server are severe. The server can manipulate the aggregated model, leading to biased or malicious outcomes. This is a significant risk in Federated Learning systems, as it undermines the trustworthiness of the entire system.

In [19]:
centralized_federated_learning.refresh()
centralized_federated_learning.display() # privacy-leaking -> tampered model

Centralized Federated Learning
Properties: untrusted, unscalable, privacy-leaking
Components:
Aggregation: Centralized Server (untrusted, corrupted)
Models Storage: Centralized Server (untrusted, corrupted)



### 2. Blockchain


#### Properties

Blockchain is proposed as a solution to enhance the reliability of Federated Learning systems. It provides a decentralized and tamper-proof ledger that can be used to store the aggregated model and the updates from the clients. Let's see its properties.

In [11]:
blockchain = Blockchain("Blockchain Example", {
        "Consensus": empty,
        "Transactions": empty
    })
blockchain.display()

Blockchain Example
Properties: trusted
Components:
Consensus: Empty ()
Transactions: Empty ()



#### Implementation
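
The tamper resistance mentioned above can be illustrated with a minimal hash-chained ledger. This is only a sketch under simplifying assumptions: it has a single writer, it leaves out consensus and transaction validation entirely, and the helper names (`block_hash`, `append_block`, `is_valid`) are hypothetical. It shows why a modified block is detectable, not how a real blockchain is built.

```python
import hashlib
import json


def block_hash(index, payload, previous_hash):
    # Hash the block content together with the hash of the previous block.
    encoded = json.dumps([index, payload, previous_hash]).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, payload):
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": block_hash(index, payload, previous_hash),
    })


def is_valid(chain):
    # Recompute every hash and check that each block links to the previous one.
    previous_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(index, block["payload"], previous_hash):
            return False
        previous_hash = block["hash"]
    return True


chain = []
append_block(chain, "model update from client A")
append_block(chain, "model update from client B")
valid_before = is_valid(chain)

chain[0]["payload"] = "tampered update"  # modify an old block in place
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

In this sketch the check fails after the modification because the stored hash of the first block no longer matches its content.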

#### Limitations


### 3. Blockchain-based Federated Learning

#### Properties
We replace the centralized server with a blockchain. This ledger stores the aggregated model and the updates from the clients in a decentralized manner, enhancing the reliability of the system.

Exercise 1 slides

#### Limitations

### 4. Off-chain Data

The blockchain-based Federated Learning solution is not enough to solve all the problems. This design is not scalable, because a blockchain is not designed to store large amounts of data.
We need to store the data off-chain, while still ensuring that the data is tamper-proof and can be trusted.
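
One common way to get this effect is to keep only a hash of the data on-chain and the data itself off-chain. The sketch below assumes that approach. `off_chain_storage` and `on_chain_records` are hypothetical in-memory stand-ins for a storage network and for blockchain transactions, and `store_model` / `load_model` are illustrative names only.

```python
import hashlib

off_chain_storage = {}  # hypothetical stand-in for an off-chain storage network
on_chain_records = []   # hypothetical stand-in for blockchain transactions


def store_model(model_bytes):
    # Large data goes off-chain; only its small SHA-256 digest goes on-chain.
    digest = hashlib.sha256(model_bytes).hexdigest()
    off_chain_storage[digest] = model_bytes
    on_chain_records.append(digest)
    return digest


def load_model(digest):
    model_bytes = off_chain_storage.get(digest)
    if model_bytes is None:
        raise LookupError("data is not available off-chain")
    if hashlib.sha256(model_bytes).hexdigest() != digest:
        raise ValueError("off-chain data does not match the on-chain hash")
    return model_bytes


digest = store_model(b"aggregated model, round 1")
loaded = load_model(digest)

# Simulate a storage server that silently replaces the data.
off_chain_storage[digest] = b"tampered model"
try:
    load_model(digest)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(tamper_detected)
```

The hash makes tampering detectable, but it does nothing if the storage simply loses the data: `load_model` then raises `LookupError`.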

#### Properties

We will use a decentralized storage system to store the data off-chain, in a tamper-proof way, while still allowing the clients to access it.

In [12]:
distributed_storage = DistributedStorage()
distributed_storage.info()

'Distributed Storage (scalable, unavailable)'

This system is unavailable, because nothing guarantees that the stored data is held by a server that is still alive.
This is problematic when combined with a blockchain, because clients may never receive the new model they need to train.
Try to see what happens when you use a blockchain-based Federated Learning system with an off-chain data storage system.

In [13]:
external_storage_fl = FederatedLearning("External Storage Federated Learning", {
        "Aggregation": empty,
        "Models Storage": empty
    })
external_storage_fl.display()

TypeError: All components should be instances of Component

We can provide the missing properties using availability and integrity proofs (PoA&I). These proofs are stored on the blockchain.

In [57]:
poai = Component("POAI", ["data-available"]) # POAI -> Certificate

Now, you need to combine these proofs, the off-chain data storage system, and the blockchain-based Federated Learning system to build a complete system that is reliable, scalable, and tamper-proof.

In [58]:
poai_blockchain = Blockchain("POAI Blockchain", {
        "Consensus": empty, # leave empty
        "Transactions": poai
    })
poai_blockchain.display()

poai_fl = FederatedLearning("POAI Federated Learning", {
        "Aggregation": empty,
        "Models Storage": empty
    })
poai_fl.display()

POAI Blockchain
Properties: data-available, trusted
Components:
Consensus: Empty ()
Transactions: POAI (data-available)

POAI Federated Learning
Properties: privacy-preserving
Components:
Aggregation: Empty ()
Models Storage: Empty ()



#### Implementation

Let's have a look at how these proofs are implemented in practice. They certify that enough correct servers are storing the data and that the data has not been tampered with. 

We split the implementation into two parts: `retreive_data`, run when a server gets the data from the off-chain storage system, and `make_poai`, run when a server creates the proof of availability and integrity. `f` is an upper bound on the number of servers that can crash. If we receive signatures from at least f+1 servers, at least one of the signers will not crash, so the data remains available, and the signed hash lets anyone check that it has not been tampered with.

In [59]:
def retreive_data(data, hash, sign, send):
    return "TODO"

def make_poai(data, hash, receive, f):
    return "TODO"
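
The following is a hedged sketch of how the two functions might be filled in, not the tutorial's reference solution. It assumes that `send` forwards a `(digest, signature)` pair, that the transport adds the sender's identity so `receive()` yields `(server_id, digest, signature)` tuples, and that `receive()` returns `None` when nothing is left. The `sketch_` names are hypothetical, the "signature" is a placeholder string rather than real cryptography, and signature verification is omitted.

```python
import hashlib


def sketch_retrieve_data(data, hash, sign, send):
    """Server side: after fetching the data, sign its digest and send it."""
    digest = hash(data)
    send((digest, sign(digest)))
    return data


def sketch_make_poai(data, hash, receive, f):
    """Collect signatures on hash(data) from f+1 distinct servers."""
    expected = hash(data)
    signatures = {}  # server_id -> signature, so each server counts once
    while len(signatures) < f + 1:
        message = receive()
        if message is None:  # not enough signers: no proof can be made
            return None
        server_id, digest, signature = message
        if digest == expected:  # ignore signatures on different data
            signatures[server_id] = signature
    return {"hash": expected, "signatures": signatures}


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


# Mock network: three servers fetch the same data and sign its digest.
inbox = []
data = b"model weights"
for server_id in ("s1", "s2", "s3"):
    sign = lambda digest, sid=server_id: f"{sid}:{digest}"  # placeholder, not crypto
    send = lambda msg, sid=server_id: inbox.append((sid, *msg))
    sketch_retrieve_data(data, sha256_hex, sign, send)

receive = lambda: inbox.pop(0) if inbox else None
certificate = sketch_make_poai(data, sha256_hex, receive, f=1)
print(sorted(certificate["signatures"]))
```

With `f=1` the sketch stops as soon as two distinct servers have signed the expected digest; with fewer signers than `f+1` it returns `None` instead of a certificate.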

### 5. Final Solution