
Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

This repository contains the code for reproducing the experimental results and simulations as presented in the paper:

Philip Jordan, Florian Grötschla, Flint Xiaofeng Fan and Roger Wattenhofer. "Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence." 2024.

TLDR

Our paper introduces a decentralized, Byzantine fault-tolerant federated reinforcement learning method: each agent acts in its own local environment and, exchanging only policies and policy gradients (never raw trajectories), the agents collaboratively learn a common task. The merit of our method lies in its provable convergence, with competitive sample complexity bounds, despite a fraction of agents sending Byzantine updates and without relying on a central trusted server.
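As a rough illustration of what each agent computes and shares locally, the following is a minimal sketch of a REINFORCE-style gradient estimate; it assumes a PyTorch policy network and the classic Gym API, and it is not the variance-reduced estimator used in the paper:

    import torch

    def local_gradient_estimate(policy, env, gamma=0.99):
        # One policy gradient estimate from the agent's local environment;
        # only this gradient (not the raw trajectory) would be shared with peers.
        log_probs, rewards = [], []
        obs, done = env.reset(), False
        while not done:
            dist = torch.distributions.Categorical(
                logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
            action = dist.sample()
            obs, reward, done, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            rewards.append(reward)
        returns, g = [], 0.0
        for r in reversed(rewards):  # discounted returns-to-go
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        loss = -torch.stack(log_probs) @ torch.as_tensor(returns, dtype=torch.float32)
        return [p.detach() for p in torch.autograd.grad(loss, list(policy.parameters()))]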

Methods

Centralized Byzantine-Tolerant Federated Policy Gradient

ByzPG, the first method presented in our paper, serves as a warm-up: it combines Byzantine filtering/aggregation with variance-reduced policy gradient methods in the centralized (server-based) setting.
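As an example of what robust aggregation can look like (a hedged sketch only; the aggregator actually used in the paper may differ), a coordinate-wise median keeps the aggregate close to the honest gradients as long as honest agents form a majority:

    import torch

    def robust_aggregate(gradients):
        # gradients: list of flattened gradient vectors, one per agent.
        # Byzantine agents may send arbitrary vectors; the coordinate-wise
        # median ignores outliers as long as honest agents are in the majority.
        return torch.stack(gradients).median(dim=0).values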

Decentralized Byzantine-Tolerant Federated Policy Gradient

The DecByzPG algorithm, our main contribution, is a decentralized variant of ByzPG. It combines the aggregation methods from ByzPG with a decentralized communication and agreement protocol that enables Byzantine fault-tolerance despite the lack of a central trusted entity. For details of our algorithm, we refer to the paper.

Experiments

Our experiments aim to demonstrate

  1. speed-up when increasing the number of agents, and
  2. resilience against various Byzantine attacks.

We apply our algorithms to the CartPole-v1 and LunarLander-v2 environments from the OpenAI Gym toolkit.
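Both environments can be instantiated directly via Gym, for example:

    import gym

    cartpole = gym.make("CartPole-v1")
    lander = gym.make("LunarLander-v2")  # needs the Box2D extras, e.g. pip install gym[box2d]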

Setup and Run Instructions

The setup has been tested with Python 3.8.16. To install the necessary dependencies, run

pip install -r requirements.txt

Then, running run_experiments.sh reproduces all experiments contained in the paper with the same choice of seeds. As described in the paper, a single experiment terminates within a few hours on a consumer-grade CPU. Running all experiments sequentially, however, may take a long time; we therefore recommend running as many experiments as possible in parallel on a cluster.
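If no cluster scheduler is at hand, the following minimal sketch runs several experiments concurrently on one machine; it assumes, as a simplification, that run_experiments.sh contains one experiment command per line:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Read one shell command per line, skipping blanks and comments.
    with open("run_experiments.sh") as f:
        commands = [line.strip() for line in f
                    if line.strip() and not line.lstrip().startswith("#")]

    # Launch up to 4 experiments at a time; tune max_workers to your hardware.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, shell=True, check=True), commands))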

Note that the results of the experiments are stored in an experiments directory. Each experiment is associated with a unique ID derived from the chosen set of hyperparameters. The results (the history of rewards obtained during training) of an experiment with ID x and seed i can be found in experiments/x/seed_i/rew_hist.npy, and the corresponding choice of parameters is contained in experiments/x/config.json.
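For instance, the reward history and configuration of a finished run can be inspected as follows (exp_id and seed are placeholders; substitute a real experiment ID and seed):

    import json
    import numpy as np

    exp_id, seed = "x", 0  # placeholders for an actual experiment ID and seed
    rew_hist = np.load(f"experiments/{exp_id}/seed_{seed}/rew_hist.npy")
    with open(f"experiments/{exp_id}/config.json") as f:
        config = json.load(f)

    print(config)
    print("recorded rewards:", len(rew_hist), "final reward:", rew_hist[-1])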

Plots

Below, K denotes the number of agents in the system among which an 𝛼-fraction is Byzantine.

DecByzPG without Byzantine Agents

[Figure: speed-up of DecByzPG with increasing number of agents]

Speed-up with an increasing number of agents in both CartPole and LunarLander, as suggested by our theoretical sample complexity results.

DecByzPG under Attack (CartPole)

[Figure: DecByzPG under Byzantine attacks in CartPole]

Comparison of DecByzPG on CartPole under the above attacks with (a) PAGE-PG, the state-of-the-art (in terms of sample complexity) single-agent PG method that DecByzPG reduces to for K=1, and (b) Dec-PAGE-PG, a naive decentralized (but not fault-tolerant) version of PAGE-PG in which gradients are aggregated by plain averaging and no agreement mechanism is used.

DecByzPG under Attack (LunarLander)

[Figure: DecByzPG under Byzantine attacks in LunarLander]

Analogous to the above, but for LunarLander. In both environments, we observe that DecByzPG performs nearly on par with the unattacked Dec-PAGE-PG. In all cases, DecByzPG with K=13 and 𝛼 > 0 outperforms PAGE-PG with K=1 (and 𝛼 = 0), meaning that despite the presence of Byzantine agents, joining the federation is empirically beneficial for faster convergence.

For plots of ByzPG we refer to Appendix E of the paper.

Cite

To cite our work, please use the following BibTeX entry:

@article{jordan2024decentralized,
    title={Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence}, 
    author={Philip Jordan and Florian Grötschla and Flint Xiaofeng Fan and Roger Wattenhofer},
    url={https://arxiv.org/abs/2401.03489},
    publisher={arXiv},
    year={2024},
}

References

Parts of the code in this repository are built on top of existing implementations (and their respective publications).

Some auxiliary methods are borrowed directly from existing code. Whenever this is the case, we have added a comment referencing the source.
