# Chapter 6: Federated Learning Simulations

| Chapter  | Colab   | Kaggle          | Gradient      | Studio Lab             | Binder             |
|:---------|:--------|:----------------|:--------------|:-----------------------|:-------------------|
| [Chapter 6: Federated Learning Simulations](6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb)               | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb)          | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb)          | [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb)          | [![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb)          | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/matthew-mcateer/practicing_trustworthy_machine_learning/HEAD?urlpath=https%3A%2F%2Fgithub.com%2Fmatthew-mcateer%2Fpracticing_trustworthy_machine_learning%2Fblob%2Fmain%2F6_more_state_of_the_art_research_questions%2FChapter_6_Federated_Learning_Simulations.ipynb)         |


<!--
Originally found on GitHub at https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/6_more_state_of_the_art_research_questions/Chapter_6_Federated_Learning_Simulations.ipynb
-->


## FLUTE: Federated Learning Utilities and Tools for Experimentation

Federated learning (FL) provides privacy and accountability benefits to machine learning pipelines.
However, like any distributed system, it comes with additional engineering challenges.
One way to make federated learning easier is to simulate your FL approach ahead of time.

One tool for this is Microsoft's FLUTE, which lets you use a multi-GPU environment to simulate federated learning algorithms whose training is spread out among many different client devices.
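
Many of the algorithms such a simulator runs share one loop: a server sends the current model to the clients, each client trains on its own local data, and the server aggregates the results. The sketch below illustrates that loop with federated averaging (FedAvg) on a toy linear-regression problem in plain Python. It is not FLUTE's implementation or API; `local_sgd`, `fedavg_round`, and the synthetic client data are hypothetical, chosen only to make the idea concrete.

```python
import random

def local_sgd(w, b, data, lr=0.05, epochs=5):
    """Run a few epochs of plain SGD for y = w*x + b on one client's data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def fedavg_round(global_w, global_b, clients):
    """One FedAvg round: every client trains locally, then the server
    averages the returned parameters, weighted by each client's sample count."""
    total = sum(len(d) for d in clients)
    new_w = new_b = 0.0
    for data in clients:
        w, b = local_sgd(global_w, global_b, data)
        new_w += w * len(data) / total
        new_b += b * len(data) / total
    return new_w, new_b

# Hypothetical setup: 5 clients whose private data follow y = 2x + 1 plus noise.
rng = random.Random(0)
clients = []
for _ in range(5):
    n = rng.randint(10, 30)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    clients.append([(x, 2 * x + 1 + rng.gauss(0, 0.05)) for x in xs])

w, b = 0.0, 0.0
for _ in range(20):
    w, b = fedavg_round(w, b, clients)
print(f"w={w:.2f}, b={b:.2f}")  # should end up close to w=2, b=1
```

Weighting by sample count means clients with more data pull the global model harder. Raw data never leaves a client in this loop; only model parameters are exchanged.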

In [None]:
# Check that you have multiple indexable GPUs before running these simulations
!nvidia-smi
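
If you would rather check from Python, PyTorch can report how many CUDA devices it can index. `visible_gpu_count` below is a hypothetical convenience wrapper around `torch.cuda.device_count()`; it returns 0 when PyTorch is not installed yet or no CUDA device is visible. The launch commands in this chapter use `--nproc_per_node=4`, so for the NCCL runs a smaller count suggests lowering that value.

```python
import importlib.util

def visible_gpu_count() -> int:
    """Return how many CUDA devices PyTorch can index (0 if torch is missing or there is no GPU)."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    return torch.cuda.device_count()

n_gpus = visible_gpu_count()
print(f"PyTorch can index {n_gpus} GPU(s)")
if n_gpus < 4:
    print("Consider a smaller --nproc_per_node for NCCL runs, or the gloo backend.")
```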

## Installation

The FLUTE requirements are listed in `requirements.txt`. Ideally, install them inside a virtual environment or Docker container.

FLUTE uses the [`torch.distributed` API](https://pytorch.org/docs/stable/distributed.html) as its main communication backbone, which supports three built-in backends: Gloo, NCCL, and MPI.
The authors recommend using the NCCL backend for distributed GPU training and Gloo for distributed CPU training.
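
The authors' backend advice can be encoded in a small helper. This is a sketch, not part of FLUTE: `recommended_backend` is a hypothetical name, and it relies only on PyTorch's own availability checks (`torch.cuda.is_available()`, `torch.distributed.is_available()`, `torch.distributed.is_nccl_available()`). It returns `"nccl"` when GPU training is possible and `"gloo"` otherwise, matching the `-backend` values used in the launch commands.

```python
import importlib.util

def recommended_backend() -> str:
    """Pick NCCL for distributed GPU training and Gloo for distributed CPU training."""
    if importlib.util.find_spec("torch") is None:
        # Assumption: without PyTorch installed, fall back to the CPU backend name.
        return "gloo"
    import torch
    import torch.distributed as dist
    # Check dist.is_available() first: some PyTorch builds ship without distributed support.
    if torch.cuda.is_available() and dist.is_available() and dist.is_nccl_available():
        return "nccl"
    return "gloo"

print(recommended_backend())
```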

FLUTE is not available as a package from sources like `conda` or `pip`, partly because the authors intend experiments and prototypes to be run directly from the root of the FLUTE repo.




In [None]:
!git clone https://github.com/microsoft/msrflute.git
%cd msrflute
!pip install -r requirements.txt
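
Since FLUTE is meant to be launched from the repo root, a quick sanity check can save a confusing "file not found" error later. The sketch below assumes that the presence of `e2e_trainer.py`, the entry point used in this chapter's launch commands, marks the root; `find_flute_root` is a hypothetical helper, not part of FLUTE.

```python
from pathlib import Path
from typing import Optional

def find_flute_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing e2e_trainer.py is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "e2e_trainer.py").is_file():
            return candidate
    return None

root = find_flute_root(Path.cwd())
if root is None:
    print("e2e_trainer.py not found: are you inside the msrflute checkout?")
elif root != Path.cwd().resolve():
    print(f"Change directory to the repo root first: {root}")
else:
    print(f"Running from the FLUTE repo root: {root}")
```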

Once the initial setup is complete, you can add your own dataset to the local repo to launch a local run.
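
The exact input format depends on the dataloader of the task you choose, so check it before converting your data. As one plausible starting point, the Reddit data used by the first two example tasks comes from LEAF, whose JSON files store a list of `users`, their `num_samples`, and per-user `user_data` with `x` and `y` entries. The sketch below assumes that layout; `to_leaf_json` and the toy client data are hypothetical.

```python
import json

def to_leaf_json(client_data):
    """Convert {client_id: (xs, ys)} into a LEAF-style dictionary.

    Assumption: the target dataloader expects LEAF's top-level keys
    ("users", "num_samples", "user_data"); verify this against your task.
    """
    users = sorted(client_data)
    return {
        "users": users,
        "num_samples": [len(client_data[u][0]) for u in users],
        "user_data": {
            u: {"x": list(client_data[u][0]), "y": list(client_data[u][1])}
            for u in users
        },
    }

# Hypothetical toy data: two clients with text inputs and integer labels.
example = {
    "client_a": (["hello world", "federated learning"], [0, 1]),
    "client_b": (["simulate first"], [1]),
}
leaf = to_leaf_json(example)
# Save this (e.g. with json.dump) wherever your task's config points the data loader.
serialized = json.dumps(leaf)
print(serialized)
```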

### Example datasets

In [None]:
%cd testing
!python create_data.py --task nlg_gru
!python create_data.py --task mlm_bert
!python create_data.py --task classif_cnn
!python create_data.py --task ecg_cnn
%cd ..

### Example task #1: `nlg_gru`

After this initial setup, you can use your own data to launch a local run. The following instructions, however, are adapted to run the `nlg_gru` task.

This task trains a GRU model on the preprocessed Reddit dataset from [LEAF: A Benchmark for Federated Settings](https://arxiv.org/abs/1812.01097).

To run this example, you first need to download and preprocess the data. Instructions can be found [here](https://github.com/microsoft/msrflute/tree/main/testing).

Once the data is available, you can run FLUTE from the repo root as follows (using the GPU-only `NCCL` backend):

In [None]:
!python -m torch.distributed.run \
    --nproc_per_node=4 e2e_trainer.py \
        -dataPath ./testing \
        -outputPath scratch \
        -config testing/hello_world_nlg_gru.yaml \
        -task nlg_gru \
        -backend nccl
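
All four examples in this chapter share the same launch pattern and differ only in the task name, config file, and backend. A small Python helper can build the command and keep those values in one place. `flute_command` is a hypothetical helper; it only reproduces the flags shown in this chapter (`-dataPath`, `-outputPath`, `-config`, `-task`, `-backend`) and assumes the `testing/hello_world_<task>.yaml` naming the examples follow.

```python
import shlex
from typing import List, Optional

def flute_command(task: str, backend: str, nproc: int = 4,
                  data_path: str = "./testing", output_path: str = "scratch",
                  config: Optional[str] = None) -> List[str]:
    """Build the torch.distributed.run invocation used by this chapter's examples.

    Assumption: when `config` is omitted, it follows the
    testing/hello_world_<task>.yaml naming used by the four example tasks.
    """
    if config is None:
        config = f"testing/hello_world_{task}.yaml"
    return [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc}", "e2e_trainer.py",
        "-dataPath", data_path,
        "-outputPath", output_path,
        "-config", config,
        "-task", task,
        "-backend", backend,
    ]

cmd = flute_command("nlg_gru", "nccl")
print(shlex.join(cmd))
# To launch from the FLUTE repo root: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than one string lets `subprocess.run` start the job without any shell quoting concerns.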

### Example task #2: `mlm_bert`

This experiment trains a BERT model using federated learning.

Like the previous example, it uses the preprocessed Reddit dataset from [LEAF: A Benchmark for Federated Settings](https://arxiv.org/abs/1812.01097).

To run this example, you first need to download and preprocess the data. Instructions can be found [here](https://github.com/microsoft/msrflute/tree/main/testing).

Once the data is available, you can run FLUTE from the repo root as follows (using the GPU-only `NCCL` backend):


In [None]:
!python -m torch.distributed.run \
    --nproc_per_node=4 e2e_trainer.py \
        -dataPath ./testing \
        -outputPath scratch \
        -config testing/hello_world_mlm_bert.yaml \
        -task mlm_bert \
        -backend nccl

### Example task #3: `classif_cnn`

To run this example, you first need to download and preprocess the data. Instructions can be found [here](https://github.com/microsoft/msrflute/tree/main/testing).

This example trains on a partitioned version of the classic [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) image-classification dataset.
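
One simple way to divide a dataset among simulated clients is an IID split, where each client receives a uniformly random subset of the training set. The sketch below illustrates that idea in plain Python; `iid_partition` is a hypothetical helper, and this split is an illustrative assumption rather than necessarily the one FLUTE's example uses. Real federated data is often non-IID (for example, each client seeing only a few classes), which is one reason simulating ahead of time is useful.

```python
import random

def iid_partition(num_examples, num_clients, seed=0):
    """Shuffle example indices, then deal them round-robin to clients (an IID split)."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]

# CIFAR-10 has 50,000 training images; split them across 100 simulated clients.
shards = iid_partition(num_examples=50_000, num_clients=100)
print(len(shards), [len(s) for s in shards[:3]])  # 100 [500, 500, 500]
```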

Once the data is available, you can run FLUTE from the repo root as follows (using the `Gloo` backend, which the authors recommend for CPU training):


In [None]:
!python -m torch.distributed.run \
    --nproc_per_node=4 e2e_trainer.py \
        -dataPath ./testing \
        -outputPath scratch \
        -config testing/hello_world_classif_cnn.yaml \
        -task classif_cnn \
        -backend gloo

### Example task #4: `ecg_cnn`

To run this example, you first need to download and preprocess the data. Instructions can be found [here](https://github.com/microsoft/msrflute/tree/main/testing).

Once the data is available, you can run FLUTE from the repo root as follows (using the `Gloo` backend, which the authors recommend for CPU training):

In [None]:
!python -m torch.distributed.run \
    --nproc_per_node=4 e2e_trainer.py \
        -dataPath ./testing \
        -outputPath scratch \
        -config testing/hello_world_ecg_cnn.yaml \
        -task ecg_cnn \
        -backend gloo

## References

- [FLUTE Overview (FLUTE documentation)](https://microsoft.github.io/msrflute/overview.html)
- [FLUTE documentation](https://microsoft.github.io/msrflute/)
- [microsoft/msrflute: Federated Learning Utilities and Tools for Experimentation](https://github.com/microsoft/msrflute)
- [Project FLUTE (Microsoft Research)](https://www.microsoft.com/en-us/research/project/project-flute/)
- [Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný, J., McMahan, H. B., ... & Talwalkar, A. (2018). LEAF: A benchmark for federated settings. arXiv preprint arXiv:1812.01097.](https://arxiv.org/abs/1812.01097)

