# FLSim Tutorial: Image classification with CIFAR-10

## Introduction

In this tutorial, we will train a simple CNN image classifier on CIFAR-10 with federated learning using FLSim.

### Prerequisites

To get the most out of this tutorial, you should be comfortable training machine learning models with **PyTorch** and familiar with the concept of **federated learning (FL)**. If you are unfamiliar with either of them or could use a refresher, please take a look at the following resources before proceeding with the tutorial:

- McMahan & Ramage (2017): [Federated Learning: Collaborative Machine Learning without Centralized Training Data](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html). A short blog post from Google AI introducing the main idea of FL in a beginner-friendly way.
- McMahan et al. (2017): [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/pdf/1602.05629.pdf). This paper first proposes the approach of federated learning. The described algorithm is now known as federated averaging (or FedAvg for short).
- PyTorch has [extensive tutorials](https://pytorch.org/tutorials/) on their website. In particular, take a look at their [image classification tutorial using CIFAR-10](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).

Now that you're familiar with PyTorch and FL, let's move on!

### Objectives 

In this tutorial, you will learn how to 

1. Build a data pipeline for federated learning with FLSim,
2. Create an image classification model compatible with FL training,
3. Set hyperparameters for FL training, and
4. Launch an FL training flow using FLSim.

## Training an image classifier with FLSim

### 0. About the dataset

For this tutorial, we will use the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The CIFAR-10 dataset consists of 60k 3-channel color images with 32x32 pixels from 10 classes, with 6k images per class. 
There are 50k training images (5k training images per class) and 10k test images (1k test images per class).
The classes are ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, and ‘truck’.

![img](https://pytorch.org/tutorials/_images/cifar10.png)

We can get the CIFAR-10 dataset from `torchvision.datasets`.


In [1]:
from torchvision.datasets.cifar import CIFAR10

### 1. Data pipeline

First, let's define how to build the data pipeline for federated learning. 
Recall that in FL, we have multiple client devices, each with their own data.
This means that in addition to non-FL data processing, we need to split the dataset into multiple smaller datasets, each of which represents a client device's data.

1. First, let us create data transforms and training, eval, and test datasets. This step is identical to preparing data in non-federated learning.

In [2]:
from torchvision import transforms


IMAGE_SIZE = 32


# 1. Create training, eval, and test datasets like in non-federated learning.
transform = transforms.Compose(
    [
        transforms.Resize(IMAGE_SIZE),
        transforms.CenterCrop(IMAGE_SIZE),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ]
)
train_dataset = CIFAR10(
    root="./cifar10", train=True, download=True, transform=transform
)
test_dataset = CIFAR10(
    root="./cifar10", train=False, download=True, transform=transform
)


Files already downloaded and verified


Files already downloaded and verified
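
To make the `Normalize` step concrete: `ToTensor` scales pixel values to the range [0, 1], and `Normalize` then subtracts a per-channel mean and divides by a per-channel standard deviation. The following is a minimal pure-Python sketch of that arithmetic for a single RGB pixel, using the constants from the transform above. The helper name `normalize_pixel` is hypothetical and is not part of torchvision.

```python
from typing import Sequence, Tuple

# Per-channel constants taken from the transform above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
) -> Tuple[float, ...]:
    """Apply (value - mean) / std independently to each channel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))
# A pure white pixel (1.0 after ToTensor) maps to a positive value per channel.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

After this step, each channel is roughly centered around zero, which is the usual motivation for normalizing inputs.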


There are a few extra steps to enable training with federated learning. In particular, we need to

2. Create a sharder, which defines a mapping from examples in the training data to clients. 
In other words, **a sharder groups rows of data into client datasets**, returning a list of lists of examples. 
FLSim provides a number of sharding strategies such as random or column-based sharding. 
In this tutorial, we use sequential sharding, which assigns the first `SAMPLES_PER_CLIENT` rows to client 0, the second `SAMPLES_PER_CLIENT` rows to client 1, etc. 

3. Create a data loader, which will shard and batchify training, eval, and test data. 
For each dataset, the data loader first assigns rows to clients using the sharder and then splits each client's data into batches of size `batch_size`. 
We choose not to drop the last batch.

4. Lastly, wrap the data loader with a data provider and return it. 
The data provider creates clients from the groupings in the data loader and adds metadata (e.g. number of examples, number of batches per client). 
Our data is now formatted such that the trainer will accept it.

Note that the concept of a client or device applies only to the training data; the eval and test data are identical to non-federated learning.
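
Steps 2 and 3 can be illustrated without FLSim. The following is a minimal pure-Python sketch of sequential sharding followed by batching, assuming the 50k training rows are represented by their indices. The helpers `sequential_shards` and `batchify` are hypothetical names used for illustration and are not FLSim's implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sequential_shards(rows: Sequence[T], shard_size: int) -> List[List[T]]:
    """Assign consecutive runs of `shard_size` rows to clients 0, 1, 2, ..."""
    return [list(rows[i : i + shard_size]) for i in range(0, len(rows), shard_size)]


def batchify(rows: Sequence[T], batch_size: int, drop_last: bool = False) -> List[List[T]]:
    """Split one client's rows into batches; optionally drop a short final batch."""
    batches = [list(rows[i : i + batch_size]) for i in range(0, len(rows), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# Stand-in for the 50k CIFAR-10 training rows: just their indices.
rows = range(50_000)
clients = sequential_shards(rows, shard_size=5_000)
client_batches = [batchify(c, batch_size=32) for c in clients]

print(len(clients))                # number of simulated clients
print(len(client_batches[0]))      # batches held by client 0
print(len(client_batches[0][-1]))  # size of client 0's final, shorter batch
```

With 5,000 samples per client and a batch size of 32, each client ends up with 156 full batches plus one batch of 8 examples, which is kept because the last batch is not dropped.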

In [3]:
from flsim.baselines.data_providers import FLVisionDataLoader, LEAFDataProvider
from flsim.data.data_sharder import FLDataSharder, ShardingStrategyType


SAMPLES_PER_CLIENT = 5000


# 2. Create a sharder, which maps samples in the training data to clients.
sharder = FLDataSharder(
    ShardingStrategyType.SEQUENTIAL, shard_size_for_sequential=SAMPLES_PER_CLIENT
)

# 3. Shard and batchify training, eval, and test data.
fl_data_loader = FLVisionDataLoader(
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    test_dataset=test_dataset,
    sharder=sharder,
    batch_size=32,
    drop_last=False,
)

# 4. Wrap the data loader with a data provider.
data_provider = LEAFDataProvider(fl_data_loader)
print(f"Clients in total: {data_provider.num_users()}")


Creating FL User: 10user [00:47,  4.73s/user]
Creating FL User: 2user [00:08,  4.37s/user]
Creating FL User: 2user [00:08,  4.22s/user]

Clients in total: 10





### 2. Create the model

Now, let's see how we can create a model that is compatible with FL training.

1. First, we define a standard, non-FL image classification PyTorch `nn.Module`; in this tutorial we use a simple CNN.

In [4]:
from flsim.baselines.models.cnn import SimpleConvNet

# 1. Define our model, a simple CNN.
model = SimpleConvNet(in_channels=3, num_classes=10)
model

SimpleConvNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
  (conv3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
  (conv4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
  (bn_relu): Sequential(
    (0): GroupNorm(32, 32, eps=1e-05, affine=True)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dropout): Dropout(p=0, inplace=False)
  (fc): Linear(in_features=288, out_features=10, bias=True)
)
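
The `in_features=288` of the final linear layer can be checked by hand from the printed layer parameters. The sketch below assumes that each of the four convolutions is followed by the `bn_relu` block (which contains the max-pooling layer); the printed module does not show the forward order, so this is an assumption, although the result agrees with the printed value. `conv_out` is a hypothetical helper implementing the standard output-size formula for convolution and pooling layers.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


IMAGE_SIZE = 32       # CIFAR-10 images are 32x32
NUM_CONV_BLOCKS = 4   # conv1..conv4, each assumed to be followed by bn_relu
OUT_CHANNELS = 32     # output channels of every convolution

side = IMAGE_SIZE
for block in range(1, NUM_CONV_BLOCKS + 1):
    side = conv_out(side, kernel=3, stride=1, padding=2)  # Conv2d
    side = conv_out(side, kernel=2, stride=2, padding=0)  # MaxPool2d
    print(f"after block {block}: {side}x{side}")

flattened = OUT_CHANNELS * side * side
print(flattened)
```

Under this assumption, the spatial size shrinks from 32 to 17, 9, 5, and finally 3, so the flattened feature vector has 32 × 3 × 3 = 288 entries.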

After we have our standard PyTorch model, we can

2. Create a `torch.device` and choose where the model will be allocated (CUDA or CPU).

3. Wrap the PyTorch module with the FLSim `FLModel`. `FLModel` is accepted by the trainer and handles moving our model, data, and predictions to GPU if desired. It also collects and returns metrics for each batch it predicts on. You can find its implementation [here](https://github.com/facebookresearch/FLSim/blob/main/baselines/models/cv_model.py).

4. Move the model to GPU and enable CUDA if desired.

The model now supports FL training!

In [5]:
import torch
from flsim.baselines.models.cv_model import FLModel


USE_CUDA = True

# 2. Choose where the model will be allocated.
cuda_enabled = torch.cuda.is_available() and USE_CUDA
device = torch.device("cuda:0" if cuda_enabled else "cpu")

# 3. Wrap the model in FLModel.
global_model = FLModel(model, device)

# 4. Enable CUDA if desired.
if cuda_enabled:
    global_model.fl_cuda()


### 3. Hyperparameters

We can represent the hyperparameters for FL training in a JSON config. In particular, we specify a FedAvg implementation in which 5 of the 10 clients participate in each round (`users_per_round`).

This config is passed to the FL trainer.

In [6]:
json_config = {
    "trainer": {
        "_base_": "base_sync_trainer",
        # there are different types of aggregators
        # fed avg doesn't require an lr, while others such as fed_sgd and fed_adam do
        "aggregator": {"_base_": "base_fed_avg_sync_aggregator"},
        "client": {
            # number of client's local epochs
            "epochs": 1,
            "optimizer": {
                "_base_": "base_optimizer_sgd",
                # client's local learning rate
                "lr": 0.01,
                # client's local momentum
                "momentum": 0.9,
            },
        },
        # type of user selection sampling
        "active_user_selector": {"_base_": "base_sequential_active_user_selector"},
        # number of users per round for aggregation
        "users_per_round": 5,
        # total number of global epochs
        # total #rounds = ceil(total_users / users_per_round) * epochs
        "epochs": 1,
        # frequency of reporting train metrics
        "train_metrics_reported_per_epoch": 10,
        # frequency of evaluation per epoch
        "eval_epoch_frequency": 1,
        "do_eval": True,
        # should we report train metrics after global aggregation
        "report_train_metrics_after_aggregation": True,
    }
}
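
The round count given in the config comment can be worked out directly. The sketch below applies that formula to the values used here (10 clients in total, 5 users per round, 1 global epoch). It also shows how a sequential selector could split clients into per-round cohorts, assuming it walks the client ids in increasing order. Both helper names are hypothetical and are not FLSim functions.

```python
import math
from typing import List


def total_rounds(total_users: int, users_per_round: int, epochs: int) -> int:
    """Formula from the config comment: ceil(total_users / users_per_round) * epochs."""
    return math.ceil(total_users / users_per_round) * epochs


def sequential_cohorts(total_users: int, users_per_round: int) -> List[List[int]]:
    """Hypothetical helper: client ids visited in order, one cohort per round."""
    ids = list(range(total_users))
    return [ids[i : i + users_per_round] for i in range(0, total_users, users_per_round)]


print(total_rounds(total_users=10, users_per_round=5, epochs=1))
for round_idx, cohort in enumerate(sequential_cohorts(10, 5), start=1):
    print(f"round {round_idx}: clients {cohort}")
```

With these settings, one global epoch consists of two rounds, and every client participates exactly once.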


Even though we recommend a JSON config for ease of representation, FLSim is compatible with the Hydra config system and can work with YAML configs just like any other [PyTorch Lightning](https://www.pytorchlightning.ai/) project. Here, we convert the JSON config to OmegaConf via Hydra for consumption by FLSim. 

In [7]:
# Imported for its side effect of registering FLSim's base configs with Hydra.
import flsim.configs  # noqa: F401
from flsim.utils.config_utils import fl_config_from_json
from omegaconf import OmegaConf


cfg = fl_config_from_json(json_config)
print(OmegaConf.to_yaml(cfg))


trainer:
  _target_: flsim.trainers.sync_trainer.SyncTrainer
  _recursive_: false
  epochs: 1.0
  do_eval: true
  always_keep_trained_model: false
  timeout_simulator:
    _target_: ???
    _recursive_: false
  train_metrics_reported_per_epoch: 10
  eval_epoch_frequency: 1.0
  active_user_selector:
    _target_: flsim.active_user_selectors.simple_user_selector.SequentialActiveUserSelector
    _recursive_: false
    user_selector_seed: null
  report_train_metrics: true
  report_train_metrics_after_aggregation: true
  use_train_clients_for_aggregation_metrics: true
  client:
    _target_: flsim.clients.base_client.Client
    _recursive_: false
    epochs: 1
    optimizer:
      _target_: flsim.optimizers.local_optimizers.LocalOptimizerSGD
      _recursive_: false
      lr: 0.01
      momentum: 0.9
      weight_decay: 0.0
    lr_scheduler:
      _target_: ???
      _recursive_: false
      base_lr: 0.001
    max_clip_norm_normalized: null
    only_federated_params: true
    random_seed: n

### 4. Training
Recall that we already built the data provider and created a model compatible with FL training. 
Now, to launch the FL training flow we only need to take a few more steps:

1. First, we need to create a metric reporter, which will collect, evaluate, and report relevant training, aggregation, and evaluation/test metrics.
You can find its implementation [here](https://github.com/facebookresearch/FLSim/blob/main/tutorials/metrics_reporter/fl_metrics_reporter.py).

2. We also need to instantiate the trainer with the model and hyperparameter config we defined earlier.

In [8]:
from flsim.interfaces.metrics_reporter import Channel
from flsim.tutorials.metrics_reporter.fl_metrics_reporter import MetricsReporter
from hydra.utils import instantiate


# 1. Create a metric reporter.
metrics_reporter = MetricsReporter([Channel.TENSORBOARD, Channel.STDOUT])

# 2. Instantiate the trainer.
trainer_config = cfg.trainer
trainer = instantiate(trainer_config, model=global_model, cuda_enabled=cuda_enabled)


Finally, we're ready to run FL training given the above JSON config. We can utilize `eval_score` to store the evaluation metrics.

In [9]:
# Launch FL training.
final_model, eval_score = trainer.train(
    data_provider=data_provider,
    metric_reporter=metrics_reporter,
    num_total_users=data_provider.num_users(),
    distributed_world_size=1,
)


Epoch:   0%|          | 0/1 [00:00<?, ?epoch/s]Round:   0%|          | 0/2 [00:00<?, ?round/s]

Train finished Global Round: 1
(epoch = 1, round = 1, global round = 1), Loss/Training: 1.929538930905093


(epoch = 1, round = 1, global round = 1), Accuracy/Training: 30.492
(epoch = 1, round = 1, global round = 1), round_to_target/Training: 10000000000.0
reporting (epoch = 1, round = 1, global round = 1) for aggregation


(epoch = 1, round = 1, global round = 1), Loss/Aggregation: 1.620859497519815


Round:  50%|█████     | 1/2 [02:51<02:51, 171.38s/round]

(epoch = 1, round = 1, global round = 1), Accuracy/Aggregation: 43.06
(epoch = 1, round = 1, global round = 1), round_to_target/Aggregation: 10000000000.0


Train finished Global Round: 2
(epoch = 1, round = 2, global round = 2), Loss/Training: 1.6210210218551053


(epoch = 1, round = 2, global round = 2), Accuracy/Training: 41.252
(epoch = 1, round = 2, global round = 2), round_to_target/Training: 10000000000.0
reporting (epoch = 1, round = 2, global round = 2) for aggregation


(epoch = 1, round = 2, global round = 2), Loss/Aggregation: 1.3944371010847152


(epoch = 1, round = 2, global round = 2), Accuracy/Aggregation: 50.04
(epoch = 1, round = 2, global round = 2), round_to_target/Aggregation: 10000000000.0
Running (epoch = 1, round = 2, global round = 2) for Eval


(epoch = 1, round = 2, global round = 2), Loss/Eval: 1.408292086450917
(epoch = 1, round = 2, global round = 2), Accuracy/Eval: 48.74
(epoch = 1, round = 2, global round = 2), round_to_target/Eval: 10000000000.0


Round:  50%|█████     | 1/2 [07:41<07:41, 461.52s/round]
Epoch:   0%|          | 0/1 [07:41<?, ?epoch/s]
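
The `Aggregation` entries in the log are reported after the server has combined the client models of a round (see `report_train_metrics_after_aggregation` in the config). As a rough illustration of the averaging step in federated averaging (McMahan et al., 2017), the sketch below computes an example-weighted average of client parameters in pure Python. It is not FLSim's implementation: the function `fed_avg`, the parameter names, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def fed_avg(client_weights: Sequence[Weights], num_examples: Sequence[int]) -> Weights:
    """Average each parameter across clients, weighted by local example counts."""
    total = sum(num_examples)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, num_examples)) / total
            for i in range(length)
        ]
    return averaged


# Two toy clients with flattened parameters (illustrative values only).
toy_clients = [
    {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"fc.weight": [3.0, 6.0], "fc.bias": [1.0]},
]

# Equal example counts, as in this tutorial (5000 samples per client),
# reduce the weighted average to a plain mean.
global_weights = fed_avg(toy_clients, num_examples=[5000, 5000])
print(global_weights)
```

Because every client in this tutorial holds the same number of samples, each client contributes equally to the average; with unequal shard sizes, clients with more data would pull the global model further toward their local update.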


After training finishes, we evaluate the model and report the test set accuracy before concluding this tutorial.

In [10]:
# We can now test our model.
trainer.test(
    data_iter=data_provider.test_data(),
    metric_reporter=MetricsReporter([Channel.STDOUT]),
)


Running (epoch = 1, round = 1, global round = 1) for Test


(epoch = 1, round = 1, global round = 1), Loss/Test: 1.408292086450917
(epoch = 1, round = 1, global round = 1), Accuracy/Test: 48.74
(epoch = 1, round = 1, global round = 1), round_to_target/Test: 10000000000.0


{'Accuracy': 48.74, 'round_to_target': 10000000000.0}

## Summary

In this tutorial, we first showed how to get the CIFAR-10 dataset. 
We then built a data provider by sharding the data to simulate multiple client devices, each with their own data, and splitting each client's data into batches. 
We defined a simple CNN as our model, wrapped it with a model compatible with FL training, and moved it to GPU. 
Lastly, we set the hyperparameters for FL training, launched the training flow, and evaluated our model.

### Additional resources
- [FLSim tutorials](https://github.com/facebookresearch/FLSim/tree/main/tutorials) - check out our other tutorial on sentiment classification.
- Kairouz et al. (2021): [Advances and Open Problems in Federated Learning](https://arxiv.org/pdf/1912.04977.pdf). As the title suggests, an in-depth overview of advances and open problems in FL.
- If you're interested in federated learning with **differential privacy**, take a look at [Opacus](https://opacus.ai/), a library that enables training PyTorch models with differential privacy. 
You can find a blog post introducing Opacus [here](https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy/).

