# FLSim Tutorial: Sentiment Classification with LEAF's Sent140



## Introduction

In this tutorial, we will train a binary sentiment classifier on LEAF's Sent140 dataset with federated learning using FLSim. 


### Prerequisites

To get the most out of this tutorial, you should be comfortable training machine learning models with **PyTorch** and familiar with the concept of **federated learning (FL)**. If you are unfamiliar with either of them or could use a refresher, please take a look at the following resources before proceeding with the tutorial:

- McMahan & Ramage (2017): [Federated Learning: Collaborative Machine Learning without Centralized Training Data](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html). A short blog post from Google AI introducing the main idea of FL in a beginner-friendly way.
- McMahan et al. (2017): [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/pdf/1602.05629.pdf). This paper first proposes the approach of federated learning. The described algorithm is now known as federated averaging (or FedAvg for short).
- PyTorch has [extensive tutorials](https://pytorch.org/tutorials/) on their website.
- If you're new to **sentiment classification**, you can find Pang and Lee's survey on the topic [here](https://www.cs.cornell.edu/home/llee/omsa/omsa-published.pdf). 

Now that you're familiar with PyTorch and FL and have a sense of sentiment classification, let's move on!

### Objectives 

By the end of this tutorial, we will have learnt how to

1. Build a data pipeline for federated learning with FLSim,
2. Create a sentiment classification model compatible with FL training,
3. Set hyperparameters for FL training, 
4. Create a metrics reporter to collect metrics, and
5. Launch an FL training flow using FLSim.

## Training a sentiment classifier with FLSim

### Prerequisite
First, let's install flsim via pip with the command below.

In [1]:
!pip install --quiet flsim


### 0. About the dataset

For this tutorial, we're using [LEAF's](https://leaf.cmu.edu/) [Sentiment140 (Sent140) dataset](https://leaf.cmu.edu/build/html/tutorials/sent140-md.html), which consists of 1.6 million tweets by 660k users. Note that the mean number of samples per user is 2.42 and the standard deviation is 4.71.

![Sent140 distribution of samples across users](https://leaf.cmu.edu/webpage/images/twitter_hist.png)

Before the next step in this tutorial, you need to download the dataset and partition the data by users. 
We've included a script, `get_data.sh`, which will download and preprocess the data for you.
In particular, we sample 1% of the entire dataset in a non-IID manner and
partition 90% of sampled users into train and 10% of sampled users into test (as opposed to individual samples).
We require all users to have at least one sample.

For more information on the various preprocessing options, see [here](https://github.com/TalwalkarLab/leaf/tree/master/data/sent140). You can find the LEAF paper [here](https://arxiv.org/pdf/1812.01097.pdf).
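
To make the user-level split concrete, here is a minimal sketch of partitioning whole users (rather than individual samples) into train and test. It is not LEAF's implementation; the function name `split_users`, its default seed, and the toy user IDs are assumptions made for illustration.

```python
import random


def split_users(user_ids, train_frac=0.9, seed=1):
    """Partition whole users (not individual samples) into train and test."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Toy user IDs (hypothetical); every tweet of a user lands on the same side.
users = [f"user_{i}" for i in range(20)]
train_users, test_users = split_users(users)
print(len(train_users), len(test_users))  # 18 2
```

Because a user's tweets never straddle the split, the test set measures how well the model generalizes to users it has never trained on.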


In [27]:
import os
# Clone the LEAF repo
!git clone https://github.com/JohnlNguyen/leaf.git
# cd into sent140 directory
os.chdir("leaf/data/sent140")
# preprocess the data into user splits
!sh ./preprocess.sh --sf 0.01 -s niid -t 'user' --tf 0.90 -k 1 --spltseed 1

Cloning into 'leaf'...
remote: Enumerating objects: 763, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 763 (delta 12), reused 16 (delta 8), pack-reused 743
Receiving objects: 100% (763/763), 6.78 MiB | 20.79 MiB/s, done.
Resolving deltas: 100% (358/358), done.
------------------------------
retrieving raw data
URL transformed to HTTPS due to an HSTS policy
--2021-11-23 22:06:04--  https://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
Resolving cs.stanford.edu (cs.stanford.edu)... 171.64.64.64
Connecting to cs.stanford.edu (cs.stanford.edu)|171.64.64.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 81363704 (78M) [application/zip]
Saving to: ‘trainingandtestdata.zip’


2021-11-23 22:06:06 (46.4 MB/s) - ‘trainingandtestdata.zip’ saved [81363704/81363704]

Archive:  trainingandtestdata.zip
  inflating: testdata.manual.2009.06.14.csv  
  inflating: training.1600000.processe

We can find the preprocessed training and test data here:

In [31]:
!ls data/train; ls data/test

all_data_0_01_keep_1_train_9.json
all_data_0_01_keep_1_test_9.json


Note: if you use different preprocessing options, you will need to change these!

In [32]:
TRAIN_DATA = "data/train/all_data_0_01_keep_1_train_9.json"
TEST_DATA = "data/test/all_data_0_01_keep_1_test_9.json"

We can now get a rough idea of the structure of the training data:

In [33]:
import json


with open(TRAIN_DATA, "r") as f:
    training_data = json.load(f)

    # get overall structure of the data
    for key, val in training_data.items():
        print(key, type(val), len(val))


users <class 'list'> 5765
num_samples <class 'list'> 5765
user_data <class 'dict'> 5765


We can compute the minimum, maximum, and mean number of samples per user:

In [37]:
import numpy as np

print(f"Min # samples per user: {min(training_data['num_samples'])}")
print(f"Max # samples per user: {max(training_data['num_samples'])}")
print(f"Mean # samples per user: {np.mean(training_data['num_samples']):.2f}")

Min # samples per user: 1
Max # samples per user: 211
Mean # samples per user: 2.53


Let us also look at the data for an example user:

In [38]:
EXAMPLE_USER = training_data["users"][0]
training_data["user_data"][EXAMPLE_USER]

{'x': [['2005866144',
   'Tue Jun 02 10:18:00 PDT 2009',
   'NO_QUERY',
   'eriinL',
   'has to go to knightdale next year. ',
   'training'],
  ['2015140062',
   'Wed Jun 03 03:49:28 PDT 2009',
   'NO_QUERY',
   'eriinL',
   'Theatre exam ',
   'training']],
 'y': [0, 1]}
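
Each record in `x` is a list of fields, and only the tweet text (position 4) is used later on. As a small sketch, the snippet below pairs each tweet's text with its label for the example user shown above. The helper `text_label_pairs` and the constant `TEXT_FIELD` are names introduced here for illustration only.

```python
# The example user's data, copied from the output above.
example_user_data = {
    "x": [
        ["2005866144", "Tue Jun 02 10:18:00 PDT 2009", "NO_QUERY", "eriinL",
         "has to go to knightdale next year. ", "training"],
        ["2015140062", "Wed Jun 03 03:49:28 PDT 2009", "NO_QUERY", "eriinL",
         "Theatre exam ", "training"],
    ],
    "y": [0, 1],
}

TEXT_FIELD = 4  # position of the tweet text within each record


def text_label_pairs(user_data):
    """Return (tweet text, label) pairs for one user."""
    return [
        (record[TEXT_FIELD], int(label))
        for record, label in zip(user_data["x"], user_data["y"])
    ]


pairs = text_label_pairs(example_user_data)
for text, label in pairs:
    print(label, repr(text))
```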

### 1. Data pipeline

Now, let us define how to build the data pipeline for federated learning:

1. To load the training and test data, we define a new dataset class, `Sent140Dataset`, which converts each user's tweets (features) into a `torch.Tensor`, discarding tweet metadata such as time, and stores each tweet's sentiment (label) as well.



In [39]:
import itertools
import re
import string
import unicodedata

import torch
from torch.utils.data import Dataset


# 1. Sent140Dataset will store the tweets and corresponding sentiment for each user.


class Sent140Dataset(Dataset):
    def __init__(self, data_root, max_seq_len):
        self.data_root = data_root
        self.max_seq_len = max_seq_len
        self.all_letters = {c: i for i, c in enumerate(string.printable)}
        self.num_letters = len(self.all_letters)
        self.UNK = self.num_letters

        with open(data_root, "r") as f:
            self.dataset = json.load(f)

        self.data = {}
        self.targets = {}

        self.num_classes = 2  # binary sentiment classification

        # Populate self.data and self.targets
        for user_id, user_data in self.dataset["user_data"].items():
            self.data[user_id] = self.process_x(list(user_data["x"]))
            self.targets[user_id] = self.process_y(list(user_data["y"]))

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        for user_id in self.data.keys():
            yield self.__getitem__(user_id)

    def __getitem__(self, user_id: str):
        if user_id not in self.data or user_id not in self.targets:
            raise IndexError(f"User {user_id} is not in dataset")

        return self.data[user_id], self.targets[user_id]

    def unicodeToAscii(self, s):
        return "".join(
            c
            for c in unicodedata.normalize("NFD", s)
            if unicodedata.category(c) != "Mn" and c in self.all_letters
        )

    def line_to_indices(self, line: str, max_seq_len: int):
        line_list = self.split_line(line)  # split phrase into words
        chars = self.flatten_list([list(word) for word in line_list])
        # Map each character to its index, truncating to max_seq_len;
        # characters outside string.printable map to self.UNK.
        indices = [
            self.all_letters.get(letter, self.UNK) for letter in chars[:max_seq_len]
        ]
        # Pad with self.UNK, which therefore doubles as the padding index
        indices = indices + [self.UNK] * (max_seq_len - len(indices))
        return indices

    def process_x(self, raw_x_batch):
        x_batch = [e[4] for e in raw_x_batch]  # e[4] contains the actual tweet
        x_batch = [self.line_to_indices(e, self.max_seq_len) for e in x_batch]
        x_batch = torch.LongTensor(x_batch)
        return x_batch

    def process_y(self, raw_y_batch):
        y_batch = [int(e) for e in raw_y_batch]
        return y_batch

    def split_line(self, line):
        """
        Split given line/phrase into list of words

        Args:
            line: string representing phrase to be split

        Return:
            list of strings, with each string representing a word
        """
        return re.findall(r"[\w']+|[.,!?;]", line)

    def flatten_list(self, nested_list):
        return list(itertools.chain.from_iterable(nested_list))


2. We can now load the train and test datasets.


In [40]:
MAX_SEQ_LEN = 25


# 2. Load the train and test datasets.
train_dataset = Sent140Dataset(
    data_root=TRAIN_DATA,
    max_seq_len=MAX_SEQ_LEN,
)
test_dataset = Sent140Dataset(
    data_root=TEST_DATA,
    max_seq_len=MAX_SEQ_LEN,
)


Recall our `EXAMPLE_USER` from earlier? Their data now looks like this:

In [41]:
train_dataset[EXAMPLE_USER]

(tensor([[ 17,  10,  28,  29,  24,  16,  24,  29,  24,  20,  23,  18,  16,  17,
           29,  13,  10,  21,  14,  23,  14,  33,  29,  34,  14],
         [ 55,  17,  14,  10,  29,  27,  14,  14,  33,  10,  22, 100, 100, 100,
          100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]]), [0, 1])
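
To see where these numbers come from, the following standalone sketch re-implements the character encoding outside the dataset class and applies it to the second tweet, "Theatre exam ". It mirrors the tokenization and padding of `line_to_indices`; the names `CHAR_TO_INDEX`, `PAD`, and `encode` are introduced here for illustration.

```python
import re
import string

# Same vocabulary as Sent140Dataset: one index per printable character.
CHAR_TO_INDEX = {c: i for i, c in enumerate(string.printable)}
PAD = len(CHAR_TO_INDEX)  # 100, also used for unknown characters


def encode(line, max_seq_len=25):
    """Turn a tweet into a fixed-length list of character indices."""
    tokens = re.findall(r"[\w']+|[.,!?;]", line)  # words and punctuation
    chars = [c for token in tokens for c in token]  # whitespace is dropped
    indices = [CHAR_TO_INDEX.get(c, PAD) for c in chars[:max_seq_len]]
    return indices + [PAD] * (max_seq_len - len(indices))


encoded = encode("Theatre exam ")
print(encoded[:11])  # [55, 17, 14, 10, 29, 27, 14, 14, 33, 10, 22]
print(encoded[11:])  # fourteen padding entries, all equal to 100
```

Note that spaces never reach the model: the regular expression keeps only word characters, apostrophes, and a few punctuation marks, so "Theatre exam " becomes the 11 characters of "Theatreexam" followed by padding.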

To complete our data pipeline, we only need to

3. Create a data loader, which will batchify training, eval, and test data. There is no need to create a sharder since the data is already sharded. For each dataset, the data loader splits each client's data into batches of size `batch_size`. We choose not to drop the last batch.

4. Lastly, wrap the data loader with a data provider and return it. 
The data provider creates clients from the groupings in the data loader and adds metadata (e.g. number of examples, number of batches per client). 
Our data is now formatted such that the trainer will accept it.

In [48]:
import random
from typing import Any, Dict, Generator, Iterable, Iterator, List, Tuple

from flsim.data.data_provider import IFLDataProvider, IFLUserData
from flsim.interfaces.data_loader import IFLDataLoader
from flsim.utils.data.data_utils import batchify
from tqdm import tqdm

class LEAFDataLoader(IFLDataLoader):
    SEED = 2137
    random.seed(SEED)

    def __init__(
        self,
        train_dataset: Dataset,
        eval_dataset: Dataset,
        test_dataset: Dataset,
        batch_size: int,
        drop_last: bool = False,
    ):
        self.train_dataset = train_dataset
        self.eval_dataset = eval_dataset
        self.test_dataset = test_dataset
        self.batch_size = batch_size
        self.drop_last = drop_last

    def fl_train_set(self, **kwargs) -> Iterable[Dict[str, Generator]]:
        yield from self._batchify(self.train_dataset, self.drop_last)

    def fl_eval_set(self, **kwargs) -> Iterable[Dict[str, Generator]]:
        yield from self._batchify(self.eval_dataset, drop_last=False)

    def fl_test_set(self, **kwargs) -> Iterable[Dict[str, Generator]]:
        yield from self._batchify(self.test_dataset, drop_last=False)

    def _batchify(
        self, dataset: Dataset, drop_last=False
    ) -> Generator[Dict[str, Generator], None, None]:
        for one_user_inputs, one_user_labels in dataset:
            data = list(zip(one_user_inputs, one_user_labels))
            random.shuffle(data)
            one_user_inputs, one_user_labels = zip(*data)
            batch = {
                "features": batchify(one_user_inputs, self.batch_size, drop_last),
                "labels": batchify(one_user_labels, self.batch_size, drop_last),
            }
            yield batch


class LEAFUserData(IFLUserData):
    def __init__(self, user_data: Dict[str, Generator]):
        self._user_batches = []
        self._num_batches = 0
        self._num_examples = 0
        for features, labels in zip(user_data["features"], user_data["labels"]):
            self._num_batches += 1
            self._num_examples += LEAFUserData.get_num_examples(labels)
            self._user_batches.append(LEAFUserData.fl_training_batch(features, labels))

    def __iter__(self) -> Iterator[Dict[str, torch.Tensor]]:
        """
        Iterator to return a user batch data
        """
        for batch in self._user_batches:
            yield batch

    def num_examples(self) -> int:
        """
        Returns the number of examples
        """
        return self._num_examples

    def num_batches(self) -> int:
        """
        Returns the number of batches
        """
        return self._num_batches

    @staticmethod
    def get_num_examples(batch: List) -> int:
        return len(batch)

    @staticmethod
    def fl_training_batch(
        features: List[torch.Tensor], labels: List[float]
    ) -> Dict[str, torch.Tensor]:
        return {"features": torch.stack(features), "labels": torch.Tensor(labels)}


class LEAFDataProvider(IFLDataProvider):
    def __init__(self, data_loader):
        self.data_loader = data_loader
        self.train_users = self._create_fl_users(data_loader.fl_train_set())
        self.eval_users = self._create_fl_users(data_loader.fl_eval_set())
        self.test_users = self._create_fl_users(data_loader.fl_test_set())

    def user_ids(self) -> List[int]:
        return list(self.train_users.keys())

    def num_users(self) -> int:
        return len(self.train_users)

    def get_user_data(self, user_index: int) -> IFLUserData:
        if user_index in self.train_users:
            return self.train_users[user_index]
        else:
            raise IndexError(
                f"Index {user_index} is out of bound for list with len {self.num_users()}"
            )

    def train_data(self) -> Iterable[IFLUserData]:
        for user_data in self.train_users.values():
            yield user_data

    def eval_data(self) -> Iterable[Dict[str, torch.Tensor]]:
        for user_data in self.eval_users.values():
            for batch in user_data:
                yield batch

    def test_data(self) -> Iterable[Dict[str, torch.Tensor]]:
        for user_data in self.test_users.values():
            for batch in user_data:
                yield batch

    def _create_fl_users(self, iterator: Iterator) -> Dict[int, IFLUserData]:
        return {
            user_index: LEAFUserData(user_data)
            for user_index, user_data in tqdm(
                enumerate(iterator), desc="Creating FL User", unit="user"
            )
        }

# 3. Batchify training, eval, and test data. Note that train_dataset is already sharded.
dataloader = LEAFDataLoader(
    train_dataset,
    test_dataset,  # eval set: we reuse the test split, as there is no separate validation split
    test_dataset,
    batch_size=32,
    drop_last=False,
)

# 4. Wrap the data loader with a data provider.
data_provider = LEAFDataProvider(dataloader)


Creating FL User: 5765user [00:00, 13622.88user/s]
Creating FL User: 641user [00:00, 15256.34user/s]
Creating FL User: 641user [00:00, 14588.76user/s]
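
The effect of `batch_size` and `drop_last` on a single client's data can be illustrated with a small stand-in for the batching step. This is a sketch only: `chunk` is a hypothetical helper written for this example, not FLSim's `batchify`, whose implementation is not shown in this tutorial.

```python
def chunk(items, batch_size, drop_last=False):
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        if drop_last and len(batch) < batch_size:
            return  # discard the incomplete final batch
        yield batch


# A hypothetical client with 70 samples and batch_size=32.
labels = list(range(70))
kept = list(chunk(labels, batch_size=32))
dropped = list(chunk(labels, batch_size=32, drop_last=True))
print([len(b) for b in kept])     # [32, 32, 6]
print([len(b) for b in dropped])  # [32, 32]
```

With a mean of 2.53 samples per user, a typical client here has far fewer than 32 samples, so its data fits in a single small batch. Dropping the last batch would discard such a client's data entirely, which is one reason to keep `drop_last=False`.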


### 2. Create the model

Now, let's see how we can create a model that is compatible with FL-training.

1. First, we define a standard, non-FL sentiment classification PyTorch `nn.Module`; in this tutorial we use a simple char-LSTM.

In [49]:
from torch import nn


# 1. Define our model, a simple char-LSTM.

class CharLSTM(nn.Module):
    def __init__(
        self,
        num_classes,
        n_hidden,
        num_embeddings,
        embedding_dim,
        max_seq_len,
        dropout_rate,
    ):
        super().__init__()
        self.dropout_rate = dropout_rate
        self.n_hidden = n_hidden
        self.num_classes = num_classes
        self.max_seq_len = max_seq_len
        self.num_embeddings = num_embeddings

        self.embedding = nn.Embedding(
            num_embeddings=self.num_embeddings, embedding_dim=embedding_dim
        )
        self.lstm = nn.LSTM(
            input_size=embedding_dim,
            hidden_size=self.n_hidden,
            num_layers=2,
            batch_first=True,
            dropout=self.dropout_rate,
        )
        self.fc = nn.Linear(self.n_hidden, self.num_classes)
        self.dropout = nn.Dropout(p=self.dropout_rate)

    def forward(self, x):
        # Index of the last non-padding character in each sequence
        # (the padding index is num_embeddings - 1).
        seq_lens = torch.sum(x != (self.num_embeddings - 1), 1) - 1
        x = self.embedding(x)  # [B, S] -> [B, S, E]
        out, _ = self.lstm(x)  # [B, S, E] -> [B, S, H]
        out = out[torch.arange(out.size(0)), seq_lens]  # [B, S, H] -> [B, H]
        out = self.fc(self.dropout(out))  # [B, H] -> [B, C]
        return out
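
The first line of `forward` decides which LSTM output is fed to the classifier: the one at the last non-padding position of each sequence. The plain-Python sketch below reproduces that index computation on lists instead of tensors. The helper name `last_valid_index` is introduced for illustration, and `PAD = 100` assumes the vocabulary built in `Sent140Dataset`.

```python
PAD = 100  # padding index, equal to num_embeddings - 1 in the model above


def last_valid_index(row, pad=PAD):
    """Count the non-padding entries and subtract one, as in CharLSTM.forward."""
    return sum(token != pad for token in row) - 1


# "Theatre exam " encodes to 11 characters followed by 14 padding entries.
short = [55, 17, 14, 10, 29, 27, 14, 14, 33, 10, 22] + [PAD] * 14
print(last_valid_index(short))  # 10

# A tweet with no encodable characters yields -1, i.e. the final timestep.
empty = [PAD] * 25
print(last_valid_index(empty))  # -1
```

One caveat follows from the dataset code: characters outside `string.printable` are also mapped to index 100, so they are counted as padding and pull the selected position earlier.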


We initialize our model with parameters that make it compatible with our dataset.

In [50]:
model = CharLSTM(
    num_classes=train_dataset.num_classes,
    n_hidden=100,
    num_embeddings=train_dataset.num_letters + 1,
    embedding_dim=100,
    max_seq_len=MAX_SEQ_LEN,
    dropout_rate=0.1,
)

model


CharLSTM(
  (embedding): Embedding(101, 100)
  (lstm): LSTM(100, 100, num_layers=2, batch_first=True, dropout=0.1)
  (fc): Linear(in_features=100, out_features=2, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
)

After we have our standard PyTorch model, we can

2. Create a `torch.device` and choose where the model will be allocated (CUDA or CPU). 

3. Wrap the PyTorch module with the FLSim `FLModel`. `FLModel` is accepted by the trainer and handles moving our model, data, and predictions to GPU if desired. It also collects and returns metrics for each batch it predicts on. You can find its implementation [here](https://github.com/facebookresearch/FLSim/blob/main/baselines/models/cv_model.py).

4. Move the model to GPU and enable CUDA if desired.

The model now supports FL training!

In [71]:
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F
from flsim.interfaces.model import IFLModel
from flsim.utils.simple_batch_metrics import FLBatchMetrics

class FLModel(IFLModel):
    def __init__(self, model: nn.Module, device: Optional[str] = None):
        self.model = model
        self.device = device

    def fl_forward(self, batch) -> FLBatchMetrics:
        features = batch["features"]  # [B, S] character indices
        batch_label = batch["labels"]
        stacked_label = batch_label.view(-1).long().clone().detach()
        if self.device is not None:
            features = features.to(self.device)

        output = self.model(features)

        if self.device is not None:
            output, batch_label, stacked_label = (
                output.to(self.device),
                batch_label.to(self.device),
                stacked_label.to(self.device),
            )

        loss = F.cross_entropy(output, stacked_label)
        num_examples = self.get_num_examples(batch)
        output = output.detach().cpu()
        stacked_label = stacked_label.detach().cpu()
        del features
        return FLBatchMetrics(
            loss=loss,
            num_examples=num_examples,
            predictions=output,
            targets=stacked_label,
            model_inputs=[],
        )

    def fl_create_training_batch(self, **kwargs):
        features = kwargs.get("features", None)
        labels = kwargs.get("labels", None)
        return LEAFUserData.fl_training_batch(features, labels)

    def fl_get_module(self) -> nn.Module:
        return self.model

    def fl_cuda(self) -> None:
        self.model = self.model.to(self.device)

    def get_eval_metrics(self, batch) -> FLBatchMetrics:
        with torch.no_grad():
            return self.fl_forward(batch)

    def get_num_examples(self, batch) -> int:
        return LEAFUserData.get_num_examples(batch["labels"])


USE_CUDA = True

# 2. Choose where the model will be allocated.
cuda_enabled = torch.cuda.is_available() and USE_CUDA
device = torch.device(f"cuda:{0}" if cuda_enabled else "cpu")

# 3. Wrap the model in FLModel.
global_model = FLModel(model, device)

# 4. Enable CUDA if desired.
if cuda_enabled:
    global_model.fl_cuda()


### 3. Metrics Reporting

After creating our data pipeline and FL model, we define our metrics reporter. The metrics reporter allows us to collect metrics and log them to TensorBoard.

There are three functions that we care about: 

1. `compare_metrics`: This function compares the current eval metrics, as returned by `create_eval_metrics` (defined below), with the best eval metrics seen so far and returns whether the current ones are better.

2. `compute_scores`: This function calculates the metrics that we care about. In this case, we would like to report the top-1 accuracy.

3. `create_eval_metrics`: This function creates the eval metrics dictionary that can be used by `compare_metrics` above.
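
Stripped of the FLSim plumbing, the logic behind these functions can be sketched in plain Python. This is an illustration under simplifying assumptions (lists instead of tensors, a single batch), and the names `top1_accuracy` and `is_better` are hypothetical stand-ins for `compute_scores` and `compare_metrics`.

```python
def top1_accuracy(logits, targets):
    """Percentage of examples whose highest-scoring class equals the target."""
    correct = 0
    for row, target in zip(logits, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == target)
    return 100.0 * correct / len(targets)


def is_better(current, best, key="Accuracy"):
    """True if there is no best yet or the current accuracy beats it."""
    if best is None:
        return True
    return current.get(key, float("-inf")) > best.get(key, float("-inf"))


logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 1.0], [-0.2, 0.4]]
targets = [0, 1, 1, 1]
scores = {"Accuracy": top1_accuracy(logits, targets)}
print(scores)                   # {'Accuracy': 75.0}
print(is_better(scores, None))  # True
```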

In [72]:
from typing import Any, Dict, List, Optional

import torch
from flsim.common.timeline import Timeline
from flsim.interfaces.metrics_reporter import Channel, TrainingStage
from flsim.metrics_reporter.tensorboard_metrics_reporter import FLMetricsReporter

class MetricsReporter(FLMetricsReporter):
    ACCURACY = "Accuracy"

    def __init__(
        self,
        channels: List[Channel],
        target_eval: float = 0.0,
        window_size: int = 5,
        log_dir: Optional[str] = None,
    ):
        super().__init__(channels, log_dir)
        self.set_summary_writer(log_dir=log_dir)

    def compare_metrics(self, eval_metrics, best_metrics):
        print(f"Current eval accuracy: {eval_metrics}%, Best so far: {best_metrics}%")
        if best_metrics is None:
            return True

        current_accuracy = eval_metrics.get(self.ACCURACY, float("-inf"))
        best_accuracy = best_metrics.get(self.ACCURACY, float("-inf"))
        return current_accuracy > best_accuracy

    def compute_scores(self) -> Dict[str, Any]:
        # compute accuracy
        correct = torch.Tensor([0])
        for i in range(len(self.predictions_list)):
            all_preds = self.predictions_list[i]
            pred = all_preds.data.max(1, keepdim=True)[1]

            assert pred.device == self.targets_list[i].device, (
                f"Pred and targets moved to different devices: "
                f"pred >> {pred.device} vs. targets >> {self.targets_list[i].device}"
            )
            if i == 0:
                correct = correct.to(pred.device)

            correct += pred.eq(self.targets_list[i].data.view_as(pred)).sum()

        # total number of data
        total = sum(len(batch_targets) for batch_targets in self.targets_list)

        accuracy = 100.0 * correct.item() / total
        return {self.ACCURACY: accuracy}

    def create_eval_metrics(
        self, scores: Dict[str, Any], total_loss: float, **kwargs
    ) -> Any:
        timeline: Timeline = kwargs.get("timeline", Timeline(global_round=1))
        stage: TrainingStage = kwargs.get("stage", None)
        accuracy = scores[self.ACCURACY]
        return {
            self.ACCURACY: accuracy
        }

### 4. Hyperparameters

We can represent the hyperparameters for FL training in a JSON config.

This config is passed to the FL trainer.

In [73]:
json_config = {
    "trainer": {
        "_base_": "base_sync_trainer",
        # there are different types of aggregator
        # fed avg with lr requires a learning rate, whereas e.g. fed_avg doesn't
        "server": {
            "_base_": "base_sync_server",
            "server_optimizer": {
              "_base_": "base_fed_avg_with_lr",
              # server's learning rate
              "lr": 0.7,
              # server's global momentum
              "momentum": 1,
            },
            # aggregate client models using weighted average based on number of 
            # examples of the local dataset
            "aggregation_type": "WEIGHTED_AVERAGE",
            # type of user selection sampling
            "active_user_selector": {
              "_base_": "base_uniformly_random_active_user_selector"
            },
        },
        "client": {
            # number of client's local epochs
            "epochs": 1,
            "optimizer": {
                "_base_": "base_optimizer_sgd",
                # client's local learning rate
                "lr": 1,
                # client's local momentum
                "momentum": 0,
            },
        },
        # number of users per round for aggregation
        "users_per_round": 10,
        # total number of global epochs
        # total #rounds = ceil(total_users / users_per_round) * epochs
        "epochs": 1,
        # frequency of reporting train metrics
        "train_metrics_reported_per_epoch": 4,
        # keep the trained model always (as opposed to only when it
        # performs better than the previous model on eval)
        "always_keep_trained_model": False,
        # frequency of evaluation per epoch
        "eval_epoch_frequency": 1,
        "do_eval": True,
        # should we report train metrics after global aggregation
        "report_train_metrics_after_aggregation": True,
    }
}
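
Two comments in the config can be made concrete with a little arithmetic: the total number of rounds, and the example-weighted average used for aggregation. The sketch below is a simplified illustration, not FLSim's implementation. The helper names are hypothetical, client updates are plain lists, and the server learning rate and momentum are ignored.

```python
import math


def total_rounds(total_users, users_per_round, epochs):
    """total #rounds = ceil(total_users / users_per_round) * epochs"""
    return math.ceil(total_users / users_per_round) * epochs


def weighted_average(client_updates, num_examples):
    """Average client updates, weighting each by its number of local examples."""
    total = sum(num_examples)
    dim = len(client_updates[0])
    return [
        sum(update[i] * n for update, n in zip(client_updates, num_examples)) / total
        for i in range(dim)
    ]


# 5765 training users, 10 users per round, 1 epoch.
print(total_rounds(5765, 10, 1))  # 577

# Two toy clients: the one with 3 examples counts three times as much.
updates = [[1.0, 0.0], [0.0, 1.0]]
avg = weighted_average(updates, num_examples=[3, 1])
print(avg)  # [0.75, 0.25]
```

With the values in this tutorial, one epoch therefore corresponds to 577 rounds.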

We recommend a JSON config for ease of representation, but FLSim is also compatible with the Hydra config system and can work with YAML configs, as many [PyTorch Lightning](https://www.pytorchlightning.ai/) projects do. Here, we convert the JSON config to an OmegaConf object via Hydra for consumption by FLSim.

In [74]:
import flsim.configs
from flsim.utils.config_utils import fl_config_from_json
from omegaconf import OmegaConf


cfg = fl_config_from_json(json_config)

  message="hydra.experimental.initialize() is no longer experimental."
  message="hydra.experimental.compose() is no longer experimental."


### 5. Training
Recall that we already built the data provider and created a model compatible with FL training. 
Now, to launch the FL training flow we only need to take a few more steps:

1. First, we need to create a metrics reporter, which will collect, evaluate, and report relevant training, aggregation, and evaluation/test metrics.
You can find its implementation [here](https://github.com/facebookresearch/FLSim/blob/main/tutorials/metrics_reporter/fl_metrics_reporter.py).

2. We also need to instantiate the trainer with the model and hyperparameter config we defined earlier.

In [75]:
from flsim.interfaces.metrics_reporter import Channel
from hydra.utils import instantiate

# 1. Create a metric reporter.
metrics_reporter = MetricsReporter([Channel.TENSORBOARD, Channel.STDOUT])


# 2. Instantiate the trainer.
trainer_config = cfg.trainer
trainer = instantiate(trainer_config, model=global_model, cuda_enabled=cuda_enabled)


Finally, we're ready to run FL training given the above JSON config. We can utilize `eval_score` to store the evaluation metrics.

In [76]:
# Launch FL training.
final_model, eval_score = trainer.train(
    data_provider=data_provider,
    metric_reporter=metrics_reporter,
    num_total_users=data_provider.num_users(),
    distributed_world_size=1,
)

Round:  25%|██▌       | 145/577 [00:58<03:03,  2.36round/s]

Train finished Global Round: 145
(epoch = 1, round = 145, global round = 145), Loss/Training: 0.8866520769668088
(epoch = 1, round = 145, global round = 145), Accuracy/Training: 50.01446340757882
reporting (epoch = 1, round = 145, global round = 145) for aggregation
(epoch = 1, round = 145, global round = 145), Loss/Aggregation: 0.8339145928621292
(epoch = 1, round = 145, global round = 145), Accuracy/Aggregation: 50.0


Round:  50%|█████     | 289/577 [01:57<02:04,  2.31round/s]

Train finished Global Round: 289
(epoch = 1, round = 289, global round = 289), Loss/Training: 1.7769925692442676
(epoch = 1, round = 289, global round = 289), Accuracy/Training: 50.197748707027685
reporting (epoch = 1, round = 289, global round = 289) for aggregation
(epoch = 1, round = 289, global round = 289), Loss/Aggregation: 2.158547518253215
(epoch = 1, round = 289, global round = 289), Accuracy/Aggregation: 86.36363636363636


Round:  75%|███████▌  | 433/577 [02:55<01:04,  2.24round/s]

Train finished Global Round: 433
(epoch = 1, round = 433, global round = 433), Loss/Training: 16.83782064347785
(epoch = 1, round = 433, global round = 433), Accuracy/Training: 50.70821529745042
reporting (epoch = 1, round = 433, global round = 433) for aggregation
(epoch = 1, round = 433, global round = 433), Loss/Aggregation: 12.382562828063964
(epoch = 1, round = 433, global round = 433), Accuracy/Aggregation: 61.904761904761905


Round: 100%|█████████▉| 576/577 [03:54<00:00,  2.44round/s]

Train finished Global Round: 577
(epoch = 1, round = 577, global round = 577), Loss/Training: 28.58063744585804
(epoch = 1, round = 577, global round = 577), Accuracy/Training: 49.727668845315904
reporting (epoch = 1, round = 577, global round = 577) for aggregation
(epoch = 1, round = 577, global round = 577), Loss/Aggregation: 43.41543006896973
(epoch = 1, round = 577, global round = 577), Accuracy/Aggregation: 38.095238095238095
Running (epoch = 1, round = 577, global round = 577) for Eval
(epoch = 1, round = 577, global round = 577), Loss/Eval: 34.649913735850205
(epoch = 1, round = 577, global round = 577), Accuracy/Eval: 50.590687977762336


Round: 100%|█████████▉| 576/577 [03:56<00:00,  2.43round/s]
Epoch:   0%|          | 0/1 [03:56<?, ?epoch/s]


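After training finishes, we evaluate the model and report the test set accuracy before concluding this tutorial. Note that in the training logs above the loss grows from about 0.89 to about 28.6 while accuracy stays close to 50%, which is chance level for a binary task, so treat the hyperparameters above as a starting point rather than a tuned configuration.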

In [77]:
# We can now test our model.
trainer.test(
    data_iter=data_provider.test_data(),
    metric_reporter=MetricsReporter([Channel.STDOUT]),
)


Running (epoch = 1, round = 1, global round = 1) for Test
(epoch = 1, round = 1, global round = 1), Loss/Test: 34.6846507969675
(epoch = 1, round = 1, global round = 1), Accuracy/Test: 50.590687977762336


{'Accuracy': 50.590687977762336}

## Summary

In this tutorial, we first showed how to get and preprocess LEAF's Sent140 dataset. 
We then built a data provider by splitting each user's data into batches. 
We defined a simple char-LSTM as our model, wrapped it with a model compatible with FL training, and moved it to GPU when one is available.
Lastly, we set the hyperparameters for FL training, launched the training flow, and evaluated our model.

### Additional resources
- [FLSim tutorials](https://github.com/facebookresearch/FLSim/tree/main/tutorials) - check out our other tutorials.
- Kairouz et al. (2021): [Advances and Open Problems in Federated Learning](https://arxiv.org/pdf/1912.04977.pdf). As the title suggests, an in-depth overview of advances and open problems in FL.
- If you're interested in federated learning with **differential privacy**, take a look at [Opacus](https://opacus.ai/), a library that enables training PyTorch models with differential privacy. 
You can find a blog post introducing Opacus [here](https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy/).

