# FederatedRuntime 101: Quickstart with MNIST

Welcome to the first **FederatedRuntime** tutorial!
This tutorial demonstrates how to deploy a Federated Learning experiment, based on the workflow interface, on a distributed computing infrastructure.

Data scientists often start by developing and fine-tuning Federated machine-learning models in a local environment before transitioning to a Federated setup. OpenFL supports this methodology and the Tutorial guides the user through the following steps:
- **Simulate** a Federated Learning experiment locally using `LocalRuntime` 
- **Deploy** this experiment on Federated Infrastructure using `FederatedRuntime` from a familiar Jupyter notebook environment

**Key Features covered**:  
1. **Simulate** Federated Learning experiment using `LocalRuntime`. Explore [101 MNIST](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb) for insights
2. Enable creation of workspace content by annotating the Jupyter notebook with export directives.
3. **Deploy** the experiment on Federated infrastructure (Director and Envoy nodes) using `FederatedRuntime`.
   NOTE: Participants in the Federation should be launched using the steps described in [README.md](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/README.md) before deploying the experiment.

Let's get started!


**Methodology for annotating Jupyter Notebook**
1. The user annotates the relevant cells in the Jupyter notebook using the `#| export` directive. This indicates which parts of the notebook should be extracted for further processing or deployment.
2. `FederatedRuntime` leverages the experimental workflow module `notebooktools` to transform the annotated Jupyter notebook into a workspace, enabling its deployment on distributed infrastructure.
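
To make the idea of directive-based export concrete, here is a minimal sketch of how annotated cells could be selected from a notebook. It is an illustration only and not how `notebooktools` is implemented: the simplified `notebook` dictionary, the `EXPORT_DIRECTIVE` pattern, and the `collect_exported_code` helper are all assumptions made for this example.

```python
import re

# Hypothetical, simplified notebook structure: a list of cells, each with a
# "cell_type" and its "source" text (real .ipynb files store more metadata).
notebook = {
    "cells": [
        {"cell_type": "code", "source": "#| default_exp experiment"},
        {"cell_type": "code", "source": "#| export\nimport torch"},
        {"cell_type": "markdown", "source": "### Model definition"},
        {"cell_type": "code", "source": "# | export\nlearning_rate = 0.01"},
        {"cell_type": "code", "source": "print('scratch cell, not exported')"},
    ]
}

# Matches "#| export" and "# | export" on the first line of a cell.
EXPORT_DIRECTIVE = re.compile(r"^#\s*\|\s*export\b")


def collect_exported_code(nb):
    """Return the source of every code cell marked with the export directive."""
    exported = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        lines = cell["source"].splitlines()
        if lines and EXPORT_DIRECTIVE.match(lines[0]):
            # Drop the directive line itself and keep the rest of the cell.
            exported.append("\n".join(lines[1:]))
    return exported


exported_cells = collect_exported_code(notebook)
print(exported_cells)
```

Only the two cells that start with the export directive are collected; the `default_exp` cell, the markdown cell, and the unannotated scratch cell are skipped.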

# Getting Started

We begin by specifying the module where cells marked with the `#| export` directive will be automatically exported. 

The export directive is used to identify specific code cells in the Jupyter notebook that should be included in the generated Python module. This Python module is required to distribute the FL experiment.

In the following cell, the `#| default_exp experiment` directive sets the name of the Python module to `experiment`. This name can be customized according to the user's requirements and preferences.

In [None]:
#| default_exp experiment

Once we have specified the name of the module, subsequent cells of the notebook need to be *annotated* with the `#| export` directive as shown below.

The user should ensure that *all* the notebook functionality required in the Federated Learning experiment is marked with this directive.

### Installing Pre-requisites
We start by installing OpenFL and the dependencies of the workflow interface.
> These dependencies must be exported, because they become the requirements of the Federated Learning workspace.

In [None]:
#| export

!pip install git+https://github.com/securefederatedai/openfl.git
!pip install -r ../../../workflow_interface_requirements.txt
!pip install torch==2.3.1
!pip install torchvision==0.18.1
!pip install -U ipywidgets


### Model definition

We begin with the quintessential example of a PyTorch CNN model trained on the MNIST dataset. Let's start by defining
- Hyperparameters
- Model definition, and
- Helper functions to train and validate the model, as we would for any other deep learning experiment

> This cell and all the subsequent cells are important ingredients of the Federated Learning experiment and are therefore annotated with the `#| export` directive

In [None]:
# | export

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch
import numpy as np
import random

# Hyperparameters
learning_rate = 0.01
momentum = 0.5
batch_size = 32
log_interval = 10

# Model definition
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


# Helper function to validate the model
def validate(model, test_loader):
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            pred = output.data.max(1, keepdim=True)[1]
            correct += pred.eq(target.data.view_as(pred)).sum()
    accuracy = float(correct / len(test_loader.dataset))
    return accuracy


# Helper function to train the model
def train_model(model, optimizer, data_loader, round_number, log=False):
    train_loss = 0
    model.train()
    for batch_idx, (X, y) in enumerate(data_loader):
        optimizer.zero_grad()

        output = model(X)
        loss = F.nll_loss(output, y)
        loss.backward()

        optimizer.step()

        train_loss += loss.item() * len(X)
        if batch_idx % log_interval == 0 and log:
            print(
                "Train Epoch: {:3} [{:5}/{:<5} ({:<.0f}%)] Loss: {:<.4f}".format(
                    round_number,
                    batch_idx * len(X),
                    len(data_loader.dataset),
                    100.0 * batch_idx / len(data_loader),
                    loss.item(),
                )
            )

    train_loss /= len(data_loader.dataset)
    return train_loss


# Helper function to initialize seed for reproducibility
def initialize_seed(random_seed=42):
    torch.manual_seed(random_seed)
    np.random.seed(random_seed)
    random.seed(random_seed)
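
The value `320` used in `fc1` and in `x.view(-1, 320)` follows from the layer shapes. The sketch below reproduces that arithmetic, assuming 28x28 MNIST inputs and the default stride and padding of `nn.Conv2d` and `F.max_pool2d`; the `conv_out` helper is a hypothetical name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5)             # conv1: 28 -> 24
size = conv_out(size, kernel=2, stride=2)   # max_pool2d(2): 24 -> 12
size = conv_out(size, kernel=5)             # conv2: 12 -> 8
size = conv_out(size, kernel=2, stride=2)   # max_pool2d(2): 8 -> 4

channels = 20                               # output channels of conv2
flattened = channels * size * size          # 20 * 4 * 4
print(flattened)
```

If the kernel sizes or channel counts of `Net` are changed, the input size of `fc1` has to be recomputed in the same way.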

### Dataset definition

We now download the training and test datasets of MNIST, a necessary step to demonstrate the functionality of the LocalRuntime.

In [None]:
#| export

import torchvision

# Train and Test datasets
mnist_train = torchvision.datasets.MNIST(
    "../files/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,)),
        ]
    ),
)

mnist_test = torchvision.datasets.MNIST(
    "../files/",
    train=False,
    download=True,
    transform=torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,)),
        ]
    ),
)
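
As a small worked example of the `Normalize` transform used above: after `ToTensor` scales pixels to the range [0, 1], each value is mapped to `(pixel - mean) / std`. The sketch below applies that formula in plain Python, with the mean `0.1307` and standard deviation `0.3081` taken from the cell above; the `normalize` function is a hypothetical stand-in, not the torchvision implementation.

```python
MNIST_MEAN = 0.1307  # mean passed to Normalize in the cell above
MNIST_STD = 0.3081   # standard deviation passed to Normalize in the cell above


def normalize(pixel, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the per-pixel formula (pixel - mean) / std."""
    return (pixel - mean) / std


black = normalize(0.0)  # darkest possible pixel after ToTensor
white = normalize(1.0)  # brightest possible pixel after ToTensor
print(round(black, 4), round(white, 4))
```

The normalized inputs are therefore no longer confined to [0, 1]: dark pixels become slightly negative and bright pixels map to values well above 1.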

### Workflow definition

Next, we import `FLSpec` and the placement decorators (`aggregator`/`collaborator`), and define the `FedAvg` helper function.

- `FLSpec` - defines the flow specification. User-defined flows are subclasses of this.
- `aggregator/collaborator` - placement decorators that define where each task is executed
- `FedAvg` - helper function for Federated Averaging


In [None]:
# | export

from copy import deepcopy

from openfl.experimental.workflow.interface import FLSpec
from openfl.experimental.workflow.placement import aggregator, collaborator


# Helper function for federated averaging
def FedAvg(agg_model, models, weights=None):
    state_dicts = [model.state_dict() for model in models]
    agg_state_dict = agg_model.state_dict()
    for key in models[0].state_dict():
        agg_state_dict[key] = torch.from_numpy(
            np.average([state[key].numpy() for state in state_dicts], axis=0, weights=weights)
        )

    agg_model.load_state_dict(agg_state_dict)
    return agg_model
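
To see what the averaging step computes, here is a framework-free sketch of the same element-wise (optionally weighted) average on plain Python lists instead of tensors. The `weighted_average` function, the parameter names, and the numbers are made up for illustration; weighting by shard size is a common choice, but this tutorial calls `FedAvg` without weights.

```python
def weighted_average(states, weights=None):
    """Average a list of {name: [floats]} dictionaries element-wise."""
    if weights is None:
        weights = [1.0] * len(states)  # equal weights, like weights=None above
    total = sum(weights)
    averaged = {}
    for key in states[0]:
        # Group the i-th element of this parameter across all participants.
        columns = zip(*(state[key] for state in states))
        averaged[key] = [
            sum(w * v for w, v in zip(weights, column)) / total
            for column in columns
        ]
    return averaged


# Hypothetical "state dicts" of two collaborators.
portland = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
seattle = {"fc.weight": [3.0, 6.0], "fc.bias": [4.0]}

equal = weighted_average([portland, seattle])
by_size = weighted_average([portland, seattle], weights=[3, 1])
print(equal)
print(by_size)
```

With equal weights each parameter is the plain mean of the two models; with weights `[3, 1]` the result is pulled toward the first collaborator.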

Let us now define the Workflow. Here we use the same tasks as in the [101 MNIST](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb) tutorial.

In [None]:
# | export

class FederatedFlow_TorchMNIST(FLSpec):
    """
    This Flow trains a CNN model on MNIST using Federated Learning
    """

    def __init__(self, model=None, optimizer=None, learning_rate=1e-2, momentum=0.5, rounds=3, **kwargs):
        super().__init__(**kwargs)

        if model is not None:
            self.model = model
            self.optimizer = optimizer
        else:
            initialize_seed()
            self.model = Net()
            self.optimizer = optim.SGD(self.model.parameters(), lr=learning_rate, momentum=momentum)

        self.learning_rate = learning_rate
        self.momentum = momentum
        self.rounds = rounds
        self.results = []

    @aggregator
    def start(self):
        """
        This is the start of the Flow.
        """
        print(f"Initializing Workflow .... ")

        self.collaborators = self.runtime.collaborators
        self.current_round = 0

        self.next(self.aggregated_model_validation, foreach="collaborators")

    @collaborator
    def aggregated_model_validation(self):
        """
        Perform validation of aggregated model on collaborators.
        """
        print(f"<Collab: {self.input}> Performing Validation on aggregated model ... ")
        self.agg_validation_score = validate(self.model, self.test_loader)
        print(
            f"<Collab: {self.input}> Aggregated Model validation score = {self.agg_validation_score:.4f}"
        )

        self.next(self.train)

    @collaborator
    def train(self):
        """
        Train model on Local collaborator dataset.
        """
        print(f"<Collab: {self.input}>: Training Model on local dataset ... ")

        self.optimizer = optim.SGD(self.model.parameters(), lr=self.learning_rate, momentum=self.momentum)

        self.loss = train_model(
            model=self.model,
            optimizer=self.optimizer,
            data_loader=self.train_loader,
            round_number=self.current_round,
            log=True,
        )

        self.next(self.local_model_validation)

    @collaborator
    def local_model_validation(self):
        """
        Validate locally trained model.
        """
        print(f"<Collab: {self.input}> Performing Validation on locally trained model ... ")
        self.local_validation_score = validate(self.model, self.test_loader)
        print(
            f"<Collab: {self.input}> Local model validation score = {self.local_validation_score:.4f}"
        )
        self.next(self.join)

    @aggregator
    def join(self, inputs):
        """
        Model aggregation step.
        """
        print(f"<Agg>: Joining models from collaborators...")

        # Average Training loss, aggregated and locally trained model accuracy
        self.average_loss = sum(input.loss for input in inputs) / len(inputs)
        self.aggregated_model_accuracy = sum(input.agg_validation_score for input in inputs) / len(inputs)
        self.local_model_accuracy = sum(input.local_validation_score for input in inputs) / len(inputs)

        print(f"Avg. aggregated model validation score = {self.aggregated_model_accuracy:.4f}")
        print(f"Avg. training loss = {self.average_loss:.4f}")
        print(f"Avg. local model validation score = {self.local_model_accuracy:.4f}")

        # FedAvg
        self.model = FedAvg(self.model, [input.model for input in inputs])

        self.results.append(
            [
                self.current_round,
                self.aggregated_model_accuracy,
                self.average_loss,
                self.local_model_accuracy,
            ]
        )

        self.current_round += 1
        if self.current_round < self.rounds:
            self.next(self.aggregated_model_validation, foreach="collaborators")
        else:
            self.next(self.end)

    @aggregator
    def end(self):
        """
        This is the last step in the Flow.
        """
        print(f"This is the end of the flow")
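
The order of tasks encoded by the `self.next` calls above can be traced with a small stand-alone sketch. It only mirrors the control flow (start, three collaborator tasks per round, join, and end); the `trace_flow` function is hypothetical, and the strictly sequential ordering of collaborators is an assumption made for readability, since a runtime may execute them concurrently.

```python
def trace_flow(collaborators, rounds):
    """List the flow steps in the order implied by the self.next transitions."""
    collaborator_tasks = (
        "aggregated_model_validation",
        "train",
        "local_model_validation",
    )
    steps = ["start"]
    for current_round in range(rounds):
        # foreach="collaborators": every collaborator runs the same task chain.
        for name in collaborators:
            for task in collaborator_tasks:
                steps.append(f"round {current_round}: {name}.{task}")
        # join runs once per round on the aggregator.
        steps.append(f"round {current_round}: join")
    steps.append("end")
    return steps


trace = trace_flow(["Portland", "Seattle"], rounds=2)
for step in trace:
    print(step)
```

For two collaborators and two rounds this yields 16 steps: one `start`, seven steps per round, and one `end`.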

### Simulation: LocalRuntime

We now import and define the `LocalRuntime` and the participants (`Aggregator`/`Collaborator`), and initialize the private attributes of the participants.

- `Runtime` - defines where the flow runs. `LocalRuntime` simulates the flow on the local node.
- `Aggregator/Collaborator` - (local) participants in the simulation


In [None]:
# | export

from openfl.experimental.workflow.interface import Aggregator, Collaborator
from openfl.experimental.workflow.runtime import LocalRuntime

# Setup Aggregator & initialize private attributes
aggregator = Aggregator()
aggregator.private_attributes = {}

# Setup Collaborators & initialize shards of MNIST dataset as private attributes
n_collaborators = 2
collaborator_names = ["Portland", "Seattle"]

collaborators = [Collaborator(name=name) for name in collaborator_names]
for idx, collaborator in enumerate(collaborators):
    local_train = deepcopy(mnist_train)
    local_test = deepcopy(mnist_test)
    local_train.data = mnist_train.data[idx::n_collaborators]
    local_train.targets = mnist_train.targets[idx::n_collaborators]
    local_test.data = mnist_test.data[idx::n_collaborators]
    local_test.targets = mnist_test.targets[idx::n_collaborators]

    collaborator.private_attributes = {
        "train_loader": torch.utils.data.DataLoader(
            local_train, batch_size=batch_size, shuffle=False
        ),
        "test_loader": torch.utils.data.DataLoader(
            local_test, batch_size=batch_size, shuffle=False
        ),
    }

local_runtime = LocalRuntime(
    aggregator=aggregator, collaborators=collaborators, backend="single_process"
)
print(f"Local runtime collaborators = {local_runtime.collaborators}")
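
The strided slice `[idx::n_collaborators]` used above splits a dataset into interleaved, non-overlapping shards. The sketch below shows the same slicing on a small list of sample indices; the ten-element `samples` list is an assumption chosen only to keep the output readable.

```python
samples = list(range(10))  # stand-in for dataset indices
n_collaborators = 2

# Collaborator idx receives every n_collaborators-th sample, starting at idx.
shards = [samples[idx::n_collaborators] for idx in range(n_collaborators)]
print(shards)

# The shards are disjoint and together cover the whole dataset.
covered = sorted(sample for shard in shards for sample in shard)
print(covered == samples)
```

The first collaborator gets the even positions and the second the odd positions, so no sample is shared between them.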

### Start Simulation

Now that we have our flow and runtime defined, let's run the simulation!

In [None]:
#| export

model = None
optimizer = None
flflow = FederatedFlow_TorchMNIST(model, optimizer, learning_rate, momentum, rounds=2, checkpoint=True)
flflow.runtime = local_runtime
flflow.run()

Let us check the simulation results.

In [None]:
from tabulate import tabulate 

headers = ["Rounds", "Agg Model Validation Score", "Local Train loss", "Local Model Validation score"]
print('********** Simulation results **********')
simulation_results = flflow.results
print(tabulate(simulation_results, headers=headers, tablefmt="outline"))


### Setup Federation: Director & Envoys

Before we can deploy the experiment, let us create the participants in the Federation: the Director and the Envoys. As the tutorial uses two collaborators, we launch three participants:
1. Director: The central node in the Federation
2. Portland: The first envoy in the Federation
3. Seattle: The second envoy in the Federation

The participants can be launched by following the steps described in the [README](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/README.md).


### Deploy: FederatedRuntime

We now import and instantiate `FederatedRuntime` to enable deployment of the experiment on distributed infrastructure. Initializing `FederatedRuntime` requires the following inputs from the user:

- `director_info` - director information, including the FQDN of the director node, the port, and certificate information (passed as the `director` argument)
- `collaborators` - names of the collaborators participating in the experiment
- `notebook_path` - path to this Jupyter notebook


In [None]:
#| export

from openfl.experimental.workflow.runtime import FederatedRuntime

director_info = {
    'director_node_fqdn':'localhost',
    'director_port':50050,
}

federated_runtime = FederatedRuntime(
    collaborators=collaborator_names,
    director=director_info, 
    notebook_path='./101_MNIST_FederatedRuntime.ipynb'
)
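
Before connecting, it can help to sanity-check the director settings. The sketch below is a hypothetical helper and not part of OpenFL: it only checks the two keys used in this tutorial (`director_node_fqdn` and `director_port`) and makes no assumption about the certificate-related entries.

```python
# Keys used in this tutorial's director_info and the types they hold.
EXPECTED_KEYS = {"director_node_fqdn": str, "director_port": int}


def check_director_info(info):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in info:
            problems.append(f"missing key: {key}")
        elif not isinstance(info[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    port = info.get("director_port")
    if isinstance(port, int) and not 0 < port < 65536:
        problems.append("director_port out of range")
    return problems


ok = check_director_info({"director_node_fqdn": "localhost", "director_port": 50050})
bad = check_director_info({"director_node_fqdn": "localhost"})
print(ok)
print(bad)
```

An empty list means the dictionary has the shape used in this tutorial; it does not prove that a Director is actually reachable at that address.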

Let us connect to the federation and check whether the envoys are connected to the director by using the `get_envoys` method of `FederatedRuntime`. If the participants were launched successfully in the previous step, the status of `Portland` and `Seattle` should be displayed as `Online`.

In [None]:
federated_runtime.get_envoys()

Now that we have our distributed infrastructure ready, let us set the flow runtime to the `FederatedRuntime` instance and deploy the experiment.

Progress of the flow can be followed in:
1. The Jupyter notebook, if the `checkpoint` attribute of the flow object is set to `True`
2. The Director and Envoy terminals


In [None]:
flflow.results = [] # clear results from previous run
flflow.runtime = federated_runtime
flflow.run()

Let us compare the simulation results from `LocalRuntime` with the federation results from `FederatedRuntime`.

In [None]:
headers = ["Rounds", "Agg Model Validation Score", "Local Train loss", "Local Model Validation score"]
print('********** Simulation results **********')
print(tabulate(simulation_results, headers=headers, tablefmt="outline"))

print('********** Federation results **********')
federation_results = flflow.results
print(tabulate(federation_results, headers=headers, tablefmt="outline"))
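
Beyond comparing the two tables by eye, the rows can be compared numerically. The following is a minimal sketch under stated assumptions: `compare_results` is a hypothetical helper, the tolerance is arbitrary, and the two example lists contain made-up placeholder numbers rather than actual results from this tutorial.

```python
import math


def compare_results(first, second, rel_tol=1e-3):
    """Return (round, column, first_value, second_value) for differing entries."""
    if len(first) != len(second):
        raise ValueError("both runs must cover the same number of rounds")
    mismatches = []
    for row_a, row_b in zip(first, second):
        for column, (a, b) in enumerate(zip(row_a, row_b)):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-6):
                mismatches.append((row_a[0], column, a, b))
    return mismatches


# Placeholder rows in the same layout as the results tables:
# [round, aggregated score, training loss, local score]
simulation_example = [[0, 0.10, 0.50, 0.90], [1, 0.90, 0.20, 0.95]]
federation_example = [[0, 0.10, 0.50, 0.90], [1, 0.90, 0.21, 0.95]]

mismatches = compare_results(simulation_example, federation_example)
print(mismatches)
```

In the notebook, the same helper could be called with `simulation_results` and `federation_results`; an empty list would indicate that both runtimes produced matching metrics within the chosen tolerance.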
