# Federated Evaluation with MNIST

### Introduction

Welcome to the first OpenFL Federated Evaluation Workflow Interface tutorial! This notebook demonstrates OpenFL's ability to run a horizontal federated evaluation workflow. It has the following goals:

- Provide a template for federated evaluation that exposes key metrics after an evaluation run (e.g., model accuracy)
- Build on the first Workflow API learning example ([101 MNIST Notebook](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb)) and use the MNIST dataset to perform federated evaluation (FedEval)

# Getting Started

First we start by installing OpenFL as per [installation guide](https://openfl.readthedocs.io/en/latest/installation.html).

The next step is to install the Workflow API dependencies, and PyTorch for the ML part.

In [None]:
!pip install -r ../../workflow_interface_requirements.txt
# Uncomment this section if running in Google Colab
#!pip install -r https://raw.githubusercontent.com/securefederatedai/openfl/refs/heads/develop/openfl-tutorials/experimental/workflow/workflow_interface_requirements.txt
!pip install torch==2.7.0 torchvision==0.22.0

In [None]:
#| default_exp experiment

One foundational prerequisite for evaluation is a pre-trained model, and that is exactly what this notebook expects:
- A pre-trained model that can be loaded for evaluation

For this tutorial, we use the final output model of a [101 MNIST Notebook](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb) run. A copy of it is saved at [Pre-trained model](../pretrainedmodels/cnn_mnist.pth).


Let's first define our dataloader, model, and some helper functions, as we would for any other deep learning experiment. Note how this setup differs from the typical training setup in the [101 MNIST Notebook](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb):
- There is no need to download the training set, since we only perform evaluation
- No optimizer settings are needed

In [None]:
# | export

import torch.nn as nn
import torch.nn.functional as F
import torch
import torchvision
import requests

# Set parameters (evaluation only, so no learning rate or optimizer is needed)
batch_size_test = 1000

# Set random seed for reproducibility
random_seed = 1
torch.backends.cudnn.enabled = False
torch.manual_seed(random_seed)

# Load and transform the MNIST test dataset
mnist_test = torchvision.datasets.MNIST(
    "./files/",
    train=False,
    download=True,
    transform=torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,)),
        ]
    ),
)

class Net(nn.Module):
    """
    A Convolutional Neural Network (CNN) for digit classification on the MNIST dataset.

    Architecture:
    - Conv2D layer with 10 filters, kernel size 5
    - MaxPooling layer with size 2
    - Conv2D layer with 20 filters, kernel size 5
    - Dropout2D for regularization
    - MaxPooling layer with size 2
    - Fully connected layer (320 -> 50)
    - Dropout
    - Fully connected output layer (50 -> 10)
    - LogSoftmax activation at output
    """
    def __init__(self):
        """
        Initialize CNN layers.
        """
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        """
        Define the forward pass of the network.

        Args:
            x (Tensor): Input tensor of shape (batch_size, 1, 28, 28)

        Returns:
            Tensor: Log-probabilities for each digit class
        """
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        # Normalize over the class dimension explicitly (implicit dim is deprecated)
        return F.log_softmax(x, dim=1)

def inference(network, test_loader):
    """
    Evaluate the trained model on the test dataset.

    Args:
        network (nn.Module): Trained neural network model.
        test_loader (DataLoader): DataLoader for the test dataset.

    Returns:
        float: Accuracy of the model on the test dataset.
    """
    network.eval()
    test_loss = 0.0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = network(data)
            # Sum per-sample losses here; the total is divided by the dataset size below
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    num_samples = len(test_loader.dataset)
    test_loss /= num_samples
    accuracy = correct / num_samples
    print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, num_samples, 100. * accuracy))
    return accuracy
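
The `320` used by `fc1` and by `x.view(-1, 320)` comes from the spatial arithmetic of the two convolution and pooling stages on a 28×28 input. Below is a small pure-Python sketch of that calculation. It assumes the settings `Net` uses here: stride 1 and no padding for `nn.Conv2d`, and 2×2 pooling windows whose stride equals the window size. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a 2D convolution with a square input and kernel."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling when the stride equals the kernel size."""
    return size // kernel


def flattened_features(input_size=28):
    """Trace one MNIST image through Net's conv/pool stack; return the flattened length."""
    size = pool_out(conv_out(input_size, kernel=5), 2)  # conv1 + max_pool2d: 28 -> 24 -> 12
    size = pool_out(conv_out(size, kernel=5), 2)        # conv2 + max_pool2d: 12 -> 8 -> 4
    channels = 20                                       # conv2 has 20 output channels
    return channels * size * size


print(flattened_features())  # 320
```

If the input resolution or the kernel sizes changed, this number, and therefore `fc1`'s input size, would have to change with it.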

Next we import the `FLSpec`, `LocalRuntime`, and placement decorators.

- `FLSpec` – Defines the flow specification. User-defined flows are subclasses of this.
- `Runtime` – Defines where the flow runs and the infrastructure for task transitions (how information is sent). The `LocalRuntime` runs the flow on a single node.
- `aggregator`/`collaborator` – placement decorators that define where each task is assigned

Now we come to the flow definition. The OpenFL Workflow Interface adopts the conventions set by Metaflow: every workflow begins with the `start` task and concludes with the `end` task. The aggregator begins with an optionally passed-in model. If none is given, a freshly initialized `Net` is used.

In the `start` task, the aggregator extracts the list of collaborators from the runtime (`self.collaborators = self.runtime.collaborators`). It then uses this list as the participants that run the task named in `self.next`, here `evaluate`. The model, and any other attribute not explicitly excluded in the `self.next` call (here `private`), is passed from the `start` task on the aggregator to the `evaluate` task on each collaborator. Where a task runs is determined by the placement decorator that precedes its definition (`@aggregator` or `@collaborator`).

Each collaborator evaluates the model on its private test data and prints its own accuracy. No training takes place. Once every collaborator defined in the runtime has completed `evaluate`, the results `join` at the aggregator, which computes and prints the average accuracy across collaborators. If `rounds` is greater than 1, `join` loops back to `evaluate`; otherwise the flow proceeds to `end`.

![FedEval.png](../../../../../docs/images/FedEval.png)
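
The fan-out/fan-in pattern described above can be mimicked in plain Python to make the data movement concrete. This sketch does not involve OpenFL at all. `fake_evaluate`, `run_flow`, the toy `parity_model`, and the sample data are hypothetical placeholders. The point is only to show three steps:
- the aggregator's state is handed to each collaborator minus the excluded attributes;
- each collaborator evaluates against data only it holds;
- `join` averages the scores.

```python
from copy import deepcopy


def fake_evaluate(model, private_data):
    """Hypothetical stand-in for `inference`: accuracy = correct / total."""
    correct = sum(1 for x, y in private_data if model(x) == y)
    return correct / len(private_data)


def run_flow(model, private_data_by_collaborator, exclude=("private",)):
    """Mimic start -> evaluate (foreach collaborator) -> join, without OpenFL."""
    # start (aggregator): flow attributes, some of which are excluded from the hand-off
    state = {"model": model, "private": 10}
    shared = {k: v for k, v in state.items() if k not in exclude}

    # evaluate (collaborator): each collaborator gets its own copy of the shared state
    # and evaluates it against data that only it can see
    scores = {}
    for name, data in private_data_by_collaborator.items():
        local_state = deepcopy(shared)
        scores[name] = fake_evaluate(local_state["model"], data)

    # join (aggregator): unweighted mean of the per-collaborator scores
    return scores, sum(scores.values()) / len(scores)


def parity_model(x):
    """Toy 'model' that predicts whether x is odd (1) or even (0)."""
    return x % 2


private_data = {
    "Bengaluru": [(1, 1), (2, 0), (3, 0), (4, 0)],  # (3, 0) is a deliberately wrong label
    "Portland": [(5, 1), (6, 0)],
}
per_collab, mean_accuracy = run_flow(parity_model, private_data)
print(per_collab)     # {'Bengaluru': 0.75, 'Portland': 1.0}
print(mean_accuracy)  # 0.875
```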

In [None]:
# | export

from copy import deepcopy

from openfl.experimental.workflow.interface import FLSpec, Aggregator, Collaborator
from openfl.experimental.workflow.runtime import LocalRuntime
from openfl.experimental.workflow.placement import aggregator, collaborator


class FederatedEvaluationFlow(FLSpec):

    def __init__(self, model=None, rounds=1, **kwargs):
        super().__init__(**kwargs)
        if model is not None:
            self.model = model
        else:
            self.model = Net()
            
        self.rounds = rounds

    @aggregator
    def start(self):
        print(f'Performing initialization for model')
        self.collaborators = self.runtime.collaborators
        self.private = 10
        self.current_round = 0
        self.next(self.evaluate, foreach='collaborators', exclude=['private'])

    @collaborator
    def evaluate(self):
        print(f'Performing model evaluation for collaborator {self.input}')
        self.agg_validation_score = inference(self.model, self.test_loader)
        print(f'{self.input} value of {self.agg_validation_score}')
        self.next(self.join)

    @aggregator
    def join(self, inputs):
        self.aggregated_model_accuracy = sum(
            input.agg_validation_score for input in inputs) / len(inputs)
        print(f'Average aggregated model accuracy values = {self.aggregated_model_accuracy}')

        self.current_round += 1
        if self.current_round < self.rounds:
            self.next(self.evaluate,
                      foreach='collaborators', exclude=['private'])
        else:
            self.next(self.end)

    @aggregator
    def end(self):
        print(f'This is the end of the flow')
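
`join` weights every collaborator's accuracy equally. That matches accuracy over the pooled test data only when all collaborators hold the same number of samples. This holds in this tutorial, because the strided split gives each of the two collaborators 5,000 of the 10,000 MNIST test images. If shards were uneven, a sample-weighted mean would be needed instead. Below is a minimal sketch comparing the two. It assumes each collaborator also reported how many samples it evaluated, which the flow above does not do. The function names are hypothetical.

```python
def unweighted_mean(scores):
    """What `join` computes: every collaborator counts equally."""
    return sum(scores) / len(scores)


def weighted_mean(scores, counts):
    """Sample-weighted mean: equals accuracy over the pooled data."""
    total = sum(counts)
    return sum(score * n for score, n in zip(scores, counts)) / total


scores = [0.9, 0.6]
print(unweighted_mean(scores))            # 0.75
print(weighted_mean(scores, [900, 100]))  # 0.87 (810 + 60 correct out of 1000)
print(weighted_mean(scores, [500, 500]))  # 0.75, same as unweighted for equal shards
```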

Now let's set up the participants in a similar fashion to the basic learning/training tutorial. Note the difference: since we are only doing evaluation, there is no need to configure training data, targets, or a training data loader. Each collaborator only receives a `test_loader` over its shard of the MNIST test set.

In [None]:
# | export

# Setup participants
aggregator = Aggregator()
aggregator.private_attributes = {}

# Setup collaborators with private attributes
n_collaborators = 2
collaborator_names = ["Bengaluru", "Portland"]
collaborators = [Collaborator(name=name) for name in collaborator_names]
for idx, collaborator in enumerate(collaborators):
    local_test = deepcopy(mnist_test)
    local_test.data = mnist_test.data[idx::len(collaborators)]
    local_test.targets = mnist_test.targets[idx::len(collaborators)]
    collaborator.private_attributes = {
        'test_loader': torch.utils.data.DataLoader(local_test, batch_size=batch_size_test, shuffle=True)
    }

local_runtime = LocalRuntime(aggregator=aggregator, collaborators=collaborators, backend='single_process')
print(f'Local runtime collaborators = {local_runtime.collaborators}')
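
Slicing with `[idx::len(collaborators)]` deals the test set out round-robin. With two collaborators, the first gets every even-indexed sample and the second every odd-indexed one. Below is a pure-Python sketch of that split on plain indices; the `strided_shards` helper is hypothetical. It checks that the shards are equal-sized, disjoint, and together cover the whole test set.

```python
def strided_shards(num_items, num_shards):
    """Split indices 0..num_items-1 the same way as data[idx::num_shards]."""
    indices = list(range(num_items))
    return [indices[idx::num_shards] for idx in range(num_shards)]


shards = strided_shards(10_000, 2)   # MNIST test set size, two collaborators
print([len(s) for s in shards])      # [5000, 5000]
print(shards[0][:3], shards[1][:3])  # [0, 2, 4] [1, 3, 5]

# Every index lands in exactly one shard, so the collaborators' test sets are disjoint
all_indices = sorted(i for shard in shards for i in shard)
print(all_indices == list(range(10_000)))  # True
```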

Now that we have our evaluation flow and runtime defined, let's run the experiment! Since this is evaluation only, a single round is enough. First, we download and load the pre-trained model.

In [None]:
# Download the pre-trained weights
url = 'https://github.com/securefederatedai/openfl/raw/refs/heads/develop/openfl-tutorials/experimental/pretrainedmodels/cnn_mnist.pth'
response = requests.get(url, timeout=60)
response.raise_for_status()  # fail early instead of saving an error page as the model file
with open('cnn_mnist.pth', 'wb') as f:
    f.write(response.content)

# Load the model (map to CPU so this also works on machines without a GPU)
model = Net()
model.load_state_dict(torch.load('cnn_mnist.pth', map_location='cpu'))
best_model = model
flflow = FederatedEvaluationFlow(model, checkpoint=True)
flflow.runtime = local_runtime
flflow.run()

Now that the flow has completed, let's get the model accuracy

In [None]:
print(f'\nFinal model accuracy for {flflow.rounds} rounds of evaluation: {flflow.aggregated_model_accuracy}')

The reported accuracy should be within about ±0.05 of the accuracy of the pre-trained model used in this experiment, which is ~0.846.

Now that the flow is complete, let's dig into some of the information captured along the way

In [None]:
run_id = flflow._run_id
from metaflow import Metaflow, Flow, Task, Step
m = Metaflow()
s = Step(f'FederatedEvaluationFlow/{run_id}/evaluate')
list(s)

Now we see **2** tasks: **2** collaborators each performed **1** round of model evaluation.
Let's look at one of those data points.

In [None]:
t = Task(f'FederatedEvaluationFlow/{run_id}/evaluate/2')

Now let's look at the data artifacts this task generated

In [None]:
t.data

In [None]:
t.data.input

Now let's look at its log output (stdout)

In [None]:
print(t.stdout)

For more details on checkpointing and on using Metaflow to inspect a federation run, please refer to earlier learning tutorials such as the [101 MNIST Notebook](https://github.com/securefederatedai/openfl/tree/develop/openfl-tutorials/experimental/workflow/101_MNIST.ipynb).

### Setup Federation: Director & Envoys

Before we can deploy the experiment, let us create the participants in the Federation: the Director and the Envoys. Since this tutorial uses two collaborators, we launch three participants:

1. Director: The central node in the Federation
2. Bengaluru: The first envoy in the Federation
3. Portland: The second envoy in the Federation
   
The participants can be launched by following the steps in the [README](https://github.com/securefederatedai/openfl/blob/develop/openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/README.md).

### Deploy: FederatedRuntime

We now import and instantiate `FederatedRuntime` to deploy the experiment on distributed infrastructure. Initializing `FederatedRuntime` requires the user to provide the following inputs:

- `director` – director information, including the FQDN of the director node, the port, and certificate information
- `collaborators` – names of the collaborators participating in the experiment
- `notebook_path` – path to this Jupyter notebook

In [None]:
#| export

from openfl.experimental.workflow.runtime import FederatedRuntime

director_info = {
    'director_node_fqdn': 'localhost',
    'director_port': 50050,
}

federated_runtime = FederatedRuntime(
    collaborators=collaborator_names,
    director=director_info,
    notebook_path='./MNIST_FederatedEvaluation.ipynb'
)

Let us connect to the federation and check whether the envoys are connected to the director by using the `get_envoys` method of `FederatedRuntime`. If the participants were launched successfully in the previous step, the status of `Bengaluru` and `Portland` should be displayed as `Online`.

In [None]:
federated_runtime.get_envoys()

Now that our distributed infrastructure is ready, let us switch the flow runtime to the `FederatedRuntime` instance and deploy the experiment.

The progress of the flow is available in:
1. The Jupyter notebook, if the `checkpoint` attribute of the flow object is set to `True`
2. The Director and Envoy terminals

In [None]:
# Download the pre-trained weights
url = 'https://github.com/securefederatedai/openfl/raw/refs/heads/develop/openfl-tutorials/experimental/pretrainedmodels/cnn_mnist.pth'
response = requests.get(url, timeout=60)
response.raise_for_status()  # fail early instead of saving an error page as the model file
with open('cnn_mnist.pth', 'wb') as f:
    f.write(response.content)
print("Downloaded cnn_mnist.pth")

# Load the model (map to CPU so this also works on machines without a GPU)
model = Net()
model.load_state_dict(torch.load('cnn_mnist.pth', map_location='cpu'))
print("Loaded pre-trained weights from cnn_mnist.pth")
best_model = model
flflow = FederatedEvaluationFlow(model, checkpoint=True)
flflow.runtime = federated_runtime
flflow.run()

Now that the flow has completed, let's get the model accuracy

In [None]:
print(f'\nFinal model accuracy for {flflow.rounds} rounds of evaluation: {flflow.aggregated_model_accuracy}')

# Congratulations!
You've successfully completed your first Federated Evaluation Workflow Interface quickstart notebook!