# Building a Strategy

Welcome to the third part of the Flower federated learning tutorial. In previous parts of this tutorial, we introduced federated learning with PyTorch and Flower ([part 1](https://flower.dev/docs/tutorial/Flower-1-Intro-to-FL-PyTorch.html)) and we learned how strategies can be used to customize the execution on both the server and the clients ([part 2](https://flower.dev/docs/tutorial/Flower-2-Strategies-in-FL-PyTorch.html)).

In this notebook, we'll continue to customize the federated learning system we built previously by creating a custom version of FedAvg (again, using [Flower](https://flower.dev/) and [PyTorch](https://pytorch.org/)).

> Join the Flower community on Slack to connect, ask questions, and get help: [Join Slack](https://flower.dev/join-slack) 🌻 We'd love to hear from you in the `#introductions` channel! If anything is unclear, head over to the `#questions` channel.

Let's build a new `Strategy` from scratch!

## Preparation

Before we begin with the actual code, let's make sure that we have everything we need.

### Installing dependencies

First, we install the necessary packages:

In [1]:
%pip install -q flwr[simulation] torch torchvision

Note: you may need to restart the kernel to use updated packages.


Now that we have all dependencies installed, we can import everything we need for this tutorial:

In [2]:
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10

import flwr as fl

DEVICE = torch.device("cpu")  # Try "cuda" to train on GPU
print(
    f"Training on {DEVICE} using PyTorch {torch.__version__} and Flower {fl.__version__}"
)

Training on cpu using PyTorch 1.13.1+cu117 and Flower 1.3.0


It is possible to switch to a runtime that has GPU acceleration enabled (on Google Colab: `Runtime > Change runtime type > Hardware accelerator: GPU > Save`). Note, however, that Google Colab is not always able to offer GPU acceleration. If you see an error related to GPU availability in one of the following sections, consider switching back to CPU-based execution by setting `DEVICE = torch.device("cpu")`. If the runtime has GPU acceleration enabled, you should see the output `Training on cuda`; otherwise it will say `Training on cpu`.

### Data loading

Let's now load the CIFAR-10 training and test sets, partition the training set into ten smaller datasets (each split into a training and a validation set), and wrap each of them in its own `DataLoader`. We introduce a new parameter `num_clients`, which allows us to call `load_datasets` with different numbers of clients.

In [3]:
NUM_CLIENTS = 10


def load_datasets(num_clients: int):
    # Download and transform CIFAR-10 (train and test)
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    trainset = CIFAR10("./dataset", train=True, download=True, transform=transform)
    testset = CIFAR10("./dataset", train=False, download=True, transform=transform)

    # Split training set into `num_clients` partitions to simulate different local datasets
    partition_size = len(trainset) // num_clients
    lengths = [partition_size] * num_clients
    datasets = random_split(trainset, lengths, torch.Generator().manual_seed(42))

    # Split each partition into train/val and create DataLoader
    trainloaders = []
    valloaders = []
    for ds in datasets:
        len_val = len(ds) // 10  # 10 % validation set
        len_train = len(ds) - len_val
        lengths = [len_train, len_val]
        ds_train, ds_val = random_split(ds, lengths, torch.Generator().manual_seed(42))
        trainloaders.append(DataLoader(ds_train, batch_size=32, shuffle=True))
        valloaders.append(DataLoader(ds_val, batch_size=32))
    testloader = DataLoader(testset, batch_size=32)
    return trainloaders, valloaders, testloader


trainloaders, valloaders, testloader = load_datasets(NUM_CLIENTS)

Files already downloaded and verified
Files already downloaded and verified
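
For orientation, the split sizes implied by `load_datasets` can be worked out by hand. The sketch below repeats only the integer arithmetic from the function above; the helper name `split_sizes` is hypothetical, and the figure of 50,000 images is the size of the CIFAR-10 training set.

```python
from typing import Tuple


def split_sizes(num_train_examples: int, num_clients: int) -> Tuple[int, int, int]:
    """Return (partition size, training size, validation size) per client."""
    partition_size = num_train_examples // num_clients
    len_val = partition_size // 10  # 10 % of each partition is held out
    len_train = partition_size - len_val
    return partition_size, len_train, len_val


print(split_sizes(50_000, 10))  # (5000, 4500, 500)
```

`random_split` expects integer lengths that sum to the dataset size, so this scheme assumes that `num_clients` divides the training set evenly.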


### Model training/evaluation

Let's continue with the usual model definition (including `set_parameters` and `get_parameters`), training and test functions:

In [4]:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


def get_parameters(net) -> List[np.ndarray]:
    return [val.cpu().numpy() for _, val in net.state_dict().items()]


def set_parameters(net, parameters: List[np.ndarray]):
    params_dict = zip(net.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
    net.load_state_dict(state_dict, strict=True)


def train(net, trainloader, epochs: int):
    """Train the network on the training set."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters())
    net.train()
    for epoch in range(epochs):
        correct, total, epoch_loss = 0, 0, 0.0
        for images, labels in trainloader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            outputs = net(images)
            loss = criterion(outputs, labels)  # reuse the forward pass from above
            loss.backward()
            optimizer.step()
            # Metrics
            epoch_loss += loss.item()  # accumulate a plain float, not a tensor
            total += labels.size(0)
            correct += (torch.max(outputs.data, 1)[1] == labels).sum().item()
        epoch_loss /= len(trainloader.dataset)
        epoch_acc = correct / total
        print(f"Epoch {epoch+1}: train loss {epoch_loss}, accuracy {epoch_acc}")


def test(net, testloader):
    """Evaluate the network on the entire test set."""
    criterion = torch.nn.CrossEntropyLoss()
    correct, total, loss = 0, 0, 0.0
    net.eval()
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = net(images)
            loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    loss /= len(testloader.dataset)
    accuracy = correct / total
    return loss, accuracy

### Flower client

To implement the Flower client, we (again) create a subclass of `flwr.client.NumPyClient` and implement the three methods `get_parameters`, `fit`, and `evaluate`. Here, we also pass the `cid` to the client and use it to log additional details:

In [5]:
class FlowerClient(fl.client.NumPyClient):
    def __init__(self, cid, net, trainloader, valloader):
        self.cid = cid
        self.net = net
        self.trainloader = trainloader
        self.valloader = valloader

    def get_parameters(self, config):
        print(f"[Client {self.cid}] get_parameters")
        return get_parameters(self.net)

    def fit(self, parameters, config):
        print(f"[Client {self.cid}] fit, config: {config}")
        set_parameters(self.net, parameters)
        train(self.net, self.trainloader, epochs=1)
        return get_parameters(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        print(f"[Client {self.cid}] evaluate, config: {config}")
        set_parameters(self.net, parameters)
        loss, accuracy = test(self.net, self.valloader)
        return float(loss), len(self.valloader), {"accuracy": float(accuracy)}


def client_fn(cid) -> FlowerClient:
    net = Net().to(DEVICE)
    trainloader = trainloaders[int(cid)]
    valloader = valloaders[int(cid)]
    return FlowerClient(cid, net, trainloader, valloader)

Let's test what we have so far before we continue:

In [6]:
# Specify client resources if you need GPU (defaults to 1 CPU and 0 GPU)
client_resources = None
if DEVICE.type == "cuda":
    client_resources = {"num_gpus": 1}

fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=2,
    config=fl.server.ServerConfig(num_rounds=3),
    client_resources=client_resources,
)

INFO flwr 2023-02-27 16:09:24,834 | app.py:145 | Starting Flower simulation, config: ServerConfig(num_rounds=3, round_timeout=None)
2023-02-27 16:09:36,432	INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO flwr 2023-02-27 16:09:41,310 | app.py:179 | Flower VCE: Ray initialized with resources: {'node:10.246.68.42': 1.0, 'CPU': 8.0, 'memory': 1535262720.0, 'object_store_memory': 767631360.0}
INFO flwr 2023-02-27 16:09:41,311 | server.py:86 | Initializing global parameters
INFO flwr 2023-02-27 16:09:41,312 | server.py:270 | Requesting initial parameters from one random client
(launch_and_get_parameters pid=21895) [Client 0] get_parameters
(launch_and_fit pid=21893) [Client 1] fit, config: {}
(launch_and_fit pid=21897) [Client 0] fit, config: {}
(launch_and_fit pid=21893) Epoch 1: train loss 0.06428961455821991, accuracy 0.24355555555555555


DEBUG flwr 2023-02-27 16:09:57,277 | server.py:229 | fit_round 1 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:09:57,296 | server.py:165 | evaluate_round 1: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=21897) Epoch 1: train loss 0.06426512449979782, accuracy 0.2371111111111111
(launch_and_evaluate pid=21893) [Client 1] evaluate, config: {}
(launch_and_evaluate pid=21893) [Client 0] evaluate, config: {}


DEBUG flwr 2023-02-27 16:10:16,989 | server.py:179 | evaluate_round 1 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:10:16,993 | server.py:215 | fit_round 2: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=21893) [Client 1] fit, config: {}
(launch_and_fit pid=21893) Epoch 1: train loss 0.05502162501215935, accuracy 0.3497777777777778
(launch_and_fit pid=21894) [Client 0] fit, config: {}


DEBUG flwr 2023-02-27 16:10:32,211 | server.py:229 | fit_round 2 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:10:32,221 | server.py:165 | evaluate_round 2: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=21894) Epoch 1: train loss 0.05576556548476219, accuracy 0.3462222222222222
(launch_and_evaluate pid=21893) [Client 0] evaluate, config: {}


DEBUG flwr 2023-02-27 16:10:42,546 | server.py:179 | evaluate_round 2 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:10:42,547 | server.py:215 | fit_round 3: strategy sampled 2 clients (out of 2)


(launch_and_evaluate pid=21896) [Client 1] evaluate, config: {}
(launch_and_fit pid=21896) [Client 1] fit, config: {}
(launch_and_fit pid=21893) [Client 0] fit, config: {}


DEBUG flwr 2023-02-27 16:10:50,741 | server.py:229 | fit_round 3 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:10:50,752 | server.py:165 | evaluate_round 3: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=21893) Epoch 1: train loss 0.05216120183467865, accuracy 0.3868888888888889
(launch_and_fit pid=21896) Epoch 1: train loss 0.05108512565493584, accuracy 0.3957777777777778
(launch_and_evaluate pid=21893) [Client 1] evaluate, config: {}
(launch_and_evaluate pid=21896) [Client 0] evaluate, config: {}


DEBUG flwr 2023-02-27 16:10:55,966 | server.py:179 | evaluate_round 3 received 2 results and 0 failures
INFO flwr 2023-02-27 16:10:55,967 | server.py:144 | FL finished in 69.31255890000011
INFO flwr 2023-02-27 16:10:55,969 | app.py:202 | app_fit: losses_distributed [(1, 0.0634206577539444), (2, 0.054430290699005124), (3, 0.05187856781482697)]
INFO flwr 2023-02-27 16:10:55,971 | app.py:203 | app_fit: metrics_distributed {}
INFO flwr 2023-02-27 16:10:55,972 | app.py:204 | app_fit: losses_centralized []
INFO flwr 2023-02-27 16:10:55,975 | app.py:205 | app_fit: metrics_centralized {}


History (loss, distributed):
	round 1: 0.0634206577539444
	round 2: 0.054430290699005124
	round 3: 0.05187856781482697
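
The distributed losses in this history are aggregated from the `(loss, num_examples)` pairs that each client returns from `evaluate`: every client's loss is weighted by the number of examples it reports. A minimal pure-Python sketch of that weighting follows. The function name `weighted_loss_avg_sketch` and the input values are made up for illustration and are not taken from the run above.

```python
from typing import List, Tuple


def weighted_loss_avg_sketch(results: List[Tuple[int, float]]) -> float:
    """Average losses, weighting each by its client's number of examples."""
    total_examples = sum(num_examples for num_examples, _ in results)
    weighted_sum = sum(num_examples * loss for num_examples, loss in results)
    return weighted_sum / total_examples


# Illustrative inputs: (num_examples, loss) for two hypothetical clients.
print(weighted_loss_avg_sketch([(10, 1.0), (30, 2.0)]))  # 1.75
```

A client that reports more examples pulls the aggregated loss toward its own value; with equal example counts, the result reduces to a plain mean.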

## Build a Strategy from scratch

Let's override the `configure_fit` method so that it passes a higher learning rate (and potentially other hyperparameters), intended for the optimizer, to a fraction of the clients. We keep the client sampling as it is in `FedAvg` and only change the configuration dictionary (one of the `FitIns` attributes) that each sampled client receives.

In [7]:
from typing import Callable, Union

from flwr.common import (
    EvaluateIns,
    EvaluateRes,
    FitIns,
    FitRes,
    MetricsAggregationFn,
    NDArrays,
    Parameters,
    Scalar,
    ndarrays_to_parameters,
    parameters_to_ndarrays,
)
from flwr.server.client_manager import ClientManager
from flwr.server.client_proxy import ClientProxy
from flwr.server.strategy.aggregate import aggregate, weighted_loss_avg


class FedCustom(fl.server.strategy.Strategy):
    def __init__(
        self,
        fraction_fit: float = 1.0,
        fraction_evaluate: float = 1.0,
        min_fit_clients: int = 2,
        min_evaluate_clients: int = 2,
        min_available_clients: int = 2,
    ) -> None:
        super().__init__()
        self.fraction_fit = fraction_fit
        self.fraction_evaluate = fraction_evaluate
        self.min_fit_clients = min_fit_clients
        self.min_evaluate_clients = min_evaluate_clients
        self.min_available_clients = min_available_clients

    def __repr__(self) -> str:
        return "FedCustom"

    def initialize_parameters(
        self, client_manager: ClientManager
    ) -> Optional[Parameters]:
        """Initialize global model parameters."""
        net = Net()
        ndarrays = get_parameters(net)
        return fl.common.ndarrays_to_parameters(ndarrays)

    def configure_fit(
        self, server_round: int, parameters: Parameters, client_manager: ClientManager
    ) -> List[Tuple[ClientProxy, FitIns]]:
        """Configure the next round of training."""

        # Sample clients
        sample_size, min_num_clients = self.num_fit_clients(
            client_manager.num_available()
        )
        clients = client_manager.sample(
            num_clients=sample_size, min_num_clients=min_num_clients
        )

        # Create custom configs
        n_clients = len(clients)
        half_clients = n_clients // 2
        standard_config = {"lr": 0.001}
        higher_lr_config = {"lr": 0.003}
        fit_configurations = []
        for idx, client in enumerate(clients):
            if idx < half_clients:
                fit_configurations.append((client, FitIns(parameters, standard_config)))
            else:
                fit_configurations.append(
                    (client, FitIns(parameters, higher_lr_config))
                )
        return fit_configurations

    def aggregate_fit(
        self,
        server_round: int,
        results: List[Tuple[ClientProxy, FitRes]],
        failures: List[Union[Tuple[ClientProxy, FitRes], BaseException]],
    ) -> Tuple[Optional[Parameters], Dict[str, Scalar]]:
        """Aggregate fit results using weighted average."""

        weights_results = [
            (parameters_to_ndarrays(fit_res.parameters), fit_res.num_examples)
            for _, fit_res in results
        ]
        parameters_aggregated = ndarrays_to_parameters(aggregate(weights_results))
        metrics_aggregated = {}
        return parameters_aggregated, metrics_aggregated

    def configure_evaluate(
        self, server_round: int, parameters: Parameters, client_manager: ClientManager
    ) -> List[Tuple[ClientProxy, EvaluateIns]]:
        """Configure the next round of evaluation."""
        if self.fraction_evaluate == 0.0:
            return []
        config = {}
        evaluate_ins = EvaluateIns(parameters, config)

        # Sample clients
        sample_size, min_num_clients = self.num_evaluation_clients(
            client_manager.num_available()
        )
        clients = client_manager.sample(
            num_clients=sample_size, min_num_clients=min_num_clients
        )

        # Return client/config pairs
        return [(client, evaluate_ins) for client in clients]

    def aggregate_evaluate(
        self,
        server_round: int,
        results: List[Tuple[ClientProxy, EvaluateRes]],
        failures: List[Union[Tuple[ClientProxy, EvaluateRes], BaseException]],
    ) -> Tuple[Optional[float], Dict[str, Scalar]]:
        """Aggregate evaluation losses using weighted average."""

        if not results:
            return None, {}

        loss_aggregated = weighted_loss_avg(
            [
                (evaluate_res.num_examples, evaluate_res.loss)
                for _, evaluate_res in results
            ]
        )
        metrics_aggregated = {}
        return loss_aggregated, metrics_aggregated

    def evaluate(
        self, server_round: int, parameters: Parameters
    ) -> Optional[Tuple[float, Dict[str, Scalar]]]:
        """Evaluate global model parameters using an evaluation function."""

        # Let's assume we won't perform the global model evaluation on the server side.
        return None

    def num_fit_clients(self, num_available_clients: int) -> Tuple[int, int]:
        """Return sample size and required number of clients."""
        num_clients = int(num_available_clients * self.fraction_fit)
        return max(num_clients, self.min_fit_clients), self.min_available_clients

    def num_evaluation_clients(self, num_available_clients: int) -> Tuple[int, int]:
        """Use a fraction of available clients for evaluation."""
        num_clients = int(num_available_clients * self.fraction_evaluate)
        return max(num_clients, self.min_evaluate_clients), self.min_available_clients
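
One caveat: the `FlowerClient` defined earlier prints the `config` it receives in `fit` but never applies it, and `train` creates `torch.optim.Adam` with its default learning rate. For the `lr` values set in `configure_fit` to influence training, the client has to read them. A minimal sketch of that lookup follows; the helper name `lr_from_config` and its fallback value are assumptions made for illustration and are not part of Flower.

```python
from typing import Dict, Union

# Mirrors the value types Flower allows in config dictionaries (flwr.common.Scalar).
Scalar = Union[bool, bytes, float, int, str]


def lr_from_config(config: Dict[str, Scalar], default: float = 0.001) -> float:
    """Return the learning rate sent by the strategy, or `default` if absent."""
    return float(config.get("lr", default))


# The two dictionaries FedCustom.configure_fit sends, plus an empty config.
print(lr_from_config({"lr": 0.003}))
print(lr_from_config({"lr": 0.001}))
print(lr_from_config({}))  # falls back to the default
```

Inside `FlowerClient.fit`, the returned value could be forwarded to a variant of `train` that accepts a learning rate and passes it on as `torch.optim.Adam(net.parameters(), lr=lr)`; that extra parameter is likewise hypothetical here.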

The only thing left is to use the newly created custom Strategy `FedCustom` when starting the experiment:

In [8]:
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=2,
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=FedCustom(),  # <-- pass the new strategy here
    client_resources=client_resources,
)

INFO flwr 2023-02-27 16:10:58,958 | app.py:145 | Starting Flower simulation, config: ServerConfig(num_rounds=3, round_timeout=None)
2023-02-27 16:11:17,094	INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO flwr 2023-02-27 16:11:27,684 | app.py:179 | Flower VCE: Ray initialized with resources: {'CPU': 8.0, 'node:10.246.68.42': 1.0, 'memory': 2242046363.0, 'object_store_memory': 1121023180.0}
INFO flwr 2023-02-27 16:11:27,694 | server.py:86 | Initializing global parameters
INFO flwr 2023-02-27 16:11:27,730 | server.py:266 | Using initial parameters provided by strategy
INFO flwr 2023-02-27 16:11:27,734 | server.py:88 | Evaluating initial parameters
INFO flwr 2023-02-27 16:11:27,736 | server.py:101 | FL starting
DEBUG flwr 2023-02-27 16:11:27,737 | server.py:215 | fit_round 1: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22814) [Client 0] fit, config: {'lr': 0.003}
(launch_and_fit pid=22815) [Client 1] fit, config: {'lr': 0.001}


DEBUG flwr 2023-02-27 16:11:49,905 | server.py:229 | fit_round 1 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:11:49,916 | server.py:165 | evaluate_round 1: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22814) Epoch 1: train loss 0.06461510807275772, accuracy 0.23733333333333334
(launch_and_fit pid=22815) Epoch 1: train loss 0.06464684754610062, accuracy 0.21733333333333332
(launch_and_evaluate pid=22814) [Client 0] evaluate, config: {}
(launch_and_evaluate pid=22814) [Client 1] evaluate, config: {}


DEBUG flwr 2023-02-27 16:12:02,792 | server.py:179 | evaluate_round 1 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:12:02,795 | server.py:215 | fit_round 2: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22813) [Client 1] fit, config: {'lr': 0.001}
(launch_and_fit pid=22811) [Client 0] fit, config: {'lr': 0.003}
(launch_and_fit pid=22813) Epoch 1: train loss 0.056005481630563736, accuracy 0.3371111111111111
(launch_and_fit pid=22810) [Client 0] fit, config: {'lr': 0.003}


DEBUG flwr 2023-02-27 16:12:24,933 | server.py:229 | fit_round 2 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:12:24,947 | server.py:165 | evaluate_round 2: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22810) Epoch 1: train loss 0.05676985904574394, accuracy 0.33955555555555555
(launch_and_evaluate pid=22810) [Client 0] evaluate, config: {}
(launch_and_evaluate pid=22813) [Client 1] evaluate, config: {}


DEBUG flwr 2023-02-27 16:12:32,370 | server.py:179 | evaluate_round 2 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:12:32,372 | server.py:215 | fit_round 3: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22810) [Client 1] fit, config: {'lr': 0.001}
(launch_and_fit pid=22810) Epoch 1: train loss 0.05133531987667084, accuracy 0.4011111111111111
(launch_and_fit pid=22807) [Client 0] fit, config: {'lr': 0.003}


DEBUG flwr 2023-02-27 16:12:50,789 | server.py:229 | fit_round 3 received 2 results and 0 failures
DEBUG flwr 2023-02-27 16:12:50,803 | server.py:165 | evaluate_round 3: strategy sampled 2 clients (out of 2)


(launch_and_fit pid=22807) Epoch 1: train loss 0.052270177751779556, accuracy 0.3968888888888889
(launch_and_evaluate pid=22807) [Client 0] evaluate, config: {}
(launch_and_evaluate pid=22807) [Client 1] evaluate, config: {}


DEBUG flwr 2023-02-27 16:12:58,111 | server.py:179 | evaluate_round 3 received 2 results and 0 failures
INFO flwr 2023-02-27 16:12:58,112 | server.py:144 | FL finished in 90.3753501000001
INFO flwr 2023-02-27 16:12:58,114 | app.py:202 | app_fit: losses_distributed [(1, 0.06192973351478577), (2, 0.055736652016639715), (3, 0.05282370221614838)]
INFO flwr 2023-02-27 16:12:58,117 | app.py:203 | app_fit: metrics_distributed {}
INFO flwr 2023-02-27 16:12:58,119 | app.py:204 | app_fit: losses_centralized []
INFO flwr 2023-02-27 16:12:58,123 | app.py:205 | app_fit: metrics_centralized {}


History (loss, distributed):
	round 1: 0.06192973351478577
	round 2: 0.055736652016639715
	round 3: 0.05282370221614838

## Recap

In this notebook, we've seen how to implement a custom strategy. A custom strategy enables granular control over client node configuration, result aggregation, and more. To define a custom strategy, you only have to implement the abstract methods of the (abstract) base class `Strategy`. To make custom strategies even more powerful, you can pass custom functions to the constructor of your new class (`__init__`) and then call these functions whenever needed.
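
The constructor-callback idea can be sketched without any Flower imports. The class below is a hypothetical stand-in that shows only the pattern, loosely modeled on the `on_fit_config_fn` argument of the built-in `FedAvg`; the names `ConfigurableStrategySketch`, `fit_config`, and `decaying_lr` are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Union

Scalar = Union[bool, bytes, float, int, str]
FitConfigFn = Callable[[int], Dict[str, Scalar]]


class ConfigurableStrategySketch:
    """Hypothetical stand-in showing only the constructor-callback pattern."""

    def __init__(self, on_fit_config_fn: Optional[FitConfigFn] = None) -> None:
        self.on_fit_config_fn = on_fit_config_fn

    def fit_config(self, server_round: int) -> Dict[str, Scalar]:
        """Build the config dictionary for one round of training."""
        if self.on_fit_config_fn is None:
            return {}  # no callback supplied: send an empty config
        return self.on_fit_config_fn(server_round)


def decaying_lr(server_round: int) -> Dict[str, Scalar]:
    """Example callback: start at 0.003 and halve the learning rate each round."""
    return {"lr": 0.003 * 0.5 ** (server_round - 1)}


strategy = ConfigurableStrategySketch(on_fit_config_fn=decaying_lr)
for server_round in (1, 2, 3):
    print(server_round, strategy.fit_config(server_round))
```

In a real `Strategy` subclass, a method like `fit_config` would be called from `configure_fit` to fill the dictionary passed to `FitIns`, so the schedule can change without editing the strategy class itself.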

## Next steps

Before you continue, make sure to join the Flower community on Slack: [Join Slack](https://flower.dev/join-slack/)

There's a dedicated `#questions` channel if you need help, but we'd also love to hear who you are in `#introductions`!

The [Flower Federated Learning Tutorial - Part 4](https://flower.dev/docs/tutorial/Flower-4-Client-and-NumPyClient-PyTorch.html) introduces `Client`, the flexible API underlying `NumPyClient`.