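# Tutorial: Custom Loss Functions for Federated Learning with PyTorch
## Introduction
### Background
In federated learning, and in supervised learning in particular, we usually rely on a loss function to supervise model training. The earlier [introductory tutorial](https://www.secretflow.org.cn/docs/secretflow/latest/zh-Hans/tutorial/Federated_Learning_with_Pytorch_backend) showed how to use `torch.nn.CrossEntropyLoss` through `secretflow.ml.nn.utils.TorchModel`; in the same way, we can use any loss function listed under [torch.nn loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions). But what should we do when a task calls for a loss function of our own? This tutorial answers that question.
### Scope
Note that this tutorial focuses on loss functions whose input has the form $(\hat{y},y)$; custom loss functions outside this form are not discussed.
Specifically, this tutorial shows how to implement the following custom loss: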
$$
Loss(\hat{y},y) = 0.8*CEL(\hat{y},y) + 0.2*MSE(\hat{y},y)
$$
Here, $CEL$ denotes [cross entropy loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) and $MSE$ denotes [mean squared error](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss).

Please keep in mind that this tutorial is only an example that demonstrates the code; it is not guidance for training models in production.

Let's get started!

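## Base tutorial
To keep the tutorial short and focused, we build on [Federated Learning with the PyTorch Backend](https://www.secretflow.org.cn/docs/secretflow/latest/zh-Hans/tutorial/Federated_Learning_with_Pytorch_backend) and concentrate on how to define a custom loss function. So that the code runs end to end, we first copy over the code from that tutorial. If you are already familiar with the original tutorial, you can skip this part.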

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import secretflow as sf

# Check the version of your SecretFlow
print('The version of SecretFlow: {}'.format(sf.__version__))

# In case you have a running secretflow runtime already.
sf.shutdown()

sf.init(['alice', 'bob', 'charlie'], address='local')
alice, bob, charlie = sf.PYU('alice'), sf.PYU('bob'), sf.PYU('charlie')

The version of SecretFlow: 1.4.0.dev20231225


2023-12-29 02:56:32,404	INFO worker.py:1538 -- Started a local Ray instance.


In [3]:
from secretflow.ml.nn.utils import BaseModule, TorchModel
from secretflow.ml.nn.fl.utils import metric_wrapper, optim_wrapper
from secretflow.ml.nn import FLModel
from torchmetrics import Accuracy, Precision
from secretflow.security.aggregation import SecureAggregator
from secretflow.utils.simulation.datasets import load_mnist
from torch import nn, optim
from torch.nn import functional as F



In [4]:
class ConvNet(BaseModule):
    """Small ConvNet for MNIST."""

    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc_in_dim = 192
        self.fc = nn.Linear(self.fc_in_dim, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, self.fc_in_dim)
        x = self.fc(x)
        return F.softmax(x, dim=1)
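
The value `fc_in_dim = 192` follows from the layer shapes: a 28x28 MNIST image becomes 26x26 after the 3x3 convolution and 8x8 after the 3x3 max pooling, and 3 channels x 8 x 8 = 192. The sketch below checks this with plain PyTorch. It is an illustration only: `PlainConvNet` is a hypothetical name, and it inherits from `nn.Module` instead of SecretFlow's `BaseModule` so that it runs without SecretFlow installed.

```python
import torch
from torch import nn
from torch.nn import functional as F


class PlainConvNet(nn.Module):
    """Same layers as ConvNet above, on plain nn.Module (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc_in_dim = 192
        self.fc = nn.Linear(self.fc_in_dim, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, self.fc_in_dim)
        x = self.fc(x)
        return F.softmax(x, dim=1)


batch = torch.randn(4, 1, 28, 28)  # four fake 28x28 grayscale images
net = PlainConvNet()

# Feature map right before flattening: 3 channels of 8x8 -> 192 values
features = F.max_pool2d(net.conv1(batch), 3)
probs = net(batch)

print(features.shape)    # shape of the pooled feature map
print(probs.shape)       # one row of 10 class probabilities per image
print(probs.sum(dim=1))  # each row sums to 1 because of the softmax
```

Because `forward` ends with a softmax, the model output $\hat{y}$ is a vector of class probabilities rather than raw logits.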

## 自定义损失函数
As stated above, we will define the following custom loss function:
$$
Loss(\hat{y},y) = 0.8*CEL(\hat{y},y) + 0.2*MSE(\hat{y},y)
$$
Here, $CEL$ denotes [cross entropy loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) and $MSE$ denotes [mean squared error](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss).
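
For reference, the two terms can be written out explicitly. This is a sketch under assumptions the text does not state outright: a batch of $N$ samples with $C$ classes, one-hot labels $y$ (as `categorical_y=True` in the data loading cell suggests), cross entropy with `reduction='sum'`, and MSE with PyTorch's default mean reduction, matching the `CustomLossFunction` code in this section.

$$
CEL(\hat{y},y) = -\sum_{n=1}^{N}\sum_{c=1}^{C} y_{n,c}\,\log\frac{\exp(\hat{y}_{n,c})}{\sum_{k=1}^{C}\exp(\hat{y}_{n,k})}
$$
$$
MSE(\hat{y},y) = \frac{1}{NC}\sum_{n=1}^{N}\sum_{c=1}^{C}\left(\hat{y}_{n,c}-y_{n,c}\right)^2
$$

Under these assumptions the cross entropy term grows with the batch size, while the MSE term is an average over all $N \cdot C$ entries. Also note that `ConvNet.forward` already returns softmax probabilities, so `CrossEntropyLoss` applies its internal log-softmax on top of them.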

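To implement this custom loss function, we write a class that inherits from [torch.nn.Module](https://github.com/pytorch/pytorch/tree/main/torch/nn/modules) and implements at least two basic methods, `__init__` and `forward`:
- `__init__` runs the initialization code of the class; in this tutorial it creates the two underlying loss functions, `CrossEntropyLoss` and `MSELoss`.
- `forward` runs when an instance of the class is called and contains the computation of the custom loss; here it implements the formula given above.

### Implementing the custom loss function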

In [5]:
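class CustomLossFunction(nn.Module):
    """Weighted sum of cross entropy and MSE: 0.8 * CEL + 0.2 * MSE."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # Cross entropy summed over the batch (reduction='sum').
        self.cross_entropy_loss = nn.CrossEntropyLoss(reduction='sum')
        # MSE with PyTorch's default reduction ('mean').
        self.mse_loss = nn.MSELoss()

    def forward(self, input, target):
        # input: model output y_hat; target: label y
        return 0.8 * self.cross_entropy_loss(input, target) + 0.2 * self.mse_loss(
            input, target
        )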
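
Before plugging the class into federated training, it can help to check it on its own. The sketch below repeats the class so that it is self-contained, feeds it random stand-in data (the tensors `y_hat` and `y` are made up for illustration, with one-hot labels assumed), and compares the result with a manual computation of the two terms.

```python
import torch
from torch import nn
from torch.nn import functional as F


class CustomLossFunction(nn.Module):
    """Copy of the class above so that this sketch runs on its own."""

    def __init__(self):
        super().__init__()
        self.cross_entropy_loss = nn.CrossEntropyLoss(reduction='sum')
        self.mse_loss = nn.MSELoss()

    def forward(self, input, target):
        return 0.8 * self.cross_entropy_loss(input, target) + 0.2 * self.mse_loss(
            input, target
        )


torch.manual_seed(0)
# Stand-in for the model output: 4 samples, 10 class probabilities each
y_hat = torch.softmax(torch.randn(4, 10), dim=1)
# Stand-in for one-hot labels (assumed label format)
y = F.one_hot(torch.tensor([3, 1, 4, 1]), num_classes=10).float()

loss = CustomLossFunction()(y_hat, y)

# Recompute both terms by hand
ce_sum = -(y * torch.log_softmax(y_hat, dim=1)).sum()  # summed over the batch
mse_mean = ((y_hat - y) ** 2).mean()                   # averaged over all entries
manual = 0.8 * ce_sum + 0.2 * mse_mean

print(loss.item(), manual.item())  # the two values should agree
```

The comparison also makes the mixed reductions visible: the cross entropy term is a sum over the batch, the MSE term is a mean.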

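### Specifying the custom loss function
In the cell below, we select our custom loss function with `loss_fn = CustomLossFunction`. Note that we pass the class itself, not an instance.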

In [6]:
(train_data, train_label), (test_data, test_label) = load_mnist(
    parts={alice: 0.4, bob: 0.6},
    normalized_x=True,
    categorical_y=True,
    is_torch=True,
)

# here we use the loss function we defined above
loss_fn = CustomLossFunction

optim_fn = optim_wrapper(optim.Adam, lr=1e-2)
model_def = TorchModel(
    model_fn=ConvNet,
    loss_fn=loss_fn,
    optim_fn=optim_fn,
    metrics=[
        metric_wrapper(Accuracy, task="multiclass", num_classes=10, average='micro'),
        metric_wrapper(Precision, task="multiclass", num_classes=10, average='micro'),
    ],
)
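
Since `loss_fn` is given as the class itself, the weights 0.8 and 0.2 are fixed inside the class. If the weights should be configurable, one possible approach is to bind them with `functools.partial`, in the same spirit as `optim_wrapper` binds `lr`. The sketch below is hedged: `WeightedLoss` and its parameters `ce_weight` and `mse_weight` are hypothetical names, and it assumes that `TorchModel` builds the loss by calling `loss_fn()` with no arguments, which passing the bare class suggests but which is not confirmed here.

```python
from functools import partial

import torch
from torch import nn
from torch.nn import functional as F


class WeightedLoss(nn.Module):
    """Hypothetical variant of the custom loss with configurable weights."""

    def __init__(self, ce_weight=0.8, mse_weight=0.2):
        super().__init__()
        self.ce_weight = ce_weight
        self.mse_weight = mse_weight
        self.cross_entropy_loss = nn.CrossEntropyLoss(reduction='sum')
        self.mse_loss = nn.MSELoss()

    def forward(self, input, target):
        ce = self.cross_entropy_loss(input, target)
        mse = self.mse_loss(input, target)
        return self.ce_weight * ce + self.mse_weight * mse


# partial(...) returns a zero-argument factory with the weights bound
loss_fn = partial(WeightedLoss, ce_weight=0.5, mse_weight=0.5)

# Local check: build the loss the way we assume the framework would
loss_module = loss_fn()
y_hat = torch.softmax(torch.randn(2, 10), dim=1)
y = F.one_hot(torch.tensor([0, 7]), num_classes=10).float()
print(loss_module(y_hat, y).item())
```

If the assumption holds, `loss_fn` defined this way could be passed to `TorchModel` in place of `CustomLossFunction`.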

## Remaining code
We copy over the rest of the code from the original tutorial to show that everything runs end to end.

In [7]:
device_list = [alice, bob]
server = charlie
aggregator = SecureAggregator(server, [alice, bob])

# specify params
fl_model = FLModel(
    server=server,
    device_list=device_list,
    model=model_def,
    aggregator=aggregator,
    strategy='fed_avg_w',  # fl strategy
    backend="torch",  # backend support ['tensorflow', 'torch']
)

INFO:root:Create proxy actor <class 'secretflow.security.aggregation.secure_aggregator._Masker'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.security.aggregation.secure_aggregator._Masker'> with party bob.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.fl.backend.torch.strategy.fed_avg_w.PYUFedAvgW'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.fl.backend.torch.strategy.fed_avg_w.PYUFedAvgW'> with party bob.


In [8]:
history = fl_model.fit(
    train_data,
    train_label,
    validation_data=(test_data, test_label),
    epochs=5,
    batch_size=32,
    aggregate_freq=1,
)

INFO:root:FL Train Params: {'x': FedNdarray(partitions={PYURuntime(alice): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1fed9580>, PYURuntime(bob): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1fed90a0>}, partition_way=<PartitionWay.HORIZONTAL: 'horizontal'>), 'y': FedNdarray(partitions={PYURuntime(alice): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1ff25fa0>, PYURuntime(bob): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1ff25df0>}, partition_way=<PartitionWay.HORIZONTAL: 'horizontal'>), 'batch_size': 32, 'batch_sampling_rate': None, 'epochs': 5, 'verbose': 1, 'callbacks': None, 'validation_data': (FedNdarray(partitions={PYURuntime(alice): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1fef4910>, PYURuntime(bob): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1fef4a90>}, partition_way=<PartitionWay.HORIZONTAL: 'horizontal'>), FedNdarray(partitions={PYURuntime(alice): <secretflow.device.device.pyu.PYUObject object at 0x7f5e1

Epoch 1/5



Epoch 2/5
[2m[36m(PYUFedAvgW pid=479273)[0m {'train-loss': 38.253143310546875, 'train_multiclassaccuracy': tensor(0.0122), 'train_multiclassprecision': tensor(0.0122), 'val_eval_multiclassaccuracy': tensor(0.0206), 'val_eval_multiclassprecision': tensor(0.0206)}
[2m[36m(PYUFedAvgW pid=479314)[0m {'train-loss': 58.460208892822266, 'train_multiclassaccuracy': tensor(0.0120), 'train_multiclassprecision': tensor(0.0120), 'val_eval_multiclassaccuracy': tensor(0.0296), 'val_eval_multiclassprecision': tensor(0.0296)}


Train Processing: : 100%|█████████▉| 749/750 [00:33<00:00, 22.47it/s, {'multiclassaccuracy': 0.031749304, 'multiclassprecision': 0.031749304, 'val_multiclassaccuracy': 0.03495, 'val_multiclassprecision': 0.03495}]
Train Processing: :   0%|          | 3/750 [00:00<00:27, 26.89it/s]

Epoch 3/5
[2m[36m(PYUFedAvgW pid=479273)[0m {'train-loss': 38.20553970336914, 'train_multiclassaccuracy': tensor(0.0319), 'train_multiclassprecision': tensor(0.0319), 'val_eval_multiclassaccuracy': tensor(0.0307), 'val_eval_multiclassprecision': tensor(0.0307)}
[2m[36m(PYUFedAvgW pid=479314)[0m {'train-loss': 58.438533782958984, 'train_multiclassaccuracy': tensor(0.0316), 'train_multiclassprecision': tensor(0.0316), 'val_eval_multiclassaccuracy': tensor(0.0392), 'val_eval_multiclassprecision': tensor(0.0392)}


Train Processing: : 100%|█████████▉| 749/750 [00:36<00:00, 20.28it/s, {'multiclassaccuracy': 0.038556945, 'multiclassprecision': 0.038556945, 'val_multiclassaccuracy': 0.0396, 'val_multiclassprecision': 0.0396}]
Train Processing: :   1%|          | 4/750 [00:00<00:21, 34.18it/s]

Epoch 4/5
[2m[36m(PYUFedAvgW pid=479273)[0m {'train-loss': 38.216060638427734, 'train_multiclassaccuracy': tensor(0.0387), 'train_multiclassprecision': tensor(0.0387), 'val_eval_multiclassaccuracy': tensor(0.0353), 'val_eval_multiclassprecision': tensor(0.0353)}
[2m[36m(PYUFedAvgW pid=479314)[0m {'train-loss': 59.388458251953125, 'train_multiclassaccuracy': tensor(0.0384), 'train_multiclassprecision': tensor(0.0384), 'val_eval_multiclassaccuracy': tensor(0.0439), 'val_eval_multiclassprecision': tensor(0.0439)}


Train Processing: : 100%|█████████▉| 749/750 [00:42<00:00, 17.48it/s, {'multiclassaccuracy': 0.04316111, 'multiclassprecision': 0.04316111, 'val_multiclassaccuracy': 0.04484167, 'val_multiclassprecision': 0.04484167}]
Train Processing: :   0%|          | 3/750 [00:00<00:25, 29.18it/s]

Epoch 5/5
[2m[36m(PYUFedAvgW pid=479273)[0m {'train-loss': 38.23821258544922, 'train_multiclassaccuracy': tensor(0.0430), 'train_multiclassprecision': tensor(0.0430), 'val_eval_multiclassaccuracy': tensor(0.0397), 'val_eval_multiclassprecision': tensor(0.0397)}
[2m[36m(PYUFedAvgW pid=479314)[0m {'train-loss': 57.66978454589844, 'train_multiclassaccuracy': tensor(0.0434), 'train_multiclassprecision': tensor(0.0434), 'val_eval_multiclassaccuracy': tensor(0.0500), 'val_eval_multiclassprecision': tensor(0.0500)}


Train Processing: : 100%|█████████▉| 749/750 [00:45<00:00, 16.36it/s, {'multiclassaccuracy': 0.04427014, 'multiclassprecision': 0.04427014, 'val_multiclassaccuracy': 0.045062497, 'val_multiclassprecision': 0.045062497}]


[2m[36m(PYUFedAvgW pid=479273)[0m {'train-loss': 38.21043014526367, 'train_multiclassaccuracy': tensor(0.0442), 'train_multiclassprecision': tensor(0.0442), 'val_eval_multiclassaccuracy': tensor(0.0402), 'val_eval_multiclassprecision': tensor(0.0402)}
[2m[36m(PYUFedAvgW pid=479314)[0m {'train-loss': 57.23878860473633, 'train_multiclassaccuracy': tensor(0.0443), 'train_multiclassprecision': tensor(0.0443), 'val_eval_multiclassaccuracy': tensor(0.0499), 'val_eval_multiclassprecision': tensor(0.0499)}


## Summary
In this tutorial, we learned how to implement a custom loss function with input of the form $(\hat{y},y)$ in SecretFlow using PyTorch.