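# Label smoothing
* In essence, label smoothing changes the label distribution: the one-hot **impulse distribution** becomes a **uniform distribution plus an impulse distribution**.
* What about a normal distribution instead? **In my personal opinion**, if **adjacent labels are related to each other**, smoothing toward a normal distribution is also a reasonable choice.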
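
To make the normal-distribution idea concrete, here is a minimal sketch of what such a variant could look like for ordinal labels. It only illustrates the speculation above and is not an established method from the references; the helper name `gaussian_smooth_labels` and the `sigma` parameter are assumptions introduced for this example.

```python
import torch

def gaussian_smooth_labels(targets: torch.Tensor, num_classes: int, sigma: float = 1.0) -> torch.Tensor:
    """Hypothetical helper: soft labels shaped like a discretized Gaussian
    centered on each target index (assumes class indices are ordered)."""
    classes = torch.arange(num_classes, dtype=torch.float, device=targets.device)  # (K,)
    # Signed distance from every class index to the target index: shape (B, K)
    dist = classes.unsqueeze(0) - targets.unsqueeze(1).float()
    weights = torch.exp(-0.5 * (dist / sigma) ** 2)
    # Normalize each row so it is a valid probability distribution
    return weights / weights.sum(dim=-1, keepdim=True)

targets = torch.tensor([1, 0])  # batch_size=2, num_class=3
soft = gaussian_smooth_labels(targets, num_classes=3, sigma=0.5)
print(soft)
```

Unlike uniform smoothing, classes closer to the target index receive more probability mass than distant ones, which only makes sense when the class order carries meaning.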

## References
1. Formulas, a short explanation, and a code implementation: https://zhuanlan.zhihu.com/p/116466239
2. PyTorch implementation (code only): https://stackoverflow.com/questions/55681502/label-smoothing-in-pytorch
3. A translated repost of Hinton's explanation: https://zhuanlan.zhihu.com/p/101553787

## Code demo

In [48]:
import torch
import torch.nn.functional as F

In [49]:
# One-hot labels: each sample only needs the index of its label
# Equivalent to [[0,1,0], [1,0,0]]
targets = torch.tensor([1,0]) # batch_size=2  num_class = 3

In [42]:
targets

tensor([1, 0])

In [50]:
targets.size(0) # batch_size = 2

2

In [43]:
torch.empty(size=(targets.size(0), 3))

tensor([[5.4128e+22, 2.6222e-09, 1.6987e-07],
        [1.3667e+22, 2.6589e+23, 1.0721e-08]])

In [31]:
targets.data.unsqueeze(1)

tensor([[1],
        [0]])

In [58]:
smoothing=0.1 # epsilon=0.1
num_class = 3
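# Smooth the labels: fill every entry with smoothing/(num_class-1),
# then scatter 1-smoothing into the target index of each row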
targets_ = torch.empty(size=(targets.size(0), num_class)).fill_(smoothing /(num_class-1))\
                        .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)

In [59]:
# Smoothed labels
targets_

tensor([[0.0500, 0.9000, 0.0500],
        [0.9000, 0.0500, 0.0500]])
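
The smoothed labels above can be written as a formula. This is a sketch of the convention this notebook's code uses, with $K$ = `num_class`, $\varepsilon$ = `smoothing`, and $y$ the target index (the symbols are introduced here for illustration):

$$
q_i =
\begin{cases}
1 - \varepsilon, & i = y \\
\dfrac{\varepsilon}{K - 1}, & i \neq y
\end{cases}
$$

With $\varepsilon = 0.1$ and $K = 3$ this gives $0.9$ for the target class and $0.05$ for each of the others, matching the output above. Note that other formulations spread $\varepsilon$ over all $K$ classes instead, $q_i = (1 - \varepsilon)\,\delta_{iy} + \varepsilon / K$, so the two conventions produce slightly different values for the same $\varepsilon$.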

In [60]:
# Make up some data to use as input for the demo
inputs = torch.tensor([[1,2,3], [4,2,1]], dtype=torch.float)

In [33]:
lsm = F.log_softmax(inputs, -1) # compute the log-softmax

In [61]:
lsm # short for LogSoftMax

tensor([[-2.4076, -1.4076, -0.4076],
        [-0.1698, -2.1698, -3.1698]])

In [63]:
-(targets_ * lsm).sum(-1) # compute the cross-entropy against the smoothed labels

tensor([1.4076, 0.4198])
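
As a cross-check, the manual result can be compared with PyTorch's built-in `label_smoothing` argument of `F.cross_entropy`. This is a sketch that assumes a PyTorch version recent enough to have that argument (1.10 or later, as far as I know). The built-in spreads the smoothing mass over all `num_class` classes rather than over the `num_class - 1` non-target classes, so the value has to be rescaled to reproduce the numbers above; `builtin_eps` is a name introduced here for that rescaled value.

```python
import torch
import torch.nn.functional as F

inputs = torch.tensor([[1, 2, 3], [4, 2, 1]], dtype=torch.float)
targets = torch.tensor([1, 0])
smoothing, num_class = 0.1, 3

# Manual version, same construction as in the demo cells
targets_ = torch.full((targets.size(0), num_class), smoothing / (num_class - 1))
targets_.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
manual = -(targets_ * F.log_softmax(inputs, -1)).sum(-1)

# Built-in version: non-target classes get eps/K, so choose eps such that
# eps/K == smoothing/(K-1); the target then gets 1 - eps + eps/K == 1 - smoothing
builtin_eps = smoothing * num_class / (num_class - 1)
builtin = F.cross_entropy(inputs, targets, reduction="none",
                          label_smoothing=builtin_eps)

print(manual, builtin)
```

Passing `label_smoothing=0.1` directly would give different losses, because the two conventions assign different mass to the target class.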

## Implementation from reference 2

In [None]:
import torch
from torch.nn.modules.loss import _WeightedLoss
import torch.nn.functional as F

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing  # e.g. 0.1
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth_one_hot(targets:torch.Tensor, n_classes:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                    device=targets.device) \
                .fill_(smoothing /(n_classes-1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def forward(self, inputs, targets):
        targets = SmoothCrossEntropyLoss._smooth_one_hot(targets, inputs.size(-1),
            self.smoothing)
        lsm = F.log_softmax(inputs, -1)

        if self.weight is not None:
            lsm = lsm * self.weight.unsqueeze(0)

        loss = -(targets * lsm).sum(-1)  # per-sample cross-entropy against the smoothed targets

        if self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'mean':
            loss = loss.mean()

        return loss
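
A quick way to sanity-check a loss like the one above is to confirm that `smoothing=0.0` recovers the ordinary cross-entropy and that gradients flow through it. The sketch below restates the same computation as a standalone function so it can run on its own; `smooth_cross_entropy` is a hypothetical name, and class weights are left out for brevity.

```python
import torch
import torch.nn.functional as F

def smooth_cross_entropy(inputs, targets, smoothing=0.0, reduction="mean"):
    """Hypothetical functional restatement of the class above (no class weights)."""
    n_classes = inputs.size(-1)
    with torch.no_grad():
        # Smoothed one-hot targets: smoothing/(K-1) everywhere, 1-smoothing at the target
        soft = torch.full_like(inputs, smoothing / (n_classes - 1))
        soft.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    loss = -(soft * F.log_softmax(inputs, -1)).sum(-1)
    if reduction == "sum":
        return loss.sum()
    if reduction == "mean":
        return loss.mean()
    return loss  # any other value: no reduction, one loss per sample

inputs = torch.tensor([[1.0, 2.0, 3.0], [4.0, 2.0, 1.0]], requires_grad=True)
targets = torch.tensor([1, 0])

plain = smooth_cross_entropy(inputs, targets, smoothing=0.0)
reference = F.cross_entropy(inputs, targets)  # should agree with `plain`

smoothed = smooth_cross_entropy(inputs, targets, smoothing=0.1)
smoothed.backward()  # gradients reach `inputs` through the log-softmax
print(plain.item(), reference.item(), smoothed.item())
```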

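## Summary
* For a newly proposed loss, reading the corresponding math formula first is the fastest way to understand it.
* After the formula, read the code implementation.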