[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/itmorn/AI.handbook/blob/main/DL/torch/nn/LossFunction/CrossEntropyLoss.ipynb)

# CrossEntropyLoss
Computes the cross-entropy loss between the input logits and the target.

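For a single sample, the input to CrossEntropyLoss is a vector of length num_classes holding raw logits. The loss first applies Softmax to the input to obtain class probabilities $p$, then computes $-\sum{target*\log(p)}$ .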
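The single-sample computation can be checked by hand with only the standard library. The sketch below reuses the logits and class index printed by the single-sample demo in this notebook (rounded to four decimals, so the result is approximate); the variable names are illustrative.

```python
import math

# Logits and class index copied from the single-sample demo output (rounded).
logits = [-2.1188, 0.0635, -1.4555, -0.0126, -0.1548]
target_index = 1

# Softmax turns the logits into probabilities that sum to 1.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# One-hot target: 1 at the true class, 0 elsewhere.
one_hot = [1.0 if i == target_index else 0.0 for i in range(len(logits))]

# Cross entropy: -sum(target * log(prob)).
loss = -sum(t * math.log(p) for t, p in zip(one_hot, probs))
print(round(loss, 4))  # approximately 1.1192, the value reported by the demo
```

Because the target is one-hot, only the term for the true class survives, so the loss reduces to `-log(probs[target_index])`.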

With multiple samples, each sample yields its own loss; the per-sample losses are then reduced to a single value, and the reduction parameter selects how.  
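A minimal sketch of the three reduction modes, assuming an illustrative batch of 4 samples and 5 classes (shapes, seed, and targets are made up for the example):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 5)            # 4 samples, 5 classes
target = torch.tensor([1, 0, 4, 2])   # one class index per sample

# 'none' keeps one loss per sample; 'mean' and 'sum' collapse them to a scalar.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)
mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, target)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, target)

print(per_sample.shape)                               # one entry per sample
print(torch.allclose(mean_loss, per_sample.mean()))   # mean of per-sample losses
print(torch.allclose(sum_loss, per_sample.sum()))     # sum of per-sample losses
```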

When classes are imbalanced, weight can be used to weight the loss of each class.
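One detail worth checking is how weight interacts with reduction='mean': the weighted losses are divided by the sum of the weights of the target classes, not by the batch size. The sketch below verifies this against a manual computation; the class weights are hypothetical values chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)                # 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])
weight = torch.tensor([1.0, 2.0, 5.0])    # hypothetical per-class weights

# Unweighted per-sample losses and the weight of each sample's true class.
unweighted = nn.CrossEntropyLoss(reduction='none')(logits, target)
w_per_sample = weight[target]

weighted_sum = nn.CrossEntropyLoss(weight=weight, reduction='sum')(logits, target)
weighted_mean = nn.CrossEntropyLoss(weight=weight, reduction='mean')(logits, target)

# Manual equivalents: 'sum' adds the weighted losses,
# 'mean' additionally divides by the total weight of the targets.
manual_sum = (w_per_sample * unweighted).sum()
manual_mean = manual_sum / w_per_sample.sum()

print(torch.allclose(weighted_sum, manual_sum))
print(torch.allclose(weighted_mean, manual_mean))
```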

For label smoothing, use the label_smoothing parameter.
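As a rough check, and assuming a PyTorch version that accepts class-probability targets (1.10 or later), label smoothing with class-index targets should give the same loss as passing the smoothed distribution directly as the target. The batch size, seed, and smoothing value below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
eps, C = 0.1, 5                       # illustrative smoothing amount and class count
logits = torch.randn(3, C)
target = torch.tensor([1, 4, 0])

# Built-in label smoothing with class-index targets.
smoothed = nn.CrossEntropyLoss(label_smoothing=eps)(logits, target)

# Same thing by hand: mix the one-hot target with a uniform distribution,
# then pass the resulting probabilities as the target.
soft_target = F.one_hot(target, num_classes=C).float() * (1 - eps) + eps / C
manual = nn.CrossEntropyLoss()(logits, soft_target)

print(torch.allclose(smoothed, manual))
```

This equivalence is only checked here for the unweighted case with the default reduction.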

**Definition**:  
torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', label_smoothing=0.0)

**Parameters**:  
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C.

- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. With reduction='mean', the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices. Default: -100.

- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. Default: 'mean'.

- label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.0.
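A small sketch of ignore_index, assuming the default value of -100 and an illustrative batch in which two of four targets are marked as ignored. Ignored positions contribute zero loss, and the mean is taken over the remaining samples only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)                   # 4 samples, 3 classes
target = torch.tensor([0, -100, 2, -100])    # -100 marks samples to ignore

# Mean loss over the full batch, with ignored targets skipped internally.
loss_all = nn.CrossEntropyLoss()(logits, target)

# Reference: drop the ignored samples by hand before computing the loss.
keep = target != -100
loss_kept = nn.CrossEntropyLoss()(logits[keep], target[keep])

# With reduction='none', ignored positions show up as zeros.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)

print(torch.allclose(loss_all, loss_kept))
print(per_sample)
```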


# CrossEntropyLoss illustrated
<p align="center">
<img src="./imgs/CrossEntropyLoss.svg"
    width="2000" /></p>


In [73]:
# Simple single-sample example
import torch
import torch.nn as nn
torch.manual_seed(666)

loss = nn.CrossEntropyLoss(reduction='mean')
N = 1
C = 5  # num_classes
input = torch.randn(N, C, requires_grad=True)

target = torch.empty(N, dtype=torch.long).random_(5)  # target given as a class index
# target = torch.tensor([[0.0, 1.0, 0.0, 0.0, 0.0]])  # a one-hot target is also supported
print("input:\n", input, "\n")
print("target:\n", target, "\n")

output = loss(input, target)
print("output:\n", output, "\n")
output.backward()

# Manual check: negative log of the softmax probability of the target class
-torch.log(input.softmax(dim=1)[0][target[0]])  # matches the library result above


input:
 tensor([[-2.1188,  0.0635, -1.4555, -0.0126, -0.1548]], requires_grad=True) 

target:
 tensor([1]) 

output:
 tensor(1.1192, grad_fn=<NllLossBackward0>) 



tensor(1.1192, grad_fn=<NegBackward0>)

In [226]:
# Demonstrate label_smoothing
import torch
import torch.nn as nn
torch.manual_seed(666)

label_smoothing = 0.5
N = 1
C = 5
input = torch.randn(N, C, requires_grad=True)
target = torch.tensor([1])  # class index of the true label
onehot_target = torch.tensor([0, 1, 0, 0, 0])  # the same label in one-hot form
print("input:\n", input, "\n")

loss = nn.CrossEntropyLoss(reduction='sum', label_smoothing=label_smoothing)

output = loss(input, target)
print("output:\n", output, "\n")
output.backward()

# Smoothed labels: subtract label_smoothing from the element equal to 1,
# then spread label_smoothing evenly over all C elements.
new_onehot_labels = onehot_target * (1 - label_smoothing) + label_smoothing / C
print("new_onehot_labels:\n", new_onehot_labels, "\n")

-(input.softmax(dim=1)[0].log()*new_onehot_labels).sum()  # matches the library result above

input:
 tensor([[-2.1188,  0.0635, -1.4555, -0.0126, -0.1548]], requires_grad=True) 

output:
 tensor(1.5187, grad_fn=<AddBackward0>) 

new_onehot_labels:
 tensor([0.1000, 0.6000, 0.1000, 0.1000, 0.1000]) 



tensor(1.5187, grad_fn=<NegBackward0>)

In [229]:
# Demonstrate weight
import torch
import torch.nn as nn
torch.manual_seed(666)

N = 1
C = 5
input = torch.randn(N, C, requires_grad=True)
weight = torch.tensor([4.0, 3.0, 2.0, 1.0, 1.0])
target = torch.tensor([1])  # class index of the true label
onehot_target = torch.tensor([0, 1, 0, 0, 0])  # the same label in one-hot form
print("input:\n", input, "\n")

loss = nn.CrossEntropyLoss(weight=weight, reduction='sum', label_smoothing=0.0)

output = loss(input, target)
print("output:\n", output, "\n")
output.backward()

# Manual check: each class term is scaled by its weight
-(input.softmax(dim=1)[0].log()*onehot_target*weight).sum()  # matches the library result above

input:
 tensor([[-2.1188,  0.0635, -1.4555, -0.0126, -0.1548]], requires_grad=True) 

output:
 tensor(3.3575, grad_fn=<NllLossBackward0>) 



tensor(3.3575, grad_fn=<NegBackward0>)

In [243]:
# Demonstrate label_smoothing + weight
import torch
import torch.nn as nn
torch.manual_seed(666)

label_smoothing = 0.5
N = 1
C = 5
input = torch.randn(N, C, requires_grad=True)
weight = torch.tensor([3.0, 2.0, 2.0, 1.0, 1.0])
target = torch.tensor([1])  # class index of the true label
onehot_target = torch.tensor([0, 1, 0, 0, 0])  # the same label in one-hot form
print("input:\n", input, "\n")
print("weight:\n", weight, "\n")


loss = nn.CrossEntropyLoss(
    weight=weight, reduction='sum', label_smoothing=label_smoothing)

output = loss(input, target)
print("output:\n", output, "\n")
output.backward()

# Smoothed labels: subtract label_smoothing from the element equal to 1,
# then spread label_smoothing evenly over all C elements.
new_onehot_labels = onehot_target * (1 - label_smoothing) + label_smoothing / C
print("new_onehot_labels:\n", new_onehot_labels, "\n")

# Manual check: smoothed label times class weight for every class
-(input.softmax(dim=1)[0].log()*new_onehot_labels*weight).sum()  # matches the library result above

input:
 tensor([[-2.1188,  0.0635, -1.4555, -0.0126, -0.1548]], requires_grad=True) 

weight:
 tensor([3., 2., 2., 1., 1.]) 

output:
 tensor(3.1143, grad_fn=<AddBackward0>) 

new_onehot_labels:
 tensor([0.1000, 0.6000, 0.1000, 0.1000, 0.1000]) 



tensor(3.1143, grad_fn=<NegBackward0>)