Commonly Used Loss Functions in PyTorch
===

# 1.L1Loss
$$Loss(y_i,\hat{y_i})=|y_i-\hat{y_i}|$$
The input and target must have the same shape (vectors or matrices), and the element-wise loss has that same shape.

In [1]:
import torch
loss_fn = torch.nn.L1Loss(reduction='none')  # keep one loss per element (replaces the deprecated reduce/size_average flags)
input = torch.randn(3, 4)   # plain tensors; the Variable wrapper is no longer needed
target = torch.randn(3, 4)
loss = loss_fn(input, target)  # element-wise |input - target|
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())

tensor([[ 1.2627,  0.6468, -0.1317,  0.5093],
        [-0.9002, -0.0177, -0.4270, -1.3892],
        [-2.4988, -1.0690,  1.7272,  0.3021]])
tensor([[ 0.8076,  1.8310, -0.3598,  1.2736],
        [-0.2235,  1.1021, -0.4336, -1.9632],
        [-0.1213,  0.6735, -0.6558, -1.0412]])
tensor([[ 0.4551,  1.1842,  0.2281,  0.7643],
        [ 0.6766,  1.1198,  0.0066,  0.5740],
        [ 2.3775,  1.7425,  2.3830,  1.3433]])
torch.Size([3, 4]) torch.Size([3, 4]) torch.Size([3, 4])
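
As a quick sanity check, the element-wise L1 loss should match `|input - target|` computed by hand. The sketch below assumes a PyTorch release that supports the `reduction` argument; the variable names and the fixed seed are illustrative choices, not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
pred = torch.randn(3, 4)
target = torch.randn(3, 4)

# reduction='none' keeps one loss value per element
elementwise = torch.nn.L1Loss(reduction='none')(pred, target)

# The same quantity computed directly from the formula
manual = (pred - target).abs()

print(torch.allclose(elementwise, manual))
print(elementwise.shape)
```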


# 2.SmoothL1Loss
The loss is quadratic when the error lies in (-1, 1) and is an L1 loss otherwise:
$$Loss(y_i,\hat{y_i})=\begin{cases}
\frac{1}{2}(y_i-\hat{y_i})^2 & & |y_i-\hat{y_i}|<1\\
|y_i-\hat{y_i}|-\frac{1}{2} & & \text{otherwise}
\end{cases}$$

In [2]:
import torch
loss_fn = torch.nn.SmoothL1Loss(reduction='none')  # keep one loss per element
input = torch.randn(3, 4)
target = torch.randn(3, 4)
loss = loss_fn(input, target)  # quadratic for small errors, linear for large ones
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())

tensor([[ 0.0061, -2.6850, -1.4937, -0.3037],
        [-0.6357, -0.9975, -0.9051, -1.2309],
        [-0.5110, -1.2267, -0.0272, -0.9562]])
tensor([[ 0.2577,  0.6965,  0.0527, -0.8612],
        [-0.1864,  0.1997,  1.2710, -3.0299],
        [-0.7090,  0.0409,  0.2179, -0.4879]])
tensor([[ 0.0316,  2.8815,  1.0465,  0.1554],
        [ 0.1010,  0.6972,  1.6761,  1.2989],
        [ 0.0196,  0.7676,  0.0301,  0.1096]])
torch.Size([3, 4]) torch.Size([3, 4]) torch.Size([3, 4])
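
The two branches of the piecewise definition can be checked by hand against the printed output above. This is a minimal pure-Python sketch with a threshold of 1; the helper `smooth_l1` is a hypothetical name introduced here for illustration. Because the printed inputs are rounded to four decimals, the results should agree with the printed loss only to about three decimal places.

```python
def smooth_l1(pred, target):
    """Piecewise Smooth L1 loss for one pair of scalars (threshold 1)."""
    diff = abs(pred - target)
    if diff < 1:
        return 0.5 * diff ** 2   # quadratic branch for small errors
    return diff - 0.5            # linear branch for large errors

# Values copied from the first row of the printed input and target above
small = smooth_l1(0.0061, 0.2577)    # |error| < 1, quadratic branch
large = smooth_l1(-2.6850, 0.6965)   # |error| >= 1, linear branch
print(small, large)
```

The first value lands near the printed 0.0316 and the second near the printed 2.8815.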


# 3.MSELoss
Mean squared error loss:
$$Loss(y_i,\hat{y_i})=(y_i-\hat{y_i})^2$$

In [3]:
import torch
loss_fn = torch.nn.MSELoss(reduction='none')  # keep one loss per element
input = torch.randn(3, 4)
target = torch.randn(3, 4)
loss = loss_fn(input, target)  # element-wise (input - target)^2
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())

tensor([[-0.3953, -0.2969, -0.1650,  0.6222],
        [-0.3519, -0.5679, -0.7107,  2.1467],
        [ 0.3295, -1.0067, -0.2704,  0.7086]])
tensor([[-0.5145, -0.2495,  0.1167, -1.5256],
        [-2.7255, -0.7890, -0.0175,  0.9011],
        [ 1.1331,  0.4010, -1.9795,  0.3350]])
tensor([[ 0.0142,  0.0022,  0.0793,  4.6133],
        [ 5.6340,  0.0489,  0.4806,  1.5515],
        [ 0.6457,  1.9818,  2.9209,  0.1395]])
torch.Size([3, 4]) torch.Size([3, 4]) torch.Size([3, 4])
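
The examples in this document keep one loss value per element. In training, the loss is usually reduced to a scalar, and the sketch below shows how the three `reduction` modes relate to each other. It assumes a PyTorch release with the `reduction` argument; the tensors are random and only illustrative.

```python
import torch

torch.manual_seed(0)
pred = torch.randn(3, 4)
target = torch.randn(3, 4)

per_element = torch.nn.MSELoss(reduction='none')(pred, target)  # shape (3, 4)
mean_loss = torch.nn.MSELoss(reduction='mean')(pred, target)    # scalar: average over all elements
sum_loss = torch.nn.MSELoss(reduction='sum')(pred, target)      # scalar: sum over all elements

print(per_element.shape, mean_loss.item(), sum_loss.item())
```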


# 4.BCELoss
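Cross-entropy loss for binary classification. It can also be used for multi-class classification, in which case the target (the y values) must be one-hot encoded, and it works for multi-label classification as well. A Sigmoid activation must be applied before it. Recall that the discrete cross-entropy is defined as
$$H(p,q)=-\sum_ip_i\log q_i$$
where p and q are both vectors and both probability distributions. In binary classification there are only positive and negative examples whose probabilities sum to 1, so predicting a single probability is enough, and the expression simplifies to
$$Loss(y_i,\hat{y_i})=-\omega_i[\hat{y_i}\log y_i+(1-\hat{y_i})\log(1-y_i)]$$
Here $y_i$ is the predicted probability, $\hat{y_i}$ is the target label, and $\omega_i$ is an optional per-element weight. PyTorch uses the natural logarithm.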

In [5]:
import torch
loss_fn = torch.nn.BCELoss(reduction='none')  # keep one loss per element
input = torch.randn(3, 4)
target = torch.randint(0, 2, (3, 4)).float()  # random 0/1 labels
loss = loss_fn(torch.sigmoid(input), target)  # BCELoss expects probabilities, so apply sigmoid first
print(input); print(target); print(loss)

tensor([[ 0.3046, -0.0491, -1.1262, -2.1335],
        [-1.6867,  0.9672, -0.1571,  0.1454],
        [ 0.1643,  1.9178, -0.3885,  0.1484]])
tensor([[ 0.,  1.,  0.,  0.],
        [ 1.,  0.,  1.,  0.],
        [ 0.,  1.,  0.,  1.]])
tensor([[ 0.8570,  0.7180,  0.2808,  0.1119],
        [ 1.8565,  1.2894,  0.7748,  0.7685],
        [ 0.7787,  0.1371,  0.5176,  0.6217]])
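
The printed losses can be reproduced from the binary cross-entropy formula with the natural logarithm and unit weights. The following sketch compares the built-in loss against a manual computation; the seed and variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(3, 4)
target = torch.randint(0, 2, (3, 4)).float()  # random 0/1 labels

prob = torch.sigmoid(logits)  # BCELoss expects probabilities in (0, 1)
builtin = torch.nn.BCELoss(reduction='none')(prob, target)

# Manual formula with natural log and all weights equal to 1
manual = -(target * torch.log(prob) + (1 - target) * torch.log(1 - prob))

print(torch.allclose(builtin, manual, atol=1e-6))
```

For instance, the first printed element has input 0.3046 and target 0, so its loss is -ln(1 - sigmoid(0.3046)), which is approximately the printed 0.8570.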


# 5.BCEWithLogitsLoss
Combines Sigmoid and BCELoss in a single step, which is numerically more stable.
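
A minimal sketch of both claims, assuming random logits and 0/1 targets: for moderate logits the fused loss matches Sigmoid followed by BCELoss, and for an extreme logit it still returns an accurate value, whereas the sigmoid output saturates in float32.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(3, 4)
target = torch.randint(0, 2, (3, 4)).float()

# Fused version takes raw logits; two-step version takes probabilities
fused = torch.nn.BCEWithLogitsLoss(reduction='none')(logits, target)
two_step = torch.nn.BCELoss(reduction='none')(torch.sigmoid(logits), target)
print(torch.allclose(fused, two_step, atol=1e-6))

# Extreme case: sigmoid(50) rounds to 1.0 in float32, so log(1 - p) cannot
# be computed accurately in two steps; the fused loss stays close to 50
big_logit = torch.tensor([50.0])
zero_target = torch.tensor([0.0])
fused_big = torch.nn.BCEWithLogitsLoss()(big_logit, zero_target)
print(fused_big.item())
```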

# 6.CrossEntropyLoss
Cross-entropy loss for multi-class classification. No Softmax layer is needed before this loss, because it is equivalent to LogSoftmax followed by NLLLoss.
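
The equivalence can be checked directly. In this sketch the batch size, the number of classes, and the labels are made-up values; note that the target holds class indices rather than one-hot vectors.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 3)                # 5 samples, 3 classes, raw scores
labels = torch.tensor([0, 2, 1, 1, 0])    # class indices, not one-hot

# One step: CrossEntropyLoss applied to raw logits
ce = torch.nn.CrossEntropyLoss()(logits, labels)

# Two steps: LogSoftmax over the class dimension, then NLLLoss
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll = torch.nn.NLLLoss()(log_probs, labels)

print(ce.item(), nll.item())
```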

## 6.1.交叉熵损失
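Cross-entropy loss is introduced in [this document](00.基础知识.05.其它-07.熵与KL散度.ipynb). In general it can be used for classification as well as for semantic segmentation. For classification problems the output layer is usually Sigmoid or Softmax, although the network may also output the raw weighted sums (logits) directly.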

## 6.2.Softmax
Assuming there are K classes, Softmax is computed as
$$\sigma(z)_j=\frac{e^{z_j}}{\sum_{k=1}^Ke^{z_k}},\quad j=1,\dots,K$$
![images](images/00_07_02_001.png)<br/>
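The Softmax output can be read as the probability distribution of the input image over the labels. The function is monotonically increasing: the larger an input value, the larger the corresponding output, and the higher the probability that the input image belongs to that label.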
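
The Softmax formula can be sketched in a few lines of plain Python. The scores below are made-up values for K = 3 classes, and subtracting the maximum score before exponentiating is a common stability measure that is an addition here, not something the text describes; it does not change the result.

```python
import math

def softmax(z):
    """Softmax over a list of scores, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # made-up scores for K = 3 classes
probs = softmax(scores)
print(probs, sum(probs))
```

The outputs sum to 1 and keep the ordering of the inputs, so the largest score receives the largest probability.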

# 7.NLLLoss-Negative Log Likelihood
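Negative log-likelihood loss for multi-class classification:
$$Loss(x,label)=-x_{label}$$
NLLLoss expects log-probabilities as input; to obtain them, add a LogSoftmax layer as the last layer of the network.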
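
As a small illustration of the formula, the per-sample loss is the negated log-probability at the index of the true class. The sketch below uses made-up logits and labels and compares the built-in loss with manual indexing.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 3)              # 4 samples, 3 classes
labels = torch.tensor([2, 0, 1, 2])     # true class index per sample

# NLLLoss takes log-probabilities, so apply log-softmax first
log_probs = torch.log_softmax(logits, dim=1)
builtin = torch.nn.NLLLoss(reduction='none')(log_probs, labels)

# Pick the log-probability of the true class in each row and negate it
manual = -log_probs[torch.arange(4), labels]

print(torch.allclose(builtin, manual))
```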