# A summary of loss functions in machine learning

## Classification losses

## Ranking losses



In [3]:
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

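## reduction: sum or mean
Most loss functions take a `reduction` parameter with three options: `'none'`, `'sum'`, and `'mean'`. It controls how the per-sample losses of a batch are aggregated: summed, averaged, or (with `'none'`) returned unreduced. The default is `'mean'`. Averaging removes the dependence of the loss scale on the batch size, so the learning_rate does not have to be retuned whenever batch_size changes. The math makes this concrete.\
Take the MSE loss as an example: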

$$
Loss=\begin{cases}
\sum_{i=1}^N (\hat{y_i}-y_i)^2,  & reduction=sum \\
\frac{1}{N}\sum_{i=1}^N (\hat{y_i}-y_i)^2, & reduction=mean
\end{cases}
$$
where batch_size=$N$, $\hat{y_i}=f(x_i)$, $x_i$ is the $i$-th sample, and $f(\cdot)$ is the model.\
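The two cases above, plus the unreduced `'none'` case, can be sketched in a few lines of NumPy. The helper `mse_loss` is a hypothetical illustration written for this note, not a Paddle API, and the sample values are made up.

```python
import numpy as np

def mse_loss(y_hat, y, reduction="mean"):
    """MSE with three reduction modes (illustrative helper, not a Paddle API)."""
    per_sample = (y_hat - y) ** 2  # one squared error per sample
    if reduction == "none":
        return per_sample          # keep the per-sample losses
    if reduction == "sum":
        return per_sample.sum()    # grows with the number of samples
    if reduction == "mean":
        return per_sample.mean()   # sum divided by N
    raise ValueError(f"unknown reduction: {reduction}")

# Made-up predictions and targets for a batch of N = 3
y_hat = np.array([1.0, 2.0, 4.0])
y = np.array([0.0, 2.0, 2.0])

print(mse_loss(y_hat, y, "none"))
print(mse_loss(y_hat, y, "sum"))
print(mse_loss(y_hat, y, "mean"))
```

For the same batch, the sum-reduced loss is always $N$ times the mean-reduced loss.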
Taking the gradient with respect to the model parameters $\theta$:\
$$
\frac{\partial{Loss}}{\partial{\theta}}=\begin{cases}
\sum_{i=1}^N 2(\hat{y_i}-y_i)\frac{\partial{\hat{y_i}}}{\partial{\theta}},  & reduction=sum \\
\frac{1}{N}\sum_{i=1}^N 2(\hat{y_i}-y_i)\frac{\partial{\hat{y_i}}}{\partial{\theta}}, & reduction=mean
\end{cases}
$$
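With reduction=sum the gradient is a sum of $N$ per-sample terms, so its magnitude grows roughly in proportion to batch_size. With reduction=mean the gradient is the average of those terms, and an average stays on about the same scale whatever $N$ is, so the gradient is largely insensitive to batch_size. This is why mean is the default.\
Verifying this in code: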

In [25]:
# setup
batch_size=128
feature_num = 10
model = nn.Linear(feature_num, 1)
x = paddle.randn([batch_size, feature_num])
y = paddle.randn([batch_size, 1])

# mean
criterion = nn.MSELoss(reduction='mean')
out = model(x)
loss = criterion(out, y)
loss.backward()
print(model.weight.grad.abs().sum())
# batch_size=10, 10 runs
# 12.3  12.4  6.8  5.9  14  6  9  9  5.9  23.6
# batch_size=128, 10 runs (9 values recorded)
# 9   9   4.8   10.7   5.7  10  8.4  7.4  8.6

# sum
model.clear_gradients()
criterion = nn.MSELoss(reduction='sum')
out = model(x)
loss = criterion(out, y)
loss.backward()
print(model.weight.grad.abs().sum())
# batch_size=10, 10 runs
# 122.9  123.6  68  58.9  143  61  91  90  58.6  236.6
# batch_size=128, 10 runs (9 values recorded)
# 1160   1182   619   1380  732  1366.7  1076  956  1107

Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [8.65152359])
Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1107.39501953])

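Single runs are noisy, so averaging the same measurement over many random batches may make the trend easier to see. The sketch below is a NumPy re-implementation under stated assumptions, not the Paddle experiment itself. It uses a linear model with an analytic MSE gradient and weights drawn uniformly from [-0.5, 0.5] (an arbitrary choice, not Paddle's initializer). The names `mse_grad_w` and `avg_grad_l1` are hypothetical helpers.

```python
import numpy as np

def mse_grad_w(w, b, x, y, reduction):
    """Gradient of the MSE loss w.r.t. w for the linear model y_hat = x @ w + b."""
    residual = x @ w + b - y        # shape [N, 1]
    grad = 2.0 * x.T @ residual     # sum of per-sample gradients, shape [D, 1]
    return grad / len(x) if reduction == "mean" else grad

def avg_grad_l1(batch_size, reduction, trials=200, feature_num=10, seed=0):
    """Average of |grad|.sum() over many random batches and random weights."""
    rng = np.random.default_rng(seed)  # same seed -> same batches for both reductions
    total = 0.0
    for _ in range(trials):
        w = rng.uniform(-0.5, 0.5, size=(feature_num, 1))  # assumed init, not Paddle's
        x = rng.standard_normal((batch_size, feature_num))
        y = rng.standard_normal((batch_size, 1))
        total += np.abs(mse_grad_w(w, 0.0, x, y, reduction)).sum()
    return total / trials

for n in (10, 128):
    m = avg_grad_l1(n, "mean")
    s = avg_grad_l1(n, "sum")
    print(f"batch_size={n:4d}  mean: {m:8.2f}  sum: {s:10.2f}  sum/mean: {s / m:6.1f}")
```

Because both reductions see identical batches here, the sum/mean ratio equals the batch size. The mean-reduced magnitude should stay on the same order for both batch sizes, though small batches give a noisier and therefore somewhat larger value.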

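The experiment above shows:\
<font color=red>**With reduction=mean the gradient magnitude stays on the same scale when batch_size changes (with reduction=sum it grows roughly by a factor of batch_size), so updates are more stable and the learning_rate does not need to be adjusted to batch_size.**</font>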
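The link to the learning rate can be made explicit for plain SGD: a step with reduction=sum and learning rate `lr / N` is the same as a step with reduction=mean and learning rate `lr`. The sketch below checks this in NumPy for a bias-free linear model. The data, the value of `lr`, and the variable names are illustrative assumptions. Optimizers that rescale gradients, such as Adam, do not follow this simple relation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 128, 10                      # batch size and feature count (illustrative)
x = rng.standard_normal((N, D))
y = rng.standard_normal((N, 1))
w = rng.standard_normal((D, 1))     # weights of a bias-free linear model

residual = x @ w - y
grad_sum = 2.0 * x.T @ residual     # gradient of the sum-reduced MSE
grad_mean = grad_sum / N            # gradient of the mean-reduced MSE

lr = 0.1
w_mean = w - lr * grad_mean         # SGD step with reduction='mean'
w_sum = w - (lr / N) * grad_sum     # SGD step with reduction='sum', lr scaled by 1/N

print(np.allclose(w_mean, w_sum))   # the two updates coincide
```

With reduction=sum, keeping the step size fixed would therefore require dividing the learning rate by batch_size every time batch_size changes. Using reduction=mean builds that scaling into the loss.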
