## Cross-entropy and CE loss explained

Cross-entropy [[wiki]](https://en.wikipedia.org/wiki/Cross-entropy)<br>
```In information theory, the cross-entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p.```
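
As a small worked example of the definition above, the sketch below computes H(p, q) = -Σ p(x) log2 q(x) in bits. The distributions `p` and `q` and the helper names are illustrative assumptions, not taken from the source. It also checks the standard identity H(p, q) = H(p) + KL(p‖q), i.e. coding with the wrong distribution costs KL(p‖q) extra bits on average.

```python
import math

def entropy_bits(p):
    """H(p) = -sum p(x) * log2 p(x), skipping zero-probability events."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy_bits(p, q):
    """H(p, q) = -sum p(x) * log2 q(x): average bits when events from p
    are coded with a scheme optimized for q."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_bits(p, q):
    """KL(p || q) = sum p(x) * log2(p(x) / q(x))."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]  # true distribution (illustrative values)
q = [0.25, 0.25, 0.5]  # estimated distribution (illustrative values)

h_p = entropy_bits(p)
h_pq = cross_entropy_bits(p, q)
kl = kl_bits(p, q)

print(h_p, h_pq, kl)  # 1.5 1.75 0.25
```

The information-theoretic definition uses log base 2 (bits), while the loss functions in the cells below use the natural log (nats); the two differ only by a constant factor of ln 2.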


In [103]:
import torch
from torch import nn

In [104]:
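# Direct implementation of the information-theoretic formula
def cross_entropy(p, q):
    '''
    p: the true distribution, shape [B, C]
    q: the estimated probability distribution, shape [B, C]
    Returns H(p, q) = -sum(p * log(q)), averaged over the batch.
    '''
    return -torch.sum(p * torch.log(q)) / p.shape[0]

def ce_loss(p, logits):
    # dim=1 makes the class axis explicit and avoids the implicit-dim warning
    q = torch.softmax(logits, dim=1)
    return cross_entropy(p, q)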


In [105]:
class CrossEntropyLoss(nn.Module):
    def forward(self, logits, labels):
        batch_size = logits.shape[0]
        y = logits - torch.max(logits, dim=1, keepdim=True)[0]  # subtract the row max so exp() cannot overflow; y is [B, C]

        lse = torch.log(torch.sum(torch.exp(y), dim=1, keepdim=True)) # [B,1]
        zy =  torch.sum(y*labels, dim=1, keepdim=True) # [B,1]
        return torch.sum(lse-zy) / batch_size
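
The max-subtraction step in the class above can be illustrated with deliberately large logits. This is a minimal sketch: the helper names `naive_log_softmax` and `stable_log_softmax` and the input values are assumptions made for the illustration. It compares a naive log-softmax with the shifted version and with `torch.log_softmax`.

```python
import torch

def naive_log_softmax(logits):
    # exp() of a large logit overflows to inf in float32, and inf / inf is nan
    e = torch.exp(logits)
    return torch.log(e / e.sum(dim=1, keepdim=True))

def stable_log_softmax(logits):
    # Subtracting the row max does not change softmax, but every exponent is now <= 0
    y = logits - logits.max(dim=1, keepdim=True).values
    return y - torch.log(torch.exp(y).sum(dim=1, keepdim=True))

big = torch.tensor([[1000.0, 999.0, 998.0]])  # illustrative logits

naive = naive_log_softmax(big)
stable = stable_log_softmax(big)
reference = torch.log_softmax(big, dim=1)

print(naive)   # every entry is nan
print(stable)  # finite values
print(torch.allclose(stable, reference))
```

The shift is harmless because softmax is invariant to adding a constant to every logit in a row, so only the numerical range changes, not the result.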

In [106]:
class NewCrossEntropyLoss(nn.Module):
    def forward(self, logits, labels):
        batch_size = logits.shape[0]
        y = logits - torch.max(logits, dim=1, keepdim=True)[0]  # subtract the row max so exp() cannot overflow; y is [B, C]

        lse = torch.log(torch.sum(torch.exp(y), dim=1, keepdim=True)) # [B,1]
        return -1 * torch.sum(labels * (y-lse)) / batch_size

In [107]:
class New2CrossEntropyLoss(nn.Module):
    def forward(self, logits, labels):
        batch_size = logits.shape[0]
        y = logits - torch.max(logits, dim=1, keepdim=True)[0]  # subtract the row max so exp() cannot overflow; y is [B, C]

        lse = torch.log(torch.sum(torch.exp(y), dim=1, keepdim=True)) # [B,1]
        zy =  torch.sum(y*labels, dim=1, keepdim=True) # [B,1]
        # return torch.sum(lse*labels - zy) / batch_size # wrong: [B, C] - [B, 1] broadcasts, so zy is subtracted C times per row
        return torch.sum( torch.sum(lse*labels, dim=1, keepdim=True) - zy) / batch_size # correct
        # return torch.sum((lse-y)*labels) / batch_size # correct
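
The surprising result of the commented-out line comes from broadcasting. The sketch below uses randomly generated inputs (an assumption for the illustration) to reproduce the shapes involved: `lse*labels` is `[B, C]` while `zy` is `[B, 1]`, so the subtraction copies `zy` into every column and the final sum subtracts it `C` times per row instead of once.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 3, 5
logits = torch.randn(B, C)
labels = torch.randn(B, C).softmax(dim=1)  # soft labels, each row sums to 1

y = logits - logits.max(dim=1, keepdim=True).values
lse = torch.log(torch.exp(y).sum(dim=1, keepdim=True))  # [B, 1]
zy = (y * labels).sum(dim=1, keepdim=True)              # [B, 1]

wrong_term = lse * labels - zy                             # [B, C] - [B, 1] -> [B, C]
right_term = (lse * labels).sum(dim=1, keepdim=True) - zy  # [B, 1] - [B, 1] -> [B, 1]

wrong = wrong_term.sum() / B
right = right_term.sum() / B

# Because each label row sums to 1, the wrong version equals
# (sum(lse) - C * sum(zy)) / B: zy is counted C times.
predicted_wrong = (lse.sum() - C * zy.sum()) / B

print(wrong_term.shape, right_term.shape)
print(torch.allclose(wrong, predicted_wrong, atol=1e-5))
print(torch.allclose(right, F.cross_entropy(logits, labels), atol=1e-5))
```

Reducing over the class axis before subtracting, or multiplying `(lse - y)` by `labels` elementwise, keeps both operands at the same shape and avoids the silent broadcast.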

In [108]:
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)

out_pytorch = nn.CrossEntropyLoss()(input, target)
out_my = CrossEntropyLoss()(input, target)
out_new = NewCrossEntropyLoss()(input, target)
out_new2 = New2CrossEntropyLoss()(input, target)
out_true = ce_loss(target, input)

print('true CE loss: {:.4f}\n'
      'pytorch CE loss: {:.4f}\n'
      'my CE loss: {:.4f}\n'
      'my new 1 CE loss: {:.4f}\n'
      'my new 2 CE loss: {:.4f}\n'.format(
          out_true,
          out_pytorch,
          out_my,
          out_new,
          out_new2))

true CE loss: 1.8797      
pytorch CE loss: 1.8797        
my CE loss: 1.8797      
my new 1 CE loss: 1.8797        
my new 2 CE loss: 1.8797




## Softmax output is not a fixed point
Applying softmax repeatedly flattens the distribution toward uniform.

In [109]:
X = torch.randn(1, 4)
sm_x = X.softmax(dim=1)
sm_sm_x = sm_x.softmax(dim=1)
print(X)
print(sm_x)
print(sm_sm_x)

print()
print('softmax flattening process')
Y = X
for i in range(7):
    Y = Y.softmax(dim=1)
    print('iter-{}:'.format(i+1), Y)

tensor([[-1.5604,  1.0262, -0.5743,  0.4062]])
tensor([[0.0415, 0.5510, 0.1112, 0.2964]])
tensor([[0.1989, 0.3311, 0.2133, 0.2567]])

softmax flattening process
iter-1: tensor([[0.0415, 0.5510, 0.1112, 0.2964]])
iter-2: tensor([[0.1989, 0.3311, 0.2133, 0.2567]])
iter-3: tensor([[0.2372, 0.2708, 0.2407, 0.2513]])
iter-4: tensor([[0.2468, 0.2552, 0.2477, 0.2503]])
iter-5: tensor([[0.2492, 0.2513, 0.2494, 0.2501]])
iter-6: tensor([[0.2498, 0.2503, 0.2499, 0.2500]])
iter-7: tensor([[0.2500, 0.2501, 0.2500, 0.2500]])
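
The flattening seen above can be quantified with a short sketch. It reuses the `X` values printed in the output; the `spreads` list and the choice to track `max - min` are assumptions made for this illustration. It checks that the uniform distribution maps to itself under softmax and records how the gap between the largest and smallest entry shrinks on each pass.

```python
import torch

C = 4
uniform = torch.full((1, C), 1.0 / C)
# Equal inputs give equal outputs, so the uniform distribution is a fixed point
is_fixed_point = torch.allclose(uniform.softmax(dim=1), uniform)

x = torch.tensor([[-1.5604, 1.0262, -0.5743, 0.4062]])  # the X printed above

spreads = []  # max - min after each softmax pass
y = x
for _ in range(7):
    y = y.softmax(dim=1)
    spreads.append((y.max() - y.min()).item())

print(is_fixed_point)
for i, s in enumerate(spreads, start=1):
    print('iter-{}: spread = {:.4f}'.format(i, s))
```

One way to see why this happens: after the first pass every entry lies in [0, 1], so the inputs to the next pass differ by at most 1 and the output ratios are bounded by e. Each further pass narrows the input range again, which pulls the result toward the uniform distribution.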
