Differences between the torch.nn module and torch.nn.functional:

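import torch.nn as nn
- torch.nn is typically used to define and assemble a network's models, layers and loss functions, e.g. nn.Linear, nn.RNN, nn.CrossEntropyLoss.
- torch.nn is object-oriented: you create layer objects and assemble them into a model class. This usually means defining a class that inherits from nn.Module and implementing its forward method to define the forward pass.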

import torch.nn.functional as F
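- F provides a set of functions that perform operations on Tensors, such as activation functions, pooling and normalization.
- F is stateless: its functions hold no learnable parameters (any weights must be passed in as arguments). They are typically used inside a custom forward method to apply non-linear operations.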
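A minimal sketch of the difference, with arbitrarily chosen shapes (4 input features, 2 outputs). `nn.Linear` creates and owns its `weight` and `bias` parameters, while `F.linear` is a plain function that must be handed those tensors on every call. The model class `TinyNet` is a hypothetical example that mixes both styles.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 4)  # batch of 3 samples, 4 features each

# Object-oriented style: the layer owns its learnable parameters
layer = nn.Linear(4, 2)
print([name for name, _ in layer.named_parameters()])  # ['weight', 'bias']

# Functional style: same computation, but the weights are passed in explicitly
out_module = layer(x)
out_functional = F.linear(x, layer.weight, layer.bias)
print(torch.allclose(out_module, out_functional))  # True

# Hypothetical model: layers with parameters are attributes, stateless ops come from F
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # F.relu has no parameters, so it does not need to be registered in __init__
        return self.fc2(F.relu(self.fc1(x)))

net = TinyNet()
print(net(x).shape)  # torch.Size([3, 2])
```

Only modules registered as attributes show up in `net.parameters()`, which is what an optimizer receives; that is why layers with weights are usually `nn` modules and parameter-free steps are usually `F` calls.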

## NLLLoss
Negative Log Likelihood Loss: minimizing it maximizes the likelihood of the true labels.
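Why the name: for a batch of $N$ samples with labels $y_n$ and predicted probabilities $q_{n,k}$, the likelihood of the labels is a product over samples. Taking the negative log turns it into a sum, and since $-\log$ is monotonically decreasing, minimizing the sum is the same as maximizing the likelihood:

$$
-\log \prod_{n=1}^{N} q_{n,y_n} = -\sum_{n=1}^{N} \log q_{n,y_n}
$$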

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
seed=42
torch.manual_seed(seed)

<torch._C.Generator at 0x7fbf280f4e30>

In [3]:
logits = torch.randn(3,4)
logits

tensor([[ 0.3367,  0.1288,  0.2345,  0.2303],
        [-1.1229, -0.1863,  2.2082, -0.6380],
        [ 0.4617,  0.2674,  0.5349,  0.8094]])

## Softmax normalization
$\sigma(\mathbf{z})_j=\frac{e^{z_j}}{\sum_{k=1}^Ke^{z_k}}\quad\mathrm{for~}j=1,\ldots,K.$

Compute $q_k$ (the predicted probability of class $k$)
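The formula translates directly into tensor operations. Below is a minimal sketch (the helper name `softmax_manual` is hypothetical). It subtracts the row maximum before exponentiating. This common trick leaves the result unchanged because the factor $e^{-\max z}$ cancels between numerator and denominator, but it avoids overflow for large logits.

```python
import torch
import torch.nn.functional as F

def softmax_manual(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax along `dim`: exp(z_j) / sum_k exp(z_k)."""
    z_shifted = z - z.max(dim=dim, keepdim=True).values  # numerical stability
    exp_z = torch.exp(z_shifted)
    return exp_z / exp_z.sum(dim=dim, keepdim=True)

torch.manual_seed(42)
logits = torch.randn(3, 4)
q = softmax_manual(logits)

print(torch.allclose(q, F.softmax(logits, dim=-1)))  # True
print(q.sum(dim=-1))  # each row sums to 1 (up to float rounding)
```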

In [4]:
## Softmax
sm = nn.Softmax(dim=-1)
probs = sm(logits)
probs

tensor([[0.2767, 0.2248, 0.2498, 0.2488],
        [0.0302, 0.0770, 0.8439, 0.0490],
        [0.2317, 0.1908, 0.2493, 0.3281]])

In [5]:
## F.softmax
F.softmax(logits, dim=-1)

tensor([[0.2767, 0.2248, 0.2498, 0.2488],
        [0.0302, 0.0770, 0.8439, 0.0490],
        [0.2317, 0.1908, 0.2493, 0.3281]])

In [6]:
## take the log of the softmax output
log_result = torch.log(probs)
log_result

tensor([[-1.2849, -1.4928, -1.3871, -1.3912],
        [-3.5008, -2.5643, -0.1698, -3.0160],
        [-1.4621, -1.6565, -1.3889, -1.1144]])

## LogSoftmax takes the log of the softmax output; since every probability lies in (0, 1), the values are negative

Compute $\log q_k$
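Mathematically $\log q_k = z_k - \log\sum_{j} e^{z_j}$, so log-softmax can be computed without forming $q_k$ first. Below is a minimal sketch comparing that log-sum-exp form with taking `torch.log` of the softmax. The extreme logits `[0, 200]` are made up purely to show where the two-step route breaks down in float32.

```python
import torch
import torch.nn.functional as F

def log_softmax_manual(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # log q_k = z_k - log(sum_j exp(z_j)); torch.logsumexp evaluates this stably
    return z - torch.logsumexp(z, dim=dim, keepdim=True)

torch.manual_seed(42)
logits = torch.randn(3, 4)
print(torch.allclose(log_softmax_manual(logits), F.log_softmax(logits, dim=-1)))  # True

# Extreme logits: softmax underflows to 0 for the small entry, so log gives -inf
z = torch.tensor([[0.0, 200.0]])
naive = torch.log(F.softmax(z, dim=-1))   # contains -inf
stable = F.log_softmax(z, dim=-1)         # finite: approximately [-200, 0]
print(naive, stable)
```

This is one reason the cells below feed `nn.LogSoftmax` output, rather than `torch.log(probs)`, into the loss.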

In [7]:
## LogSoftmax
# log of the softmax output, computed in one step
log_sm = nn.LogSoftmax(dim=1)
log_probs = log_sm(logits)
log_probs

tensor([[-1.2849, -1.4928, -1.3871, -1.3912],
        [-3.5008, -2.5643, -0.1698, -3.0160],
        [-1.4621, -1.6565, -1.3889, -1.1144]])

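## nn.NLLLoss takes, for each sample, the LogSoftmax output at the label's index, negates it, and averages over the batch (with the default reduction='mean')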

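Compute $-\sum_{n=1}^{N}\log q_{n,y_n}$ (`reduction='sum'`) or $-\frac{1}{N}\sum_{n=1}^{N}\log q_{n,y_n}$ (`reduction='mean'`), where $N$ is the batch size and $y_n$ is the label of sample $n$. With a one-hot target $p$, each term $-\log q_{n,y_n}$ equals $-\sum_{k=1}^{K} p_k \log q_k$.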

https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html
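The pick, negate, reduce recipe can be written with `torch.gather`. Below is a minimal sketch (the function name `nll_manual` is hypothetical) that reproduces the three `reduction` modes of `F.nll_loss` without class weights. It uses the same seed, shapes and labels as the cells below.

```python
import torch
import torch.nn.functional as F

def nll_manual(log_probs: torch.Tensor, labels: torch.Tensor, reduction: str = "mean"):
    # For each row n, pick log_probs[n, labels[n]] and negate it
    per_sample = -log_probs.gather(dim=1, index=labels.unsqueeze(1)).squeeze(1)
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return per_sample.sum()
    return per_sample.mean()  # 'mean': the sum divided by the batch size N

torch.manual_seed(42)
logits = torch.randn(3, 4)
log_probs = F.log_softmax(logits, dim=1)
labels = torch.tensor([1, 0, 2])

for mode in ("none", "sum", "mean"):
    ours = nll_manual(log_probs, labels, mode)
    ref = F.nll_loss(log_probs, labels, reduction=mode)
    print(mode, torch.allclose(ours, ref))  # True for every mode
```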

In [8]:
labels = torch.tensor([1,0,2])
loss_fn = nn.NLLLoss(reduction="sum")  # reduction: 'none' | 'sum' | 'mean' (default 'mean')
loss = loss_fn(log_probs, labels)
loss/3  # divide the sum by the batch size N=3 to get the mean

tensor(2.1275)

In [9]:
# labels are 1, 0, 2: pick those entries, negate, and average
-(log_probs[0][1]+log_probs[1][0]+log_probs[2][2])/3

tensor(2.1275)

In [10]:
loss_fn2 = nn.NLLLoss(reduction="mean")
loss = loss_fn2(log_probs, labels)
loss

tensor(2.1275)

## CrossEntropyLoss
CrossEntropyLoss = softmax + log + NLLLoss, so it takes the raw logits as input.

$H(p,q)=-\sum_{k=1}^{K} p_k \log q_k$
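With hard labels, $p$ is the one-hot vector of the true class, so $H(p,q)$ collapses to $-\log q_{y}$, the same term NLLLoss picks out. Below is a minimal sketch that evaluates the formula literally with `F.one_hot` and compares it with `nn.CrossEntropyLoss`. The last check passes the probability matrix `p` directly as the target, which assumes PyTorch 1.10 or later (where class-probability targets are accepted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(42)
logits = torch.randn(3, 4)            # N=3 samples, K=4 classes
labels = torch.tensor([1, 0, 2])

p = F.one_hot(labels, num_classes=4).float()   # target distribution p
log_q = F.log_softmax(logits, dim=1)           # log q_k
H = -(p * log_q).sum(dim=1)                    # H(p, q) for each sample

print(torch.allclose(H.mean(), nn.CrossEntropyLoss()(logits, labels)))  # True
print(torch.allclose(H.mean(), nn.CrossEntropyLoss()(logits, p)))       # True (probability targets)
```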

In [11]:
loss_fn3 = nn.CrossEntropyLoss(reduction='sum') # reduction='mean' or 'sum'
loss_cross = loss_fn3(logits, labels)
loss_cross/3

tensor(2.1275)

In [12]:
loss_fn3 = nn.CrossEntropyLoss() # reduction='mean' or 'sum', default 'mean'
loss_cross = loss_fn3(logits, labels)
loss_cross

tensor(2.1275)
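To make the "softmax + log + NLLLoss" decomposition explicit, here is a minimal sketch that checks the functional forms agree. It also shows a common mistake: passing softmax probabilities instead of raw logits to `F.cross_entropy` applies softmax a second time and changes the loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)
logits = torch.randn(3, 4)
labels = torch.tensor([1, 0, 2])

ce = F.cross_entropy(logits, labels)                               # expects raw logits
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)             # LogSoftmax + NLLLoss
manual = F.nll_loss(torch.log(F.softmax(logits, dim=1)), labels)   # softmax + log + NLLLoss
print(torch.allclose(ce, nll), torch.allclose(ce, manual))  # True True

# Mistake: feeding probabilities means softmax is applied twice
wrong = F.cross_entropy(F.softmax(logits, dim=1), labels)
print(torch.allclose(ce, wrong))  # False
```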