[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/itmorn/AI.handbook/blob/main/DL/torch/nn/LossFunction/CTCLoss.ipynb)

# CTCLoss

**Definition**:  
torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

**Parameters**:  
- blank (int, optional) – blank label. Default: 0.

- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses will be summed. Default: 'mean'. This controls how the per-sample losses are combined when several samples are computed together.

- zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
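
As a minimal sketch of how `reduction` and `zero_infinity` behave, the snippet below compares the three reduction modes on a small random batch and then builds an input that is too short for its target. The sizes, the seed, and the variable names are illustrative assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C, S = 12, 3, 5, 4   # time steps, batch size, classes (incl. blank), padded target length

log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([4, 2, 3], dtype=torch.long)
args = (log_probs, targets, input_lengths, target_lengths)

per_sample = nn.CTCLoss(reduction='none')(*args)   # one loss per sample, shape (N,)
loss_sum = nn.CTCLoss(reduction='sum')(*args)
loss_mean = nn.CTCLoss(reduction='mean')(*args)

# 'sum' adds the per-sample losses; 'mean' divides each by its target length first
print(torch.allclose(loss_sum, per_sample.sum()))
print(torch.allclose(loss_mean, (per_sample / target_lengths).mean()))

# zero_infinity: two time steps cannot emit the target [1, 1],
# because a blank is needed between the two identical labels (1, blank, 1)
short_log_probs = torch.randn(2, 1, C).log_softmax(2)
short_args = (short_log_probs, torch.tensor([[1, 1]]),
              torch.tensor([2]), torch.tensor([2]))
loss_inf = nn.CTCLoss(reduction='sum', zero_infinity=False)(*short_args)
loss_zero = nn.CTCLoss(reduction='sum', zero_infinity=True)(*short_args)
print(loss_inf, loss_zero)
```

With `zero_infinity=False` the impossible alignment yields an infinite loss; with `zero_infinity=True` it is replaced by zero so that it does not disturb the rest of the batch.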

# CTCLoss inputs, illustrated
<p align="center">
<img src="./imgs/CTCLoss_input.svg"
    width="2000" /></p>
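
As a rough companion to the figure, the sketch below spells out the tensor shapes that `nn.CTCLoss` accepts for a batch. It assumes a batch of two samples with made-up lengths, and shows that padded targets of shape `(N, S)` and concatenated targets of shape `(sum(target_lengths),)` give the same loss. All concrete values are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C = 8, 2, 4   # time steps, batch size, classes (incl. blank)

log_probs = torch.randn(T, N, C).log_softmax(2)   # (T, N, C) log-probabilities
input_lengths = torch.tensor([8, 6])              # (N,) valid time steps per sample
target_lengths = torch.tensor([3, 2])             # (N,) valid labels per sample

# Padded form: (N, S). The last entry of row 1 is padding and is ignored,
# because target_lengths[1] == 2.
padded_targets = torch.tensor([[1, 2, 3],
                               [2, 1, 1]])
# Concatenated form: (sum(target_lengths),), with no padding at all.
flat_targets = torch.tensor([1, 2, 3, 2, 1])

ctc_loss = nn.CTCLoss(blank=0)
loss_padded = ctc_loss(log_probs, padded_targets, input_lengths, target_lengths)
loss_flat = ctc_loss(log_probs, flat_targets, input_lengths, target_lengths)
print(loss_padded, loss_flat)
```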

In [108]:
# A simple single-sample example
import torch
import torch.nn as nn
torch.manual_seed(666)

# The target is un-padded and unbatched (effectively N=1)
T = 10      # Input sequence length
C = 5      # Number of classes (including blank)

# Initialize random input log-probabilities of size (T, C)
input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
input_lengths = torch.tensor(T, dtype=torch.long)

# Initialize a random target (0 = blank, 1..C-1 = classes)
target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(target_lengths,), dtype=torch.long)
ctc_loss = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
loss = ctc_loss(input, target, input_lengths, target_lengths)
# loss.backward()
print("input:\n", input, "\n")
print("target:\n", target, "\n")


input:
 tensor([[-2.5501, -0.9828, -1.7815, -2.2131, -1.3097],
        [-1.6865, -2.3959, -1.0816, -1.8824, -1.4591],
        [-1.1462, -1.7807, -2.0790, -1.6667, -1.6109],
        [-2.1791, -3.0886, -2.0956, -2.4411, -0.4601],
        [-1.9573, -2.3131, -2.7964, -0.6376, -1.7705],
        [-2.6371, -1.2972, -1.3879, -1.1645, -2.3702],
        [-0.9790, -1.3483, -1.3424, -2.9789, -2.9458],
        [-1.8011, -2.5198, -1.0987, -2.1566, -1.1861],
        [-2.2219, -2.0857, -0.7027, -2.1847, -1.8351],
        [-2.0254, -0.6508, -1.7142, -2.1643, -2.9660]], requires_grad=True) 

target:
 tensor([4, 4, 1, 3, 4, 1, 2]) 



# CTCLoss computation, illustrated

Rule 1: No token may be skipped.
<p align="center"> <img src="./imgs/微信截图_20230223143030.png" width="400" /></p>
<hr/>

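Rule 2: The moves allowed depend on the current position.

- At the very start, the path can begin on φ or on the first token.
- On a token row, the path can repeat the token, move to φ, or move to the next token.
- On a φ row, Rule 1 applies: the path can stay on φ or move to the next token, but it cannot skip over a token.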
<p align="center"> <img src="./imgs/微信截图_20230223143043.png" width="400" /></p>
<hr/>
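
One way to read the moves above is as transitions on an extended label sequence in which φ is placed before, between, and after the tokens. The sketch below assumes that representation and a target without consecutive repeated tokens; `extend` and `next_states` are hypothetical helper names, and φ is written as label 0.

```python
BLANK = 0  # assumed blank label (φ), matching blank=0

def extend(target):
    """Interleave blanks: [a, b] -> [φ, a, φ, b, φ]."""
    ext = [BLANK]
    for tok in target:
        ext += [tok, BLANK]
    return ext

def next_states(ext, s):
    """Positions of the extended sequence reachable from position s in one step.

    Assumes the target has no consecutive repeated tokens.
    """
    moves = [s]                          # stay: repeat the token, or stay on φ
    if s + 1 < len(ext):
        moves.append(s + 1)              # token -> φ, or φ -> next token
    if ext[s] != BLANK and s + 2 < len(ext):
        moves.append(s + 2)              # token -> next token (only from a token row)
    return moves

ext = extend([1, 2])                     # [0, 1, 0, 2, 0]
start_states = [0, 1]                    # begin on φ or on the first token
print(next_states(ext, 1))               # from token 1: stay, φ, or token 2
print(next_states(ext, 2))               # from the φ after token 1: stay or token 2
```

From a φ position there is no two-step move, which is exactly what prevents a token from being skipped.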

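Rule 3: When the target contains consecutive repeated classes, the path cannot jump directly from a token to the next (identical) token; it must pass through φ. Otherwise the two tokens would be merged during de-duplication, and doubled characters could not be recognized.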
<p align="center"> <img src="./imgs/微信截图_20230223143206.png" width="400" /></p>
<hr/>
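
The effect of this rule can be seen with a small decoding helper. The sketch below assumes the usual CTC collapsing step (merge consecutive duplicates, then drop φ); `collapse` is a hypothetical helper name and φ is written as label 0.

```python
from itertools import groupby

BLANK = 0  # assumed blank label (φ)

def collapse(path):
    """Merge consecutive duplicate labels, then remove blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != BLANK]

# Jumping straight from token 1 to token 1 merges the two into one label
print(collapse([1, 1, 0]))   # [1]
# Passing through φ keeps both labels
print(collapse([1, 0, 1]))   # [1, 1]
```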

Examples of valid sequences:
<p align="center"> <img src="./imgs/微信截图_20230223143148.png" width="400" /></p>
<hr/>
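
For very small cases, the valid sequences can also be listed by brute force: try every label sequence of length T and keep those that collapse to the target. This is only a sketch for building intuition (the number of candidates grows as C^T), it does not reproduce the sequences in the figure, and `collapse` and `valid_alignments` are hypothetical helper names.

```python
from itertools import groupby, product

BLANK = 0  # assumed blank label (φ)

def collapse(path):
    """Merge consecutive duplicate labels, then remove blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != BLANK]

def valid_alignments(target, T, num_classes):
    """All length-T label sequences that collapse to `target`."""
    return [path for path in product(range(num_classes), repeat=T)
            if collapse(path) == list(target)]

print(valid_alignments([1], T=2, num_classes=2))      # [(0, 1), (1, 0), (1, 1)]
print(valid_alignments([1, 1], T=3, num_classes=2))   # [(1, 0, 1)]
print(valid_alignments([1, 1], T=2, num_classes=2))   # [] : input too short
```

The first call lists the three paths φ1, 1φ and 11 for a single-token target over two time steps.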

The actual computation is shown below (for brevity, we use a fairly simple example):
<p align="center">
<img src="./imgs/CTCLoss.svg"
    width="700" /></p>


In [118]:
# Computing the loss with the library
import torch
import torch.nn as nn
torch.manual_seed(666)

# The target is un-padded and unbatched (effectively N=1)
T = 2      # Input sequence length
C = 2      # Number of classes (including blank)

# Initialize random input log-probabilities of size (T, C)
input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
input_lengths = torch.tensor(T, dtype=torch.long)

# Initialize a random target (0 = blank, 1..C-1 = classes)
target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(target_lengths,), dtype=torch.long)
ctc_loss = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()


print("input:\n", input, "\n")
print("loss:\n", loss, "\n")

# Note: target is tensor([1]) here, because with T = 2 and C = 2 both randint calls can only return 1


input:
 tensor([[-2.2892, -0.1069],
        [-1.6550, -0.2121]], requires_grad=True) 

loss:
 tensor(0.0196, grad_fn=<MeanBackward0>) 



In [109]:
# Manual computation
import math

def logsumexp(*args):
    """ Stable log sum exp. ref: https://zhuanlan.zhihu.com/p/153535799 """
    if all(a == -float("inf") for a in args):
        return -float("inf")
    a_max = max(args)
    lsp = math.log(sum(math.exp(a - a_max)
                    for a in args))
    return a_max + lsp

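# The target is [1], so with T = 2 there are three valid paths: φ1, 11 and 1φ.
# The log-probability of a path is the sum of its per-step log-probabilities.
p_φ1 = -2.2892 - 0.2121   # φ at step 1, token 1 at step 2
p_11 = -0.1069 - 0.2121   # token 1 at both steps
p_φ1_and_p_11 = logsumexp(p_φ1, p_11)

p_1φ = -0.1069 - 1.6550   # token 1 at step 1, φ at step 2

p_all = logsumexp(p_1φ, p_φ1_and_p_11)  # log of the total probability of all valid paths
-p_all  # the negative log-probability is the loss; it matches the library result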

0.01961900455208171
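
The manual sum over three paths does not scale to longer inputs. A common way to generalise it is the forward (alpha) recursion over the extended label sequence, sketched below in plain Python. This is an illustrative sketch under the rules described earlier, not PyTorch's implementation; `ctc_neg_log_likelihood` is a hypothetical name, and the inputs are the rounded log-probabilities printed above.

```python
import math

NEG_INF = -float("inf")

def logsumexp(*args):
    """Stable log-sum-exp that tolerates -inf terms."""
    finite = [a for a in args if a != NEG_INF]
    if not finite:
        return NEG_INF
    a_max = max(finite)
    return a_max + math.log(sum(math.exp(a - a_max) for a in finite))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """log_probs: T rows of C log-probabilities; target: list of non-blank labels."""
    ext = [blank]
    for tok in target:
        ext += [tok, blank]              # φ, t1, φ, t2, ..., φ
    S, T = len(ext), len(log_probs)

    # Step 1: a path may start on φ or on the first token
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                       # stay in place
            if s >= 1:
                terms.append(alpha[s - 1])           # advance by one
            # skip the φ in between, only onto a token that differs from the previous one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # A path may end on the last token or on the final φ
    ends = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -logsumexp(*ends)

log_probs = [[-2.2892, -0.1069],
             [-1.6550, -0.2121]]
loss = ctc_neg_log_likelihood(log_probs, [1])
print(round(loss, 4))                                # 0.0196
print(ctc_neg_log_likelihood(log_probs, [1, 1]))     # inf: two steps cannot emit 1, φ, 1
```

For the target `[1]` this reproduces the manual result, and for the target `[1, 1]` it returns an infinite loss, which is the situation `zero_infinity` is meant to handle.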

# References
https://www.youtube.com/watch?v=5SSVra6IJY4&list=PLJV_el3uVTsO07RpBYFsXg-bN5Lu0nhdG&index=7

https://zhuanlan.zhihu.com/p/153535799

In [114]:
# The target is un-padded and unbatched (effectively N=1)
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)

# Initialize random input log-probabilities of size (T, C)
input = torch.randn(T, C).log_softmax(1).detach().requires_grad_()
input_lengths = torch.tensor(T, dtype=torch.long)

# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(target_lengths,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
loss

tensor(18.0198, grad_fn=<MeanBackward0>)