### Notes on getting tripped up by sigmoid and softmax in PyTorch
Correspondence:
* BCEWithLogitsLoss(outputs, targets) = BCELoss(sigmoid(outputs), targets)
* CrossEntropyLoss(outputs, targets) = NLLLoss(log_softmax(outputs), targets)
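Both correspondences can be checked numerically. The following is a minimal sketch, assuming small random logits; the tensors `logits`, `labels`, and `onehot` are made up for illustration and are not taken from the notebook's data. Note that NLLLoss expects log-probabilities, so the sketch feeds it `log_softmax` output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: 4 samples, 3 classes, moderate logits.
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 2])  # class indices
onehot = nn.functional.one_hot(labels, num_classes=3).float()

# BCEWithLogitsLoss(x, t) vs. BCELoss(sigmoid(x), t)
bce_logits = nn.BCEWithLogitsLoss()(logits, onehot)
bce_probs = nn.BCELoss()(torch.sigmoid(logits), onehot)

# CrossEntropyLoss(x, y) vs. NLLLoss(log_softmax(x), y)
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)

print(torch.allclose(bce_logits, bce_probs, atol=1e-5))
print(torch.allclose(ce, nll, atol=1e-5))
```

For logits of this size, both comparisons should agree up to floating-point error.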

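In addition, regarding the shape of targets:
* BCELoss, BCEWithLogitsLoss : targets.shape = outputs.shape = [batch_size, num_classes], float values in [0, 1] (e.g. one-hot)
* NLLLoss : targets.shape = [batch_size], values are class indices
    * i.e. torch.argmax(targets, axis=1) for one-hot targets

* CrossEntropyLoss accepts either form: class indices, or class probabilities with the same shape as outputs (supported since PyTorch 1.10)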
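The target shapes each loss accepts can be seen side by side in a small sketch. The batch below (3 samples, 2 classes) is hypothetical, and passing class probabilities to `CrossEntropyLoss` assumes PyTorch 1.10 or later.

```python
import torch
import torch.nn as nn

# Hypothetical batch: 3 samples, 2 classes.
logits = torch.tensor([[2.0, -1.0], [0.5, 0.3], [-1.2, 0.7]])
onehot = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # shape [3, 2], float
labels = torch.argmax(onehot, dim=1)                          # shape [3], int64

# BCE-type losses: targets have the same shape as the input.
bce = nn.BCEWithLogitsLoss()(logits, onehot)

# NLLLoss: targets are class indices of shape [batch_size].
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)

# CrossEntropyLoss: class indices, or class probabilities (PyTorch >= 1.10).
ce_idx = nn.CrossEntropyLoss()(logits, labels)
ce_prob = nn.CrossEntropyLoss()(logits, onehot)

print(labels.shape, onehot.shape)
print(torch.allclose(ce_idx, ce_prob))
```

With one-hot probabilities the two `CrossEntropyLoss` calls compute the same quantity, which is why either target form appears to "just work".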

In [83]:
import pandas as pd
import torch
import torch.nn as nn

In [84]:
# Load some arbitrary results --
train_df = pd.read_feather("output/roberta_large_cat4/train_df.feather")

### 0, logits = (the NN output before it is converted to probabilities)

In [85]:
tt = torch.Tensor(train_df.loc[:, ["model_oof_class_0", "model_oof_class_1"]].values)
tt # logits --

tensor([[ 5.3130, -5.3238],
        [ 5.9247, -5.9556],
        [-0.9774,  0.9705],
        ...,
        [ 3.4022, -3.0494],
        [ 5.8692, -5.8984],
        [ 5.2150, -5.3340]])

### 1, sigmoid and softmax are different things

In [86]:
sigmoid = nn.Sigmoid()
softmax = nn.Softmax(dim=1)

In [87]:
sigmoid(tt)

tensor([[0.9951, 0.0049],
        [0.9973, 0.0026],
        [0.2734, 0.7252],
        ...,
        [0.9678, 0.0452],
        [0.9972, 0.0027],
        [0.9946, 0.0048]])

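* Sigmoid is applied to each element independently, so a row of its outputs is not a probability distribution over the classes (the row need not sum to 1)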

In [88]:
sigmoid(tt).sum(axis=1) # does not sum to 1 --

tensor([0.9999, 0.9999, 0.9986,  ..., 1.0130, 0.9999, 0.9994])

In [89]:
softmax(tt).sum(axis=1)

tensor([1., 1., 1.,  ..., 1., 1., 1.])
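The difference is where normalization happens: sigmoid treats each logit on its own, while softmax normalizes across the row. For two classes there is a standard identity linking them: the softmax probability of class 0 equals the sigmoid of the logit difference. A small sketch, using two of the logit rows printed above (the names `tt_small`, `p_softmax`, and `p_from_sigmoid` are introduced here for illustration):

```python
import torch

# Two rows copied from the logits printed above.
tt_small = torch.tensor([[5.3130, -5.3238],
                         [-0.9774, 0.9705]])

# Softmax normalizes across each row.
p_softmax = torch.softmax(tt_small, dim=1)

# Two-class identity: softmax(x)[0] == sigmoid(x0 - x1)
p_from_sigmoid = torch.sigmoid(tt_small[:, 0] - tt_small[:, 1])

print(p_softmax[:, 0])
print(p_from_sigmoid)
print(torch.allclose(p_softmax[:, 0], p_from_sigmoid, atol=1e-6))
```

So applying sigmoid to each of the two logits separately is not the same as softmax; the sigmoid would have to be applied to their difference.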

### 2, BCELoss and BCEWithLogitsLoss

In [90]:
outputs = torch.Tensor([
    [1e6, 1e-6],
    [1e6, 1e-6],
    [1e6, 1e-6]
])

targets = torch.Tensor([
    [1, 0],
    [0, 1],
    [1, 0]
])

* BCELoss takes probabilities for both outputs and targets

In [91]:
bceloss = nn.BCELoss()
bceloss(sigmoid(outputs), targets)

tensor(17.0132)

In [92]:
bceloss(softmax(outputs), targets)

tensor(33.3333)

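* BCEWithLogitsLoss can take logits directly --
    * BCEWithLogitsLoss applies Sigmoid internally, so it is mathematically equal to BCELoss(sigmoid(outputs), targets). With the extreme logits used here the printed values differ (17.0132 vs 166667.0156) because BCELoss clamps its log terms at -100, while BCEWithLogitsLoss computes in logit space and does not saturate; for moderate logits the two agree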

In [93]:
bcelossw = nn.BCEWithLogitsLoss()
bcelossw(outputs, targets)

tensor(166667.0156)
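The gap between 17.0132 and 166667.0156 comes from how the two losses handle saturated values: according to the PyTorch documentation, BCELoss clamps its log outputs to be at least -100, whereas BCEWithLogitsLoss works directly on the logits. A minimal sketch with a single made-up extreme logit (the tensors `x`, `t`, and `x_small` are hypothetical):

```python
import torch
import torch.nn as nn

# One confidently wrong prediction: huge positive logit, target 0.
x = torch.tensor([[1e6]])
t = torch.tensor([[0.0]])

# sigmoid(1e6) saturates to 1.0 in float32, so log(1 - p) becomes log(0);
# BCELoss clamps that log term at -100.
loss_probs = nn.BCELoss()(torch.sigmoid(x), t)

# BCEWithLogitsLoss never forms the probability, so the loss keeps
# growing with the logit.
loss_logits = nn.BCEWithLogitsLoss()(x, t)

print(loss_probs)   # capped by the clamp
print(loss_logits)  # on the order of the logit itself

# With a moderate logit the two formulations agree.
x_small = torch.tensor([[1.5]])
loss_small_probs = nn.BCELoss()(torch.sigmoid(x_small), t)
loss_small_logits = nn.BCEWithLogitsLoss()(x_small, t)
print(torch.allclose(loss_small_probs, loss_small_logits))
```

This is also a practical reason to prefer BCEWithLogitsLoss during training: the loss and its gradient stay informative for very confident wrong predictions.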

* The loss that applies Softmax internally (as LogSoftmax) is nn.CrossEntropyLoss

In [94]:
closs = nn.CrossEntropyLoss()
closs(outputs, targets)

tensor(333333.3438)

In [95]:
targets_argmax = torch.argmax(targets, axis=1)
targets_argmax

tensor([0, 1, 0])

In [96]:
print(closs(outputs, targets))

tensor(333333.3438)


In [97]:
print(closs(outputs, targets_argmax))

tensor(333333.3438)


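* Furthermore, CrossEntropyLoss is the same as NLLLoss applied to LogSoftmax outputs. Note that the NLLLoss below is created with weight=[100, 0], which zeroes out the one sample whose target is class 1, so it prints 0 instead of 333333.3438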

In [98]:
nllloss = nn.NLLLoss(weight=torch.Tensor([100, 0]))
logsoftmax = nn.LogSoftmax(dim=1)

In [99]:
print(logsoftmax(outputs))
print(torch.argmax(targets, axis=1))

nllloss(
    logsoftmax(outputs),
    torch.argmax(targets, axis=1)
)

tensor([[       0., -1000000.],
        [       0., -1000000.],
        [       0., -1000000.]])
tensor([0, 1, 0])


tensor(0.)
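The 0 above is a consequence of `weight=torch.Tensor([100, 0])`: the only misclassified sample has target class 1, whose weight is 0. The sketch below rebuilds the same `outputs` and class-index targets, and checks that NLLLoss on LogSoftmax output matches CrossEntropyLoss when both use the same weights, including the default of no weights. The variable names are introduced here for illustration.

```python
import torch
import torch.nn as nn

# Same values as the outputs / argmax targets used above.
outputs = torch.tensor([[1e6, 1e-6]] * 3)
targets_idx = torch.tensor([0, 1, 0])
log_probs = torch.log_softmax(outputs, dim=1)

# No class weights: NLLLoss reproduces the CrossEntropyLoss value.
nll_plain = nn.NLLLoss()(log_probs, targets_idx)
ce_plain = nn.CrossEntropyLoss()(outputs, targets_idx)

# Same class weights on both sides: still equal, but the class-1 sample
# contributes nothing because its weight is 0.
w = torch.tensor([100.0, 0.0])
nll_weighted = nn.NLLLoss(weight=w)(log_probs, targets_idx)
ce_weighted = nn.CrossEntropyLoss(weight=w)(outputs, targets_idx)

print(nll_plain, ce_plain)
print(nll_weighted, ce_weighted)
```

Dropping the `weight` argument is therefore what makes the NLLLoss result line up with the unweighted CrossEntropyLoss value.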