## 1. The binary cross-entropy loss

Companion notebook with code examples for the blog article
[Losses Learned -- Optimizing Negative Log-Likelihood and Cross-Entropy in PyTorch (Part 1)](https://sebastianraschka.com/blog/2022/losses-learned-part1.html)

### 1.1 Example data for binary cross-entropy

In [42]:
import torch


y_targets = torch.tensor([1., 1., 0., 0., 0.])

logits = torch.tensor([1.1, 2.2, 0.5, -1.1, -2.2])
probas = torch.sigmoid(logits)
print(probas)

tensor([0.7503, 0.9002, 0.6225, 0.2497, 0.0998])
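
As a quick sanity check, the logistic sigmoid can also be written out by hand. The following is a minimal sketch; the name `manual_probas` is introduced here for illustration and is not part of the original notebook.

```python
import torch

logits = torch.tensor([1.1, 2.2, 0.5, -1.1, -2.2])

# Logistic sigmoid written out explicitly: 1 / (1 + exp(-z))
manual_probas = 1. / (1. + torch.exp(-logits))

# Should agree with the built-in function up to floating-point error
print(torch.allclose(manual_probas, torch.sigmoid(logits)))
```

Writing the formula out like this is fine for moderate logits; for very large magnitudes the built-in function is the safer choice.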


### 1.2 Implementing the binary cross-entropy loss from scratch

In [43]:
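def binary_logistic_loss_v1(probas, y_targets):
    # Accumulate the log-likelihood one example at a time
    res = 0.
    for i in range(y_targets.shape[0]):
        if y_targets[i] == 1.:
            # Positive class: log of the predicted probability
            res += torch.log(probas[i])
        elif y_targets[i] == 0.:
            # Negative class: log of the complementary probability
            res += torch.log(1 - probas[i])
        else:
            raise ValueError(f'Value {y_targets[i]} not allowed')
    # Negate and average to obtain the mean negative log-likelihood
    res *= -1
    res /= y_targets.shape[0]

    return res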


binary_logistic_loss_v1(probas, y_targets)

tensor(0.3518)
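
For reference, the loop above can be read as the usual binary cross-entropy formula, where $p_i = \sigma(z_i)$ is the predicted probability for example $i$, $y_i \in \{0, 1\}$ is its target, and $n$ is the number of examples (the symbols are notation chosen here, not taken from the notebook):

$$
\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
$$

Plugging in the rounded probabilities printed in Section 1.1 gives approximately

$$
\mathcal{L} \approx -\frac{1}{5} \Big[ \log 0.7503 + \log 0.9002 + \log 0.3775 + \log 0.7503 + \log 0.9002 \Big] \approx \frac{1.7590}{5} \approx 0.3518
$$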

In [44]:
def binary_logistic_loss_v2(probas, y_targets):
    # Dot products replace the Python loop over the examples
    first = -y_targets.matmul(torch.log(probas))
    second = -(1 - y_targets).matmul(torch.log(1 - probas))
    # Average over the number of examples
    return (first + second) / y_targets.shape[0]

binary_logistic_loss_v2(probas, y_targets)

tensor(0.3518)

In [45]:
%timeit binary_logistic_loss_v1(probas, y_targets)

38.6 µs ± 292 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [46]:
%timeit binary_logistic_loss_v2(probas, y_targets)

10.6 µs ± 47.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
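
With only five examples, both timings are likely dominated by fixed overhead. The sketch below repeats the comparison on a larger random input using the standard-library `timeit` module instead of the `%timeit` magic. The input size, the seed, and the names `big_probas` and `big_targets` are assumptions made for illustration, and the measured times will vary by machine.

```python
import timeit

import torch


def binary_logistic_loss_v1(probas, y_targets):
    # Loop-based version, one example at a time
    res = 0.
    for i in range(y_targets.shape[0]):
        if y_targets[i] == 1.:
            res += torch.log(probas[i])
        else:
            res += torch.log(1 - probas[i])
    return -res / y_targets.shape[0]


def binary_logistic_loss_v2(probas, y_targets):
    # Vectorized version based on dot products
    first = -y_targets.matmul(torch.log(probas))
    second = -(1 - y_targets).matmul(torch.log(1 - probas))
    return (first + second) / y_targets.shape[0]


torch.manual_seed(123)
n = 1000  # hypothetical input size

# Random 0/1 targets and probabilities kept away from exactly 0 and 1
big_targets = torch.randint(0, 2, (n,)).float()
big_probas = torch.rand(n).clamp(1e-6, 1 - 1e-6)

# Both implementations should return the same loss
loss_v1 = binary_logistic_loss_v1(big_probas, big_targets)
loss_v2 = binary_logistic_loss_v2(big_probas, big_targets)
print(torch.allclose(loss_v1, loss_v2, atol=1e-4))

# Total wall-clock time for a few repeated calls
repeats = 5
t1 = timeit.timeit(lambda: binary_logistic_loss_v1(big_probas, big_targets), number=repeats)
t2 = timeit.timeit(lambda: binary_logistic_loss_v2(big_probas, big_targets), number=repeats)
print(f"v1: {t1 / repeats * 1e3:.3f} ms per call")
print(f"v2: {t2 / repeats * 1e3:.3f} ms per call")
```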


### 1.3 Using the binary cross-entropy loss in PyTorch

In [47]:
bce = torch.nn.BCELoss()
bce(probas, y_targets)

tensor(0.3518)

In [48]:
class MyBCELoss(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, inputs, targets):
        # Delegate to the vectorized from-scratch implementation
        return binary_logistic_loss_v2(inputs, targets)
    
    
my_bce = MyBCELoss()
my_bce(probas, y_targets)

tensor(0.3518)
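
Because the from-scratch loss is built from differentiable PyTorch operations, it should also work with autograd. The following sketch compares its gradient with respect to the logits against the gradient of `torch.nn.BCELoss` and against the closed form $(\sigma(z) - y) / n$. The variable names are chosen here for illustration.

```python
import torch


def binary_logistic_loss_v2(probas, y_targets):
    first = -y_targets.matmul(torch.log(probas))
    second = -(1 - y_targets).matmul(torch.log(1 - probas))
    return (first + second) / y_targets.shape[0]


y_targets = torch.tensor([1., 1., 0., 0., 0.])
logits = torch.tensor([1.1, 2.2, 0.5, -1.1, -2.2], requires_grad=True)

# Gradient of the from-scratch loss with respect to the logits
loss_custom = binary_logistic_loss_v2(torch.sigmoid(logits), y_targets)
(grad_custom,) = torch.autograd.grad(loss_custom, logits)

# Gradient of the built-in loss with respect to the logits
loss_builtin = torch.nn.BCELoss()(torch.sigmoid(logits), y_targets)
(grad_builtin,) = torch.autograd.grad(loss_builtin, logits)

# Closed-form gradient of the mean BCE loss: (sigmoid(z) - y) / n
grad_expected = (torch.sigmoid(logits).detach() - y_targets) / y_targets.shape[0]

print(torch.allclose(grad_custom, grad_builtin, atol=1e-6))
print(torch.allclose(grad_custom, grad_expected, atol=1e-6))
```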

In [49]:
bce_logits = torch.nn.BCEWithLogitsLoss()
bce_logits(logits, y_targets)

tensor(0.3518)

In [51]:
bce(torch.sigmoid(logits), y_targets)

tensor(0.3518)

**Log-Sum Trick and Logsigmoid**

In [52]:
def binary_logistic_loss_v2(probas, y_targets):
    first = -y_targets.matmul(torch.log(probas))
    second = -(1 - y_targets).matmul(torch.log(1 - probas))
    return (first + second) / y_targets.shape[0]

binary_logistic_loss_v2(probas, y_targets)

tensor(0.3518)

In [53]:
import torch.nn.functional as F


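def binary_logistic_loss_v3(logits, y_targets):
    # log(sigmoid(z)) computed directly from the logits
    first = -y_targets.matmul(F.logsigmoid(logits))
    # log(1 - sigmoid(z)) = logsigmoid(z) - z, which avoids forming 1 - sigmoid(z)
    second = -(1 - y_targets).matmul(F.logsigmoid(logits) - logits)
    return (first + second) / y_targets.shape[0]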

binary_logistic_loss_v3(logits, y_targets)

tensor(0.3518)
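
The benefit of working with logits shows up for large-magnitude inputs. In the sketch below, a single logit of 50 with target 0 is an assumed extreme case chosen for illustration. In float32, `torch.sigmoid(50.)` rounds to exactly 1, so the probability-based version takes the log of 0, whereas the `logsigmoid`-based version stays finite.

```python
import torch
import torch.nn.functional as F


def binary_logistic_loss_v2(probas, y_targets):
    first = -y_targets.matmul(torch.log(probas))
    second = -(1 - y_targets).matmul(torch.log(1 - probas))
    return (first + second) / y_targets.shape[0]


def binary_logistic_loss_v3(logits, y_targets):
    first = -y_targets.matmul(F.logsigmoid(logits))
    second = -(1 - y_targets).matmul(F.logsigmoid(logits) - logits)
    return (first + second) / y_targets.shape[0]


# Hypothetical extreme case: a confident prediction for the wrong class
extreme_logits = torch.tensor([50.])
zero_target = torch.tensor([0.])

# Probability-based version: 1 - sigmoid(50) underflows to 0, so log(0) = -inf
naive = binary_logistic_loss_v2(torch.sigmoid(extreme_logits), zero_target)

# Logit-based version: stays finite
stable = binary_logistic_loss_v3(extreme_logits, zero_target)

# Built-in logit-based loss for comparison
builtin = F.binary_cross_entropy_with_logits(extreme_logits, zero_target)

print(naive)
print(stable)
print(builtin)
```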

### 1.4 PyTorch’s functional vs object-oriented API

In [56]:
import torch.nn.functional as F


F.binary_cross_entropy(probas, y_targets)

tensor(0.3518)

In [57]:
F.binary_cross_entropy_with_logits(logits, y_targets)

tensor(0.3518)
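
One practical difference between the two APIs is where options are set. The sketch below uses the `reduction` argument as an example: the object-oriented version stores it once at construction time, while the functional version takes it on every call. The choice of `reduction='sum'` is only for illustration.

```python
import torch
import torch.nn.functional as F

y_targets = torch.tensor([1., 1., 0., 0., 0.])
logits = torch.tensor([1.1, 2.2, 0.5, -1.1, -2.2])

# Object-oriented API: configure once, then call like a function
bce_sum = torch.nn.BCEWithLogitsLoss(reduction='sum')
loss_oo = bce_sum(logits, y_targets)

# Functional API: pass the option with each call
loss_fn = F.binary_cross_entropy_with_logits(logits, y_targets, reduction='sum')

# The summed loss equals the default mean loss times the number of examples
loss_mean = F.binary_cross_entropy_with_logits(logits, y_targets)

print(torch.allclose(loss_oo, loss_fn))
print(torch.allclose(loss_oo, loss_mean * y_targets.shape[0]))
```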

### 1.5 A PyTorch loss function cheatsheet (so far)

- Note that we use different inputs here, which is why the outputs differ from previous sections.

In [69]:
logits = torch.tensor([-1., 0., 1.])
targets = torch.tensor([0., 0., 1.])

bce_logits = torch.nn.BCEWithLogitsLoss()
bce_logits(logits, targets)

tensor(0.4399)

In [70]:
F.binary_cross_entropy_with_logits(logits, targets)

tensor(0.4399)

In [68]:
bce = torch.nn.BCELoss()
bce(torch.sigmoid(logits), targets)

tensor(0.4399)

In [71]:
F.binary_cross_entropy(torch.sigmoid(logits), targets)

tensor(0.4399)
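
As a cross-check on the cheatsheet values, the per-example loss can be written directly in terms of the logits as $\text{softplus}(z) - y \cdot z$, which follows from $\log \sigma(z) = z - \text{softplus}(z)$. The sketch below is a minimal illustration of that identity; the variable names are introduced here.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([-1., 0., 1.])
targets = torch.tensor([0., 0., 1.])

# Per-example BCE expressed in logits: softplus(z) - y * z
per_example = F.softplus(logits) - targets * logits

# Built-in per-example losses (no averaging)
builtin_per_example = F.binary_cross_entropy_with_logits(
    logits, targets, reduction='none'
)

print(torch.allclose(per_example, builtin_per_example))

# Averaging reproduces the value shown above, approximately 0.4399
manual = per_example.mean()
print(manual)
```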