In code, we can express the BCE loss function as follows:

In [1]:
import math
import numpy as np
from torch import tensor, where, nn

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

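def binary_cross_entropy_single_label(logit, label):
    # Convert the raw logit into a probability for the positive class.
    pred = sigmoid(logit)

    # Positive label: penalize a low predicted probability.
    if label == 1:
        return -math.log(pred)

    # Negative label: penalize a high predicted probability.
    return -math.log(1 - pred)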

In [2]:
logit = 5
label = 1

binary_cross_entropy_single_label(logit, label)

0.006715348489117944

We can express the same function in math as follows, where $o$ is the logit, $p$ is the predicted probability, and $y$ is the label:

$$p = \text{sigmoid}(o)$$
$$L(p, y) = -\left(y \times \log(p) + (1 - y) \times \log(1 - p)\right)$$
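As a quick check, the formula can be evaluated directly and compared with the branching version. The sketch below is self-contained; `bce_closed_form` is an illustrative name, not a library function.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def bce_closed_form(logit, label):
    # Hypothetical helper: evaluates -(y*log(p) + (1-y)*log(1-p)) directly.
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# With y = 1 only the first term survives; with y = 0 only the second.
loss_pos = bce_closed_form(5, 1)  # reduces to -log(p)
loss_neg = bce_closed_form(5, 0)  # reduces to -log(1 - p)
print(loss_pos, loss_neg)
```

Because the label is either 0 or 1, one of the two terms always vanishes, which is why the `if` statement and the single formula give the same result.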

For multi-label outputs, the function takes the mean (or applies some other reduction, such as the sum) of the per-label log loss values:

In [3]:
def binary_cross_entropy(logits, labels):
    # Mean of the per-label losses.
    return np.mean([
        binary_cross_entropy_single_label(logit, label)
        for logit, label in zip(logits, labels)
    ])

In [4]:
logits = [5, -2, 0.5]
labels = [1, 0, 1]

binary_cross_entropy(logits, labels)

0.2025734479040657

We express this in math as follows, averaging over the $N$ labels:

$$P = \text{sigmoid}(O)$$
$$L(P, Y) = -\frac{1}{N} \sum\limits_{i=1}^{N} \left(Y_{i} \times \log(P_{i}) + (1 - Y_{i}) \times \log(1 - P_{i})\right)$$
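The mean is only one way to reduce the per-label losses. The sketch below shows how the choice of reduction changes the result, using only the standard library. The helper name `bce_with_reduction` is hypothetical; the option names `"none"`, `"sum"`, and `"mean"` are chosen to mirror the `reduction` argument of PyTorch's loss classes.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def bce_with_reduction(logits, labels, reduction="mean"):
    # Hypothetical helper: compute each per-label loss, then reduce.
    losses = []
    for logit, label in zip(logits, labels):
        p = sigmoid(logit)
        losses.append(-(label * math.log(p) + (1 - label) * math.log(1 - p)))
    if reduction == "none":
        return losses                     # one loss per label
    if reduction == "sum":
        return sum(losses)                # total loss
    if reduction == "mean":
        return sum(losses) / len(losses)  # the 1/N average in the formula
    raise ValueError(f"unknown reduction: {reduction!r}")

logits = [5, -2, 0.5]
labels = [1, 0, 1]
per_label = bce_with_reduction(logits, labels, reduction="none")
total = bce_with_reduction(logits, labels, reduction="sum")
mean = bce_with_reduction(logits, labels)
print(per_label, total, mean)
```

The `"none"` option is handy for inspecting which labels contribute most to the loss before they are averaged away.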

PyTorch provides the function via the [`nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) class, which expects you to apply the sigmoid function to the model's outputs first. It plays the same role in binary and multi-label classification that [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) plays in multi-class classification, although `nn.CrossEntropyLoss` takes raw logits rather than probabilities.

Use [`nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) if your model doesn't apply the sigmoid activation function in its final layer; it takes raw logits and applies the sigmoid internally.

In [5]:
logits = tensor([5, -2, 0.5]).float()
labels = tensor([1, 0, 1]).float()

nn.BCEWithLogitsLoss()(logits, labels)

tensor(0.2026)
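For comparison, a minimal sketch of the two classes side by side: `nn.BCELoss` fed with sigmoid probabilities should agree with `nn.BCEWithLogitsLoss` fed with the raw logits, at least for moderate logit values like these. The variable names are illustrative.

```python
import torch
from torch import nn

logits = torch.tensor([5.0, -2.0, 0.5])
labels = torch.tensor([1.0, 0.0, 1.0])

# nn.BCELoss expects probabilities, so apply the sigmoid first.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), labels)

# nn.BCEWithLogitsLoss takes the raw logits and applies the sigmoid internally.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)

print(loss_from_probs, loss_from_logits)
```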

`nn.BCEWithLogitsLoss` is equivalent to the following function:

In [7]:
def binary_cross_entropy(logits, labels):
    preds = logits.sigmoid()
    # Take p where the label is 1 and (1 - p) otherwise, then average the negative logs.
    return -where(labels == 1, preds, 1 - preds).log().mean()

In [8]:
binary_cross_entropy(logits, labels)

tensor(0.2026)
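The hand-written versions above apply the sigmoid and the log as separate steps, which can break down for extreme logits: with the `math`-based `sigmoid`, a logit of `-1000` makes `math.exp(1000)` raise an `OverflowError`. The PyTorch documentation describes `nn.BCEWithLogitsLoss` as more numerically stable than a sigmoid followed by `nn.BCELoss`. The sketch below shows one common stable rearrangement of the same formula. It is an illustration under that assumption, not necessarily how PyTorch implements it, and `stable_bce_with_logit` is a hypothetical name.

```python
import math

def stable_bce_with_logit(logit, label):
    # Hypothetical helper: max(x, 0) - x*y + log(1 + exp(-|x|)),
    # algebraically equal to -(y*log(p) + (1-y)*log(1-p)) with p = sigmoid(x).
    # exp() is only ever called on a non-positive number, so it cannot overflow.
    return max(logit, 0) - logit * label + math.log1p(math.exp(-abs(logit)))

moderate = stable_bce_with_logit(5, 1)      # agrees with the naive version
extreme = stable_bce_with_logit(-1000, 1)   # the naive version would overflow here
print(moderate, extreme)
```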