NLL loss and cross-entropy loss are closely related, both in PyTorch and in machine learning generally. In PyTorch, CrossEntropyLoss combines LogSoftmax and NLLLoss in a single function.
 
Key points about nll_loss:
- NLL loss (negative log-likelihood loss) is commonly used in classification problems where the model outputs log-probabilities, typically the output of log_softmax.
- For each sample, it takes the log-probability assigned to the true class, negates it, and (by default) averages the result over the batch.
- It expects the input to be log-probabilities and the target to be class indices.

CrossEntropyLoss(logits, target) = NLLLoss(log(softmax(logits)), target)

Training:
- logits -> cross_entropy
- logits -> log_softmax -> nll_loss
- logits -> softmax -> log -> nll_loss (Avoid this: it is less efficient and numerically unstable, because a probability that underflows to 0 becomes -inf after the log.)
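
The instability of the third path can be seen with a deliberately extreme input. The snippet below is a minimal sketch: the logits [1000.0, 0.0] are an assumption chosen only to force an underflow, not values from this notebook.

```python
import torch
import torch.nn.functional as F

# Extreme logits (chosen only for illustration): class 1 is so unlikely
# that its softmax probability underflows to exactly 0 in float32.
logits = torch.tensor([[1000.0, 0.0]])
target = torch.tensor([1])

# Unstable path: softmax -> log -> nll_loss
probs = F.softmax(logits, dim=1)
unstable = F.nll_loss(torch.log(probs), target)  # log(0) = -inf

# Stable paths: log_softmax -> nll_loss, or cross_entropy directly
stable = F.nll_loss(F.log_softmax(logits, dim=1), target)
fused = F.cross_entropy(logits, target)

print("softmax -> log -> nll_loss:", unstable.item())
print("log_softmax -> nll_loss:   ", stable.item())
print("cross_entropy:             ", fused.item())
```

The first path loses the information needed to produce a finite loss, while the other two stay finite and agree with each other.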

Inference:
- logits -> softmax -> multinomial (sample a class index from the predicted distribution)
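
The inference path can be sketched as follows. The logits and the seed are illustrative assumptions; torch.multinomial draws class indices using each row of its input as sampling weights.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

logits = torch.tensor([[2.0, 1.5, 0.5],
                       [0.1, 2.5, 0.3]])  # illustrative values

# multinomial needs non-negative weights, so pass probabilities,
# not log-probabilities
probs = F.softmax(logits, dim=-1)

# Draw one class index per row according to its probabilities
sampled = torch.multinomial(probs, num_samples=1)  # shape: (batch, 1)
print(sampled.squeeze(1))
```

Replacing the sampling step with probs.argmax(dim=-1) would instead pick the most likely class deterministically.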


In [10]:
import torch
import torch.nn.functional as F

# Step 1: Define some arbitrary logits
logits = torch.tensor([
    [2.0, 1.5, 0.5],  # Sample 1
    [3.0, 0.0, 1.2],  # Sample 2
    [0.1, 2.5, 0.3],  # Sample 3
    [1.5, 1.0, 1.3]   # Sample 4
])

# Step 2: Apply softmax
softmax_probs = F.softmax(logits, dim=1)

# Step 3: Apply log (done in two steps here for illustration only;
# F.log_softmax is the numerically stable way to get the same values)
log_probs = torch.log(softmax_probs)

# Print the resulting log_probs tensor
print("log_probs:\n", log_probs)

# Target labels for each sample (true class indices)
targets = torch.tensor([0, 2, 1, 0])

# Calculate NLL Loss
loss = F.nll_loss(log_probs, targets)

print(f"NLL Loss: {loss.item()}")

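# Manual calculation: take each sample's log-probability at its target index
# (values copied from the printed log_probs), negate the sum, and average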
manual_loss = -((-0.6041) + (-1.9948) + (-0.1836) + (-0.8859)) / 4
print(f"Manual NLL Loss: {manual_loss}")



log_probs:
 tensor([[-0.6041, -1.1041, -2.1041],
        [-0.1948, -3.1948, -1.9948],
        [-2.5836, -0.1836, -2.3836],
        [-0.8859, -1.3859, -1.0859]])
NLL Loss: 0.9171183109283447
Manual NLL Loss: 0.9171
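
The hand-copied values in the manual calculation can also be selected programmatically. This is a minimal sketch that reuses the same logits and targets; the names picked and manual_loss are introduced here for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([
    [2.0, 1.5, 0.5],
    [3.0, 0.0, 1.2],
    [0.1, 2.5, 0.3],
    [1.5, 1.0, 1.3],
])
targets = torch.tensor([0, 2, 1, 0])

log_probs = F.log_softmax(logits, dim=1)

# Pick each row's log-probability at its target index: (4, 1) -> (4,)
picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)

# NLL with the default 'mean' reduction: negate and average
manual_loss = -picked.mean()

print(picked)
print(manual_loss.item(), F.nll_loss(log_probs, targets).item())
```

Both printed losses should match the value reported by F.nll_loss above, up to floating-point rounding.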


In [11]:
import torch.nn as nn

# Instantiate the loss function
criterion = nn.CrossEntropyLoss()

# Sample logits and labels
logits = torch.tensor([[1.0, 2.0, 0.1],
                       [1.2, 0.5, 0.3],
                       [0.4, 1.0, 1.5]], dtype=torch.float32)
labels = torch.tensor([2, 0, 1], dtype=torch.long)

# Call the instantiated object to compute the loss
loss = criterion(logits, labels)
print(loss)


tensor(1.3743)


In [None]:
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1], 
                       [0.5, 2.5, 1.0]])  # Raw scores

targets = torch.tensor([0, 1])  # Class labels

# Using CrossEntropyLoss
loss_fn_ce = nn.CrossEntropyLoss()
loss_ce = loss_fn_ce(logits, targets)

# Equivalent using LogSoftmax + NLLLoss
log_softmax = torch.log_softmax(logits, dim=1)  # Log probabilities
loss_fn_nll = nn.NLLLoss()
loss_nll = loss_fn_nll(log_softmax, targets)

print(loss_ce)   # Same value
print(loss_nll)  # Same value


tensor(0.3617)
tensor(0.3617)


In [6]:
import torch
import torch.nn.functional as F
logits = torch.tensor([[1.0, 2.0, 0.1],
                       [1.2, 0.5, 0.3],
                       [0.4, 1.0, 1.5]], dtype=torch.float32)

sm = F.softmax(logits, dim=-1)
print(sm)

lsm = F.log_softmax(logits, dim=-1)
print(lsm)

labels = torch.tensor([2, 0, 1], dtype=torch.long)

# Incorrect: nll_loss expects log-probabilities, but sm holds probabilities
incorrect_nll = F.nll_loss(sm, labels)

# Correct: pass the output of log_softmax
correct_nll = F.nll_loss(lsm, labels)

print("incorrect_nll: ", incorrect_nll)
print("correct_nll:", correct_nll)

cross_entropy = F.cross_entropy(logits, labels)
print("cross entropy: ", cross_entropy)

tensor([[0.2424, 0.6590, 0.0986],
        [0.5254, 0.2609, 0.2136],
        [0.1716, 0.3127, 0.5156]])
tensor([[-1.4170, -0.4170, -2.3170],
        [-0.6435, -1.3435, -1.5435],
        [-1.7624, -1.1624, -0.6624]])
incorrect_nll:  tensor(-0.3123)
correct_nll: tensor(1.3743)
cross entropy:  tensor(1.3743)
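
The negative value of incorrect_nll follows from how nll_loss works: it does not take a logarithm itself, it only selects the input at the target index, negates it, and averages. The sketch below reproduces the number by hand under that reading; target_probs and by_hand are names introduced here for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 0.1],
                       [1.2, 0.5, 0.3],
                       [0.4, 1.0, 1.5]])
labels = torch.tensor([2, 0, 1])

sm = F.softmax(logits, dim=-1)

# Select each row's probability at its label, then negate and average
target_probs = sm[torch.arange(len(labels)), labels]
by_hand = -target_probs.mean()

incorrect_nll = F.nll_loss(sm, labels)
print(target_probs)
print(by_hand, incorrect_nll)
```

Because probabilities lie in [0, 1], feeding them to nll_loss always yields a value in [-1, 0], which is a quick way to spot this mistake.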


In [6]:
logits = torch.tensor([[1.0, 2.0, 0.1],
                       [1.2, 0.5, 0.3],
                       [0.4, 1.0, 1.5]], dtype=torch.float32)

t1 = F.softmax(logits, dim=-1)
print("t1:")
print(t1)

t2 = t1.log()
print("t1.log():")
print(t2)

t3 = torch.log(t1)
print("torch.log(t1):")
print(t3)

# F.log_softmax computes the same values in one numerically stable step
t4 = F.log_softmax(logits, dim=-1)
print("log_softmax:")
print(t4)



t1:
tensor([[0.2424, 0.6590, 0.0986],
        [0.5254, 0.2609, 0.2136],
        [0.1716, 0.3127, 0.5156]])
t1.log():
tensor([[-1.4170, -0.4170, -2.3170],
        [-0.6435, -1.3435, -1.5435],
        [-1.7624, -1.1624, -0.6624]])
torch.log(t1):
tensor([[-1.4170, -0.4170, -2.3170],
        [-0.6435, -1.3435, -1.5435],
        [-1.7624, -1.1624, -0.6624]])
log_softmax:
tensor([[-1.4170, -0.4170, -2.3170],
        [-0.6435, -1.3435, -1.5435],
        [-1.7624, -1.1624, -0.6624]])
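
Rather than comparing the printed tensors by eye, the agreement can be checked directly. This is a small sketch using the same logits; the tolerance passed to torch.allclose is an assumption suited to float32.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 0.1],
                       [1.2, 0.5, 0.3],
                       [0.4, 1.0, 1.5]])

t1 = F.softmax(logits, dim=-1)
t2 = t1.log()                        # method form
t3 = torch.log(t1)                   # function form of the same operation
t4 = F.log_softmax(logits, dim=-1)   # fused, numerically stable form

# t2 and t3 come from the same operation, so they match exactly
print(torch.equal(t2, t3))

# t4 takes a different numerical route, so compare within a tolerance
print(torch.allclose(t3, t4, atol=1e-6))
```

For moderate logits like these, all three routes agree; they diverge only when a softmax probability underflows.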
