In PyTorch, `torch.nn.NLLLoss` (Negative Log-Likelihood Loss) is typically used in classification problems, particularly when working with the output of a `log_softmax` activation function. It computes the negative log likelihood of the target classes, given the predicted log-probabilities.


### Explanation:
1. **Input (logits)**: The model's raw output (before applying any activation like softmax).
2. **Log softmax**: Converts logits into log-probabilities.
3. **Targets**: The ground truth class labels (0-indexed).
4. **Loss**: The `NLLLoss` function computes the negative log-likelihood for each sample and, by default, returns the mean over the batch.

Make sure the model output passes through `log_softmax` before it reaches `NLLLoss`. `NLLLoss` expects log-probabilities and does not normalize its input, so if your model outputs raw logits, `log_softmax` is needed to turn them into proper log-probabilities.
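To make the warning concrete, here is a minimal sketch of what happens when raw logits are passed straight to `NLLLoss`. The tensor values are illustrative. No error is raised, but the result is not a valid negative log-likelihood (here it is even negative).

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss()

# Illustrative values: 2 samples, 3 classes
logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]])
targets = torch.tensor([1, 2])

# Mistake: raw logits are not log-probabilities, yet NLLLoss accepts them silently.
# It simply returns the mean of -logits[i, targets[i]].
wrong_loss = criterion(logits, targets)

# Intended usage: normalize with log_softmax first
right_loss = criterion(torch.log_softmax(logits, dim=1), targets)

print("Without log_softmax:", wrong_loss.item())
print("With log_softmax:   ", right_loss.item())
```

A negative loss is a useful warning sign: a true negative log-likelihood of a probability can never be below zero.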

The formula for the **Negative Log-Likelihood Loss** (NLLLoss) is given by:

$$
\text{NLLLoss}(x, y) = - \log(p(y | x))
$$

Where:
- \( x \) is the input to the model, which the model maps to logits \( z \).
- \( y \) is the ground truth (the target class).
- \( p(y | x) \) is the predicted probability of the correct class \( y \), based on the input \( x \).
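To get a feel for the formula, the sketch below evaluates \( -\log(p) \) for a few probabilities using only the standard library. The probability values are hypothetical, chosen just to show the trend.

```python
import math

# Hypothetical probabilities the model assigns to the correct class
probabilities = [0.99, 0.5, 0.1, 0.01]

# Per-sample loss is -log(p): close to 0 for confident correct predictions,
# and growing without bound as p approaches 0
losses = [-math.log(p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"p = {p:<4} -> loss = {loss:.4f}")
```

The loss rises sharply as the probability of the correct class shrinks, which is why confident wrong predictions are penalized so heavily.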

For multi-class classification, if the model output is the **logits** (i.e., unnormalized scores), we first apply the **softmax** function to obtain the predicted probabilities, and then compute the negative log of the probability of the correct class.

Mathematically, for the **logits** \( z_1, \dots, z_C \), the softmax probability of class \( j \) is defined as:

$$
p(y = j | x) = \frac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}}
$$

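Where \( C \) is the number of classes, and \( z_j \) is the logit corresponding to class \( j \). Note that the formulas here number classes from 1 to \( C \), whereas PyTorch target labels are 0-indexed.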
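The softmax definition can be sketched in plain Python as follows. The helper name `softmax` is introduced here for illustration. Subtracting the maximum logit before exponentiating is a common overflow safeguard added in this sketch; it does not change the result.

```python
import math

def softmax(logits):
    """Softmax over a list of floats (illustrative helper, not a library function)."""
    # Shifting by the max leaves the ratios unchanged but keeps exp() from overflowing
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.2, 0.5, -0.1])
print([round(p, 4) for p in probs])
print("Sum of probabilities:", sum(probs))
```

The outputs are all positive and sum to 1, so they can be read as a probability distribution over the classes.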

The **NLLLoss** for a single sample is then:

$$
\text{NLLLoss}(z, y) = -\log(p(y = y_{\text{true}} | x)) = -\log\left(\frac{e^{z_{y_{\text{true}}}}}{\sum_{k=1}^{C} e^{z_k}}\right)
$$

By default (`reduction='mean'`), `NLLLoss` averages this loss over all samples in the batch.
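The batch reduction is controlled by the `reduction` argument of `nn.NLLLoss`. The sketch below compares the three documented options on illustrative values.

```python
import torch
import torch.nn as nn

# Illustrative values: 2 samples, 3 classes
logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]])
log_probs = torch.log_softmax(logits, dim=1)
targets = torch.tensor([1, 2])

per_sample = nn.NLLLoss(reduction="none")(log_probs, targets)  # one loss per sample
total = nn.NLLLoss(reduction="sum")(log_probs, targets)        # sum over the batch
mean = nn.NLLLoss(reduction="mean")(log_probs, targets)        # default: batch average

print("Per-sample:", per_sample)
print("Sum:", total.item())
print("Mean:", mean.item())
```

`reduction="none"` is handy when you want to inspect or reweight individual samples before averaging them yourself.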

### Key Points:
1. The model outputs logits (raw scores before activation).
2. The softmax function converts logits into probabilities.
3. The log of the predicted probability for the correct class is taken.
4. The negative of this log is the loss.
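These four steps are also what `nn.CrossEntropyLoss` performs in a single call on raw logits. The following sketch, on illustrative values, checks that the two routes agree.

```python
import torch
import torch.nn as nn

# Illustrative values: 2 samples, 3 classes
logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]])
targets = torch.tensor([1, 2])

# Route 1: explicit log_softmax followed by NLLLoss
two_step = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

# Route 2: CrossEntropyLoss takes raw logits and applies log_softmax internally
one_step = nn.CrossEntropyLoss()(logits, targets)

print(two_step.item(), one_step.item())
print("Equal:", torch.allclose(two_step, one_step))
```

Which route to use mostly depends on whether the model's last layer already applies `log_softmax`.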

In [4]:
import torch
import torch.nn as nn

# Define the NLLLoss function
criterion = nn.NLLLoss()

# Example logits (raw model outputs, before log_softmax)
logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]], requires_grad=True)  # 2 samples, 3 classes
log_probs = torch.log_softmax(logits, dim=1)  # Convert logits to log probabilities

# Example of target labels
targets = torch.tensor([1, 2])  # Correct classes for each sample (class 1 for the first sample, class 2 for the second)

# Compute the loss
loss = criterion(log_probs, targets)
print(loss)


tensor(0.8701, grad_fn=<NllLossBackward0>)
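The `grad_fn` in the output shows that the loss is attached to the autograd graph, because `logits` was created with `requires_grad=True`. As a rough sketch of what backpropagation yields, the gradient of the mean loss with respect to the logits should equal the softmax probabilities minus the one-hot targets, divided by the batch size. The variable name `expected_grad` is introduced here for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]], requires_grad=True)
targets = torch.tensor([1, 2])

loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)
loss.backward()  # populates logits.grad

# Analytical gradient of the mean loss: (softmax - one_hot) / batch_size
probs = F.softmax(logits.detach(), dim=1)
one_hot = F.one_hot(targets, num_classes=3).float()
expected_grad = (probs - one_hot) / len(targets)

print(logits.grad)
print("Matches analytical gradient:", torch.allclose(logits.grad, expected_grad, atol=1e-6))
```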


To manually calculate the **Negative Log-Likelihood Loss (NLLLoss)** for the given data, let's go through each step using the provided values.

### **Given values:**
- **Logits**: 
  $$
  \text{logits} = \begin{bmatrix}
  0.2 & 0.5 & -0.1 \\
  -0.1 & 0.1 & 0.3
  \end{bmatrix}
  $$
  
- **Targets**: 
  $$
  \text{targets} = [1, 2]
  $$

### **Step 1: Apply Softmax to the logits**
To compute log-probabilities, we first apply the **softmax** function to the logits. The softmax function transforms logits into probabilities.

The softmax function for a vector \(z = [z_1, z_2, z_3]\) is given by:

$$
p(y = j | x) = \frac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}}
$$

For each row of logits, we apply softmax:

#### **First row: [0.2, 0.5, -0.1]**
$$
\text{softmax}(0.2, 0.5, -0.1) = \left[ \frac{e^{0.2}}{e^{0.2} + e^{0.5} + e^{-0.1}}, \frac{e^{0.5}}{e^{0.2} + e^{0.5} + e^{-0.1}}, \frac{e^{-0.1}}{e^{0.2} + e^{0.5} + e^{-0.1}} \right]
$$
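Numerically, \( e^{0.2} \approx 1.2214 \), \( e^{0.5} \approx 1.6487 \), and \( e^{-0.1} \approx 0.9048 \), which sum to about 3.7750. Dividing each exponential by that sum gives the probabilities \( [0.3236, 0.4368, 0.2397] \).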

#### **Second row: [-0.1, 0.1, 0.3]**
$$
\text{softmax}(-0.1, 0.1, 0.3) = \left[ \frac{e^{-0.1}}{e^{-0.1} + e^{0.1} + e^{0.3}}, \frac{e^{0.1}}{e^{-0.1} + e^{0.1} + e^{0.3}}, \frac{e^{0.3}}{e^{-0.1} + e^{0.1} + e^{0.3}} \right]
$$
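Similarly, \( e^{-0.1} \approx 0.9048 \), \( e^{0.1} \approx 1.1052 \), and \( e^{0.3} \approx 1.3499 \), which sum to about 3.3599, giving the probabilities \( [0.2693, 0.3289, 0.4018] \).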

### **Step 2: Compute Log-Probabilities**
Once we have the softmax probabilities, we apply the natural logarithm to these probabilities to obtain log-probabilities.
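As a worked check, the log of a softmax probability can also be written directly in terms of the logits:

$$
\log p(y = j | x) = z_j - \log \sum_{k=1}^{C} e^{z_k}
$$

For the two target entries in this example, the sums of exponentials are about 3.7750 for the first row and 3.3599 for the second. The target logits are 0.5 (label 1 in the first row) and 0.3 (label 2 in the second row), so, rounded to four decimals:

$$
0.5 - \log(3.7750) \approx 0.5 - 1.3284 = -0.8284
$$

$$
0.3 - \log(3.3599) \approx 0.3 - 1.2119 = -0.9119
$$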

### **Step 3: Calculate NLL Loss**
Now, the **NLL Loss** for each sample is computed as:

$$
\text{NLLLoss}(x, y) = - \log(p(y_{\text{true}} | x))
$$

For the given targets:
- For the first sample, the target class is 1, i.e. the second entry of the row (labels are 0-indexed).
- For the second sample, the target class is 2, i.e. the third entry of the row.

Thus, we calculate the negative log of the predicted probability for the correct class for each sample.
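Worked out for this example, with \( p_1 \approx 0.4368 \) and \( p_2 \approx 0.4018 \) denoting the target-class probabilities of the first and second sample (symbols introduced here; the logs are taken of the unrounded probabilities):

$$
-\log p_1 \approx 0.8284, \qquad -\log p_2 \approx 0.9119
$$

$$
\text{mean loss} \approx \frac{0.8284 + 0.9119}{2} \approx 0.8701
$$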

In [2]:
import torch
import torch.nn.functional as F

# Given logits
logits = torch.tensor([[0.2, 0.5, -0.1], [-0.1, 0.1, 0.3]])

# Targets (true classes)
targets = torch.tensor([1, 2])

# Step 1: Apply Softmax to logits to get probabilities
probabilities = F.softmax(logits, dim=1)

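# Step 2: Get the log-probabilities by taking the log of probabilities
# (two steps here to mirror the walkthrough; F.log_softmax does both in one, numerically safer call)
log_probs = torch.log(probabilities)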

# Step 3: Compute the negative log-likelihood for the correct class (using target labels)
# For each sample, we pick the log-probability of the correct class and take the negative
nll_losses = -log_probs[torch.arange(len(targets)), targets]

# Final NLL Loss is the mean of the losses
nll_loss = nll_losses.mean()

print("Logits:\n", logits)
print("Probabilities:\n", probabilities)
print("Log-Probabilities:\n", log_probs)
print("Negative Log-Likelihoods for each sample:", nll_losses)
print("Mean NLL Loss:", nll_loss.item())


Logits:
 tensor([[ 0.2000,  0.5000, -0.1000],
        [-0.1000,  0.1000,  0.3000]])
Probabilities:
 tensor([[0.3236, 0.4368, 0.2397],
        [0.2693, 0.3289, 0.4018]])
Log-Probabilities:
 tensor([[-1.1284, -0.8284, -1.4284],
        [-1.3119, -1.1119, -0.9119]])
Negative Log-Likelihoods for each sample: tensor([0.8284, 0.9119])
Mean NLL Loss: 0.8701457977294922
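The manual result matches the `nn.NLLLoss` output from earlier. One caveat about the manual route: taking `torch.log` of `F.softmax` works for small logits like these, but it can break down for extreme values. The sketch below uses hypothetical extreme logits to illustrate the difference from `F.log_softmax`.

```python
import torch
import torch.nn.functional as F

# Hypothetical extreme logits, chosen so the second probability underflows to 0
logits = torch.tensor([[1000.0, 0.0]])

# Two-step route: softmax underflows to exactly 0, and log(0) is -inf
naive = torch.log(F.softmax(logits, dim=1))

# One-step route: log_softmax works in log space and stays finite
stable = F.log_softmax(logits, dim=1)

print("log(softmax):", naive)
print("log_softmax: ", stable)
```

An infinite log-probability would turn the loss into `inf` if that class were the target, which is why `log_softmax` is generally the safer choice in training code.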
