# Crossentropy

## Objective:
In this notebook, we explore the relationship between softmax probabilities and crossentropy loss in the context of a classification task. Using a random vector of logits as a stand-in for a model's output, we visualize the probability assigned to each class and show how the crossentropy loss reflects the agreement between the predicted and true distributions.

## Background:
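In classification problems, the crossentropy loss is a common choice for measuring the dissimilarity between the predicted and true probability distributions. For a single example with $N$ classes, it is computed as follows:

$$
\text{Crossentropy Loss} = - \sum_{i=1}^{N} y_i \cdot \log(\hat{y}_i)
$$

where:
- $N$ is the number of classes (10 in this notebook).
- $y_i$ is the true probability of class $i$ (1 for the true class, 0 for the others, i.e. a one-hot vector).
- $\hat{y}_i$ is the predicted probability for class $i$, obtained by applying softmax to the logits.

Because $y$ is one-hot, only the true-class term survives, so the loss equals $-\log \hat{y}_{\text{true}}$. In the binary case ($N = 2$, with $y \in \{0, 1\}$ and $\hat{y}$ the predicted probability of the positive class), this reduces to the familiar log loss $-\left( y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y}) \right)$.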
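As a quick numerical check, the sketch below computes the multi-class form $-\sum_i y_i \log \hat{y}_i$ in plain Python for a hypothetical 3-class prediction, and confirms that with two classes it matches the binary log loss. The function names and the example numbers are illustrative assumptions, not part of the notebook.

```python
import math

def crossentropy(y_true, y_pred):
    """Multi-class crossentropy -sum_i y_i * log(y_hat_i) for one example."""
    # Terms with y_i == 0 contribute nothing, so skip them (this also avoids log(0)).
    return -sum(y * math.log(p) for y, p in zip(y_true, y_pred) if y > 0)

def binary_crossentropy(y, p):
    """Binary log loss for a label y in {0, 1} and probability p of the positive class."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical 3-class example: the true class is index 1.
y_true = [0.0, 1.0, 0.0]
y_pred = [0.2, 0.7, 0.1]
loss = crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)

# With N = 2 (probabilities [p, 1 - p]) the general formula matches the binary log loss.
p = 0.8
two_class = crossentropy([1.0, 0.0], [p, 1 - p])
print(math.isclose(two_class, binary_crossentropy(1.0, p)))  # True
```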

We will generate a random vector of logits, convert it to softmax probabilities, and compute the crossentropy loss that would result if each class in turn were the true (reference) class. Plotting these side by side shows that classes with a high predicted probability yield a low loss, and vice versa.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'
import torch
import torch.nn.functional as F

In [None]:
num_classes = 10  # we have 10 classes

In [None]:
# Generate random logits with shape (batch=1, num_classes).
# In a real model these would be the outputs of the last
# layer, before the softmax.
logits = torch.randn((1, num_classes))

In [None]:
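# Define the loss function. CrossEntropyLoss takes raw logits
# (it applies log-softmax internally) and integer class indices.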
crossentropy = torch.nn.CrossEntropyLoss()

In [None]:
# Crossentropy loss of the logits obtained above
# with respect to the class 'Car', which has index 2.
car_class = torch.tensor((2,))

crossentropy(logits, car_class)
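Because `torch.nn.CrossEntropyLoss` applies `log_softmax` followed by the negative log-likelihood, the loss for a target class should equal $-\log$ of that class's softmax probability. A minimal, self-contained check of this equivalence, using its own seeded random logits (the seed is an arbitrary assumption) rather than the notebook's:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, for reproducibility
num_classes = 10
logits = torch.randn((1, num_classes))
target = torch.tensor((2,))  # class index 2 ('Car')

# Built-in loss: works directly on raw logits.
builtin = torch.nn.CrossEntropyLoss()(logits, target)

# Manual equivalents: log-softmax + NLL, and -log of the softmax probability.
via_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
via_prob = -torch.log(F.softmax(logits, dim=1)[0, target.item()])

print(torch.allclose(builtin, via_nll), torch.allclose(builtin, via_prob))  # True True
```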

In [None]:
# Compute the crossentropy of our logits with
# respect to every possible target class (0 to 9)
losses = [crossentropy(logits, torch.tensor((i,))) for i in range(num_classes)]

In [None]:
# Stack the list of scalar losses into a 1-D tensor of shape (num_classes,)
losses = torch.stack(losses)
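The list comprehension calls the loss once per class. Since the loss for target $i$ is $-\log \hat{y}_i$, the same ten values can be sketched in one vectorized step as the negated `log_softmax` of the logits, or with `F.cross_entropy(..., reduction='none')` on the logits repeated once per target. The seed and variable names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, for reproducibility
num_classes = 10
logits = torch.randn((1, num_classes))

# Loop version, as in the cells above.
crossentropy = torch.nn.CrossEntropyLoss()
losses_loop = torch.stack([crossentropy(logits, torch.tensor((i,))) for i in range(num_classes)])

# Vectorized version 1: the loss for target i is -log_softmax(logits)[i].
losses_vec = -F.log_softmax(logits, dim=1).squeeze(0)

# Vectorized version 2: one copy of the logits per target, no averaging over the batch.
batch_logits = logits.expand(num_classes, -1)  # shape (10, 10)
targets = torch.arange(num_classes)            # targets 0..9
losses_batch = F.cross_entropy(batch_logits, targets, reduction='none')

print(torch.allclose(losses_loop, losses_vec), torch.allclose(losses_loop, losses_batch))  # True True
```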

In [None]:
# Apply softmax
probabilities = F.softmax(logits, dim=1)
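Softmax maps logits $z$ to probabilities $\hat{y}_i = e^{z_i} / \sum_j e^{z_j}$. A plain-Python sketch, assuming the common trick of subtracting the maximum logit first: this leaves the result unchanged but keeps `math.exp` from overflowing. The function name is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                           # shift so the largest exponent is 0
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])   # [0.659, 0.242, 0.099]
print(math.isclose(sum(probs), 1.0))  # True

# Without the shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 1000.0]))      # [0.5, 0.5]
```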

In [None]:
matplotlib.rcParams['figure.figsize'] = (10, 4)

# Plot probabilities, crossentropy losses, and logits on a single axis
fig, ax1 = plt.subplots()

# Softmax probabilities as upward bars
ax1.bar(range(num_classes), probabilities.squeeze(), color='deepskyblue', label='probabilities')
# Losses, negated and scaled so the largest one reaches -1/num_classes (downward bars)
ax1.bar(range(num_classes), losses / (-num_classes * losses.max()), color='tomato', label='crossentropy (scaled)')
# Logits, scaled so the largest magnitude is 1/num_classes
ax1.plot(range(num_classes), logits.squeeze() / (num_classes * torch.abs(logits).max()), '.-', color='k', label='logits (scaled)')
ax1.set_xlabel('Class')
ax1.set_ylabel('Probabilities')
ax1.set_ylim([-0.12 / (0.1 * num_classes), probabilities.max() + (0.01 / (0.1 * num_classes))])
ax1.grid(ls=':', alpha=0.5)
ax1.legend()

plt.show()
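Reading the plot: each crossentropy value is $-\log$ of the corresponding probability, a strictly decreasing function, so the class with the tallest blue bar always has the shortest red bar. A self-contained check of that relationship (the seed is an arbitrary assumption):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)  # arbitrary seed, for reproducibility
num_classes = 10
logits = torch.randn((1, num_classes))
probabilities = F.softmax(logits, dim=1).squeeze(0)

# Per-class losses: one copy of the logits per possible target, no averaging.
losses = F.cross_entropy(logits.expand(num_classes, -1), torch.arange(num_classes), reduction='none')

# The loss for class i is -log(p_i) ...
print(torch.allclose(losses, -torch.log(probabilities)))  # True
# ... so the most probable class is also the one with the smallest loss.
print(probabilities.argmax().item() == losses.argmin().item())  # True
```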