# **Categorical Cross-Entropy**

The *categorical cross-entropy* is an objective function often used in classification problems. It measures "how far" the probability distribution over the classes predicted by a machine learning model is from the desired targets. It is defined as:

$
\text{cross-entropy} = -\frac{1}{N} \sum_{n=1}^N \sum_{k=1}^K y_{n,k} \log(p_{n,k}),
$

where $N$ is the number of training examples and $K$ is the number of classes. The target value for example $x_n$ and class $k$ is denoted $y_{n,k}$. The targets are collected in a matrix $Y \in \mathbb{R}^{N \times K}$ whose rows are one-hot vectors: $y_{n,k} = 1$ if $k$ is the correct class of $x_n$, and 0 otherwise. For example, with $N = 4$ and $K = 3$:

$ Y =
\begin{bmatrix}
1 & 0 & 0\\
0 & 0 & 1 \\
0 & 1 & 0 \\
1 & 0 & 0
\end{bmatrix}
$
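A one-hot matrix like this is easy to build from integer class labels. A minimal NumPy sketch, assuming the labels are stored in an integer array `labels` (a name introduced here for illustration):

```python
import numpy as np

# Hypothetical integer class labels for the N = 4 examples (class indices 0..K-1)
labels = np.array([0, 2, 1, 0])
K = 3

# Row c of the K x K identity matrix is the one-hot vector for class c,
# so indexing it with the labels builds the N x K target matrix Y
Y = np.eye(K)[labels]
print(Y)
```

This reproduces the matrix $Y$ above, with float entries.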

The output probability $p_{n,k}$ is the probability that the model assigns to class $k$ for the training example $x_n$. The output probabilities are collected in a matrix $P \in \mathbb{R}^{N \times K}$ whose rows each sum to 1:

$ P =
\begin{bmatrix}
0.6 & 0.2 & 0.2\\
0.3 & 0.3 & 0.4 \\
0.4 & 0.5 & 0.1 \\
0.6 & 0.1 & 0.3
\end{bmatrix}
$
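Because each row of $Y$ contains a single 1, only the term of the correct class survives in the inner sum. Writing $c_n$ for the correct class of example $x_n$ (notation introduced here), the matrices above give:

```latex
\text{cross-entropy}
= -\frac{1}{N} \sum_{n=1}^{N} \log(p_{n,c_n})
= -\frac{1}{4}\left[\log(0.6) + \log(0.4) + \log(0.5) + \log(0.6)\right]
\approx 0.658
```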

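For one-hot targets, the categorical cross-entropy ranges from 0 (perfect solution: probability 1 on the correct class of every example) to $+\infty$ (worst solution: probability approaching 0 on a correct class).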
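These bounds follow from the fact that, with one-hot targets, each example contributes only the term of its correct class $c_n$ (notation introduced here), namely $-\log(p_{n,c_n})$ with $0 < p_{n,c_n} \le 1$:

```latex
-\log(p_{n,c_n}) \ge 0,
\qquad
-\log(p_{n,c_n}) = 0 \iff p_{n,c_n} = 1,
\qquad
\lim_{p_{n,c_n} \to 0^{+}} -\log(p_{n,c_n}) = +\infty
```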

Let's now define a function to compute the categorical cross-entropy.

```python
import numpy as np
import matplotlib.pyplot as plt
```

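```python
def categorical_cross_entropy(targets, probs):
    # Element-wise y_{n,k} * log(p_{n,k}), summed over all examples and classes
    cross_entropy = (targets * np.log(probs)).sum()
    # Flip the sign and average over the N examples (rows)
    return -cross_entropy / targets.shape[0]
```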
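The function above returns `nan` (with a runtime warning) if any probability is exactly 0, because NumPy evaluates `0 * log(0)` as `0 * -inf`. A common safeguard is to clip the probabilities away from 0 before taking the log. A minimal sketch, where the name `safe_categorical_cross_entropy` and the default `eps` are choices made here for illustration:

```python
import numpy as np

def safe_categorical_cross_entropy(targets, probs, eps=1e-12):
    # Clip probabilities into [eps, 1] so that np.log never sees an exact 0
    clipped = np.clip(probs, eps, 1.0)
    # Same formula as before: sum y * log(p) over all entries, average over N rows
    return -(targets * np.log(clipped)).sum() / targets.shape[0]

targets = np.asarray([[1, 0, 0], [0, 1, 0]])
# Both rows put zero probability on the third class, whose target is 0
probs = np.asarray([[0.7, 0.3, 0.0], [0.2, 0.8, 0.0]])
print(safe_categorical_cross_entropy(targets, probs))
```

Clipping also caps the contribution of a correct class with probability 0 at $-\log(\text{eps})$ instead of $+\infty$, so `eps` should stay small.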

Let's now compute it for the targets and probabilities shown above:

```python
targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
probs = np.asarray([[0.6, 0.2, 0.2], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1], [0.6, 0.1, 0.3]])
print(categorical_cross_entropy(targets, probs))
```

0.6577722899915204
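Since the targets are one-hot, the same value can be obtained by picking out only the probability of the correct class of each example. A sketch, assuming the correct classes are recovered with `argmax` on the target rows:

```python
import numpy as np

targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
probs = np.asarray([[0.6, 0.2, 0.2], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1], [0.6, 0.1, 0.3]])

# Index of the 1 in each one-hot row, i.e. the correct class of each example
labels = targets.argmax(axis=1)
# Probability assigned to the correct class of each example: p[n, c_n]
correct_probs = probs[np.arange(len(labels)), labels]
# Mean negative log-probability of the correct classes
loss = -np.log(correct_probs).mean()
print(loss)
```

This matches the value above up to floating-point rounding, and skips the multiplications by the zero entries of the targets, which is convenient when labels are stored as integers.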


Let's see what happens when the predicted probabilities are very close to the targets. Here the same soft (non-one-hot) matrix is used for both:

```python
targets = np.asarray([[0.998, 0.001, 0.001], [0.001, 0.001, 0.998], [0.001, 0.001, 0.998], [0.998, 0.001, 0.001]])
print(categorical_cross_entropy(targets, targets))
```

0.015813509223296007


In this case the predictions coincide with the targets, so the solution is good and the cross-entropy is close to zero.
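The value is not exactly 0 because these targets are soft. For each example, the cross-entropy between a target distribution $y$ and a prediction $p$ decomposes as $H(y, p) = H(y) + D_{KL}(y \,\|\, p)$, and the KL divergence is 0 when $p = y$, so the smallest achievable value is the entropy of the targets. A short numerical check of this identity, where `entropy`, `kl_divergence` and `cross_entropy` are helper functions written here for illustration:

```python
import numpy as np

# Same soft targets as in the cell above (each row sums to 1)
targets = np.asarray([[0.998, 0.001, 0.001], [0.001, 0.001, 0.998],
                      [0.001, 0.001, 0.998], [0.998, 0.001, 0.001]])

def cross_entropy(y, p):
    # Same formula as categorical_cross_entropy: -1/N * sum y * log(p)
    return -(y * np.log(p)).sum() / y.shape[0]

def entropy(y):
    # Average entropy of the target rows: -1/N * sum y * log(y)
    return -(y * np.log(y)).sum() / y.shape[0]

def kl_divergence(y, p):
    # Average KL divergence between target rows and predicted rows
    return (y * np.log(y / p)).sum() / y.shape[0]

uniform = np.full_like(targets, 1 / 3)

# Cross-entropy equals entropy + KL; at p = y the KL term vanishes
for p in (targets, uniform):
    print(cross_entropy(targets, p), entropy(targets) + kl_divergence(targets, p))
```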


Let's see what happens if the output classes are equiprobable:

```python
targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
probs = np.asarray([[0.333, 0.333, 0.333], [0.333, 0.333, 0.333], [0.333, 0.333, 0.333], [0.333, 0.333, 0.333]])
print(categorical_cross_entropy(targets, probs))
```

1.0996127890016931
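The cell above uses 0.333 rather than exactly 1/3. A quick sketch with exactly uniform predictions $1/K$ for a few values of $K$, comparing the loss with $\log K$ (the labels are arbitrary, since every class gets the same probability):

```python
import numpy as np

def categorical_cross_entropy(targets, probs):
    # Same definition as above: average over examples of -sum_k y_{n,k} log p_{n,k}
    return -(targets * np.log(probs)).sum() / targets.shape[0]

N = 4
for K in (2, 3, 10):
    # One-hot targets with arbitrary labels (n mod K); any choice gives the same loss
    targets = np.eye(K)[np.arange(N) % K]
    # Uniform predictions: every class gets probability 1/K
    probs = np.full((N, K), 1.0 / K)
    print(K, categorical_cross_entropy(targets, probs), np.log(K))
```

This is the loss of a model that ignores its input and guesses uniformly, which makes $\log K$ a useful reference point: a trained classifier should score below it.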


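When all $K$ classes are equiprobable, the categorical cross-entropy is $-\log(1/K) = \log K$, which for $K = 3$ is about 1.0986 (the value printed above is slightly larger because 0.333 is a bit less than 1/3). Such a model is no better than a uniform random guess, but we can do even worse. For instance: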

```python
targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
# Every row puts only 0.001 on its correct class
probs = np.asarray([[0.001, 0.998, 0.001], [0.001, 0.998, 0.001], [0.001, 0.001, 0.998], [0.001, 0.998, 0.001]])
print(categorical_cross_entropy(targets, probs))
```

6.907755278982137


In this case, the categorical cross-entropy is much higher: every example assigns only 0.001 to its correct class, so the loss is $-\log(0.001) \approx 6.91$. These predictions are confidently wrong, which is even worse than a uniform random guess.
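Looking at the loss of each example separately shows where the total comes from. A sketch that keeps the sum over classes but skips the average over examples (the name `per_example_cross_entropy` is introduced here for illustration):

```python
import numpy as np

def per_example_cross_entropy(targets, probs):
    # -sum_k y_{n,k} log p_{n,k} for each row n; no averaging over examples
    return -(targets * np.log(probs)).sum(axis=1)

targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
probs = np.asarray([[0.001, 0.998, 0.001], [0.001, 0.998, 0.001],
                    [0.001, 0.001, 0.998], [0.001, 0.998, 0.001]])

losses = per_example_cross_entropy(targets, probs)
# Every example puts 0.001 on its correct class, so each loss is -log(0.001)
print(losses)
print(losses.mean(), -np.log(0.001))
```

The mean of these per-example values is exactly what `categorical_cross_entropy` computes.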

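**Summary**: Categorical cross-entropy is a popular objective function that tells us how good a classifier's probability estimates are. As a training objective it is more informative than accuracy: accuracy only checks whether the highest-probability class is correct, while cross-entropy is a "soft" measure that depends on how much probability is assigned to the correct class. This lets it rank the solutions explored during training more precisely, and, unlike accuracy, it is differentiable with respect to the predicted probabilities.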
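To illustrate the summary, the sketch below compares two models on the targets used earlier. Both pick the correct class for every example, so their accuracy is identical, but the more confident one gets a lower cross-entropy. The second probability matrix is hypothetical, made up for illustration:

```python
import numpy as np

def categorical_cross_entropy(targets, probs):
    # Average over examples of -sum_k y_{n,k} log p_{n,k}
    return -(targets * np.log(probs)).sum() / targets.shape[0]

def accuracy(targets, probs):
    # Fraction of examples whose highest-probability class is the correct one
    return (probs.argmax(axis=1) == targets.argmax(axis=1)).mean()

targets = np.asarray([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
# Probabilities from the first example in this section
probs_a = np.asarray([[0.6, 0.2, 0.2], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1], [0.6, 0.1, 0.3]])
# Hypothetical, more confident predictions with the same argmax in every row
probs_b = np.asarray([[0.9, 0.05, 0.05], [0.05, 0.05, 0.9], [0.05, 0.9, 0.05], [0.9, 0.05, 0.05]])

for name, probs in (("A", probs_a), ("B", probs_b)):
    print(name, accuracy(targets, probs), categorical_cross_entropy(targets, probs))
```

Accuracy cannot tell the two models apart, while cross-entropy prefers model B.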