Categorical crossentropy is a loss function used in machine learning to train classification models. It measures the difference between the probability distribution predicted by the model and the actual distribution of the labels. The lower the categorical crossentropy, the more probability the model assigns to the correct class.

The formula for the categorical crossentropy of a single sample, with the sum taken over the classes, is:

```
loss = -sum(y_true * log(y_pred))
```

where:

* `y_true` is the ground truth label, a one-hot vector (1 for the correct class, 0 for every other class)
* `y_pred` is the model's prediction, a probability vector with one entry per class
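As a minimal sketch of the formula for one sample, the pure-Python function below sums `-y_true * log(y_pred)` over the classes. The function name `categorical_crossentropy` and the decision to skip classes whose label is 0 (so that a predicted probability of exactly 0 for a wrong class does not trigger `log(0)`) are assumptions made for illustration, not part of any library.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one sample: -sum(y_true * log(y_pred))."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t > 0:  # classes with a zero label contribute nothing to the sum
            total -= t * math.log(p)
    return total

# Same label, two different predictions.
loss_good = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])  # -log(0.8)
loss_bad = categorical_crossentropy([0, 1, 0], [0.7, 0.2, 0.1])   # -log(0.2)
print(loss_good, loss_bad)
```

The prediction that puts more probability on the correct class gets the smaller loss, which is the behaviour described above.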

Categorical crossentropy is differentiable with respect to the predicted probabilities, which means that it can be used to train a model with gradient descent. Gradient descent is an iterative optimization algorithm that repeatedly updates the model's parameters in the direction of steepest descent of the loss function, that is, along the negative gradient.
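The following sketch shows what a few gradient descent steps can look like on a single sample. It assumes the probabilities come from a softmax over raw scores (logits), which the text above does not specify. Under that assumption, and with a one-hot label, the gradient of the loss with respect to the logits is simply `probs - y_true`. The helper names, the starting logits, and the learning rate are all illustrative choices.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(logits, y_true):
    """Return the crossentropy loss and its gradient w.r.t. the logits."""
    probs = softmax(logits)
    loss = -sum(t * math.log(p) for t, p in zip(y_true, probs) if t > 0)
    grad = [p - t for p, t in zip(probs, y_true)]  # valid when y_true sums to 1
    return loss, grad

y_true = [0, 1, 0]
logits = [0.0, 0.0, 0.0]   # start from a uniform prediction
learning_rate = 0.5        # illustrative value
losses = []
for _ in range(20):
    loss, grad = loss_and_grad(logits, y_true)
    losses.append(loss)
    # Move each logit a small step against its gradient.
    logits = [z - learning_rate * g for z, g in zip(logits, grad)]

print(losses[0], losses[-1])
```

Here the logits themselves play the role of the parameters. In a real model the same gradient would be propagated further back to the weights.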

Categorical crossentropy can be used to train a wide variety of classification models. It is a good choice for problems where the classes are mutually exclusive, meaning that each sample belongs to exactly one class, as in many image classification and text classification tasks.
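Because each sample belongs to exactly one class, its label can be written as a one-hot vector. The helper below is a hypothetical `one_hot` function, written only to illustrate that encoding; deep learning libraries usually ship their own utilities for this.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    return [1 if i == label else 0 for i in range(num_classes)]

# Three samples from a three-class problem, given as integer labels.
labels = [1, 2, 0]
encoded = [one_hot(label, 3) for label in labels]
print(encoded)
```

Each encoded row contains a single 1, which reflects the assumption that the classes do not overlap.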

Here are some examples of how categorical crossentropy can be used:

* Training a model to classify images of cats and dogs
* Training a model to classify text as spam or ham
* Training a model to classify handwritten digits

If you are facing a classification problem in which each sample belongs to exactly one class, categorical crossentropy is a good loss function to consider.

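```
import tensorflow as tf

# Two samples, three classes; each row of y_true is a one-hot label.
y_true = [[0, 1, 0], [0, 0, 1]]
# Each row of y_pred is the predicted probability distribution for that sample.
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

# With the default settings, the per-sample losses are averaged over the batch.
cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()
```

Output:

```
1.1769392
```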
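The value above can be checked by hand. Since each label is one-hot, the loss for a sample reduces to `-log` of the probability predicted for the true class. The sketch below uses only the standard library and assumes that the per-sample losses are averaged over the batch. It ignores any numerical safeguards the library may apply, such as clipping probabilities away from 0.

```python
import math

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

per_sample = []
for label, probs in zip(y_true, y_pred):
    true_class = label.index(1)  # position of the 1 in the one-hot label
    per_sample.append(-math.log(probs[true_class]))

# Average the per-sample losses over the batch.
mean_loss = sum(per_sample) / len(per_sample)
print(per_sample)
print(mean_loss)
```

The first sample is predicted well (`-log(0.95)` is small), while the second assigns only 0.1 to the true class (`-log(0.1)` is about 2.3), so the second sample dominates the average.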