## Cross Entropy 
Let's consider the example described in Section 6.3, where we are trying to predict whether a given image contains a cat, dog, airplane, or automobile. This is a 4-class classification problem. We know from the ground truth that the image is that of a cat. Thus $X_{gt} = [1, 0, 0, 0]$, where the 4 numbers represent the probabilities of the 4 classes respectively. Since this is the ground truth data, we are certain that the image is that of a cat, so the cat class gets probability 1 and every other class gets probability 0.

Let's say we have a model that predicts the probability of each of these classes. Let's call this prediction $X_{pred}$. A good prediction would have a high probability for cat and a low probability for the other classes. For example, $X_{good\_pred} = [0.8, 0.15, 0.04, 0.01]$. On the other hand, a bad prediction would have a low probability for cat. For example, $X_{bad\_pred} = [0.25, 0.25, 0.25, 0.25]$. 

Cross entropy, denoted by $H_{C}$, gives us a way to quantify how good a prediction is. It measures the dissimilarity between two probability distributions: $H_{C}(X_{gt}, X_{good\_pred})$ would be low, whereas $H_{C}(X_{gt}, X_{bad\_pred})$ would be high. It can be computed using the following formula, where $N$ is the number of classes and $p_{gt}(i)$ and $p_{pred}(i)$ are the $i$-th entries of $X_{gt}$ and $X_{pred}$ respectively.

$$ H_{C}(X_{gt}, X_{pred}) = -\sum_{i=1}^{N} p_{gt}(i) \log(p_{pred}(i)) $$
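
As a worked example, and assuming $\log$ denotes the natural logarithm, plugging the two predictions from above into this formula shows that the one-hot ground truth makes every term except the cat term vanish:

$$ H_{C}(X_{gt}, X_{good\_pred}) = -\left(1 \cdot \log 0.8 + 0 \cdot \log 0.15 + 0 \cdot \log 0.04 + 0 \cdot \log 0.01\right) = -\log 0.8 \approx 0.223 $$

$$ H_{C}(X_{gt}, X_{bad\_pred}) = -\left(1 \cdot \log 0.25 + 0 \cdot \log 0.25 + 0 \cdot \log 0.25 + 0 \cdot \log 0.25\right) = -\log 0.25 \approx 1.386 $$

In other words, for a one-hot ground truth the cross entropy reduces to the negative log of the probability that the model assigns to the correct class.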



In [1]:
import torch

def cross_entropy(X_gt, X_pred):
    # Accumulate -p_gt(i) * log(p_pred(i)) over all the classes
    H_C = 0
    for x_gt, x_pred in zip(X_gt, X_pred):
        H_C += -1 * (x_gt * torch.log(x_pred))
    return H_C

In [2]:
X_gt = torch.tensor([1., 0., 0., 0.])
X_good_pred = torch.tensor([0.8, 0.15, 0.04, 0.01])
X_bad_pred = torch.tensor([0.25, 0.25, 0.25, 0.25])

H_C_good = cross_entropy(X_gt, X_good_pred)
H_C_bad = cross_entropy(X_gt, X_bad_pred)

# Note how the cross entropy for the bad prediction is much higher than that for the good prediction
print('Cross entropy for the good prediction: {:.3f}'.format(H_C_good))
print('Cross entropy for the bad prediction: {:.3f}'.format(H_C_bad))

Cross entropy for the good prediction: 0.223
Cross entropy for the bad prediction: 1.386
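
The same sum can also be written without an explicit Python loop. The following is a minimal sketch rather than a reference implementation: the name `cross_entropy_vectorized` and the `eps` parameter are assumptions introduced here for illustration. Clamping the predictions to at least `eps` is one possible way to avoid `0 * log(0)`, which would otherwise evaluate to `nan` if the model assigned exactly zero probability to a class.

```python
import torch

def cross_entropy_vectorized(X_gt, X_pred, eps=1e-12):
    # eps is a hypothetical safeguard (not part of the formula): it keeps
    # log() finite when a predicted probability is exactly 0
    log_pred = torch.log(torch.clamp(X_pred, min=eps))
    # Element-wise product followed by a sum replaces the explicit loop
    return -(X_gt * log_pred).sum()

X_gt = torch.tensor([1., 0., 0., 0.])
X_good_pred = torch.tensor([0.8, 0.15, 0.04, 0.01])
X_bad_pred = torch.tensor([0.25, 0.25, 0.25, 0.25])

H_C_good = cross_entropy_vectorized(X_gt, X_good_pred)
H_C_bad = cross_entropy_vectorized(X_gt, X_bad_pred)

print('Cross entropy for the good prediction: {:.3f}'.format(H_C_good.item()))
print('Cross entropy for the bad prediction: {:.3f}'.format(H_C_bad.item()))
```

For these inputs the sketch should reproduce the values obtained with the loop-based version, since none of the predicted probabilities is small enough to be affected by the clamp.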
