#### Softmax and Cross Entropy

Softmax is a function that maps a vector of real-valued scores to values between 0 and 1 that sum to 1, so the output can be read as a probability distribution over the classes. This is particularly useful for multi-class classification.

$S(y_i) = \frac{e^{y_i}}{\sum_{j} e^{y_j}}$

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [2]:
# Softmax function
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

In [3]:
# Test softmax function
arr1 = np.array([1, 2, 3, 4])
softarr1 = softmax(arr1)
print(softarr1)

[0.0320586  0.08714432 0.23688282 0.64391426]
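
For large inputs, `np.exp` can overflow in the direct implementation above. A common remedy is to subtract the maximum value before exponentiating; the shift cancels between numerator and denominator, so the result is unchanged. The sketch below assumes a 1-D input, and the name `stable_softmax` is illustrative rather than part of the original notebook.

```python
import numpy as np

def stable_softmax(x):
    # Shift by the maximum so the largest exponent is exp(0) = 1
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

small = stable_softmax(np.array([1, 2, 3, 4]))              # same input as arr1
large = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))  # would overflow without the shift
print(small)
print(large)
```

For the small input this should agree with the values printed above; for the large input it still returns finite probabilities that sum to 1.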


Cross-entropy loss measures how far the predicted class probabilities diverge from the actual labels; the closer the prediction is to the true class, the lower the loss. It is also commonly used for multi-class classification problems.

$D(\hat{Y}, Y) = -\frac{1}{N}\sum_{i}{Y_i \log(\hat{Y}_i)}$

Note that the actual labels Y must be one-hot encoded (true class = 1, other classes = 0; e.g. [1, 0, 0, 0]), while the predictions $\hat{Y}$ are probabilities, such as the output of softmax.
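
As a minimal sketch of one-hot encoding, integer class labels can be converted by indexing into an identity matrix. The labels and the number of classes below are made-up values for illustration.

```python
import numpy as np

num_classes = 4                  # assumed number of classes
labels = np.array([0, 2, 1])     # hypothetical class indices for three samples

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row contains a single 1 in the column of the true class and 0 everywhere else.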

In [4]:
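# Cross-Entropy loss function
# y: one-hot encoded actual labels, yhat: predicted probabilities
# Note: for a 1-D input, yhat.shape[0] is the number of classes
def crossentropy(y, yhat):
    return -np.sum(y * np.log(yhat)) / float(yhat.shape[0])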

In [5]:
arr2 = np.array([1, 0, 0])
arr2good = np.array([0.7, 0.2, 0.1])
arr2bad = np.array([0.5, 0.3, 0.2])
arr2goodcross = crossentropy(arr2, arr2good)
arr2badcross = crossentropy(arr2, arr2bad)
print(f"Loss for arr2good: {arr2goodcross}")
print(f"Loss for arr2bad: {arr2badcross}")

Loss for arr2good: 0.11889164797957748
Loss for arr2bad: 0.23104906018664842


In PyTorch, both steps are combined in the nn.CrossEntropyLoss() class from the torch.nn module.

Note that when using nn.CrossEntropyLoss():
<ul>
    <li>It applies both the softmax (as a log-softmax) and the negative log likelihood loss, so the model output should not be passed through softmax first</li>
    <li>Y holds class labels (indices) and is <b>NOT one-hot encoded</b></li>
    <li>Y_predictions holds raw, unnormalized scores (logits) for the classes</li>
</ul>
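
To illustrate the first point, the sketch below compares nn.CrossEntropyLoss() with an explicit nn.LogSoftmax() followed by nn.NLLLoss(). The tensor values and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 0.1, 0.1]])  # raw scores, illustrative values
target = torch.tensor([0])                # class index, not one-hot

# One step: raw scores go straight into the combined loss
combined = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax over the class dimension, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
two_step = nn.NLLLoss()(log_probs, target)

print(combined, two_step)
```

Both routes should give the same loss up to floating-point precision.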

In [6]:
# PyTorch example
loss = nn.CrossEntropyLoss()

# Actual class
y = torch.tensor([0]) # True class is class 0
yhat1 = torch.tensor([[1.0, 0.1, 0.1]]) # Predicting strongly class 0
yhat2 = torch.tensor([[0.1, 0.1, 0.9]]) # Predicting strongly class 2

l1 = loss(yhat1, y) # Low
l2 = loss(yhat2, y) # High
print(l1)
print(l2)

tensor(0.5951)
tensor(1.4411)
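
The first value can be checked by hand: apply softmax to the raw scores, then take the negative log of the probability assigned to the true class. A minimal NumPy sketch, with variable names chosen for illustration:

```python
import numpy as np

scores = np.array([1.0, 0.1, 0.1])  # same raw scores as yhat1
true_class = 0

# Softmax turns the raw scores into probabilities
probs = np.exp(scores) / np.sum(np.exp(scores))

# Cross-entropy for a single sample is -log of the true class probability
manual_loss = -np.log(probs[true_class])
print(manual_loss)
```

The result should agree with the first tensor printed above up to rounding.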


In [7]:
# Use torch.max() along dim 1 to get the predicted class label (element [1] of the result holds the indices)
print(torch.max(yhat1, 1)[1].item())
print(torch.max(yhat2, 1)[1].item())

0
2


In [8]:
# To do this across multiple samples
# Actual classes
y = torch.tensor([0, 1, 0]) # True classes are class 0, 1 and 0 respectively
yhat1 = torch.tensor([[1.0, 0.1, 0.1],
                     [0.1, 1, 0.1],
                     [1.0, 0.1, 0.1]]) # Predicting strongly class 0, 1 and 0 respectively
yhat2 = torch.tensor([[0.1, 0.1, 0.9],
                     [0.9, 0.1, 0.1],
                     [0.9, 0.1, 0.1]]) # Predicting strongly class 2, 0, and 0 respectively

l1 = loss(yhat1, y) # Low
l2 = loss(yhat2, y) # High
print(l1)
print(l2)

tensor(0.5951)
tensor(1.1745)
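
By default the loss is averaged over the samples in the batch. Passing `reduction='none'` returns one loss per sample instead, which makes it easier to see which predictions contribute most. The sketch below repeats the `yhat2` values from the cell above; the variable names are illustrative.

```python
import torch
import torch.nn as nn

targets = torch.tensor([0, 1, 0])
scores = torch.tensor([[0.1, 0.1, 0.9],
                       [0.9, 0.1, 0.1],
                       [0.9, 0.1, 0.1]])

# One loss value per row instead of a single averaged value
per_sample = nn.CrossEntropyLoss(reduction='none')(scores, targets)
mean_loss = nn.CrossEntropyLoss()(scores, targets)

print(per_sample)
print(per_sample.mean())  # should match the default mean-reduced loss
```

The third sample, where the highest score falls on the true class, should have the smallest loss of the three.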


In [9]:
# Use torch.max() along dim 1 to get the predicted class label for each sample
print(torch.max(yhat1, 1)[1])
print(torch.max(yhat2, 1)[1])

tensor([0, 1, 0])
tensor([2, 0, 0])
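
When only the indices are needed, `torch.argmax` is an alternative to taking element `[1]` of `torch.max`. The sketch below also counts how many predictions match the true classes; the name `num_correct` is illustrative.

```python
import torch

y = torch.tensor([0, 1, 0])
yhat2 = torch.tensor([[0.1, 0.1, 0.9],
                      [0.9, 0.1, 0.1],
                      [0.9, 0.1, 0.1]])

# Index of the highest score in each row
predicted = torch.argmax(yhat2, dim=1)

# Number of rows where the predicted class equals the true class
num_correct = (predicted == y).sum().item()
print(predicted, num_correct)
```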
