## Softmax and Cross-Entropy
$$
\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
$$
$$
\text{CrossEntropy}(\hat{y}, y) = -\frac{1}{N}\sum_{i=1}^{N} y_i \log(\hat{y}_i)
$$


### Softmax in NumPy

In [13]:
import numpy as np

In [14]:
def softmax(x):
    # Exponentiate each score, then normalize so the outputs sum to 1
    return np.exp(x) / np.sum(np.exp(x), axis=0)

In [15]:
x = np.array([2.0, 1.0, 1.0])
output = softmax(x)
print(f"softmax numpy: {output}")

softmax numpy: [0.57611688 0.21194156 0.21194156]
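Exponentiating raw scores can overflow for large inputs (for example, `np.exp(1000.0)` evaluates to `inf`). A common remedy, sketched here rather than taken from the original notebook, is to subtract the maximum score before exponentiating; the shift cancels in the ratio, so the result is mathematically unchanged. The name `stable_softmax` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def stable_softmax(x):
    # Shifting by the max leaves the ratio unchanged but keeps exp() finite.
    shifted = x - np.max(x, axis=0)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0)

x = np.array([2.0, 1.0, 1.0])
print(stable_softmax(x))    # same values as softmax(x) above

big = np.array([1000.0, 1000.0])
print(stable_softmax(big))  # stays finite, whereas the naive form would yield nan
```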


### Softmax in PyTorch

In [16]:
import torch

In [17]:

x = torch.tensor([2.0, 1.0, 1.0])
output = torch.softmax(x, dim = 0)
print(f"softmax pytorch: {output}")


softmax pytorch: tensor([0.5761, 0.2119, 0.2119])


### Cross-Entropy in NumPy

In [18]:
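def cross_entropy(actual, predicted):
    # actual: one-hot label vector; predicted: class probabilities (e.g. softmax output)
    loss = -np.sum(actual * np.log(predicted))
    return loss  # plain sum, normalization left disabled: / float(predicted.shape[0])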

In [19]:
Y = np.array([1, 0, 0])
Y_good_pred = np.array([0.7, 0.2, 0.1])
Y_bad_pred = np.array([0.1, 0.3, 0.6])
print(f"Loss1 numpy: {cross_entropy(Y, Y_good_pred)}")
print(f"Loss2 numpy: {cross_entropy(Y, Y_bad_pred)}")

Loss1 numpy: 0.35667494393873245
Loss2 numpy: 2.3025850929940455

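The formula at the top of this section averages over N, while `cross_entropy` above returns a plain sum for a single sample. A minimal sketch of a batched variant, assuming one-hot rows in `actual` and probability rows in `predicted`, might look like the following. `cross_entropy_batch` and the `eps` clipping constant are illustrative additions, not part of the original notebook.

```python
import numpy as np

def cross_entropy_batch(actual, predicted, eps=1e-12):
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    predicted = np.clip(predicted, eps, 1.0)
    # Sum over classes for every sample, then average over the N samples (rows).
    per_sample = -np.sum(actual * np.log(predicted), axis=1)
    return per_sample.mean()

Y_batch = np.array([[1, 0, 0],
                    [1, 0, 0]])
Y_pred_batch = np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.3, 0.6]])
print(f"mean loss: {cross_entropy_batch(Y_batch, Y_pred_batch)}")
```

The result is the average of the two single-sample losses printed above.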

### Cross-Entropy in PyTorch

`nn.CrossEntropyLoss` applies `nn.LogSoftmax` followed by `nn.NLLLoss` (negative log likelihood loss), which means:
- No softmax in the last layer!
- Y holds class labels, not one-hot vectors!
- Y_pred holds raw scores (logits), not softmax outputs!
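The relationship in the notes above can be checked directly. The sketch below compares `nn.CrossEntropyLoss` with `nn.LogSoftmax` followed by `nn.NLLLoss` on a single sample; the variable names `logits` and `target` are illustrative choices, not from the original notebook.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores, shape: nsamples x nclasses
target = torch.tensor([0])                # class index, not one-hot

ce = nn.CrossEntropyLoss()(logits, target)

log_probs = nn.LogSoftmax(dim=1)(logits)  # log-softmax over the class dimension
nll = nn.NLLLoss()(log_probs, target)     # picks -log_probs[target] and averages

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))
```

Both paths should give the same loss, which is why the model's last layer should output logits rather than softmax probabilities.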

In [20]:
import torch.nn as nn

In [21]:
loss = nn.CrossEntropyLoss()

In [22]:
Y = torch.tensor([0])  # one class label per sample: nsamples = 1
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # logits: nsamples x nclasses = 1 x 3
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(f"loss l1: {l1.item()}")
print(f"loss l2: {l2.item()}")

loss l1: 0.4170299470424652
loss l2: 1.840616226196289


In [23]:
_, prediction1 = torch.max(Y_pred_good, 1)
_, prediction2 = torch.max(Y_pred_bad, 1)
print(prediction1)
print(prediction2)

tensor([0])
tensor([1])


In [24]:
Y = torch.tensor([2, 0, 1])  # one class label per sample
# logits: nsamples x nclasses = 3 x 3
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1],
                            [2.0, 1.0, 0.1],
                            [0.1, 3.0, 0.1]])
Y_pred_bad = torch.tensor([[2.1, 1.0, 0.1],
                           [0.1, 1.0, 2.1],
                           [0.1, 3.0, 0.1]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(f"loss l1: {l1.item()}")
print(f"loss l2: {l2.item()}")

_, prediction1 = torch.max(Y_pred_good, 1)
_, prediction2 = torch.max(Y_pred_bad, 1)
print(prediction1)
print(prediction2)

loss l1: 0.3018244206905365
loss l2: 1.6241613626480103
tensor([2, 0, 1])
tensor([0, 2, 1])
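By default `nn.CrossEntropyLoss` averages the per-sample losses over the batch (`reduction='mean'`). As a rough cross-check in NumPy, using the same logits and labels as the good predictions above, each sample's loss is the negative log-softmax at its true class. `manual_cross_entropy` is a hypothetical helper name used only for this sketch.

```python
import numpy as np

def manual_cross_entropy(logits, labels):
    # Log-softmax per row, computed with the max subtracted for stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

labels = np.array([2, 0, 1])
logits_good = np.array([[0.1, 1.0, 2.1],
                        [2.0, 1.0, 0.1],
                        [0.1, 3.0, 0.1]])
print(f"manual loss: {manual_cross_entropy(logits_good, labels)}")
```

The value should agree with `l1` above up to float32 rounding.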
