## PyTorch Tutorial 11 - Softmax and Cross Entropy


#### Softmax

$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$


<img src="images/Sofmax_Layer.png">


$y_i$ are the raw scores (logits). Softmax normalizes them into probabilities $y_{pred}$ that lie between 0 and 1 and sum to 1.

In [36]:
import torch
import torch.nn as nn
import numpy as np


def softmax(x):
    return np.exp(x)/np.sum(np.exp(x), axis = 0, keepdims = True)

x = np.array([2,1,0.1])

softmax(x)

array([0.65900114, 0.24243297, 0.09856589])

The highest logit receives the highest probability.

In [37]:
x = torch.tensor([2,1,0.1])
outputs = torch.softmax(x, dim = 0)

In [39]:
print(outputs)

tensor([0.6590, 0.2424, 0.0986])


The NumPy and PyTorch results agree; the PyTorch tensor is simply printed with fewer digits.
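The `softmax` function above exponentiates the logits directly, which can overflow for large inputs. A minimal sketch of the usual workaround, subtracting the maximum before exponentiating, is shown below; the name `stable_softmax` and the shifted test vector are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the maximum does not change the result mathematically,
    # but it keeps np.exp from overflowing on large logits.
    shifted = x - np.max(x, axis=0, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0, keepdims=True)

small = np.array([2.0, 1.0, 0.1])
large = np.array([1002.0, 1001.0, 1000.1])  # the same logits shifted by 1000

print(stable_softmax(small))
print(stable_softmax(large))
```

Both calls should give the same probabilities, because softmax only depends on the differences between logits.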


#### Cross-Entropy


$D(\hat Y) = - \frac{1}{N} \sum_i Y_i \log(\hat Y_i)$

The examples below use the natural logarithm and omit the $\frac{1}{N}$ averaging.

$Y = [1, 0, 0]$

$\hat Y = [0.7, 0.2, 0.1] \implies D(\hat Y) = 0.36$

$\hat Y = [0.1, 0.3, 0.6] \implies D(\hat Y) = 2.30$
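
Because $Y$ is one-hot, only the term for the true class survives in the sum. A worked version of the two examples, assuming the natural logarithm and no $\frac{1}{N}$ averaging (as in the code that follows):

$$
D = -\left(1 \cdot \ln 0.7 + 0 \cdot \ln 0.2 + 0 \cdot \ln 0.1\right) = -\ln 0.7 \approx 0.357
$$

$$
D = -\left(1 \cdot \ln 0.1 + 0 \cdot \ln 0.3 + 0 \cdot \ln 0.6\right) = -\ln 0.1 \approx 2.303
$$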



In [68]:
def cross_entropy(y, y_hat):
    # Sum over the classes; the division by N is deliberately omitted here
    return -np.sum(y * np.log(y_hat))

y = np.array([1, 0, 0])
y_hat = np.array([0.7, 0.2, 0.1])

cross_entropy(y, y_hat)

0.35667494393873245

In [70]:
y_pred_bad = np.array([0.1, 0.3, 0.6])
cross_entropy(y, y_pred_bad)

2.3025850929940455

In [72]:
# The bad prediction has a much higher loss, as expected:
# it gives a low probability to the true class and a high probability to a wrong one.


In [73]:
# Same loss measured in bits (base-2 logarithm) instead of nats
-1 * np.sum(y * np.log2(y_hat))

0.5145731728297583
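
Switching from the natural logarithm to `log2` only rescales the loss by a constant. A small sketch checking that relationship on the same `y` and `y_hat`; the variable names `loss_nats` and `loss_bits` are illustrative assumptions.

```python
import numpy as np

y = np.array([1, 0, 0])
y_hat = np.array([0.7, 0.2, 0.1])

loss_nats = -np.sum(y * np.log(y_hat))   # natural logarithm
loss_bits = -np.sum(y * np.log2(y_hat))  # base-2 logarithm

# Changing the base divides the loss by ln(2); the ranking of predictions is unchanged.
print(loss_bits, loss_nats / np.log(2))
```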

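We can compute the same loss in PyTorch with `nn.CrossEntropyLoss`, which takes raw logits and class indices rather than probabilities and one-hot vectors.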

In [89]:
loss = nn.CrossEntropyLoss()

# There are 3 classes and the true class is the first one, i.e. index 0.
# If the second class were true we would use [1]; for the third class, [2].
Y = torch.tensor([0])  

# nsamples x nclasses = 1 x 3
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # good: the highest logit is on class 0
Y_pred_bad  = torch.tensor([[0.5, 2.0, 0.3]])  # bad: the highest logit is on class 1

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())
print(l2.item())

0.4170299470424652
1.840616226196289


As expected, the bad prediction has the higher cross-entropy.
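
To connect this with the formula from the NumPy section, the value for the good prediction can be reproduced by hand: apply softmax to the logits and take the negative log of the probability assigned to the true class. This is a sketch for a single sample; the names `logits`, `target` and `manual_loss` are illustrative.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # same values as Y_pred_good
target = torch.tensor([0])

# Softmax turns the logits into probabilities for each class
probs = torch.softmax(logits, dim=1)

# With a one-hot target, only the true-class probability contributes
manual_loss = -torch.log(probs[0, target[0]])

builtin_loss = nn.CrossEntropyLoss()(logits, target)
print(manual_loss.item(), builtin_loss.item())
```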

In [92]:
# torch.max along dim 1 returns (values, indices); the indices are the predicted classes
_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

print(predictions1)
print(predictions2)

tensor([0])
tensor([1])


In [95]:
# Let's do three samples

Y = torch.tensor([2,0,1])  

# nsamples x nclasses = 3 x 3
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1],
                            [2.0, 1.0, 0.1],
                            [0.1, 3.0, 0.1]])  # good: each row's highest logit matches Y

Y_pred_bad  = torch.tensor([[2.1, 1.0, 0.1],
                            [0.1, 1.0, 2.1],
                            [0.1, 3.0, 0.1]
                           ])  # bad: the first two rows put the highest logit on the wrong class

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())
print(l2.item())


0.3018244206905365
1.6241613626480103
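
The values printed above are averages over the three samples, which is the default behaviour of `nn.CrossEntropyLoss`. A small sketch using `reduction='none'` to look at the per-sample losses for the bad predictions; note that the third row actually predicts its class correctly, so its individual loss should be small.

```python
import torch
import torch.nn as nn

Y = torch.tensor([2, 0, 1])
Y_pred_bad = torch.tensor([[2.1, 1.0, 0.1],
                           [0.1, 1.0, 2.1],
                           [0.1, 3.0, 0.1]])

# reduction='none' returns one loss per sample instead of the mean
per_sample = nn.CrossEntropyLoss(reduction='none')(Y_pred_bad, Y)
mean_loss = nn.CrossEntropyLoss()(Y_pred_bad, Y)

print(per_sample)
print(per_sample.mean().item(), mean_loss.item())
```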


<img src="images/NeuralNet-WithSoftmax.png">
(source : https://www.youtube.com/watch?v=7q7E91pHoW4)

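In PyTorch, `nn.CrossEntropyLoss` applies the softmax internally (it combines `nn.LogSoftmax` and `nn.NLLLoss`), so the model should output raw logits and must not end with its own softmax layer. The targets are class indices, not one-hot vectors.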
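
A short sketch of why the extra softmax layer should be left out: the built-in loss matches the two-step `nn.LogSoftmax` + `nn.NLLLoss` form, while feeding it already-normalized probabilities gives a different value. The variable names are illustrative.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])

ce = nn.CrossEntropyLoss()(logits, target)

# Equivalent two-step form: log-softmax followed by negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# Pitfall: applying softmax first means softmax is effectively applied twice
double = nn.CrossEntropyLoss()(torch.softmax(logits, dim=1), target)

print(ce.item(), nll.item(), double.item())
```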

In [105]:
# Multiclass Problem

class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # no softmax at the end
        return out
    
model = NeuralNet2(input_size = 28*28, hidden_size = 5, num_classes=3)
criterion = nn.CrossEntropyLoss()  #applies softmax
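
A minimal sketch of how such a model and criterion might be used on one batch. To keep it self-contained, it uses an `nn.Sequential` stand-in with the same layers as `NeuralNet2`; the random input batch and the labels are made-up placeholders, not real data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in with the same layers as NeuralNet2 (Linear -> ReLU -> Linear)
model = nn.Sequential(nn.Linear(28 * 28, 5), nn.ReLU(), nn.Linear(5, 3))
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 28 * 28)          # hypothetical batch of 4 flattened images
labels = torch.tensor([0, 2, 1, 2])  # class indices, not one-hot vectors

logits = model(x)                    # raw scores with shape (4, 3)
loss = criterion(logits, labels)     # softmax is applied inside the loss
loss.backward()                      # computes gradients for the model parameters

print(logits.shape, loss.item())
```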


<img src="images/NeuralNet-WithSigmoid.png">
(source : https://www.youtube.com/watch?v=7q7E91pHoW4)

In [107]:
# Binary Problem

class NeuralNet3(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet3, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)  # NOTE: a single output unit, since the task is binary
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # sigmoid at the end
        y_pred = torch.sigmoid(out)
        return y_pred
    
# Note: there is no num_classes argument because this is binary classification
model3 = NeuralNet3(input_size= 28*28, hidden_size = 5) 
criterion3 = nn.BCELoss()  # binary cross entropy
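
A minimal sketch of how `nn.BCELoss` is called, assuming made-up logits and labels. Unlike `nn.CrossEntropyLoss`, it expects probabilities (hence the sigmoid in `forward`) and float targets with the same shape as the predictions. `nn.BCEWithLogitsLoss` is a related PyTorch loss that takes raw logits instead; it is shown only for comparison and is not used in the original tutorial.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.5], [-0.8], [0.3]])  # hypothetical raw outputs, shape (3, 1)
targets = torch.tensor([[1.0], [0.0], [1.0]])  # float labels with the same shape

# BCELoss expects probabilities, so the sigmoid must be applied first
bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# BCEWithLogitsLoss fuses the sigmoid into the loss and takes raw logits
bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(bce.item(), bce_logits.item())
```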