<a href="https://colab.research.google.com/github/reeda23/Deep-Learning-With-Pytorch/blob/main/10_Softmax_and_Cross_Entropy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Softmax

The softmax formula: <br>

\begin{align}
          {S(y_i)} =  \frac {e^{y_i}}{\sum_{j} e^{y_j}}
\end{align}


Softmax applies the exponential function to each element and normalizes by dividing by the sum of all these exponentials. <br>
This squashes each output into the range 0 to 1, so it can be read as a probability. <br>
The probabilities sum to 1. <br>



In [None]:
# For example:

#         --> 2.0                   --> 0.66
#  Linear --> 1.0   --> Softmax     --> 0.24   --> CrossEntropy(y, y_hat)
#         --> 0.1                   --> 0.10

#      scores (logits)                  probabilities
#                                       sum = 1.0

In [1]:
import torch
import torch.nn as nn
import numpy as np


In [6]:
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

**Numpy**

In [7]:
x = np.array([2.0,1.0,0.1])
#highest logit has the highest probability

outputs = softmax(x)
print("Softmax numpy:", outputs)

Softmax numpy: [0.65900114 0.24243297 0.09856589]
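
The `softmax` function above exponentiates the raw logits directly, which can overflow for large values. A common variant subtracts the maximum logit first; this leaves the result unchanged because softmax is invariant to adding a constant to every input. The sketch below is an illustration only, and the name `stable_softmax` is hypothetical rather than part of the original notebook.

```python
import numpy as np

def stable_softmax(x):
    # Shift so the largest logit becomes 0; the ratios of exponentials are unchanged
    shifted = x - np.max(x, axis=0)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0)

# Same logits as in the notebook
small = stable_softmax(np.array([2.0, 1.0, 0.1]))

# Same logits shifted by a large constant; a naive np.exp(1000.0) would overflow
large = stable_softmax(np.array([1000.0, 999.0, 998.1]))

print("small logits:", small)
print("large logits:", large)
```

Both calls should give the same probabilities, since the second input is the first one with a constant added to every element.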


In [8]:
x.shape

(3,)

**Pytorch**

In [9]:
x = torch.tensor([2.0,1.0,0.1])
outputs = torch.softmax(x, dim=0) #calculate values along first axis
print(outputs)

tensor([0.6590, 0.2424, 0.0986])


# Cross-Entropy

\begin{align}
          {D(\hat{Y},Y)} =  -\frac{1}{N} {\sum_{i} {Y_i}\log(\hat{Y}_i)}
\end{align}

Softmax is often combined with the cross-entropy loss. Cross-entropy measures the performance of a classification model whose output is a probability between 0 and 1, and it can be used in multi-class problems.<br>
The loss increases as the predicted probability diverges from the actual label.<br>
So the better the prediction, the lower the loss. <br>


In [None]:
# For example:
#    # one-hot encoded class labels
#    Y     = [1, 0, 0]
#                            --> D(Y_hat, Y) = 0.36
#    Y_hat = [0.7, 0.2, 0.1]
#    # probabilities (softmax)

#    # one-hot encoded class labels
#    Y     = [1, 0, 0]
#                            --> D(Y_hat, Y) = 2.30
#    Y_hat = [0.1, 0.3, 0.6]
#    # probabilities (softmax)



**Numpy**

In [10]:
def cross_entropy(actual, predicted):
    # actual: one-hot encoded label; predicted: class probabilities
    loss = -np.sum(actual * np.log(predicted))
    return loss  # not normalized; could divide by float(predicted.shape[0])

# Y must be one-hot encoded
# if class 0: [1 0 0]
# if class 1: [0 1 0]
# if class 2: [0 0 1]

Y = np.array([1, 0, 0])

# Y_pred holds probabilities
Y_pred_good = np.array([0.7, 0.2, 0.1])
Y_pred_bad = np.array([0.1,0.3,0.6])

l1 = cross_entropy(Y, Y_pred_good)
l2 = cross_entropy(Y, Y_pred_bad)

print(f'Loss1 numpy: {l1:.4f}')
print(f'Loss2 numpy: {l2:.4f}')



Loss1 numpy: 0.3567
Loss2 numpy: 2.3026
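
The two values printed above can be checked by hand. Because the label is one-hot encoded, only the term for the true class survives the sum. The working below uses the natural logarithm, as `np.log` does, and leaves out the 1/N factor, as the code does:

\begin{align}
D(\hat{Y}_{\text{good}}, Y) &= -(1 \cdot \log 0.7 + 0 \cdot \log 0.2 + 0 \cdot \log 0.1) = -\log 0.7 \approx 0.3567 \\
D(\hat{Y}_{\text{bad}}, Y) &= -(1 \cdot \log 0.1 + 0 \cdot \log 0.3 + 0 \cdot \log 0.6) = -\log 0.1 \approx 2.3026
\end{align}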


**Pytorch**

CrossEntropyLoss in PyTorch already applies softmax internally: <br>
nn.CrossEntropyLoss = nn.LogSoftmax + nn.NLLLoss <br>
NLLLoss = negative log likelihood loss 
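
The equivalence stated above can be checked numerically. This is a minimal sketch that reuses the notebook's example logits; the variable names `logits`, `target`, `ce`, and `nll` are chosen here for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores, shape (nsamples, nclasses) = (1, 3)
target = torch.tensor([0])                # correct class index for the single sample

# One step: CrossEntropyLoss on raw logits
ce = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax over the class dimension, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())  # the two values should agree
```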

In [13]:
# nn.CrossEntropyLoss applies softmax itself, so the model's last layer should not apply softmax
# Y does not need to be one-hot encoded; it holds only the correct class index
# Y_pred holds raw scores (logits), with no softmax applied

loss = nn.CrossEntropyLoss()

Y = torch.tensor([0])  # correct class index (class 0), not one-hot encoded

# nsamples x nclasses = 1 x 3
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # 2D tensor, one row per sample; raw logits, no softmax applied

Y_pred_bad = torch.tensor([[0.5,2.0,0.3]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())
print(l2.item())

0.4170299470424652
1.840616226196289


In [20]:
#to get predictions

_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

print(predictions1)
print(predictions2)

print(f'Actual class:{Y.item()}, Y_pred1: {predictions1.item()}')
print(f'Actual class:{Y.item()}, Y_pred2: {predictions2.item()}')

tensor([0])
tensor([1])
Actual class:0, Y_pred1: 0
Actual class:0, Y_pred2: 1


In [25]:
# CrossEntropyLoss also computes the loss over a batch of multiple samples

# 3 samples, one correct class index each
Y = torch.tensor([2, 0, 1])

# nsamples x nclasses = 3 x 3
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1],[2.0, 1.0, 0.1], [0.1, 3.0, 0.1]])
Y_pred_bad = torch.tensor([[2.1,1.0, 0.1],[0.1, 1.0, 2.1], [0.1,3.0,0.1]])


l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())
print(l2.item())

_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

print(predictions1)
print(predictions2)

print(f'Actual class:{Y}, Y_pred1: {predictions1}')
print(f'Actual class:{Y}, Y_pred2: {predictions2}')

0.3018244206905365
1.6241613626480103
tensor([2, 0, 1])
tensor([0, 2, 1])
Actual class:tensor([2, 0, 1]), Y_pred1: tensor([2, 0, 1])
Actual class:tensor([2, 0, 1]), Y_pred2: tensor([0, 2, 1])


**Binary classification**

In [27]:
class NeuralNet1(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet1, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)


    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # sigmoid at the end maps the single output to a probability, as nn.BCELoss expects
        y_pred = torch.sigmoid(out)
        return y_pred

model = NeuralNet1(input_size= 28*28, hidden_size = 5)
criterion = nn.BCELoss()
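
A minimal usage sketch for the binary case. It assumes a dummy batch of 4 random inputs and made-up labels, and rebuilds the same layer stack with `nn.Sequential` so that the snippet runs on its own. None of the data comes from the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layers as NeuralNet1: Linear -> ReLU -> Linear(…, 1) -> Sigmoid
model = nn.Sequential(
    nn.Linear(28 * 28, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
    nn.Sigmoid(),
)
criterion = nn.BCELoss()

x = torch.randn(4, 28 * 28)                      # hypothetical batch of 4 flattened inputs
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])   # hypothetical float labels, shape (4, 1)

y_pred = model(x)            # probabilities in [0, 1], shape (4, 1)
loss = criterion(y_pred, y)  # BCELoss expects probabilities and float targets of the same shape

print(y_pred.shape, loss.item())
```

Note that the targets are floats with the same shape as the model output, unlike the integer class indices used with `nn.CrossEntropyLoss`.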

In [28]:
# Multiclass problem
class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # no softmax at the end
        return out

model = NeuralNet2(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss()  # (applies Softmax)
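
A minimal usage sketch for the multiclass case. It assumes a dummy batch of 4 random inputs and made-up class indices, and rebuilds the layer stack with `nn.Sequential` so the snippet is self-contained. Softmax is applied only afterwards, to read the logits as probabilities; it is not part of the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layers as NeuralNet2: Linear -> ReLU -> Linear(…, num_classes), no softmax
model = nn.Sequential(
    nn.Linear(28 * 28, 5),
    nn.ReLU(),
    nn.Linear(5, 3),
)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 28 * 28)       # hypothetical batch of 4 flattened inputs
y = torch.tensor([2, 0, 1, 1])    # hypothetical class indices, one per sample

logits = model(x)                 # raw scores, shape (4, 3)
loss = criterion(logits, y)       # softmax is applied inside the loss

# For inspection only: convert logits to probabilities and pick the top class
probs = torch.softmax(logits, dim=1)
preds = torch.argmax(logits, dim=1)

print(loss.item())
print(probs.sum(dim=1))  # each row should sum to 1
print(preds)
```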