# Softmax 
![image-4](image-4.png)

Softmax applies the exponential function to each element and normalizes the result by dividing by the sum of all the exponentials.

This squashes the outputs into the range 0 to 1 so that they sum to 1 and can be interpreted as probabilities.

In [1]:
import torch

x = torch.tensor([2.0,1.0,0.1])
outputs = torch.softmax(x, dim=0)  # dim=0: normalize along the first (and only) axis
outputs

tensor([0.6590, 0.2424, 0.0986])
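
To make the definition concrete, here is a minimal sketch that computes softmax by hand (exponentiate, then divide by the sum) and compares it with `torch.softmax`. The names `exps`, `manual`, and `builtin` are only illustrative.

```python
import torch

x = torch.tensor([2.0, 1.0, 0.1])

# exponentiate each element, then divide by the sum of the exponentials
exps = torch.exp(x)
manual = exps / exps.sum()

# built-in result, for comparison
builtin = torch.softmax(x, dim=0)

print(manual)                           # probabilities for each class
print(manual.sum())                     # the probabilities sum to 1
print(torch.allclose(manual, builtin))  # both approaches agree
```

This direct form is fine for small values. For large inputs, implementations typically subtract the maximum before exponentiating to avoid overflow, which does not change the result.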

# Cross-Entropy
![image-5](image-5.png)

- Bad prediction = high cross entropy loss 
- Good prediction = low cross entropy loss

![image-6](image-6.png)


In [2]:
import torch
import torch.nn as nn
import numpy as np

# instantiate the loss
loss = nn.CrossEntropyLoss()

y = torch.tensor([0])  # true class index of the single sample
y

tensor([0])

In [3]:
# shape: n_samples x n_classes = 1 x 3 (1 sample, 3 possible classes)
y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # raw logits, softmax not applied
y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])   # raw logits, softmax not applied

In [4]:
# compute loss
l1 = loss(y_pred_good, y)
l2 = loss(y_pred_bad, y)

l1,l2

(tensor(0.4170), tensor(1.8406))

The good prediction has a lower cross-entropy loss than the bad prediction.
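
The logits are passed without softmax because `nn.CrossEntropyLoss` applies log-softmax internally and then takes the negative log-probability of the true class. A minimal sketch of that computation by hand, using the same `y` and `y_pred_good` as above (`log_probs`, `manual_loss`, and `builtin_loss` are illustrative names):

```python
import torch
import torch.nn as nn

y = torch.tensor([0])
y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # raw logits

# log-softmax over the class dimension
log_probs = torch.log_softmax(y_pred_good, dim=1)

# pick the log-probability of the true class for each sample, negate, and average
manual_loss = -log_probs[torch.arange(y.shape[0]), y].mean()

builtin_loss = nn.CrossEntropyLoss()(y_pred_good, y)

print(manual_loss, builtin_loss)
```

Applying softmax to the logits before passing them to `nn.CrossEntropyLoss` would normalize them twice and give a different loss.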

In [5]:
# get the predicted class index (argmax over the class dimension)
_, predictions1 = torch.max(y_pred_good, 1)
_, predictions2 = torch.max(y_pred_bad, 1)

predictions1, predictions2

(tensor([0]), tensor([1]))

- Good prediction --> correctly predicts that the sample belongs to class 0
- Bad prediction --> incorrectly predicts class 1

### Cross-Entropy with multiple samples

In [6]:
# true class indices for 3 samples
y = torch.tensor([2, 0, 1])

# shape: n_samples x n_classes = 3 x 3 (3 samples, 3 possible classes)
y_pred_good = torch.tensor([[0.1, 1.0, 2.0], [2.0, 1.0, 0.1], [0.0, 3.0, 0.1]])
y_pred_bad = torch.tensor([[2.5, 1.0, 0.3], [0.1, 1.0, 2.1], [0.1, 3.0, 0.1]])

In [7]:
# compute loss
l1 = loss(y_pred_good, y)
l2 = loss(y_pred_bad, y)

l1,l2

(tensor(0.3112), tensor(1.6589))
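
With several samples, the default `nn.CrossEntropyLoss` returns the mean of the per-sample losses. As a small sketch, passing `reduction="none"` exposes the individual losses so the averaging can be checked (`per_sample` and `mean_loss` are illustrative names):

```python
import torch
import torch.nn as nn

y = torch.tensor([2, 0, 1])
y_pred_good = torch.tensor([[0.1, 1.0, 2.0], [2.0, 1.0, 0.1], [0.0, 3.0, 0.1]])

# one loss value per sample instead of a single averaged scalar
per_sample = nn.CrossEntropyLoss(reduction="none")(y_pred_good, y)

# default reduction is "mean"
mean_loss = nn.CrossEntropyLoss()(y_pred_good, y)

print(per_sample)
print(per_sample.mean(), mean_loss)
```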

In [8]:
# get actual predictions
_, predictions1 = torch.max(y_pred_good, 1)
_, predictions2 = torch.max(y_pred_bad, 1)

predictions1, predictions2

(tensor([2, 0, 1]), tensor([0, 2, 1]))
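
Comparing the predicted indices with `y` gives the accuracy of each set of predictions. The sketch below uses `torch.argmax`, which returns the same indices as the second output of `torch.max(..., 1)`; `acc_good` and `acc_bad` are illustrative names.

```python
import torch

y = torch.tensor([2, 0, 1])
y_pred_good = torch.tensor([[0.1, 1.0, 2.0], [2.0, 1.0, 0.1], [0.0, 3.0, 0.1]])
y_pred_bad = torch.tensor([[2.5, 1.0, 0.3], [0.1, 1.0, 2.1], [0.1, 3.0, 0.1]])

# predicted class = index of the largest logit in each row
predictions1 = torch.argmax(y_pred_good, dim=1)
predictions2 = torch.argmax(y_pred_bad, dim=1)

# fraction of samples whose predicted class matches the true class
acc_good = (predictions1 == y).float().mean().item()
acc_bad = (predictions2 == y).float().mean().item()

print(acc_good, acc_bad)
```

The good predictions match all three labels, while the bad predictions match only the last one.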