
#### **Understanding One-hot Encoding and Cross Entropy in PyTorch**

In [1]:
import torch

#### One-hot Encoding

In [60]:
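def to_onehot(y, num_classes):
    # Start from an all-zero float matrix with one row per label
    y_onehot = torch.zeros(y.size(0), num_classes)
    # Write a 1 into column y[i] of row i, in place; scatter_ needs int64 indices
    y_onehot.scatter_(1, y.view(-1, 1).long(), 1)
    return y_onehot

y = torch.tensor([0, 1, 2, 2])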

y_enc = to_onehot(y, 3)

print('One-hot encoding:\n', y_enc)

One-hot encoding:
 tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.]])
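
PyTorch also provides a built-in `torch.nn.functional.one_hot`. The following is a minimal sketch of how it could replace the hand-written function, reusing the same labels; the name `y_enc_builtin` is only for illustration. The built-in returns an integer tensor, so a cast is needed to match the float output above.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([0, 1, 2, 2])

# F.one_hot returns an int64 tensor, so cast to float
# to match the output of the hand-written to_onehot.
y_enc_builtin = F.one_hot(y, num_classes=3).float()

print('One-hot encoding (built-in):\n', y_enc_builtin)
```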


### **Softmax**

Suppose we have some net inputs Z, where each row is one training example:

In [22]:
Z = torch.tensor([[-0.3, -0.5, -0.5],
                  [-0.4, -0.1, -0.5],
                  [-0.3, -0.94, -0.5],
                  [-0.99, -0.88, -0.5]])
Z

tensor([[-0.3000, -0.5000, -0.5000],
        [-0.4000, -0.1000, -0.5000],
        [-0.3000, -0.9400, -0.5000],
        [-0.9900, -0.8800, -0.5000]])

The softmax function converts these net inputs into "probabilities": each row is exponentiated and then normalized so that it sums to 1.

In [39]:
def softmax(z):
    # torch.sum(..., dim=1) drops the class dimension and returns shape (n,),
    # so transpose to (classes, n) to broadcast the division, then transpose back
    return (torch.exp(z.t()) / torch.sum(torch.exp(z), dim=1)).t()

def softmax2(z):
    # Same result; keepdim=True keeps the sum as shape (n, 1),
    # so it broadcasts across the columns without any transposes
    return torch.exp(z) / torch.sum(torch.exp(z), dim=1, keepdim=True)

smax = softmax(Z)
smax2 = softmax2(Z)

print('softmax:\n', smax)
print('softmax2:\n', smax2)

softmax:
 tensor([[0.3792, 0.3104, 0.3104],
        [0.3072, 0.4147, 0.2780],
        [0.4263, 0.2248, 0.3490],
        [0.2668, 0.2978, 0.4354]])
softmax2:
 tensor([[0.3792, 0.3104, 0.3104],
        [0.3072, 0.4147, 0.2780],
        [0.4263, 0.2248, 0.3490],
        [0.2668, 0.2978, 0.4354]])
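
Both versions above exponentiate the raw inputs directly, which can overflow when the inputs are large. A common remedy is to subtract each row's maximum before exponentiating, which leaves the result unchanged. The sketch below illustrates that idea and compares it with the built-in `torch.softmax`; `softmax_stable` is a hypothetical helper name, not part of the original notebook.

```python
import torch

def softmax_stable(z):
    # Subtracting the row-wise max does not change the softmax output,
    # but it keeps exp() from overflowing for large inputs.
    z_shifted = z - z.max(dim=1, keepdim=True).values
    exp_z = torch.exp(z_shifted)
    return exp_z / exp_z.sum(dim=1, keepdim=True)

Z = torch.tensor([[-0.3, -0.5, -0.5],
                  [-0.4, -0.1, -0.5],
                  [-0.3, -0.94, -0.5],
                  [-0.99, -0.88, -0.5]])

smax_stable = softmax_stable(Z)
smax_builtin = torch.softmax(Z, dim=1)

# The two results should agree up to floating-point error
print(torch.allclose(smax_stable, smax_builtin))
# Each row should sum to 1
print(smax_stable.sum(dim=1))
```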


Probabilities can be converted back to class labels by taking the index of the largest probability in each row:

In [57]:
def to_classlabel(z):
    return torch.argmax(z, dim=1)

print('predicted class labels: ', to_classlabel(smax))
print('true class labels: ', to_classlabel(y_enc))

predicted class labels:  tensor([0, 1, 0, 2])
true class labels:  tensor([0, 1, 2, 2])
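
Once predictions and targets are both class labels, they can be compared directly. As a small sketch (the accuracy computation is an addition, not part of the original notebook), the fraction of matching labels can be computed like this, using the predicted labels printed above and the original labels `y`:

```python
import torch

# Values copied from the cells above
y_pred = torch.tensor([0, 1, 0, 2])   # argmax of the softmax outputs
y_true = torch.tensor([0, 1, 2, 2])   # original integer labels

# Fraction of positions where prediction and label agree (3 of 4 here)
accuracy = (y_pred == y_true).float().mean()
print(accuracy)  # tensor(0.7500)
```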


### Cross Entropy

In [61]:
def cross_entropy(probas, y_target):
    # y_target is one-hot, so only the log-probability of the true class survives the sum
    return -torch.sum(torch.log(probas) * y_target, dim=1)

xent = cross_entropy(smax, y_enc)
print("Cross Entropy: ", xent)

Cross Entropy:  tensor([0.9698, 0.8801, 1.0527, 0.8314])
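
Because the targets are one-hot, the sum in the cross entropy keeps only one term per row: the negative log-probability of the true class. The sketch below shows an equivalent computation that picks that probability directly with `gather`, using integer labels instead of one-hot vectors. The names `p_true` and `xent_gather` are illustrative, and the result should match the values printed above.

```python
import torch

Z = torch.tensor([[-0.3, -0.5, -0.5],
                  [-0.4, -0.1, -0.5],
                  [-0.3, -0.94, -0.5],
                  [-0.99, -0.88, -0.5]])
y = torch.tensor([0, 1, 2, 2])

probas = torch.softmax(Z, dim=1)

# For each row i, pick the probability stored in column y[i]
p_true = probas.gather(1, y.view(-1, 1)).squeeze(1)

# Cross entropy per example is the negative log of that probability
xent_gather = -torch.log(p_true)
print("Cross Entropy (gather): ", xent_gather)
```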


## In PyTorch

In [62]:
import torch.nn.functional as F

Note that ```nll_loss``` expects log-probabilities, i.e. log(softmax), as input, together with integer class labels rather than one-hot vectors:

In [63]:
F.nll_loss(torch.log(smax), y, reduction='none')

tensor([0.9698, 0.8801, 1.0527, 0.8314])
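
Taking `torch.log` of a softmax output works for these small values, but it can lose precision when a probability underflows toward zero. As a hedged sketch, `F.log_softmax` computes the log-probabilities in a single step and can be passed to `nll_loss` in the same way; the variable names here are illustrative.

```python
import torch
import torch.nn.functional as F

Z = torch.tensor([[-0.3, -0.5, -0.5],
                  [-0.4, -0.1, -0.5],
                  [-0.3, -0.94, -0.5],
                  [-0.99, -0.88, -0.5]])
y = torch.tensor([0, 1, 2, 2])

# Log-probabilities computed directly from the logits
log_probas = F.log_softmax(Z, dim=1)
loss_nll = F.nll_loss(log_probas, y, reduction='none')

# Reference: the two-step version, log applied to the softmax output
loss_two_step = F.nll_loss(torch.log(torch.softmax(Z, dim=1)), y, reduction='none')

print(loss_nll)
print(torch.allclose(loss_nll, loss_two_step))
```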

Note that ```cross_entropy``` takes the raw logits as input and applies log-softmax internally:

In [66]:
F.cross_entropy(Z, y, reduction='none')

tensor([0.9698, 0.8801, 1.0527, 0.8314])
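
The calls above use `reduction='none'` to expose the per-example losses. In training, the default `reduction='mean'` is more common, and it averages those values into a single scalar. The sketch below checks that relationship and confirms that `cross_entropy` on logits agrees with `nll_loss` on `log_softmax` outputs; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

Z = torch.tensor([[-0.3, -0.5, -0.5],
                  [-0.4, -0.1, -0.5],
                  [-0.3, -0.94, -0.5],
                  [-0.99, -0.88, -0.5]])
y = torch.tensor([0, 1, 2, 2])

loss_per_example = F.cross_entropy(Z, y, reduction='none')
loss_mean = F.cross_entropy(Z, y)  # default is reduction='mean'

# The mean-reduced loss is the average of the per-example losses
print(torch.isclose(loss_mean, loss_per_example.mean()))

# cross_entropy on logits matches nll_loss on log-softmax outputs
loss_via_nll = F.nll_loss(F.log_softmax(Z, dim=1), y)
print(torch.isclose(loss_mean, loss_via_nll))
```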