In [None]:
import torch
from torch import Tensor

# Tutorial 1b: Softmax Function

**Question:** To have the logistic regressor output probabilities, its outputs need to be processed through a softmax layer. Implement a softmax layer yourself. What numerical issues may arise in this layer? How can you solve them? Use the testing code to confirm you implemented it correctly.

**Discussion**

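A naive softmax layer returns `nan` for these logits. The cause is numerical instability. Exponentiating large values overflows to `inf` (in float32, $e^{x}$ overflows once $x$ exceeds roughly 88), so the division becomes `inf / inf`. Very negative values underflow to 0 and can lead to `0 / 0`. Softmax is unchanged when the same constant is subtracted from every $x_{i}$, so we can shift the inputs without altering the result. Taking the constant to be $\max(x)$ stabilizes the computation: the largest exponent becomes $e^{0} = 1$, so nothing overflows.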
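The shift invariance used here follows from a short calculation. In the derivation below, $c$ denotes an arbitrary constant (a symbol introduced only for this illustration):

$$
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}} = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
$$

With $c = \max_j x_j$, every exponent $x_i - c$ is non-positive, and the denominator contains at least one term equal to 1, so it cannot underflow to zero.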

In [None]:
logits = torch.rand((1, 20)) + 100

In [None]:
# logits

In [None]:
def bad_softmax(x: Tensor) -> Tensor:
    # Naive softmax: exp(x) overflows to inf for large logits, so inf / inf yields nan.
    return torch.exp(x) / torch.sum(torch.exp(x), dim=-1, keepdim=True)

In [None]:
torch.sum(bad_softmax(logits))

tensor(nan)

In [None]:
import torch.nn as nn

# Reference result: PyTorch's built-in softmax over the class dimension.
m = nn.Softmax(dim=1)
outs = m(logits)
outs

tensor([[0.0609, 0.0569, 0.0362, 0.0488, 0.0496, 0.0398, 0.0400, 0.0531, 0.0297,
         0.0572, 0.0395, 0.0365, 0.0591, 0.0620, 0.0661, 0.0668, 0.0518, 0.0475,
         0.0608, 0.0375]])

In [None]:
# import torch.nn as nn
# m=nn.LogSoftmax(dim=1)
# out=m(logits)
# out

In [None]:
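def good_softmax(x: Tensor) -> Tensor:
    ###########################################################################
    # Subtract the row-wise maximum so the largest exponent is exp(0) = 1.
    x_max = torch.max(x, dim=-1, keepdim=True).values
    num = torch.exp(x - x_max)
    den = torch.sum(num, dim=-1, keepdim=True)
    soft = num / den
    ###########################################################################
    return soft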

In [None]:
good_softmax(logits)
# torch.sum(good_softmax(logits))

tensor([[0.0609, 0.0569, 0.0362, 0.0488, 0.0496, 0.0398, 0.0400, 0.0531, 0.0297,
         0.0572, 0.0395, 0.0365, 0.0591, 0.0620, 0.0661, 0.0668, 0.0518, 0.0475,
         0.0608, 0.0375]])

Because of numerical issues like the one you just experienced, PyTorch code typically uses a `LogSoftmax` layer.
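To illustrate why working in log space helps, here is a minimal sketch of a log-softmax built on the same max-shift idea (the log-sum-exp trick). The helper name `log_softmax_sketch` and the variable `example_logits` are made up for this illustration and are not part of the tutorial. It is not necessarily how PyTorch implements `LogSoftmax` internally.

```python
import torch
from torch import Tensor


def log_softmax_sketch(x: Tensor) -> Tensor:
    # Hypothetical helper: log-softmax over the last dimension.
    # log softmax(x)_i = (x_i - c) - log(sum_j exp(x_j - c)), with c = max(x).
    x_max = torch.max(x, dim=-1, keepdim=True).values
    shifted = x - x_max
    log_den = torch.log(torch.sum(torch.exp(shifted), dim=-1, keepdim=True))
    return shifted - log_den


torch.manual_seed(0)
example_logits = torch.rand((1, 20)) + 100  # same range as the logits above
ours = log_softmax_sketch(example_logits)
reference = torch.log_softmax(example_logits, dim=-1)
print(torch.allclose(ours, reference, atol=1e-5))
```

Since the sum inside the logarithm is at least 1, the logarithm is never taken of zero, and no division is involved at all.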

**Question [optional]:** PyTorch automatically computes the backpropagation gradient of a module for you. However, it can be instructive to derive and implement your own backward function. Try and implement the backward function for your softmax module and confirm that it is correct.
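One possible approach, given here as a sketch rather than a reference solution, is to wrap the stable softmax in a `torch.autograd.Function` and write the backward pass by hand. The class name `SoftmaxSketch` is hypothetical. The backward pass uses the softmax Jacobian $\partial s_i / \partial x_j = s_i(\delta_{ij} - s_j)$, and `torch.autograd.gradcheck` compares the result against finite differences.

```python
import torch
from torch import Tensor


class SoftmaxSketch(torch.autograd.Function):
    # Hypothetical name: softmax over the last dimension with a manual backward.

    @staticmethod
    def forward(ctx, x: Tensor) -> Tensor:
        shifted = x - torch.max(x, dim=-1, keepdim=True).values
        num = torch.exp(shifted)
        s = num / torch.sum(num, dim=-1, keepdim=True)
        ctx.save_for_backward(s)  # the output is all the backward pass needs
        return s

    @staticmethod
    def backward(ctx, grad_output: Tensor) -> Tensor:
        (s,) = ctx.saved_tensors
        # grad_x_j = s_j * (grad_output_j - sum_i grad_output_i * s_i)
        dot = torch.sum(grad_output * s, dim=-1, keepdim=True)
        return s * (grad_output - dot)


torch.manual_seed(0)
# gradcheck needs double precision inputs that require gradients.
x = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
grad_ok = torch.autograd.gradcheck(SoftmaxSketch.apply, (x,))

# Also compare against autograd through the built-in softmax.
weights = torch.randn(3, 5, dtype=torch.double)
(manual_grad,) = torch.autograd.grad((SoftmaxSketch.apply(x) * weights).sum(), x)
(builtin_grad,) = torch.autograd.grad((torch.softmax(x, dim=-1) * weights).sum(), x)
print(grad_ok, torch.allclose(manual_grad, builtin_grad))
```

Saving only the output `s` keeps the backward pass cheap: the full Jacobian is never formed, only its product with `grad_output`.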