# Safe Softmax

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pytorch/maskedtensor/blob/main/docs/source/notebooks/safe_softmax.ipynb)

## Motivation

One issue that comes up often is the need for a safe softmax. Suppose an entire batch is "masked out" or consists entirely of padding. More precisely, this means an entire slice along the dimension being normalized, and in the softmax case every element of that slice is set to `-inf`. Softmax then returns NaNs for that slice, which can lead to training divergence. For more detail on why this functionality is helpful, see [Issue 55056 - Feature Request for Safe Softmax](https://github.com/pytorch/pytorch/issues/55056).
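To see where the NaNs come from, here is a minimal pure-Python sketch of a numerically stable softmax. It subtracts the maximum before exponentiating, which is a common implementation technique. This is an illustration only and is not PyTorch's actual kernel. When every input is `-inf`, the maximum is also `-inf`. Because `-inf - (-inf)` is NaN, the NaN then propagates through the whole slice.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

neg_inf = float("-inf")
print(softmax([1.0, 2.0, neg_inf]))  # the -inf entry gets probability 0.0
print(softmax([neg_inf, neg_inf, neg_inf]))  # [nan, nan, nan]
```

Skipping the max subtraction does not help: every `exp(-inf)` is `0.0`, so the normalization becomes `0 / 0`, which is also undefined.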

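Luckily, MaskedTensor already handles this case. Masked-out elements are excluded from the softmax, and a slice with no unmasked elements stays masked instead of becoming NaN.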

```python
import torch
from maskedtensor import masked_tensor
```

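```python
# Random values, so the numbers in the outputs below will differ between runs
data = torch.randn(3, 3)
# True = keep the element, False = mask it out
mask = torch.tensor([
    [True, False, False],
    [True, False, True],
    [False, False, False]
])
# Plain PyTorch: masked-out positions are filled with -inf before the softmax
x = data.masked_fill(~mask, float('-inf'))

# MaskedTensor: keep the original data and carry the mask alongside it
m = masked_tensor(data, mask)
```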

**PyTorch result**:

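```python
x.softmax(0)
```

```
tensor([[0.2291,    nan, 0.0000],
        [0.7709,    nan, 1.0000],
        [0.0000,    nan, 0.0000]])
```

Every entry in the middle column of `mask` is `False`. That column of `x` is therefore all `-inf`, and its softmax is NaN.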

**MaskedTensor result**:

```python
m.softmax(0)
```

```
masked_tensor(
  [
    [  0.2291,       --,       --],
    [  0.7709,       --,   1.0000],
    [      --,       --,       --]
  ]
)
```
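The following pure-Python sketch models the behavior shown above for a single slice. It is a simplified model of the observable semantics and not how MaskedTensor is implemented. `masked_softmax` is a hypothetical helper, and `None` stands in for a masked (`--`) entry. Only the unmasked entries are normalized. A slice with no unmasked entries is returned fully masked, so no NaN is produced.

```python
import math
from typing import List, Optional

def masked_softmax(xs: List[float], mask: List[bool]) -> List[Optional[float]]:
    # Hypothetical helper: softmax over the entries where mask is True;
    # masked entries stay masked (None, shown as `--` above).
    kept = [x for x, keep in zip(xs, mask) if keep]
    if not kept:
        # Nothing to normalize over: the whole slice stays masked, no NaN.
        return [None] * len(xs)
    m = max(kept)  # stable softmax over the unmasked values only
    total = sum(math.exp(x - m) for x in kept)
    return [math.exp(x - m) / total if keep else None
            for x, keep in zip(xs, mask)]

print(masked_softmax([0.3, -1.2, 0.5], [True, True, False]))  # last entry is None
print(masked_softmax([0.3, -1.2, 0.5], [False, False, False]))  # [None, None, None]
```

A slice with a single unmasked entry, like the last column in the example, gives that entry a probability of exactly `1.0`.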