# Softmax and Sigmoid

Task: get more practice with the `softmax` function and see how it relates to the `sigmoid` function.

## Setup

In [1]:
import torch
from torch import tensor
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
def softmax(x):
    # Normalize over the first dimension; every input in this notebook is 1-D.
    return torch.softmax(x, dim=0)

## Task

Try this example:

In [3]:
x1 = tensor([0.1, 0.2, 0.3])
x2 = tensor([0.1, 0.2, 100])

In [4]:
softmax(x1)

tensor([0.3006, 0.3322, 0.3672])
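
To see where these numbers come from, the computation can be spelled out by hand. The following is a minimal plain-Python sketch (no `torch`), assuming the usual definition of softmax: exponentiate each input, then divide by the total. The name `softmax_by_hand` is made up for this illustration.

```python
import math

def softmax_by_hand(values):
    """Exponentiate each value, then divide by the sum of the exponentials."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

p_hand = softmax_by_hand([0.1, 0.2, 0.3])
print([round(v, 4) for v in p_hand])
```

The rounded values should agree with the `tensor(...)` output above to four decimal places.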

1. Write a chunk of code that assigns `p = softmax(x1)` then evaluates `p.sum()`. **Before you run it**, predict what the output will be.

In [5]:
# your code here #-out
p = softmax(x1)
p.sum()

tensor(1.0000)

2. Write a chunk of code that evaluates `p2 = softmax(x2)` and displays the result. **Before you run it**, predict what it will output.

In [6]:
# your code here #-out
p2 = softmax(x2)
p2

tensor([4.0638e-44, 4.4842e-44, 1.0000e+00])
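
One way to make sense of those tiny values: softmax depends only on the differences between its inputs, so subtracting the largest input before exponentiating does not change the result. Here is a plain-Python sketch of that idea. The helper name `softmax_shifted` is hypothetical, and this is not necessarily how `torch.softmax` is implemented internally.

```python
import math

def softmax_shifted(values):
    """Softmax computed after subtracting the maximum from every input."""
    biggest = max(values)
    # After the shift, the largest input maps to exp(0) = 1.
    exps = [math.exp(v - biggest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

p2_hand = softmax_shifted([0.1, 0.2, 100.0])
print(p2_hand)
```

After the shift, the first entry is essentially `exp(0.1 - 100)`, which is why it lands around 4e-44. This sketch runs in double precision, so the small entries differ slightly from the `float32` values that `torch` prints.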

3. Evaluate `torch.sigmoid(tensor(0.1))`. Write an expression that uses `softmax` to get the same output. *Hint*: Give `softmax` a two-element `tensor([num1, num2])`, where one of the numbers is 0.

In [7]:
print(f"{torch.sigmoid(tensor(0.1))}")
# your code here
print("Computing the same expression using softmax:", softmax(tensor([0.1, 0.0]))[0])

0.5249791741371155
Computing the same expression using softmax: tensor(0.5250)
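
Why this works can be written out in one line. With inputs $z$ and $0$, the first softmax output is

$$
\operatorname{softmax}([z, 0])_0 = \frac{e^{z}}{e^{z} + e^{0}} = \frac{e^{z}}{e^{z} + 1} = \frac{1}{1 + e^{-z}} = \sigma(z),
$$

where $\sigma$ denotes the sigmoid function (notation introduced here for the derivation). Setting $z = 0.1$ gives the value printed above.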


## Analysis

1. A valid probability distribution has no negative numbers and sums to 1. Is `softmax(x)` a valid probability distribution? Why or why not?

*your answer here*

2. Jargon alert: `x` is sometimes called the "logits", and `softmax(x)` is called the "probs", short for "probabilities". Taking the log of the probs gives what we call the `logprobs`. See the cell below.

In [10]:
logits = x1
probabilities = softmax(logits)
logprobs = probabilities.log() # alternatively, logits.log_softmax(dim=-1)

- Is `softmax(logprobs)` the same as `softmax(logits)`?
- Compute `logits - logprobs`. What do you notice about the numbers in the result?
- Could you write `logprobs = logits - some_number`? What would `some_number` be? Hint: it's the log of the sum of something.

In [12]:
softmax(logprobs), softmax(logits)

(tensor([0.3006, 0.3322, 0.3672]), tensor([0.3006, 0.3322, 0.3672]))

In [13]:
# your code here #-out
logits - logprobs

tensor([1.3019, 1.3019, 1.3019])

In [14]:
# your code here
logits.exp().sum().log()

tensor(1.3019)

In [17]:
# here's the hint: logsumexp is the log of the sum of the exponentials
logits.logsumexp(dim=-1)

tensor(1.3019)
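
The cells above can be tied together in a plain-Python sketch (no `torch`; the `_hand` variable names are made up here). It assumes the definitions used so far: the probs are the softmax of the logits, and the logprobs are their elementwise logs.

```python
import math

logits_hand = [0.1, 0.2, 0.3]

# probs: exponentiate and normalize
exps = [math.exp(v) for v in logits_hand]
total = sum(exps)
probs_hand = [e / total for e in exps]

# logprobs: elementwise log of the probs
logprobs_hand = [math.log(p) for p in probs_hand]

# Every difference is the same constant: log(sum(exp(logits)))
diffs = [l - lp for l, lp in zip(logits_hand, logprobs_hand)]
log_sum_exp = math.log(total)

print([round(d, 4) for d in diffs], round(log_sum_exp, 4))
```

In other words, `logprobs` equals `logits` minus the log-sum-exp of the logits. That is also why `softmax(logprobs)` matches `softmax(logits)`: the two inputs differ only by a constant, and softmax ignores constant shifts.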

3. In light of your observations about the difference between `softmax(x1)` and `softmax(x2)`, why might `softmax` be an appropriate name for this function?

*your answer here*