# Softmax and Sigmoid

Task: practice using the `softmax` function and connect it to the `sigmoid` function.

## Setup

In [1]:
import torch
from torch import tensor
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
def softmax(x):
    return torch.softmax(x, axis=0)

## Task

Try this example:

In [3]:
x1 = tensor([0.1, 0.2, 0.3])
x2 = tensor([0.1, 0.2, 100])

In [4]:
softmax(x1)

tensor([0.3006, 0.3322, 0.3672])

1. Write a block of code that assigns `p = softmax(x1)` then evaluates `p.sum()`. **Before you run it**, predict what the output will be.

In [5]:
# your code here
p = softmax(x1)
p.sum()

tensor(1.0000)

2. Write a block of code that evaluates `p2 = softmax(x2)` and displays the result. **Before you run it**, predict what it will output.

In [6]:
# your code here
p2 = softmax(x2)
p2

tensor([4.0638e-44, 4.4842e-44, 1.0000e+00])


3. Evaluate `torch.sigmoid(tensor(0.1))`. Write an expression that uses `softmax` to get the same output. *Hint*: Give `softmax` a two-element `tensor([num1, num2])`, where one of the elements is 0.

In [21]:
# your code here
torch.sigmoid(tensor([0.1, -0.1]))

tensor([0.5250, 0.4750])

In [23]:
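n = tensor([0.1, 0.0])
# sigmoid acts on each element independently
print(torch.sigmoid(n))
# softmax normalizes across both elements; its first output matches sigmoid(0.1)
print(softmax(n))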

tensor([0.5250, 0.5000])
tensor([0.5250, 0.4750])
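
The match in the first element is not a coincidence: for a two-element input `[x, 0]`, the first softmax output is `e^x / (e^x + e^0) = 1 / (1 + e^-x)`, which is the sigmoid of `x`. The following is a minimal sketch in plain Python (standard library only, so it does not depend on PyTorch) that checks this numerically. The helper names `sigmoid_py` and `softmax_py` are made up for this illustration.

```python
import math

def sigmoid_py(x):
    # Logistic sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def softmax_py(xs):
    # Exponentiate each input, then divide by the total so the outputs sum to 1
    exps = [math.exp(v) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

for x in [-3.0, 0.0, 0.1, 2.5]:
    first = softmax_py([x, 0.0])[0]
    # The two printed values should agree up to floating-point rounding
    print(x, sigmoid_py(x), first)
```

The second softmax output is `1 - sigmoid(x)`, which is why the softmax result shows `0.4750` in that position, while `sigmoid`, applied to each element separately, shows `sigmoid(0.0) = 0.5000`.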


## Analysis

1. A valid probability distribution has no negative numbers and sums to 1. Is `softmax(x)` a valid probability distribution? Why or why not?

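Yes. Each output is an exponential divided by a sum of exponentials, so every entry is positive, and dividing by that sum makes the entries add up to 1.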
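
Both conditions (no negative entries, sum of 1) can be checked directly. The sketch below is a plain-Python illustration rather than the notebook's `torch` code; `softmax_py` is a hypothetical helper, and the third test input is an extra example that does not appear in the exercise.

```python
import math

def softmax_py(xs):
    # e^x is always positive, so every entry of the result is non-negative
    exps = [math.exp(v) for v in xs]
    # Dividing by the total forces the entries to sum to 1
    total = sum(exps)
    return [e / total for e in exps]

for xs in ([0.1, 0.2, 0.3], [0.1, 0.2, 100.0], [-5.0, 0.0, 5.0]):
    p = softmax_py(xs)
    print("min:", min(p), "sum:", sum(p))
```

Negative inputs are fine: the exponential maps them to small positive numbers before normalization.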

2. Jargon alert: sometimes `x` is called the "logits" and `x.softmax(axis=0).log()` (or `x.log_softmax(axis=0)`) is called the "logprobs", short for "log probabilities". Complete the following expressions for `x1` (from the example above).

In [27]:
logits = tensor([0.1, 0.2, 0.3])
logprobs = logits.softmax(axis=0).log()
probabilities = logits.softmax(axis=0)
print(logprobs)
print(probabilities)


tensor([-1.2019, -1.1019, -1.0019])
tensor([0.3006, 0.3322, 0.3672])
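
The relationship between logits, logprobs, and probabilities can also be written out by hand: taking the log of the softmax gives `logprob_i = x_i - log(sum_j e^(x_j))`. The block below is a small sketch of that identity in plain Python (no PyTorch needed), with `log_total` as an illustrative variable name.

```python
import math

logits = [0.1, 0.2, 0.3]

# Log of the softmax denominator: log(sum_j e^(x_j))
log_total = math.log(sum(math.exp(v) for v in logits))

# log softmax(x)_i = x_i - log(sum_j e^(x_j))
logprobs = [v - log_total for v in logits]

# Exponentiating the logprobs recovers the probabilities
probabilities = [math.exp(lp) for lp in logprobs]

print([round(lp, 4) for lp in logprobs])
print([round(p, 4) for p in probabilities])
```

Rounded to four decimals, these values agree with the tensors printed by the cell above. The logprobs are the logits shifted by a single constant, so the gaps between them (0.1 here) are preserved.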


3. In light of your observations about the difference between `softmax(x1)` and `softmax(x2)`, why might `softmax` be an appropriate name for this function?

In [29]:
print(softmax(x1))
print(softmax(x2))

tensor([0.3006, 0.3322, 0.3672])
tensor([4.0638e-44, 4.4842e-44, 1.0000e+00])


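The name fits because `softmax` is a smooth ("soft") version of picking the maximum. When one input is much larger than the rest, as in `x2`, nearly all of the probability goes to that element, much like a hard argmax. When the inputs are close together, as in `x1`, the probability is spread across all elements, with the largest input still receiving the largest share.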
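
One way to see the "soft" in the name is to put the function next to a hard maximum that returns a one-hot vector. The comparison below is a sketch in plain Python; `hard_max_py` and `softmax_py` are hypothetical helpers written for this illustration and are not part of PyTorch.

```python
import math

def softmax_py(xs):
    exps = [math.exp(v) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hard_max_py(xs):
    # One-hot vector: 1.0 at the position of the largest input, 0.0 elsewhere
    best = xs.index(max(xs))
    return [1.0 if i == best else 0.0 for i in range(len(xs))]

for xs in ([0.1, 0.2, 0.3], [0.1, 0.2, 100.0]):
    print("hard:", hard_max_py(xs))
    print("soft:", [round(p, 4) for p in softmax_py(xs)])
```

For the `x1` values the hard version discards how close the inputs are, while the soft version keeps that information. For the `x2` values the two outputs are nearly identical after rounding.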