# `008-softmax-2`

Task: more practice using the `softmax` function, and connect it with the `sigmoid` function.

## Setup

In [1]:
import torch
from torch import tensor
import matplotlib.pyplot as plt
%matplotlib inline

## Task

Try this example: `x = tensor([0.1, 0.2, 0.3])`

1. If `p=softmax(x)`, what is `p.sum()`? Can you get `p.sum()` to change by changing `x`? Can you make `p.min()` be less than 0?
2. Try making an `xx` that's the same as `x` except that `xx[2] = 100`, and let `p = softmax(xx)` again. How does `p[2]` compare with `p[0]` and `p[1]`?
3. Try `torch.sigmoid(tensor(0.1))`. Can you write an expression that uses `torch.softmax` to get the same output?

**Hint for \#3**: Give `softmax` a two-element `tensor`. One of those elements will be 0.

## Solution

Add code and Markdown cells for each of the listed tasks above.

In [2]:
x = tensor([0.1, 0.2, 0.3])

### Problem 1

In [3]:
p = torch.softmax(x, dim=0)
p.sum()

tensor(1.0000)

In [4]:
# Attempt to get p.sum() to change by changing x
p1 = torch.softmax(x + 100, dim=0)
p2 = torch.softmax(x * 3, dim=0)
p1.sum(), p2.sum()

(tensor(1.), tensor(1.))

In [5]:
p.min(), p1.min(), p2.min()

(tensor(0.3006), tensor(0.3006), tensor(0.2397))

In [6]:
# Attempt to get p.min() to be less than 0
p3 = torch.softmax(x * 0.00001, dim=0)
p4 = torch.softmax(x * 100000, dim=0)
p3.min(), p4.min()

(tensor(0.3333), tensor(0.))

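Based on my (limited) testing, `p.sum()` does not change when `x` changes: it stayed at 1 whether `x` was shifted or scaled. `p.min()` also never dropped below 0. The smallest value I saw was `tensor(0.)` for `x * 100000`, where the smallest entry is presumably rounded down to zero rather than being negative.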
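One way to see why these checks keep coming out the same is to write softmax by hand. The sketch below uses plain Python rather than torch, and `softmax_list` is a hypothetical helper name, not something from the notebook. Each output is an exponential (never negative) divided by the sum of all the exponentials (so the outputs add up to 1). It also suggests why adding 100 to every input left the result unchanged.

```python
import math

def softmax_list(values):
    """Hypothetical helper: softmax over a plain Python list of floats."""
    # Subtracting the max does not change the result, but keeps exp() from overflowing.
    biggest = max(values)
    exps = [math.exp(v - biggest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

x_list = [0.1, 0.2, 0.3]
p_list = softmax_list(x_list)
p_shifted = softmax_list([v + 100 for v in x_list])

print(p_list)       # every entry is >= 0
print(sum(p_list))  # 1, up to floating-point rounding
print(p_shifted)    # matches p_list up to rounding: the shift cancels out
```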

### Problem 2

In [7]:
xx = tensor([0.1, 0.2, 100])
p = torch.softmax(xx, dim=0)
p

tensor([4.0638e-44, 4.4842e-44, 1.0000e+00])

`p[2]` is vastly greater than `p[0]` and `p[1]`: `p[2]` is essentially 1, while `p[0]` and `p[1]` are extremely small positive floating-point values (on the order of 1e-44), effectively zero.
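The size of the gap can be worked out directly. Because all softmax outputs share one denominator, the ratio of two of them depends only on the difference of their inputs: `p[i] / p[j] = exp(x[i] - x[j])`. The following sketch checks this in plain Python, which uses double precision, so the tiny values need not match the digits torch printed above. The variable names are illustrative.

```python
import math

xx_list = [0.1, 0.2, 100.0]

# Ratio of two softmax outputs: the shared denominator cancels,
# leaving exp of the difference between the two inputs.
ratio_2_over_0 = math.exp(xx_list[2] - xx_list[0])
ratio_2_over_1 = math.exp(xx_list[2] - xx_list[1])

print(ratio_2_over_0)  # on the order of 1e43
print(ratio_2_over_1)  # slightly smaller, but the same order of magnitude
```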

### Problem 3

In [8]:
sig = torch.sigmoid(tensor([0.1, 0]))
soft = torch.softmax(tensor([0.1, 0]), dim=0)
sig, soft

(tensor([0.5250, 0.5000]), tensor([0.5250, 0.4750]))

In [9]:
sig[0], soft[0]

(tensor(0.5250), tensor(0.5250))
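The match is not a coincidence. For a two-element input `[a, 0]`, the first softmax output is `exp(a) / (exp(a) + exp(0))`, and dividing top and bottom by `exp(a)` gives `1 / (1 + exp(-a))`, which is the sigmoid of `a`. Here is a plain-Python check of that identity, with hypothetical helper names:

```python
import math

def sigmoid_scalar(a):
    """Logistic sigmoid of a single float."""
    return 1.0 / (1.0 + math.exp(-a))

def softmax_first_of_pair(a):
    """First entry of softmax([a, 0]), written out by hand."""
    return math.exp(a) / (math.exp(a) + math.exp(0.0))

for a in [-3.0, 0.0, 0.1, 2.5]:
    # The two computed values agree up to floating-point rounding.
    print(a, sigmoid_scalar(a), softmax_first_of_pair(a))
```

In the notebook's terms, `torch.softmax(tensor([0.1, 0]), dim=0)[0]` is the kind of expression the task asks for.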

## Analysis

1. A valid probability distribution has no negative numbers and sums to 1. Is `softmax(x)` a valid probability distribution?

Based on my observations, yes: every output I saw was non-negative, and every sum came out to 1.

2. Sometimes `x` is called the "logits" and `x.softmax().log()` (or `x.log_softmax()`) is called the "logprobs", short for "log probabilities". Compute the logits, logprobs, and probabilities for `x` in the example above.

In [10]:
logits = x
logprobs = logits.softmax(dim=0).log()
probs = logprobs.exp()

# Compare with softmax(x) recomputed directly; `p` was reassigned in Problem 2 and now holds softmax(xx)
probs, torch.softmax(x, dim=0)

(tensor([0.3006, 0.3322, 0.3672]),
 tensor([0.3006, 0.3322, 0.3672]))
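Logprobs can also be computed without forming the probabilities first, using `log(softmax(x)) = x - log(sum(exp(x)))`. The sketch below does this in plain Python with a hypothetical `log_softmax_list` helper. It only illustrates the identity and is not a claim about how `x.log_softmax()` is implemented internally.

```python
import math

def log_softmax_list(values):
    """Hypothetical helper: log-probabilities via the log-sum-exp identity."""
    biggest = max(values)
    # log(sum(exp(v))) computed stably by factoring out the largest value.
    log_sum_exp = biggest + math.log(sum(math.exp(v - biggest) for v in values))
    return [v - log_sum_exp for v in values]

logits_list = [0.1, 0.2, 0.3]
logprobs_list = log_softmax_list(logits_list)
probs_list = [math.exp(lp) for lp in logprobs_list]

print(logprobs_list)  # all <= 0, since each probability is at most 1
print(probs_list)     # exponentiating recovers the probabilities
```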

3. In light of your observations about `xx[2]` and `p[2]` above, why might `softmax` be an appropriate name for this function?

`softmax` might be an appropriate name because it behaves like a softened version of taking the maximum: the largest input receives the largest output, and each output is a value between 0 and 1 that depends on how its input compares with all the others passed to `softmax`. With `xx[2]` and `p[2]`, `softmax` mapped 100 to essentially 1, indicating not only that 100 was the largest value in the tensor but also that it was much greater than the others. If 1 were used instead of 100, the outputs would be far less extreme (see below): the largest value still gets the largest share, but the others keep a substantial share too. So the "max" in `softmax` refers to the largest input getting the largest output, and the "soft" refers to the smaller inputs not being pushed to zero unless the gap is very large.

In [11]:
x_100 = tensor([0.1, 0.2, 100])
p_100 = torch.softmax(x_100, dim=0)

x_1 = tensor([0.1, 0.2, 1])
p_1 = torch.softmax(x_1, dim=0)

p_100, p_1

(tensor([4.0638e-44, 4.4842e-44, 1.0000e+00]),
 tensor([0.2191, 0.2421, 0.5388]))
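To see the "soft" side more gradually, the gap can be widened in steps. This sketch uses plain Python, a hypothetical `softmax_list` helper, and an illustrative set of values for the third input. As the last value grows, its share rises toward 1 while the others shrink toward 0 without ever becoming negative.

```python
import math

def softmax_list(values):
    """Hypothetical helper: softmax over a plain Python list of floats."""
    biggest = max(values)
    exps = [math.exp(v - biggest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative choices for the third input; only 1 and 100 appear in the notebook.
for last in [0.3, 1.0, 3.0, 10.0, 100.0]:
    probs = softmax_list([0.1, 0.2, last])
    print(last, [round(p, 4) for p in probs])
```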