# Softmax

The softmax function is described in detail on [Wikipedia](https://en.wikipedia.org/wiki/Softmax_function).

In short, it maps a vector **z** of $K$ real values to a probability distribution over its $K$ elements. For each element $z_j$ of **z**, it is defined as:

\begin{equation}
\sigma(z)_{j} = \frac{e^{z_{j}}}{\Sigma_{k=1}^K e^{z_{k}}}
\end{equation}

Recognize that it *scales* the exponential of each element ($e^{z_j}$) by dividing it by the sum of the exponentials of all the elements of the vector ($\Sigma_{k=1}^K e^{z_{k}}$). This sum is the same for every $j$, so it acts as a normalizing constant.

If there is just one value in **z**, the definition reduces to:

\begin{equation}
\sigma(z)_{1} = \frac{e^{z_{1}}}{\Sigma_{k=1}^1 e^{z_{k}}} = \frac{e^{z_1}}{e^{z_1}} = 1
\end{equation}

For two values:

\begin{equation}
\begin{split}
\sigma(z)_{1} &= \frac{e^{z_1}}{e^{z_1} + e^{z_2}} \\
\sigma(z)_{2} &= \frac{e^{z_2}}{e^{z_1} + e^{z_2}} \\
\end{split}
\end{equation}

And so on.

Note that every entry of the softmax result is positive and that the entries always sum to 1 (in code, up to floating-point rounding).
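
One way to see why the sum is 1, using only the definition above: the denominator does not depend on $j$, so it can be pulled out of the sum over all entries, leaving the same quantity in the numerator and the denominator:

\begin{equation}
\Sigma_{j=1}^K \sigma(z)_{j} = \Sigma_{j=1}^K \frac{e^{z_{j}}}{\Sigma_{k=1}^K e^{z_{k}}} = \frac{\Sigma_{j=1}^K e^{z_{j}}}{\Sigma_{k=1}^K e^{z_{k}}} = 1
\end{equation}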

In [13]:
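import math

def softmax(vector):
    """Return the softmax of a list of numbers as a list of probabilities."""
    # Exponentiate each element once and reuse the results.
    exps = [math.exp(x) for x in vector]
    # Normalizing constant: the sum of all the exponentials.
    s = sum(exps)
    return [e / s for e in exps]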

In [14]:
softmax([8])

[1.0]

In [15]:
softmax([3, 3])

[0.5, 0.5]

In [22]:
softmax([1, 2])

[0.2689414213699951, 0.7310585786300049]

In [16]:
v = [5, 0, 2, 3, 1, 4]
softmax(v)

[0.6336913225737218,
 0.00426977854528211,
 0.03154963320110001,
 0.08576079462509835,
 0.011606461431184656,
 0.23312200962361299]

In [17]:
sum(softmax(v))

0.9999999999999999
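
The sum above is not exactly 1 because of floating-point rounding, so a tolerance-based comparison is a safer check than `==`. The sketch below illustrates that, and also shows a variant that subtracts the maximum element before exponentiating. The name `stable_softmax` is an illustrative assumption and not part of the notebook above. The shift multiplies the numerator and the denominator by the same factor $e^{-\max(z)}$, so the result is mathematically unchanged, but it avoids the `OverflowError` that `math.exp` raises for large inputs.

```python
import math

def stable_softmax(vector):
    """Softmax computed after shifting the inputs by their maximum (illustrative helper)."""
    m = max(vector)
    # Every shifted exponent is <= 0, so math.exp cannot overflow.
    exps = [math.exp(x - m) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]

v = [5, 0, 2, 3, 1, 4]
probs = stable_softmax(v)

# Compare with a tolerance instead of ==, since rounding can leave the sum slightly off 1.
print(math.isclose(sum(probs), 1.0))

# math.exp(1000) would raise OverflowError; the shifted version stays finite.
print(stable_softmax([1000, 1001]))
```

Because only the differences between elements matter, `stable_softmax([1000, 1001])` should agree with `softmax([1, 2])` up to rounding.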