## Softmax Layer

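When there are several mutually exclusive outcomes, the logistic activation cannot express the probability of each outcome: it is applied to each unit independently, whereas the activations across units must add up to one. The softmax activation enforces this constraint by exponentiating its inputs and normalizing the result: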

$$ \textbf{z}^{[l]} = \text{exp}(\textbf{a}^{[l-1]}) $$ 

$$ 
\textbf{a}^{[l]}_{:, m} 
= \frac
    {\textbf{z}^{[l]}_{:, m}}
    {\sum_{k=1}^M{\textbf{z}^{[l]}_{:, k}}}
$$

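The matrix $\textbf{z}^{[l]} \in \mathbb{R}^{S \times M}$ is the result of element-wise exponentiation, and dividing by the sum over its $M$ columns makes each row of $\textbf{a}^{[l]}$ add up to one. When $M = 2$, the softmax activation is equivalent to the logistic activation applied to the difference of the two inputs, since $\frac{e^{x_1}}{e^{x_1} + e^{x_2}} = \frac{1}{1 + e^{-(x_1 - x_2)}}$.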
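
The two equations above can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation: it assumes that rows are the $S$ samples and columns are the $M$ units (which matches the sum over $k = 1, \dots, M$), and the names `softmax_rows`, `logistic` and the input values are made up for the example. It also checks that, for $M = 2$, the first column matches the logistic of the difference between the two inputs.

```python
import numpy as np

def softmax_rows(a_prev):
    """Row-wise softmax of an (S, M) array: exponentiate, then normalize each row."""
    z = np.exp(a_prev)                       # element-wise exponentiation
    return z / z.sum(axis=1, keepdims=True)  # divide each row by its sum over the M columns

def logistic(x):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical inputs: S = 3 samples, M = 2 units.
a_prev = np.array([[0.5, -1.0],
                   [2.0,  2.0],
                   [-3.0, 1.0]])
a = softmax_rows(a_prev)

# Each row of the output should add up to one.
row_sums = a.sum(axis=1)

# With M = 2, the first column equals the logistic of the input difference.
two_unit_match = np.allclose(a[:, 0], logistic(a_prev[:, 0] - a_prev[:, 1]))
```

Using `keepdims=True` keeps the row sums as an `(S, 1)` column so that the division broadcasts across the $M$ columns of each row.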

In [1]:
import pandas as pd
import numpy as np

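# One sample with five units; m indexes the units.
df = pd.DataFrame()
df['a[l-1]'] = [-2, -1, 0, 1, 2]
df['z[l]'] = np.exp(df['a[l-1]'])           # element-wise exponentiation
df['a[l]'] = df['z[l]'] / df['z[l]'].sum()  # normalize so the activations sum to one
df = df.round(2)
df.index.name = 'm'
df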

| m | a[l-1] | z[l] | a[l] |
|---|--------|------|------|
| 0 | -2 | 0.14 | 0.01 |
| 1 | -1 | 0.37 | 0.03 |
| 2 | 0 | 1.0 | 0.09 |
| 3 | 1 | 2.72 | 0.23 |
| 4 | 2 | 7.39 | 0.64 |
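
Exponentiating the inputs directly works for small values such as the ones above, but `np.exp` overflows for large inputs. As a side note, a common workaround is to subtract the maximum input before exponentiating, which leaves the softmax output unchanged because the common factor cancels in the ratio. The sketch below illustrates this; the function name `softmax_shifted` and the large input values are assumptions made for the example.

```python
import numpy as np

def softmax_shifted(x):
    """Softmax of a 1-D array, shifting by the maximum before exponentiating."""
    z = np.exp(x - np.max(x))  # the largest exponent is 0, so np.exp cannot overflow
    return z / z.sum()

# Same inputs as in the cell above.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
naive = np.exp(x) / np.exp(x).sum()
shifted = softmax_shifted(x)
same_result = np.allclose(naive, shifted)

# Hypothetical large inputs, for which np.exp(x) alone would overflow to inf.
big = softmax_shifted(np.array([1000.0, 1001.0, 1002.0]))
```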


In [2]:
from tensorflow.keras.layers import Softmax
?Softmax

Init signature: Softmax(*args, **kwargs)
Docstring:
Softmax activation function.

Input shape:
  Arbitrary. Use the keyword argument `input_shape`
  (tuple of integers, does not include the samples axis)
  when using this layer as the first layer in a model.

Output shape:
  Same shape as the input.

Arguments:
  axis: Integer, axis along which the softmax normalization is applied.
File:           /opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/advanced_activations.py
Type:           type
Subclasses:

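
To make the `axis` argument from the docstring concrete, the sketch below mimics it in plain NumPy. It is not the Keras implementation: `softmax_along` is a hypothetical stand-in, and the input array is made up. It only illustrates that the output has the same shape as the input and that the values add up to one along whichever axis is chosen.

```python
import numpy as np

def softmax_along(x, axis=-1):
    """NumPy stand-in for softmax normalization along a chosen axis."""
    # Shifting by the maximum along the axis does not change the result,
    # it only keeps np.exp from overflowing.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

# Hypothetical input of shape (2, 3).
x = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 1.0]])

over_last = softmax_along(x, axis=-1)   # each row sums to one
over_first = softmax_along(x, axis=0)   # each column sums to one
```

With `axis=-1` the normalization runs over the last dimension, so each sample's activations across units add up to one when samples are stored as rows.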