# Softmax

Softmax turns the outputs $z_1, \dots, z_N$ into probabilities. The formula and its logarithm are:

$$\text{Prob}(i) = \dfrac{\exp(z_i)}{\sum_{j=1}^{N} \exp(z_j)}$$

$$\log\text{Prob}(i) = z_i - \log \sum_{j=1}^{N} \exp(z_j)$$
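The logarithmic form pairs naturally with the log-sum-exp trick. You can subtract the largest $z_j$ before exponentiating to avoid overflow, and this does not change the result because the shift cancels between numerator and denominator. Below is a minimal pure-Python sketch of that approach. The function names `softmax` and `log_softmax` are illustrative choices, not a library API.

```python
import math
from typing import List


def log_softmax(z: List[float]) -> List[float]:
    """log Prob(i) = z_i - log sum_j exp(z_j), using the log-sum-exp trick."""
    m = max(z)  # subtracting the max keeps every exp() argument <= 0
    log_sum = m + math.log(sum(math.exp(zj - m) for zj in z))
    return [zi - log_sum for zi in z]


def softmax(z: List[float]) -> List[float]:
    """Prob(i) = exp(z_i) / sum_j exp(z_j), obtained by exponentiating log_softmax."""
    return [math.exp(lp) for lp in log_softmax(z)]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])
# Shifting every input by the same constant leaves the probabilities unchanged.
print(softmax([1000.0, 1001.0, 1002.0]))
```

For the inputs 1, 2, 3 this gives the same probabilities as the PyTorch cell below: about 0.0900, 0.2447 and 0.6652. The shifted inputs give the same values, whereas evaluating `math.exp(1000.0)` directly raises `OverflowError`.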

Consider a classification task with three classes $1$, $2$, $3$. Suppose a particular input is presented, producing outputs:
$$z_1 = 1$$
$$z_2 = 2$$
$$z_3 = 3$$

and that the correct class is $2$.

Compute each of the following to 2 decimal places:
- Prob(1)
- Prob(2)
- Prob(3)
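Working the arithmetic by hand, with intermediate values rounded to three decimal places:

$$\sum_{j=1}^{3} \exp(z_j) = e^1 + e^2 + e^3 \approx 2.718 + 7.389 + 20.086 = 30.193$$

$$\text{Prob}(1) \approx \dfrac{2.718}{30.193} \approx 0.09, \qquad \text{Prob}(2) \approx \dfrac{7.389}{30.193} \approx 0.24, \qquad \text{Prob}(3) \approx \dfrac{20.086}{30.193} \approx 0.67$$

The correct class is 2, so applying the logarithmic formula to it gives

$$\log\text{Prob}(2) = 2 - \log(30.193) \approx 2 - 3.408 = -1.41$$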

In [2]:
import torch as T
device = T.device("cpu")
t1 = T.tensor([1.0, 2.0, 3.0], dtype=T.float32).to(device)
sm = T.nn.functional.softmax(t1, dim=0)
T.set_printoptions(precision=4)
print("tensor t1        = ", end=""); print(t1)
print("softmax(t1)      = ", end=""); print(sm)

tensor t1        = tensor([1., 2., 3.])
softmax(t1)      = tensor([0.0900, 0.2447, 0.6652])


# SSE & Cross Entropy

Consider a degenerate case of supervised learning where the training set consists of just a single input, repeated 100 times.

In 80 of the 100 cases, the target output value is 1; in the other 20, it is 0.

What will a back-propagation neural network predict for this input, assuming it has been trained and reaches a global minimum? Does it make a difference whether the loss function is the sum of squared errors or cross entropy?

(**Hint**: to find the global minimum, differentiate the loss function and set the derivative to zero.)
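Following the hint by hand, write $n_1$ for the number of cases with target 1 and $n_0$ for the number with target 0. These symbols are introduced here only for the derivation; in this example $n_1 = 80$ and $n_0 = 20$. Every case shares the same output $z$, so the two losses are $E_{\text{SSE}} = \tfrac{1}{2}\left(n_1(1-z)^2 + n_0 z^2\right)$ and $E_{\text{CE}} = -n_1\log(z) - n_0\log(1-z)$. Their derivatives vanish at the same point:

$$\dfrac{dE_{\text{SSE}}}{dz} = -n_1(1 - z) + n_0 z = 0 \;\Rightarrow\; z = \dfrac{n_1}{n_0 + n_1}$$

$$\dfrac{dE_{\text{CE}}}{dz} = -\dfrac{n_1}{z} + \dfrac{n_0}{1 - z} = 0 \;\Rightarrow\; z = \dfrac{n_1}{n_0 + n_1}$$

The second derivatives, $n_0 + n_1$ and $n_1/z^2 + n_0/(1-z)^2$, are both positive for $0 < z < 1$. Each stationary point is therefore the global minimum, here $z = 80/100 = 0.8$.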

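Calculate where each loss is minimised. All 100 training cases share the same input, so the network produces the same output $z$ for each of them. Of these cases, 80 have target $t_i = 1$ and 20 have $t_i = 0$.

### Sum of Squared Errors
$$E = \dfrac{1}{2} \sum_{i}(t_i - z_i)^2 = \dfrac{1}{2}\left(80(1 - z)^2 + 20z^2\right)$$

In [9]:
from sympy import *
z = symbols('z')

In [10]:
# SSE over the 80 cases with t = 1 and the 20 cases with t = 0
expr_1 = Rational(1, 2) * (80 * (1 - z)**2 + 20 * z**2)
solve(diff(expr_1, z))  # dE/dz = 100z - 80 = 0, i.e. z = 0.8

[4/5]

### Cross Entropy
$$E = \sum_{i} \left(-t_i\log(z_i) - (1-t_i)\log(1-z_i)\right) = -80\log(z) - 20\log(1 - z)$$

In [11]:
# cross entropy over the same 80/20 targets, then its derivative
expr = -80 * log(z) - 20 * log(1 - z)
expr_2 = diff(expr, z)
expr_2

20/(1 - z) - 80/z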

In [14]:
solve(expr_2)  # this is equivalent to 0.8

[4/5]
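As a cross-check that does not depend on SymPy, the sketch below evaluates both losses on a grid of candidate outputs strictly inside $(0, 1)$ and reports where each is smallest. The constants `N_ONES` and `N_ZEROS` and the helper names are illustrative, not part of the original exercise. The grid search serves only as a sanity check, not as a stand-in for training.

```python
import math

# Illustrative constants mirroring the exercise: one repeated input,
# 80 cases with target 1 and 20 cases with target 0.
N_ONES, N_ZEROS = 80, 20


def sse(z: float) -> float:
    """Sum of squared errors, 1/2 * sum_i (t_i - z)^2, for a shared output z."""
    return 0.5 * (N_ONES * (1 - z) ** 2 + N_ZEROS * z ** 2)


def cross_entropy(z: float) -> float:
    """Cross entropy, sum_i (-t_i log z - (1 - t_i) log(1 - z)), for 0 < z < 1."""
    return -N_ONES * math.log(z) - N_ZEROS * math.log(1 - z)


# Evenly spaced candidates 0.001, 0.002, ..., 0.999 (endpoints excluded).
grid = [k / 1000 for k in range(1, 1000)]
best_sse = min(grid, key=sse)
best_ce = min(grid, key=cross_entropy)

print(f"SSE minimised at z = {best_sse:.3f}")
print(f"Cross entropy minimised at z = {best_ce:.3f}")
```

Both lines report `z = 0.800`. The grid leaves out 0 and 1 because the cross-entropy loss is undefined at those points.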

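Both losses therefore reach their global minimum at $z = 0.8$, the fraction of training cases whose target is 1, so the trained network predicts 0.8 for this input. Switching between sum of squared errors and cross entropy does not change where that minimum lies.

### Derivation of Least Squares

### Compute Softmax

### Compute Weight Decay

### Compute Momentum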