# Loss Function

## Categorical Cross-Entropy

<br>

$L_i = -\sum_{j} y_{i,j} \log(\hat{y}_{i,j})$

<br>

|  Symbol   | Definition                      |
|:---------:|---------------------------------|
|   $L_i$   | loss value of the i-th sample   |
|    $i$    | index of the sample in the set  |
|    $j$    | label/output (class) index      |
|    $y$    | target values                   |
| $\hat{y}$ | predicted values                |


-------------------------------

Because the targets are one-hot encoded, $y_{i,j}$ is 1 for the correct class and 0 for every other class, so the sum reduces to a single term:

$L_i = -\log(\hat{y}_{i,k})$

<br>

|  Symbol   | Definition                                             |
|:---------:|--------------------------------------------------------|
|   $L_i$   | sample loss value                                      |
|    $i$    | index of the sample in the set                         |
|    $k$    | target label index, i.e. the index of the correct class |
| $\hat{y}$ | predicted values                                       |
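
To spell out the step behind this simplification: assuming a one-hot target whose only nonzero entry sits at index $k$, every term with $j \neq k$ is multiplied by zero and drops out of the sum. Here $\log$ is taken to be the natural logarithm.

$L_i = -\sum_{j} y_{i,j} \log(\hat{y}_{i,j}) = -\left(1 \cdot \log(\hat{y}_{i,k}) + \sum_{j \neq k} 0 \cdot \log(\hat{y}_{i,j})\right) = -\log(\hat{y}_{i,k})$

For predictions $[0.7, 0.1, 0.2]$ and target $[1, 0, 0]$, the correct class is $k = 0$, so $L_i = -\log(0.7) \approx 0.357$.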

In [3]:
import math

# An example output from the output layer of the neural network
softmax_output = [0.7, 0.1, 0.2]

# Ground truth
target_output = [1, 0, 0]

loss = -(math.log(softmax_output[0]) * target_output[0] +
         math.log(softmax_output[1]) * target_output[1] +
         math.log(softmax_output[2]) * target_output[2])

print(loss)

# Simplified: with a one-hot target, only the correct class (index 0) contributes
loss = -math.log(softmax_output[0])
print(loss)


# --------------------------
# ------ Further illustration
# --------------------------

# A higher predicted probability ("confidence") for the correct class gives a lower loss
print(-math.log(0.7))

# Conversely, a lower predicted probability gives a higher loss
print(-math.log(0.5))


0.35667494393873245
0.35667494393873245
0.35667494393873245
0.6931471805599453
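
The single-sample calculation above can be extended to a batch. The following is a minimal sketch, not a definitive implementation. The function name `categorical_cross_entropy`, the clipping bound `eps`, the extra sample rows, and the choice to average the per-sample losses are all assumptions made for illustration and do not come from the original notebook. Clipping is included because `math.log(0)` raises an error when the predicted probability of the correct class is exactly zero.

```python
import math


def categorical_cross_entropy(predictions, target_indices, eps=1e-7):
    """Mean categorical cross-entropy over a batch (illustrative sketch).

    predictions: list of per-sample probability lists (e.g. softmax outputs)
    target_indices: list with the correct class index k for each sample
    eps: hypothetical clipping bound so that log(0) is never evaluated
    """
    losses = []
    for probs, k in zip(predictions, target_indices):
        # Clip the correct-class probability into [eps, 1 - eps]
        p = min(max(probs[k], eps), 1 - eps)
        # Simplified one-hot form: L_i = -log(y_hat[i][k])
        losses.append(-math.log(p))
    # Averaging over the batch is an assumption of this sketch
    return sum(losses) / len(losses)


# Three example softmax outputs; the first row matches the sample used above
softmax_outputs = [
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08],
]
# Correct class index for each sample (sparse labels instead of one-hot vectors)
class_targets = [0, 1, 1]

mean_loss = categorical_cross_entropy(softmax_outputs, class_targets)
print(mean_loss)
```

Passing class indices rather than one-hot vectors uses the simplified form directly: only the predicted probability at index `k` is read for each sample, so the zero-weighted terms are never computed.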
