## Symbolic differentiation with the Python package ``sympy``

We are going to verify the formulas in the notes
[*Softmax and categorical cross entropy*](https://github.com/schneider128k/machine_learning_course/blob/master/slides/softmax.pdf) using the Python package ``sympy``.

``sympy`` supports symbolic differentiation. See [differentiation with ``sympy``](https://scipy-lectures.org/packages/sympy.html#differentiation).

In [0]:
import sympy as sym

## Weighted inputs $z_1, z_2, z_3$

In [0]:
z1 = sym.Symbol('z1')
z2 = sym.Symbol('z2')
z3 = sym.Symbol('z3')


## Softmax activations $a_k$

The softmax activations $a_k$ for $m$ classes are given by

$$ a_k = \frac{e^{z_k}}{\sum_{j=1}^m e^{z_j}}. $$

We use $m = 3$ throughout.

In [0]:
a1 = sym.exp(z1) / (sym.exp(z1) + sym.exp(z2) + sym.exp(z3))
a2 = sym.exp(z2) / (sym.exp(z1) + sym.exp(z2) + sym.exp(z3))
a3 = sym.exp(z3) / (sym.exp(z1) + sym.exp(z2) + sym.exp(z3))

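As a quick sanity check, the three activations should sum to $1$. The sketch below rebuilds the symbols so that it runs on its own; the names `denom` and `total` are introduced here for illustration only.

```python
import sympy as sym

# Rebuild the weighted inputs and the softmax activations (same definitions as above).
z1, z2, z3 = sym.symbols('z1 z2 z3')
denom = sym.exp(z1) + sym.exp(z2) + sym.exp(z3)
a1 = sym.exp(z1) / denom
a2 = sym.exp(z2) / denom
a3 = sym.exp(z3) / denom

# The activations share one denominator, so their sum should simplify to 1.
total = sym.simplify(a1 + a2 + a3)
print(total)
```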

## Partial derivatives of $a_k$ with respect to $z_j$

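We want to verify formula (2) for the partial derivatives of the activations $a_k$ with respect to the weighted inputs $z_j$, where $\delta_{jk}$ is the Kronecker delta ($1$ if $j = k$ and $0$ otherwise):

$${\partial a_k \over \partial z_j} = a_k \cdot (\delta_{jk} - a_j)$$

The cells below check the case $k = 1$ for $j = 1, 2, 3$ by subtracting the symbolic derivative from the right-hand side; each difference should simplify to $0$.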

In [0]:
sym.simplify(a1 * (1 - a1) - sym.diff(a1, z1))

0

In [0]:
sym.simplify(a1 * -a2 - sym.diff(a1, z2))

0

In [0]:
sym.simplify(a1 * -a3 - sym.diff(a1, z3))

0

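The three checks above cover only $a_1$. A minimal sketch that loops over all nine pairs $(k, j)$ could look like the following; it assumes the same three-class setup, uses `sym.KroneckerDelta` for $\delta_{jk}$, and the list names `z`, `a`, and `residuals` are illustrative choices rather than part of the notes.

```python
import sympy as sym

# Weighted inputs z1, z2, z3 as a tuple, and the softmax activations as a list.
z = sym.symbols('z1:4')
denom = sum(sym.exp(zj) for zj in z)
a = [sym.exp(zk) / denom for zk in z]

# For every pair (k, j), subtract the symbolic derivative from formula (2).
# KroneckerDelta(j, k) evaluates to 1 when j == k and to 0 otherwise.
residuals = [
    sym.simplify(a[k] * (sym.KroneckerDelta(j, k) - a[j]) - sym.diff(a[k], z[j]))
    for k in range(3)
    for j in range(3)
]
print(residuals)
```

If formula (2) holds for every pair, each entry of `residuals` should be `0`.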
## Categorical cross entropy loss function $\mathcal{L}$

The categorical cross entropy loss for labels $y_k$ and activations $a_k$ is given by

$$ \mathcal{L} = -\sum_{k=1}^m y_k \log a_k. $$

In [0]:
# labels
y1 = sym.Symbol('y1')
y2 = sym.Symbol('y2')
y3 = sym.Symbol('y3')

# loss
L = - y1 * sym.log(a1) - y2 * sym.log(a2) - y3 * sym.log(a3) 

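To get a feel for the loss, it can be evaluated at a concrete point. The sketch below uses a hypothetical test point (all weighted inputs equal to $0$ and a one-hot label for class 1), chosen only for illustration. At that point every activation is $1/3$, so the loss should equal $\log 3$.

```python
import math
import sympy as sym

# Rebuild the symbols, activations, and loss so the example runs on its own.
z1, z2, z3 = sym.symbols('z1 z2 z3')
y1, y2, y3 = sym.symbols('y1 y2 y3')
denom = sym.exp(z1) + sym.exp(z2) + sym.exp(z3)
a1 = sym.exp(z1) / denom
a2 = sym.exp(z2) / denom
a3 = sym.exp(z3) / denom
L = - y1 * sym.log(a1) - y2 * sym.log(a2) - y3 * sym.log(a3)

# Hypothetical test point: equal weighted inputs and a one-hot label for class 1.
point = {z1: 0, z2: 0, z3: 0, y1: 1, y2: 0, y3: 0}
value = L.subs(point)

# Compare the symbolic result with log(3) computed numerically.
print(float(value), math.log(3))
```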
## Partial derivatives of the loss function $\mathcal{L}$ with respect to the activations $a_k$

These are given by formula (5):

$$ {\partial \mathcal{L} \over \partial a_k} = - \frac{y_k}{a_k}$$

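The derivative of $\mathcal{L}$ with respect to $a_k$ can be checked symbolically as well. The sketch below assumes the activations are treated as independent positive symbols rather than as functions of the $z_j$; the names `loss` and `residuals` are illustrative.

```python
import sympy as sym

# Labels and activations as independent symbols (an assumption of this sketch).
y = sym.symbols('y1:4')
a = sym.symbols('a1:4', positive=True)

# Cross entropy loss written directly in terms of the activations.
loss = -sum(yk * sym.log(ak) for yk, ak in zip(y, a))

# Subtract the claimed value -y_k / a_k from the derivative with respect to a_k.
residuals = [sym.simplify(sym.diff(loss, a[k]) + y[k] / a[k]) for k in range(3)]
print(residuals)
```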
## Partial derivatives of the loss function $\mathcal{L}$ with respect to the weighted inputs $z_j$

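We want to verify formula (8) for the partial derivative of the loss function $\mathcal{L}$ with respect to $z_j$:

$$ {\partial \mathcal{L} \over \partial z_j} = \sum_{k=1}^m y_k \cdot (a_j - \delta_{kj})$$

Each cell below subtracts the symbolic derivative from the right-hand side for one value of $j$; the difference should simplify to $0$.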

In [0]:
sym.simplify(y1 * (a1 - 1) + y2 * a1 + y3 * a1 - sym.diff(L, z1))

0

In [0]:
sym.simplify(y1 * a2 + y2 * (a2 - 1) + y3 * a2 - sym.diff(L, z2))

0

In [0]:
sym.simplify(y1 * a3 + y2 * a3 + y3 * (a3 - 1) - sym.diff(L, z3))

0
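
The derivative with respect to $z_j$ can also be assembled with the chain rule, $\partial \mathcal{L} / \partial z_j = \sum_k (\partial \mathcal{L} / \partial a_k) (\partial a_k / \partial z_j)$, from the expressions for $\partial \mathcal{L} / \partial a_k$ and $\partial a_k / \partial z_j$. The following is a minimal sketch under the same three-class setup; the names `chain` and `residuals` are illustrative and do not come from the notes.

```python
import sympy as sym

# Weighted inputs, labels, softmax activations, and the loss.
z = sym.symbols('z1:4')
y = sym.symbols('y1:4')
denom = sum(sym.exp(zj) for zj in z)
a = [sym.exp(zk) / denom for zk in z]
L = -sum(yk * sym.log(ak) for yk, ak in zip(y, a))

residuals = []
for j in range(3):
    # Chain rule: sum over k of (-y_k / a_k) * a_k * (delta_jk - a_j).
    chain = sum(
        (-y[k] / a[k]) * a[k] * (sym.KroneckerDelta(j, k) - a[j])
        for k in range(3)
    )
    # Compare with the derivative computed directly by sympy.
    residuals.append(sym.simplify(chain - sym.diff(L, z[j])))
print(residuals)
```

If the chain-rule expression matches the direct derivative, every entry of `residuals` should be `0`.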