In [1]:
%run Latex_macros.ipynb


In [2]:
# My standard magic! You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

# How a Neural Network toolkit works

TensorFlow is the toolkit of primitives that underlies Keras.

It is what powers training and computation in Neural Networks.

Although it might seem mysterious, it (and similar toolkits) is based on a very simple concept.

Consider the *training loop*
- The part of the Keras framework that implements `fit`
- It solves for the optimal weights $\W^*$ that minimize the Loss function
- Pre-Keras, the user coded this loop for each problem

It is nothing more than Gradient Descent.

- We process all the training examples once per epoch
- The epoch is divided into *mini-batches*: disjoint subsets of training examples
- The estimate of the weights is updated after each mini-batch
- We do this for many epochs, until the Loss function no longer decreases

Each mini-batch is processed in two phases
- A Forward Pass in which inputs are mapped into predictions, for each example in the mini-batch
    - An Average Loss is computed over all examples in the mini-batch
- A Backward Pass in which gradients of the Average Loss are computed
    - And used to update the weights
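
The loop described above can be sketched in plain Python. This is only an illustration, not the Keras implementation of `fit`: the one-weight linear model `w * x`, the squared-error loss, and the names `train`, `learning_rate` and `batch_size` are all assumptions made for the example. For brevity it runs a fixed number of epochs rather than stopping when the Loss no longer decreases.

```python
import random


def train(xs, ys, n_epochs=100, batch_size=8, learning_rate=0.1, seed=0):
    """Mini-batch Gradient Descent for the (assumed) model y_hat = w * x."""
    rng = random.Random(seed)
    w = 0.0                              # initial estimate of the weight
    history = []                         # average loss per epoch
    indices = list(range(len(xs)))

    for epoch in range(n_epochs):
        rng.shuffle(indices)             # new disjoint mini-batches each epoch
        batch_losses = []
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]

            # Forward Pass: predictions, then Average Loss over the mini-batch
            errors = [w * xs[i] - ys[i] for i in batch]
            loss = sum(e * e for e in errors) / len(batch)

            # Backward Pass: gradient of the Average Loss with respect to w
            grad_w = sum(2.0 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)

            # Gradient Descent update, once per mini-batch
            w -= learning_rate * grad_w
            batch_losses.append(loss)
        history.append(sum(batch_losses) / len(batch_losses))
    return w, history


# Usage: recover a known weight from noiseless synthetic data
xs = [i / 10.0 for i in range(-20, 21)]
ys = [3.0 * x for x in xs]
w_star, history = train(xs, ys)
```

The returned `w_star` should be close to the true weight of 3, and `history` records how the average loss falls from one epoch to the next.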

# The Forward and Backward API

There is a clever "trick" that facilitates
- Computation of predictions (Forward Pass)
- Computation of analytical derivatives (Backward Pass)

**Each atomic operation is implemented by an Object-Oriented Class**

The class implements methods
- `forward` for the Forward Pass
- `backward` for the Backward Pass

This trick is repeated many times, for each atomic operation.

That's all there is to it: consistent application of a simple trick!

Let's illustrate using the Multiplication operation.

# Inside the Forward Pass

The essential part of the Forward Pass is computing layer $\ll$'s output $\y_\llp$ from
the layer's input $\y_{(\ll-1)}$ and the layer's weights $\W_{(\ll)}$.

$$
\y_{(\ll)} = a_{(\ll)}( f_{(\ll)}( \y_{(\ll-1)}, \W_{(\ll)}) )
$$

For simplicity of presentation, we will temporarily assume that the activation $a_\llp$ is the identity
function.

(Without loss of generality, we can implement the activation as a separate layer that also obeys
the per-layer logic we are about to present.)

Consider the atomic operation of multiplication
`x * y`

We define a class `MultiplyLayer`
- derived from parent class `Layer`, which requires the `forward` and `backward` methods

The `forward` method implements the Forward Pass.

Not surprisingly
- The key statement is the one that multiplies the two inputs
- And returns the product

Just as you would expect.

But the method also saves the two multiplicands (`x` and `y`).

We will need them for the Backward Pass.
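
A minimal sketch of what such a class might look like follows. The exact shape of the parent class `Layer` is an assumption here (it simply raises `NotImplementedError` until a subclass supplies the method), and only the Forward Pass is filled in at this point.

```python
class Layer:
    """Assumed parent class: every operation must supply forward and backward."""

    def forward(self, *inputs):
        raise NotImplementedError

    def backward(self, dz):
        raise NotImplementedError


class MultiplyLayer(Layer):
    """The atomic operation x * y (Forward Pass only, for now)."""

    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Save the two multiplicands: the Backward Pass will need them
        self.x = x
        self.y = y

        # The key statement: multiply the two inputs and return the product
        z = x * y
        return z


layer = MultiplyLayer()
z = layer.forward(3.0, 4.0)   # 3 * 4 = 12
```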

# Inside the Backward Pass

The job of the Backward Pass is
- To take the Loss gradient $\loss'_\llp$ for the layer
- Compute the Local gradients
- Use them to compute the Loss gradient $\loss'_{(\ll-1)}$ that will "flow backwards" to the previous layer
- Obtain the derivative with respect to $\W_\llp$, the layer's weights, using the Loss and Local gradients

That is, the derivative of the Loss with respect to the layer's weights

$$
\frac{\partial \loss}{\partial \W_\llp}
$$

is computed via the Chain Rule as

$$
\begin{array}{lllll}
\frac{\partial \loss}{\partial \W_\llp} & = & \frac{\partial \loss}{\partial \y_\llp} \frac{\partial \y_\llp}{\partial \W_\llp} & = & \loss'_\llp \frac{\partial \y_\llp}{\partial \W_\llp}
\end{array}
$$

The `backward` method of `MultiplyLayer`
- Takes the loss gradient $\loss'_\llp = \frac{\partial \loss}{\partial \y_\llp}$ as input via formal parameter `dz`
  - Variable `dz` denotes $\frac{\partial \loss}{\partial z}$, the derivative of the loss with respect to `z`
    - Which is the loss gradient $\frac{\partial \loss}{\partial \y_\llp}$
    - Since variable `z` is the name for $\y_\llp$
 

- Computes the local gradients $\frac{\partial \y_\llp}{\partial \y_{(\ll-1)}}$
$$
\begin{array}{llll}
\frac{\partial \y_\llp}{\partial \y_{(\ll-1)}} & = & [ \frac{\partial \y_\llp}{\partial x},  \frac{\partial \y_\llp}{\partial y}] & \text{Since } \y_{(\ll-1)} = [x,y]\\
& = & [ \frac{\partial z}{\partial x},  \frac{\partial z}{\partial y}] & \text{Since } z = \y_\llp \\
& = & [ \frac{\partial (x*y)}{\partial x},  \frac{\partial (x*y)}{\partial y}] & \text{Since } z = x*y \\
& = & [ y,  x] & \text{Differentiating the product} \\
\end{array}
$$
    
- `local_grad_x, local_grad_y` are the variables that store the local gradients    

- Multiplies the loss gradient $\loss'_\llp$ (stored in variable `dz`)
- By the local gradients $\frac{\partial \y_\llp}{\partial \y_{(\ll-1)}}$ (stored in variables `local_grad_x, local_grad_y`)
- To compute the product which is $\loss'_{(\ll-1)}$
- Returned as `[ dx, dy ]`

Thus, the `backward` method flows the loss gradient "backwards" one layer
- And facilitates the computation of 
$$
\begin{array}{lllll}
\frac{\partial \loss}{\partial \W_\llp} & = & \frac{\partial \loss}{\partial \y_\llp} \frac{\partial \y_\llp}{\partial \W_\llp} & = & \loss'_\llp \frac{\partial \y_\llp}{\partial \W_\llp}
\end{array}
$$
- In the multiply layer, there are no weights $\W_\llp$, so only the gradient that flows backwards is needed

Now you can see why the `forward` method stored the multiplicands `x, y`
- They were needed as
   $[ y, x ] = [ \frac{\partial (x*y)}{\partial x},  \frac{\partial (x*y)}{\partial y}] $
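
Putting the two passes together, a `MultiplyLayer` along the lines described above might look like the sketch below. The variable names `dz`, `local_grad_x`, `local_grad_y`, `dx` and `dy` follow the text. Writing the parent class `Layer` as an abstract base class is an assumption: it is one way to "require" both methods.

```python
from abc import ABC, abstractmethod


class Layer(ABC):
    """Assumed parent class: subclasses must implement forward and backward."""

    @abstractmethod
    def forward(self, *inputs):
        ...

    @abstractmethod
    def backward(self, dz):
        ...


class MultiplyLayer(Layer):
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Save the multiplicands for the Backward Pass
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # Local gradients of z = x * y
        local_grad_x = self.y        # d(x*y)/dx = y
        local_grad_y = self.x        # d(x*y)/dy = x

        # Chain Rule: loss gradient times local gradient
        dx = dz * local_grad_x
        dy = dz * local_grad_y
        return [dx, dy]


layer = MultiplyLayer()
z = layer.forward(3.0, 4.0)          # z  = 3 * 4 = 12
dx, dy = layer.backward(2.0)         # dx = 2 * 4 = 8, dy = 2 * 3 = 6
```

With a loss gradient of `dz = 2`, the gradient that flows backwards is `[8, 6]`: each input receives `dz` scaled by the *other* input.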



# Conclusion

The whole basis of toolkits for Neural Networks is this simple Module API consisting of methods
- `forward`
- `backward`

Knowing this, you can implement *your own* operations if you ever find that necessary.

That is how more complex layers are implemented (e.g., Convolution).
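
As an illustration of writing your own operation, here is a hypothetical `SquareLayer` (not part of any toolkit) chained after a multiply layer to compute $(x*y)^2$. The gradient flows backwards through the layers in reverse order, and a finite-difference estimate is used as a sanity check on the result.

```python
class MultiplyLayer:
    def forward(self, x, y):
        self.x, self.y = x, y            # saved for the Backward Pass
        return x * y

    def backward(self, dz):
        return [dz * self.y, dz * self.x]


class SquareLayer:
    """Hypothetical custom operation: z = x ** 2."""

    def forward(self, x):
        self.x = x                       # saved for the Backward Pass
        return x * x

    def backward(self, dz):
        local_grad_x = 2.0 * self.x      # d(x^2)/dx = 2x
        return [dz * local_grad_x]


multiply, square = MultiplyLayer(), SquareLayer()
x, y = 3.0, 4.0

# Forward Pass, layer by layer
z = multiply.forward(x, y)               # 12
loss = square.forward(z)                 # 144

# Backward Pass, in reverse order; d(loss)/d(loss) = 1 starts the flow
(dz,) = square.backward(1.0)             # 2 * 12 = 24
dx, dy = multiply.backward(dz)           # 24 * 4 = 96, 24 * 3 = 72

# Sanity check: central finite difference for d(loss)/dx
eps = 1e-6
numeric_dx = (((x + eps) * y) ** 2 - ((x - eps) * y) ** 2) / (2 * eps)
```

The analytical `dx` and the numerical `numeric_dx` should agree closely, which is a useful check whenever you write a new `backward` method.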

Hopefully this dispels the notion that Neural Network toolkits are complicated.

In [5]:
print("Done")

Done
