%run ../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# The backpropagation algorithm

The backpropagation algorithm is at the core of how artificial neural networks manage to *learn*: it iteratively corrects the error between the predicted value and the actual value by propagating that error backwards through the network. The original paper proposing this now universally adopted mechanism is a 1986 *Nature* paper by Rumelhart, Hinton and Williams [[1]](#paper-original).

## What is it

The backpropagation algorithm is a brilliant way to train a network: each weight is iteratively perturbed by an amount proportional to the partial derivative of the cost function with respect to that weight, and these derivatives are computed by propagating them backwards through the network. The derivatives are what gradient descent needs in order to train the network, that is, to reduce the error between what gets predicted and what is actually observed.
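As a rough illustration of the idea, here is a minimal sketch of backpropagation on a tiny network. Everything specific in it is an assumption made for the example rather than something taken from the paper's experiments: the network size (3 inputs, 2 hidden sigmoid neurons, 1 output), the squared-error cost, the learning rate, the single training example, and all function and variable names.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(w_hidden, w_out, inputs):
    """Forward pass: return the hidden and output activations."""
    hidden = sigmoid(inputs @ w_hidden)
    output = sigmoid(hidden @ w_out)
    return hidden, output


def cost(w_hidden, w_out, inputs, target):
    """Squared-error cost for one example (an assumption of this sketch)."""
    _, output = forward(w_hidden, w_out, inputs)
    return 0.5 * np.sum((output - target) ** 2)


def backprop(w_hidden, w_out, inputs, target):
    """Partial derivatives of the cost with respect to every weight."""
    hidden, output = forward(w_hidden, w_out, inputs)
    # Derivative of the cost w.r.t. the total input of the output neuron
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = np.outer(hidden, delta_out)
    # Propagate the derivative backwards to the hidden layer
    delta_hidden = (w_out @ delta_out) * hidden * (1.0 - hidden)
    grad_hidden = np.outer(inputs, delta_hidden)
    return grad_hidden, grad_out


rng = np.random.default_rng(0)
inputs = np.array([0.5, -1.0, 1.0])  # made-up example; the last 1.0 acts as a bias input
target = np.array([1.0])             # made-up desired output
w_hidden = rng.normal(size=(3, 2))
w_out = rng.normal(size=(2, 1))

learning_rate = 0.5                  # arbitrary choice for the sketch
cost_before = cost(w_hidden, w_out, inputs, target)
for _ in range(100):
    grad_hidden, grad_out = backprop(w_hidden, w_out, inputs, target)
    # Perturb each weight proportionally to its partial derivative
    w_hidden -= learning_rate * grad_hidden
    w_out -= learning_rate * grad_out
cost_after = cost(w_hidden, w_out, inputs, target)

print(cost_before, cost_after)
```

The two `delta` lines are where the backward propagation happens: the derivative at the output layer is reused, through the weights `w_out`, to obtain the derivative at the hidden layer, so no derivative has to be recomputed from scratch.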

The idea per se is simple, but an efficient implementation is hard, and it took some research to figure out a good mechanism for it. That mechanism arrived with the 1986 paper by Rumelhart and co-authors.

The reason why backpropagation is the core of the learning procedure of neural networks is that, by adjusting the weights through small, repeated kicks, the hidden layers of the network come to *learn* features. While what happens at the input and output layers is controllable, it is the hidden layer(s) that do all the painstaking work of representing the features of the input data. If a network had no hidden layer, it would be easy to change the weights in such a way that the output matches the expected real output, but the network wouldn't be learning a representation and wouldn't do anything worthy of excitement. It is via backpropagation that the network can learn, in its hidden neurons, how to represent the data.

## The procedure in detail


The notes here follow the original paper cited above and refer to a [feedforward network](anns.ipynb#Feedforward-networks) of [sigmoid neurons](types/sigmoid-neuron.ipynb). The backpropagation procedure applies to any differentiable activation function, but the sigmoid makes for very nice calculations.
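To make the "nice calculations" concrete, here is a short worked derivation, writing $x$ for the total input of a sigmoid neuron and $y$ for its output (notation chosen for this note). The derivative of the output can be expressed through the output itself:

$$
y = \frac{1}{1 + e^{-x}}
\quad \Rightarrow \quad
\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = y (1 - y)
$$

This means that once the forward pass has computed $y$, the derivative needed in the backward pass comes at almost no extra cost.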

Let's consider the transmission of information from a neuron $i$ to a neuron $j$, as per figure, where the weight of the connection is given by $w_{ij}$. The output $y_i$ of $i$ arrives as input to $j$ weighted as $y_i w_{ij}$. Summing over all neurons $i$ in the same layer that feed into $j$ (note that we are factoring the bias in as a further component of the weights vector as per [this note](types/sigmoid-neuron.ipynb#A-note-on-notation)), the total input to $j$ is

$$
x_j = \sum_i y_i w_{ij}
$$
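The sum above can be sketched in a few lines of NumPy. The numbers are made up for illustration, and folding the bias in as an extra weight paired with a constant input of 1 is an assumption about the convention in the linked note.

```python
import numpy as np

# Outputs y_i of the three neurons feeding neuron j (made-up values)
y = np.array([0.2, 0.7, 0.1])

# Bias folded in: append a constant 1 to the outputs (assumed convention),
# so that the last weight plays the role of the bias of neuron j
y_with_bias = np.append(y, 1.0)

# Weights w_ij of the connections into neuron j; the last entry is the bias
w_j = np.array([0.5, -1.0, 2.0, 0.3])

# Total input x_j = sum_i y_i * w_ij
x_j = np.dot(y_with_bias, w_j)

# Output of neuron j if it is a sigmoid neuron
y_j = 1.0 / (1.0 + np.exp(-x_j))

print(x_j, y_j)
```

For a whole layer, the same computation becomes a single matrix product between the vector of outputs and a weight matrix with one column per receiving neuron.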

\begin{equation}
E = F \cdot s 
\end{equation}

## References

1. <a name="paper-original"></a> D E Rumelhart, G E Hinton, R J Williams, [**Learning representations by back-propagating errors**](http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf), *Nature*, 323.6088, 1986