In [1]:
import numpy as np

# Backpropagation with a learning algorithm

The backpropagation algorithm computes the gradient of the cost function for a single training example, $C = C_x$. In practice, it is common to combine backpropagation with a learning algorithm such as stochastic gradient descent, in which the gradient is computed for many training examples and averaged. In particular, given a mini-batch of $m$ training examples, the following algorithm applies one gradient descent step based on that mini-batch:

1. Input a set of training examples.
2. For each training example $x$: set the corresponding input activation $a^{x,1}$, and perform the following steps:
   - Feedforward: For each $l = 2, 3, \ldots, L$ compute $z^{x,l} = w^l a^{x,l-1}+b^l$ and $a^{x,l} = \sigma(z^{x,l})$.
   - Output error $\delta^{x,L}$: Compute the vector $\delta^{x,L} = \nabla_a C_x \odot \sigma'(z^{x,L})$.
   - Backpropagate the error: For each $l = L-1, L-2, \ldots, 2$ compute $\delta^{x,l} = ((w^{l+1})^T \delta^{x,l+1}) \odot \sigma'(z^{x,l})$.
3. Gradient descent: For each $l = L, L-1, \ldots, 2$ update the weights according to the rule $w^l \rightarrow w^l-\frac{\eta}{m} \sum_x \delta^{x,l} (a^{x,l-1})^T$, and the biases according to the rule $b^l \rightarrow b^l-\frac{\eta}{m} \sum_x \delta^{x,l}$.
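
The three steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the notebook's final implementation: $\sigma$ is taken to be the sigmoid function, the cost is assumed to be quadratic (so that $\nabla_a C_x = a^{x,L} - y$), and the function names `sigmoid`, `sigmoid_prime`, `backprop` and `update_mini_batch` are hypothetical. Weights and biases are passed as lists with one entry per layer $l = 2, \ldots, L$.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function sigma.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid.
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(weights, biases, x, y):
    """Gradients of C_x for one example (x, y), assuming a quadratic cost."""
    # Feedforward: store every z^l and a^l.
    activation = x
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    # Output error: for the quadratic cost, nabla_a C_x = a^L - y.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta
    nabla_w[-1] = delta @ activations[-2].T

    # Backpropagate the error through the earlier layers.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_b, nabla_w

def update_mini_batch(weights, biases, mini_batch, eta):
    """One gradient descent step; mini_batch is a list of (x, y) pairs."""
    m = len(mini_batch)
    sum_b = [np.zeros_like(b) for b in biases]
    sum_w = [np.zeros_like(w) for w in weights]
    for x, y in mini_batch:
        nabla_b, nabla_w = backprop(weights, biases, x, y)
        sum_b = [sb + nb for sb, nb in zip(sum_b, nabla_b)]
        sum_w = [sw + nw for sw, nw in zip(sum_w, nabla_w)]
    # w -> w - (eta / m) * sum_x delta (a)^T, and likewise for b.
    new_weights = [w - (eta / m) * sw for w, sw in zip(weights, sum_w)]
    new_biases = [b - (eta / m) * sb for b, sb in zip(biases, sum_b)]
    return new_weights, new_biases
```

The update returns new lists rather than modifying its arguments, which keeps the sketch easy to test; a class-based version would more likely assign the results back to its own attributes.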

## Creating a network class for training a neural network with backpropagation and stochastic gradient descent 

In [18]:
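class Network:
    def __init__(self, sizes):
        """
        Parameters
        ----------
        sizes : list of int
            Number of neurons in each layer, input layer first.
            For example, [4, 5, 2] is a network with 4 inputs,
            5 hidden neurons and 2 outputs.
        """
        self.num_layers = len(sizes)
        self.sizes = sizes
        # One (y, 1) bias column vector per non-input layer.
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # One (y, x) weight matrix per pair of adjacent layers:
        # x neurons in the earlier layer, y in the later one.
        self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]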

        
net = Network([4,5,2])
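
The shapes chosen in `__init__` are what make $w^l a^{l-1} + b^l$ well defined: each weight matrix has one row per neuron in the layer it feeds into and one column per neuron in the layer it reads from. A minimal stand-alone check, assuming the same `sizes` list as above. The helper name `init_params` and the fixed seed are illustrative and not part of the class, and the sketch draws from `np.random.default_rng` rather than `np.random.randn` so that the run is reproducible:

```python
import numpy as np

def init_params(sizes, seed=0):
    """Hypothetical stand-alone counterpart of Network.__init__."""
    rng = np.random.default_rng(seed)
    # Bias column vector for every layer except the input layer.
    biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]
    # Weight matrix of shape (next layer size, previous layer size).
    weights = [rng.standard_normal((y, x))
               for x, y in zip(sizes[:-1], sizes[1:])]
    return weights, biases

weights, biases = init_params([4, 5, 2])
weight_shapes = [w.shape for w in weights]
bias_shapes = [b.shape for b in biases]
print(weight_shapes)  # [(5, 4), (2, 5)]
print(bias_shapes)    # [(5, 1), (2, 1)]
```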
        

In [15]:
print("The biases of the hidden layer ", net.biases[0])
print("The biases of the output layer ", net.biases[1])

The biases of the hidden layer  [[ 1.15494994]
 [-0.6007814 ]
 [-0.24239692]
 [ 1.78980074]
 [-0.3145079 ]]
The biases of the output layer  [[-0.74048213]
 [-0.71329663]]


In [16]:
print("The weights connecting the input layer with the hidden layer ", net.weights[0])
print("The weights connecting the first layer with the output layer  ", net.weights[1])

The weights connecting the input layer with the hidden layer  [[-0.5429775  -0.02178585  0.72909077  0.46251342]
 [ 1.24434343  0.73691073  2.19286778  0.71802016]
 [-0.13738598  1.67472041  0.19574636 -0.01059091]
 [-0.67570365  0.84736245 -1.80284305 -2.30078438]
 [ 0.76816213  1.85204346 -0.04290432  1.34077169]]
The weights connecting the first layer with the output layer   [[ 0.22725836 -0.22472271 -1.18352648  0.53570675  1.29350688]
 [-0.61104483 -0.52933924  0.04232861 -0.19318958  1.28618062]]
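
With a $5 \times 4$ and a $2 \times 5$ weight matrix as printed above, the feedforward step of the algorithm maps a 4-component input column vector to a 2-component output. The following sketch illustrates this under two assumptions that the notebook has not fixed at this point: $\sigma$ is the sigmoid function, and `feedforward` is a hypothetical free function rather than a method of `Network`. The parameters are regenerated with a seeded generator, so the values differ from the ones printed above.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function sigma.
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, a):
    """Apply a^l = sigma(w^l a^{l-1} + b^l) layer by layer."""
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 5, 2]
biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal((4, 1))   # one input as a column vector
output = feedforward(weights, biases, x)
print(output.shape)  # (2, 1)
```

Every entry of `output` lies strictly between 0 and 1 because the sigmoid is applied at the output layer as well.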
