In this notebook, we'll go through the neural network implementation code from Chapter 1 in much more detail. Along the way, we'll explain some potentially confusing Python commands and OOP constructs through plenty of mini-exercises.

We'll be doing object-oriented Python in this study group. This means we'll be making different classes with their own methods (~functions), which can be instantiated to make objects of that class. I've written up a quick review of the central OOP ideas here.

In [31]:
import numpy as np
import random

# Let's start by making the Network class
class Network(object):

    # The __init__ method is a sort of Python equivalent of a constructor:
    # it initialises the variables of a newly created object.
    # When we make a new Network object, we pass a value of 'sizes' as an argument.
    # For example, n1 = Network([2,5,1]) makes a new Network object with sizes set to [2,5,1], which means
    # we have 2 input neurons, 5 hidden-layer neurons and 1 output neuron.
    # So the sizes argument specifies the number of neurons in each layer of the network, and
    # different values of sizes give differently shaped networks.
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        # we randomly initialise the biases of the hidden-layer and output neurons only (hence sizes[1:])
        # np.random.randn generates numbers from a standard normal distribution (mean 0, variance 1)
        # This uses a list comprehension, a powerful tool in Python
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # here, we initialise the weights leading from the input to the hidden layer, and
        # from the hidden layer to the output. Each weight matrix has shape (y, x): one row per neuron in
        # the receiving layer and one column per neuron in the layer before it.
        # Play around with the indexing and zip command to get a feel for it
        self.weights = [np.random.randn(y, x) 
                        for x, y in zip(sizes[:-1], sizes[1:])]

In [23]:
# Some useful commands you could try out to help understand what is happening in the above code are here. 
# Just put print statements in front of output you want to see. Otherwise, only the last output will be displayed
# randn command
np.random.randn(1,1)
np.random.randn(5,1)
np.random.randn(5,3)

# list slicing
sizes = [2,5,1]
sizes[1:]
sizes[:-1]
zip(sizes[:-1], sizes[1:])

# list comprehensions
# reminder of some simpler uses
[x for x in range(10)]
[x**2 for x in range(10)]
[x**2 for x in range(10) if x%2==0]

# leading up to understanding the specific usage in the Network class definition
[x+y for x, y in zip(sizes[:-1], sizes[1:])]
[x*y for x, y in zip(sizes[:-1], sizes[1:])]
[x for x, y in zip(sizes[:-1], sizes[1:])]
[[x,y] for x, y in zip(sizes[:-1], sizes[1:])]
[np.random.randn(x) for x, y in zip(sizes[:-1], sizes[1:])]
[np.random.randn(x, y) for x, y in zip(sizes[:-1], sizes[1:])]
[np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

[7, 6]


[array([[ 1.56279143, -0.79528642],
        [-0.58178778,  1.32913706],
        [-0.84971337, -0.86955482],
        [-1.71414351,  0.61172744],
        [ 0.83496434, -0.16360056]]),
 array([[ 1.2995577 ,  0.34253338, -0.96438116, -0.5923    ,  0.86274076]])]

In [36]:
# Let's go ahead and make a new network
net1 = Network([2,5,1])
print net1.num_layers
print net1.biases
print net1.weights

3
[array([[ 0.1265344 ],
       [ 0.92402966],
       [-1.62969926],
       [ 1.62903212],
       [ 0.41219127]]), array([[ 0.83005241]])]
[array([[-0.59216698,  0.57756099],
       [-1.05164244,  1.77603307],
       [ 0.7664931 ,  0.05062822],
       [-1.40892278,  0.44292187],
       [-2.44369652,  1.38992025]]), array([[-1.75631984, -1.43950517, -0.39580496, -1.01950956, -0.03196487]])]


In [37]:
# Let's make the sigmoid function
# Refer to the textbook for an explanation of the function's shape (an S, a sort of smoothed step function)
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

In [47]:
# Check sigmoid function works as intended
print sigmoid(0)
sigmoid(1)
sigmoid(-1)
sigmoid(10)
sigmoid(-10)

0.5


4.5397868702434395e-05
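
The checks above pass single numbers to sigmoid, but inside the network it will receive whole column vectors. Because np.exp works elementwise, the same definition handles arrays unchanged. Here is a small sketch of that, where the input values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function, applied elementwise when z is an array."""
    return 1.0 / (1.0 + np.exp(-z))

# A column vector of weighted inputs, one row per neuron (illustrative values)
z = np.array([[-10.0], [0.0], [10.0]])
a = sigmoid(z)

print(a.shape)  # the output has the same shape as the input
print(a)
```

Each entry of the result is the sigmoid of the matching entry of z, so a large negative input lands near 0, an input of 0 gives exactly 0.5, and a large positive input lands near 1.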

In [115]:
# Next up, let's make the feedforward method of the Network class 
# we'll put this code into the correct place later, let's understand it first
# Each layer takes the activations a of the layer before it, computes the weighted input np.dot(w, a) + b,
# and applies the sigmoid function to that to get its own activations
# I got rid of the references to 'self' for now, for clarity, so biases and weights are read from global variables
def feedforward(a):
    for b, w in zip(biases, weights):
        print "The weights we're working with are:"; print w
        print "The biases we're working with are:"; print b

        a = sigmoid(np.dot(w, a)+b)
        print "We put the weighted input + bias into the sigmoid function, which for this layer gives activations of:",; print a; print "done printing a"
    return a
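
To see the shapes flow through the layers without the print statements, here is a self-contained sketch of the same loop. It differs from the cell above in passing biases and weights as arguments instead of reading globals, and the fixed seed and the input values are my own choices, used only to make the run repeatable:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, biases, weights):
    """Return the network output for the input column vector a."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

sizes = [2, 5, 1]
rng = np.random.RandomState(0)  # fixed seed, only so the run is repeatable
biases = [rng.randn(y, 1) for y in sizes[1:]]
weights = [rng.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

x = np.array([[0.5], [-1.0]])   # shape (2, 1): one row per input neuron
output = feedforward(x, biases, weights)
print(output.shape)
```

Note that the input is a column vector of shape (2, 1). A flat array of shape (2,) would not raise an error, but np.dot(w, a) would then have shape (5,) and adding the (5, 1) bias would broadcast it into a (5, 5) array, which is not what we want.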

In [116]:
# So we can see what's happening, let's play with numbers
biases = net1.biases
weights = net1.weights
# recall these are the biases of the 5 hidden-layer neurons and the 1 output neuron
biases
# these are the weights leading from each of the 2 neurons in the input to each of the 5 neurons in the hidden layer
# and the weights from the 5 hidden-layer neurons to the 1 output neuron
weights
# Recall that for each neuron (e.g. in the hidden layer), we need to multiply every incoming activation by its weight, sum these up and add a bias
# Then we put this value through the sigmoid function
# So, we need an efficient way of multiplying a lot of weights and a lot of inputs (activations) together
# We'll use the dot product for this, np.dot(weights, inputs)
# let's see what np.dot does
np.dot([1,1,1], [5,0,1])
np.dot([3,1,2], [0,0,0])
# looks good to me! We multiply element wise and add these values up
# we can immediately add a bias to the output
np.dot([3,3,3], [1,1,1]) + 1
# then we'll want to apply the sigmoid function
z = np.dot([3,3,3], [1,1,1]) + 1
sigmoid(z)
# that's quite a high input to the sigmoid function, let's try something more realistic
z = np.dot([-0.3,0.2,0.5], [0.4,0.2,-0.1]) + 0.5
sigmoid(z)

0.59145897843278006
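
The examples above compute the weighted input of one neuron at a time. With a weight matrix, a single np.dot call does this for every neuron in a layer at once. Here is a minimal sketch for a layer with 2 inputs and 3 neurons, using made-up round numbers so the arithmetic is easy to check by hand:

```python
import numpy as np

# Illustrative numbers for a layer with 2 inputs and 3 neurons
w = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])            # shape (3, 2): one row of weights per neuron
a = np.array([[1.0], [1.0]])          # shape (2, 1): activations from the previous layer
b = np.array([[1.0], [0.0], [-1.0]])  # shape (3, 1): one bias per neuron

# Row i of the result is np.dot(w[i], a) + b[i], i.e. the weighted input of neuron i
z = np.dot(w, a) + b
print(z)
```

The three rows come out as 1+2+1 = 4, 3+4+0 = 7 and 5+6-1 = 10, which are exactly the per-neuron dot products plus biases.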

In [117]:
# what on earth does zip(biases, weights) look like? 
# if you're also finding it hard to visualise all these different matrices, try:
import pandas as pd
pd.DataFrame([1,2])
pd.DataFrame([3,4])
pd.DataFrame(zip([1,2],[3,4]))
# now we can look at these more clearly:
pd.DataFrame(biases)
pd.DataFrame(weights)
pd.DataFrame(zip(biases,weights))
# finally, try to understand:
biases
weights
zip(biases, weights)
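
If the DataFrame views are still hard to read, another option is to look only at the shapes. The sketch below rebuilds biases and weights for sizes [2,5,1] the same way the Network class does and lists the shape of each pair that zip produces:

```python
import numpy as np

sizes = [2, 5, 1]
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

# zip pairs the i-th bias vector with the i-th weight matrix: one pair per layer
layer_shapes = [(b.shape, w.shape) for b, w in zip(biases, weights)]
for b_shape, w_shape in layer_shapes:
    print("{0} {1}".format(b_shape, w_shape))
```

So zip(biases, weights) gives two pairs: the hidden layer's (5, 1) biases with its (5, 2) weights, then the output layer's (1, 1) bias with its (1, 5) weights. This is why the feedforward loop can unpack each pair as b, w.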

In [None]:
# Hope that helps! Let's move on to the mini-batch stochastic gradient descent function
# We will have to take training data and the desired outputs and move the weights and bias vectors towards values 
# that get close to computing the desired output
# I am ignoring the test_data option for now. SGD is a method of the Network class, so it keeps 'self' as its
# first parameter, which it needs in order to call self.update_mini_batch below.
def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):

    n = len(training_data)
    # the xrange function (Python 2) is very similar to range (generates integers from 0 up to, but not including, its argument) but doesn't build a list up front
    # it only generates integers when they're needed, this is good when lists are large and the system is memory sensitive
    for j in xrange(epochs):
        # random.shuffle shuffles the training data in place
        random.shuffle(training_data)
        # let's create our mini batches of size mini_batch_size
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        # then, for each mini_batch, we update
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        print "Epoch {0} complete".format(j)

In [147]:
# first, let's try out the random.shuffle function
# notice the function doesn't return anything, but if you check back on x after shuffling, it's been shuffled
x = [1,2,3,4]
random.shuffle(x)
x
# now try in 2 dimensions like with our training input
x = [[1,2], [3,4], [5,6]]
random.shuffle(x)
x

[[5, 6], [1, 2], [3, 4]]
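
Two things are worth checking here: random.shuffle really returns None, and it only reorders the outer list, so each inner pair stays intact. A quick sketch using the same made-up data:

```python
import random

x = [[1, 2], [3, 4], [5, 6]]
result = random.shuffle(x)   # shuffles x in place

print(result)   # nothing is returned
print(x)        # the same pairs, possibly in a different order
```

This is why the SGD code calls random.shuffle(training_data) on its own line. Writing training_data = random.shuffle(training_data) would replace the data with None.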

In [160]:
# Next up, let's get to grips with how the mini batches are created
# let's make up some specific values for variables so we can follow them through
training_data = [[1,1], [1,2], [2,3], [3,4], [5,7], [7,8]]
n = len(training_data)
mini_batch_size = 2

# recall that range and xrange take a starting value, an end value (not included), and a step size
# so xrange(0, n, mini_batch_size), in our case xrange(0,6,2), yields the values 0, 2, 4;
# range(0,6,2) returns them as the list [0, 2, 4]
xrange(0, n, mini_batch_size)
range(0, n, mini_batch_size)

# these are the start indices at which we slice our shuffled training data set
training_data[0:0+mini_batch_size]
training_data[2:2+mini_batch_size]

# all together, let's now run:
# this divides up our training data set of 6 data points into mini-batches of 2 data points each
mini_batches = [training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]

mini_batches


[[[1, 1], [1, 2]], [[2, 3], [3, 4]], [[5, 7], [7, 8]]]
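
The example above divides evenly, so it's worth seeing what happens when it doesn't. In this sketch I've added a seventh made-up data point and used a mini-batch size of 3. It uses range so that it runs on either Python version:

```python
# 7 made-up data points, which do not divide evenly into batches of 3
training_data = [[1, 1], [1, 2], [2, 3], [3, 4], [5, 7], [7, 8], [9, 9]]
n = len(training_data)
mini_batch_size = 3

mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, n, mini_batch_size)]

print([len(batch) for batch in mini_batches])
```

Slicing past the end of a list is not an error in Python, so the last mini-batch simply holds the leftover data point. No data is dropped.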

In [None]:
# Great! You'll have noticed that in each pass through the training data (each 'epoch'), we call the
# update_mini_batch method once per mini-batch. Now let's take a look at what it does
# Intuitively, we want to gradually move the weights and biases towards the values that produce correct output
# The rate at which we move towards this optimum depends on the learning rate, eta

# As a method of the Network class, it takes 'self' as its first parameter
def update_mini_batch(self, mini_batch, eta):
    # 'nabla' is the name of the upside down triangle symbol we've been using to denote gradient vectors
    # np.zeros(b.shape) produces an array of zeros with the same shape as b, and likewise for each w
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]

    for x, y in mini_batch:
        # backprop gives the gradients for this single training example
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        # we add them to the running sums over the mini-batch
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # gradient descent step: move against the average gradient, scaled by the learning rate eta
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                    for b, nb in zip(self.biases, nabla_b)]
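
To make the update rule concrete, here is a tiny worked sketch for a single 1x2 weight matrix. We haven't looked at backprop yet, so the per-example gradients below are hypothetical numbers standing in for what it would return, and the value of eta is also just for illustration:

```python
import numpy as np

eta = 0.5                                  # learning rate (illustrative value)
w = np.array([[1.0, 2.0]])                 # a single 1x2 weight matrix

# Hypothetical per-example gradients, standing in for the output of backprop
example_grads = [np.array([[0.2, -0.4]]),
                 np.array([[0.6, 0.0]])]

nabla_w = np.zeros(w.shape)
for dnw in example_grads:
    nabla_w = nabla_w + dnw                # running sum over the mini-batch

# Step against the average gradient, scaled by eta
w_new = w - (eta / len(example_grads)) * nabla_w
print(w_new)
```

The summed gradient is [0.8, -0.4] and eta divided by the batch size is 0.25, so the weights move from [1.0, 2.0] to [0.8, 2.1]. One caution for the Python 2 code in this notebook: if eta is passed as an integer, eta/len(mini_batch) is integer division, so pass it as a float such as 3.0.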

In [184]:
# try out the zeros function
np.zeros(5)
np.zeros([2,5])
# try out the shape attribute (it's not a function, so no parentheses)
x = np.array([1,2,3])
x.shape
x = np.array([[1,2],[4,5]])
x.shape
x = np.array([[1,2,3],[4,5,6]])
x
x.shape
# trying it out on our data set
weights
weights[0].shape

(5, 2)