# Exploring Neural Networks
Welcome! I'm still moving forward in learning the important parts of data science, machine learning and artificial intelligence. This time I'll be exploring how simple [neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network) work by _designing one from scratch_.

This notebook will be exploring the mathematical theory behind neural networks in order to better understand how they work and why they might fail. I am a firm believer that you should learn the stuff you're working with on a theoretical level. That way, if there is any odd behaviour, you can confirm what might be causing it. I'm not a code-monkey: I'm a human who can think.

## Neural Networks
Let's start by defining what a neural network, and eventually an artificial neural network, is. A neural network is defined as, "a series of interconnected neurons whose activation defines a recognizable linear pathway." To take that out of nerd speak and return it back to human speak, a neural network is a brain! It is a bunch of neurons that fire depending on inputs to produce outputs.

An _artificial_ neural network is simply a digitized version of this. By creating a set of connected artificial neurons, it is possible to predict non-linear behaviour from lots of different input types. Although the principle is quite organic, we use massive amounts of mathematics to develop our neural networks. You'll get a taste of that as you progress throughout this notebook.

## Why Not Use A Package Like scikit-learn?
My background is in physics. We want to understand the deep relationships and theory behind everything in order to predict, debug and analyse results. Machine learning is a prime example of something that should be understood in theory. 

By understanding _how_ AI works, you can begin to predict downfalls in your training models. You might understand why two neurons keep getting the same weights. You can understand the significance of an outlying result. 

If I were to just use a machine learning package like scikit-learn, TensorFlow, Caffe or the like, I would never understand what I'm doing. Data would pass in and magic would pass out. That's not a good thing. So I've decided that I should implement a very basic neural network to get the proper understanding I require to move forward.

## Inspiration, Tutorials and Resources
Of course, I didn't learn this stuff by trial and error. I had to teach myself using a bunch of online resources, which I wouldn't want to go unacknowledged. The main resources I used were:
 - iamtrask's blog article on "[A Neural Network in 11 lines of Python](https://iamtrask.github.io/2015/07/12/basic-python-network/)"
 - Matt Mazur's fantastic article on the theory behind backpropagation, "[A Step by Step Backpropagation Example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)"
 - A short (25 minute) video on deep neural networks that is both interesting and informative by Brandon Rohrer, "[How Deep Neural Networks Work](https://www.youtube.com/watch?v=ILsA4nyG7I0)"

---
# The Problem To Solve
I've noticed that there is a "Hello World" problem for neural networks: the XOR gate! Its output isn't a linear function of its inputs (no single straight line separates the true cases from the false ones), so it is a nice and simple problem that still needs a hidden layer to solve. For anyone reading this who doesn't know what an XOR gate is, it is simply:

> XOR(A, B) = (A OR B) AND (NOT (A AND B)) 

I know that looks complicated, but it really isn't. Simply put, it returns true if either input is true _except_ when both are true. It's a great example to work with because of that last clause.
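If you want to convince yourself that the Boolean expression really does match that description, here's a quick truth table in plain Python. The function name `xor_formula` is just an illustrative name; it transcribes the formula above directly.

```python
from itertools import product

def xor_formula(a, b):
    """XOR written exactly as the Boolean expression above."""
    return (a or b) and not (a and b)

# Enumerate every combination of two Boolean inputs and print the result as 0/1.
for a, b in product([False, True], repeat=2):
    print(int(a), int(b), "->", int(xor_formula(a, b)))
```

Only the two rows where exactly one input is 1 produce a 1.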

If that isn't particularly clear, I've drawn a circuit diagram of the operation for you to look at below. Note that I forgot to include a NOT "bubble" on the first AND gate.

<div style='margin: 100px'>
![XOR Gate Circuit](imgs/neuralnetworkcircuit.png)
<p style='text-align: center'><i>Figure 1: The Circuit Diagram of an XOR Gate. Created in draw.io.</i></p>
</div>

# The Neural Network
The neural network I'll be designing will be super simple: 2 inputs, 2 hidden neurons in a single hidden layer and a single output neuron. That means that there will be 6 weighted synapses in total. 

There is a reason for this design. I'm treating each neuron as an "operation" in a Boolean algebra problem, so one neuron can calculate the OR and another can calculate the AND (with its NOT bubble). From my experience, I know the minimum basic circuitry needed to build an XOR gate: an OR gate and an AND gate working side by side, plus a final AND to combine them. Thus we will need at least two hidden neurons to compute the first two operations and a single output neuron to do the final AND operation.
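Here's a minimal sketch of that idea, using weights and biases I picked by hand purely for illustration. They are assumptions, not what training will produce. Large weights push each sigmoid output close to 0 or 1, so every neuron behaves almost like a logic gate: `h1` acts as an OR, `h2` as a NAND (the AND with its NOT bubble), and the output ANDs them together. Note that this sketch also gives the output neuron a bias, which the equations later in this notebook leave out.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical hand-picked weights and biases (NOT learned values).
W_hidden = np.array([[ 20.0,  20.0],   # h1 ~ I1 OR I2
                     [-20.0, -20.0]])  # h2 ~ NOT (I1 AND I2)
b_hidden = np.array([-10.0, 30.0])
w_out = np.array([20.0, 20.0])         # o1 ~ h1 AND h2
b_out = -30.0                          # assumed output-neuron bias

def forward(inputs):
    """Run one input pair through the 2-2-1 network."""
    hidden = sigmoid(W_hidden.dot(inputs) + b_hidden)
    return sigmoid(w_out.dot(hidden) + b_out)

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    output = forward(np.array(pair, dtype=float))
    print(pair, "->", f"{output:.4f}")
```

The outputs land very close to 0, 1, 1, 0, which shows that a 2-2-1 network is at least capable of representing XOR. Training's job is to find weights like these automatically.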

I've also decided to show a visualisation of the neural network I'll be designing. Each node and synapse has a label associated with it that I'll be using throughout this notebook.

<div style='margin: 100px'>
![Neural Network](imgs/neuralnetworkdiagram.png)
<p style='text-align: center'><i>Figure 2: The Neural Network. Created in draw.io.</i></p>
</div>

With this all out of the way we can get started!

---
# Part 1 - Background Theory
Get prepared to exercise your mind if you haven't done much ML yet. This section was incredibly hard for me to understand as I was teaching myself the content, so I expect that you'll struggle too!

## Derivatives
Although calculus seems like scary, otherworldly notation that only gods can understand, it is actually *really* simple to understand. I would honestly say it is one of the most intuitive concepts you'll ever come across in mathematics, and it is my favourite thing to teach. However, this is not a tutorial but an analysis, so I won't be teaching the topic in depth here. I will only talk about the "feel" of derivatives rather than how to actually compute them.

A derivative is the slope of a function. If the profile of a hill were a mathematical function, the derivative would be the gradient (steepness) of the hill at each point. So sharply changing functions will have large derivatives, while smooth, almost flat functions will have very small derivatives.

Two important concepts in calculus are maxima and minima. A **maximum**, either global or local, is a 'highest' point of a function. A **minimum** is the opposite: a lowest point of a function. For a smooth function, both of these points have a derivative equal to *exactly zero*. Again, think of a hill. At the top of the hill there is no slope (only slopes around you), so the derivative is equal to zero. At the bottom of a valley, the ground is momentarily flat before climbing up again, so the derivative is equal to zero there too.
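To see the "flat at the top of the hill" idea in numbers, here's a small sketch that estimates a derivative with a central finite difference, $f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$. The hill function and the step size `h` are arbitrary choices for illustration.

```python
def hill(x):
    """A made-up hill whose peak sits at x = 2."""
    return -(x - 2.0) ** 2 + 5.0

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The slope shrinks as we climb towards the peak and flips sign after it.
for x in [0.0, 1.0, 2.0, 3.0]:
    print(f"x = {x}: slope ~ {numerical_derivative(hill, x):.4f}")
```

The exact derivative here is $-2(x-2)$, so the estimates come out close to 4, 2, 0 and -2. The slope is zero exactly at the peak.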

If you are interested in learning more about calculus (and I strongly recommend it) then head over to [this tutorial on derivatives and basic calculus](https://www.mathsisfun.com/calculus/derivatives-introduction.html).

## Normalising Inputs With The Sigmoid Function
The first thing I want to introduce you to is the **Sigmoid Function**. This function takes any real number (i.e. anything from $-\infty$ to $+\infty$) and normalises it to a value between 0 and 1. The beauty of the function is that this range of 0 to 1 is *exactly* that of a probability, which is why it suits neurons that are either "firing" or "not firing".

The Sigmoid function is defined as:

$$ S(x) = \frac{1}{1+e^{-x}}. $$

One beautiful property of this function is its derivative. The gradient (i.e. derivative) can be written purely in terms of the value *S(x)*, without referring to the input *x* directly! That helps reduce computational overhead in our model: once a neuron's output *S(x)* has been computed on the forward pass, its gradient costs only one subtraction and one multiplication.

If you compute the derivative, you reach the following result:

$$ S'(x) = S(x)(1-S(x)). $$
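For anyone who wants to check that result, the derivation needs only the chain rule and a little rearranging:

$$
\begin{aligned}
S'(x) &= \frac{d}{dx}\left(1+e^{-x}\right)^{-1}
       = -\left(1+e^{-x}\right)^{-2}\cdot\left(-e^{-x}\right)
       = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\
      &= \frac{1}{1+e^{-x}}\cdot\frac{(1+e^{-x}) - 1}{1+e^{-x}}
       = \frac{1}{1+e^{-x}}\left(1-\frac{1}{1+e^{-x}}\right) \\
      &= S(x)\bigl(1-S(x)\bigr).
\end{aligned}
$$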

Keep this result in mind: we will come back to it later on.


## Linear Algebra
Linear Algebra is all about matrices and matrix mechanics. There are incredibly deep concepts like eigenvalues, eigenvectors, diagonalisation, etc. but we will only be focusing on the very basics. To understand what I'm doing you only need a very, *very* basic understanding of matrices.

I think of a matrix as a store of values or equations that are related to one another. An example of this is the rotation matrix. The following matrix rotates any vector (or collection of vectors, i.e. a matrix) by 90° counter-clockwise:

$$
\theta = 
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix}
$$

Why? Well, pick a vector! Let's say a vector pointing NE,

$$
\vec{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
$$

is rotated 90$^\circ$ CCW to point NW. Calling the rotated vector $\vec{x}'$, we get

$$
\vec{x}' = \theta\cdot\vec{x} =
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix}
\cdot
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} -1 \\ 1 \end{bmatrix},
$$

which points NW, as expected.

Of course, at this point you might be saying "Hey! How did you multiply those two matrices?" Matrices multiply using a special set of rules. Each entry of the result comes from multiplying a row of the left-hand matrix element-by-element with a column of the right-hand matrix and summing the products. For example, the top entry above is $0\cdot1 + (-1)\cdot1 = -1$. So a 2x2 matrix multiplied by a 2x1 vector outputs another 2x1 vector.

You don't really need to worry about any of this, since NumPy handles it for you with its `np.dot` function. This isn't a tutorial on matrix mechanics; I'm just explaining what each part is so you can follow my logic. If you are interested, however, you can visit this article on "[Intro to Matrices](http://www.purplemath.com/modules/matrices.htm)".
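Here's a minimal NumPy sketch of the rotation example. It computes the product once with `np.dot` and once by hand, row by column, so you can see the two agree.

```python
import numpy as np

# The 90-degree counter-clockwise rotation matrix from above.
theta = np.array([[0, -1],
                  [1,  0]])
x = np.array([1, 1])  # a vector pointing NE

rotated = theta.dot(x)  # equivalent to np.dot(theta, x) or theta @ x
print(rotated)

# The same product by hand: each output entry is one row of theta
# multiplied element-wise with x and summed.
by_hand = np.array([sum(theta[row, k] * x[k] for k in range(2)) for row in range(2)])
print(by_hand)
```

Both give the NW-pointing vector $(-1, 1)$.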

## Neural Networks As Matrices
You can express a neural network as a collection of matrices. Although this is almost impossible (currently) to visualise for multi-layer, multi-neuron neural networks, it is trivial to understand for our neural network. To compute the input values for the hidden layer neurons, we use the following equation:

$$
\begin{bmatrix}
h_1\\ 
h_2
\end{bmatrix}
=
\begin{bmatrix}
w_1 & w_3\\ 
w_2 & w_4
\end{bmatrix}
\cdot
\begin{bmatrix}
I_1\\ 
I_2
\end{bmatrix}
+
\begin{bmatrix}
b_1 \\
b_2
\end{bmatrix}
=
\begin{bmatrix}
w_1 I_1 + w_3 I_2 + b_1\\ 
w_2 I_1 + w_4 I_2 + b_2
\end{bmatrix}
$$

and for the output neuron, we do the following:

$$
o_1
=
\begin{bmatrix}
w_5 & w_6\\
\end{bmatrix}
\cdot
\begin{bmatrix}
h_1\\ 
h_2
\end{bmatrix}
=
w_5(w_1 I_1 + w_3 I_2 + b_1) + w_6(w_2 I_1 + w_4 I_2 + b_2)
$$

where each $w_i$ is the weight of synapse $i$, $I_i$ is the value of input neuron $i$ and $b_i$ is the bias of hidden neuron $i$. An astute reader might have noticed that I haven't normalised any of the outputs. Of course, applying the sigmoid to each neuron's value is a crucial step in the neural network, but for simplicity I've omitted it from the equations above.

Together, these equations let us forward-propagate the inputs through the network to get its predicted output.
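Here's a small NumPy sketch of that forward pass. The weights and biases are random placeholders, and the output neuron has no bias, matching the equation for $o_1$. The sketch checks that the matrix product gives the same numbers as the expanded, term-by-term expression. It then adds back the sigmoid normalisation that the equations leave out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w1, w2, w3, w4, w5, w6 = rng.uniform(-1, 1, size=6)  # placeholder weights
b1, b2 = rng.uniform(-1, 1, size=2)                   # placeholder biases

W_hidden = np.array([[w1, w3],
                     [w2, w4]])
b_hidden = np.array([b1, b2])
W_out = np.array([w5, w6])

I = np.array([1.0, 0.0])  # one example input (I1, I2)

# Matrix form: [h1, h2] = W_hidden . I + b
h = W_hidden.dot(I) + b_hidden
# Expanded form, written out exactly as in the equation above.
h_expanded = np.array([w1 * I[0] + w3 * I[1] + b1,
                       w2 * I[0] + w4 * I[1] + b2])
assert np.allclose(h, h_expanded)

# Without normalisation, as in the equations above...
o_raw = W_out.dot(h)
# ...and with the sigmoid applied at each layer, as the network actually uses it.
o = sigmoid(W_out.dot(sigmoid(h)))
print(o_raw, o)
```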

## Backpropagation
This is one of the most confusing topics when you start with machine learning, especially if you haven't had much exposure to mathematics. I'll try to reduce it to a really simple explanation, but don't count on it making much sense.

Backpropagation is an algorithm used in supervised machine learning to reduce error and make the model more accurate. It relies on [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) principles, which basically let the error be minimised by "falling" downhill into a solution. By computing how much the error changes when the weight of a certain synapse changes (the derivative of the error with respect to that weight), you can nudge every weight in the direction that reduces the error. Repeating this many times moves the network towards a good solution, although it can get stuck in a local minimum rather than the best one.
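To get a feel for the "falling" idea, here's gradient descent on a single made-up weight with a toy error function $E(w) = (w - 3)^2$, whose minimum sits at $w = 3$. The starting point, learning rate and number of steps are arbitrary choices for illustration.

```python
def error(w):
    """Toy error function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def error_gradient(w):
    """dE/dw for the toy error function."""
    return 2.0 * (w - 3.0)

w = -4.0              # arbitrary starting weight
learning_rate = 0.1   # size of each downhill step

for step in range(100):
    # Step against the slope, i.e. downhill on the error surface.
    w -= learning_rate * error_gradient(w)

print(w, error(w))
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8, so `w` ends up extremely close to 3. A real network's error surface has many weights and can have several valleys, which is where local minima come from.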

There is a lot of mathematics involved in backpropagation that you don't have to worry about. If you are interested, have a look at Wikipedia's entry on [Backpropagation](https://en.wikipedia.org/wiki/Backpropagation). You'll notice some of it in the final code, where I will explain it a bit more.
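For the curious, here's a minimal sketch of a single backpropagation step for the 2-2-1 network. It makes a few assumptions of its own for illustration: sigmoid activations on both layers, biases on the hidden neurons only, and a squared-error loss $E = \frac{1}{2}(o_1 - y)^2$. The gradients reuse the $S'(x) = S(x)(1-S(x))$ result from earlier. A finite-difference check confirms them by nudging each parameter slightly and measuring how the loss changes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(params, I, y):
    """Squared error E = 0.5 * (o1 - y)^2 for a single example."""
    W_hidden, b_hidden, w_out = params
    h = sigmoid(W_hidden.dot(I) + b_hidden)
    o = sigmoid(w_out.dot(h))
    return 0.5 * (o - y) ** 2

def backprop(params, I, y):
    """Return dE/d(parameter) for every weight and bias, via the chain rule."""
    W_hidden, b_hidden, w_out = params
    # Forward pass, keeping the activations for the backward pass.
    h = sigmoid(W_hidden.dot(I) + b_hidden)
    o = sigmoid(w_out.dot(h))
    # Output neuron: dE/do times do/dz, with S'(z) = o * (1 - o).
    delta_out = (o - y) * o * (1 - o)
    grad_w_out = delta_out * h                 # dE/dw5, dE/dw6
    # Hidden neurons: send the error back through w5 and w6.
    delta_hidden = delta_out * w_out * h * (1 - h)
    grad_W_hidden = np.outer(delta_hidden, I)  # dE/dw1 .. dE/dw4
    grad_b_hidden = delta_hidden               # dE/db1, dE/db2
    return [grad_W_hidden, grad_b_hidden, grad_w_out]

# Random placeholder parameters, laid out like the matrices in the text.
rng = np.random.default_rng(1)
params = [rng.uniform(-1, 1, (2, 2)), rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)]
I, y = np.array([0.0, 1.0]), 1.0  # one training example: XOR(0, 1) = 1

grads = backprop(params, I, y)

# Gradient check: nudge each parameter by +/- eps and compare the slopes.
eps = 1e-6
for p, g in zip(params, grads):
    for idx in np.ndindex(p.shape):
        original = p[idx]
        p[idx] = original + eps
        up = loss(params, I, y)
        p[idx] = original - eps
        down = loss(params, I, y)
        p[idx] = original
        assert abs((up - down) / (2 * eps) - g[idx]) < 1e-7

# One gradient-descent update with an arbitrary learning rate of 0.5.
before = loss(params, I, y)
params = [p - 0.5 * g for p, g in zip(params, grads)]
after = loss(params, I, y)
print(before, after)
```

Training is just this update repeated over many examples. The loss after the step should be a little lower than before it.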

---
# Part 2 - Python Implementation of Neural Network
Hopefully you have followed me up to this point. I know there is a lot to take in. Now I'll be writing some of the actual code that will be our neural network.

```python
# Import the packages required.
import numpy as np
import pandas as pd

# Initialise the seed to a fixed value, so the output of the model is always the same.
np.random.seed(0)
```

```python
# Define the Sigmoid function. np.exp already works element-wise on NumPy
# arrays, so this maps over a whole matrix without needing np.vectorize.
def sigmoid(x, derivative=False):
    s = 1.0 / (1.0 + np.exp(-x))
    if derivative:
        return s * (1 - s)  # S'(x) = S(x)(1 - S(x)), reusing S(x)
    return s

print("x = 10, S(x) =", sigmoid(10))
print("x = -50, S(x) =", sigmoid(-50))
print("x = 0, S(x) =", sigmoid(0))
print("x = 10, S'(x) =", sigmoid(10, derivative=True))
```

```
x = 10, S(x) = 0.999954602131
x = -50, S(x) = 1.92874984796e-22
x = 0, S(x) = 0.5
x = 10, S'(x) = 4.53958077359e-05
```


```python
# Define the synapse weight matrices, drawn uniformly from [-1, 1).
syn0 = 2*np.random.random((2, 2)) - 1  # Input > Hidden Layer 1
syn1 = 2*np.random.random((2, 1)) - 1  # Hidden Layer 1 > Output

print(syn0, '\n')
print(syn1)
```

```
[[ 0.5563135   0.7400243 ]
 [ 0.95723668  0.59831713]] 

[[-0.07704128]
 [ 0.56105835]]
```


# Part 3 - Creating Training & Validation Data
Creating the training data here is super easy. Since we are dealing with a simple mathematical XOR, and not any "real" data, we can create as much training data as we like.

I'm going to create 1100 data points. There are only four possible input pairs, so most of these points are duplicates, but I can truncate the set later if I need to.

```python
def xor(a, b):
    """The XOR gate for two inputs a & b."""
    return bool((a or b) and not(a and b))
```

```python
INPUT_NEURONS = 2
DATA_POINTS = 1100

# Draw random 0/1 inputs and split them into DATA_POINTS pairs.
x = np.random.randint(2, size=(INPUT_NEURONS*DATA_POINTS))
x = np.split(x, DATA_POINTS)

y = [xor(a, b) for a, b in x]
```

```python
print(x[:5])
print(y[:5])
```

```
[array([1, 0]), array([0, 1]), array([1, 1]), array([1, 1]), array([1, 1])]
[True, True, False, False, False]
```


The training data will be the first 1000 points, while the validation data will be the last 100 points.

```python
training_df = pd.DataFrame(data={'x': x[:1000], 'y': y[:1000]})
validation_df = pd.DataFrame(data={'x': x[1000:], 'y': y[1000:]})

print(training_df.head())
print(validation_df.head())
```

```
        x      y
0  [1, 0]   True
1  [0, 1]   True
2  [1, 1]  False
3  [1, 1]  False
4  [1, 1]  False
        x      y
0  [0, 1]   True
1  [1, 0]   True
2  [1, 0]   True
3  [0, 0]  False
4  [1, 0]   True
```


# Part 4 - Training The Model
Lorem ipsum dolor.

# Part 5 - Testing The Model
Lorem ipsum dolor.