# Simple Neural Network

https://iamtrask.github.io/2015/07/12/basic-python-network/

## The network

Import NumPy, a numerical computing library.

In [1]:
import numpy as np

The sigmoid function is defined as $s(x) = \frac{1}{1+e^{-x}}$. Its derivative is $s'(x) = s(x) (1 - s(x))$. The `deriv=True` branch does not call `sigmoid` again because, in actual usage, the argument passed in is already the computed sigmoid output $s(x)$, so `x * (1 - x)` evaluates $s(x) (1 - s(x))$ directly.

In [2]:
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
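
As a quick sanity check of that convention, the sketch below compares the slope computed from the sigmoid output against a central finite difference. The sample inputs `z` and the step size `h` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(x, deriv=False):
    # When deriv=True, x is assumed to already be a sigmoid output s(x).
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 2.0])       # raw inputs (illustrative values)
s = sigmoid(z)                       # forward pass: s(z)
analytic = sigmoid(s, deriv=True)    # slope computed from the output s(z)

# Central finite difference as an independent estimate of the slope at z.
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(analytic, numeric))  # True
```

Note that passing the raw input `z` with `deriv=True` would compute `z * (1 - z)`, which is not the sigmoid's slope; the function relies on the caller passing the output.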

Define the inputs. Each row is a data point that will be used to train the network.

`numpy.array` creates a vector (1-D array) when passed a list and a matrix (2-D array) when passed a list of lists.

In [8]:
x = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])

`numpy.array().T` returns the transpose: a matrix (2-D array) has its rows and columns swapped, while a 1-D array is returned unchanged.

The expected outputs, one per input row. The cell below is equivalent to
```
np.array([
    [0],
    [0],
    [1],
    [1]
])
```

In [4]:
y = np.array([
    [0, 0, 1, 1]
]).T
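
The effect of `.T` on shapes can be seen in a small sketch. The names `row`, `col`, and `flat` are illustrative and do not appear in the original notebook.

```python
import numpy as np

row = np.array([[0, 0, 1, 1]])   # shape (1, 4): one row, four columns
col = row.T                      # shape (4, 1): four rows, one column

flat = np.array([0, 0, 1, 1])    # shape (4,): one-dimensional, no rows/columns to swap

print(row.shape, col.shape, flat.T.shape)  # (1, 4) (4, 1) (4,)
```

This is why `y` is written as a list of lists before transposing: a flat list would stay one-dimensional and would not line up row by row with the 4x1 network output.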

Seed the random number generator with a fixed number so that every run produces the same results and progress can be compared between runs.

A PRNG (pseudorandom number generator) produces a sequence that looks random but is fully determined by its initial value, the seed. Different seeds give different sequences; the same seed always gives the same sequence. For example, if seeding with 1 and then calling the random function three times yields a, b, and c, then seeding with 1 again and calling it three times will yield a, b, and c again.

In [5]:
np.random.seed(1)
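
A minimal sketch of that determinism follows. It uses separate `np.random.RandomState` objects rather than the global generator, an assumption made here so that running the example does not disturb the notebook's own seeded state.

```python
import numpy as np

# Two independent generators seeded with the same value.
first = np.random.RandomState(1).random_sample(3)
second = np.random.RandomState(1).random_sample(3)

# A generator seeded with a different value (2 is an arbitrary choice).
other = np.random.RandomState(2).random_sample(3)

print(np.array_equal(first, second))  # True: same seed, same sequence
print(np.array_equal(first, other))   # False: different seed, different sequence
```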

Initialize the first layer's weights randomly with a mean of 0.

The mean is 0 because the expected value $E(x)$ is the probability-weighted average of all possible values. `np.random.random` yields values uniformly distributed in $[0, 1)$, so its expected value is 0.5. Multiplying by 2 and subtracting 1 maps the range to $[-1, 1)$ and gives $E(2x - 1) = 2 \cdot 0.5 - 1 = 0$. The shape `(3, 1)` gives one weight per input column.

In [6]:
syn0 = 2 * np.random.random((3, 1)) - 1
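
The zero-mean claim can be checked empirically with a large sample. This is only a sketch: the sample size, the seed, and the tolerance are arbitrary choices, and a local `RandomState` is used so the notebook's global seed is left alone.

```python
import numpy as np

rng = np.random.RandomState(0)                 # separate generator, illustrative seed
samples = 2 * rng.random_sample(100_000) - 1   # same transform as syn0, more draws

# All values fall in [-1, 1) and the sample mean sits close to 0.
print(samples.min() >= -1, samples.max() < 1)
print(abs(samples.mean()) < 0.02)
```

With only three weights, as in `syn0`, the sample mean can be far from 0; the expected value describes the distribution, not any single draw.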

The actual training takes place in the for loop.

The first layer (`layer0`) is just the inputs, in this case `x`.

The second layer (`layer1`) is derived by matrix-multiplying `layer0` and `syn0`, that is, the inputs and the weights. `layer0` is a 4x3 matrix and `syn0` is a 3x1 matrix, so this is a matrix-vector multiplication that produces a 4x1 result. In this case it is equivalent to:

```
[
    [ syn0[0][0]*layer0[0][0] + syn0[1][0]*layer0[0][1] + syn0[2][0]*layer0[0][2] ],
    [ syn0[0][0]*layer0[1][0] + syn0[1][0]*layer0[1][1] + syn0[2][0]*layer0[1][2] ],
    [ syn0[0][0]*layer0[2][0] + syn0[1][0]*layer0[2][1] + syn0[2][0]*layer0[2][2] ],
    [ syn0[0][0]*layer0[3][0] + syn0[1][0]*layer0[3][1] + syn0[2][0]*layer0[3][2] ],
]
```
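
The expansion above can be verified against `np.dot` directly. In this sketch the weight values are made up for illustration; the real `syn0` is random.

```python
import numpy as np

layer0 = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
syn0 = np.array([[0.5], [-0.25], [0.1]])   # illustrative weights, not the trained ones

# Spell out each row's weighted sum, mirroring the expansion above.
explicit = np.array([
    [sum(syn0[j][0] * layer0[i][j] for j in range(3))]
    for i in range(4)
])

product = np.dot(layer0, syn0)
print(product.shape)                   # (4, 1)
print(np.allclose(product, explicit))  # True
```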

This is fed into the sigmoid function to squash each value into the range between 0 and 1. `layer1_error` is then the difference between the expected value `y` and the sigmoid output `layer1`.

Next, the error is multiplied element-wise by the slope of the sigmoid function at the values in `layer1`, giving `layer1_delta`. The weight update is the matrix product of the transposed inputs and this delta.

The intuition behind the delta is that the sigmoid's slope is very low when its output is near 0 or 1 and highest when its output is near 0.5. Confident predictions therefore receive small updates, while uncertain ones receive larger updates.

The update is then added to the weights, at which point we rinse and repeat.
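
To make the slope intuition concrete, the sketch below applies the same error to three hypothetical sigmoid outputs. The values in `outputs` and the error of 0.1 are invented for illustration.

```python
import numpy as np

outputs = np.array([0.01, 0.5, 0.99])   # hypothetical sigmoid outputs
slopes = outputs * (1 - outputs)        # same formula as sigmoid(..., deriv=True)

error = 0.1                             # identical error for all three, for comparison
deltas = error * slopes

# The middle output (0.5) has the steepest slope, so it gets the largest delta.
print(slopes)
print(deltas)
```

The outputs near 0 and 1 end up with deltas roughly 25 times smaller than the output at 0.5, even though the error is the same.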

In [7]:
for i in range(10000):
    # layer 0 aka the inputs
    layer0 = x

    # layer 1
    layer1 = sigmoid(np.dot(layer0, syn0))

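    # error: how far each prediction is from its expected output
    layer1_error = y - layer1
    # scale the error by the sigmoid slope at each prediction
    layer1_delta = layer1_error * sigmoid(layer1, deriv=True)

    # adjust the weights
    syn0 += np.dot(layer0.T, layer1_delta)
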
print('Output after training: ')
print(layer1)

Output after training: 
[[ 0.00966449]
 [ 0.00786506]
 [ 0.99358898]
 [ 0.99211957]]
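
One way to watch the training progress, rather than only the final output, is to record the mean absolute error at intervals. The sketch below repeats the same loop with that bookkeeping added. The `errors` list and the 2000-iteration interval are additions for illustration, and a local `RandomState` stands in for the global seed.

```python
import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)   # x is assumed to be a sigmoid output
    return 1 / (1 + np.exp(-x))

x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

rng = np.random.RandomState(1)             # local generator; global state untouched
syn0 = 2 * rng.random_sample((3, 1)) - 1

errors = []                                # bookkeeping added for this sketch
for i in range(10000):
    layer1 = sigmoid(np.dot(x, syn0))
    layer1_error = y - layer1
    if i % 2000 == 0:
        errors.append(np.mean(np.abs(layer1_error)))
    syn0 += np.dot(x.T, layer1_error * sigmoid(layer1, deriv=True))

# Print the recorded error next to the iteration it was measured at.
for step, err in zip(range(0, 10000, 2000), errors):
    print(step, err)
```

The recorded error should shrink as the iterations go on, which matches the trained outputs above sitting close to the expected 0, 0, 1, 1.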
