In the Artificial Neural Network notebook we saw how we could use matrix operations to handle a whole layer of neurons at a time. We also used a few snippets of code to verify that the network actually did what we expected.

In this notebook we will start by looking at the matrix operations needed to evaluate a multi-layer artificial neural network (ANN). Then we will consider how to calculate how close the ANN's output is to what we want it to be, and finally we will look into how we could make it learn and reduce the error of its output.

In [138]:
# We are going to use the numpy library for vector/matrix/array operations, so we had better import it
import numpy as np
np.set_printoptions(precision=2)

Let's imagine we have a multi-layer ANN where the input comes from the left, from the input layer named `X`. Then we have a series of layers of neurons named `L1`-`Ln`, where all except the last layer `Ln` are hidden, meaning we cannot observe their output. We call the expected output from the last layer (`Ln`) `Y`, and the actual output `Yhat`. We need both `Y` and `Yhat` so we can calculate the error of the network when we try to improve it by updating the parameters (weights and biases).

```
X -> L1 -> L2 -> ... -> Li -> ... -> Ln-1 -> Ln -> Yhat
```

If we need to refer to the output of a hidden layer we use `H1` for the output of layer `L1` and so on. Each layer can contain multiple nodes (inputs or neurons). As we did in the last notebook, we want it to be easy to evaluate a full network layer at a time, but we would also like to evaluate multiple example inputs at a time without needing to loop. Let's see what we can come up with.

In [139]:
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
print("X = " + str(X))
W, B = L = ([[1, 1], [1, 1]], [0, -1])  # These parameters are meant to output OR from neuron 1 and AND from neuron 2
print("W = " + str(W))
print("B = " + str(B))
# Wouldn't it be dreamy if we could just calculate Z = X * W + B for all the example inputs in one go?
# And then just run it all in one go through an activation function f = step?
Z = np.dot(X, W) + B
print("Z = \n" + str(Z))

def step(z):
    if isinstance(z, (list, np.ndarray, tuple)):
        return [step(zi) for zi in z]
    else:
        return 1 if z > 0 else 0
    
Yhat = step(Z)
print("Yhat = " + str(Yhat))

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
W = [[1, 1], [1, 1]]
B = [0, -1]
Z = 
[[ 0 -1]
 [ 1  0]
 [ 1  0]
 [ 2  1]]
Yhat = [[0, 0], [1, 0], [1, 0], [1, 1]]


If you look really closely at the output (and think really hard) you might see that it matches a 2-layer network (an input layer plus one layer of neurons) with 2 nodes in each layer that outputs `OR` from the first neuron and `AND` from the second neuron. Each row of `Yhat` holds the two outputs for the corresponding row of `X`. It is kind of hard to see, though. We did, however, manage to calculate the outputs from one layer for all our example inputs in one go, so that is great. Let's look into the parameters in the form of weights and biases and see if it would be useful to rearrange them, and possibly the input data.
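
The pattern is easier to spot when each input is printed next to its two outputs. The following is a minimal sketch of that, assuming a vectorized step built with `np.where` in place of the recursive `step` above (the loop variable names are only for illustration):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # one example per row
W = np.array([[1, 1], [1, 1]])
B = np.array([0, -1])

Z = X.dot(W) + B              # B is broadcast across the rows (examples)
Yhat = np.where(Z > 0, 1, 0)  # vectorized step, an alternative to step()

# Print every input pair next to the output of both neurons
for (x1, x2), (y_or, y_and) in zip(X, Yhat):
    print(f"{x1} {x2} -> OR: {y_or}, AND: {y_and}")
```

Reading down the `OR` column gives 0, 1, 1, 1 and the `AND` column gives 0, 0, 0, 1.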

In [140]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # .T is numpy's transpose operation, swapping rows with columns in the matrix
print("X:\n" + str(X))

W, B = L = (np.array([[10, 10], [10, 10]]),
            np.array([[-5, -15]]).T)  # These parameters are meant to output OR from neuron 1 and AND from neuron 2

print("W:\n" + str(W))
print("B:\n" + str(B))

Z = W.dot(X) + B
print("Z:\n" + str(Z))

sigmoid = lambda x: 1/(1 + np.exp(-x))
f = sigmoid

Yhat = f(Z)
print("Yhat:\n" + str(Yhat))

X:
[[0 0 1 1]
 [0 1 0 1]]
W:
[[10 10]
 [10 10]]
B:
[[ -5]
 [-15]]
Z:
[[ -5   5   5  15]
 [-15  -5  -5   5]]
Yhat:
[[0.01 0.99 0.99 1.  ]
 [0.   0.01 0.01 0.99]]


As you can see above, we now have the input examples arranged with one example per column. We have the weights with one row per neuron. We have the biases with one neuron bias per row, matching up with the rows of weights. And finally the output matches the input, with one column per example. It is also formatted in a nicer way, with long rows instead of columns and numpy arrays everywhere. We also quietly switched to the sigmoid function (and scaled up the weights and biases so that its output lands close to 0 or 1), partly because we can apply it to all elements of a matrix at once, which was not as handy with the step function we used before. The sigmoid function is also more commonly used for neurons than the step function.
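
The shapes involved can be checked directly. The sketch below reuses the values from the cell above; the names `WX` and `Z_explicit` are assumptions introduced only to show that adding the bias column relies on numpy broadcasting it across all example columns.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # shape (2, 4): inputs x examples
W = np.array([[10, 10], [10, 10]])                # shape (2, 2): neurons x inputs
B = np.array([[-5, -15]]).T                       # shape (2, 1): one bias per neuron

WX = W.dot(X)  # shape (2, 4): neurons x examples
Z = WX + B     # the single bias column is broadcast across the 4 example columns

print(X.shape, W.shape, B.shape, Z.shape)

# Same result when the bias column is repeated explicitly for every example
Z_explicit = WX + np.tile(B, (1, X.shape[1]))
print(np.array_equal(Z, Z_explicit))
```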

Let's try to formalise these calculations into a network function that calculates the output of a multi-layer ANN, taking the layers as input.

In [141]:
def network(inputs, layers, activationFunction):
    activations = inputs
    for layer in layers:
        weights, bias = layer
        Z = weights.dot(activations) + bias
        activations = activationFunction(Z)
    return activations

The network function above uses verbose, descriptive names rather than mathematical ones. Depending on your preferences, you could also write it like this:

In [142]:
def G(X, L, f):
    A = X
    for li in L:
        W, B = li
        Z = W.dot(A) + B
        A = f(Z)
    return A
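
As a quick check that `G` handles more than one layer, here is a sketch that stacks the OR/AND layer from earlier with a second layer. The second layer's weights and bias are hand-picked assumptions for this illustration (roughly "OR and not AND"), so the two layers together should approximate XOR.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def G(X, L, f):
    A = X
    for W, B in L:
        A = f(W.dot(A) + B)
    return A

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # one example per column

# Layer 1: the OR/AND layer used earlier in this notebook
L1 = (np.array([[10, 10], [10, 10]]), np.array([[-5, -15]]).T)
# Layer 2: hand-picked for this example, fires when OR is on and AND is off
L2 = (np.array([[10, -20]]), np.array([[-5]]))

Yhat = G(X, [L1, L2], sigmoid)
print((Yhat > 0.5) * 1)
```

Thresholding the output at 0.5 gives 0, 1, 1, 0 for the four input columns.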

If we are interested in evaluating the performance of an ANN and improving it, it is useful to calculate the error or loss of the network, and potentially also the prediction rate if we use the network for classification.

In [143]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # All possible binary input combinations using an input layer with 2 inputs
Y = np.array([[0, 0, 0, 1]])  # Expected output, we want AND behaviour :-)

W = np.random.randn(1,2)
B = np.random.randn(1,1)
print("X Y W B:\n", X, Y, W, B)

Yhat = G(X, L = [(W,B)], f = sigmoid)  # Actual output
print("Yhat:", Yhat)

X Y W B:
 [[0 0 1 1]
 [0 1 0 1]] [[0 0 0 1]] [[-0.25 -1.62]] [[1.54]]
Yhat: [[0.82 0.48 0.78 0.42]]


When we have both our expected output and our actual output, we can calculate the difference between them to see how well the network matches what we need.

In [144]:
delta = Y - Yhat   # The element-wise difference
print("delta = Y - Yhat = ", delta)
loss = np.sum(delta ** 2)    # square it (to remove the sign etc) and sum
print("loss = ", loss)

delta = Y - Yhat =  [[-0.82 -0.48 -0.78  0.58]]
loss =  1.860642742045609
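
Squaring matters because raw differences can cancel each other out. A small sketch with made-up values (the arrays below are assumptions chosen for illustration, not outputs of the network above):

```python
import numpy as np

Y = np.array([[0, 1]])     # expected output
Yhat = np.array([[1, 0]])  # a network that gets both examples completely wrong

delta = Y - Yhat
print("sum of differences:", np.sum(delta))               # the errors -1 and +1 cancel
print("sum of squared differences:", np.sum(delta ** 2))  # every error adds to the loss
```

Without the square, this completely wrong output would look like a perfect one.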


If we want to compare different weights and biases, it is much easier to decide which is better from a single number than from an array of numbers. Let's try to generate another set of parameters (W and B) and compare.

In [151]:
W = np.random.randn(1,2)
B = np.random.randn(1,1)
print("X Y W B:\n", X, Y, W, B)

Yhat = G(X, L = [(W,B)], f = sigmoid)  # Actual output
print("Yhat:", Yhat)

delta = Y - Yhat   # The element-wise difference
print("delta = Y - Yhat = ", delta)
loss = np.sum(delta ** 2)    # square it (to remove the sign etc) and sum
print("loss = ", loss)

X Y W B:
 [[0 0 1 1]
 [0 1 0 1]] [[0 0 0 1]] [[-0.64 -0.31]] [[-0.56]]
Yhat: [[0.36 0.3  0.23 0.18]]
delta = Y - Yhat =  [[-0.36 -0.3  -0.23  0.82]]
loss =  0.9430779061131485


So, in theory a smaller loss means the network is closer to implementing what we want, but it would also be nice to see for how many of the examples it actually calculates the correct output. To see how closely our network matches the expected output `Y`, we will count values above 0.5 as 1 and everything else as 0. We will then calculate how large a share of the examples it gets right.
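
As a worked example of this thresholding, the sketch below uses the rounded `Yhat` printed for the first random network above together with the `AND` targets. The variable names `predicted` and `fraction_correct` are assumptions for illustration.

```python
import numpy as np

Y = np.array([[0, 0, 0, 1]])                 # AND targets
Yhat = np.array([[0.82, 0.48, 0.78, 0.42]])  # rounded output printed earlier

predicted = (Yhat > 0.5).astype(int)         # values above 0.5 count as 1, the rest as 0
fraction_correct = np.mean(predicted == Y)   # share of the examples that match

print(predicted)
print(fraction_correct * 100, "% correct")
```

Only the second example is classified correctly here, so this network gets a quarter of the examples right.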

To simplify life, we will now create helper functions to generate a network, evaluate a network, and calculate the loss and the share of correct outputs.

In [181]:
def generate(nodes):
    layers = []
    for i in range(len(nodes) - 1):
        W = np.random.randn(nodes[i + 1], nodes[i])
        B = np.random.randn(nodes[i + 1], 1)  # one bias per neuron in the layer
        layers.append((W,B))
    return layers
        
L = generate([2, 1])
print("Layers L = (W , B) = ", L)

Layers L = (W , B) =  [(array([[ 0.29, -0.17]]), array([[-0.77]]))]
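
For networks with more layers it helps to know which parameter shapes to expect. Below is a sketch using a hypothetical helper `layer_shapes` (not part of the notebook), assuming one row of weights and one bias per neuron, as in the OR/AND example earlier:

```python
def layer_shapes(nodes):
    """Return the (W shape, B shape) pair for each layer, assuming one bias per neuron."""
    return [((nodes[i + 1], nodes[i]), (nodes[i + 1], 1))
            for i in range(len(nodes) - 1)]

# 2 inputs, a hidden layer with 3 neurons, and 1 output neuron
for w_shape, b_shape in layer_shapes([2, 3, 1]):
    print("W:", w_shape, "B:", b_shape)
```

With `[2, 3, 1]` the hidden layer needs a 3x2 weight matrix and a 3x1 bias column, and the output layer a 1x3 weight matrix and a 1x1 bias.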


In [196]:
def calculate(X, L, f):
    A = X
    for li in L:
        W, B = li
        Z = W.dot(A) + B
        A = f(Z)
    return A

Yhat = calculate(X, L, sigmoid)
print("Yhat = ", Yhat)

Yhat =  [[0.63 0.77 0.77 0.87]]


In [197]:
def correctness(Yhat, Y):
    interpret = (Yhat > 0.5) * 1
    #print(interpret)
    correct = interpret == Y
    #print(correct)
    correct_count = sum(correct.flatten())  # Yes, flatten is a bit fishy, it turns a matrix into a vector though, removing one dimension
    return correct_count / len(Y.flatten())
    
result = correctness(Yhat, Y)
print(result * 100, "% correct")

75.0 % correct


In [198]:
def loss(Yhat, Y):
    return np.sum((Y - Yhat) ** 2)

print("loss = ", loss(Yhat, Y))
    

loss =  0.5197701759851555


With the ability to generate a network, calculate its output, and calculate the loss and correctness of that output, let's see if we can manage to generate a network that implements a simple `OR` between 2 inputs.

In [202]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T
Y = np.array([[0, 1, 1, 1]])

L = generate([2, 1])
Yhat = calculate(X, L, sigmoid)
l = loss(Yhat, Y)
score = correctness(Yhat, Y)
print("Network", L, "gets", score * 100, "% correct.")

Network [(array([[ 1.44, -0.43]]), array([[-0.44]]))] gets 75.0 % correct.


Could we use these functions to find a network that actually implements `OR`? Let's see if we can manage to generate one randomly.

In [215]:
for i in range(100):
    L = generate([2, 1])
    Yhat = calculate(X, L, sigmoid)
    score = correctness(Yhat, Y)
    print("score:", score)
    if score == 1:
        break
        
if score == 1:
    print("winner after", i, "tries: ", L)
    print((calculate(X, L, sigmoid) > 0.5) * 1)

score: 0.75
score: 0.75
score: 0.25
score: 0.25
score: 0.75
score: 0.75
score: 0.25
score: 0.5
score: 0.75
score: 0.25
score: 0.75
score: 0.25
score: 0.5
score: 0.75
score: 0.25
score: 0.75
score: 0.5
score: 0.75
score: 0.25
score: 0.25
score: 0.75
score: 0.25
score: 0.75
score: 0.75
score: 0.5
score: 0.5
score: 0.75
score: 0.75
score: 0.75
score: 0.75
score: 0.75
score: 0.5
score: 0.0
score: 0.0
score: 0.75
score: 0.25
score: 0.25
score: 0.0
score: 0.5
score: 1.0
winner after 39 tries:  [(array([[0.82, 0.8 ]]), array([[-0.02]]))]
[[0 1 1 1]]


It seems to work. However, it is a bit random, and we should probably have a helper function that parses the output of the network to tell whether it is closer to 0 or 1 (if those are the expected outputs).
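
Such a helper could look like the sketch below. The name `interpret` and the `threshold` parameter are assumptions for illustration, and the sample values are the rounded sigmoid outputs of the `OR` neuron printed earlier.

```python
import numpy as np

def interpret(Yhat, threshold=0.5):
    """Map each output to 1 if it is above the threshold, otherwise to 0."""
    return (np.asarray(Yhat) > threshold).astype(int)

Yhat = np.array([[0.01, 0.99, 0.99, 1.0]])  # rounded OR-neuron outputs from earlier
print(interpret(Yhat))
```

An output of exactly 0.5 is mapped to 0 here, which matches the `Yhat > 0.5` comparison used in `correctness`.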