## Simplest single hidden layer network using backprop

Original code from Andrew Trask

http://iamtrask.github.io/2015/07/12/basic-python-network/

Notebook version: Ravi Annaswamy March 30, 2017

In [109]:
import numpy as np
np.random.seed(1)

Let us create an input dataset with four rows (four cases) and three columns each

In [110]:
X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])
X

array([[0, 0, 1],
       [0, 1, 1],
       [1, 0, 1],
       [1, 1, 1]])

Corresponding outputs

In [111]:
y = np.array([[0,0,1,1]]).T

In [112]:
y

array([[0],
       [0],
       [1],
       [1]])

We will create a single-layer network with 3 input cells and one output cell, which means that we
need a 3x1 weight matrix.
We will use the numpy random generator and ask for 3 rows of 1 weight each.
We center the weights at 0 by scaling the generated values from the range 0 to 1 up to the range 0 to 2 and then subtracting 1, which puts them between -1 and 1.

In [113]:
W0 = 2*np.random.random((3,1))-1

In [114]:
W0

array([[-0.16595599],
       [ 0.44064899],
       [-0.99977125]])
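
As a quick sanity check of the centering described above, the following sketch draws many weights the same way and inspects their range and mean. The sample size and the separate `RandomState(0)` generator are assumptions made here, chosen so that the notebook's own seed is left untouched.

```python
import numpy as np

# Hypothetical separate generator, so np.random.seed(1) above is not disturbed
check_rng = np.random.RandomState(0)

# Same recipe as W0: uniform values in [0, 1), doubled, then shifted down by 1
samples = 2 * check_rng.random_sample((100000, 1)) - 1

# The values should lie in [-1, 1) with a mean close to 0
print(samples.min(), samples.max(), samples.mean())
```

The mean will not be exactly 0 for a finite sample, only close to it.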

Let us present the inputs to an input layer

In [115]:
L0 = X

The output is obtained by multiplying each input row by the weight matrix.

For instance, the first input case [0,0,1] is processed by these weights like this:

In [116]:
np.dot(X[0],W0)

array([-0.99977125])
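
To make the dot product less of a black box, here is the same computation written out term by term. This is only an illustrative sketch: the weights are copied by hand from the W0 printed above, and the names `x0`, `W`, `by_hand` and `by_dot` are introduced here for the example.

```python
import numpy as np

x0 = np.array([0, 0, 1])  # first input case
W = np.array([[-0.16595599],
              [ 0.44064899],
              [-0.99977125]])  # weights copied from the W0 printed above

# Each input is multiplied by its own weight and the products are summed
by_hand = x0[0] * W[0, 0] + x0[1] * W[1, 0] + x0[2] * W[2, 0]

# np.dot does the same thing in one call
by_dot = np.dot(x0, W)[0]

print(by_hand, by_dot)
```

Only the third input is on, so only the third weight reaches the output.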

We can process all the input rows at the same time! We save the output as layer 1, L1

In [117]:
L1 = np.dot(L0,W0); L1

array([[-0.99977125],
       [-0.55912226],
       [-1.16572724],
       [-0.72507825]])

This is cool because we can find our error against the four expected outputs stored in y

In [118]:
L1_error= y-L1; L1_error

array([[ 0.99977125],
       [ 0.55912226],
       [ 2.16572724],
       [ 1.72507825]])

For case 1, we expected the network to output 0, but since the third input was on and passed along
its weight, the output was -0.999. We need to bring the output up to 0, so the amount to change is 0.999.

In [119]:
L1_delta = L1_error * L1; L1_delta

array([[-0.99954255],
       [-0.31261771],
       [-2.52464724],
       [-1.25081673]])

This is the direction in which we have to adjust the weights.

In [120]:
W0 += np.dot(L0.T, L1_delta)

In [121]:
W0

array([[-3.94141996],
       [-1.12278545],
       [-6.08739548]])

Has this improved outputs?

In [122]:
L1 = np.dot(L0, W0)

In [123]:
L1

array([[ -6.08739548],
       [ -7.21018093],
       [-10.02881544],
       [-11.15160089]])

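No, it has not! The first two rows should be closer to 0 and the last two closer to 1, but all four outputs have moved further away from their targets.

Let us reset the weights and repeat the same update five times to see where this is heading.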



In [124]:
np.random.seed(1)
W0 = 2*np.random.random((3,1))-1
for i in range(5):
    L0 = X
    L1 = np.dot(L0, W0)            # forward pass
    L1_error = y - L1              # how far off each case is
    L1_delta = L1_error * L1       # naive, unscaled feedback
    W0 += np.dot(L0.T, L1_delta)   # weight update
    print(i, L1)

(0, array([[-0.99977125],
       [-0.55912226],
       [-1.16572724],
       [-0.72507825]]))
(1, array([[ -6.08739548],
       [ -7.21018093],
       [-10.02881544],
       [-11.15160089]]))
(2, array([[-341.24624596],
       [-529.86554361],
       [-591.30342369],
       [-779.92272134]]))
(3, array([[-1356838.15721241],
       [-2246843.64480525],
       [-2316378.63067319],
       [-3206384.11826603]]))
(4, array([[ -2.25358321e+13],
       [ -3.78650417e+13],
       [ -3.81823477e+13],
       [ -5.35115572e+13]]))


Ouch! What happened here?

I was expecting the outputs to converge to 0, 0, 1, 1, but instead all four numbers shot negative and kept growing in magnitude with every iteration!

In fact, if you run it for 20 iterations, you will see the values grow completely out of control!

# Enter the Sigmoid.


Very clever people with experience in feedback control systems worked out what is happening here.

If you feed back an unscaled signal, it acts as positive feedback and the values begin to shoot off!

One remedy is to use a multiplier like 0.01 to limit how much the weights change in each iteration.

That slows the runaway, but it does not remove the cause; it just postpones the out-of-control dance.
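
A small experiment makes the effect of such a multiplier easy to inspect. The sketch below reruns the naive update from earlier with a scaling factor; the name `lr`, its value of 0.01 and the choice of 5 iterations are assumptions made for illustration only.

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
W = 2 * np.random.random((3, 1)) - 1
lr = 0.01  # hypothetical multiplier that limits each weight change

first = np.dot(X, W)  # outputs before any update
for step in range(5):
    out = np.dot(X, W)
    delta = (y - out) * out          # same naive feedback as before
    W += lr * np.dot(X.T, delta)     # scaled-down weight update
last = np.dot(X, W)   # outputs after five scaled updates

print(first.ravel())
print(last.ravel())
```

Comparing the two printed rows against the targets 0, 0, 1, 1 shows which way the outputs are moving; try other values of `lr` and more iterations to see how the picture changes.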

What we really need is a way to keep the feedback signal bounded, always below one in magnitude.

This is done by squashing the outputs themselves into the range 0 to 1.

The sigmoid is a beautiful function that maps any value between negative infinity and positive infinity into the range (0, 1).

Remember, we need to limit both the output and the feedback amount to this range.

First the function itself.

In [125]:
def sigmoid(x):
    return 1/(1+np.exp(-x))
print(-20, sigmoid(-20))
print(100, sigmoid(100))

(-20, 2.0611536181902037e-09)
(100, 1.0)
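
A few more probes help to build intuition for the sigmoid. The sketch below is an illustration, with sample points picked arbitrarily here: it checks that the function works on whole arrays, that an input of 0 maps to exactly 0.5, and that the outputs stay strictly between 0 and 1 for moderate inputs.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])  # arbitrary sample inputs
s = sigmoid(z)                              # applied element by element

print(s)
print(sigmoid(0.0))      # the midpoint of the curve
print(s + sigmoid(-z))   # sigmoid(z) + sigmoid(-z) is 1 for every z
```

For very large inputs such as 100, floating point rounding makes the result print as exactly 1.0, as seen above.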


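Now, let us make the feedback proportional to the slope of the sigmoid at the output.
If s is a sigmoid output, that slope is s*(1-s): it is largest (0.25) when the output is 0.5, 
and it shrinks toward 0 as the output approaches 0 or 1, so confident outputs receive small corrections.
The function below expects a sigmoid output as its argument; the two test values only check the arithmetic.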

In [126]:
def gradient(x):
    return x*(1-x)
print(2, gradient(2))
print(-1, gradient(-1))

(2, -2)
(-1, -2)
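
To see that x*(1-x) really is the slope of the sigmoid when x is a sigmoid output, one can compare it with a numerical slope estimate. This is a sketch under assumptions made here: the test points `z` and the step size `h` are arbitrary, and the central difference is just one common way to approximate a derivative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gradient(s):
    # s is assumed to be a sigmoid output, not the raw input
    return s * (1 - s)

z = np.array([-2.0, 0.0, 3.0])  # arbitrary raw inputs
h = 1e-5                        # small step for the numerical estimate

numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = gradient(sigmoid(z))                        # slope from the output

print(numeric)
print(analytic)
```

The two rows should agree to many decimal places, with the largest slope at z = 0 where the output is 0.5.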


Let us redo the forward computation and error delta with this new scheme.

Since we have already updated W0 above, the first two lines reset it to the known initial state by reseeding.

In [127]:
np.random.seed(1)
W0 = 2*np.random.random((3,1))-1

L0 = X
L1 = sigmoid(np.dot(L0, W0))
print('L1: ',L1)

L1_error = y - L1
print('Error: ', L1_error)

L1_gradient = gradient(L1)
print('L1 gradient:', L1_gradient)

L1_delta = L1_error * L1_gradient
print('L1_delta: ', L1_delta)

('L1: ', array([[ 0.2689864 ],
       [ 0.36375058],
       [ 0.23762817],
       [ 0.3262757 ]]))
('Error: ', array([[-0.2689864 ],
       [-0.36375058],
       [ 0.76237183],
       [ 0.6737243 ]]))
('L1 gradient:', array([[ 0.19663272],
       [ 0.23143609],
       [ 0.18116102],
       [ 0.21981987]]))
('L1_delta: ', array([[-0.05289153],
       [-0.08418501],
       [ 0.13811206],
       [ 0.14809799]]))


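Now we are ready to update the weights, but by how much? We use the delta rule, which says that each weight update should depend on how large its input was and how large the error delta was. Makes sense: a weight whose input was 0 did not contribute to the output, so it should not change.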

In [128]:
np.dot(L0.T, L1_delta)

array([[ 0.28621005],
       [ 0.06391297],
       [ 0.14913351]])
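
The matrix product above hides a simple double loop. The sketch below spells it out: for every case and every input column, the input value is multiplied by that case's delta and added to the matching weight's update. The delta values are copied by hand from the output printed earlier, and the names `update`, `case` and `j` are introduced here for illustration.

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
L1_delta = np.array([[-0.05289153],
                     [-0.08418501],
                     [ 0.13811206],
                     [ 0.14809799]])  # copied from the L1_delta printed above

update = np.zeros((3, 1))
for case in range(X.shape[0]):       # each of the four input cases
    for j in range(X.shape[1]):      # each of the three inputs (weights)
        # an input of 0 contributes nothing to its weight's update
        update[j, 0] += X[case, j] * L1_delta[case, 0]

print(update)
print(np.dot(X.T, L1_delta))  # the same result in one call
```

The first weight only collects the deltas of cases 3 and 4, because its input is 0 in the first two cases.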

Let us do that.

In [129]:
W0 += np.dot(L0.T, L1_delta)
W0

array([[ 0.12025406],
       [ 0.50456196],
       [-0.85063774]])

Let us see if this will improve output next time.

In [130]:
L1 = sigmoid(np.dot(L0, W0))
L1

array([[ 0.29929909],
       [ 0.41433436],
       [ 0.32511054],
       [ 0.44378327]])

Let us do it over and over again.


In [131]:
np.random.seed(1)
W0 = 2*np.random.random((3,1))-1
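for i in range(10000):
    L0 = X
    L1 = sigmoid(np.dot(L0, W0))         # forward pass, squashed into (0, 1)
    L1_error = y - L1                    # how far off each case is
    L1_delta = L1_error * gradient(L1)   # error scaled by the sigmoid slope
    W0 += np.dot(L0.T, L1_delta)         # delta-rule weight update
print(i, L1)                             # outputs from the final iteration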

(9999, array([[ 0.00966449],
       [ 0.00786506],
       [ 0.99358898],
       [ 0.99211957]]))


Wow! Success!
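
For reuse, the whole loop can be wrapped in a function. The sketch below is one possible packaging, not part of the original code: the helper name `train` and its `iterations` and `seed` parameters are hypothetical, while the update itself is the same as in the loop above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gradient(s):
    return s * (1 - s)  # sigmoid slope, given a sigmoid output s

def train(X, y, iterations=10000, seed=1):
    """Hypothetical helper: trains one 3x1 weight layer with the loop above."""
    np.random.seed(seed)
    W = 2 * np.random.random((X.shape[1], 1)) - 1
    for _ in range(iterations):
        out = sigmoid(np.dot(X, W))
        W += np.dot(X.T, (y - out) * gradient(out))
    return W

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

W = train(X, y)
pred = sigmoid(np.dot(X, W))   # outputs with the trained weights
print(np.round(pred).ravel())  # rounded to the nearest class
```

Rounding the outputs turns the near-0 and near-1 values into clean class labels that can be compared directly with y.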