In [0]:
# LAMBDA SCHOOL
#
# MACHINE LEARNING
#
# MIT LICENSE

import numpy as np

In this challenge we build the single-hidden-layer neural network that we discussed yesterday. Like yesterday, it should have two numbers (call them x1 and x2) in the input layer, two (y1 and y2) in the hidden layer, and one output (z). It also has a total of nine weights. Call them a, b, c, d, e, f, g, h, i so that

y1 = sigma(a*x1 + b*x2 + c)

and

y2 = sigma(d*x1 + e*x2 + f)

and

z = sigma(g*y1 + h*y2 + i)

Make sure you understand why this corresponds to the neural net diagram you saw yesterday! And notice that the output is no longer rounded. The reasons for this are easier to explain face to face, but it has to do with the fact that there are two stages to using a machine learning model: first is the *testing* phase, where you try it on the data you already have answers for. Once you've optimized the model, the *application* phase follows, where you use it to predict the outputs for novel inputs. The short version of our story is that we round at the last step during the application phase, but not during the testing phase. If this seems silly, don't let it stress you out too much for now.

The first thing to do is to initialize the values of a through i randomly, assembling them into the numpy vector [a, b, c, d, e, f, g, h, i]. Use numpy's random function with 42 as your seed (if you don't know what this means, get to Googling). So that your results match your classmates', use the usual numpy random vector method described here:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.rand.html
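If seeding is new to you, here is a minimal sketch of what it does: resetting the seed to the same value makes the generator produce the same draws again. The variable names `first` and `second` are just for illustration. Note that `np.random.rand` draws uniformly from [0, 1), while the solution cell below uses `np.random.normal` instead.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(9)    # nine uniform draws in [0, 1)

np.random.seed(42)           # reset the generator to the same state
second = np.random.rand(9)   # the same nine draws again

print(np.array_equal(first, second))  # True
```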

In [0]:
np.random.seed(42)
weights = np.random.normal(scale=0.1, size=9)

Great. Now define a function NN(x1, x2) that takes in x1 and x2 and spits out z as described above. You'll probably find it helps to define the sigmoid function first.

In [0]:
def sigma(x):
    return 1 / (1 + np.exp(-x))

def NN(x1, x2):
    y1 = sigma(weights[0]*x1 + weights[1]*x2 + weights[2])
    y2 = sigma(weights[3]*x1 + weights[4]*x2 + weights[5])
    
    z = sigma(weights[6]*y1 + weights[7]*y2 + weights[8])
    return z
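One way to see why these three equations match the diagram is to group the nine weights by layer. The sketch below is not part of the original exercise: the function `forward` and the names `W1`, `b1`, `w2`, `b2` are assumptions made for illustration, and the weights are passed in explicitly rather than read from the global `weights`.

```python
import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))

def forward(w, x1, x2):
    """Forward pass for w = [a, b, c, d, e, f, g, h, i] in matrix form."""
    a, b, c, d, e, f, g, h, i = w
    W1 = np.array([[a, b],
                   [d, e]])      # hidden-layer weights, one row per hidden node
    b1 = np.array([c, f])        # hidden-layer biases
    w2 = np.array([g, h])        # output-layer weights
    b2 = i                       # output-layer bias

    y = sigma(W1 @ np.array([x1, x2]) + b1)   # [y1, y2]
    return sigma(w2 @ y + b2)                 # z

# With all weights at zero every sigmoid sees 0, so the output is 0.5.
print(forward(np.zeros(9), 0, 1))  # 0.5
```

Each row of `W1` holds the weights on the arrows entering one hidden node, which is the same bookkeeping as the scalar formulas for y1 and y2.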

Now think back to the SSE we used to measure how well a linear model performed. We can do the same thing here! We're trying to model XOR, so our set of desired inputs and outputs is:

NN(0,0) == 0

NN(0,1) == 1

NN(1,0) == 1

NN(1,1) == 0

To evaluate the SSE, we plug each of these inputs into our NN function, compute the squared difference between that result and the intended results listed just above, and add up those four squared errors. The next step for you is to define that squared error function! Since we have a through i defined above, it shouldn't take them as literal function arguments, but they should appear inside the definition of SSE. It may be helpful to define an actual XOR function first.
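As a quick worked example of the arithmetic, consider a hypothetical network that outputs 0.5 for every input. Each of the four squared errors is 0.25, so the SSE is 1.0. This gives a rough baseline: a model that has learned anything about XOR should score below it. The `targets` and `outputs` lists are illustrative, not produced by the NN function.

```python
targets = [0, 1, 1, 0]            # XOR of (0,0), (0,1), (1,0), (1,1)
outputs = [0.5, 0.5, 0.5, 0.5]    # a model that always answers 0.5

sse = sum((t - o) ** 2 for t, o in zip(targets, outputs))
print(sse)  # 1.0
```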

In [0]:
def XOR(x, y):
    if x == y:
        return 0
    else:
        return 1

def SSE():
    X = [(0,0), (0,1), (1,0), (1,1)]
    
    sse = 0
    for x1, x2 in X:
        y = XOR(x1, x2)
        yhat = NN(x1, x2)
        sse += (y-yhat)**2
        
    return sse

def accuracy():
    X = [(0,0), (0,1), (1,0), (1,1)]
    
    correct = 0
    for x1, x2 in X:
        y = XOR(x1, x2)
        yhat = NN(x1, x2)
        correct += int(np.round(yhat)==y)
        
    return correct / len(X)

Now we try out the same character-building exercise we did when we first tried out linear regression:

By any means necessary (fiddling by hand, gradient descent, discussion with your classmates), find values of a, ..., i that make the SSE as small as possible. Note that, depending on how you approach this, our earlier step of initializing the values randomly may or may not have been necessary.

In [0]:
def optimize(weights, lr=0.01, epochs=100):
    X = [(0,0), (0,1), (1,0), (1,1)]
    
    for _ in range(epochs):
        for x1, x2 in X:
            y = XOR(x1, x2)

            a11 = sigma(weights[0]*x1 + weights[1]*x2 + weights[2])
            a12 = sigma(weights[3]*x1 + weights[4]*x2 + weights[5])

            a2 = sigma(weights[6]*a11 + weights[7]*a12 + weights[8])
            
            # delta error with respect to output activation
            # (the factor of 2 from differentiating the squared error is absorbed into lr)
            da2 = a2 - y
            # delta output activation with respect to output layer input
            dz2 = a2 * (1-a2)
            
            dout = da2 * dz2
            # delta error with respect to bias, weight7, weight6
            dw8 = dout
            dw7 = dout * a12
            dw6 = dout * a11

            # delta hidden activations with respect to hidden node inputs
            dz12 = a12 * (1-a12)
            dz11 = a11 * (1-a11)
            
            # delta error with respect to bias, weight4, weight3
            dw5 = weights[7] * dout * dz12
            dw4 = weights[7] * dout * dz12 * x2
            dw3 = weights[7] * dout * dz12 * x1
            
            # delta error with respect to bias, weight1, weight0
            dw2 = weights[6] * dout * dz11
            dw1 = weights[6] * dout * dz11 * x2
            dw0 = weights[6] * dout * dz11 * x1
            
            deltas = [dw0, dw1, dw2, dw3, dw4, dw5, dw6, dw7, dw8]
            for ix, dw in enumerate(deltas):
                weights[ix] = weights[ix] - lr*dw
                
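        # SSE() reads the global `weights`; this works here because the array
        # passed in is updated in place, so it is the same object as the global
        if SSE() < 0.1:
            break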
            
    return weights
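Hand-derived gradients like the ones in optimize are easy to get subtly wrong, so it can help to compare them against finite differences. The sketch below is an added sanity check, not part of the original solution: the helper names `half_squared_error`, `analytic_grad`, and `numeric_grad` are made up for illustration, and the weights come from a separate random generator rather than the notebook's `weights`. It uses half the squared error as the loss, since that is the quantity whose gradient starts with `a2 - y`.

```python
import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))

def half_squared_error(w, x1, x2, y):
    """0.5 * (z - y)**2 for a single example."""
    y1 = sigma(w[0]*x1 + w[1]*x2 + w[2])
    y2 = sigma(w[3]*x1 + w[4]*x2 + w[5])
    z = sigma(w[6]*y1 + w[7]*y2 + w[8])
    return 0.5 * (z - y) ** 2

def analytic_grad(w, x1, x2, y):
    """The same chain-rule expressions as in optimize, collected in one array."""
    a11 = sigma(w[0]*x1 + w[1]*x2 + w[2])
    a12 = sigma(w[3]*x1 + w[4]*x2 + w[5])
    a2 = sigma(w[6]*a11 + w[7]*a12 + w[8])
    dout = (a2 - y) * a2 * (1 - a2)
    dz11 = a11 * (1 - a11)
    dz12 = a12 * (1 - a12)
    return np.array([
        w[6]*dout*dz11*x1, w[6]*dout*dz11*x2, w[6]*dout*dz11,
        w[7]*dout*dz12*x1, w[7]*dout*dz12*x2, w[7]*dout*dz12,
        dout*a11, dout*a12, dout,
    ])

def numeric_grad(w, x1, x2, y, eps=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        up = half_squared_error(w + step, x1, x2, y)
        down = half_squared_error(w - step, x1, x2, y)
        grad[k] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=9)
examples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # (x1, x2, XOR)
max_diff = max(
    np.max(np.abs(analytic_grad(w, x1, x2, y) - numeric_grad(w, x1, x2, y)))
    for x1, x2, y in examples
)
print(max_diff < 1e-7)
```

If the two gradients disagree by much more than rounding error, the usual culprit is a mismatched index, such as multiplying by the wrong input or the wrong output-layer weight.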

In [6]:
np.random.seed(42)
weights = np.random.normal(scale=0.5, size=9)
print(weights, '\n')
print([NN(0,0), NN(0,1), NN(1,0), NN(1,1)])

print('SSE:', SSE())
print('Accuracy:', accuracy())

[ 0.24835708 -0.06913215  0.32384427  0.76151493 -0.11707669 -0.11706848
  0.78960641  0.38371736 -0.23473719] 

[0.5996679303344842, 0.5937707990575999, 0.6275360701458144, 0.6221392298966929]
SSE: 1.0504103907888185
Accuracy: 0.5


In [0]:
weights = optimize(weights, epochs=200000)

In [8]:
print(weights, '\n')
print([NN(0,0), NN(0,1), NN(1,0), NN(1,1)])

print('SSE:', SSE())
print('Accuracy:', accuracy())

[ 5.07464927  4.95735308 -1.74895479  2.6981281   2.67841262 -4.03023374
  5.67154587 -5.82934728 -2.55137326] 

[0.14030624828793162, 0.8457269802303404, 0.8463343803367845, 0.18138461724134622]
SSE: 0.09999950997575413
Accuracy: 1.0
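These final numbers illustrate the testing versus application distinction from the start of the notebook. A small sketch, copying the four outputs printed above into a list (the names `outputs`, `targets`, and `predictions` are illustrative): the SSE is computed from the raw, unrounded outputs, while rounding at the last step gives the actual XOR predictions.

```python
outputs = [0.14030624828793162, 0.8457269802303404,
           0.8463343803367845, 0.18138461724134622]
targets = [0, 1, 1, 0]

# Testing phase: measure error on the raw outputs.
sse = sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Application phase: round at the very end to get a 0/1 answer.
predictions = [int(round(o)) for o in outputs]

print(sse)          # just under 0.1
print(predictions)  # [0, 1, 1, 0]
```

Rounding earlier would hide the difference between an output of 0.51 and one of 0.99, which is exactly the information the optimizer needs.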
