#### Neural Networks

#### Perceptron

In [None]:
from linear_algebra import Vector, dot

In [None]:
def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Returns 1 if the perceptron 'fires', 0 if not"""
    calculation = dot(weights, x) + bias
    return step_function(calculation)

#### AND, OR

With properly chosen weights, the perceptron can solve a number of simple problems. For example, we can create an **AND** gate, which returns 1 if both of its inputs are 1 and returns 0 if either input is 0:

In [None]:
and_weights = [2., 2]
and_bias = -3

assert perceptron_output(and_weights, and_bias, [1, 1]) == 1
assert perceptron_output(and_weights, and_bias, [0, 1]) == 0
assert perceptron_output(and_weights, and_bias, [1, 0]) == 0
assert perceptron_output(and_weights, and_bias, [0, 0]) == 0
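To see why these weights behave like an AND gate, it can help to look at the weighted sum for every input pair. The sketch below is a minimal, self-contained illustration: it defines its own `dot` as a stand-in for `linear_algebra.dot` (assumed here to be a plain dot product) and stores the sums in a dictionary named `and_sums`, which is a name introduced only for this example.

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Stand-in for linear_algebra.dot (assumed to be a plain dot product)."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

and_weights = [2., 2.]
and_bias = -3.

# Weighted sum plus bias for every input pair.
# The perceptron fires only when this value is >= 0.
and_sums = {(x1, x2): dot(and_weights, [x1, x2]) + and_bias
            for x1 in (0, 1) for x2 in (0, 1)}

for inputs, total in and_sums.items():
    print(inputs, total, "fires" if total >= 0 else "does not fire")
```

Only the pair (1, 1) reaches a nonnegative sum, because a single input contributes 2, which is not enough to overcome the bias of -3.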

With similar reasoning, we can build an **OR** gate:

In [None]:
or_weights = [2., 2.]
or_bias = -1

assert perceptron_output(or_weights, or_bias, [1, 1]) == 1
assert perceptron_output(or_weights, or_bias, [0, 1]) == 1
assert perceptron_output(or_weights, or_bias, [1, 0]) == 1
assert perceptron_output(or_weights, or_bias, [0, 0]) == 0

We can also build a **NOT** gate (which has one input and converts 1 to 0 and 0 to 1):

In [None]:
not_weights = [-2.]
not_bias = 1.

assert perceptron_output(not_weights, not_bias, [0]) == 1
assert perceptron_output(not_weights, not_bias, [1]) == 0


But no matter how hard we try, a single perceptron cannot implement an **XOR** gate (which outputs 1 if exactly one of its inputs is 1 and 0 otherwise). Written as ordinary functions, these logic gates look like this:

In [None]:
and_gate = min
or_gate = max
xor_gate = lambda x, y: 0 if x == y else 1
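The claim that a single perceptron cannot compute XOR can be illustrated with a small brute-force search. The sketch below is an illustration rather than a proof: `fires` and `find_perceptron` are hypothetical helpers written for this example, and the grid of candidate weights (-5.0 to 5.0 in steps of 0.5) is an arbitrary assumption.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

def fires(weights: List[float], bias: float, x: List[int]) -> int:
    """Single perceptron: 1 if the weighted sum plus bias is >= 0, else 0."""
    total = sum(w * x_i for w, x_i in zip(weights, x)) + bias
    return 1 if total >= 0 else 0

def find_perceptron(gate: Callable[[int, int], int]
                    ) -> Optional[Tuple[float, float, float]]:
    """Search a coarse grid of (w1, w2, bias) for values that reproduce `gate`."""
    grid = [i / 2 for i in range(-10, 11)]        # -5.0, -4.5, ..., 5.0
    for w1, w2, bias in product(grid, repeat=3):
        if all(fires([w1, w2], bias, [a, b]) == gate(a, b)
               for a, b in product((0, 1), repeat=2)):
            return (w1, w2, bias)
    return None                                   # nothing on the grid works

and_solution = find_perceptron(min)
or_solution = find_perceptron(max)
xor_solution = find_perceptron(lambda a, b: 0 if a == b else 1)

print(and_solution, or_solution, xor_solution)
```

The search finds weights for AND and OR but returns `None` for XOR. The reason goes beyond this grid: XOR would require `bias < 0`, `w1 + bias >= 0`, `w2 + bias >= 0`, and `w1 + w2 + bias < 0` all at once, and adding the two middle inequalities gives `w1 + w2 + bias >= -bias > 0`, which contradicts the last one.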

In a feed-forward neural network, just like in the perceptron, each (noninput) neuron has a weight corresponding to each of its inputs and a bias. To make our representation simpler, we’ll add the bias to the end of our weights vector and give each neuron a bias input that always equals 1. As with the perceptron, for each neuron we’ll sum up the products of its inputs and its weights. But here, rather than outputting the step_function applied to that sum, we’ll output a smooth approximation of the step function. In particular, we’ll use the sigmoid function:

In [None]:
import math

def sigmoid(t: float) -> float:
    return 1/(1 + math.exp(-t))
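A quick side-by-side comparison shows in what sense the sigmoid is a smooth approximation of the step function. This is a minimal sketch that repeats both definitions so it runs on its own. The sample points are arbitrary choices for illustration.

```python
import math

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

# Far from 0 the two functions nearly agree; near 0 the sigmoid
# changes gradually instead of jumping from 0 to 1.
for t in [-10, -1, 0, 1, 10]:
    print(t, step_function(t), round(sigmoid(t), 4))
```

Unlike the step function, the sigmoid has a well-defined derivative everywhere, which is what gradient-based training relies on.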

In [None]:
def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(dot(weights, inputs))

Given this function, we can represent a neuron simply as a vector of weights whose length is one more than the number of inputs to that neuron (because of the bias weight).
Then we can represent a neural network as a list of (noninput) layers, where each layer is just a list of the neurons in that layer.

That is, we'll represent a neural network as a list (layers) of lists (neurons) of vectors (weights).
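As a concrete picture of this nesting, here is a minimal sketch of a network with 2 inputs, 2 hidden neurons, and 1 output neuron. The weight values are arbitrary placeholders, and `describe` is a hypothetical helper introduced only to show the shape of the structure.

```python
from typing import List, Tuple

Vector = List[float]

# Every neuron holds one weight per input plus a trailing bias weight.
example_network: List[List[Vector]] = [
    [[0.1, 0.2, 0.3],      # hidden neuron 1: 2 input weights + bias
     [0.4, 0.5, 0.6]],     # hidden neuron 2: 2 input weights + bias
    [[0.7, 0.8, 0.9]],     # output neuron: 2 hidden-output weights + bias
]

def describe(network: List[List[Vector]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: (neurons, weights per neuron) for each layer."""
    return [(len(layer), len(layer[0])) for layer in network]

shapes = describe(example_network)
print(shapes)
```

Each layer's weight count is one more than the number of values it receives, which is the bias convention described above.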

Given such a representation, we can feed an input through the network:


In [None]:
from typing import List

def feed_forward(neural_networks: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    """
    Feeds the input vector through the neural network.
    Returns the outputs of all layers (not just the last one)
    """
    outputs: List[Vector] = []
    
    for layer in neural_networks:
        input_with_bias = input_vector + [1]             # Add bias constant
        output = [neuron_output(neuron, input_with_bias) # compute the output
                  for neuron in layer]                   # for each neuron
        outputs.append(output)                           # add to the results
        input_vector = output                            # becomes the next layer's input
    return outputs


Now we can build the XOR gate that we couldn't build with a single perceptron. We just need to scale the weights (theta) up so that the neuron_outputs are either really close to 0 or really close to 1:

In [None]:
xor_network = [# hidden layer
               [[20., 20, -30],    # 'and' neuron
                [20., 20, -10]],   # 'or' neuron (theta 1)
               # output layer
               [[-60., 60, -30]]]  # '2nd input but not 1st input' neuron (theta 2)

feed_forward(xor_network, [0, 0]) 

In [None]:
# feed_forward returns the outputs of all layers, so [-1][0] is the first output of the final layer
assert 0.000 < feed_forward(xor_network, [0, 0])[-1][0] < 0.001
assert 0.999 < feed_forward(xor_network, [1, 0])[-1][0] < 1.000
assert 0.999 < feed_forward(xor_network, [0, 1])[-1][0] < 1.000
assert 0.000 < feed_forward(xor_network, [1, 1])[-1][0] < 0.001

# see page 231 for figure

The hidden layer is computing features of the input data (in this case "and" and "or"), and the output layer is combining those features in a way that generates the desired output.
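To make the role of the hidden layer visible, the sketch below prints the rounded hidden-layer outputs next to the final output for each input pair. It repeats compact versions of `dot`, `sigmoid`, and `feed_forward` so that it runs on its own. The dictionary `rounded` is a name introduced only for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    outputs: List[Vector] = []
    for layer in network:
        with_bias = input_vector + [1]                       # add the bias input
        output = [sigmoid(dot(neuron, with_bias)) for neuron in layer]
        outputs.append(output)
        input_vector = output                                # feeds the next layer
    return outputs

xor_network = [[[20., 20, -30],     # 'and' neuron
                [20., 20, -10]],    # 'or' neuron
               [[-60., 60, -30]]]   # fires for 'or' but not 'and'

rounded: Dict[Tuple[int, int], Tuple[List[int], int]] = {}
for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    hidden, output = feed_forward(xor_network, x)
    rounded[(x[0], x[1])] = ([round(h) for h in hidden], round(output[0]))
    print(x, "hidden [and, or]:", rounded[(x[0], x[1])][0],
          "output:", rounded[(x[0], x[1])][1])
```

The output neuron fires exactly when the "or" feature is on and the "and" feature is off, which is XOR.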

#### Backpropagation

Backpropagation uses gradient descent or one of its variants to train the neural network. Imagine our neural network has some set of weights. We then adjust the weights using the following step-by-step algorithm:

1 - Run feed_forward on an input vector to produce the outputs of all the neurons in the network.
2 - We know the target output, so we can compute a loss that is the sum of the squared errors.
3 - Compute the gradient of this loss as a function of the output neurons' weights.
4 - "Propagate" the gradients and errors backward to compute the gradient with respect to the hidden neurons' weights.
5 - Take a gradient descent step.

Typically we run this algorithm many times over our entire training set until the network converges.
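For a network with one hidden layer and sigmoid activations, steps 3 and 4 can be written out with the chain rule. The derivation below is a sketch under stated assumptions: the symbols are introduced here for illustration ($x_k$ are the inputs with a trailing 1, $h_j$ the hidden outputs with a trailing 1, $o_i$ the outputs, $t_i$ the targets, $z_i$ the weighted sum feeding output neuron $i$), and the loss carries a factor of $\tfrac{1}{2}$ so that the result matches the expressions in `sqerror_gradients`. Without that factor every gradient is simply doubled, which can be absorbed into the learning rate.

```latex
\begin{aligned}
L &= \tfrac{1}{2}\sum_i (o_i - t_i)^2, \qquad o_i = \sigma(z_i), \qquad
     \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\delta^{\text{out}}_i &= \frac{\partial L}{\partial z_i}
     = o_i\,(1 - o_i)\,(o_i - t_i) \\
\frac{\partial L}{\partial w^{\text{out}}_{ij}} &= \delta^{\text{out}}_i\, h_j \\
\delta^{\text{hid}}_j &= h_j\,(1 - h_j)\sum_i \delta^{\text{out}}_i\, w^{\text{out}}_{ij} \\
\frac{\partial L}{\partial w^{\text{hid}}_{jk}} &= \delta^{\text{hid}}_j\, x_k
\end{aligned}
```

The hidden deltas reuse the output deltas, which is the "propagate backward" part of the algorithm.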

To start, let's compute the gradients:

In [None]:
def sqerror_gradients(network: List[List[Vector]],
                      input_vector: Vector,
                      target_vector: Vector) -> List[List[Vector]]:
    """
    Given a neural network, an input vector, and a target vector,
    make a prediction and compute the gradient of the squared error
    loss with respect to the neuron weights.
    """
    # forward pass
    hidden_outputs, outputs = feed_forward(network, input_vector)

    # gradients with respect to output neuron pre-activation outputs
    output_deltas = [output * (1 - output) * (output - target)
                     for output, target in zip(outputs, target_vector)]

    # gradients with respect to output neuron weights
    output_grads = [[output_deltas[i] * hidden_output
                     for hidden_output in hidden_outputs + [1]]
                    for i, output_neuron in enumerate(network[-1])]

    # gradients with respect to hidden neuron pre-activation outputs
    hidden_deltas = [hidden_output * (1 - hidden_output) *
                         dot(output_deltas, [n[i] for n in network[-1]])
                     for i, hidden_output in enumerate(hidden_outputs)]

    # gradients with respect to hidden neuron weights
    hidden_grads = [[hidden_deltas[i] * x_i for x_i in input_vector + [1]]
                    for i, hidden_neuron in enumerate(network[0])]

    return [hidden_grads, output_grads]
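One way to gain confidence in hand-derived gradients is to compare them with a finite-difference estimate. The sketch below is a self-contained check under stated assumptions: it repeats compact versions of `feed_forward` and `sqerror_gradients`, and it adds two hypothetical helpers, `half_squared_error` and `numeric_gradients`. The loss uses a factor of 0.5, which is an assumption chosen because the deltas above carry no factor of 2. The test weights are arbitrary.

```python
import math
from typing import List

Vector = List[float]
Network = List[List[Vector]]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(network: Network, input_vector: Vector) -> List[Vector]:
    outputs: List[Vector] = []
    for layer in network:
        with_bias = input_vector + [1]
        output = [sigmoid(dot(neuron, with_bias)) for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

def sqerror_gradients(network: Network, input_vector: Vector,
                      target_vector: Vector) -> Network:
    """Same computation as above, repeated so this example runs alone."""
    hidden_outputs, outputs = feed_forward(network, input_vector)
    output_deltas = [o * (1 - o) * (o - t) for o, t in zip(outputs, target_vector)]
    output_grads = [[d * h for h in hidden_outputs + [1]] for d in output_deltas]
    hidden_deltas = [h * (1 - h) * dot(output_deltas, [n[i] for n in network[-1]])
                     for i, h in enumerate(hidden_outputs)]
    hidden_grads = [[d * x for x in input_vector + [1]] for d in hidden_deltas]
    return [hidden_grads, output_grads]

def half_squared_error(network: Network, input_vector: Vector,
                       target_vector: Vector) -> float:
    """Assumed loss: 0.5 * sum of squared errors."""
    outputs = feed_forward(network, input_vector)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, target_vector))

def numeric_gradients(network: Network, input_vector: Vector,
                      target_vector: Vector, h: float = 1e-6) -> Network:
    """Central finite-difference estimate of the loss gradient per weight."""
    grads: Network = []
    for layer in network:
        layer_grads = []
        for neuron in layer:
            neuron_grads = []
            for k in range(len(neuron)):
                original = neuron[k]
                neuron[k] = original + h
                loss_up = half_squared_error(network, input_vector, target_vector)
                neuron[k] = original - h
                loss_down = half_squared_error(network, input_vector, target_vector)
                neuron[k] = original              # restore the weight
                neuron_grads.append((loss_up - loss_down) / (2 * h))
            layer_grads.append(neuron_grads)
        grads.append(layer_grads)
    return grads

test_network = [[[0.5, -0.3, 0.1], [-0.2, 0.4, 0.7]], [[0.6, -0.8, 0.2]]]
analytic = sqerror_gradients(test_network, [1., 0.], [1.])
numeric = numeric_gradients(test_network, [1., 0.], [1.])

max_diff = max(abs(a - b)
               for layer_a, layer_n in zip(analytic, numeric)
               for neuron_a, neuron_n in zip(layer_a, layer_n)
               for a, b in zip(neuron_a, neuron_n))
print(max_diff)
```

If the two sets of gradients agree to within a small tolerance, the backpropagation formulas are very likely implemented correctly for this loss.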

Let's try to learn the XOR network we previously designed by hand. We'll start by generating the training data and initializing our neural network with random weights:

In [None]:
import random
random.seed(0)

In [None]:
# training data
xs = [[0., 0.], [0., 1], [1., 0], [1., 1]]
ys = [[0.], [1.], [1.], [0.]]

# start with random weights (thetas)
network = [# hidden layer: 2 inputs -> 2 outputs
           [[random.random() for _ in range(2 + 1)],    # 1st hidden neuron
            [random.random() for _ in range(2 + 1)]],   # 2nd hidden neuron
           # output layer: 2 inputs -> 1 output
           [[random.random() for _ in range(2 + 1)]]    # 1st output neuron
          ]


As usual, we can train it using gradient descent. One difference from our previous examples is that here we have several parameter vectors, each with its own gradient, which means we'll have to call gradient_step for each of them:

In [None]:
from gradient_descent import gradient_step

learning_rate = 1.0

for epoch in range(20_000):
    for x, y in zip(xs, ys):
        gradients = sqerror_gradients(network, x, y)
        # take a gradient step for each neuron in each layer
        network = [[gradient_step(neuron, grad, -learning_rate)
                    for neuron, grad in zip(layer, layer_grad)]
                   for layer, layer_grad in zip(network, gradients)]
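The update for a single neuron can be sketched in isolation. The definition of `gradient_step` below is an assumption about what the imported function does (move from `v` by `step_size` times `gradient`), written out so that the example runs on its own. The weight and gradient values are made up for illustration.

```python
from typing import List

Vector = List[float]

def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Assumed behavior: move `step_size` along `gradient`, starting from `v`."""
    assert len(v) == len(gradient)
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

neuron = [0.5, -0.5, 0.0]     # two input weights plus a bias weight
grad = [0.25, -0.5, 1.0]      # made-up gradient for this neuron
learning_rate = 1.0

# A negative step size moves against the gradient, in the direction
# that locally decreases the loss.
updated = gradient_step(neuron, grad, -learning_rate)
print(updated)
```

Under this assumption, passing `-learning_rate` is what turns a generic step along the gradient into gradient descent.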

In [None]:
assert feed_forward(network, [0, 0])[-1][0] < 0.01
assert feed_forward(network, [0, 1])[-1][0] > 0.99
assert feed_forward(network, [1, 0])[-1][0] > 0.99
assert feed_forward(network, [1, 1])[-1][0] < 0.01

In [None]:
feed_forward(network, [1, 0])[-1][0]

In [None]:
network