# Neural Networks

An **artificial neural network** (or neural network for short) is a predictive model motivated by the way the brain operates. Think of the brain as a collection of neurons wired together. Each neuron looks at the outputs of the other neurons that feed into it, does a calculation, and then either fires (if the calculation exceeds some threshold) or doesn’t (if it doesn’t).

Accordingly, artificial neural networks consist of artificial neurons, which perform similar calculations over their inputs. Neural networks can solve a wide variety of problems like handwriting recognition and face detection, and they are used heavily in `deep learning`, one of the trendiest subfields of data science. However, most neural networks are `“black boxes”` — inspecting their details doesn’t give you much understanding of how they’re solving a problem.

## Perceptron

Pretty much the simplest neural network is the **perceptron**, which approximates a single neuron with n binary inputs. It computes a weighted sum of its inputs and “fires” if that weighted sum is zero or greater.

In [1]:
from scratch.linear_algebra import Vector, dot

def step_function(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

In [2]:
def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Returns 1 if the perceptron 'fires', 0 if not"""
    calculation = dot(weights, x) + bias
    return step_function(calculation)

With properly chosen weights, perceptrons can solve a number of simple problems. For example, we can create an `AND gate` (which returns 1 if both its inputs are 1 and returns 0 if either input is 0):

In [3]:
and_weights = [2., 2]
and_bias = -3.

assert perceptron_output(and_weights, and_bias, [1, 1]) == 1
assert perceptron_output(and_weights, and_bias, [0, 1]) == 0
assert perceptron_output(and_weights, and_bias, [1, 0]) == 0
assert perceptron_output(and_weights, and_bias, [0, 0]) == 0

Similarly, we could build an `OR gate`:

In [4]:
or_weights = [2., 2]
or_bias = -1.

assert perceptron_output(or_weights, or_bias, [1, 1]) == 1
assert perceptron_output(or_weights, or_bias, [0, 1]) == 1
assert perceptron_output(or_weights, or_bias, [1, 0]) == 1
assert perceptron_output(or_weights, or_bias, [0, 0]) == 0

And we could build a `NOT gate` (which has one input and converts 1 to 0 and 0 to 1):

In [5]:
not_weights = [-2.]
not_bias = 1.

assert perceptron_output(not_weights, not_bias, [0]) == 1
assert perceptron_output(not_weights, not_bias, [1]) == 0
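To see why these particular weights and biases work, it can help to look at the weighted sum itself before the step function is applied. The sketch below redefines `dot` locally so that it runs on its own, and `weighted_sum` is a helper name introduced here for illustration only.

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Sum of componentwise products (local stand-in for scratch.linear_algebra.dot)."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def weighted_sum(weights: Vector, bias: float, x: Vector) -> float:
    """The value the perceptron compares against zero before deciding to fire."""
    return dot(weights, x) + bias

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]

# The AND and OR gates share the same weights; only the bias differs.
and_sums = [weighted_sum([2., 2.], -3., x) for x in inputs]
or_sums = [weighted_sum([2., 2.], -1., x) for x in inputs]

print(and_sums)  # [-3.0, -1.0, -1.0, 1.0]  -> only [1, 1] reaches zero or more
print(or_sums)   # [-1.0, 1.0, 1.0, 3.0]    -> only [0, 0] stays below zero
```

The bias acts as a movable threshold: a bias of -3 requires both inputs to be on, while a bias of -1 is satisfied by either one.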

<img src="images/neural_networks1.png" alt="" style="width: 600px;"/>


However, there are some problems that simply can’t be solved by a single perceptron. For example, no matter how hard you try, you cannot use a perceptron to build an `XOR gate` that outputs 1 if exactly one of its inputs is 1 and 0 otherwise. This is where we start needing more-complicated neural networks.

In [6]:
# XOR logic gate without artificial neurons
and_gate = min
or_gate = max
xor_gate = lambda x, y: 0 if x == y else 1

In [10]:
assert xor_gate(0, 1) == 1
assert xor_gate(1, 1) == 0
assert xor_gate(1, 0) == 1
assert xor_gate(0, 0) == 0
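The claim that a single perceptron cannot compute XOR can be checked empirically, at least over a limited range of weights. The following is a rough sketch, not a proof: `find_perceptron` is a hypothetical helper that tries every integer weight and bias between -5 and 5 and reports the first combination that reproduces a given truth table.

```python
from itertools import product
from typing import List, Optional, Tuple

Vector = List[float]

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
    """Fires (returns 1.0) when the weighted sum plus bias is zero or greater."""
    total = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return 1.0 if total >= 0 else 0.0

def find_perceptron(targets: List[int]) -> Optional[Tuple[int, int, int]]:
    """Search integer (w1, w2, bias) in [-5, 5] that reproduce `targets`
    on the inputs [0,0], [0,1], [1,0], [1,1]. Returns None if nothing works."""
    inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
    grid = range(-5, 6)
    for w1, w2, bias in product(grid, repeat=3):
        outputs = [perceptron_output([w1, w2], bias, x) for x in inputs]
        if outputs == targets:
            return (w1, w2, bias)
    return None

and_solution = find_perceptron([0, 0, 0, 1])  # some working combination exists
xor_solution = find_perceptron([0, 1, 1, 0])  # no combination in the grid works

print(and_solution is not None)  # True
print(xor_solution)              # None
```

The search failing is no accident. XOR would require `bias < 0`, `w1 + bias >= 0`, `w2 + bias >= 0`, and `w1 + w2 + bias < 0`. Adding the two middle inequalities gives `w1 + w2 + 2 * bias >= 0`, so `w1 + w2 + bias >= -bias > 0`, which contradicts the last requirement.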

## Feed-Forward Neural Network

The topology of the brain is enormously complicated, so it’s common to approximate it with an idealized **feed-forward** neural network that consists of discrete layers of neurons, each connected to the next. This typically entails an `input layer` (which receives inputs and feeds them forward unchanged), one or more `“hidden layers”` (each of which consists of neurons that take the outputs of the previous layer, perform some calculation, and pass the result to the next layer), and an `output layer` (which produces the final outputs).

Just like the perceptron, each (noninput) neuron has a `weight` corresponding to each of its inputs and a `bias`. To make our representation simpler, we’ll add the bias to the end of our weights vector and give each neuron a bias input that always equals 1.
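Folding the bias into the weights vector does not change the calculation. A minimal check, using a locally defined `dot` and made-up example values:

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Sum of componentwise products (local stand-in for scratch.linear_algebra.dot)."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

weights = [2., 2.]   # example values only
bias = -3.
x = [1., 0.]

# Bias kept separate, as in the perceptron
separate = dot(weights, x) + bias

# Bias appended to the weights, with a constant 1 appended to the inputs
folded = dot(weights + [bias], x + [1.])

print(separate, folded)  # -1.0 -1.0
```

The extra input of 1 multiplies the bias weight, so the bias is added to the sum exactly as before.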

As with the `perceptron`, for each neuron we’ll sum up the products of its inputs and its weights. But here, rather than outputting the `step_function` applied to that sum, we’ll output `a smooth approximation of the step function`. In particular, we’ll use the `sigmoid function`:

<img src="images/neural_networks2.png" alt="" style="width: 600px;"/>


In [11]:
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

In [12]:
sigmoid(-5)

0.0066928509242848554

In [13]:
sigmoid(0)

0.5

In [14]:
sigmoid(2)

0.8807970779778823
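One practical caveat that the text does not raise: `math.exp` raises `OverflowError` for large arguments, so `sigmoid` as written fails for very negative inputs such as `sigmoid(-1000)`. A common workaround, sketched here under the hypothetical name `stable_sigmoid`, is to pick whichever algebraically equivalent form keeps the exponent nonpositive.

```python
import math

def sigmoid(t: float) -> float:
    """The definition used in the text."""
    return 1 / (1 + math.exp(-t))

def stable_sigmoid(t: float) -> float:
    """Same function, but never calls math.exp with a large positive argument."""
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    e = math.exp(t)          # t < 0, so this is between 0 and 1
    return e / (1 + e)       # equal to 1 / (1 + exp(-t))

print(stable_sigmoid(-1000))  # 0.0
print(stable_sigmoid(1000))   # 1.0
```

For the moderate inputs used in this chapter the two versions agree, so the simpler definition is fine here.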

`Why use sigmoid instead of the simpler step_function?` In order to train a neural network, we’ll need to use calculus, and in order to use calculus, we need smooth functions. The step function isn’t even continuous, and sigmoid is a good smooth approximation of it.
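Both halves of that argument can be checked numerically. The sketch below compares the well-known closed-form derivative of the logistic function, `sigmoid(t) * (1 - sigmoid(t))`, against a finite-difference estimate, and then shows that scaling the input pushes the outputs toward 0 and 1. The helper names `sigmoid_derivative` and `numerical_derivative` are introduced here for illustration.

```python
import math
from typing import Callable

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def sigmoid_derivative(t: float) -> float:
    """Closed form: sigmoid'(t) = sigmoid(t) * (1 - sigmoid(t))."""
    s = sigmoid(t)
    return s * (1 - s)

def numerical_derivative(f: Callable[[float], float], t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# The derivative exists everywhere and matches the numerical estimate
for t in [-2.0, 0.0, 2.0]:
    assert abs(sigmoid_derivative(t) - numerical_derivative(sigmoid, t)) < 1e-6

# Scaling the input by a larger k makes the curve look more like a step
for k in [1, 10, 100]:
    print(k, sigmoid(k * -0.1), sigmoid(k * 0.1))
```

As `k` grows, the value just left of zero approaches 0 and the value just right of zero approaches 1, which is the behavior of the step function.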

Technically, `sigmoid` refers to the shape of the function and `logistic` to this particular function, although people often use the terms interchangeably.

In [15]:
def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(dot(weights, inputs))

Given this function, `we can represent a neuron` simply as a list of weights whose length is one more than the number of inputs to that neuron (because of the bias weight). Then `we can represent a neural network` as a list of (noninput) layers, where each layer is just a list of the neurons in that layer.

That is, we’ll represent a neural network as a list (layers) of lists (neurons) of lists (weights).
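As a concrete picture of that nesting, here is a small network with two inputs, a hidden layer of two neurons, and a single output neuron. The weight values are arbitrary placeholders, and `layer_sizes` and `check_shapes` are hypothetical helpers written for this sketch.

```python
from typing import List

Vector = List[float]

# network[layer][neuron][weight]; the last weight of each neuron is its bias
network: List[List[Vector]] = [
    [[0.1, -0.2, 0.3],     # hidden neuron 1: 2 input weights + bias
     [0.4, 0.5, -0.6]],    # hidden neuron 2: 2 input weights + bias
    [[0.7, -0.8, 0.9]],    # output neuron: 2 hidden-output weights + bias
]

def layer_sizes(network: List[List[Vector]]) -> List[int]:
    """Number of neurons in each (noninput) layer."""
    return [len(layer) for layer in network]

def check_shapes(network: List[List[Vector]], num_inputs: int) -> bool:
    """True if every neuron has one weight per incoming value plus one bias weight."""
    size = num_inputs
    for layer in network:
        if any(len(neuron) != size + 1 for neuron in layer):
            return False
        size = len(layer)   # this layer's outputs feed the next layer
    return True

print(layer_sizes(network))      # [2, 1]
print(check_shapes(network, 2))  # True
```

A check like this is cheap and catches the most common mistake with the list-of-lists representation, which is forgetting the bias weight.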

In [16]:
from typing import List

def feed_forward(neural_network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    '''
    Feeds the input vector through the neural network.
    Returns the outputs of all layers (not just the last one)
    '''
    outputs: List[Vector] = []
        
    for layer in neural_network:
        # Add a constant
        input_with_bias = input_vector + [1]
        # Compute the output for each neuron
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        # Add to results
        outputs.append(output)
        
        # Then the input to the next layer is the output of this one
        input_vector = output
        
    return outputs

Now it’s easy to build the `XOR gate` that we couldn’t build with a single perceptron. We just need to scale the weights up so that the neuron_outputs are either really close to 0 or really close to 1.

<img src="images/neural_networks3.png" alt="" style="width: 600px;"/>


In [20]:
xor_network = [      # hidden layer 
    [[20, 20, -30],  # 'and' neuron
     [20, 20, -10]], # 'or' neuron
                     # output layer 
    [[-60, 60, -30]]]# '2nd input but not 1st input' neuron

# feed_forward returns the outputs of all layers, so the [-1] gets the final
# layer's output vector, and the [0] gets the single value in that vector

assert 0.000 < feed_forward(xor_network, [0, 0])[-1][0] < 0.001 # 0
assert 0.999 < feed_forward(xor_network, [1, 0])[-1][0] < 1.000 # 1
assert 0.999 < feed_forward(xor_network, [0, 1])[-1][0] < 1.000 # 1
assert 0.000 < feed_forward(xor_network, [1, 1])[-1][0] < 0.001 # 0
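Because `feed_forward` returns every layer's outputs, we can also look inside the network and see how it arrives at XOR. The sketch below repeats compact versions of `sigmoid` and `feed_forward` so that it runs on its own, then rounds the hidden and final outputs for each input.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def feed_forward(neural_network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    """Compact restatement of the function above: returns the outputs of all layers."""
    outputs: List[Vector] = []
    for layer in neural_network:
        input_with_bias = input_vector + [1]
        output = [sigmoid(sum(w * i for w, i in zip(neuron, input_with_bias)))
                  for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

xor_network = [[[20., 20, -30],     # 'and' neuron
                [20., 20, -10]],    # 'or' neuron
               [[-60., 60, -30]]]   # 'or but not and' neuron

rounded: Dict[Tuple[int, int], Tuple[List[int], int]] = {}
for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    hidden, output = feed_forward(xor_network, x)
    rounded[(x[0], x[1])] = ([round(h) for h in hidden], round(output[0]))
    print(x, rounded[(x[0], x[1])])

# [0, 0] ([0, 0], 0)
# [0, 1] ([0, 1], 1)
# [1, 0] ([0, 1], 1)
# [1, 1] ([1, 1], 0)
```

The hidden layer computes AND and OR of the inputs, and the output neuron fires only when OR is on and AND is off, which is exactly XOR.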

## Backpropagation

Usually we don't build neural networks by hand, both because we use them to solve much bigger problems (such as image recognition) and because we usually won't be able to “reason out” what the weights of each neuron should be. Instead, we use data to `train neural networks`. The typical approach is an algorithm called **backpropagation**, which uses gradient descent or one of its variants.
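To give a feel for what “using data to train” means, here is a deliberately small sketch of the gradient descent ingredient alone. It is not backpropagation through multiple layers. It trains a single sigmoid neuron to behave like an OR gate, assuming a squared-error loss, plain batch gradient descent, and a learning rate and epoch count chosen arbitrarily for this example. The function name `train_neuron` is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights: Vector, inputs: Vector) -> float:
    # weights includes the bias term, inputs includes a 1
    return sigmoid(sum(w * i for w, i in zip(weights, inputs)))

def train_neuron(data: List[Tuple[Vector, float]],
                 num_epochs: int = 2000,
                 learning_rate: float = 1.0) -> Vector:
    """Batch gradient descent on squared error for one sigmoid neuron."""
    weights = [0.0] * (len(data[0][0]) + 1)   # one weight per input, plus bias
    for _ in range(num_epochs):
        gradient = [0.0] * len(weights)
        for x, target in data:
            inputs = x + [1.0]
            out = neuron_output(weights, inputs)
            # d/dw_i of (out - target)^2 is 2 * (out - target) * out * (1 - out) * input_i
            delta = 2 * (out - target) * out * (1 - out)
            for i, input_i in enumerate(inputs):
                gradient[i] += delta * input_i
        # Step against the gradient to reduce the error
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

or_data = [([0., 0.], 0.), ([0., 1.], 1.), ([1., 0.], 1.), ([1., 1.], 1.)]
or_weights = train_neuron(or_data)
predictions = [round(neuron_output(or_weights, x + [1.0])) for x, _ in or_data]
print(predictions)  # [0, 1, 1, 1]
```

The `out * (1 - out)` factor is the derivative of the sigmoid, which is why a smooth activation function is needed. Backpropagation extends the same idea to hidden layers by passing these error terms backward through the network.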