## From biological neurons to ANNs

Neurons can be abstracted to three core elements: a **cell body (soma)**, an **axon** and **dendrites**.

Dendrites receive electrical input from other neurons that fire. The cell body accumulates this input as electrical potential until a **threshold** is reached. The neuron then fires (spikes) an **action potential**, an electrical discharge that travels along the axon and eventually reaches the dendrites of other neurons.
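The accumulate-until-threshold behavior can be sketched as a toy loop. This is a deliberately crude illustration, not a biophysical model: the input values, the threshold of 1.0 and the reset-to-zero rule are assumptions chosen for readability.

```python
def simulate_soma(inputs, threshold=1.0):
    """Toy soma: sum incoming input until the threshold is reached, then spike and reset.

    `threshold` and the reset-to-zero rule are illustrative assumptions.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential += current           # dendritic input accumulates in the cell body
        if potential >= threshold:     # threshold reached: fire an action potential
            spikes.append(1)
            potential = 0.0            # reset after the spike
        else:
            spikes.append(0)
    return spikes


print(simulate_soma([0.3, 0.3, 0.5, 0.2, 0.9]))  # [0, 0, 1, 0, 1]
```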


### Neuron and Computation

How do neurons, reduced to these elements, relate to computation? As early as 1943, [Warren McCulloch and Walter Pitts](https://link.springer.com/article/10.1007/BF02478259) studied the relation between abstract neurons and logical operators such as AND, OR and NOT. Since every Boolean (truth-valued) function can be expressed with this minimal set of operators, combining such neurons into a network allows us to implement arbitrary logical computations on their inputs.

A McCulloch-Pitts neuron outputs either 0 or 1, just as a logical operator (a Boolean function) does.
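A minimal sketch of McCulloch-Pitts-style units: each unit counts its active excitatory inputs and outputs 1 if the count reaches a fixed threshold. The handling of inhibition here (any active inhibitory input blocks firing) is an assumption following a common textbook reading, and the names `mp_unit`, `AND`, `OR` and `NOT` are hypothetical helpers, not taken from the 1943 paper.

```python
from itertools import product


def mp_unit(excitatory, inhibitory, threshold):
    """McCulloch-Pitts-style unit (sketch, assuming absolute inhibition).

    Outputs 0 if any inhibitory input is active; otherwise outputs 1
    when the number of active excitatory inputs reaches `threshold`.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def AND(a, b):
    return mp_unit([a, b], [], threshold=2)


def OR(a, b):
    return mp_unit([a, b], [], threshold=1)


def NOT(a):
    # no excitatory input and threshold 0: fires unless inhibited by a
    return mp_unit([], [a], threshold=0)


# Composite example: a AND (NOT b), built purely from units
for a, b in product([0, 1], repeat=2):
    print(a, b, AND(a, NOT(b)))
```

Composing units this way is the network counterpart of nesting logical operators.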

## Logical functions

Let's implement an arbitrary logical function that takes three Boolean arguments **a**, **b** and **c** and evaluates the following logical expression:

$$ \neg a \rightarrow (b \land \neg c) $$

which is also equivalent to

$$ a \lor (b \land \neg c) $$
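The equivalence follows from rewriting $p \rightarrow q$ as $\neg p \lor q$, which turns $\neg a \rightarrow (b \land \neg c)$ into $\neg\neg a \lor (b \land \neg c) = a \lor (b \land \neg c)$. A brute-force truth-table check confirms this for all $2^3 = 8$ combinations of truth values; the helper `implies` is introduced here only for the check.

```python
from itertools import product


def implies(p, q):
    # material implication: p -> q is false only when p is true and q is false
    return (not p) or q


rows = []
for a, b, c in product([False, True], repeat=3):
    lhs = implies(not a, b and not c)   # ¬a → (b ∧ ¬c)
    rhs = a or (b and not c)            # a ∨ (b ∧ ¬c)
    rows.append((a, b, c, lhs, rhs))

assert all(lhs == rhs for *_, lhs, rhs in rows)
print(len(rows), "rows checked, expressions agree")
```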

```python
def logical_function(arguments: tuple):
    # not(a) -> (b and not c), equivalently: a or (b and not c)
    a, b, c = arguments

    if a:
        return 1
    elif b and not c:
        return 1
    else:
        return 0

# all 2**3 = 8 combinations of truth values for (a, b, c)
all_possible_arguments = [(0, 0, 0),
                          (0, 0, 1),
                          (0, 1, 0),
                          (0, 1, 1),
                          (1, 0, 0),
                          (1, 0, 1),
                          (1, 1, 0),
                          (1, 1, 1)]

logical_function((0, 1, 0))
```

```text
1
```

Next, we want to check whether a **perceptron** can express this function.

Before applying a threshold function (activation function), the output of a perceptron is given by

$$ o = \sum_{i=1}^{m} W_i x_i + b $$

or, in matrix notation:

$$ o = W x  + b$$

Note that you will often also see $o = xW + b$; which form is used depends on whether the input vector $x$ is treated as a row or a column vector.
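A small NumPy sketch of the two conventions, using assumed toy values. In the row-vector form the weight matrix is the transpose of the one in the column-vector form, and both give the same numbers.

```python
import numpy as np

# Assumed toy values: 2 output units, 3 inputs
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (n_outputs, n_inputs)
b = np.array([0.5, -0.5])
x = np.array([1.0, 0.0, -1.0])    # a 1-D vector, shape (3,)

o_column = W @ x + b              # x treated as a column vector: o = W x + b
o_row = x @ W.T + b               # x treated as a row vector:    o = x W^T + b

print(o_column, o_row)            # both give -1.5 and -2.5
```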

On this scalar output, we apply a threshold function that determines whether the perceptron unit "fires" (outputs 1) or not (outputs 0). The following threshold function is known as the **Heaviside step function**:

$$
\sigma(x) = \begin{cases}
    1 ,& \text{if } x > 0\\
    0,              & \text{otherwise}
\end{cases}
$$
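The scalar definition above can also be applied to a whole array at once. A sketch in NumPy: `np.heaviside(x, h0)` takes as its second argument the value returned at exactly $x = 0$, so passing `0.0` reproduces the definition used here (conventions differ; some texts use $1/2$ at zero).

```python
import numpy as np

x = np.array([-2.0, -0.1, 0.0, 0.1, 3.0])

# Elementwise version of the definition above: 1 where x > 0, else 0
step_where = np.where(x > 0, 1.0, 0.0)

# NumPy's built-in: the second argument is the value used at exactly x == 0;
# passing 0.0 matches the "otherwise" branch of the definition
step_numpy = np.heaviside(x, 0.0)

print(step_where)   # [0. 0. 0. 1. 1.]
assert np.array_equal(step_where, step_numpy)
```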

To relate this back to the (coarse) biological inspiration, we can read the weights $W$ as analogous to the synaptic strengths of dendritic connections, and the bias $b$ as shifting the firing threshold: $Wx + b > 0$ is the same condition as $Wx > -b$.
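A quick numerical sketch of the bias-threshold correspondence: a unit with an explicit threshold $\theta$ and a unit with bias $b = -\theta$ and threshold 0 fire on exactly the same inputs. The weights, $\theta$ and the random test inputs are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical example weights and threshold, chosen for illustration
w = np.array([0.7, -0.2, 0.4])
theta = 0.5                       # "fire if w·x exceeds theta"

rng = np.random.default_rng(0)
inputs = rng.uniform(-2, 2, size=(1000, 3))

fires_with_threshold = inputs @ w > theta         # explicit threshold theta
fires_with_bias = inputs @ w + (-theta) > 0       # bias b = -theta, threshold 0

assert np.array_equal(fires_with_threshold, fires_with_bias)
```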

This is all we need to know in order to define a perceptron in code:

```python
import numpy as np

def heaviside_step(x):
    """
    Hard binary threshold function.

    A unit fires if a threshold of 0 is surpassed.
    """
    return 1. if x > 0 else 0.

class Perceptron(object):
    def __init__(self, weight_values, bias, threshold_function=heaviside_step):

        # weights are stored as a (1, m) row matrix: one row for the single unit
        self.weights = np.array([weight_values], dtype=np.float32)
        self.bias = bias
        self.threshold_function = threshold_function

    def __call__(self, x: np.ndarray):

        o = self.weights @ x + self.bias  # "@" is matrix multiplication (here effectively a dot product)

        return self.threshold_function(o)
```

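Can we configure the perceptron to express the logical function from above?

```python
p_weights = [1, 1, 1]  # a first guess; compare with [2, 1, -1]
p_bias = 0

perceptron = Perceptron(weight_values=p_weights, bias=p_bias)
```

```python
assert np.all(
    [logical_function(x) == perceptron(np.array(x)) for x in all_possible_arguments]), \
    "the perceptron is not functionally equivalent to the logical function"
```

```text
AssertionError: the perceptron is not functionally equivalent to the logical function
```

The all-ones guess fails: for $(a, b, c) = (0, 0, 1)$, for example, the pre-activation is $0 + 0 + 1 = 1 > 0$, so the perceptron fires although $a \lor (b \land \neg c)$ is false.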
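One configuration that does work is $W = (2, 1, -1)$ with $b = 0$. The reasoning: with $a = 1$ the weight 2 outweighs the $-1$ contribution of $c$, so the unit always fires; with $a = 0$ the pre-activation is $b - c$, which is positive only for $b = 1, c = 0$. The sketch below re-declares the threshold and the logical function so that it runs on its own, and checks all 8 inputs.

```python
from itertools import product

import numpy as np


def heaviside_step(x):
    # fires (1.0) only if the pre-activation is strictly positive
    return 1.0 if x > 0 else 0.0


def logical_function(arguments):
    # not(a) -> (b and not c), equivalently a or (b and not c)
    a, b, c = arguments
    return 1 if (a or (b and not c)) else 0


# Candidate weights: a strong positive weight on a, a positive weight on b,
# and a negative weight on c that cancels b when both are 1
weights = np.array([2.0, 1.0, -1.0])
bias = 0.0

results = {}
for x in product([0, 1], repeat=3):
    pre_activation = weights @ np.array(x) + bias
    results[x] = heaviside_step(pre_activation)

assert all(results[x] == logical_function(x) for x in results)
print("weights [2, 1, -1] with bias 0 match the truth table on all 8 inputs")
```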

## The logic of (binary) Multilayer Perceptrons (MLP)

What about the following logical function?

$$ (a \lor b) \land \neg(a \land b)$$

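This is the XOR ("exclusive or") operator: it is true if exactly one of its two arguments is true.

A single perceptron cannot express XOR, because XOR is not linearly separable. A perceptron's decision boundary is a straight line (in general, a hyperplane), and no straight line separates the inputs on which XOR is 1, namely $(0,1)$ and $(1,0)$, from those on which it is 0, namely $(0,0)$ and $(1,1)$.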

The **linear** separatrix (decision boundary) of a perceptron is the set of inputs at which the pre-activation is exactly zero:

$$ w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0 $$
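The decision boundary can also be probed empirically. The sketch below tries every combination $(w_1, w_2, b)$ on a coarse grid (range and step are arbitrary choices) and counts how many single threshold units reproduce AND, OR and XOR. AND and OR have solutions on this grid; XOR has none, and no finer grid would change that, since no line separates $\{(0,1), (1,0)\}$ from $\{(0,0), (1,1)\}$.

```python
from itertools import product

import numpy as np

inputs = list(product([0, 1], repeat=2))
targets = {
    "AND": [int(a and b) for a, b in inputs],
    "OR": [int(a or b) for a, b in inputs],
    "XOR": [int(a != b) for a, b in inputs],
}

# Grid of candidate parameters (an assumption; any grid gives 0 for XOR,
# because XOR is not linearly separable at all)
grid = np.arange(-2.0, 2.01, 0.5)


def count_solutions(target):
    count = 0
    for w1, w2, b in product(grid, repeat=3):
        # heaviside threshold applied to w1*x1 + w2*x2 + b for each input pair
        outputs = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in inputs]
        if outputs == target:
            count += 1
    return count


for name, target in targets.items():
    print(name, count_solutions(target))
```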


![https://929687.smushcdn.com/2633864/wp-content/uploads/2021/04/bitwise_datasets-1024x365.png?lossy=1&strip=1&webp=1](https://929687.smushcdn.com/2633864/wp-content/uploads/2021/04/bitwise_datasets-1024x365.png?lossy=1&strip=1&webp=1)

To express this non-linear function, we can stack perceptrons: one perceptron takes the outputs of other perceptrons as its inputs. This mirrors how we combine logical operators into composite logical functions. Here, $(a \lor b) \land \neg(a \land b)$ becomes an OR unit and an AND unit whose outputs feed a third unit.

```python
def xor(x):
    a, b = x
    return (a or b) and (not (a and b))

all_possible_arguments = [
    (0, 0),
    (0, 1),
    (1, 0),
    (1, 1)
]
```

```python
# a or b
p1_weights = [1, 1]
p1_bias = 0

# a and b
p2_weights = [0.5, 0.5]
p2_bias = -0.5

# p1 and not p2
p3_weights = [0.5, -0.5]
p3_bias = 0

perceptron_1 = Perceptron(p1_weights, p1_bias)
perceptron_2 = Perceptron(p2_weights, p2_bias)
perceptron_3 = Perceptron(p3_weights, p3_bias)

def wire_perceptrons(x):
    """
    perceptron 1 and 2 take (a,b) as inputs.
    perceptron 3 takes the outputs of perceptron 1 and perceptron 2 as inputs.
    """

    return perceptron_3([perceptron_1(x), perceptron_2(x)])

# evaluate that our simple handwired network implements xor
np.all([xor(x) == wire_perceptrons(np.array(x)) for x in all_possible_arguments])
```

```text
True
```

We have just built an artificial neural network that computes a non-linear logical function, with all weights set by hand.

What about other kinds of functions, such as functions with non-binary outputs? So far, our perceptron can only output 1 or 0.

For that, we need a continuous activation function instead of the hard threshold. One common class of such activation functions are the S-shaped (sigmoid) functions, such as the logistic function and tanh.

## Sigmoid activation function

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

- saturates towards 0 and 1
- its output lies in (0, 1) and can therefore be interpreted as a probability

## Tanh activation function

$$ \sigma(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

- saturates towards -1 and 1


## No activation function (sometimes referred to as linear activation)

$$ \sigma(x) = x $$

- the output is unbounded: it equals the pre-activation, which can take any real value
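The bullet points can be checked numerically. A sketch: besides confirming the output ranges on a few sample points, it verifies the identity $\tanh(x) = 2\sigma(2x) - 1$ (with $\sigma$ the logistic sigmoid), which shows that tanh is a rescaled and shifted logistic function.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


x = np.linspace(-5, 5, 11)

# tanh is a rescaled and shifted logistic sigmoid: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)

# Output ranges of the three activations on this sample
print(sigmoid(x).min(), sigmoid(x).max())   # inside (0, 1)
print(np.tanh(x).min(), np.tanh(x).max())   # inside (-1, 1)
print(x.min(), x.max())                     # linear: unchanged, here -5.0 and 5.0
```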

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # equivalent to (e^x - e^-x) / (e^x + e^-x), but np.tanh avoids
    # overflow (inf / inf = nan) for large |x|
    return np.tanh(x)

def linear(x):
    return x
```

```python
perceptron = Perceptron(weight_values=[0.5, -0.6, 0.1], bias=0, threshold_function=sigmoid)

perceptron(np.array([-3, 0, 3]))
```

```text
array([0.23147522])
```
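Worked by hand, the output above follows directly from the weights and inputs (the weights are stored with shape (1, 3), which is why the result is a one-element array):

$$ o = 0.5 \cdot (-3) + (-0.6) \cdot 0 + 0.1 \cdot 3 + 0 = -1.2, \qquad \sigma(-1.2) = \frac{1}{1 + e^{1.2}} \approx \frac{1}{1 + 3.3201} \approx 0.2315 $$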

## (Next week)

## Layers of perceptrons
 
In the XOR example, two perceptrons took the raw values (a, b) as inputs. Together they can be considered a **layer** of two perceptrons. For readability, and to compute all units with one efficient matrix multiplication, we implement a layer of perceptrons as a single object instead of one object per unit.

```python
class PerceptronLayer(object):
    def __init__(self, weights, biases, activation_function=sigmoid):

        self.weights = np.array(weights)
        self.bias = np.array(biases)
        self.activation_function = activation_function

    def __call__(self, x):

        output = (self.weights @ x + self.bias)

        return self.activation_function(output)
```

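This looks exactly like the single perceptron from before.

The difference lies in the shape of the weights: instead of a single row, we pass a matrix with one row per unit, so one matrix-vector multiplication computes the pre-activations of all units at once.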
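A sketch of this equivalence using the two hand-wired XOR hidden units (the "a or b" and "a and b" perceptrons) stacked row-wise into one weight matrix: computing one dot product per unit, as separate `Perceptron` objects do, gives the same pre-activations as a single matrix-vector product.

```python
import numpy as np

# Weights of the two hand-wired XOR hidden units, stacked row-wise:
# row 0 = "a or b" unit, row 1 = "a and b" unit
W = np.array([[1.0, 1.0],
              [0.5, 0.5]])      # shape (n_units, n_inputs) = (2, 2)
b = np.array([0.0, -0.5])

x = np.array([1.0, 1.0])

# One dot product per unit (what separate Perceptron objects compute) ...
per_unit = np.array([W[i] @ x + b[i] for i in range(W.shape[0])])

# ... equals a single matrix-vector product for the whole layer
layer = W @ x + b

assert np.allclose(per_unit, layer)
print(layer)                    # pre-activations of both units: 2.0 and 0.5
```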

Let's instantiate a layer of 4 perceptrons that each connect to 3 input values.

```python
weights = np.array([[1, 3, 4],
                    [-2, -4, 1],
                    [3, 1, 9],
                    [0, 1, 0]])

biases = np.array([0, 0, 0, 0])

print(f"weights have shape {weights.shape}")

perceptron_layer = PerceptronLayer(weights, biases)
```

```text
weights have shape (4, 3)
```


```python
print(perceptron_layer([0.4, 1.2, 3.1]))
```

From the layer with 4 units, we get a 4-dimensional vector as the output instead of a single scalar.
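The 4-dimensional output has one entry per unit. A related sketch (the batch layout is an assumption, not part of `PerceptronLayer`): stacking several observations as the rows of a matrix $X$ evaluates the whole batch with one product, $X W^\top + b$, where broadcasting adds the bias to every row.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Same layer as above: 4 units, each connected to 3 inputs
weights = np.array([[1, 3, 4],
                    [-2, -4, 1],
                    [3, 1, 9],
                    [0, 1, 0]])
biases = np.zeros(4)

single = np.array([0.4, 1.2, 3.1])           # shape (3,)
batch = np.array([[0.4, 1.2, 3.1],
                  [0.0, 0.0, 0.0]])           # shape (n_observations, 3) = (2, 3)

out_single = sigmoid(weights @ single + biases)    # shape (4,)
out_batch = sigmoid(batch @ weights.T + biases)    # shape (2, 4): one row per observation

print(out_single.shape, out_batch.shape)
```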

How can we stack these layers and make sure that everything is going to work?

All we need to do is to make sure we obey the rules of matrix multiplication!

Matrix multiplication only works if the inner dimensions of the two matrices match.

Given a matrix **W** (the weights of our layer) and a vector **x** (the inputs to the layer), the dimensionality of **x** must match the number of columns of **W**. The output is then a vector whose dimensionality equals the number of rows of **W**.

This can be flipped by using $o = xW + b$ (with $x$ as a row vector) or $o = W^T x + b$. In both cases, the weights have shape (n inputs, n outputs), which may be more intuitive. Here we use the other convention, in which the weights have shape (n outputs, n inputs).

If we have data x with shape (10,) and a first layer with a weight shape of (5,10), we get a vector of shape (5,) as the output. If we want to apply another layer to this output, its weights need to be of shape (n_units, 5).
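The shape rule can be checked mechanically before building a network. A sketch with a hypothetical helper `check_layer_shapes`, using the (n outputs, n inputs) convention from above; the final layer size of 3 is an arbitrary stand-in for `n_units`.

```python
import numpy as np


def check_layer_shapes(n_inputs, weight_shapes):
    """Hypothetical helper: verify that each weight matrix of shape
    (n_outputs, n_inputs) accepts the output of the previous layer."""
    size = n_inputs
    for i, (n_out, n_in) in enumerate(weight_shapes):
        if n_in != size:
            raise ValueError(f"layer {i}: expects {n_in} inputs but receives {size}")
        size = n_out
    return size   # dimensionality of the final output


# Example from the text: x has shape (10,), layer 1 is (5, 10), layer 2 is (n_units, 5)
print(check_layer_shapes(10, [(5, 10), (3, 5)]))   # 3, with n_units = 3 chosen arbitrarily

# The same chain with real arrays
x = np.ones(10)
h = np.ones((5, 10)) @ x        # shape (5,)
y = np.ones((3, 5)) @ h         # shape (3,)
print(h.shape, y.shape)
```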

Let's define such a network, this time using randomly initialized weights so we can also scale it up:

```python
def random_weights(*shape):
    return np.random.normal(size=shape)

class MultiLayerPerceptron(object):

    def __init__(self, n_inputs):

        self.layer_1 = PerceptronLayer(weights=random_weights(64, n_inputs),
                                       biases=np.zeros(64))

        self.layer_2 = PerceptronLayer(weights=random_weights(32, 64),
                                       biases=np.zeros(32))

        self.layer_3 = PerceptronLayer(weights=random_weights(16, 32),
                                       biases=np.zeros(16))

        self.output_layer = PerceptronLayer(weights=random_weights(1, 16),
                                            biases=np.zeros(1),
                                            activation_function=linear)

    def __call__(self, x):

        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)

        return self.output_layer(x)
```

```python
mlp = MultiLayerPerceptron(n_inputs=3)

observation = [1, -1, 0.5]
prediction = mlp(observation)  # the weights are random, so this value changes between runs
print(prediction)
```

```text
[1.44910641]
```
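Because the weights are drawn at random without a fixed seed, the printed prediction differs from run to run. A sketch of one way to make it reproducible, using NumPy's `np.random.default_rng` with a seed. The functional helpers `make_mlp` and `forward` are hypothetical; they mirror the architecture of `MultiLayerPerceptron` (64, 32 and 16 sigmoid units, then one linear output unit) rather than replacing it.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def make_mlp(n_inputs, layer_sizes=(64, 32, 16, 1), seed=0):
    """Sketch: random (weights, biases) per layer, drawn from a seeded generator.

    `seed` and `layer_sizes` are illustrative parameters, not part of the class above.
    """
    rng = np.random.default_rng(seed)
    params = []
    sizes = [n_inputs, *layer_sizes]
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # weights in the (n_outputs, n_inputs) convention used in the text
        params.append((rng.normal(size=(n_out, n_in)), np.zeros(n_out)))
    return params


def forward(params, x):
    x = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(params):
        x = W @ x + b
        if i < len(params) - 1:   # sigmoid on hidden layers, linear output layer
            x = sigmoid(x)
    return x


observation = [1, -1, 0.5]
p1 = forward(make_mlp(3, seed=42), observation)
p2 = forward(make_mlp(3, seed=42), observation)
print(p1, p2)   # identical, because both networks use the same seed
```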


Now suppose we have an entire set of observations and we know the true values for what we want to predict.

**How can we adapt the weights (and the bias values) of our multi-layer perceptron to match the predictions to the ground truth given the observations?**

We will discuss how artificial neural networks can be optimized in next week's flipped classroom!