## Perceptrons

A perceptron takes in multiple binary inputs $x_1, x_2, ..., x_n$ and returns a binary output.  
  
This means that a perceptron is a map $f: \mathbb{Z}^n_2 \rightarrow \mathbb{Z}_2$.  
  
To convert these binary inputs into a binary output we use weights $w_1, w_2, ..., w_n$, where $w_j \in \mathbb{R}$. Each input is scaled by its weight, the scaled inputs are summed, and the output is determined by whether that sum is above or below a defined threshold.  
  
output = $\begin{cases} 0 & \text{if } \sum_j w_j x_j \leq \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$  
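
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `perceptron_output` and the example weights and threshold are assumptions chosen for the demo, not values from these notes.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Hypothetical example: three binary inputs, the first weighted most heavily.
weights = [6.0, 2.0, 2.0]
threshold = 5.0

print(perceptron_output([1, 0, 0], weights, threshold))  # weighted sum 6 > 5
print(perceptron_output([0, 1, 1], weights, threshold))  # weighted sum 4 <= 5
```

With these made-up weights the first input alone is enough to push the sum over the threshold, while the other two inputs together are not.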
  
So the aim is to use the degrees of freedom given by the weights and the thresholds to come up with a model to make our decisions.  
  
### Network of perceptrons 
  
To make the perceptron model more plausible in its decision making we mimic the approach that the brain uses: we build layers of connected perceptrons, which allows ever more subtle and complex decisions to be made in later layers. A perceptron in the first layer can only threshold a weighted sum (a dot product) of the raw inputs, so it can't model very complex decisions. Passing the outputs of the first layer in as the inputs to a second layer allows the second layer to make more complex decisions.  
  
### Mathematical tidy up  
  
Previously we spoke about an arbitrary threshold. The more common way to describe this is as a bias $b$, where $b \equiv -\text{threshold}$.  
  
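Writing the weighted sum as a dot product, $w \cdot x \equiv \sum_j w_j x_j$, and moving the threshold to the left-hand side, we can rewrite the previous equation as:  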
  
output = $\begin{cases} 0 & \text{if } w \cdot x + b \leq 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$  
  
### As logical operators  
  
Whilst perceptrons can be used to weigh up evidence as described above, they can also be used to implement basic logical functions. Consider a single perceptron with two inputs where $w_1 = w_2 = -2$ and $b = 3$. In this instance, if the input is $00$, $10$ or $01$ then the perceptron outputs $1$, and if the input is $11$ the output is $0$. This means the above perceptron is a NAND gate. This is useful because the NAND gate is functionally complete, meaning any truth table can be built out of combinations of NAND gates.  
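
As a quick check of the truth table, here is a small sketch that evaluates the perceptron from the paragraph above on all four inputs. The weights and bias are the ones given in the text; the function names `perceptron` and `nand` are just illustrative.

```python
from itertools import product


def perceptron(inputs, weights, bias):
    """Return 1 if w . x + b > 0, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Weights and bias taken from the text: both weights are -2 and b = 3.
NAND_WEIGHTS = (-2, -2)
NAND_BIAS = 3


def nand(a, b):
    return perceptron((a, b), NAND_WEIGHTS, NAND_BIAS)


# Enumerate every two-bit input and print the perceptron's output.
for a, b in product((0, 1), repeat=2):
    print(a, b, "->", nand(a, b))
```

Only the input `1 1` gives $w \cdot x + b = -1 \leq 0$, so it is the only case that outputs $0$.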
  
This shows the benefit of a network of perceptrons: at the simplest level we can build any logical circuit if needed. However, the benefit over fixed logic gates comes from the ability to tune the weights and biases and therefore model even more complex decisions.
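
To illustrate building a circuit from perceptrons, the sketch below wires four NAND perceptrons (weights $-2, -2$ and bias $3$) into an XOR gate. XOR is chosen here purely as an example and the wiring is the standard four-NAND construction; the helper names are assumptions for the demo.

```python
def nand(a, b):
    """A perceptron with weights (-2, -2) and bias 3, which acts as a NAND gate."""
    return 1 if (-2 * a) + (-2 * b) + 3 > 0 else 0


def xor(a, b):
    """XOR built from a small network of NAND perceptrons."""
    first = nand(a, b)        # first layer: shared intermediate output
    left = nand(a, first)     # second layer
    right = nand(b, first)    # second layer
    return nand(left, right)  # output layer


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```

No single perceptron can compute XOR on its own, so this is a small case where the extra layers are doing real work.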

## Sigmoid neurons  
  
To implement a learning algorithm, what we want is an algorithm that automatically adjusts the weights and biases of the network of neurons. 
Ideally, for this to work, a small change in the weights and biases should cause a correspondingly small change in the network's output, allowing for gradual improvements.  

If the network exhibits this property, we can make these gradual improvements by comparing the output we got to the desired output and then making a small change to the weights and biases to nudge it closer to the desired output.  

Perceptron networks, however, do not exhibit this behaviour. A small change in a weight or bias can cause the output of a perceptron to flip completely. This makes the network hard to control, i.e. hard to "train".  
   
The first change that we make, so that a small change in the weights and biases causes a small change in the output, is to change the *activation function* of the neuron. The inputs and the output are no longer restricted to $0$ and $1$ but can take any value in between, so the neuron is now a mapping:  
  
$f: [0,1]^n \rightarrow [0,1]$  
  
For a sigmoid neuron the output is:  
  
$\sigma(w \cdot x + b)$  
  
where:  
  
$\sigma(z) \equiv \frac{1}{1 + e^{-z}}$  
  
We can see that in the limits, when $z \ll 0$ we get $\sigma(z) \approx 0$ and when $z \gg 0$ we get $\sigma(z) \approx 1$. This shows that in the extreme cases the sigmoid neuron behaves similarly to the perceptron. However, between these limits the output varies smoothly with $z$, which gives us finer control of the neuron.
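
A small numerical sketch of both points, assuming a single-input neuron with input $x = 1$ and bias $b = 0$ (these values and the function names are made up for illustration): the sigmoid saturates at the extremes, and a tiny change in the weight flips a perceptron's output while only nudging the sigmoid neuron's.

```python
import math


def sigmoid(z):
    """The sigmoid function, 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """The perceptron rule: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


# Limiting behaviour: close to 0 for very negative z, close to 1 for very positive z.
for z in (-10.0, 0.0, 10.0):
    print(f"sigmoid({z}) = {sigmoid(z):.5f}")

# Hypothetical single-input neuron: nudge the weight from -0.01 to 0.01.
x, b = 1.0, 0.0
for w in (-0.01, 0.01):
    z = w * x + b
    print(f"w = {w}: perceptron = {step(z)}, sigmoid neuron = {sigmoid(z):.4f}")
```

The perceptron output jumps from $0$ to $1$ across that tiny weight change, whereas the sigmoid output moves by well under $0.01$.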

## Neural network architecture  
  
*Terminology for a neural network: the input layer and output layer are self-explanatory, and the hidden layers are every layer in between.*  

Often the design of the input layer is a given as it matches the space of the input data (likewise for the output).  