## Perceptrons: the basis of Artificial Neural Networks

## Artificial Intelligence 1, week 7

This week: Artificial Neural Networks part 1
1. Intro video
2. Then taking you through how they developed, step by step
   1. Logical calculus
   2. Perceptrons
   3. Multilayer perceptrons
3. Biological metaphor



Video. https://www.youtube.com/watch?v=bxe2T-V8XRs
[![Simple introduction to artificial neural networks](https://img.youtube.com/vi/bxe2T-V8XRs/0.jpg)](https://www.youtube.com/watch?v=bxe2T-V8XRs)
    

<h2> The development of neural networks </h2>

## Step 1: Linking logic with computational units <img src="figures/Perceptron.png" style = "float:right" width=40%>
- McCulloch and Pitts: the Logical Calculus:  
  1940s, predates high-level programming languages
- Link the logic of the “mind” with the functioning of the brain

    UNIT with inputs 

    OUTPUT  ‘all or none’ output 

    CONNECTION - weighted  
    
    THRESHOLD = the sum of all weighted inputs must exceed this for the unit to fire  


## Logical Calculus Units <img src="figures/Perceptron.png" style = "float:right">

- INPUT: 0 or 1
- OUTPUT: 0 or 1
- SYNAPSE = weighted connection:  
  w<sub>1</sub>, w<sub>2</sub> and bias_weight are each either +1, 0 or -1
- THRESHOLD = 0
- BIAS INPUT – clamped at 1  
  weighting this lets us change the effective threshold

- Output = 1 if the weighted sum of inputs + bias > 0  
  Output = 0 if the weighted sum of inputs + bias ≤ 0
  
The code cell below implements this.  
To be consistent with the format for models in sklearn, we will call the 
output() function predict()


In [1]:
class two_input_logical_calculus_unit:
    """A two-input unit whose weights are restricted to -1, 0 or +1."""

    def __init__(self, weight1: int, weight2: int, biasweight: int):
        valid = [-1, 0, 1]
        if weight1 not in valid or weight2 not in valid or biasweight not in valid:
            # __init__ must not return a value, so signal bad weights with an exception
            raise ValueError("Error,  weights can only be +1, -1 or 0")
        self.weight1 = weight1
        self.weight2 = weight2
        self.biasweight = biasweight

    def predict(self, input1: int, input2: int) -> int:
        valid = [0, 1]
        if input1 not in valid or input2 not in valid:
            raise ValueError("Error, inputs must be 0 or 1")
        # the bias input is clamped at 1, so it contributes 1 * biasweight
        summedInput = input1 * self.weight1 + input2 * self.weight2 + 1 * self.biasweight
        # fire only if the summed input exceeds the threshold of 0
        if summedInput > 0:
            return 1
        else:
            return 0

        

## How the logical calculus works

Take the connectives from logic (OR, AND, NOR, NAND, XOR)

Look at their truth tables

Input1 value | Input2 value | OR(In1,In2) | AND(In1,In2) | NOR(In1,In2) | NAND(In1,In2) | XOR(In1,In2)
-------------|--------------|-------------|---------------|-------------|----------------|---------------
0            | 0            | 0           |    0          | 1           | 1              | 0
0            | 1            | 1           | 0             | 0           | 1              | 1
1            | 0            | 1           | 0             | 0           | 1              | 1     
1            | 1            | 1           | 1             | 0           | 0              | 0     

## Hand-crafted examples
 <img src = "figures/logical_calculus_or.png" style="float:right" width = 30%> 
 
 OR has no bias,   
 so is off by default (zero inputs).    
 But a signal from  **either** input is enough to turn the output on.
 

 <img src="figures/logical_calculus_nor.png" style="float:right" width = 30%>
 
 NOR is the opposite: A bias of +1 turns it on with no inputs.  
 But weights from both inputs are negative, so a signal from **either**  is enough to turn it off
 

 <img src = "figures/logical_calculus_and.png" style="float:right" width = 30%>
 
 AND has a bias of -1,   
 so it is off unless there is a positive signal from **both** inputs.    
 **Note:** the bias weight is the only difference between AND and OR.
 

In [2]:
orUnit  = two_input_logical_calculus_unit( 1,1,0)
andUnit = two_input_logical_calculus_unit( 1,1,-1)
norUnit = two_input_logical_calculus_unit( -1,-1,1)

for in1 in range(2):
    for in2 in range(2):
        inputs = "inputs = {}{}:  ".format(in1,in2)
        print ("  {}myOR = {}, myAND = {}, myNOR = {}".format(inputs,orUnit.predict(in1,in2),andUnit.predict(in1,in2),norUnit.predict(in1,in2)))


  inputs = 00:  myOR = 0, myAND = 0, myNOR = 1
  inputs = 01:  myOR = 1, myAND = 0, myNOR = 0
  inputs = 10:  myOR = 1, myAND = 0, myNOR = 0
  inputs = 11:  myOR = 1, myAND = 1, myNOR = 0


## What about XOR ?

- no set of weights makes **one** unit behave like XOR
- but we can make it by combining others:    
  XOR(in1,in2) = AND ( OR(in1,in2), NAND(in1,in2) )
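
The composition above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the notebook's own class: the helper names `unit`, `NOT`, `NAND` and `XOR` are assumptions made for this example. One detail worth noting: with weights limited to -1, 0 or +1 and a strict "> 0" threshold, no single unit produces NAND, so the sketch builds NAND as NOT(AND).

```python
def unit(in1, in2, w1, w2, bias_w):
    """Logical calculus unit: output 1 only if the weighted sum plus bias is > 0."""
    return 1 if in1 * w1 + in2 * w2 + bias_w > 0 else 0

def OR(a, b):
    return unit(a, b, 1, 1, 0)

def AND(a, b):
    return unit(a, b, 1, 1, -1)

def NOT(a):
    # the second input is unused, so its weight is 0
    return unit(a, 0, -1, 0, 1)

def NAND(a, b):
    # built from two units: an AND unit feeding a NOT unit
    return NOT(AND(a, b))

def XOR(a, b):
    # the composition from the text: AND( OR(a,b), NAND(a,b) )
    return AND(OR(a, b), NAND(a, b))

xor_table = [(a, b, XOR(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in xor_table:
    print("inputs = {}{}:  myXOR = {}".format(a, b, out))
```

The point of the sketch is that XOR needs units feeding other units: the output unit never sees the raw inputs, only the outputs of OR and NAND.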
    
## What's the big deal? 
Logical Calculus units demonstrate how **logic** (true/false) can be implemented by **computation**
- and so you can express  logical structures in terms of networks of computational units

Transistors can easily be configured to implement logical calculus units
- and so we get modern "Integrated Circuits", which may contain millions of transistors
  - Apple's new M1 chip has 16 billion transistors!

Hence the modern digital computer.

## These would be great for building models but…

How do we choose the weights? 

Can we get the computer to do it for us?
 - The search space would be all possible configurations of nodes connected by +/- 1 weights
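
For a single two-input unit that search space is tiny: 3 choices for each of 3 weights gives 27 configurations, so a computer can simply try them all. The sketch below does that under the assumptions used so far (inputs of 0 or 1, weights of -1, 0 or +1, output 1 only when the sum is > 0). The names `unit`, `find_weights` and `TARGETS` are illustrative only, and it searches single units, not networks of them.

```python
from itertools import product

def unit(in1, in2, w1, w2, bias_w):
    """Logical calculus unit: output 1 only if the weighted sum plus bias is > 0."""
    return 1 if in1 * w1 + in2 * w2 + bias_w > 0 else 0

# the four input rows, in the same order as the truth table
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# desired output column for each connective
TARGETS = {
    "OR":   (0, 1, 1, 1),
    "AND":  (0, 0, 0, 1),
    "NOR":  (1, 0, 0, 0),
    "NAND": (1, 1, 1, 0),
    "XOR":  (0, 1, 1, 0),
}

def find_weights(target):
    """Return every (w1, w2, bias_w) whose outputs match the target column."""
    return [
        weights
        for weights in product((-1, 0, 1), repeat=3)
        if tuple(unit(a, b, *weights) for a, b in INPUTS) == target
    ]

solutions = {name: find_weights(target) for name, target in TARGETS.items()}
for name, found in solutions.items():
    print(name, found)
```

Under these assumptions the search recovers exactly one setting each for OR, AND and NOR, and finds nothing for NAND or XOR. Exhaustive search like this stops being practical once there are many units or the weights are real numbers, which is why a learning rule is needed.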
 
Rosenblatt claimed to have the answer in the late 1950s and early 1960s,
based on a simple model of how a nerve cell works.


## Step 2: Inspiration from biology
What are computers good at?
- calculation 
- memory.

BUT traditional architecture is very poor at some tasks
- tasks that we accomplish with ease. 
- Input: Recognition
- Output: Control

So can we use biological metaphors to guide engineering decisions, and make more useful tools?

**Artificial Neural Networks** are algorithms and architectures inspired by the functioning of the biological nervous system.  


## Basic neurobiology <img src="figures/neuron.jpg" style = "float:right">

**Neurons** (nerve cells):  
about 10^11 (100 billion) in humans 

**Dendrites** – incoming activity

**Axon** – output of neuron


**Synapses** – connection from axon of one neuron to dendrite of another 
- hundreds to thousands per neuron
- Each synapse has a weight

**Action potential** - electrical signal.  
In the soma the incoming electrical activity is summed over
- space (many dendrites connected to other neurons)
- time (not usually in ANN except 'spiking neural networks')
- if the sum exceeds a **threshold** then there is an ‘all or none’ output, which travels down the axon and is passed via synapses to the dendrites of other neurons, where it adds to their inputs ...



**The synaptic weights can be changed during learning (or forgetting)**



## Now answer the multiple choice questions

Electrical signals reach the cell body through the: [soma | dendrites | axons | myelin sheath]

The number of cells in your brain is: [much less than | about the same as | much bigger than] the number of stars in our galaxy


The number of connections in your brain is: [much less than | about the same as | much bigger than] the number of stars in our galaxy

Perceptron Research from the 50's & 60's, 1 min clip. https://www.youtube.com/watch?v=cNxadbrN_aI

[![Perceptron Research from the 50's & 60's, clip](https://img.youtube.com/vi/cNxadbrN_aI/0.jpg)](https://www.youtube.com/watch?v=cNxadbrN_aI)


## Step 3: The Rosenblatt Perceptron - the simplest Artificial Neural Network <img src="figures/Perceptron.png" style="float:right">

Similar to the logical calculus BUT **weights can be any real number**. 

However, there is only 1 'layer': just inputs and units.   
Units cannot be connected to other units, so there is 1 unit per output.

The code is just like above, but now the weights can be floats rather than being constrained to -1, 0 or +1.

### Rosenblatt proposed an algorithm to get the computer to choose the weights. 
- Once this was done, perceptrons could handle problems, e.g. *learning* image recognition (letters)
- The idea is to **supervise** the procedure (sound familiar?) 
  - give the perceptron a list of inputs and DESIRED outputs. 
  - the perceptron will learn from that and generalize to previously unseen inputs.


## Perceptron Training Law

    ∆ω = ε · i · α

change in weight = error × input × learning rate.

So this means that 
- Error = target - actual, so it can be negative 
- Weights only change when there is an error 
- Only the weights from active inputs are changed. 
- Weights from inactive (x=0) inputs are not changed at all (which makes sense since they did not contribute to the error). 

See AI Illuminated p297+ for a worked example
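
As a small numerical illustration of the training law, the sketch below applies one update to a unit that wrongly fires. The function name `perceptron_step`, the starting weights and the learning rate of 0.25 are all made up for this example, and the bias weight is stored first in the list with its input clamped at 1.

```python
def perceptron_step(weights, inputs, target, learning_rate):
    """Apply one update of the training law; weights[0] is the bias weight."""
    x = [1] + list(inputs)  # the bias input is clamped at 1
    summed = sum(w * xi for w, xi in zip(weights, x))
    actual = 1 if summed > 0 else 0
    error = target - actual
    # change in weight = error x input x learning rate, for every weight
    new_weights = [w + error * xi * learning_rate for w, xi in zip(weights, x)]
    return new_weights, error

weights = [0.5, 0.5, 0.5]  # illustrative values: bias weight, w1, w2
new_weights, error = perceptron_step(weights, inputs=(1, 0), target=0, learning_rate=0.25)
print("error =", error)              # error = -1
print("new weights =", new_weights)  # new weights = [0.25, 0.25, 0.5]
```

The unit fired (summed input 1.0) when the target was 0, so the error is -1. The bias weight and w1 both drop by 0.25, while w2 is untouched because its input was 0.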


In [3]:
class two_input_perceptron:
    def  __init__(self,weight1, weight2,biasweight, learningRate):
        self.weight1 = weight1
        self.weight2 = weight2
        self.biasweight = biasweight
        self.learningRate = learningRate ## <== this is new
        
    def predict(self,input1:int,input2:int) -> int:
        summedInput = input1*self.weight1 +input2*self.weight2 + self.biasweight
        if summedInput>0:
            return 1
        else:
            return 0

    def update_weights(self, in1, in2, target):
        """Apply the training law for one example; return 1 if there was an error, else 0."""
        error = target - self.predict(in1, in2)
        if error != 0:
            self.biasweight += error * 1 * self.learningRate  # bias input is always +1
            # only weights from active (non-zero) inputs are changed
            if in1 > 0:
                self.weight1 += error * in1 * self.learningRate
            if in2 > 0:
                self.weight2 += error * in2 * self.learningRate
            return 1
        else:
            return 0  # let the calling function know it made the right prediction

## Now answer these questions:

Perceptrons are useful because they can learn rather than needing to be hand-coded [True| False]

The perceptron's output is: 
- +1 if the sum of the inputs is more than 0
- +1 if the weighted sum of the inputs is more than 0
- equal to the weighted sum of the inputs
- +1 if the sum of the weights is more than 0

The weight from an input i to the perceptron is not changed when we present a data example...
- If the output for this example is correct
- If the value of feature i for this case is 0
- Both the above
- Neither of the above

## The promise of perceptrons
- If a problem can be solved by a perceptron (some set of weights gives the right output for every example), then the training algorithm is guaranteed to find such a set of weights

        The perceptron convergence theorem

- Works for OR, AND, NOT and many demonstration problems…


In [5]:
from random import random

# four rows of test cases, third column is the right answer
# (the outputs 0,1,1,1 are the truth table for OR)
orData = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]

# start with random weights
w0 = random()
w1 = random()
w2 = random()
print("starting with initial random weights {:.4f}, {:.4f} and {:.4f}".format(w1, w2, w0))

myPerceptron = two_input_perceptron(w1, w2, w0, 0.1)

# just keep presenting the test cases and updating until there are no errors
for epoch in range(50):
    errors = 0
    for in1, in2, target in orData:
        errors += myPerceptron.update_weights(in1, in2, target)
    if errors > 0:
        print("in epoch {} there were {} errors".format(epoch, errors))
    else:
        print(" Perceptron solved the learning problem in {} epochs".format(epoch))
        break
    

starting with initial random weights 0.5653, 0.4213 and 0.0100
in epoch 0 there were 1 errors
 Perceptron solved the learning problem in 1 epochs


## Do you think perceptrons will be able to learn XOR?

## The problem with perceptrons <img src="figures/linearly_seperable.png" style= "float:right" width = 50%>
The Minsky & Papert book *Perceptrons* showed in detail the limitations of perceptrons 
- A perceptron can only deal with linearly separable tasks.  
  **more on this next week**
- So it cannot deal with XOR… 
- or with pretty much any real-world problem.

Rosenblatt was aware of this but didn't know how to fix it …
- neural network research went into a decline in the 1970s.
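
The limitation is easy to see in practice. The sketch below trains a single perceptron with the training law on OR and then on XOR. It is an illustration under assumptions: the function names, the all-zero starting weights, the learning rate of 0.1 and the cap of 1000 epochs are arbitrary choices for this example. Hitting the cap is not a proof on its own, but it is what the linear separability argument predicts.

```python
def predict(weights, in1, in2):
    """weights = [bias weight, w1, w2]; output 1 only if the summed input is > 0."""
    bias_w, w1, w2 = weights
    return 1 if bias_w + w1 * in1 + w2 * in2 > 0 else 0

def train(data, weights, learning_rate=0.1, max_epochs=1000):
    """Return the first epoch with no errors, or None if there was none."""
    weights = list(weights)  # work on a copy so the caller's list is unchanged
    for epoch in range(max_epochs):
        errors = 0
        for in1, in2, target in data:
            error = target - predict(weights, in1, in2)
            if error != 0:
                errors += 1
                weights[0] += error * 1 * learning_rate  # bias input is always 1
                weights[1] += error * in1 * learning_rate
                weights[2] += error * in2 * learning_rate
        if errors == 0:
            return epoch
    return None

or_data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
xor_data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
start = [0.0, 0.0, 0.0]

or_result = train(or_data, start)
xor_result = train(xor_data, start)
print("OR  solved in epoch:", or_result)
print("XOR solved in epoch:", xor_result)
```

OR is solved within a handful of epochs, whereas XOR returns `None`: an error-free epoch would need one fixed set of weights that gets all four rows right, and no such set exists for a single unit.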


## The answer ? ... add layers

You can solve XOR with logical calculus units:  
XOR(a,b) = AND( OR(a,b), NAND(a,b) )

So why not just do that for perceptrons?

## The catch ...

Training a single layer of perceptrons is easy
- We can measure what the outputs actually are
- We can apply the perceptron update rule because
- We know what the outputs should be


Training a multi-layer perceptron is harder
- We can measure what the *outputs* actually are
- so we can apply the perceptron update rule to the last layer
- But what should the output from the hidden layers be?
 

 1:46 video clip with nice animations
 [![Video of simple neural network from oolitionTech technologies](https://img.youtube.com/vi/gcK_5x2KsLA/0.jpg)](https://www.youtube.com/watch?v=gcK_5x2KsLA)
 

## Summary
- Logical Calculus units simulate logical operations but need hand-crafting
- Nature offers us inspiration in the form of nerve cells
- Perceptrons with training offer a way of automatically creating single-unit systems that can learn!
- Multi-layer perceptrons offer a way of creating more complicated systems

## Next Week: Perceptrons as linear classifiers, Multi-Layer Perceptrons
- Architecture
- Mathematical foundations
- Search/training algorithms  
  (**don’t panic!**)


Practicalities of using neural networks 
