# Lab 1

### Objectives

1. Understand and code a simple neuron
2. Understand how a neuron learns
3. Understand its limitations

In [43]:
import numpy as np
import matplotlib.pyplot as plt

Steps for this neuron:

1. Each input is multiplied by a weight.
2. All the weighted inputs are added together with a bias b.
3. The sum is passed through an activation function (in this case a sigmoid function).

This process of passing inputs forward to get an output is known as feedforward.
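As a rough illustration of the three steps for a single sample, here is a minimal sketch. The function name `feedforward` and the values of `x`, `w` and `b` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def feedforward(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = np.dot(w, x) + b               # steps 1 and 2: weight the inputs, add the bias
    return 1.0 / (1.0 + np.exp(-z))    # step 3: sigmoid activation

# Hypothetical values, for illustration only
x = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
b = 0.0

y = feedforward(x, w, b)
print(y)  # sigmoid(0.5), about 0.62
```

The output always lies strictly between 0 and 1, which is why it can be read as a soft version of a logic gate's output.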

![Simple neuron for lab 1](i/simple-neuron-lab1.svg)

Source: https://towardsdatascience.com/machine-learning-for-beginners-an-introduction-to-neural-networks-d49f22d238f9 

The first step in coding this is to set random initial weights and a random bias. Note how the shapes of W and B match the structure of the neuron shown in the image above: two inputs (two weights), one layer and one output (one bias).

In [44]:
W = np.random.randn(1, 2)
B = np.random.randn(1)

print("Weights: ", W)
print("Bias: ", B)

Weights:  [[-0.15571303  0.76570695]]
Bias:  [1.59749211]


This is our activation function. We are using a sigmoid, but other functions (e.g. a step function or tanh) could be used instead. Source: https://towardsdatascience.com/statistics-is-freaking-hard-wtf-is-activation-function-df8342cdf292

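This takes in the inputs (X), weights (W) and bias (B). For X of shape _(n, 2)_, W of shape _(1, 2)_ and B of shape _(1,)_ it returns an array of shape _(n, 1)_. The _(n, n)_ output seen later in this lab only appears once W and B have been broadcast to _n_ rows during training.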

In [45]:
def sigm(X, W, B):
    M = 1/(1+np.exp(-(X.dot(W.T)+B)))
    return M
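A quick shape check can confirm what `sigm` returns before any training has happened. This is a sketch only: the names `X_demo`, `W_demo` and `B_demo` are hypothetical stand-ins with the same shapes as the arrays used in this lab.

```python
import numpy as np

def sigm(X, W, B):
    # Same activation as above: sigmoid of the weighted sum plus bias
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(15, 2))   # 15 samples, 2 binary inputs each
W_demo = rng.standard_normal((1, 2))        # one neuron, two weights
B_demo = rng.standard_normal(1)             # one bias

out = sigm(X_demo, W_demo, B_demo)
print(out.shape)  # (15, 1): one output per sample
```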

### Putting this into code
Updating rules for the weights and bias.

Example arrays of dB and dW:
```
dB:  [ 0.23964142 -1.53671735  0.23964142  0.23964142  0.23964142  0.23964142
  0.23964142  0.23964142 -1.53671735  0.23964142  0.23964142  0.23964142
  0.23964142  0.23964142  0.23964142]

dW:  [[ 0.83152027 -0.12928806  0.83152027 -0.12928806 -0.12928806  0.83152027
   0.83152027 -0.12928806  0.83152027  0.83152027  0.83152027  0.83152027
   0.83152027  0.83152027  0.83152027]
 [ 0.63166038 -0.10823955  0.63166038 -0.10823955 -0.10823955  0.63166038
   0.63166038 -0.10823955  0.63166038  0.63166038  0.63166038  0.63166038
   0.63166038  0.63166038  0.63166038]]
  
```
diff_W() returns ```X.T.dot(dE)```

diff_B() returns ```dE.sum(axis=0)```
- This is summing all the elements in each column together, so you end up with one row
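The two expressions above can be read as gradient descent on a squared error. The derivation below is a sketch under one assumption that the lab does not state: the loss is taken as half the summed squared error (the factor of one half only rescales the learning rate). Here $\mathbf{x}_i$ is one input row of X, $y_i$ its target, $z_i$ the neuron's output and $\eta$ the learning rate.

$$
E = \tfrac{1}{2}\sum_i (y_i - z_i)^2, \qquad z_i = \sigma(\mathbf{w}\cdot\mathbf{x}_i + b), \qquad \sigma'(a) = \sigma(a)\,\bigl(1-\sigma(a)\bigr)
$$

$$
\frac{\partial E}{\partial \mathbf{w}} = -\sum_i (y_i - z_i)\, z_i (1 - z_i)\, \mathbf{x}_i, \qquad
\frac{\partial E}{\partial b} = -\sum_i (y_i - z_i)\, z_i (1 - z_i)
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_i (y_i - z_i)\, z_i (1 - z_i)\, \mathbf{x}_i, \qquad
b \leftarrow b + \eta \sum_i (y_i - z_i)\, z_i (1 - z_i)
$$

In the code, `dS` is $z_i(1-z_i)$, `dE` is $(y_i - z_i)\,z_i(1-z_i)$, `X.T.dot(dE)` is the sum over samples weighted by the inputs, and `dE.sum(axis=0)` is the plain sum over samples. Both functions therefore return the negative gradient, which is why the training loop adds them.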




In [47]:
def diff_W(X, Z, Y, B, W):

    dS = sigm(X, W, B)*(1-sigm(X, W, B)) # differentiating sigm function
    dE = (Y-Z)*dS
    #print("dW: ", X.T.dot(dE))

    return X.T.dot(dE)

def diff_B(X, Z, Y, B, W):

    dS = sigm(X, W, B)*(1-sigm(X, W, B))
    dE = (Y-Z)*dS
    #print("dE: ", dE)
    #print("dB: ", dE.sum(axis=0))

    return dE.sum(axis=0)
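One way to gain confidence in `diff_W` is to compare it with a finite-difference estimate of the gradient. The sketch below makes two assumptions that are not in the lab: the loss is half the summed squared error (the helper `half_sq_error` is hypothetical), and Y is stored as a column of shape (15, 1) so that no broadcasting to (15, 15) occurs.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def diff_W(X, Z, Y, B, W):
    dS = sigm(X, W, B) * (1 - sigm(X, W, B))  # derivative of the sigmoid
    dE = (Y - Z) * dS
    return X.T.dot(dE)

def half_sq_error(X, Y, W, B):
    # Assumed loss (hypothetical helper): half the summed squared error
    return 0.5 * np.sum((Y - sigm(X, W, B)) ** 2)

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)   # column vector, shape (15, 1)
W = rng.standard_normal((1, 2))
B = rng.standard_normal(1)

# Analytic value: diff_W returns the NEGATIVE gradient of the assumed loss
analytic = diff_W(X, sigm(X, W, B), Y, B, W)   # shape (2, 1)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.shape[1]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    numeric[0, j] = (half_sq_error(X, Y, W_plus, B)
                     - half_sq_error(X, Y, W_minus, B)) / (2 * eps)

print(np.allclose(analytic.T, -numeric, atol=1e-6))  # True
```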


We create 15 two-dimensional binary samples as input, e.g. [0, 1], and apply the bitwise OR operator (|) to get the target outputs. A second, independently generated set of 15 samples is kept for testing.

In [48]:
X = np.random.randint(2, size=[15, 2]) # produces an array size [15, 2] containing either 0 or 1
Y = np.array(X[:,0] | X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] | X_Test[:,1] ).T  # targets must come from X_Test, not X

### Teaching the neuron to emulate the OR function.

NB: Using += in this cell raised an error. The arrays are broadcastable, but the result of the addition is larger than W (or B), and numpy cannot store a larger array in place in a smaller one. Writing `W = W + ...` creates a new, larger array instead.

Broadcasting: a smaller array is “broadcast” across a larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

Sources:
- https://numpy.org/doc/stable/user/basics.broadcasting.html
- https://stackoverflow.com/questions/47493559/valueerror-non-broadcastable-output-operand-with-shape-3-1-doesnt-match-the 
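The difference between the two forms of update can be reproduced with small stand-in arrays. In this sketch the names `W_small` and `update` are hypothetical; the shapes mirror W and the transposed result of `diff_W` in this lab.

```python
import numpy as np

W_small = np.zeros((1, 2))     # same shape as the initial W
update = np.ones((15, 2))      # same shape as diff_W(...).T in this lab

# Out of place: the result is a NEW (15, 2) array, so the name silently changes shape
W_new = W_small + update
print(W_new.shape)  # (15, 2)

# In place: the (15, 2) result cannot be written back into the (1, 2) array
try:
    W_small += update
    in_place_failed = False
except ValueError as err:
    in_place_failed = True
    print("ValueError:", err)
```

The out-of-place form runs without complaint, which is convenient but also hides the fact that W no longer describes a single neuron.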

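Note that 'error' is a (15, 15) array in every epoch, and from the second epoch onwards so is 'output'. The cause is broadcasting: Y has shape (15,), while sigm initially returns a column of shape (15, 1), so (Y - Z) inside diff_W and diff_B expands to (15, 15). The updates then grow W to (15, 2) and B to (15,), which is why 15 bias values are printed below. Reshaping Y to (15, 1), e.g. with `Y.reshape(-1, 1)`, keeps W at (1, 2) and B at (1,).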

```
diff_W size = (2, 15) ( .T = (15, 2) )
diff_B size = (15,)
W size = (1, 2)
output size = (15, 15) --> WHY IS THE OUTPUT THIS SIZE
Y size = (15,)
```
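For comparison, here is a minimal sketch of a training loop in which every shape stays that of a single neuron. It assumes that one neuron is the intent and differs from the lab's loop in two ways: Y is reshaped to a column, and both updates reuse one forward pass instead of calling `diff_W` and `diff_B` separately.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)    # column vector, shape (15, 1)

W = rng.standard_normal((1, 2))
B = rng.standard_normal(1)
learning_rate = 0.1

for epoch in range(500):
    output = sigm(X, W, B)                        # (15, 1)
    dE = (Y - output) * output * (1 - output)     # (15, 1), no broadcast to (15, 15)
    W = W + learning_rate * X.T.dot(dE).T         # stays (1, 2)
    B = B + learning_rate * dE.sum(axis=0)        # stays (1,)

print(W.shape, B.shape, sigm(X, W, B).shape)  # (1, 2) (1,) (15, 1)
```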

In [55]:
learning_rate = 0.1

for epoch in range(500):
    # forward propagation
    output = sigm(X, W, B)

    # back propagation
    W = W + (learning_rate * diff_W(X, output, Y, B, W).T) # (15, 2) array: W is broadcast from (1, 2) on the first pass
    B = B + learning_rate * diff_B(X, output, Y, B, W) # array of shape (15,)

    error = (Y - output)**2

    #print("Error: ", error)
    #print("Weight: ", W)
    print("Bias: ", B)


Bias:  [ 4.07491201  4.07491201 -3.77817703  4.07491201 -3.77817703  4.07491201
 -3.84213079 -3.84213079 -3.98832698  3.88491568 -3.98832698  4.07491201
  3.80151589 -3.77817703  4.07491201]
Bias:  [ 4.07513818  4.07513818 -3.77844094  4.07513818 -3.77844094  4.07513818
 -3.84239205 -3.84239205 -3.98856637  3.8851723  -3.98856637  4.07513818
  3.80177771 -3.77844094  4.07513818]
Bias:  [ 4.07536424  4.07536424 -3.77870471  4.07536424 -3.77870471  4.07536424
 -3.84265317 -3.84265317 -3.98880563  3.8854288  -3.98880563  4.07536424
  3.8020394  -3.77870471  4.07536424]
Bias:  [ 4.0755902   4.0755902  -3.77896835  4.0755902  -3.77896835  4.0755902
 -3.84291415 -3.84291415 -3.98904478  3.88568516 -3.98904478  4.0755902
  3.80230096 -3.77896835  4.0755902 ]
Bias:  [ 4.07581604  4.07581604 -3.77923185  4.07581604 -3.77923185  4.07581604
 -3.84317501 -3.84317501 -3.98928381  3.88594139 -3.98928381  4.07581604
  3.80256238 -3.77923185  4.07581604]
Bias:  [ 4.07604177  4.07604177 -3.77949522  4.

### Repeating for the AND operator (&).

In [50]:
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] & X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] & X_Test[:,1] ).T  # targets must come from X_Test, not X

learning_rate = 0.01

for epoch in range(500):
    output = sigm(X, W, B)

    W = W + (learning_rate * diff_W(X, output, Y, B, W).T)
    B = B + (learning_rate * diff_B(X, output, Y, B, W))
    
    # looking at the first output and error at each iteration
    error = (Y - output)**2
    #print("W: ", W[1, 1])
    #print("Error: ", error[1, 1])

**General comments**
The results from logical AND are very similar to those from logical OR.

### Repeating for the XOR operator (^).

In [51]:
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] ^ X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] ^ X_Test[:,1] ).T  # targets must come from X_Test, not X

learning_rate = 0.1

for epoch in range(500):
    output = sigm(X, W, B)

    W += learning_rate * diff_W(X, output, Y, B, W).T
    B += learning_rate * diff_B(X, output, Y, B, W)

    # looking at the first output and error at each iteration
    error = (Y - output)**2
    
    print("Error: ", error)
    #print("Weight: ", W)
    #print("Bias: ", B)

Error:  [[0.83038285 0.83038285 0.00787595 0.83038285 0.00787595 0.83038285
  0.95329694 0.95329694 0.73064854 0.97042397 0.73064854 0.83038285
  0.0210889  0.00787595 0.83038285]
 [0.83038285 0.83038285 0.00787595 0.83038285 0.00787595 0.83038285
  0.95329694 0.95329694 0.73064854 0.97042397 0.73064854 0.83038285
  0.0210889  0.00787595 0.83038285]
 [0.38167724 0.38167724 0.14607659 0.38167724 0.14607659 0.38167724
  0.92700192 0.92700192 0.40603814 0.91362118 0.40603814 0.38167724
  0.1316157  0.14607659 0.38167724]
 [0.83038285 0.83038285 0.00787595 0.83038285 0.00787595 0.83038285
  0.95329694 0.95329694 0.73064854 0.97042397 0.73064854 0.83038285
  0.0210889  0.00787595 0.83038285]
 [0.38167724 0.38167724 0.14607659 0.38167724 0.14607659 0.38167724
  0.92700192 0.92700192 0.40603814 0.91362118 0.40603814 0.38167724
  0.1316157  0.14607659 0.38167724]
 [0.59875197 0.59875197 0.05117067 0.59875197 0.05117067 0.59875197
  0.97294785 0.97294785 0.61380114 0.96603434 0.61380114 0.59875

**Are we able to reproduce the XOR? Can you explain these results?**<br>
Not fully. Raising the learning rate a great deal (by orders of magnitude) brings the error closer to the AND/OR runs, but a single neuron still cannot classify all four XOR cases.<br><br>

The outputs of the AND and OR gates can be separated by a single straight line, which is all one neuron can draw. An XOR gate needs two lines, which can be achieved with a second layer of one or more neurons.<br><br>

![Two lines for XOR](i/XOR.gif)

Source: https://www.quora.com/Why-cant-we-make-an-XOR-gate-with-1-neuron <br><br>

A neural network with more than one layer can learn XOR, although training can be tricky and the results unreliable. See how: https://www.quora.com/How-do-I-code-XOR-gate-in-artificial-neural-network#:~:text=NOTE%3A%20If%20you%20are%20using,be%20quite%20tricky%20and%20unreliable. <br><br>


**What is the maximum accuracy we can get with XOR?**<br>
75%. See how in the image below, three of the four points have been correctly identified.<br>
<img src="i/best-neuron-XOR.png" width="400"> <br>
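The 75% ceiling can be checked by brute force rather than by training. The sketch below is not part of the lab: it tries every combination of two weights and a bias on a coarse, arbitrarily chosen grid, classifies the four rows of the truth table with a threshold at zero (equivalent to a sigmoid output above 0.5), and records the best accuracy found for each gate.

```python
import itertools
import numpy as np

# Full truth table for two binary inputs
X_tt = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "OR": X_tt[:, 0] | X_tt[:, 1],
    "AND": X_tt[:, 0] & X_tt[:, 1],
    "XOR": X_tt[:, 0] ^ X_tt[:, 1],
}

grid = np.linspace(-2, 2, 9)   # candidate values: -2, -1.5, ..., 2 (arbitrary choice)

best = {}
for name, y in targets.items():
    best_acc = 0.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        # A weighted sum above 0 is the same as a sigmoid output above 0.5
        pred = (X_tt @ np.array([w1, w2]) + b > 0).astype(int)
        best_acc = max(best_acc, float(np.mean(pred == y)))
    best[name] = best_acc

print(best)  # {'OR': 1.0, 'AND': 1.0, 'XOR': 0.75}
```

A grid search cannot prove the limit on its own, but it agrees with the geometric argument: no single line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.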

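**If you really wanted to solve the XOR just using one neuron, can you think of any solutions that we could apply?**<br>
A high learning rate does not help: with the two raw inputs the accuracy is still capped at 75%. A more promising route is to change what the neuron sees, for example by adding a nonlinear feature of the inputs, so that the problem becomes linearly separable.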
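One possible workaround, sketched here as an assumption rather than as the lab's intended answer, is to give the single neuron a third input equal to the product of the first two. XOR can be written as x1 + x2 - 2*x1*x2, so it is linear in the augmented inputs. The weights `W_hand` and bias `B_hand` below are hand-picked, hypothetical values and were not obtained by training.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

X_tt = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y_xor = X_tt[:, 0] ^ X_tt[:, 1]

# Augment each sample with the product x1*x2 as a third input
X_aug = np.column_stack([X_tt, X_tt[:, 0] * X_tt[:, 1]])

# Hand-picked (hypothetical) parameters for one neuron with three inputs
W_hand = np.array([[1.0, 1.0, -2.0]])
B_hand = np.array([-0.5])

pred = (sigm(X_aug, W_hand, B_hand) > 0.5).astype(int).ravel()
print(pred)  # [0 1 1 0]
```

This is still one neuron, but it no longer works on the raw inputs alone, so whether it counts as "just one neuron" depends on how strictly the question is read.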