# Lab 1

### Objectives

1. Understand and code a simple neuron
2. Understand how a neuron learns
3. Understand its limitations

In [2]:
import numpy as np
import matplotlib.pyplot as plt

Steps for this neuron:
1. Each input is multiplied by a weight.
2. All the weighted inputs are added together with a bias b.
3. The sum is passed through an activation function (in this case a sigmoid function).
This process of passing inputs forward to get an output is known as feedforward.
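As a minimal sketch of the three steps for a single input, the cell below uses hypothetical values (x = [2, 3], w = [0, 1], b = 4) chosen only for illustration. `sigmoid` is a helper defined here, not part of the lab.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical example values, chosen by hand for illustration
x = np.array([2.0, 3.0])   # two inputs x1, x2
w = np.array([0.0, 1.0])   # one weight per input
b = 4.0                    # bias

# Steps 1 and 2: weight each input and add the bias
a = np.dot(w, x) + b       # 0*2 + 1*3 + 4 = 7
# Step 3: pass the sum through the activation function
z = sigmoid(a)
print(a, z)
```

The output `z` is close to 1 because the weighted sum (7) is large and positive.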

![Simple neuron for lab 1](images/simple-neuron-lab1.svg)

Source: https://towardsdatascience.com/machine-learning-for-beginners-an-introduction-to-neural-networks-d49f22d238f9 

The first step in coding this is to set random initial weights and a random bias. Note how the shapes of W (1, 2) and B (1,) match the structure of the neuron shown in the image above: there are 2 inputs, one layer and one output.

In [3]:
W = np.random.randn(1, 2)
B = np.random.randn(1)

print("Weights: ", W)
print("Bias: ", B)

Weights:  [[0.32991949 0.61625761]]
Bias:  [1.00468733]


This is our activation function. Here it is a sigmoid, $\sigma(a) = 1/(1 + e^{-a})$, but other functions can be chosen (e.g. a step function or tanh). Source: https://towardsdatascience.com/statistics-is-freaking-hard-wtf-is-activation-function-df8342cdf292

`sigm` takes the inputs (X), weights (W) and bias (B) and returns $\sigma(X W^T + B)$. For an input X with _n_ samples (shape _(n, 2)_), W of shape _(1, 2)_ and B of shape _(1,)_, the output has shape _(n, 1)_: one value per sample.
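A quick sketch of these shapes, assuming n = 15 random binary samples and the same `sigm` as below (redefined here so the cell runs on its own). The seeded generator `rng` is an illustrative addition for reproducibility.

```python
import numpy as np

def sigm(X, W, B):
    # Same feedforward as the lab's sigm: sigmoid(X·Wᵀ + B)
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

rng = np.random.default_rng(0)
n = 15
X = rng.integers(0, 2, size=(n, 2))   # n samples, 2 inputs each
W = rng.standard_normal((1, 2))       # one neuron, two weights
B = rng.standard_normal(1)            # one bias

out = sigm(X, W, B)
print(out.shape)   # (15, 1): one output per sample
```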

In [4]:
def sigm(X, W, B):
    # Feedforward: sigmoid of the weighted sum X·Wᵀ plus the bias B
    M = 1/(1+np.exp(-(X.dot(W.T)+B)))
    return M

### Derivation of functions

Derive an analytical expression for the gradient of the error function with respect to the weights and the bias. In this lab, the squared error is used as the error function.

Gradient descent uses this gradient to update the weights and the bias.
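A sketch of the derivation for _n_ training samples, assuming the common convention $E = \tfrac{1}{2}\sum_i (y_i - z_i)^2$ (the ½ only cancels the 2 from differentiating the square; without it, the factor 2 can be absorbed into the learning rate $\eta$). Here $x_{ij}$ is input _j_ of sample _i_ and $z_i$ is the neuron's output; the sums are what `diff_W` and `diff_B` compute.

$$
\begin{aligned}
a_i &= \mathbf{w}\cdot\mathbf{x}_i + b, \qquad z_i = \sigma(a_i), \qquad \sigma'(a) = \sigma(a)\bigl(1 - \sigma(a)\bigr) \\
E &= \frac{1}{2}\sum_{i=1}^{n} (y_i - z_i)^2 \\
\frac{\partial E}{\partial w_j} &= \sum_{i=1}^{n} \frac{\partial E}{\partial z_i}\,\frac{\partial z_i}{\partial a_i}\,\frac{\partial a_i}{\partial w_j} = -\sum_{i=1}^{n} (y_i - z_i)\, z_i (1 - z_i)\, x_{ij} \\
\frac{\partial E}{\partial b} &= -\sum_{i=1}^{n} (y_i - z_i)\, z_i (1 - z_i) \\
\mathbf{w} &\leftarrow \mathbf{w} - \eta\,\frac{\partial E}{\partial \mathbf{w}} = \mathbf{w} + \eta \sum_{i=1}^{n} (y_i - z_i)\, z_i (1 - z_i)\, \mathbf{x}_i \\
b &\leftarrow b - \eta\,\frac{\partial E}{\partial b} = b + \eta \sum_{i=1}^{n} (y_i - z_i)\, z_i (1 - z_i)
\end{aligned}
$$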

In [5]:
# insert image of derivation here

### Putting this into code
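Updating rules for the weights and bias. `diff_W` and `diff_B` return the *negative* gradient of the squared error (taking $E = \tfrac{1}{2}\sum (y - z)^2$), which is why the training loops below add them to `W` and `B` rather than subtracting them.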

In [6]:
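def diff_W(X, Z, Y, B, W):
    # Returns -dE/dW for E = 1/2 * sum((Y - Z)**2), with Z the neuron's output
    S = sigm(X, W, B)
    dS = S*(1-S)  # derivative of the sigmoid: sigma(a) * (1 - sigma(a))
    dW = (Y-Z)*dS

    return X.T.dot(dW)  # dot product between X transpose and dW

def diff_B(X, Z, Y, B, W):
    # Returns -dE/dB: the same error term, summed over the samples
    S = sigm(X, W, B)
    dS = S*(1-S)
    dB = (Y-Z)*dS

    return dB.sum(axis=0)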

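One way to confirm that the update rules match the derivation is a finite-difference gradient check. The sketch below assumes the ½ convention for the squared error and a column-vector target `Y` of shape (15, 1); `loss`, `eps` and the seeded `rng` are illustrative names, not part of the lab. It compares `-diff_W` with central differences of the loss.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def diff_W(X, Z, Y, B, W):
    # Same expression as the lab: X^T · ((Y - Z) * sigma'(a))
    S = sigm(X, W, B)
    return X.T.dot((Y - Z) * S * (1 - S))

def loss(X, Y, W, B):
    # Squared error with the 1/2 convention: E = 1/2 * sum((y - z)^2)
    return 0.5 * np.sum((Y - sigm(X, W, B)) ** 2)

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)   # column vector, shape (15, 1)
W = rng.standard_normal((1, 2))
B = rng.standard_normal(1)

analytic = -diff_W(X, sigm(X, W, B), Y, B, W).T   # dE/dW, shape (1, 2)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.shape[1]):
    Wp, Wm = W.copy(), W.copy()
    Wp[0, j] += eps
    Wm[0, j] -= eps
    numeric[0, j] = (loss(X, Y, Wp, B) - loss(X, Y, Wm, B)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be very small
```

A large difference here would point to a sign or shape error in the update rule.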

Create 15 random two-dimensional binary samples (e.g. [0, 1]) as training inputs, and compute the targets with the bitwise OR operator (|). A second set of 15 random samples, generated independently, is used as the test set.
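For reference, the bitwise operators used for the targets in this lab give the following truth table over the four possible inputs; `inputs` is defined here just for illustration.

```python
import numpy as np

# All four possible two-bit inputs
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
a, b = inputs[:, 0], inputs[:, 1]

print("x1 x2 | OR AND XOR")
for x1, x2, y_or, y_and, y_xor in zip(a, b, a | b, a & b, a ^ b):
    print(f" {x1}  {x2} |  {y_or}   {y_and}   {y_xor}")
```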

In [7]:
X = np.random.randint(2, size=[15, 2]) # produces an array size [15, 2] containing either 0 or 1
Y = np.array(X[:,0] | X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] | X_Test[:,1] ).T  # test targets come from X_Test, not X

### Teaching the neuron to emulate the OR function.

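NB: Using `W += ...` raised an error. The update `diff_W(...).T` has shape (15, 2) while `W` has shape (1, 2). The two are broadcastable, so `W + update` works and creates a new (15, 2) array, but in-place addition must store the result in `W`'s existing (1, 2) array, which numpy refuses.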

Broadcasting: when two arrays of different shapes are combined, the smaller array is “broadcast” across the larger one so that they have compatible shapes. This vectorizes array operations so that looping occurs in C instead of Python, without making needless copies of data, and usually leads to efficient implementations.
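A small sketch reproducing both effects with placeholder arrays of the lab's shapes (the zero/one values are arbitrary): a (15,) array minus a (15, 1) array broadcasts to (15, 15), and an in-place `+=` fails when the result is larger than the left-hand array.

```python
import numpy as np

Y = np.zeros(15)            # shape (15,), like the lab's targets
Z = np.zeros((15, 1))       # shape (15, 1), like sigm's first output
print((Y - Z).shape)        # (15, 15): both arrays are stretched

W = np.zeros((1, 2))
update = np.ones((15, 2))
print((W + update).shape)   # (15, 2): a new, larger array is created

raised = False
try:
    W += update             # in-place: the result would have to fit in W's (1, 2) buffer
except ValueError as err:
    raised = True
    print("ValueError:", err)
```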

Sources:
- https://numpy.org/doc/stable/user/basics.broadcasting.html
- https://stackoverflow.com/questions/47493559/valueerror-non-broadcastable-output-operand-with-shape-3-1-doesnt-match-the 

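Note that `error` is a (15, 15) array at every epoch. The cause is a shape mismatch: `Y` has shape (15,), because `.T` has no effect on a 1-D array, while `sigm` returns a column of shape (15, 1). Subtracting the two broadcasts them to (15, 15), and the mismatch then spreads to the parameters:

1. In the first epoch, `output` is (15, 1), so `(Y-Z)*dS` in `diff_W` is (15, 15) and `X.T.dot(dW)` is (2, 15).
2. Its transpose, (15, 2), is added to `W` (1, 2), so `W` is broadcast up to (15, 2).
3. `diff_B` sums a (15, 15) array over axis 0, so `B` becomes (15,).
4. From the second epoch on, `X.dot(W.T)` is (15, 2)·(2, 15) = (15, 15), so `output` is (15, 15) as well.

```
diff_W size = (2, 15) ( .T = (15, 2) )
diff_B size = (15,)
W size      = (1, 2) initially, (15, 2) after the first update
B size      = (1,) initially, (15,) after the first update
output size = (15, 1) in the first epoch, (15, 15) afterwards
Y size      = (15,)
```

Reshaping `Y` into a (15, 1) column vector removes the mismatch.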
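A minimal sketch of one fix, assuming the rest of the lab stays the same: build `Y` as a column vector with `.reshape(-1, 1)` so that `Y - output` stays (15, 1). The sketch also computes both gradients from the same `W` and `B` before updating either (in the lab's loop, `diff_B` is evaluated with the already-updated `W`). The seed, `initial_loss` and `final_loss` are illustrative additions.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def diff_W(X, Z, Y, B, W):
    S = sigm(X, W, B)
    return X.T.dot((Y - Z) * S * (1 - S))       # (2, 1)

def diff_B(X, Z, Y, B, W):
    S = sigm(X, W, B)
    return ((Y - Z) * S * (1 - S)).sum(axis=0)  # (1,)

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)   # column vector: (15, 1), not (15,)

W = rng.standard_normal((1, 2))
B = rng.standard_normal(1)
initial_loss = 0.5 * np.sum((Y - sigm(X, W, B)) ** 2)

learning_rate = 0.1
for epoch in range(500):
    output = sigm(X, W, B)                    # (15, 1)
    grad_W = diff_W(X, output, Y, B, W)       # both gradients use the same W and B
    grad_B = diff_B(X, output, Y, B, W)
    W = W + learning_rate * grad_W.T          # stays (1, 2)
    B = B + learning_rate * grad_B            # stays (1,)

error = (Y - sigm(X, W, B)) ** 2              # (15, 1)
final_loss = 0.5 * np.sum(error)
print(W.shape, B.shape, error.shape)          # (1, 2) (1,) (15, 1)
```

With the shapes fixed, `W += ...` would also work, since the update now has the same shape as `W`.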

In [13]:
learning_rate = 0.1

x_plot = []
y_plot = []

for epoch in range(500):
    output = sigm(X, W, B)

    W = W + (learning_rate * diff_W(X, output, Y, B, W).T)
    B = B + learning_rate * diff_B(X, output, Y, B, W)

    error = (Y - output)**2

    print("Error: ", error)
    #print("Weight: ", W)
    #print("Bias: ", B)


Error:  [[3.64902277e-026 2.15665595e-006 3.64902277e-026 2.15665595e-006
  1.18562408e-018 2.13084055e-006 2.12652655e-006 2.12652655e-006
  8.08016464e-161 8.08016464e-161 2.12652655e-006 2.12652655e-006
  8.08016464e-161 2.13084055e-006 2.13084055e-006]
 [4.50829830e-027 1.12624487e-008 4.50829830e-027 1.12624487e-008
  2.58403272e-019 1.16826277e-008 1.41729982e-008 1.41729982e-008
  4.95128620e-164 4.95128620e-164 1.41729982e-008 1.41729982e-008
  4.95128620e-164 1.16826277e-008 1.16826277e-008]
 [4.50829830e-027 1.12624487e-008 4.50829830e-027 1.12624487e-008
  2.58403272e-019 1.16826277e-008 1.41729982e-008 1.41729982e-008
  4.95128620e-164 4.95128620e-164 1.41729982e-008 1.41729982e-008
  4.95128620e-164 1.16826277e-008 1.16826277e-008]
 [3.64902277e-026 2.15665595e-006 3.64902277e-026 2.15665595e-006
  1.18562408e-018 2.13084055e-006 2.12652655e-006 2.12652655e-006
  8.08016464e-161 8.08016464e-161 2.12652655e-006 2.12652655e-006
  8.08016464e-161 2.13084055e-006 2.13084055e-0

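The training cell above prints a full error array at every epoch. A sketch of an alternative records one number per epoch, the mean squared error, in lists named after the lab's otherwise unused `x_plot` and `y_plot`. `train` is a hypothetical helper; the targets are assumed to be a (15, 1) column, and the gradient uses $\sigma'(a) = z(1 - z)$ with the current output.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def train(X, Y, learning_rate=0.1, epochs=500, seed=0):
    """Train one sigmoid neuron with full-batch gradient descent.

    Returns the final weights, bias, and the mean squared error per epoch.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((1, X.shape[1]))
    B = rng.standard_normal(1)
    x_plot, y_plot = [], []
    for epoch in range(epochs):
        Z = sigm(X, W, B)                  # (n, 1)
        delta = (Y - Z) * Z * (1 - Z)      # (n, 1): (y - z) * sigma'(a)
        W = W + learning_rate * X.T.dot(delta).T
        B = B + learning_rate * delta.sum(axis=0)
        x_plot.append(epoch)
        y_plot.append(float(np.mean((Y - Z) ** 2)))  # one number per epoch
    return W, B, x_plot, y_plot

X = np.random.default_rng(3).integers(0, 2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)
W, B, x_plot, y_plot = train(X, Y)
print(y_plot[0], y_plot[-1])
```

With the lab's `matplotlib.pyplot as plt` import, `plt.plot(x_plot, y_plot)` then draws the error curve.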
### Repeating for the AND operator (&).

In [10]:
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] & X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] & X_Test[:,1] ).T  # test targets come from X_Test, not X

learning_rate = 0.01

for epoch in range(500):
    output = sigm(X, W, B)

    W = W + (learning_rate * diff_W(X, output, Y, B, W).T)
    B = B + (learning_rate * diff_B(X, output, Y, B, W))
    
    # uncomment to inspect one weight and one error entry at each iteration
    error = (Y - output)**2
    #print("W: ", W[1, 1])
    #print("Error: ", error[1, 1])

**General comments**
The results for logical AND are very similar to those for logical OR.
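The lab creates `X_Test` and `Y_Test` but never evaluates them. A minimal sketch of test-set accuracy, thresholding the sigmoid output at 0.5; `accuracy` is a hypothetical helper, and the AND weights are hand-picked for illustration rather than learned.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def accuracy(X, Y, W, B, threshold=0.5):
    """Fraction of samples whose thresholded output matches the 0/1 target."""
    predictions = (sigm(X, W, B) > threshold).astype(int)   # (n, 1)
    return float(np.mean(predictions == Y.reshape(-1, 1)))  # reshape avoids (n,) vs (n, 1) broadcasting

# Hand-picked (hypothetical) weights that implement AND: only x1 = x2 = 1 gives a > 0
W_and = np.array([[10.0, 10.0]])
B_and = np.array([-15.0])

X_Test = np.random.default_rng(7).integers(0, 2, size=(15, 2))
Y_Test = X_Test[:, 0] & X_Test[:, 1]
print(accuracy(X_Test, Y_Test, W_and, B_and))
```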

### Repeating for the XOR operator (^).

In [15]:
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] ^ X[:,1] ).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] ^ X_Test[:,1] ).T  # test targets come from X_Test, not X

learning_rate = 0.1

for epoch in range(500):
    output = sigm(X, W, B)

    W = W + learning_rate * diff_W(X, output, Y, B, W).T
    B = B + learning_rate * diff_B(X, output, Y, B, W)

    # squared error for every sample, printed at each iteration
    error = (Y - output)**2
    
    print("Error: ", error)
    #print("Weight: ", W)
    #print("Bias: ", B)

Error:  [[1.00000000e+000 9.99355595e-001 1.00000000e+000 9.99355595e-001
  8.34044884e-019 1.48541428e-146 9.71879891e-008 1.00000000e+000
  1.00000000e+000 1.00000000e+000 9.99376597e-001 9.99376597e-001
  1.00000000e+000 1.48541428e-146 8.35301547e-008]
 [1.00000000e+000 9.99355595e-001 1.00000000e+000 9.99355595e-001
  8.34044884e-019 1.48541428e-146 9.71879891e-008 1.00000000e+000
  1.00000000e+000 1.00000000e+000 9.99376597e-001 9.99376597e-001
  1.00000000e+000 1.48541428e-146 8.35301547e-008]
 [1.00000000e+000 9.99355595e-001 1.00000000e+000 9.99355595e-001
  8.34044884e-019 1.48541428e-146 9.71879891e-008 1.00000000e+000
  1.00000000e+000 1.00000000e+000 9.99376597e-001 9.99376597e-001
  1.00000000e+000 1.48541428e-146 8.35301547e-008]
 [1.00000000e+000 9.99355595e-001 1.00000000e+000 9.99355595e-001
  8.34044884e-019 1.48541428e-146 9.71879891e-008 1.00000000e+000
  1.00000000e+000 1.00000000e+000 9.99376597e-001 9.99376597e-001
  1.00000000e+000 1.48541428e-146 8.35301547e-0

**Are we able to reproduce the XOR? Can you explain these results?**</br>
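No. Many of the errors above stay close to 1, meaning those samples are misclassified. Raising the learning rate, even by orders of magnitude, only changes the size of each weight update; it cannot make a single neuron reproduce XOR. </br></br>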

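The outputs of AND and OR gates can be separated by a single straight line in the (x1, x2) plane, and a single neuron's decision boundary (where X·Wᵀ + B = 0) is exactly such a line. An XOR gate needs two lines, which can be achieved with a second layer of neuron/s.</br></br>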

![Two lines for XOR](images/XOR.gif)

Source: https://www.quora.com/Why-cant-we-make-an-XOR-gate-with-1-neuron </br></br>
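A sketch of the two-layer idea, with weights set by hand rather than learned and a step activation in place of the sigmoid (both are simplifications for illustration): one hidden neuron computes OR, the other NAND, and the output neuron ANDs them together.

```python
import numpy as np

def step(a):
    """Step activation: 1 where a > 0, else 0."""
    return (a > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: two neurons, i.e. the two lines (weights hand-picked)
W1 = np.array([[1, 1],      # OR:   x1 + x2 - 0.5 > 0
               [-1, -1]])   # NAND: -x1 - x2 + 1.5 > 0
B1 = np.array([-0.5, 1.5])
H = step(X.dot(W1.T) + B1)  # (4, 2)

# Output neuron: AND of the two hidden outputs
W2 = np.array([[1, 1]])
B2 = np.array([-1.5])
out = step(H.dot(W2.T) + B2).ravel()
print(out)   # [0 1 1 0]
```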

It is possible to make an artificial neural network for XOR, although results can be unreliable. See how: https://www.quora.com/How-do-I-code-XOR-gate-in-artificial-neural-network#:~:text=NOTE%3A%20If%20you%20are%20using,be%20quite%20tricky%20and%20unreliable. </br></br>


**What is the maximum accuracy we can get with XOR?**</br>
75%. No single line separates all four XOR points, so the best a single neuron can do is classify three of the four correctly, as in the image below.</br>
<img src="./images/best-neuron-XOR.png" width="400"> </br>
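A brute-force illustration (not a proof) of the 75% ceiling: search a coarse grid of weights and biases and keep the best accuracy on the four XOR inputs. Thresholding the sigmoid at 0.5 is the same as testing w1·x1 + w2·x2 + b > 0, so the sketch skips the sigmoid; the grid range and resolution are arbitrary choices.

```python
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = X[:, 0] ^ X[:, 1]                      # [0, 1, 1, 0]

# A single neuron predicts 1 where w1*x1 + w2*x2 + b > 0
grid = np.linspace(-3, 3, 25)
best = 0.0
for w1, w2, b in product(grid, grid, grid):
    pred = (X.dot([w1, w2]) + b > 0).astype(int)
    best = max(best, float(np.mean(pred == Y)))
print(best)   # 0.75
```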

**If you really wanted to solve the XOR just using one neuron, can you think of any solutions that we could apply?**</br>
A higher learning rate alone does not help: with only the raw inputs x1 and x2, a single neuron is capped at 75% accuracy, whatever the learning rate.
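One workaround that goes beyond this lab is to change the inputs rather than the learning rate: adding the product x1·x2 as a third input (feature engineering) makes XOR linearly separable, so a single sigmoid neuron can represent it. A minimal sketch with hand-picked weights, not trained, assuming the lab's `sigm`; `X3` is an illustrative name.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# Hypothetical extra feature: the product x1*x2 as a third input
X3 = np.column_stack([X, X[:, 0] * X[:, 1]])   # shape (4, 3)

# Hand-picked weights: a = 10*x1 + 10*x2 - 20*x1*x2 - 5
W = np.array([[10.0, 10.0, -20.0]])
B = np.array([-5.0])
pred = (sigm(X3, W, B) > 0.5).astype(int).ravel()
print(pred)   # [0 1 1 0]
```

The neuron itself is unchanged; only its inputs are, so this does not contradict the 75% limit for the raw inputs.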