# Lab 1

### Objectives

1. Understand and code a simple neuron
2. Understand how a neuron learns
3. Understand its limitations

```python
import numpy as np
```

Initialise the weights and bias with random values from a standard normal distribution.

```python
W = np.random.randn(1, 2)
B = np.random.randn(1)

print("Weights: ", W)
print("Bias: ", B)
```

```
Weights:  [[0.56716598 1.29896192]]
Bias:  [3.06609247]
```


This is our activation function, the sigmoid. Other activation functions can be chosen instead (e.g. a step function or tanh). Source: https://towardsdatascience.com/statistics-is-freaking-hard-wtf-is-activation-function-df8342cdf292

```python
def sigm(X, W, B):
    # sigmoid of the weighted sum: X (n, 2), W (1, 2), B (1,) -> output (n, 1)
    M = 1/(1+np.exp(-(X.dot(W.T)+B)))
    return M
```
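Gradient descent needs the derivative of the activation, and the sigmoid's is σ'(a) = σ(a)(1 − σ(a)), the expression computed as `dS` in the update functions. A short sketch comparing the sigmoid with the two alternatives mentioned, acting directly on the pre-activation `a`; the names `sigmoid` and `step` are illustrative and not part of the lab:

```python
import numpy as np

def sigmoid(a):
    # smooth activation; output lies strictly between 0 and 1
    return 1 / (1 + np.exp(-a))

def step(a):
    # hard threshold; output is exactly 0 or 1, but the derivative is 0 almost everywhere
    return (a > 0).astype(float)

a = np.linspace(-5, 5, 11)   # pre-activation values a = X.dot(W.T) + B
print("sigmoid:", np.round(sigmoid(a), 3))
print("tanh:   ", np.round(np.tanh(a), 3))   # range (-1, 1)
print("step:   ", step(a))

# tanh is a rescaled sigmoid: tanh(a) = 2 * sigmoid(2a) - 1
print(np.allclose(np.tanh(a), 2 * sigmoid(2 * a) - 1))

# the sigmoid derivative identity, checked against central differences
eps = 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
analytic = sigmoid(a) * (1 - sigmoid(a))
print("max abs difference:", np.max(np.abs(numeric - analytic)))
```

The step function's zero derivative is why it cannot be trained with this gradient-based rule; a perceptron uses a different update.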

### Derivation of functions

Derive an analytical expression for the derivative of the error function with respect to the weights and bias. In this lab, the squared error function is used as the error function.

This derivative is what gradient descent uses to update the weights.
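A sketch of the derivation for a single sample, with notation chosen here to match `sigm`, `diff_W` and `diff_B`: input row vector $x$ (shape (1, 2)), target $y$, learning rate $\eta$, pre-activation $a = x W^\top + B$ and output $z = \sigma(a)$.

$$
\begin{aligned}
E &= (y - z)^2, \qquad z = \sigma(a), \qquad a = x W^\top + B \\
\frac{\partial E}{\partial W} &= \frac{\partial E}{\partial z}\,\frac{\partial z}{\partial a}\,\frac{\partial a}{\partial W}
  = -2\,(y - z)\,\sigma(a)\bigl(1 - \sigma(a)\bigr)\,x \\
\frac{\partial E}{\partial B} &= -2\,(y - z)\,\sigma(a)\bigl(1 - \sigma(a)\bigr) \\
W &\leftarrow W - \eta\,\frac{\partial E}{\partial W} = W + 2\eta\,(y - z)\,\sigma(a)\bigl(1 - \sigma(a)\bigr)\,x \\
B &\leftarrow B - \eta\,\frac{\partial E}{\partial B} = B + 2\eta\,(y - z)\,\sigma(a)\bigl(1 - \sigma(a)\bigr)
\end{aligned}
$$

Summing over all samples gives the batch update. The constant factor 2 can be absorbed into $\eta$; with that, the terms match what `diff_W` and `diff_B` compute, and the minus sign of gradient descent becomes the plus sign in the update.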

### Putting this into code
Updating rules for the weights and bias.

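```python
# note: Y and Z should both have shape (n, 1); otherwise Y - Z broadcasts to (n, n)
def diff_W(X, Z, Y, B, W):

    dS = sigm(X, W, B)*(1-sigm(X, W, B))  # derivative of the sigmoid: sigma(a) * (1 - sigma(a))
    dW = (Y-Z)*dS

    return X.T.dot(dW)  # dot product between X transpose and dW

def diff_B(X, Z, Y, B, W):

    dS = sigm(X, W, B)*(1-sigm(X, W, B))
    dB = (Y-Z)*dS

    return dB.sum(axis=0)  # same error term as diff_W, summed over the samples
```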
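A quick way to confirm that `diff_W` and `diff_B` implement the derivative is a finite-difference gradient check. The sketch below copies the two functions, uses a column-vector `Y` of shape (15, 1) (an assumption; the lab's `Y` has shape (15,)), and compares them with -1/2 times a numerical gradient of the summed squared error. `sse` is a hypothetical helper, and the seeded `default_rng` replaces `np.random` for reproducibility.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

# copies of the lab's update functions
def diff_W(X, Z, Y, B, W):
    dS = sigm(X, W, B) * (1 - sigm(X, W, B))
    return X.T.dot((Y - Z) * dS)          # (2, 1) when Y is (n, 1)

def diff_B(X, Z, Y, B, W):
    dS = sigm(X, W, B) * (1 - sigm(X, W, B))
    return ((Y - Z) * dS).sum(axis=0)     # (1,)

def sse(X, Y, W, B):
    # hypothetical helper: summed squared error over all samples
    return np.sum((Y - sigm(X, W, B)) ** 2)

rng = np.random.default_rng(1)
X = rng.integers(2, size=(15, 2))
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)    # column vector, shape (15, 1)
W = rng.standard_normal((1, 2))
B = rng.standard_normal(1)

# central finite differences of the summed squared error
eps = 1e-6
num_gW = np.zeros_like(W)
for k in range(W.shape[1]):
    step = np.zeros_like(W)
    step[0, k] = eps
    num_gW[0, k] = (sse(X, Y, W + step, B) - sse(X, Y, W - step, B)) / (2 * eps)
num_gB = (sse(X, Y, W, B + eps) - sse(X, Y, W, B - eps)) / (2 * eps)

Z = sigm(X, W, B)
# d(SSE)/dW = -2 * sum((y - z) * sigma'(a) * x), so diff_W should equal -1/2 of it
print(np.allclose(diff_W(X, Z, Y, B, W).T, -0.5 * num_gW, atol=1e-6))
print(np.allclose(diff_B(X, Z, Y, B, W), -0.5 * num_gB, atol=1e-6))
```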


Create 15 random two-dimensional binary samples and their OR outputs, for both training and testing.

Note that the vertical line, `|`, is the bitwise OR operator; for inputs that are 0 or 1 it gives the same result as logical OR.

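```python
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] | X[:,1]).T  # .T has no effect on a 1-D array: Y has shape (15,)

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] | X_Test[:,1]).T  # labels computed from X_Test, not X
```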
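Because `X[:, 0] | X[:, 1]` is one-dimensional, the trailing `.T` leaves `Y` with shape (15,), not a column vector. A small check, using NumPy's seeded `default_rng` instead of `np.random.randint` so the result is reproducible (`Y_1d` and `Y_col` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(2, size=(15, 2))            # 15 random binary samples

Y_1d = np.array(X[:, 0] | X[:, 1]).T         # .T does nothing to a 1-D array
Y_col = (X[:, 0] | X[:, 1]).reshape(-1, 1)   # explicit column vector

print(Y_1d.shape)    # (15,)
print(Y_col.shape)   # (15, 1)

# for 0/1 integers, bitwise OR agrees with logical OR
print(np.array_equal(Y_1d, np.logical_or(X[:, 0], X[:, 1]).astype(int)))   # True
```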

### Teaching the neuron to emulate the OR function.

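NB: There was an error when `+=` was used. On the first iteration the update term `diff_W(...).T` has shape (15, 2) while `W` has shape (1, 2); the two are broadcastable, but an in-place operation cannot resize `W` to the larger broadcast shape. Writing `W = W + ...` avoids the error only because it creates a new array with the broadcast shape, so after the first update `W` has shape (15, 2) and `B` has shape (15,), instead of holding a single neuron's parameters. The underlying cause is that `Y` has shape (15,) while `output` has shape (15, 1), so `Y - output` broadcasts to a (15, 15) matrix; reshaping `Y` with `Y.reshape(-1, 1)` keeps every array at its intended shape.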

Broadcasting: when arrays of different shapes are combined, the smaller array is “broadcast” across the larger one so that they have compatible shapes. Broadcasting vectorizes array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

Sources:
- https://numpy.org/doc/stable/user/basics.broadcasting.html
- https://stackoverflow.com/questions/47493559/valueerror-non-broadcastable-output-operand-with-shape-3-1-doesnt-match-the 
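The shape problem in the NB can be reproduced in isolation. The arrays below are stand-ins with the same shapes as `Y`, `output`, `W` and the first update term in this lab (their values are arbitrary):

```python
import numpy as np

Y = np.ones(15)                  # shape (15,), like Y in this lab
output = np.full((15, 1), 0.5)   # shape (15, 1), like sigm(X, W, B)

print((Y - output).shape)        # (15, 15): every target paired with every output

W = np.zeros((1, 2))
update = np.ones((15, 2))        # shape of diff_W(...).T on the first iteration
inplace_failed = False
try:
    W += update                  # in-place: the (15, 2) result cannot fit into (1, 2)
except ValueError as err:
    inplace_failed = True
    print("ValueError:", err)

W = W + update                   # rebinding: W silently becomes (15, 2)
print(W.shape)                   # (15, 2)

# with a column-vector target the shapes stay as intended
print((Y.reshape(-1, 1) - output).shape)   # (15, 1)
```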

```python
learning_rate = 100

for epoch in range(500):
    output = sigm(X, W, B)  # (15, 1) on the first iteration
    # shapes on the first iteration:
    # diff_W(...) is (2, 15), so diff_W(...).T is (15, 2)
    # diff_B(...) is (15,)
    # W starts as (1, 2), so W + ... broadcasts it to (15, 2)

    W = W + (learning_rate * diff_W(X, output, Y, B, W).T)
    B = B + learning_rate * diff_B(X, output, Y, B, W)

    # print one element (index [1, 1]) of W and of the error matrix at each iteration
    error = (Y - output)**2
    print("W: ", W[1, 1])
    print("Error: ", error[1, 1])
```


```
W:  -0.9414690092697594
Error:  8.815796212976728e-06
W:  -0.9436400403195915
Error:  8.373268528001199e-06
W:  -0.945702405645021
Error:  7.972706469836933e-06
W:  -0.9476661917585139
Error:  7.608409565869172e-06
W:  -0.9495401520863133
Error:  7.275666533056534e-06
W:  -0.9513319314942865
Error:  6.9705492956491044e-06
W:  -0.9530482454791976
Error:  6.6897564926007895e-06
W:  -0.9546950245841725
Error:  6.43049311787027e-06
W:  -0.9562775318342877
Error:  6.190376908660781e-06
W:  -0.9578004590221837
Error:  5.967364785976163e-06
W:  -0.9592680062547229
Error:  5.759694503318873e-06
W:  -0.960683948134356
Error:  5.565837953535327e-06
W:  -0.9620516891813334
Error:  5.3844635012265884e-06
W:  -0.9633743105286872
Error:  5.214405366945102e-06
W:  -0.9646546094879074
Error:  5.054638568195488e-06
W:  -0.9658951332520481
Error:  4.904258274129311e-06
W:  -0.9670982077480113
Error:  4.762462692113989e-06
W:  -0.9682659624517898
Error:  4.628538800271599e-06
W:  -0.9694003518255536
...
```

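**How W and the error change with each iteration** </br>
W changes slightly in the same direction at every iteration (here it decreases), and the printed error decreases steadily as the number of epochs approaches 500.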
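Because `Y` has shape (15,) while `sigm(X, W, B)` returns (15, 1), the loop above broadcasts `W` to (15, 2), and the printed values are single elements of larger arrays. A minimal sketch of the same update rule with `Y` reshaped to a column vector, so that `W` keeps its (1, 2) shape and the reported error is the mean squared error over all samples. Assumptions not taken from the lab: the four truth-table rows are used as training data instead of 15 random samples, the learning rate is 1.0, a seeded `default_rng` replaces `np.random.randn`, and `train` is a hypothetical helper name.

```python
import numpy as np

def sigm(X, W, B):
    # X: (n, 2), W: (1, 2), B: (1,)  ->  output: (n, 1)
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

def train(X, Y, learning_rate=1.0, epochs=5000, seed=0):
    """Batch gradient descent on the squared error for one sigmoid neuron.

    Y must be a column vector of shape (n, 1) so that Y - Z stays (n, 1).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((1, 2))
    B = rng.standard_normal(1)
    for epoch in range(epochs):
        Z = sigm(X, W, B)                          # (n, 1)
        delta = (Y - Z) * Z * (1 - Z)              # (y - z) * sigma'(a), shape (n, 1)
        W = W + learning_rate * X.T.dot(delta).T   # (2, 1) transposed to (1, 2)
        B = B + learning_rate * delta.sum(axis=0)  # (1,)
    mse = np.mean((Y - sigm(X, W, B)) ** 2)        # error over all samples, not one element
    return W, B, mse

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
for name, Y in [("OR", X[:, 0] | X[:, 1]), ("AND", X[:, 0] & X[:, 1])]:
    Y = Y.reshape(-1, 1)
    W, B, mse = train(X, Y)
    predictions = (sigm(X, W, B) > 0.5).astype(int).ravel()
    print(name, "W:", W, "B:", B, "MSE:", mse, "predictions:", predictions)
```

Averaging over all samples makes the error comparable across learning rates and epoch counts. The numbers from this sketch will not match the values reported in this lab, which came from the original loop.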

**Increasing the number of iterations** </br>
The number of iterations was set to 1000, then 2000, then 40000 and finally 8000. In each case the final error value got smaller, but it never reached zero. With 8000 epochs, the final error value was 0.00011786992787626743.

**Changing the learning rate:**</br>
The learning rate was changed and the final error value recorded in the table below.</br>

| Learning rate | Final error value      |
|---------------|------------------------|
| 0.01          | 7.896464798111002e-05  |
| 0.04          | 8.065431814065625e-05  |
| 0.08          | 6.765797698067773e-05  |
| 0.12          | 5.5594127298506185e-05 |
| 0.16          | 4.4823914746097494e-05 |
| 0.20          | 3.600934133541541e-05  |

For fun, the learning rate was also set to 10 and then 100. The errors got smaller still, but they never reached zero.

**Does the error go to zero? Why not? What could we do to make it exactly zero?**
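Whilst increasing the learning rate or the number of iterations generally reduces the error, it never reaches zero. The sigmoid output lies strictly between 0 and 1, so it can only approach the targets 0 and 1 as the weights and bias grow without bound; with finite weights the squared error is always positive. Changing the error function would not remove this limit, because the output itself never equals the target. To make the error exactly zero, the output has to be thresholded, for example by replacing the sigmoid with a step function (as in a perceptron) or by rounding the output to 0 or 1 before computing the error.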
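A small sketch of this limit, using hand-picked OR weights (an assumption for illustration, not trained values) scaled by a factor `k`. Scaling leaves the decision boundary x1 + x2 = 0.5 unchanged; the squared error shrinks as `k` grows but stays above zero, while thresholding the output at 0.5 gives exactly zero error. Very large weights would eventually make the floating-point output round to exactly 1.0, so `k` is kept modest here.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = (X[:, 0] | X[:, 1]).reshape(-1, 1)      # OR targets as a column vector

errors, thresholded = [], []
for k in [2, 4, 8, 16]:
    # hand-picked OR weights scaled by k: the boundary x1 + x2 = 0.5 stays the same
    W = np.array([[k, k]], dtype=float)
    B = np.array([-k / 2])
    Z = sigm(X, W, B)
    errors.append(np.sum((Y - Z) ** 2))                              # never exactly 0
    thresholded.append(np.sum((Y - (Z > 0.5).astype(int)) ** 2))   # exactly 0
    print(f"k={k:2d}  squared error={errors[-1]:.3e}  thresholded error={thresholded[-1]}")
```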

### Repeating for the AND operator (`&`).

```python
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] & X[:,1]).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] & X_Test[:,1]).T  # labels computed from X_Test, not X

learning_rate = 0.01

for epoch in range(500):
    output = sigm(X, W, B)

    W = W + (learning_rate * diff_W(X, output, Y, B, W).T)
    B = B + (learning_rate * diff_B(X, output, Y, B, W))

    # print one element (index [1, 1]) of W and of the error matrix at each iteration
    error = (Y - output)**2
    print("W: ", W[1, 1])
    print("Error: ", error[1, 1])
```

```
W:  -1.07083025382841
Error:  1.2226645304769793e-06
W:  -1.070830260990424
Error:  1.222664349164097e-06
W:  -1.0708302681524369
Error:  1.22266416785127e-06
W:  -1.0708302753144485
Error:  1.2226639865384978e-06
W:  -1.070830282476459
Error:  1.2226638052257786e-06
W:  -1.0708302896384683
Error:  1.2226636239131145e-06
W:  -1.0708302968004764
Error:  1.2226634426005056e-06
W:  -1.0708303039624834
Error:  1.2226632612879493e-06
W:  -1.070830311124489
Error:  1.2226630799754478e-06
W:  -1.0708303182864936
Error:  1.222662898663002e-06
W:  -1.070830325448497
Error:  1.2226627173506089e-06
W:  -1.070830332610499
Error:  1.222662536038271e-06
W:  -1.0708303397725
Error:  1.2226623547259858e-06
W:  -1.0708303469345
Error:  1.222662173413756e-06
W:  -1.0708303540964985
Error:  1.2226619921015806e-06
W:  -1.070830361258496
Error:  1.2226618107894585e-06
W:  -1.0708303684204923
Error:  1.2226616294773916e-06
W:  -1.0708303755824873
Error:  1.2226614481653799e-06
W:  -1.0708303827444812
...
```

**General comments**
The results for logical AND are very similar to those for logical OR.

### Repeating for the XOR operator (`^`).

```python
X = np.random.randint(2, size=[15, 2])
Y = np.array(X[:,0] ^ X[:,1]).T

X_Test = np.random.randint(2, size=[15, 2])
Y_Test = np.array(X_Test[:,0] ^ X_Test[:,1]).T  # labels computed from X_Test, not X

learning_rate = 100

for epoch in range(500):
    output = sigm(X, W, B)

    # W = W + ... rather than +=, as in the OR and AND cells
    W = W + learning_rate * diff_W(X, output, Y, B, W).T
    B = B + learning_rate * diff_B(X, output, Y, B, W)

    # print one element (index [1, 1]) of W and of the error matrix at each iteration
    error = (Y - output)**2
    print("W: ", W[1, 1])
    print("Error: ", error[1, 1])
```

```
W:  -0.7603660560825478
Error:  0.9988339625879754
W:  0.3695693098850392
Error:  0.9960012154059942
W:  0.4206001188162342
Error:  2.999832645151828e-05
W:  0.4599208052313669
Error:  2.4715993549957858e-05
W:  0.49190639072261755
Error:  2.115459789752368e-05
W:  0.5188485377754998
Error:  1.8562388852235977e-05
W:  0.5421047511689993
Error:  1.6577971596277636e-05
W:  0.5625477934911783
Error:  1.5003118719371314e-05
W:  0.5807728189536133
Error:  1.37189410710485e-05
W:  0.5972041798281509
Error:  1.2649338281723243e-05
W:  0.6121552274385326
Error:  1.1743113160339666e-05
W:  0.6258639842842004
Error:  1.0964437882687336e-05
W:  0.6385155229490013
Error:  1.0287423768971384e-05
W:  0.6502565952580187
Error:  9.692859759555282e-06
W:  0.6612055289044692
Error:  9.16616623518672e-06
W:  0.6714591197235589
Error:  8.696063639652861e-06
W:  0.6810975527717229
Error:  8.273678925305888e-06
W:  0.6901879928311728
Error:  7.891929613901165e-06
W:  0.6987872542836022
Error:  7.54508923077
...
```

**Are we able to reproduce the XOR? Can you explain these results?**</br>
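In these runs, the learning rate had to be several orders of magnitude higher (100 rather than the 0.01 used for AND) for the printed error to look similar to the AND/OR results. However, the printed error is not evidence that the neuron has learned XOR: because `Y` has shape (15,) while `output` has shape (15, 1), `Y - output` broadcasts to a (15, 15) matrix, and `error[1, 1]` is a single element of it rather than the error over all samples. </br></br>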

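The outputs of the AND and OR gates are linearly separable: a single straight line in the (x1, x2) plane separates the inputs that give 0 from those that give 1, and a single neuron's decision boundary is exactly such a line. An XOR gate needs two lines, which can be achieved with a second layer of neurons.</br></br>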

![Two lines for XOR](XOR.gif)

Source: https://www.quora.com/Why-cant-we-make-an-XOR-gate-with-1-neuron </br></br>
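A sketch of the single-line argument, using a threshold (step) activation instead of the sigmoid so that outputs are exactly 0 or 1; `neuron` is a hypothetical helper and all weights are hand-picked. For XOR a single line would need b ≤ 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b ≤ 0; adding the middle two gives w1 + w2 + 2b > 0, while adding the outer two gives w1 + w2 + 2b ≤ 0, a contradiction. The grid search below only illustrates this. Combining two lines (OR and NAND) in a hidden layer and taking their AND does reproduce XOR.

```python
import itertools
import numpy as np

def neuron(X, w, b):
    # threshold neuron: outputs 1 on the positive side of the line w.x + b = 0
    return (X.dot(w) + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_out = neuron(X, np.array([1, 1]), -1.5)   # line x1 + x2 = 1.5
or_out = neuron(X, np.array([1, 1]), -0.5)    # line x1 + x2 = 0.5
print("AND:", and_out)   # [0 0 0 1]
print("OR: ", or_out)    # [0 1 1 1]

# two layers: hidden neurons draw two lines (OR and NAND), the output neuron ANDs them
h_or = neuron(X, np.array([1, 1]), -0.5)
h_nand = neuron(X, np.array([-1, -1]), 1.5)
H = np.column_stack([h_or, h_nand])
xor_out = neuron(H, np.array([1, 1]), -1.5)
print("XOR:", xor_out)   # [0 1 1 0]

# no single line from a coarse grid of weights reproduces XOR
xor = X[:, 0] ^ X[:, 1]
grid = np.linspace(-2, 2, 9)
found = any(
    np.array_equal(neuron(X, np.array([w1, w2]), b), xor)
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print("single-line XOR found on grid:", found)   # False
```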

It is possible to make an artificial neural network for XOR, although results can be unreliable. See how: https://www.quora.com/How-do-I-code-XOR-gate-in-artificial-neural-network#:~:text=NOTE%3A%20If%20you%20are%20using,be%20quite%20tricky%20and%20unreliable. </br></br>


**If you really wanted to solve the XOR just using one neuron, can you think of any solutions that we could apply?**</br>
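A higher learning rate alone does not help: it only changes the size of each weight update, while the neuron's decision boundary is still a single straight line, and no straight line separates the XOR outputs. Two options that do work with one neuron are to add a non-linear input feature, such as the product x1·x2, so that XOR becomes linearly separable in the expanded input space, or to use a non-monotonic activation function, for example a Gaussian exp(−a²) with a = x1 + x2 − 1, which is high only between two parallel lines.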
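A sketch of the feature-expansion option: keep a single sigmoid neuron and the same gradient-descent update, but give it a third input x1·x2 (a hand-crafted feature, not part of the original lab). For binary inputs XOR = x1 + x2 − 2·x1·x2, which is linear in the expanded inputs. Zero initialisation, learning rate 1.0, 20000 epochs and training on the four truth-table rows are arbitrary choices for this sketch.

```python
import numpy as np

def sigm(X, W, B):
    return 1 / (1 + np.exp(-(X.dot(W.T) + B)))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = (X[:, 0] ^ X[:, 1]).reshape(-1, 1)        # XOR targets as a column vector

# hypothetical feature expansion: append x1 * x2 as a third input
X3 = np.column_stack([X, X[:, 0] * X[:, 1]])

W = np.zeros((1, 3))   # zero initialisation keeps the run deterministic
B = np.zeros(1)
learning_rate = 1.0
for epoch in range(20000):
    Z = sigm(X3, W, B)
    delta = (Y - Z) * Z * (1 - Z)               # (y - z) * sigma'(a)
    W = W + learning_rate * X3.T.dot(delta).T   # (3, 1) transposed to (1, 3)
    B = B + learning_rate * delta.sum(axis=0)

predictions = (sigm(X3, W, B) > 0.5).astype(int).ravel()
print("W:", W, "B:", B)
print("predictions:", predictions, "targets:", Y.ravel())
```

The neuron itself is unchanged; the extra input moves the problem into a space where one plane suffices, which is the same job the hidden layer does in a two-layer network.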