## XOR


| X1    | X2       | Out  |
| :-----: |:--------:| :----:|
|   0   |    0     |   0  |
|   0   |    1     |   1  |
|   1   |    0     |   1  |
|   1   |    1     |   0  |



In [1]:
import numpy as np

In [2]:
input_features = np.array([[0,0], [0,1], [1,0], [1,1]])
target_output = np.array([[0,1,1,0]])
target_output = target_output.reshape(4,1)

## Perceptron

In [3]:
weights = np.array([[0.1], [0.2]])
bias = 0.3
lr = 0.01

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):  # we need the derivative in GD
    return sigmoid(x) * (1 - sigmoid(x))

for epoch in range(50000):
    
    inputs = input_features                  
    in_o = np.dot(inputs, weights) + bias    #feed-forward input
    out_o = sigmoid(in_o)           # feed-forward output        
    error = out_o - target_output   # back-propogation 
    
    x = error.sum()
    
    if epoch % 1000 == 0:
        print(f'epoch {epoch}. Error {x}')

    derr_dout = error                     # 1st deriv
    dout_din = sigmoid_derivative(out_o)  # 2nd deriv

    deriv = derr_dout * dout_din 
    
    inputs = input_features.T           # 3rd deriv
    
    deriv_final = np.dot(inputs,deriv)  # that's the one we were looking for

    weights -= lr * deriv_final    # update weights 
    for i in deriv:
        bias -= lr * i             # update bias

epoch 0. Error 0.441245814351761
epoch 1000. Error 0.012968341182077792
epoch 2000. Error 0.0014554728665179817
epoch 3000. Error 0.0008567365564094986
epoch 4000. Error 0.0006128457699500767
epoch 5000. Error 0.00044187489655800327
epoch 6000. Error 0.00031877054481288525
epoch 7000. Error 0.00022999445989568823
epoch 8000. Error 0.00016595310319361678
epoch 9000. Error 0.0001197483532638377
epoch 10000. Error 8.640994101538624e-05
epoch 11000. Error 6.235401942189522e-05
epoch 12000. Error 4.4995566408134735e-05
epoch 13000. Error 3.2469692264003314e-05
epoch 14000. Error 2.34308954326079e-05
epoch 15000. Error 1.690834970302646e-05
epoch 16000. Error 1.2201542122503017e-05
epoch 17000. Error 8.804994031219593e-06
epoch 18000. Error 6.353953234783383e-06
epoch 19000. Error 4.58521177743032e-06
epoch 20000. Error 3.308834914172998e-06
epoch 21000. Error 2.387761952049594e-06
epoch 22000. Error 1.723086658134143e-06
epoch 23000. Error 1.2434356946866565e-06
epoch 24000. Error 8.9730404

Good. Error is decreasing.

**Good??** Let's make predictions.

In [14]:
for pair in [[0,0], [0,1], [1,0], [1,1]]:
    point = np.array([1,0])
    res1 = np.dot(point, weights) + bias  # step1
    res2 = sigmoid(res1)     # step2
    print(pair, '-->', res2)

[0, 0] --> [0.5]
[0, 1] --> [0.5]
[1, 0] --> [0.5]
[1, 1] --> [0.5]


All results are the same...

**XOR is not linearly separable. Perceptron won't work.** 

## Why?

If you look at the truth table of the operator you find the following situation:

<img src="xor.png" width="300"/>

There is no linear function that can correctly separate the two classes.

**Solution**: add a hidden layer with 2 neurons. One for each "line" it would be nice to draw.

<img src="xornn.png" width="300"/>

That is:
  * 2 neurons in the input layer
  * 2 neurons in the hidden layer
  * 1 neuron in the output layer
  * 4 weights for the hidden layer
  * 2 weights for the output layer
  

## Let's add some neurons... by hand
Refer to the perceptron notebook.

In [8]:
import numpy as np

def sigmoid (x):
    return 1/(1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)


inputs = np.array([[0,0],[0,1],[1,0],[1,1]])
expected_output = np.array([[0],[1],[1],[0]])

num_neurons_input_layer = 2 
num_neurons_hidden_layer = 2 
num_neurons_output_layer = 1

# instead of trying some values, let's sample from a uniform distribution  
hidden_weights = np.random.uniform(size=
                                   (num_neurons_input_layer, 
                                    num_neurons_hidden_layer))   # 2x2
hidden_bias = np.random.uniform(size=(1, 
                                      num_neurons_hidden_layer))                            # 1x2
output_weights = np.random.uniform(size=(num_neurons_hidden_layer, 
                                         num_neurons_output_layer))  # 2x2
output_bias = np.random.uniform(size=(1, 
                                      num_neurons_output_layer))                            # 1x2

epochs = 50000
lr = 0.1


for _ in range(epochs):
    # Feed forward: Input to Hidden
    hidden_layer_activation = np.dot(inputs,hidden_weights)
    hidden_layer_activation += hidden_bias
    hidden_layer_output = sigmoid(hidden_layer_activation)
    
    # Feed forward: Hidden to output
    output_layer_activation = np.dot(hidden_layer_output,
                                     output_weights)
    output_layer_activation += output_bias
    predicted_output = sigmoid(output_layer_activation)

    # back-propagation
    error = expected_output - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    
    error_hidden_layer = d_predicted_output.dot(output_weights.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(
        hidden_layer_output)

    #updates
    output_weights += hidden_layer_output.T.dot(d_predicted_output) * lr
    output_bias += np.sum(d_predicted_output,axis=0,keepdims=True) * lr
    hidden_weights += inputs.T.dot(d_hidden_layer) * lr
    hidden_bias += np.sum(d_hidden_layer,axis=0,keepdims=True) * lr
    

print(f"Output: {predicted_output}")
print(f"Loss: {error}")

Output: [[0.01930689]
 [0.98336113]
 [0.98338449]
 [0.01721923]]
Loss: [[-0.01930689]
 [ 0.01663887]
 [ 0.01661551]
 [-0.01721923]]


Pretty good. `[0.01, 0.98, 0.97, 0.01]` instead of `[0, 1, 1, 0]`.