In [1]:
import numpy as np
import math

## OR GATE PERCEPTRON

In this notebook we will implement a perceptron for the OR GATE.

Remember the `OR` gate works as follows:

| X1    | X2       | Out  |
| :-----: |:--------:| :----:|
|   0   |    0     |   0  |
|   0   |    1     |   1  |
|   1   |    0     |   1  |
|   1   |    1     |   1  |

And the corresponding perceptron is:


![Image](https://marcomilanesio.github.io/material/5/perceptron.png)

### Assign random weigths and calculate output

Let's assign the following weights at random: 
  * $w_1 = 0.2$
  * $w_2 = 0.3$
  * $w_3 = 0.5$
  
Let's assume:
  * $x_1 = 0$
  * $x_2 = 1$
  
Then we can compute:

#### input for $o_1$

\begin{align}
o_1 & = w_1 * x_1 + w_2 * x_2 + w3 * b \\
    & = 0.3 + 0.5 \\
    & = 0.8
\end{align}


#### output value (using the `sigmoid` function):

\begin{align}
out & = \frac{1}{1 + e^{-X}} = \frac{1}{1 + e^{-0.8}} = 0.68997
\end{align}

#### error (MSE):

\begin{align}
MSE & = \sum_{i} \frac{1}{2} * (target - output)^2
\end{align}

In $o_1$:

\begin{align}
err & = \frac{1}{2} * (1 - 0.68997)^2 = 0.048059
\end{align}


We need to calculate this for all possible inputs, and then calculate the global MSE.

After this, we can update the weights.
So,

## Gradient Descent

\begin{align}
X & = X - lr * \frac{\partial}{\partial X} f(X)
\end{align}

Where:
  * $X$ is the input
  * $lr$ is the learning rate
  * $f(X)$ is the output
  
### Derivation

#### N.1: how a particular weight $w$ influence the error $err$?

\begin{align}
\frac{\partial err}{\partial w}
\end{align}


Let's apply [Chain Rule](https://en.wikipedia.org/wiki/Chain_rule).

\begin{align}
\frac{\partial err}{\partial w} = \frac{\partial err}{\partial out} * \frac{\partial out}{\partial in} * \frac{\partial in}{\partial w}
\end{align}

Where (I'll skip the derivation, if interested ask me)

  * $\frac{\partial err}{\partial out} = (output - target)$
  * $\frac{\partial out}{\partial in} = output * (1 - output)$
  * $\frac{\partial in}{\partial w} = input$

And remember:
  * $input = w_1 * x_1 + w_2 * x_2 + w_3 * b$
  * $output = \frac{1}{1 + e^{-input}}$
  * $MSE = \sum \frac{1}{2} (target - output)^2$
  * Gradient Descent $w = w - lr * \frac{\partial err}{\partial w}$
  
  
## LET'S DO IT!

In [2]:
input_features = np.array([[0,0], [0,1], [1,0], [1,1]])
target_output = np.array([[0,1,1,1]])
target_output = target_output.reshape(4,1)

print(input_features.shape)
print(target_output)

(4, 2)
[[0]
 [1]
 [1]
 [1]]


In [3]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x)*(1-sigmoid(x))

In [19]:
weights = np.random.rand(2,1) #np.array([[0.1], [0.2]])
print(weights)
bias = 0.3
learning_rate=0.05

[[0.34016512]
 [0.60143311]]


In [20]:
for epoch in range(10000):
    # input
    inputs = input_features
    # feed-forward input
    in_o = np.dot(inputs, weights) + bias #shape(4,2) dot shape(2,1) -> np.dot is of shape(4,1)
    # feed-forward output
    out_o = sigmoid(in_o)
    
    # error
    error = out_o - target_output #result in [0,1] range
    x = error.sum()
    if epoch % 1000 == 0: print(f"epoch {epoch}, error: {x}")
    
    # back-propagation
    derr_dout = error
    dout_din = sigmoid_derivative(out_o)
    deriv = derr_dout * dout_din
    inputs = input_features.T
    deriv_final = np.dot(inputs, deriv) # shape(4,1)
    
    # weight updating
    weights -= learning_rate * deriv_final
    
    for i in deriv:
        bias -= learning_rate * i

epoch 0, error: -0.2836806920826571
epoch 1000, error: 0.03367278503551974
epoch 2000, error: 0.009542414059545673
epoch 3000, error: 0.0038118061304079076
epoch 4000, error: 0.0017639619281635543
epoch 5000, error: 0.0008557228052071542
epoch 6000, error: 0.0003967629348268359
epoch 7000, error: 0.000144268347174778
epoch 8000, error: -2.783176885359784e-06
epoch 9000, error: -9.169477688272115e-05


In [21]:
weights

array([[7.02260341],
       [7.02352136]])

In [22]:
bias

array([-3.15601742])

## PREDICTIONS for OR GATE

In [23]:
# target = 1

point = np.array([1,0])
step1 = np.dot(point, weights) + bias
step2 = sigmoid(step1)
print(step2)

[0.97949937]


In [24]:
# target = 1

point = np.array([1,1])
step1 = np.dot(point, weights) + bias
step2 = sigmoid(step1)
print(step2)

[0.99998136]


In [25]:
# target = 1

point = np.array([0,0])
step1 = np.dot(point, weights) + bias
step2 = sigmoid(step1)
print(step2)

[0.04085483]
