# Assign input, output, weight, learning rate

In [None]:
import numpy as np
import pandas as pd
data = {'Fever':[0,0,1,1],'Shortness_breath':[0,1,0,1], 'Need_exam':[0,1,1,1]}
covid = pd.DataFrame(data)
covid

In [None]:
# Simplify our example
data = {'input1':[0,0,1,1],'input2':[0,1,0,1], 'target':[0,1,1,1]}
df = pd.DataFrame(data)
df

$$
\begin{bmatrix}
I_1 & I_2 \\
I_1 & I_2 \\
I_1 & I_2 \\
I_1 & I_2 \\
\end{bmatrix}
\times
\begin{bmatrix}
W_1  \\
W_2  \\
\end{bmatrix} \rightarrow
\begin{bmatrix}
I_w  \\
I_w  \\
I_w  \\
I_w  \\
\end{bmatrix} \rightarrow
Activation
 =
\begin{bmatrix}
O  \\
O  \\
O  \\
O  \\
\end{bmatrix}
$$

# Activation function: Sigmoid

$$f(x) = Sigmoid(x) = \frac{1}{1 + e^{(-x)}}$$
* f(x): the sigmoid output, always strictly between 0 and 1

* e: Euler's number, a mathematical constant approximately equal to 2.71828

* x: the input value
* In binary classification, sigmoid is equivalent to a two-class softmax with one logit fixed at 0; in multi-class classification, we use the softmax function.
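
The equivalence in the last bullet can be checked numerically. This is a minimal sketch, assuming a hypothetical helper `softmax` (it is not defined elsewhere in this notebook) and a second logit fixed at 0:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(z):
    # Hypothetical helper for illustration: normalized exponentials over the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 0.8, 3.0])
# Two-class logits [x, 0]: the first softmax probability is e^x / (e^x + 1)
logits = np.stack([x, np.zeros_like(x)], axis=-1)
two_class = softmax(logits)[:, 0]
print(np.allclose(two_class, sigmoid(x)))
```

Since e^x / (e^x + 1) = 1 / (1 + e^(-x)), the two columns agree up to floating-point error.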

In [None]:
# Define sigmoid
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    """Logistic sigmoid: maps any real x into the open interval (0, 1)."""
    return 1/(1+np.exp(-x))

example = np.linspace(-10,10,100) # generate 100 points between -10 and 10
output = sigmoid(example)
plt.plot(example,output)
plt.xlabel("Input")
plt.ylabel("Output")

## Derivative of Sigmoid function

$$
f'(x) = \frac{d}{dx}{Sigmoid(x)} = Sigmoid(x)*(1-Sigmoid(x))
$$
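
One way to gain confidence in this formula is to compare it with a numerical slope. The sketch below uses a central finite difference; the step size `h` and the sample points are assumptions chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    # Analytic derivative: sigmoid(x) * (1 - sigmoid(x))
    return sigmoid(x) * (1 - sigmoid(x))

xs = np.linspace(-5, 5, 11)
h = 1e-5  # assumed step size for the finite difference
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)
max_gap = np.max(np.abs(numeric - sigmoid_der(xs)))
print(max_gap)  # should be tiny compared with the derivative values
```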

In [None]:
# The derivative measures the steepness of the graph of a function at some particular point on the graph
# The derivative is a slope.
# Define derivative of sigmoid function
def sigmoid_der(x):
    return sigmoid(x)*(1-sigmoid(x))
der_output = sigmoid_der(example)
plt.plot(example,output,label='sigmoid')
plt.plot(example,der_output,label='derivative of sigmoid')
plt.legend()

## First round calculation

In [None]:
df

### Feedforward input

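Initialize Weights -> Summation -> Activation -> Error
* Assign initial weights for x1, x2, and the bias input: w1 = 0.2, w2 = 0.3, w3 = 0.5 (normally random; fixed here so the arithmetic can be followed by hand)
* From the dataset, we choose row 1 (zero-based index): x1 = 0, x2 = 1, target = 1; the bias input b is always 1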

Weighted input to perceptron o1 for row 1:

$$
\begin{aligned}
in_{o1} &= input_1 \times w_1 + input_2 \times w_2 + bias \\
&= x_1 \times w_1 + x_2 \times w_2 + b \times w_3 \\
&= 0 \times 0.2 + 1 \times 0.3 + 1 \times 0.5 \\
&= 0 + 0.3 + 0.5 \\
&= 0.8
\end{aligned}
$$

### Feedforward output

$$in_{o1} \rightarrow sigmoid \rightarrow out_{o1}$$

$$out_{o1} = sigmoid(in_{o1}) = \frac{1}{1 + e^{-0.8}} = 0.68997$$
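
The forward pass for row 1 can be reproduced in a few lines. This is a minimal sketch using the values stated above; the variable names are chosen here for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x1, x2, b = 0, 1, 1          # row 1 inputs; the bias input is fixed at 1
w1, w2, w3 = 0.2, 0.3, 0.5   # initial weights from the text

in_o1 = x1 * w1 + x2 * w2 + b * w3   # weighted sum
out_o1 = sigmoid(in_o1)              # activation
print(round(in_o1, 5), round(out_o1, 5))  # 0.8 0.68997
```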

### Error Calculation
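The general mean squared error over n rows is

$$
Error = MSE = \frac{1}{n} \sum_{i=1}^n (\hat Y_i - Y_i)^2
$$

For the single row used here, we use the half squared error (the factor $\frac{1}{2}$ cancels the 2 produced by differentiation):

$$
\begin{aligned}
Error &= \frac{1}{2} (target_1 - output_1)^2 \\
&= 0.5 \times (1 - 0.68997)^2 = 0.048059
\end{aligned}
$$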

This example covers only one input row; in practice, the error must be calculated for every input row.
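
As a sketch of what the calculation looks like for all four rows at once, the snippet below applies the same initial weights to the whole dataset. The array names are illustrative, and the value for row 1 should agree with the hand calculation up to rounding:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # input1, input2
targets = np.array([0, 1, 1, 1])
w = np.array([0.2, 0.3])   # w1, w2
bias_term = 0.5            # b * w3 with b = 1

outputs = sigmoid(X @ w + bias_term)
errors = 0.5 * (targets - outputs) ** 2   # half squared error per row
print(np.round(errors, 6))
print('total error:', errors.sum())
```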

### Backpropagation
Update Weights <- Summation <- Activation <- Error

$$X = X - lr * \frac{d}{dX}{f(X)}$$

$
W_{new1} = W_{old1} - lr * \frac{dError}{dW_1} \\
W_{new2} = W_{old2} - lr * \frac{dError}{dW_2}
$
* X: the parameter being updated (here, a weight)
* lr: learning rate
* f(X): the function being minimized (here, the error as a function of the weights)
* assume lr = 0.05
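
To see the update rule in isolation, here is a minimal sketch on a toy function that is not part of the network: the hypothetical f(x) = (x - 3)^2, whose minimum is at x = 3. The learning rate and step count are assumptions for this toy case only:

```python
def f_prime(x):
    # Derivative of the hypothetical function f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0    # starting guess
lr = 0.1   # assumed learning rate for this toy example
for step in range(200):
    x = x - lr * f_prime(x)   # X = X - lr * d/dX f(X)
print(x)  # approaches 3
```

Each step moves x against the slope, so the value drifts toward the point where the derivative is zero.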

## Derivation of the formula used in a neural network

$$\frac{\partial Error}{\partial w} = \frac{\partial Error}{\partial out_o} \times \frac{\partial out_o}{\partial in_o} \times  \frac{\partial in_o}{\partial w}$$

### Calculate the 1st part $\frac{\partial Error}{\partial out_o} = output-target = -0.31003$

$
\frac{\partial Error}{\partial out_o} = \frac{\partial}{\partial out_o}({\frac{1}{2}*{(target - output)^2})}
$

$
\frac{\partial Error}{\partial out_o} = (\frac{1}{2}*2*{(target - output))}* \frac{\partial}{\partial out_o}{(target - output)}
$

$
\frac{\partial Error}{\partial out_o} = (target - output) * (-1)
$

$
\frac{\partial Error}{\partial out_o} = output - target
$

$ 
\because out_{o1} = 0.68997, target = 1
$

$\therefore \frac{\partial Error}{\partial out_o} = (0.68997 - 1) = -0.31003$

### Calculate the 2nd part $\frac{\partial out_o}{\partial in_o} = out_o \times (1 - out_o) = 0.21391$

$
\because out_{o1} = sigmoid(in_{o1}) = \frac{1}{1 + e^{-in_{o1}}}
$

$
\frac{\partial out_{o1}}{\partial in_{o1}} = \frac{\partial}{\partial in_{o1}}{(\frac{1}{1 + e^{-in_{o1}}})}
$

$
= \frac{\partial}{\partial in_{o1}}{(1 + e^{-in_{o1}})}^{-1}
$  -----> simplify

$
= -1(1 + e^{-in_{o1}})^{-2} \times \frac{\partial}{\partial in_{o1}}(1 + e^{-in_{o1}})
$ ------> chain rule + power rule

$
= -1(1 + e^{-in_{o1}})^{-2} \times (\frac{\partial}{\partial in_{o1}}(1) + \frac{\partial}{\partial in_{o1}}(e^{-in_{o1}}))
$ ------> sum rule

$
= -1(1 + e^{-in_{o1}})^{-2} \times (0 + \frac{\partial}{\partial in_{o1}}(e^{-in_{o1}}))
$ ------> simplify

$
= -1(1 + e^{-in_{o1}})^{-2} \times (e^{-in_{o1}} \times \frac{\partial}{\partial in_{o1}}(-in_{o1}))
$ ------> exponential rule

$
= -1(1 + e^{-in_{o1}})^{-2} \times (e^{-in_{o1}} \times (-1))
$ ------> simplify

$
= (1 + e^{-in_{o1}})^{-2} \times (e^{-in_{o1}})
$ ------> simplify

$
= \frac{e^{-in_{o1}}}{(1 + e^{-in_{o1}})^2}
$ ------> simplify

$
= \frac{1 \times (e^{-in_{o1}})}{(1 + e^{-in_{o1}}) \times(1 + e^{-in_{o1}})}
$ ------> tricks

$
= \frac{1}{1 + e^{-in_{o1}}} \times \frac{e^{-in_{o1}}}{1 + e^{-in_{o1}}}
$ ------> separate

$
= \frac{1}{1 + e^{-in_{o1}}} \times \frac{e^{-in_{o1}} + 1 - 1}{1 + e^{-in_{o1}}}
$ ------> tricks

$
= \frac{1}{1 + e^{-in_{o1}}} \times [\frac{e^{-in_{o1}} + 1}{1 + e^{-in_{o1}}} - \frac{1}{1 + e^{-in_{o1}}}]
$ ------> separate

$
= \frac{1}{1 + e^{-in_{o1}}} \times [1 - \frac{1}{1 + e^{-in_{o1}}}]
$ ------> simplify

$
\because out_{o1} = \frac{1}{1 + e^{-in_{o1}}} = 0.68997
$ 

$
\therefore \frac{\partial out_{o1}}{\partial in_{o1}} = out_{o1} \times (1 - out_{o1}) = 0.68997 \times (1 - 0.68997) = 0.21391
$

### Calculate the 3rd part $\frac{\partial in_o}{\partial w} = input \ values = 1$

Here we differentiate with respect to $w_2$; every other term is treated as a constant, so its derivative is 0.

$$
\frac{\partial in_{o1}}{\partial w_2} = \frac{\partial}{\partial w_2}(w_1 \times x_1 + w_2 \times x_2 + b \times w_3) = x_2 = 1
$$

### Put 3 parts together

$\frac{\partial Error}{\partial w} = \frac{\partial Error}{\partial out_o} \times \frac{\partial out_o}{\partial in_o} \times  \frac{\partial in_o}{\partial w} = -0.31003 \times 0.21391 \times 1 = -0.06631$
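
The product of the three parts can be cross-checked against a numerical derivative of the error with respect to $w_2$. This is a sketch under the values used in this walkthrough; `error_for` is a hypothetical helper and the step `h` is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def error_for(w2, x1=0, x2=1, w1=0.2, bias_term=0.5, target=1):
    # Hypothetical helper: half squared error for row 1 as a function of w2
    out = sigmoid(x1 * w1 + x2 * w2 + bias_term)
    return 0.5 * (target - out) ** 2

out_o = sigmoid(0.8)
part1 = out_o - 1             # dError/dOut = output - target
part2 = out_o * (1 - out_o)   # dOut/dIn
part3 = 1                     # dIn/dw2 = x2
analytic = part1 * part2 * part3

h = 1e-6  # assumed finite-difference step
numeric = (error_for(0.3 + h) - error_for(0.3 - h)) / (2 * h)
print(analytic, numeric)  # both close to -0.0663
```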

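### Calculate the new weight for the next epoch
The gradient above was taken with respect to $w_2$ (whose input is $x_2 = 1$), so it updates $w_2$. For this row $x_1 = 0$, so $\frac{\partial Error}{\partial w_1} = 0$ and $w_1$ stays at 0.2.

$$
\begin{aligned}
w_{new2} &= w_{old2} - lr \times \frac{\partial Error}{\partial w_2} \\
&= 0.3 - (0.05) \times (-0.06631) \\
&= 0.3033155
\end{aligned}
$$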
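
The same update can be sketched for all three weights of row 1 at once. Because the third part of the chain rule is the input value, a weight whose input is 0 (here w1, since x1 = 0) receives a zero gradient, so only w2 and the bias weight w3 move. The vector layout below is an assumption made for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

lr = 0.05
x = np.array([0, 1, 1])          # x1, x2, bias input b
w = np.array([0.2, 0.3, 0.5])    # w1, w2, w3
target = 1

out = sigmoid(x @ w)
delta = (out - target) * out * (1 - out)   # 1st part * 2nd part
grad = delta * x                           # 3rd part: the input of each weight
w_new = w - lr * grad
print(np.round(w_new, 6))
```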

In [None]:
df

In [None]:
#input_features = np.array([[0,0],[0,1],[1,0],[1,1]])
input_features = df[['input1','input2']].to_numpy()
print(input_features.shape)
input_features

In [None]:
#target_output = np.array([[0,1,1,1]])
target_output = df[['target']].to_numpy()
target_output = target_output.reshape(4,1)
print(target_output.shape)
target_output

In [None]:
weights = np.array([[0.1],[0.2]])
print(weights.shape)
weights

In [None]:
bias = 0.3
lr = 0.05

$\frac{\partial Error}{\partial w} = \frac{\partial Error}{\partial out_o} \times \frac{\partial out_o}{\partial in_o} \times  \frac{\partial in_o}{\partial w}$

1st part $\frac{\partial Error}{\partial out_o} = output-target$ 

2nd part $\frac{\partial out_o}{\partial in_o} = out_o \times (1 - out_o)$

3rd part $\frac{\partial in_o}{\partial w} = input \ values$

In [None]:
weights = np.array([[0.2],[0.3]])
bias = 0.3
lr = 0.05

for epoch in range(1):
    print('weights:',weights)
    
    inputs = input_features
    print('inputs:',inputs)
    
    in_o = np.dot(inputs, weights) + bias
    print('bias:',bias)
    print('in_o:',in_o)
    
    out_o = sigmoid(in_o)
    print('out_o:',out_o)
    
    error = out_o - target_output
    print('target_output:',target_output)
    print('error:',error)

    dError_dOut = error #1st part
    print('first part:',dError_dOut)
    
    dOut_dIn = sigmoid_der(in_o) #2nd part: sigmoid_der expects the pre-activation in_o and returns out_o * (1 - out_o)
    print('second part:',dOut_dIn)
    
    # We need to update bias. Bias weight is not dependent on the input.
    # We have to update it separately.
    # We need deriv values to update bias value.
    deriv = dError_dOut * dOut_dIn
    print('deriv:',deriv)
    
    inputs = input_features.T #3rd part
    print('third part:',inputs)
    deriv_final = np.dot(inputs, deriv) 
    print('All 3 parts:',deriv_final)
    
    weights -= lr * deriv_final # Get new weight
    print('New Weight:',weights)
    
    # The bias input is always 1, so its gradient is the sum of deriv over all rows
    bias -= lr * deriv.sum()
    print('New Bias:',bias)
    print('------')

In [None]:
# Clean version (debug prints removed)
weights = np.array([[0.2],[0.3]])
bias = 0.3
lr = 0.05

for epoch in range(10000):
    inputs = input_features
    in_o = np.dot(inputs, weights) + bias
    out_o = sigmoid(in_o)
    error = out_o - target_output
    
    x = error.sum()
    if epoch%1000 == 0:
        print('Epoch:',epoch,'Error Sum:',x)
    
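    dError_dOut = error #1st part
    dOut_dIn = sigmoid_der(in_o) #2nd part: derivative evaluated at the pre-activation in_o
    deriv = dError_dOut * dOut_dIn
    inputs = input_features.T #3rd part
    deriv_final = np.dot(inputs, deriv) 
    weights -= lr * deriv_final 
    
    # The bias input is always 1, so its gradient is the sum of deriv over all rows
    bias -= lr * deriv.sum()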

In [None]:
print('Weights:',weights)
print('Bias:',bias)

In [None]:
# Prediction with the new model for [1,0], ground truth result = 1
single_point = np.array([1,0])
result1 = np.dot(single_point, weights) + bias
result2 = sigmoid(result1)
result2

In [None]:
# Prediction with the new model for [0,0], ground truth result = 0
single_point = np.array([0,0])
result1 = np.dot(single_point, weights) + bias
result2 = sigmoid(result1)
result2
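
The single-point checks above can be extended to all four rows. The sketch below is self-contained, so it retrains a small model first; as an assumption it writes the sigmoid derivative directly as `out_o * (1 - out_o)` and uses a 0.5 threshold to turn outputs into class labels:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [1]])

weights = np.array([[0.2], [0.3]])
bias = 0.3
lr = 0.05

for epoch in range(10000):
    out_o = sigmoid(X @ weights + bias)
    # 1st part * 2nd part, with out_o * (1 - out_o) as the sigmoid derivative
    deriv = (out_o - y) * out_o * (1 - out_o)
    weights -= lr * (X.T @ deriv)   # 3rd part: the inputs
    bias -= lr * deriv.sum()        # the bias input is always 1

predictions = sigmoid(X @ weights + bias)
labels = (predictions > 0.5).astype(int)   # assumed decision threshold
print(np.round(predictions, 3).ravel())
print(labels.ravel())
```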

# Why do we need bias?

In [None]:
# Change steepness: scaling the input (the role of a weight) changes the slope
x_values = np.linspace(-10,10,100)
output = sigmoid(x_values)
plt.plot(x_values,output,c="blue",label='sigmoid(x)')
output = sigmoid(x_values*0.5)
plt.plot(x_values,output,c="red",label='sigmoid(0.5*x)')
output = sigmoid(x_values*2.5)
plt.plot(x_values,output,c="green",label='sigmoid(2.5*x)')
plt.legend()

In [None]:
# Change location: adding a constant (the role of the bias) shifts the curve left or right
x_values = np.linspace(-10,10,100)
output = sigmoid(x_values)
plt.plot(x_values,output,c="green",label='sigmoid(x)')
output = sigmoid(x_values+5)
plt.plot(x_values,output,c="blue",label = 'sigmoid(x+5)')
output = sigmoid(x_values-5)
plt.plot(x_values,output,c="red",label = 'sigmoid(x-5)')
plt.legend()
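
The shift in the plot above can be quantified: sigmoid(w*x + b) equals 0.5 exactly where w*x + b = 0, that is at x = -b/w. A small sketch with the same offsets as the plot (the weight w = 1 is an assumption matching the unscaled curves):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w = 1.0
midpoints = []
for b in (-5.0, 0.0, 5.0):
    midpoint = -b / w   # where the curve crosses 0.5
    midpoints.append(midpoint)
    print('bias:', b, 'crosses 0.5 at x =', midpoint, 'value:', sigmoid(w * midpoint + b))
```

A positive bias moves the crossing point to the left and a negative bias moves it to the right, without changing the steepness.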

In [None]:
# Clean version without bias
weights = np.array([[0.2],[0.3]])
#bias = 0.3
lr = 0.05

for epoch in range(10001):
    inputs = input_features
    in_o = np.dot(inputs, weights) #+ bias
    out_o = sigmoid(in_o)
    error = out_o - target_output
    
    x = error.sum()
    if epoch%1000 == 0:
        print('Epoch:',epoch,'Error Sum:',x)
    
    dError_dOut = error #1st part
    dOut_dIn = sigmoid_der(in_o) #2nd part: derivative evaluated at the pre-activation in_o
    inputs = input_features.T #3rd part
    deriv_final = np.dot(inputs, dError_dOut * dOut_dIn) 
    
    weights -= lr * deriv_final 
    
    #for i in deriv:
    #    bias -= lr * i

In [None]:
print('Weights:',weights)

In [None]:
# Prediction with the new model for [1,0], ground truth result = 1
single_point = np.array([1,0])
result1 = np.dot(single_point, weights)
result2 = sigmoid(result1)
result2

In [None]:
# Prediction with the new model for [0,0], ground truth result = 0
single_point = np.array([0,0])
result1 = np.dot(single_point, weights)
result2 = sigmoid(result1)
result2
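
The result for [0,0] does not depend on training at all when there is no bias: the weighted sum of an all-zero input is 0 for any weights, and sigmoid(0) = 0.5. A minimal sketch, with arbitrary weight values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

origin = np.array([0, 0])
# Arbitrary example weights: small, large, and negative
candidate_weights = [np.array([[0.2], [0.3]]),
                     np.array([[50.0], [50.0]]),
                     np.array([[-7.0], [4.0]])]

no_bias_outputs = [sigmoid(np.dot(origin, w))[0] for w in candidate_weights]
print(no_bias_outputs)  # 0.5 every time
```

Without a bias term the model therefore cannot push the output for [0,0] toward its target of 0, which is why the bias is needed here.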