In [1]:
import numpy as np

First we need data: samples described by their features, together with the true label of each sample.
 * $x^{(m)}$: the $m$-th sample of our data
 * $y_{t}^{(m)}$: the true label of the $m$-th sample
 
Our input will be a column vector (e.g. with four features here):
$$\begin{bmatrix}
    x_{1} \\
    x_{2} \\
    x_{3} \\
    x_{4}
  \end{bmatrix}$$

Then we need to define our hyper-parameters: the number of epochs and the learning rate ($\eta$).

We initialize our weights ($W_{i}, W_{h}$) and biases ($b_{i}, b_{h}$). Weights are the strengths of the connections between neurons in adjacent layers, and each bias is an offset added inside a neuron before its activation function is applied.

We define the cost function (for example, mean squared error) that gives us our error $E$.

Then, for each epoch:
`for i in range(epoch):`

We feed forward by passing our data samples through the neural network (NN) to get predictions.

Then we feed backward (backpropagation) by updating our weights and biases so that our error is reduced.

### Feedforward:
For a neural network with one hidden layer, where $h$ is the hidden layer and $y_{p}$ is the predicted output:

$$h^{(m)} = \sigma\left(W_{i}x^{(m)} + b_{i}\right)$$
$$y_{p}^{(m)} = \sigma\left(W_{h}h^{(m)} + b_{h}\right)$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic (sigmoid) function, applied element-wise.
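The two feedforward equations can be sketched directly in NumPy for a single sample. This is a minimal illustration, not the notebook's own code: the function name `feedforward`, the hidden-layer size of 3, and the fixed random seed are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W_i, b_i, W_h, b_h):
    """Forward pass for one sample x given as a column vector."""
    h = sigmoid(W_i @ x + b_i)    # hidden layer activations
    y_p = sigmoid(W_h @ h + b_h)  # predicted output
    return h, y_p

rng = np.random.default_rng(0)                # seed is an assumption, for reproducibility
x = np.array([[1.0], [0.0], [1.0], [0.0]])    # one sample with four features
W_i = rng.uniform(size=(3, 4))                # input -> hidden weights (3 hidden neurons assumed)
b_i = rng.uniform(size=(3, 1))
W_h = rng.uniform(size=(1, 3))                # hidden -> output weights
b_h = rng.uniform(size=(1, 1))

h, y_p = feedforward(x, W_i, b_i, W_h, b_h)
print(h.shape, y_p.shape)
```

With column-vector inputs, $W_{i}$ has one row per hidden neuron and one column per feature, so the hidden layer comes out as a column vector too.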

### Cost function
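A function that defines our error. Here we use the squared error summed over all $N$ samples; dividing the sum by $N$ instead would give the mean squared error, and the factor $\frac{1}{2}$ only simplifies the derivative:
$$E = \frac{1}{2}\sum_{m=1}^{N}\left(y_{t}^{(m)} - y_{p}^{(m)}\right)^{2}$$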
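As a small worked example, the error can be computed in a few lines of NumPy. The sketch below uses the three predictions printed at the end of this notebook as illustrative values; the helper name `squared_error` is hypothetical.

```python
import numpy as np

def squared_error(y_t, y_p):
    """Sum of squared differences between true and predicted labels."""
    diff = np.asarray(y_t, dtype=float) - np.asarray(y_p, dtype=float)
    return float(np.sum(diff ** 2))

y_t = np.array([1.0, 1.0, 0.0])                       # true labels
y_p = np.array([0.68801785, 0.69347942, 0.62858153])  # predictions (illustrative)

sse = squared_error(y_t, y_p)   # summed squared error
half_sse = 0.5 * sse            # variant with the 1/2 factor
mse = sse / y_t.size            # mean squared error
print(sse, half_sse, mse)
```

The three variants differ only by constant factors, so they share the same minimizer; the choice changes the effective step size of gradient descent, not the direction.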

### Feedbackward (backpropagation):
Minimize the error by adjusting the weights and biases.


How do we update the weights to minimize the error?

First we define the cost function. In our example it is the squared error (the MSE up to a constant factor), written in vector form as:

$E= \frac{1}{2} ({\bf y}_t - {\bf y}_p)^T ({\bf y}_t - {\bf y}_p)$

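We update the weights (${\bf W}_i$ and ${\bf W}_h$) so that the error $E$ is minimized. The most popular algorithm is gradient descent, which moves each weight a small step against the gradient:

${\bf W}_h = {\bf W}_h - \eta \, {\partial E}/{\partial {\bf W}_h}$

For our example above we can show that:

${\partial E}/{\partial {\bf W}_h} = -({\bf y}_t - {\bf y}_p)\, {\bf y}_p (1 - {\bf y}_p)\, {\bf h}^T$

where ${\bf h} = \sigma({\bf W}_i {\bf x} + {\bf b}_i)$ and ${\bf y}_p (1 - {\bf y}_p)$ is the derivative of the logistic function evaluated at the output. The two minus signs cancel, so each update adds $\eta\,({\bf y}_t - {\bf y}_p)\, {\bf y}_p (1 - {\bf y}_p)\, {\bf h}^T$ to ${\bf W}_h$.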
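A finite-difference check is one way to confirm the form and sign of this gradient. The sketch below assumes a single sample, $E = \frac{1}{2}({\bf y}_t - {\bf y}_p)^2$, and column-vector conventions; it compares the analytic expression $-({\bf y}_t - {\bf y}_p)\,{\bf y}_p(1 - {\bf y}_p)\,{\bf h}^T$ with a numerical estimate. The helper name `error`, the random values, and the seed are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(W_h, b_h, h, y_t):
    """E = 1/2 (y_t - y_p)^2 for one sample, as a function of W_h."""
    y_p = sigmoid(W_h @ h + b_h)
    return 0.5 * ((y_t - y_p) ** 2).item()

rng = np.random.default_rng(0)   # seed is an assumption, for reproducibility
h = rng.uniform(size=(3, 1))     # hidden activations, held fixed for the check
W_h = rng.uniform(size=(1, 3))
b_h = rng.uniform(size=(1, 1))
y_t = np.array([[1.0]])

# Analytic gradient of E with respect to W_h
y_p = sigmoid(W_h @ h + b_h)
analytic = -(y_t - y_p) * y_p * (1 - y_p) * h.T

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W_h)
for j in range(W_h.shape[1]):
    step = np.zeros_like(W_h)
    step[0, j] = eps
    numeric[0, j] = (error(W_h + step, b_h, h, y_t)
                     - error(W_h - step, b_h, h, y_t)) / (2 * eps)

# One gradient-descent step should lower the error
eta = 0.1
W_new = W_h - eta * analytic
print(np.allclose(analytic, numeric, atol=1e-8))
print(error(W_new, b_h, h, y_t) < error(W_h, b_h, h, y_t))
```

Subtracting $\eta$ times this gradient is the same as adding $\eta\,({\bf y}_t - {\bf y}_p)\,{\bf y}_p(1 - {\bf y}_p)\,{\bf h}^T$, which is the form a `+=` update takes in code.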

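In the code below (which stores one sample per row, so `wout` plays the role of ${\bf W}_h$ and `wh` the role of ${\bf W}_i$):

$D = {\bf y}_t - {\bf y}_p$

${\bf y}_p (1 - {\bf y}_p)$ = slope_output_layer

$\bf {h}$ = hiddenlayer_activations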

In [11]:
# check this out:
# https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
# Input array
X=np.array([[1,0,1,0],[1,0,1,1],[0,1,0,1]])

#Output
y=np.array([[1],[1],[0]])


#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))


#Derivative of Sigmoid Function
#Note: x must already be a sigmoid output s, since sigmoid'(z) = s * (1 - s)
def derivatives_sigmoid(x):
    return x * (1 - x)



#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = X.shape[1] #number of features in data set
hiddenlayer_neurons = 3 #number of neurons in the hidden layer
output_neurons = 1 #number of neurons at output layer

#weight and bias initialization (wh, bh: input -> hidden; wout, bout: hidden -> output)
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))


for i in range(epoch):
    #Forward Propagation
    hidden_layer_input1=np.dot(X,wh)
    hidden_layer_input=hidden_layer_input1 + bh
    hiddenlayer_activations = sigmoid(hidden_layer_input)
    output_layer_input1=np.dot(hiddenlayer_activations,wout)
    output_layer_input= output_layer_input1+ bout
    output = sigmoid(output_layer_input)

    #Backpropagation (only wout is updated; the hidden-layer and bias updates are commented out)
    D = y-output
    slope_output_layer = derivatives_sigmoid(output)
#     slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)
    d_output = D * slope_output_layer
#     Error_at_hidden_layer = d_output.dot(wout.T)
#     d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer
    wout += hiddenlayer_activations.T.dot(d_output) *lr
#     bout += np.sum(d_output, axis=0,keepdims=True) *lr
#     wh += X.T.dot(d_hiddenlayer) *lr
#     bh += np.sum(d_hiddenlayer, axis=0,keepdims=True) *lr

print(output)

[[0.68801785]
 [0.69347942]
 [0.62858153]]
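For comparison, here is a sketch of the same loop with the commented-out hidden-layer and bias updates enabled, so that all four parameter arrays are trained. It reuses the data and hyper-parameters from the cell above; the fixed seed, the helper `forward`, and the error bookkeeping are assumptions added for illustration, so its numbers will not match the output printed above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, wh, bh, wout, bout):
    """Forward pass for all samples (one sample per row)."""
    h = sigmoid(X @ wh + bh)
    return h, sigmoid(h @ wout + bout)

X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([[1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)   # seed is an assumption, for reproducibility
wh = rng.uniform(size=(4, 3))
bh = rng.uniform(size=(1, 3))
wout = rng.uniform(size=(3, 1))
bout = rng.uniform(size=(1, 1))
lr = 0.1

_, out0 = forward(X, wh, bh, wout, bout)
initial_error = 0.5 * np.sum((y - out0) ** 2)

for _ in range(5000):
    h, output = forward(X, wh, bh, wout, bout)
    # Error signal at the output, then propagated back to the hidden layer
    d_output = (y - output) * output * (1 - output)
    d_hidden = (d_output @ wout.T) * h * (1 - h)
    # Gradient-descent updates for every weight and bias
    wout += lr * (h.T @ d_output)
    bout += lr * np.sum(d_output, axis=0, keepdims=True)
    wh += lr * (X.T @ d_hidden)
    bh += lr * np.sum(d_hidden, axis=0, keepdims=True)

_, output = forward(X, wh, bh, wout, bout)
final_error = 0.5 * np.sum((y - output) ** 2)
print(initial_error, final_error)
```

Note that `d_hidden` is computed before `wout` is modified, so both layers are updated from the same forward pass.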
