### Exercise for Back Propagation

This exercise follows the example from [this Brilliant wiki article](https://brilliant.org/wiki/backpropagation/).

Main steps for the algorithm:
1. Calculate the _forward phase_ from the input layer to the final output layer-m, and store the results
    - final prediction $\hat y_{d}$
    - activation input (node-j at layer-k) $a_j^k$
    - output (node-j at layer-k) $o_j^k$
2. Calculate the _backward phase_ from the final output layer-m to the input layer, and store the results $\frac{\partial E_d}{\partial w_{ij}^k}$ (where $w_{ij}^k$ is the weight connecting node-i in layer-(k-1) to node-j in layer-k)
    - 2.1 - calculate the error term for the final layer-m: $\delta_1^m = g_o'(a_1^m)(\hat y_d - y_d)$
    - 2.2 - backpropagate the error term through the hidden layers: $\delta_j^k = g'(a_j^k)\sum_{l=1}^{r^{k+1}}{w_{jl}^{k+1}\delta_l^{k+1}}$ (where $r^{k+1}$ is the number of nodes in layer-(k+1))
    - 2.3 - calculate the partial derivative wrt each weight: $\frac{\partial E_d}{\partial w_{ij}^k} = \delta_j^k o_i^{k-1}$
3. Combine the individual gradients for each input-output pair: $\frac{\partial E(X, \theta)}{\partial w_{ij}^k} = \frac{1}{N}\sum_{d=1}^N\frac{\partial E_d}{\partial w_{ij}^k}$
4. Update the weights: $\Delta w_{ij}^k = -\alpha \frac{\partial E(X, \theta)}{\partial w_{ij}^k}$
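
The steps above can be sketched for a single input-output pair and checked against finite differences. This is a minimal illustration, not the notebook's implementation: it assumes one sigmoid hidden layer, one linear output node (so $g_o'(a) = 1$), and the squared error $E_d = \frac{1}{2}(\hat y_d - y_d)^2$. The helper names (`forward`, `backward`, `squared_error`, `numerical_grad`) and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_h, W_o):
    # Step 1: forward phase for one sample; x already carries a leading bias 1.
    a_h = x @ W_h                                 # hidden activation inputs
    o_h = np.concatenate(([1.0], sigmoid(a_h)))   # hidden outputs plus bias unit
    y_hat = o_h @ W_o                             # linear output node
    return o_h, y_hat

def backward(x, y, W_h, W_o):
    o_h, y_hat = forward(x, W_h, W_o)
    delta_o = y_hat - y                           # step 2.1 with g_o'(a) = 1
    s = o_h[1:]                                   # sigmoid outputs, bias excluded
    delta_h = s * (1 - s) * (W_o[1:] @ delta_o)   # step 2.2
    grad_W_o = np.outer(o_h, delta_o)             # step 2.3: delta_j * o_i
    grad_W_h = np.outer(x, delta_h)
    return grad_W_h, grad_W_o

def squared_error(x, y, W_h, W_o):
    _, y_hat = forward(x, W_h, W_o)
    return 0.5 * np.sum((y_hat - y) ** 2)

def numerical_grad(f, W, eps=1e-6):
    # Central differences, perturbing one weight of W at a time (in place).
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0])        # bias + 3 features
y = np.array([1.0])
W_h = rng.uniform(-1, 1, size=(4, 3))     # (inputs + bias, hidden nodes)
W_o = rng.uniform(-1, 1, size=(4, 1))     # (hidden nodes + bias, outputs)

grad_W_h, grad_W_o = backward(x, y, W_h, W_o)
num_W_h = numerical_grad(lambda: squared_error(x, y, W_h, W_o), W_h)
num_W_o = numerical_grad(lambda: squared_error(x, y, W_h, W_o), W_o)
print(np.allclose(grad_W_h, num_W_h, atol=1e-6),
      np.allclose(grad_W_o, num_W_o, atol=1e-6))
```

If the two gradients agree, steps 3 and 4 reduce to averaging `grad_W_h` and `grad_W_o` over all samples and subtracting $\alpha$ times the average from the weights.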

In [1]:
# import NumPy and set the seed for reproducibility
import numpy as np
np.random.seed(1024)

In [2]:
np.random.random()

0.6476912306619782

In [3]:
# the following example uses 1 hidden layer and 1 output node

class simple_nn():
    # initialize
    def __init__(self, X, y, n_hidden=3, n_iter=10000, lr=0.1):
        self.n_iter = n_iter
        self.lr = lr
        self.n_hidden = n_hidden
        self.n_data, self.dim_inputs = X.shape
        self.dim_outputs = y.shape[1]
        self.y = y
        self.X = np.hstack((np.ones((self.n_data, 1)), X))
        self.hidden_weights = 2 * np.random.random((self.dim_inputs + 1, self.n_hidden)) - 1
        self.output_weights = 2 * np.random.random((self.n_hidden + 1, self.dim_outputs)) - 1
        
    # sigmoid function; with derivative=True, x must already be a sigmoid
    # output s, because the derivative is written as s * (1 - s)
    def sigmoid(self, x, derivative=False):
        if derivative:
            return x * (1 - x)
        else:
            return 1 / (1 + np.exp(-x))
    
    def fit(self):
        for i in range(self.n_iter):
            # forward phase
            hidden_layer_outputs = np.hstack((np.ones((self.n_data, 1)), self.sigmoid(self.X.dot(self.hidden_weights))))
            output_layer_outputs = hidden_layer_outputs.dot(self.output_weights)
            
            # backward phase
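            # output layer error term; the output node is linear, so g_o'(a) = 1
            output_error = output_layer_outputs - self.y
            # hidden layer error term: sigmoid derivative times the backpropagated error (bias row excluded)
            hidden_error = hidden_layer_outputs[:, 1:] * (1 - hidden_layer_outputs[:, 1:]) * np.dot(output_error, self.output_weights.T[:, 1:])
            
            # partial derivatives for each sample (outer product of layer inputs and error terms)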
            hidden_pd = self.X[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
            output_pd = hidden_layer_outputs[:, :, np.newaxis] * output_error[:, np.newaxis, :]
            
            # average for total gradients
            self.total_hidden_gradient = hidden_pd.mean(axis=0)
            self.total_output_gradient = output_pd.mean(axis=0)
            
            # update weights
            self.hidden_weights += -self.lr * self.total_hidden_gradient
            self.output_weights += -self.lr * self.total_output_gradient
            
            if i % 500 == 0:
                print(f'output after training iteration {i}: {output_layer_outputs}')
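
The per-sample partial derivatives in `fit` are built by broadcasting, which gives one outer product per data point before averaging. A small sketch of that step is below. It uses random stand-in arrays rather than the model's real activations, and the sizes `N`, `d`, `h` are illustrative. It also suggests that averaging the broadcast product is equivalent to a single matrix product divided by the number of samples.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, h = 6, 4, 3                         # samples, inputs (with bias), hidden nodes
X = rng.normal(size=(N, d))               # stand-in for the layer inputs
hidden_error = rng.normal(size=(N, h))    # stand-in for the error terms

# (N, d, 1) * (N, 1, h) broadcasts to (N, d, h): one outer product per sample
per_sample = X[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
avg_broadcast = per_sample.mean(axis=0)

# the same average written as a matrix product
avg_matmul = X.T @ hidden_error / N

print(per_sample.shape, np.allclose(avg_broadcast, avg_matmul))
```

The broadcast form keeps every per-sample gradient available for inspection, while the matrix-product form avoids materializing the `(N, d, h)` array.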
    

In [4]:
# fake data
X = np.array([  
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])

y = np.array([[0, 1, 0, 1, 1, 0]]).T


In [5]:
model = simple_nn(X, y)
model.fit()

output after training iteration 0: [[0.56213115]
 [0.44555914]
 [0.47731602]
 [0.36361526]
 [0.47747825]
 [0.37241419]]
output after training iteration 500: [[0.43679569]
 [0.67451019]
 [0.40351425]
 [0.65781425]
 [0.29035525]
 [0.52642169]]
output after training iteration 1000: [[0.38967252]
 [0.7030734 ]
 [0.38370924]
 [0.71375792]
 [0.25690025]
 [0.54320504]]
output after training iteration 1500: [[0.36743917]
 [0.72187583]
 [0.36704491]
 [0.73981058]
 [0.25157379]
 [0.54003994]]
output after training iteration 2000: [[0.34198746]
 [0.75031232]
 [0.34136624]
 [0.77012587]
 [0.2522033 ]
 [0.52978184]]
output after training iteration 2500: [[0.30715247]
 [0.79215255]
 [0.30458624]
 [0.80982721]
 [0.26193012]
 [0.51116203]]
output after training iteration 3000: [[0.26891903]
 [0.83986972]
 [0.26459716]
 [0.85208211]
 [0.28447847]
 [0.4816553 ]]
output after training iteration 3500: [[0.24038876]
 [0.87447131]
 [0.23593791]
 [0.88250689]
 [0.32695421]
 [0.4354427 ]]
output after trainin