### Guided Intro to Backpropagation

Backpropagation is the fundamental algorithm that enables neural networks to learn by efficiently computing gradients and updating weights.

**Key Concepts**

1. *Forward Pass:* Inputs flow through the network's layers to produce a prediction.
2. *Loss Calculation:* Computes the error of the prediction with respect to the true targets.
3. *Backward Pass:* Calculates how much each weight contributed to the error (using the chain rule).
4. *Weight Update:* Adjusts the weights to minimize the error using an optimization algorithm such as gradient descent.
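
The four steps above can be sketched for a single linear neuron. This is a minimal illustration, not a full network: the names `x`, `w`, `target`, and `grad` and the toy values are assumptions chosen for the example, and squared error is assumed as the loss.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
x = np.array([1.0, 0.0, 1.0])      # one input example
w = np.array([0.5, 0.48, -0.7])    # initial weights
target = 0.0                       # true target for this example
alpha = 0.1                        # learning rate

# 1. Forward pass: weighted sum of the inputs
prediction = x.dot(w)

# 2. Loss calculation: squared error against the target
loss = (prediction - target) ** 2

# 3. Backward pass: chain rule gives d(loss)/d(w) = 2 * (prediction - target) * x
grad = 2 * (prediction - target) * x

# 4. Weight update: one gradient descent step
w_new = w - alpha * grad

new_loss = (x.dot(w_new) - target) ** 2
print('loss before:', loss, 'loss after:', new_loss)
```

After one update the loss on this example should be smaller than before, which is the whole point of the procedure.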

> TL;DR
Backpropagation is essentially `learning from mistakes`: the network sees how wrong it was and adjusts its weights to be less wrong next time.

In [None]:
# Building a neural network to showcase Backprop

import numpy as np
weights = np.array([0.5, 0.48, -0.7])
alpha = 0.1

matrix_value = np.array([ [1, 0, 1],
                          [0, 1, 1],
                          [0, 0, 1],
                          [1, 1, 1],
                          [0, 1, 1],
                          [1, 0, 1] ])

output = np.array([0, 1, 0, 1, 1, 0])

input = matrix_value[0] # -> [1, 0, 1]
goal_prediction = output[0]

In [None]:
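# Train on a single example (the first row) only

for iter in range(20):
    # Forward pass: weighted sum of the inputs
    prediction = input.dot(weights)
    # Squared error between prediction and goal
    error = (goal_prediction - prediction) ** 2
    # Backward pass: delta scaled by each input gives the weight gradient
    delta = prediction - goal_prediction
    # Weight update: step against the gradient
    weights = weights - (alpha * (input * delta))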
    
    if iter % 5 == 0:
        print('Error:' + str(error) + 'Prediction:' + str(prediction))

Error:5.316911983140017e-06Prediction:-0.0023058430092137705
Error:5.708990770825904e-07Prediction:-0.0007555786372592799
Error:6.129982163457715e-08Prediction:-0.0002475880078569581
Error:6.5820182292947054e-09Prediction:-8.112963841466758e-05
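
A short derivation may help connect the loop above to gradient descent. The symbols $p$, $g$, $x_j$, and $w_j$ are introduced here for illustration and stand for `prediction`, `goal_prediction`, the entries of `input`, and the entries of `weights`.

$$
E = (p - g)^2, \qquad p = \sum_j x_j w_j
$$

Applying the chain rule to each weight:

$$
\frac{\partial E}{\partial w_j} = \frac{\partial E}{\partial p} \cdot \frac{\partial p}{\partial w_j} = 2\,(p - g)\,x_j
$$

The code computes `delta` as $p - g$ and updates $w_j \leftarrow w_j - \alpha \cdot \text{delta} \cdot x_j$. This is the same direction as the gradient above; the constant factor 2 is simply absorbed into the learning rate $\alpha$.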


**Making it learn the entire dataset**

``So far the neural network has learned from only one training example; now we want it to learn from all of them``

In [5]:
import numpy as np

weights = np.array([0.5, 0.48, -0.7])
alpha = 0.1

matrix_value = np.array([ [1, 0, 1],
                          [0, 1, 1],
                          [0, 0, 1],
                          [1, 1, 1],
                          [0, 1, 1],
                          [1, 0, 1] ])

output = np.array([0, 1, 0, 1, 1, 0])

input = matrix_value[0]
goal_prediction = output[0]

In [9]:
for iter in range(5):
    error_for_all = 0
    for row in range(len(output)):
        input = matrix_value[row]
        goal_prediction = output[row]
        
        prediction = input.dot(weights)
        
        error = (goal_prediction - prediction) ** 2
        error_for_all += error
        
        delta = prediction - goal_prediction
        weights = weights - (alpha * (delta * input))
        # Print the prediction for the first example of each pass
        if row == 0:
            print('Prediction:' + str(prediction))
    
    print('Error:' + str(error_for_all) + '\n')

Prediction:-0.003733415818170091
Error:0.0018939739123713475

Prediction:-0.003565528921192253
Error:0.0016451096996342332

Prediction:-0.0033920135266339822
Error:0.0014290353984827077

Prediction:-0.003216638628616701
Error:0.0012413985592149145

Prediction:-0.003042243679620596
Error:0.0010784359268087556



> **Full, batch, and stochastic gradient descent**

1. *Stochastic gradient descent* updates the weights one example at a time.

2. *(Full) gradient descent* updates the weights once per pass over the entire dataset.

3. *Batch gradient descent* updates the weights after every n examples.
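
The three variants above differ only in how many examples contribute to each update. The sketch below makes that explicit with one hypothetical helper, `train`, whose `batch_size` parameter selects the variant. It assumes a linear model with squared error and averages the gradient over each batch. Neither the helper nor the averaging comes from the cells above.

```python
import numpy as np

X = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 1],
              [1, 1, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
y = np.array([0, 1, 0, 1, 1, 0], dtype=float)

def train(X, y, batch_size, epochs=200, alpha=0.1):
    """Hypothetical helper: train a linear model with squared error.

    batch_size=1       -> stochastic gradient descent
    batch_size=len(X)  -> full gradient descent
    anything between   -> batch gradient descent
    """
    weights = np.array([0.5, 0.48, -0.7])
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            delta = xb.dot(weights) - yb        # one delta per example in the batch
            grad = xb.T.dot(delta) / len(xb)    # average gradient over the batch
            weights = weights - alpha * grad
    total_error = np.sum((X.dot(weights) - y) ** 2)
    return weights, total_error

errors = {}
for name, size in [('stochastic', 1), ('batch', 3), ('full', len(X))]:
    _, errors[name] = train(X, y, batch_size=size)
    print(name, errors[name])
```

With the same number of passes, smaller batches perform more updates, so they tend to reach a low error sooner on this tiny dataset. Larger batches give smoother but less frequent updates.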
    

### Learning non-linear patterns

We use non-linear (activation) functions such as `Sigmoid`, `TanH`, and `ReLU`.


**Activation in Deep Learning**

Activation functions are a crucial component of a neural network. They determine whether a neuron should be activated or not, and by introducing non-linearity into the model they enable it to learn and perform complex tasks.

> There are many more activation functions like these; they differ in shape, but their core role of introducing non-linearity stays the same.
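
As a rough illustration, the three activations named above can be written in a few lines of NumPy. The second half of the sketch shows why non-linearity matters: two stacked linear layers collapse into a single linear layer, while putting a ReLU between them does not. The matrices `W1` and `W2` and the input `x` are hypothetical values picked to make the effect visible.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes any real value into (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive values through, zeroes out the rest
    return np.maximum(0, x)

# Hypothetical weights and input, chosen only for illustration.
x = np.array([[1.0, 0.0]])
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0],
               [1.0]])

# Two linear layers in a row are equivalent to one layer with weights W1.dot(W2)
stacked_linear = x.dot(W1).dot(W2)
collapsed = x.dot(W1.dot(W2))

# With a ReLU in between, the result can no longer be produced by the collapsed layer
stacked_relu = relu(x.dot(W1)).dot(W2)

print(stacked_linear, collapsed, stacked_relu)
```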

In [2]:
import numpy as np
np.random.seed(1)

def relu(x):
    '''
    Returns x where x > 0,
    returns 0 otherwise
    '''
    return (x > 0) * x

def relu2deriv(output):
    '''
    Returns True (1) where output > 0,
    returns False (0) otherwise
    '''
    return (output > 0)

alpha = 0.2
hidden_size = 4


matrix_value = np.array([ [1, 0, 1],
                          [0, 1, 1],
                          [0, 0, 1],
                          [1, 1, 1],
                          [0, 1, 1],
                          [1, 0, 1] ])

output = np.array([0, 1, 0, 1, 1, 0])

weights_0_1 = 2 * np.random.random((3, hidden_size)) - 1 # ((3, 4))
weights_1_2 = 2 * np.random.random((hidden_size, 1)) - 1 # ((4, 1))

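for iter in range(60):
    layer_2_error = 0
    for i in range(len(matrix_value)):
        # Forward pass: input -> hidden layer (ReLU) -> output
        layer_0 = matrix_value[i : 1 + i]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)
        
        # Accumulate the squared error over this pass
        layer_2_error += np.sum((layer_2 - output[i:i+1]) ** 2)
        
        # Backward pass: propagate the output delta back through the ReLU
        layer_2_delta = (output[i:i+1] - layer_2)
        layer_1_delta = layer_2_delta.dot(weights_1_2.T)*relu2deriv(layer_1)
        
        # Weight update (delta is goal - prediction, so we add)
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
        
    # Report the summed error every 10th pass
    if (iter % 10 == 9):
        print('Error:' + str(layer_2_error))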

Error:0.31298811341557736
Error:0.043272385299784624
Error:0.0034372657015086937
Error:0.00019649599346773566
Error:1.015593230102771e-05
Error:5.122102420300414e-07
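
One way to gain confidence in a backward pass like the one above is a numerical gradient check: nudge each weight slightly, measure how the loss changes, and compare that with the gradient computed by backpropagation. The sketch below is an illustration under stated assumptions rather than part of the training cell. The helper `forward_loss`, the half squared error loss (chosen so the gradient has no factor of 2), and the random weights are all assumptions. The delta is written as prediction minus goal here, the opposite sign to the cell above, so these are gradients to subtract.

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def forward_loss(x, y, w01, w12):
    # Hypothetical helper: half squared error of a 3-4-1 ReLU network
    layer_1 = relu(x.dot(w01))
    layer_2 = layer_1.dot(w12)
    return 0.5 * np.sum((layer_2 - y) ** 2)

rng = np.random.default_rng(0)
x = np.array([[1.0, 0.0, 1.0]])     # one example from the dataset
y = np.array([[1.0]])
w01 = 2 * rng.random((3, 4)) - 1
w12 = 2 * rng.random((4, 1)) - 1

# Analytic gradients via backpropagation
layer_1 = relu(x.dot(w01))
layer_2 = layer_1.dot(w12)
layer_2_delta = layer_2 - y
layer_1_delta = layer_2_delta.dot(w12.T) * (layer_1 > 0)
grad_w12 = layer_1.T.dot(layer_2_delta)
grad_w01 = x.T.dot(layer_1_delta)

def numerical_grad(w, loss_fn, eps=1e-6):
    # Central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad

num_grad_w01 = numerical_grad(w01, lambda w: forward_loss(x, y, w, w12))
num_grad_w12 = numerical_grad(w12, lambda w: forward_loss(x, y, w01, w))

print('max diff w01:', np.max(np.abs(grad_w01 - num_grad_w01)))
print('max diff w12:', np.max(np.abs(grad_w12 - num_grad_w12)))
```

If the two sets of gradients agree to within a small tolerance, the backward pass is very likely correct. A large gap usually points to a sign error or a missing activation derivative.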
