# Introduction

* This notebook demonstrates implementing backpropagation from scratch without using PyTorch, TensorFlow, or Keras.
* The network is trained using Batch Gradient Descent.
* The gradients computed by backpropagation can also be used with Stochastic Gradient Descent, Mini-Batch Gradient Descent, or other optimizers.

# Backpropagation

Backpropagation, short for "backward propagation of errors", is the standard algorithm for computing the gradients used to train artificial neural networks. It measures the error between the network's prediction and the actual target, then propagates that error backward through the layers to obtain the gradient of the loss with respect to every weight and bias. Gradient descent uses these gradients to adjust the parameters so that the loss shrinks. Repeating this process over many iterations lets the network learn from its mistakes and map inputs to outputs more accurately.


## Steps in Backpropagation
1. **Forward Pass**

    * Compute hidden layer output: H = X·W1 + b1 (no activation function is applied in this notebook, so the network is linear)
    
    * Compute output: y_pred = H·W2 + b2

2. **Compute Loss**

    * Mean Squared Error: Loss = mean((y_pred - y)^2)

3. **Output Layer Gradients**

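    * Error: delta2 = y_pred - y (the constant factor 2 from differentiating the squared error is dropped; it only rescales the learning rate)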
    
    * Gradients: dw2 = H.T · delta2 / N, db2 = sum(delta2 over samples) / N, where N is the number of samples

4. **Hidden Layer Gradients**

    * Error: delta1 = delta2 · W2.T
    
    * Gradients: dw1 = X.T · delta1 / N, db1 = sum(delta1 over samples) / N

5. **Update Weights & Biases**

    * W2 -= lr * dw2, b2 -= lr * db2
    
    * W1 -= lr * dw1, b1 -= lr * db1

6. **Repeat** steps 1 to 5 for every epoch; the loss should decrease as the predictions improve.
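
The gradient formulas in steps 3 and 4 can be sanity-checked against finite differences. The sketch below is not part of the original notebook: the random data, the seed, and the helper names `half_mse` and `numeric_grad` are assumptions made for illustration. Because the formulas use `delta2 = y_pred - y` without the factor 2, they are the exact gradients of half the mean squared error, so that is the loss compared against here.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability

# Hypothetical toy data with the notebook's shapes: 4 samples, 2 features, 1 target
X = rng.random((4, 2))
y = rng.random((4, 1))
W1, b1 = rng.random((2, 2)), rng.random((1, 2))
W2, b2 = rng.random((2, 1)), rng.random((1, 1))
N = X.shape[0]

def half_mse():
    """Half the mean squared error of the linear two-layer network."""
    H = X @ W1 + b1
    y_pred = H @ W2 + b2
    return 0.5 * np.mean((y_pred - y) ** 2)

# Analytic gradients, following steps 3 and 4
H = X @ W1 + b1
y_pred = H @ W2 + b2
delta2 = y_pred - y
dW2 = H.T @ delta2 / N
delta1 = delta2 @ W2.T
dW1 = X.T @ delta1 / N

def numeric_grad(param, eps=1e-6):
    """Central-difference gradient of half_mse with respect to one parameter array."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        loss_plus = half_mse()
        param[idx] = original - eps
        loss_minus = half_mse()
        param[idx] = original  # restore the entry before moving on
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

num_dW1 = numeric_grad(W1)
num_dW2 = numeric_grad(W2)
print("dW1 matches:", np.allclose(dW1, num_dW1, atol=1e-6))
print("dW2 matches:", np.allclose(dW2, num_dW2, atol=1e-6))
```

If the full MSE were used as the reference loss instead, the numerical gradients would come out twice as large as the analytic ones, which is the factor absorbed into the learning rate.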

In [1]:
# Importing Libraries

import numpy as np
import pandas as pd

In [2]:
# Creating Dataset

# Create regression dataset
data = {
    "Hours_Studied": [2, 4, 6, 8],
    "Hours_Slept": [9, 8, 6, 5],
    "Final_Score": [55, 65, 75, 85]  # target (y)
}

df = pd.DataFrame(data)


In [3]:
df.head()

,Hours_Studied,Hours_Slept,Final_Score
0,2,9,55
1,4,8,65
2,6,6,75
3,8,5,85


In [4]:
# Normalizing the Data

df['Hours_Studied'] = df['Hours_Studied']/df['Hours_Studied'].max()
df['Hours_Slept'] = df['Hours_Slept']/df['Hours_Slept'].max()
df['Final_Score'] = df['Final_Score']/df['Final_Score'].max()

df.head()

,Hours_Studied,Hours_Slept,Final_Score
0,0.25,1.0,0.647059
1,0.5,0.888889,0.764706
2,0.75,0.666667,0.882353
3,1.0,0.555556,1.0
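
Since every column is divided by its own maximum, anything on the normalized scale can be mapped back by multiplying by that maximum. One caveat: the normalization cell overwrites the columns, so the original maximum has to be saved beforehand. The sketch below illustrates this for the target column only; the names `score_max`, `normalized`, and `restored` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Target column from the dataset above
scores = pd.Series([55, 65, 75, 85], name="Final_Score")

# Save the maximum BEFORE normalizing; afterwards the column's max is 1.0
score_max = scores.max()

normalized = scores / score_max    # same operation as the normalization cell
restored = normalized * score_max  # back to the original score scale

print(normalized.round(6).tolist())
print(restored.round(2).tolist())
```

The same multiplication applies to model outputs: a hypothetical normalized prediction of 0.65 would correspond to a score of 0.65 * 85 = 55.25.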


In [5]:
# Splitting the data into features (X) and target (y)

X = df.drop('Final_Score',axis = 1).values
y = df['Final_Score'].values.reshape(-1,1)

In [6]:
X.shape

(4, 2)

In [7]:
y.shape

(4, 1)

In [8]:
# Creating the neural network

"""
 Input Layer - 2 neurons
 Hidden Layer - 2 neurons
 Output Layer - 1 neuron

 Total Trainable Parameters = Total weights + Total biases

 Input layer to hidden layer = 2 x 2 = 4 weights
 Hidden layer biases         = 2 biases

 Hidden layer to Output layer = 2 x 1 = 2 weights
 Output layer biases          = 1 bias

 Total Trainable Parameters = 4+2 (weights) + 2+1 (biases)

 Total Trainable Parameters = 9

"""


# Initializing layers and neurons
input_neurons = 2
hidden_neurons = 2
output_neurons = 1


# Initializing weights and biases
w1 = np.random.rand(input_neurons,hidden_neurons)
b1 = np.random.rand(1,hidden_neurons)
w2 = np.random.rand(hidden_neurons,output_neurons)
b2 = np.random.rand(1,output_neurons)
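
The parameter count worked out in the docstring can be confirmed from the array sizes. This is a small self-contained check rather than a cell from the original notebook; `total_params` is a name introduced here for illustration.

```python
import numpy as np

input_neurons, hidden_neurons, output_neurons = 2, 2, 1

# Same shapes as the initialization above
w1 = np.random.rand(input_neurons, hidden_neurons)   # (2, 2) -> 4 weights
b1 = np.random.rand(1, hidden_neurons)               # (1, 2) -> 2 biases
w2 = np.random.rand(hidden_neurons, output_neurons)  # (2, 1) -> 2 weights
b2 = np.random.rand(1, output_neurons)               # (1, 1) -> 1 bias

total_params = w1.size + b1.size + w2.size + b2.size
print(total_params)  # 9
```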

In [9]:
# Printing initial weights and biases

print("w1 : \n", w1)
print("\nw2 : \n", w2)
print("\nb1 : \n", b1)
print("\nb2 : \n", b2)

w1 : 
 [[0.53478669 0.44097337]
 [0.23139675 0.47830704]]

w2 : 
 [[0.6784994 ]
 [0.61629207]]

b1 : 
 [[0.06164027 0.12409449]]

b2 : 
 [[0.66356891]]


In [10]:
epochs = 1000
lr = 0.1


In [11]:
for epoch in range(epochs):
    # Forward Pass
    hidden_y_pred = np.dot(X, w1) + b1        # (4,2)
    output_y_pred = np.dot(hidden_y_pred, w2) + b2  # (4,1)

    # Loss (MSE)
    loss = np.mean((y - output_y_pred)**2)
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.5f}")

    # Backpropagation
    # Output layer gradients
    delta2 = output_y_pred - y                # (4,1)
    dw2 = np.dot(hidden_y_pred.T, delta2) / X.shape[0]   # (2,1)
    db2 = np.sum(delta2, axis=0, keepdims=True) / X.shape[0]  # (1,1)

    # Hidden layer gradients
    delta1 = np.dot(delta2, w2.T)            # (4,2)
    dw1 = np.dot(X.T, delta1) / X.shape[0]   # (2,2)
    db1 = np.sum(delta1, axis=0, keepdims=True) / X.shape[0]  # (1,2)

    # Update weights and biases
    w2 -= lr * dw2
    b2 -= lr * db2
    w1 -= lr * dw1
    b1 -= lr * db1

# Final Predictions
hidden_y_pred = np.dot(X, w1) + b1
output_y_pred = np.dot(hidden_y_pred, w2) + b2
print("\nFinal Predictions:\n", output_y_pred)

Epoch 0, Loss: 0.50015
Epoch 100, Loss: 0.00072
Epoch 200, Loss: 0.00014
Epoch 300, Loss: 0.00004
Epoch 400, Loss: 0.00002
Epoch 500, Loss: 0.00002
Epoch 600, Loss: 0.00002
Epoch 700, Loss: 0.00002
Epoch 800, Loss: 0.00002
Epoch 900, Loss: 0.00002

Final Predictions:
 [[0.64502238]
 [0.77016154]
 [0.87683572]
 [1.00197488]]
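
The introduction mentions that Mini-Batch Gradient Descent is an alternative to the batch updates used above. The following is a minimal sketch of that variant, assuming the same network and gradient formulas. The `batch_size` of 2, the seed, and the `mse` helper are choices made here for illustration, and the data is copied from the normalized table printed earlier.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability

# Normalized features and targets copied from the notebook's printed table
X = np.array([[0.25, 1.0], [0.5, 0.888889], [0.75, 0.666667], [1.0, 0.555556]])
y = np.array([[0.647059], [0.764706], [0.882353], [1.0]])

w1, b1 = rng.random((2, 2)), rng.random((1, 2))
w2, b2 = rng.random((2, 1)), rng.random((1, 1))
lr, epochs, batch_size = 0.1, 1000, 2  # batch_size is a hypothetical setting

def mse():
    """Mean squared error of the current network on the full dataset."""
    return np.mean((y - ((X @ w1 + b1) @ w2 + b2)) ** 2)

initial_loss = mse()

for epoch in range(epochs):
    order = rng.permutation(len(X))  # reshuffle the samples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        n = len(xb)

        # Forward pass on the mini-batch only
        h = xb @ w1 + b1
        pred = h @ w2 + b2

        # Same gradient formulas as the batch version, averaged over the mini-batch
        delta2 = pred - yb
        dw2 = h.T @ delta2 / n
        db2 = delta2.sum(axis=0, keepdims=True) / n
        delta1 = delta2 @ w2.T
        dw1 = xb.T @ delta1 / n
        db1 = delta1.sum(axis=0, keepdims=True) / n

        # One parameter update per mini-batch
        w2 -= lr * dw2
        b2 -= lr * db2
        w1 -= lr * dw1
        b1 -= lr * db1

final_loss = mse()
print(f"Initial loss: {initial_loss:.5f}, final loss: {final_loss:.5f}")
```

Setting `batch_size = len(X)` reduces this to the batch version used in the notebook, while `batch_size = 1` gives Stochastic Gradient Descent.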
