<a href="https://colab.research.google.com/github/mohammedhemed77/DL-Course-UDL-Book-Based-/blob/main/Implementations/Notebooks/Backpropagation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### 🚀🚀 Building an abstract NN from scratch
This code is based on Chapter 2 of Michael Nielsen's interactive book Neural Networks and Deep Learning.

I have updated it to use PyTorch instead of NumPy and rewritten it for Python 3.


#### By ENG / Mohammed Hemed

### Import libraries

In [None]:
import torch
import random
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim

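### 🔥 Neural Network Example: Why Compute ∂L/∂L in Backpropagation?
Backpropagation starts at the loss itself: the seed gradient ∂L/∂L = 1 is the first factor in every chain-rule product, and every other gradient is derived from it.
Let's use a simple PyTorch network that tries to classify even vs. odd numbers, and see what happens when the loss is cut off from the computation graph so that this seed has nothing to flow through.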
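
To make the role of ∂L/∂L concrete, here is a minimal sketch on a hypothetical one-weight model. The tensors `w` and `x`, the helper `compute_loss`, and the squared loss are illustrative assumptions, not part of the network trained in this notebook. Calling `backward()` on a scalar loss implicitly seeds the backward pass with ∂L/∂L = 1. Passing that seed explicitly through the `gradient` argument should give the same result, while a seed of 0 wipes out every downstream gradient.

```python
import torch

x = torch.tensor([3.0])                      # fixed input (illustrative)
w = torch.tensor([2.0], requires_grad=True)  # single trainable weight (illustrative)

def compute_loss():
    # L = (w * x)^2, so dL/dw = 2 * w * x^2 = 36 for w = 2, x = 3
    return ((w * x) ** 2).sum()

# 1) Implicit seed: backward() on a scalar starts from dL/dL = 1
compute_loss().backward()
grad_implicit = w.grad.clone()

# 2) Explicit seed of 1: the same gradient as above
w.grad = None
compute_loss().backward(gradient=torch.tensor(1.0))
grad_explicit = w.grad.clone()

# 3) Seed of 0: the chain rule multiplies every factor by 0
w.grad = None
compute_loss().backward(gradient=torch.tensor(0.0))
grad_zero_seed = w.grad.clone()

print(grad_implicit, grad_explicit, grad_zero_seed)
```

The seed is the only thing that differs between the three runs, so any change in `w.grad` comes from ∂L/∂L alone.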


### ✅ Correct Way: Using Backprop Properly

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Example Dataset: Even (0) vs. Odd (1)
X_train = torch.tensor([[2.0], [4.0], [6.0], [1.0], [3.0], [5.0]])  # Input numbers
y_train = torch.tensor([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])  # Labels (0 for even, 1 for odd)

# Simple Neural Network (1 hidden layer)

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 4)  # 1 input → 4 hidden units
        self.fc2 = nn.Linear(4, 1)  # 4 hidden → 1 output

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

# Model, Loss, Optimizer
model = SimpleNN()
criterion = nn.BCELoss()  # Binary cross-entropy loss (for binary classification)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training Loop
epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()         # Reset gradients
    y_pred = model(X_train)       # Forward pass
    loss = criterion(y_pred, y_train)  # Compute loss
    loss.backward()               # Backward pass (compute gradients)
    optimizer.step()              # Update weights

    if epoch % 100 == 0:  # Print loss every 100 epochs
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')


Epoch 0, Loss: 0.7386
Epoch 100, Loss: 0.6813
Epoch 200, Loss: 0.6767
Epoch 300, Loss: 0.6721
Epoch 400, Loss: 0.6680
Epoch 500, Loss: 0.6646
Epoch 600, Loss: 0.6617
Epoch 700, Loss: 0.6593
Epoch 800, Loss: 0.6574
Epoch 900, Loss: 0.6559
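
The criterion in the cell above is `nn.BCELoss`, which computes binary cross-entropy. As a rough check of what that number means, the sketch below writes the formula out by hand and compares it with the built-in loss. It also computes the loss of a model that always outputs 0.5. The predictions and labels here are made-up values chosen for illustration, not outputs of the training run.

```python
import math
import torch
import torch.nn as nn

# Illustrative predictions and labels (not taken from the training run above)
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y_true = torch.tensor([[1.0], [0.0], [1.0]])

# Binary cross-entropy written out: -mean(y * log(p) + (1 - y) * log(1 - p))
manual_bce = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()
builtin_bce = nn.BCELoss()(y_pred, y_true)

# Loss of a model that always outputs 0.5, whatever the input
chance_pred = torch.full_like(y_true, 0.5)
chance_bce = nn.BCELoss()(chance_pred, y_true)

print(f"manual: {manual_bce.item():.4f}, built-in: {builtin_bce.item():.4f}")
print(f"always-0.5 baseline: {chance_bce.item():.4f} (ln 2 = {math.log(2):.4f})")
```

The losses printed by the training loop (about 0.74 falling to about 0.66) stay close to the ln 2 ≈ 0.693 baseline, which suggests the network has learned little about parity from the raw integer inputs. The backward pass itself still works as intended.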


### ❌ Incorrect Way: Detaching the Loss (Breaking Backpropagation)

In [None]:
# Detaching the loss (breaking the computation graph)
loss = loss.detach()

# Trying to compute gradients
try:
    loss.backward()
except RuntimeError as e:
    print("Error:", e)


Error: element 0 of tensors does not require grad and does not have a grad_fn
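
The error message names the two things `detach()` removes. A small sketch, using a hypothetical one-parameter loss rather than the model above, inspects them directly instead of waiting for `backward()` to fail:

```python
import torch

w = torch.tensor([1.5], requires_grad=True)  # illustrative parameter
loss = (w ** 2).sum()
detached = loss.detach()

# The original loss is attached to the graph through its grad_fn
print(loss.requires_grad, loss.grad_fn is not None)

# The detached copy has no grad_fn and does not require grad
print(detached.requires_grad, detached.grad_fn is not None)

# The numeric value is unchanged; only the link to the graph is gone
print(torch.equal(loss, detached))
```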


#### The zip function in Python
The `zip()` function returns a zip object: an iterator of tuples in which the first items of all the passed iterables are paired together, then the second items, and so on.

If the passed iterables have different lengths, the shortest one determines the length of the resulting iterator.

In [None]:
weights = [0.2, 0.5, 0.8]
biases = [0.1, 0.3, 0.6]

for w, b in zip(weights, biases):
    print(f"Weight: {w}, Bias: {b}")


sizes = [1 , 3 , 5 , 7 , 9]

print(sizes[1:])
print(sizes[:-1])

for size in zip(sizes[:-1], sizes[1:]):
    print(size)



Weight: 0.2, Bias: 0.1
Weight: 0.5, Bias: 0.3
Weight: 0.8, Bias: 0.6
[3, 5, 7, 9]
[1, 3, 5, 7]
(1, 3)
(3, 5)
(5, 7)
(7, 9)
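
The cell above only zips lists of equal length. As a small supplementary sketch, the following shows the truncation rule, and checks that pairing a list with its own shifted copy always yields one pair fewer than the list has items. The names `layers` and `units` are illustrative only.

```python
layers = ["input", "hidden", "output"]
units = [784, 30]  # deliberately one item short

# zip stops at the shortest iterable, so "output" is silently dropped
paired = list(zip(layers, units))
print(paired)  # [('input', 784), ('hidden', 30)]

# Consecutive pairs: sizes[:-1] and sizes[1:] both have len(sizes) - 1 items
sizes = [784, 30, 10]
layer_pairs = list(zip(sizes[:-1], sizes[1:]))
print(layer_pairs)  # [(784, 30), (30, 10)]
print(len(layer_pairs) == len(sizes) - 1)  # True
```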


### Define our class


In [None]:
class Network:
    def __init__(self, sizes):
        """Initialize a feedforward neural network with `sizes` representing the number of neurons per layer."""

        self.num_layers = len(sizes)
        self.sizes = sizes

        # The two lines below create lists of tensors: each element holds the
        # biases or the weights of one layer of the network.
        # Weights are drawn from N(0, 1); biases start at zero.

        self.biases = [torch.zeros(y, 1) for y in sizes[1:]]  # Biases initialized as zeros
        self.weights = [torch.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]  # Heuristic: N(0,1)

    def sigmoid(self, z):
        """Sigmoid activation function."""
        return 1 / (1 + torch.exp(-z))

    def sigmoid_prime(self, z):
        # Derivative of the sigmoid function
        return self.sigmoid(z) * (1 - self.sigmoid(z))

    def feedforward(self, a):
        # Return the output of the network given input (a)
        for b, w in zip(self.biases, self.weights):
            a = self.sigmoid(w @ a + b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, lr):
        # Train the network using mini-batch stochastic gradient descent
        for epoch in range(epochs):
            # shuffling dataset every epoch to help model generalize better
            random.shuffle(training_data)
            # slicing dataset into mini-batches
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0, len(training_data), mini_batch_size)]
            for mini_batch in mini_batches:
                # update parameters of each batch
                self.update_mini_batch(mini_batch, lr)
            print(f"Epoch {epoch+1} complete")

    def update_mini_batch(self, mini_batch, lr):
        # Update the network’s weights and biases by applying gradient descent using backpropagation to a single mini batch.

        # These two lines play the role of zero_grad in PyTorch:
        # two lists of tensors, one for the biases and one for the weights
        nabla_b = [torch.zeros_like(b) for b in self.biases]      # each element has the same shape as that layer's bias tensor
        nabla_w = [torch.zeros_like(w) for w in self.weights]     # each element has the same shape as that layer's weight tensor

        for x, y in mini_batch:
            # gradients computed by BP for each example
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

        # update Weights and biases after one mini-batch
        self.weights = [w - (lr / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (lr / len(mini_batch)) * nb for b, nb in zip(self.biases, nabla_b)]

    # check shapes for debugging
    def print_shapes(self):
        print("Biases:")
        for b in self.biases:
            print(b.shape)
        print("\nWeights:")
        for w in self.weights:
            print(w.shape)


    def backprop(self, x, y):
        # Return a tuple `(nabla_b, nabla_w)` representing the gradient of the cost function.

        # These two lines play the role of zero_grad in PyTorch
        nabla_b = [torch.zeros_like(b) for b in self.biases]
        nabla_w = [torch.zeros_like(w) for w in self.weights]

        # Forward pass
        activation = x
        activations = [x]  # Store activations layer by layer
        zs = []  # Store weighted inputs layer by layer

        for b, w in zip(self.biases, self.weights):
            z = w @ activation + b
            zs.append(z)
            activation = self.sigmoid(z)
            activations.append(activation)

        # Backward pass
        delta = (activations[-1] - y) * self.sigmoid_prime(zs[-1])  # Output layer error
        nabla_b[-1] = delta
        nabla_w[-1] = delta @ activations[-2].T

        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = self.sigmoid_prime(z)
            delta = (self.weights[-l+1].T @ delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = delta @ activations[-l-1].T

        return nabla_b, nabla_w
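
One way to gain confidence in the `backprop` equations is to compare them with autograd on a small example. The sketch below is a standalone check under stated assumptions, not part of the `Network` class: the 3-4-2 layer sizes, the seed, and all tensor names are hypothetical. The cost is the quadratic cost C = ½‖a − y‖², which is the cost whose output-layer error is `(activations[-1] - y) * sigmoid_prime(zs[-1])`.

```python
import torch

torch.manual_seed(0)  # illustrative seed, for repeatability only

def sigmoid(z):
    return 1 / (1 + torch.exp(-z))

# Hypothetical tiny network: 3 inputs -> 4 hidden -> 2 outputs
w1 = torch.randn(4, 3, requires_grad=True)
b1 = torch.zeros(4, 1, requires_grad=True)
w2 = torch.randn(2, 4, requires_grad=True)
b2 = torch.zeros(2, 1, requires_grad=True)
x = torch.randn(3, 1)
y = torch.tensor([[1.0], [0.0]])

# Forward pass, keeping the intermediate activations
a1 = sigmoid(w1 @ x + b1)
a2 = sigmoid(w2 @ a1 + b2)

# Reference gradients from autograd for the quadratic cost
cost = 0.5 * ((a2 - y) ** 2).sum()
cost.backward()

# The same gradients from the backprop equations, without autograd
with torch.no_grad():
    delta2 = (a2 - y) * a2 * (1 - a2)          # output error; sigmoid'(z) = a * (1 - a)
    delta1 = (w2.T @ delta2) * a1 * (1 - a1)   # error propagated back to the hidden layer
    manual = {"w2": delta2 @ a1.T, "b2": delta2, "w1": delta1 @ x.T, "b1": delta1}

autograd = {"w2": w2.grad, "b2": b2.grad, "w1": w1.grad, "b1": b1.grad}
matches = {name: torch.allclose(manual[name], autograd[name], atol=1e-6) for name in manual}
print(matches)
```

If all four entries agree, the hand-written deltas and autograd are computing the same gradient for this cost. A mismatch would usually point to a transposed matrix or a missing `sigmoid_prime` factor.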



### Load and prepare our dataset (MNIST)

In [None]:
# Load MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1, 1))  # Flatten 28x28 image to 784x1
])

train_dataset = torchvision.datasets.MNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root="./data", train=False, transform=transform, download=True)

# Convert to list of (image, label) tuples for custom training loop
train_data = [(image, torch.nn.functional.one_hot(torch.tensor(label), num_classes=10).float().view(-1, 1))
              for image, label in train_dataset]
test_data = [(image, label) for image, label in test_dataset]
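
To see what a single training pair looks like after this preparation without downloading MNIST, the sketch below applies the same two reshaping steps to stand-in values. The random `image` tensor and the label `3` are illustrative assumptions. `ToTensor` yields a 1×28×28 float tensor for an MNIST image, which is what the stand-in imitates.

```python
import torch

label = 3                              # illustrative digit label
image = torch.rand(1, 28, 28)          # stand-in for a ToTensor() MNIST image

# Same flattening as the transform: 1x28x28 -> 784x1 column vector
column = image.view(-1, 1)

# Same target encoding as train_data: one-hot, float, 10x1 column vector
target = torch.nn.functional.one_hot(torch.tensor(label), num_classes=10).float().view(-1, 1)

print(column.shape)                    # torch.Size([784, 1])
print(target.shape)                    # torch.Size([10, 1])
print(target.flatten().tolist())       # 1.0 at index 3, 0.0 elsewhere
```

Column vectors are used so that `w @ a + b` in the network works with weight matrices of shape `(outputs, inputs)`.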





### Define network architecture (784 input neurons, 30 hidden, 10 output)

In [None]:
network = Network([784, 30, 10])
network.print_shapes()

Biases:
torch.Size([30, 1])
torch.Size([10, 1])

Weights:
torch.Size([30, 784])
torch.Size([10, 30])
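
The shapes printed above follow directly from the `sizes` list, and so does the number of trainable parameters. A short plain-Python sketch of that arithmetic (no tensors are created, and the variable names are illustrative):

```python
sizes = [784, 30, 10]

# Each weight matrix is (neurons in next layer, neurons in previous layer)
weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
# Each bias is a column vector with one entry per neuron in the layer
bias_shapes = [(y, 1) for y in sizes[1:]]

num_weights = sum(rows * cols for rows, cols in weight_shapes)
num_biases = sum(rows * cols for rows, cols in bias_shapes)

print(weight_shapes)             # [(30, 784), (10, 30)]
print(bias_shapes)               # [(30, 1), (10, 1)]
print(num_weights + num_biases)  # 23860
```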


### Train the network

In [None]:
network.SGD(training_data=train_data, epochs=10, mini_batch_size=32, lr=3.0)

Epoch 1 complete
Epoch 2 complete
Epoch 3 complete
Epoch 4 complete
Epoch 5 complete
Epoch 6 complete
Epoch 7 complete
Epoch 8 complete
Epoch 9 complete
Epoch 10 complete
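
Each epoch in the call above shuffles the data and slices it into mini-batches. The sketch below reproduces that slicing on a ten-item stand-in list (the list contents and the seed are illustrative assumptions). It then works out the same arithmetic for this run, using the 60,000 images in the MNIST training set.

```python
import math
import random

random.seed(0)                      # illustrative seed, for repeatability only
training_data = list(range(10))     # stand-in for a list of (image, label) pairs
mini_batch_size = 4

random.shuffle(training_data)
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, len(training_data), mini_batch_size)]
batch_lengths = [len(batch) for batch in mini_batches]
print(batch_lengths)                # [4, 4, 2]: the last batch is smaller

# Same arithmetic for the call above: 60,000 training images, batches of 32
updates_per_epoch = math.ceil(60000 / 32)
step_scale = 3.0 / 32               # lr / len(mini_batch), as in update_mini_batch
print(updates_per_epoch, step_scale)  # 1875 0.09375
```

Because the summed gradient is divided by the batch size, `lr=3.0` corresponds to a step of 3.0 times the average per-example gradient in each update.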


### Evaluate on test set

In [None]:
correct = 0
total = len(test_data)

for image, label in test_data:
    output = network.feedforward(image)
    predicted = torch.argmax(output).item()
    if predicted == label:
        correct += 1

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")

Test Accuracy: 93.91%
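
The evaluation loop relies on `torch.argmax` returning the row index of the largest activation in the 10×1 output column, which is the predicted digit. A minimal sketch on hand-made outputs (the helper `make_output` and the three samples are hypothetical, not network predictions):

```python
import torch

def make_output(winner):
    # Build a 10x1 column of activations with one clearly largest entry
    out = torch.full((10, 1), 0.05)
    out[winner, 0] = 0.9
    return out

# (output, true label) pairs; the third one is deliberately wrong
samples = [(make_output(7), 7), (make_output(2), 2), (make_output(4), 9)]

correct = sum(int(torch.argmax(output).item() == label) for output, label in samples)
accuracy = 100 * correct / len(samples)
print(f"{correct}/{len(samples)} correct, accuracy {accuracy:.2f}%")
```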


### Sources :

Chapter 2 of Nielsen's book: http://neuralnetworksanddeeplearning.com/chap2.html

3Blue1Brown:
https://www.3blue1brown.com/lessons/neural-networks
https://www.3blue1brown.com/lessons/gradient-descent
https://www.3blue1brown.com/lessons/backpropagation
https://www.3blue1brown.com/lessons/backpropagation-calculus

DL4CV course, University of Michigan:
Lecture 6 (backpropagation): https://youtu.be/dB-u77Y5a6A?si=TjPeCMZh-mz-O2K7