# Backprop

https://www.reddit.com/r/learnmachinelearning/comments/1gvzv7l/resources_to_practice_backpropagation_for
https://medium.com/@andresberejnoi/how-to-implement-backpropagation-with-numpy-andres-berejnoi-e7c14f2e683a
https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
https://github.com/xbeat/Machine-Learning/blob/main/Understanding%20Backpropagation%20with%20Python.md
https://www.askpython.com/python/examples/backpropagation-in-python

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
http://neuralnetworksanddeeplearning.com/chap2.html
https://cs231n.stanford.edu/

Implementing backpropagation from scratch is a common interview question at companies like Amazon, especially for machine learning roles. Below is a clear, step-by-step implementation of backpropagation for a simple 2-layer neural network (1 hidden layer) using Python and NumPy.

### Problem Statement:
Implement backpropagation for a neural network with:
- Input layer: 3 features
- Hidden layer: 4 neurons (ReLU activation)
- Output layer: 1 neuron (sigmoid activation, binary classification)

### Key Steps Explained:
1. **Initialization**:
   - Weights (`W1`, `W2`) are initialized to small random values (standard normal scaled by 0.01) to break symmetry; biases (`b1`, `b2`) are initialized to zeros.
2. **Forward Pass**:
   - Compute activations for the hidden layer (`A1` using ReLU) and output layer (`A2` using sigmoid).
3. **Loss Calculation**:
   - Binary cross-entropy loss is used for binary classification.
4. **Backward Pass (Backpropagation)**:
   - Compute gradients for the output layer (`dW2`, `db2`).
   - Compute gradients for the hidden layer (`dW1`, `db1`) using the chain rule.
   - Update weights and biases using gradient descent.
5. **Training Loop**:
   - Iteratively perform forward and backward passes to minimize loss.
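
The gradients in step 4 follow from the chain rule. The derivation below is a sketch under some notational assumptions that are not in the original text: rows of $X$ are samples, $m$ is the number of samples, $a_i$ and $y_i$ are the $i$-th entries of $A_2$ and $Y$, $\odot$ is the element-wise product, and $\mathbb{1}[\cdot]$ is the indicator function.

$$
\begin{aligned}
Z_1 &= X W_1 + b_1, \qquad A_1 = \mathrm{ReLU}(Z_1), \\
Z_2 &= A_1 W_2 + b_2, \qquad A_2 = \sigma(Z_2), \\
L &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log a_i + (1 - y_i) \log (1 - a_i) \right].
\end{aligned}
$$

Because $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the sigmoid derivative cancels the denominator of the cross-entropy derivative:

$$
\frac{\partial L}{\partial z_{2,i}}
= \frac{1}{m} \cdot \frac{a_i - y_i}{a_i (1 - a_i)} \cdot a_i (1 - a_i)
= \frac{1}{m} (a_i - y_i).
$$

Keeping the factor $1/m$ in the weight gradients, as the steps above suggest, gives:

$$
\begin{aligned}
dZ_2 &= A_2 - Y, & dW_2 &= \tfrac{1}{m} A_1^{\top} dZ_2, & db_2 &= \tfrac{1}{m} \textstyle\sum_i dZ_2^{(i)}, \\
dZ_1 &= \left( dZ_2 W_2^{\top} \right) \odot \mathbb{1}[Z_1 > 0], & dW_1 &= \tfrac{1}{m} X^{\top} dZ_1, & db_1 &= \tfrac{1}{m} \textstyle\sum_i dZ_1^{(i)}.
\end{aligned}
$$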

### Why This Matters for Amazon Interviews:
- **Fundamentals**: Tests understanding of core ML concepts (gradients, chain rule).
- **Coding Skills**: Requires clean, efficient NumPy implementation.
- **Debugging**: Interviewers may ask about numerical stability (e.g., overflow in `np.exp` inside the sigmoid, or `log(0)` in the loss) and about vanishing gradients when the sigmoid saturates.
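
One way to handle the numerical-stability questions is sketched below. The function names `stable_sigmoid` and `bce_from_logits` are hypothetical helpers introduced for illustration, not part of the original implementation. The first avoids overflow by never exponentiating a large positive number; the second computes the loss directly from the pre-activation `z`, so it never takes `log(0)`.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never calls exp on a large positive number."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def bce_from_logits(z, y):
    """Mean binary cross-entropy computed from logits z and labels y.

    Uses the identity: loss = max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    per_sample = np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(np.mean(per_sample))
```

With these helpers, inputs such as `z = 1000` or `z = -1000` give finite outputs instead of overflow warnings or `inf` losses. The gradient with respect to `z` is still `sigmoid(z) - y`, so the backward pass does not change.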

### Variations You Might Be Asked:
1. **Add L2 Regularization**: Modify the loss and gradients to include weight decay.
2. **Mini-Batch Training**: Update weights using batches instead of full data.
3. **Additional Layers**: Implement dropout or batch normalization, including their backward passes.
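
The first two variations can be sketched with a few small helpers. The names `l2_penalty`, `l2_grad`, and `iterate_minibatches`, and the regularization strength `lam`, are assumptions made for this illustration. The `1 / (2m)` scaling of the penalty is one common convention, not the only one.

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """L2 term to add to the loss: (lam / (2m)) * sum of squared weights."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)

def l2_grad(W, lam, m):
    """Extra term to add to dW: (lam / m) * W. Biases are typically not regularized."""
    return lam / m * W

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs that cover the data once."""
    idx = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], Y[batch]
```

In a backward pass, the weight gradient would then become something like `dW2 = A1.T @ dZ2 / m + l2_grad(W2, lam, m)`, and the training loop would run one forward and backward pass per batch returned by `iterate_minibatches(X, Y, batch_size, np.random.default_rng(0))`.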

The implementation below covers the essentials while remaining concise, which is what interviewers tend to look for.

```python
import numpy as np


class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Small random weights break symmetry; biases start at zero
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def relu(self, Z):
        return np.maximum(0, Z)

    def sigmoid(self, Z):
        # Clip Z so np.exp cannot overflow for large negative inputs
        return 1 / (1 + np.exp(-np.clip(Z, -500, 500)))

    def forward(self, X):
        # Layer 1 (hidden layer): X is (m, input_size), Z1 and A1 are (m, hidden_size)
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = self.relu(self.Z1)

        # Layer 2 (output layer): Z2 and A2 are (m, output_size)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = self.sigmoid(self.Z2)
        return self.A2

    def compute_loss(self, Y, Y_hat):
        # Binary cross-entropy; clip predictions to avoid log(0)
        eps = 1e-12
        Y_hat = np.clip(Y_hat, eps, 1 - eps)
        return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))

    def backward(self, X, Y, learning_rate=0.01):
        # Uses the activations cached by the most recent forward() call
        m = X.shape[0]

        # Output layer gradients (sigmoid + cross-entropy simplifies to A2 - Y)
        dZ2 = self.A2 - Y
        dW2 = np.dot(self.A1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m

        # Hidden layer gradients
        dA1 = np.dot(dZ2, self.W2.T)
        dZ1 = dA1 * (self.Z1 > 0)  # ReLU derivative: 1 where Z1 > 0, else 0
        dW1 = np.dot(X.T, dZ1) / m
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m

        # Gradient descent update of weights and biases
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    def train(self, X, Y, epochs=1000, learning_rate=0.01):
        for epoch in range(epochs):
            # Forward pass
            Y_hat = self.forward(X)

            # Compute loss (before this epoch's update)
            loss = self.compute_loss(Y, Y_hat)

            # Backward pass and parameter update
            self.backward(X, Y, learning_rate)

            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.4f}")


# Example usage
if __name__ == "__main__":
    # Sample data (5 samples, 3 features)
    X = np.random.randn(5, 3)
    Y = np.array([[0], [1], [1], [0], [1]])  # Binary labels, shape (5, 1)

    # Initialize and train the network
    nn = NeuralNetwork(input_size=3, hidden_size=4, output_size=1)
    nn.train(X, Y, epochs=1000, learning_rate=0.01)
```
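
A common way to check a backward pass like the one above is a numerical gradient check. The sketch below is standalone and does not reuse the `NeuralNetwork` class. The function names, the step size `eps=1e-5`, and the unscaled random weights are assumptions chosen for illustration. It compares the analytic gradients for the same 3-4-1 architecture against central finite differences.

```python
import numpy as np

def forward_loss(params, X, Y):
    """Forward pass and binary cross-entropy for the 3-4-1 network."""
    W1, b1, W2, b2 = params
    Z1 = X @ W1 + b1
    A1 = np.maximum(0, Z1)
    Z2 = A1 @ W2 + b2
    A2 = 1 / (1 + np.exp(-Z2))
    loss = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return loss, (Z1, A1, A2)

def analytic_grads(params, X, Y):
    """Gradients from backpropagation, in the order [dW1, db1, dW2, db2]."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    _, (Z1, A1, A2) = forward_loss(params, X, Y)
    dZ2 = A2 - Y
    dW2 = A1.T @ dZ2 / m
    db2 = dZ2.sum(axis=0, keepdims=True) / m
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
    dW1 = X.T @ dZ1 / m
    db1 = dZ1.sum(axis=0, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numerical_grads(params, X, Y, eps=1e-5):
    """Central-difference estimate of the gradient for every parameter entry."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for i in np.ndindex(p.shape):
            old = p[i]
            p[i] = old + eps
            plus, _ = forward_loss(params, X, Y)
            p[i] = old - eps
            minus, _ = forward_loss(params, X, Y)
            p[i] = old  # restore the original value
            g[i] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

def max_grad_error(params, X, Y):
    """Largest absolute difference between analytic and numerical gradients."""
    ana = analytic_grads(params, X, Y)
    num = numerical_grads(params, X, Y)
    return max(np.max(np.abs(a - n)) for a, n in zip(ana, num))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Y = np.array([[0.0], [1.0], [1.0], [0.0], [1.0]])
# Unscaled random weights (an assumption) so the gradients are not close to zero
params = [rng.standard_normal((3, 4)), np.zeros((1, 4)),
          rng.standard_normal((4, 1)), np.zeros((1, 1))]
err = max_grad_error(params, X, Y)
print(f"max abs gradient error: {err:.2e}")
```

A very small error suggests the backward pass matches the loss. A large one usually points to a missing `1/m`, a transposed matrix, or a wrong activation derivative. The check can be unreliable if a hidden pre-activation sits almost exactly at zero, where ReLU is not differentiable.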