# Lab: Predicting Diabetes with a Simple Neural Network

In this exercise you'll complete a **training loop** for a simple neural network built using **NumPy**.

**Learning Objective:**  
By the end of this lab you'll understand how to implement a forward pass, compute binary cross-entropy loss, perform backpropagation, and update weights using gradient descent, all from scratch.

**Your goal:**
Teach the network to predict whether a patient has **diabetes (1)** or **not (0)** based on 3 clinical inputs: **BMI, Age, Glucose**.

## What you'll implement:
- Forward pass
- Binary cross-entropy loss
- Backpropagation
- Weight updates using gradient descent

# Dataset and setup

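Let's generate a synthetic dataset of 100 patients, each with 3 features standing in for BMI, Age, and Glucose. The values are drawn from a standard normal distribution rather than real clinical ranges, and the labels are derived from a fixed set of "true" weights. We'll also normalize the features for better training stability.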

**Why normalize?** Gradient descent tends to converge more reliably when input features share a similar scale. Standardizing gives each feature a mean of 0 and a standard deviation of 1.
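
As a small illustration of what standardization does, here is a sketch using a handful of made-up patient measurements on very different scales. The array `raw` and its values are hypothetical and are not part of the lab's dataset.

```python
import numpy as np

# Hypothetical raw measurements for 4 patients: [BMI, Age, Glucose].
# These values are made up purely to show features on different scales.
raw = np.array([
    [22.0, 35.0,  90.0],
    [31.5, 52.0, 140.0],
    [27.3, 61.0, 110.0],
    [35.1, 44.0, 165.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print("Column means before:", raw.mean(axis=0))
print("Column means after: ", standardized.mean(axis=0).round(6))
print("Column stds after:  ", standardized.std(axis=0).round(6))
```

After standardization every column has mean 0 and standard deviation 1 (up to floating-point error), so no single feature dominates the weighted sum just because of its units.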

In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)

# Simulated input data: 100 patients, 3 features (BMI, Age, Glucose)
X = np.random.randn(100, 3)
true_weights = np.array([[2], [-1], [1.5]])
y_true = (1 / (1 + np.exp(-X @ true_weights)) > 0.5).astype(int)

# Normalize features (mean=0, std=1)
X = (X - X.mean(axis=0)) / X.std(axis=0)

print(f"Dataset shape: {X.shape}")
print(f"Labels shape: {y_true.shape}")
print(f"Positive cases (diabetes): {y_true.sum()}/{len(y_true)}")

# Initialization and activations

We initialize the weights for a simple neural network with one hidden layer using small random values.  
We also define the activation functions:
- **ReLU** for the hidden layer (introduces non-linearity)
- **Sigmoid** for the output (gives us probabilities between 0 and 1)
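
To see what these two activations do to a few sample values, the following sketch applies them to a small hand-picked array. The input `z` is an arbitrary example, not data from the lab.

```python
import numpy as np

# A few arbitrary pre-activation values, chosen only for illustration.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = np.maximum(0, z)          # negatives become 0, positives pass through
sigmoid_out = 1 / (1 + np.exp(-z))   # every value lands strictly between 0 and 1

print("z:      ", z)
print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out.round(3))
```

ReLU zeroes out the negative entries and leaves the positive ones unchanged, while sigmoid maps 0 to exactly 0.5 and squashes large magnitudes toward 0 or 1.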

In [None]:
# Initialize weights with small random values
def init_weights(input_dim, hidden_dim):
    W1 = np.random.randn(input_dim, hidden_dim) * 0.01  # Small random weights
    b1 = np.zeros((1, hidden_dim))                      # Biases start at zero
    W2 = np.random.randn(hidden_dim, 1) * 0.01
    b2 = np.zeros((1, 1))
    return W1, b1, W2, b2

# Activation functions
def relu(Z):
    """ReLU activation: max(0, Z)"""
    return np.maximum(0, Z)

def sigmoid(Z):
    """Sigmoid activation: 1 / (1 + exp(-Z))"""
    return 1 / (1 + np.exp(-np.clip(Z, -250, 250)))  # Clip to prevent overflow

# Your task: Implement the training loop

Now let's implement the training loop. We'll perform the following steps:

1. **Forward pass**: compute predictions by passing data through the network
2. **Compute loss**: measure how far off our predictions are using binary cross-entropy
3. **Backpropagation**: compute gradients (how much each weight should change)
4. **Update weights**: adjust weights in the direction that reduces loss
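
Step 4 can be illustrated in isolation. The sketch below runs gradient descent on a toy one-parameter function, f(w) = (w - 3)^2, which is an assumption chosen for illustration and has nothing to do with the network itself. The update rule has the same form as the one used in the training loop.

```python
# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0          # starting guess
lr = 0.1         # learning rate
history = []     # track f(w) at each step

for step in range(50):
    history.append((w - 3) ** 2)
    grad = 2 * (w - 3)   # slope of f at the current w
    w -= lr * grad       # move against the gradient

print(f"w after 50 steps: {w:.5f}")
print(f"f(w) went from {history[0]:.4f} to {history[-1]:.2e}")
```

Each update moves `w` a fraction of the way toward the minimum at 3, so the recorded values of f shrink at every step. The network's weight updates follow the same pattern, only with gradients obtained from backpropagation.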

Fill in the missing code below:

In [None]:
def train(X, y, hidden_dim=4, epochs=500, lr=0.01):
    W1, b1, W2, b2 = init_weights(X.shape[1], hidden_dim)
    m = X.shape[0]  # Number of training examples
    losses = []

    for epoch in range(epochs):
        # === 1. Forward pass ===
        Z1 = # Your code here: Linear transform for hidden layer (X @ W1 + b1)
        A1 = # Your code here: Apply ReLU activation
        Z2 = # Your code here: Linear transform for output (A1 @ W2 + b2)
        A2 = # Your code here: Apply sigmoid activation

        # === 2. Compute loss ===
        # Binary cross-entropy: -mean(y*log(pred) + (1-y)*log(1-pred))
        # Tip: add a small constant (e.g. 1e-8) inside each log to avoid log(0)
        loss = # Your code here: binary cross-entropy loss
        losses.append(loss)

        # === 3. Backpropagation ===
        # Output layer gradients
        dZ2 = # Your code here: A2 - y (derivative of loss w.r.t. Z2)
        dW2 = # Your code here: A1.T @ dZ2 / m (gradient for W2)
        db2 = # Your code here: mean of dZ2 over examples, axis=0 with keepdims=True (gradient for b2)

        # Hidden layer gradients
        dA1 = # Your code here: dZ2 @ W2.T (backprop into hidden layer)
        dZ1 = # Your code here: dA1 * (Z1 > 0) (derivative through ReLU)
        dW1 = # Your code here: X.T @ dZ1 / m (gradient for W1)
        db1 = # Your code here: mean of dZ1 over examples, axis=0 with keepdims=True (gradient for b1)

        # === 4. Update weights ===
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

        # Print progress every 50 epochs
        if epoch % 50 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.4f}")

    return W1, b1, W2, b2, losses

# Unit test

Let's run the training loop and check that:
1. The function returns the expected outputs
2. The loss decreases over time (indicating learning)

In [None]:
# Unit test for training loop
print("Running unit test for your training loop...\n")

# Run learner's train() implementation
try:
    W1, b1, W2, b2, losses = train(X, y_true, epochs=100, lr=0.05)
except Exception as e:
    # Chain the original exception so the full traceback stays visible
    raise RuntimeError(f"Error during training execution: {e}") from e

# Check that losses is a list of numbers
assert isinstance(losses, list), "Expected 'losses' to be a list."
assert all(isinstance(loss, (int, float)) for loss in losses), "All loss values should be numeric."

# Check loss length
assert len(losses) == 100, f"Expected 100 loss values (one per epoch), but got {len(losses)}."

# Check that loss decreases
if losses[-1] >= losses[0]:
    raise AssertionError(f"Final loss ({losses[-1]:.4f}) is not lower than initial loss ({losses[0]:.4f}). Model may not be learning.")

# Optional: check that the learned weights are not all (near) zero
if np.allclose(W1, 0) and np.allclose(W2, 0):
    raise AssertionError("Model weights are all close to zero. Check that you're initializing and updating them correctly.")

print("All tests passed!")
print(f"Initial loss: {losses[0]:.4f}")
print(f"Final loss: {losses[-1]:.4f}")
print(f"Loss reduction: {((losses[0] - losses[-1]) / losses[0] * 100):.1f}%")

# Visualize the loss curve

Plotting the loss helps us see if the model is learning effectively. A good training curve should show:
- **Decreasing loss** over time
- **Smooth convergence** (not too noisy)
- **Leveling off** when the model has learned as much as it can

In [None]:
# Train for more epochs to see the full learning curve
print("Training for 300 epochs...")
W1, b1, W2, b2, losses = train(X, y_true, epochs=300, lr=0.05)

# Plot the loss curve
plt.figure(figsize=(10, 6))
plt.plot(losses, linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Binary Cross-Entropy Loss', fontsize=12)
plt.title('Training Loss Curve', fontsize=14)
plt.grid(True, alpha=0.3)
plt.show()

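# Quick accuracy check: forward pass with the trained parameters, then threshold at 0.5
train_probs = sigmoid(relu(X @ W1 + b1) @ W2 + b2)
print(f"Final training accuracy: {((train_probs > 0.5) == y_true).mean():.1%}")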

# Model evaluation

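Let's evaluate how well our trained model performs on the training data. Keep in mind that accuracy measured on the data used for training shows how well the model fits that data, not how well it would generalize to new patients.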
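
Accuracy is only one way to summarize a binary classifier. As an optional sketch, the counts behind a confusion matrix can be computed directly with NumPy. The arrays `y` and `pred` below are made-up values for illustration, not outputs of the trained model.

```python
import numpy as np

# Made-up labels and thresholded predictions, for illustration only.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((pred == 1) & (y == 1)))  # true positives
tn = int(np.sum((pred == 0) & (y == 0)))  # true negatives
fp = int(np.sum((pred == 1) & (y == 0)))  # false positives
fn = int(np.sum((pred == 0) & (y == 1)))  # false negatives

accuracy = (tp + tn) / len(y)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```

In a screening setting, a missed positive case (false negative) and a false alarm (false positive) may carry very different costs, which is why these counts are often reported alongside accuracy.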

In [None]:
# Make predictions with the trained model
def predict(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1
    A1 = relu(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)
    return A2

# Get predictions and convert to binary
predictions_prob = predict(X, W1, b1, W2, b2)
predictions_binary = (predictions_prob > 0.5).astype(int)

# Calculate accuracy
accuracy = (predictions_binary == y_true).mean()
print(f"Training Accuracy: {accuracy:.1%}")

# Show some example predictions
print("\nSample Predictions:")
print("Probability | Predicted | Actual")
print("-" * 35)
for i in range(10):
    prob = predictions_prob[i, 0]
    pred = predictions_binary[i, 0]
    actual = y_true[i, 0]
    print(f"   {prob:.3f}    |     {pred}     |   {actual}")

# Results and next steps

**Congratulations!** You've successfully implemented a neural network training loop from scratch.

## What you learned:
- How to implement forward propagation
- How to compute binary cross-entropy loss
- How to perform backpropagation to compute gradients
- How to update weights using gradient descent

## Experiment with these ideas:
- **Change the learning rate**: Try `lr=0.01` or `lr=0.1`. What happens?
- **Adjust hidden units**: Try `hidden_dim=8` or `hidden_dim=2`. How does it affect performance?
- **More epochs**: Train for 1000 epochs. Does the loss keep decreasing?
- **Different initialization**: Try larger initial weights (`* 0.1` instead of `* 0.01`)

## Challenge problems:
1. **Add another hidden layer** - Can you modify the code to have 2 hidden layers?
2. **Try different activation functions** - What about using `tanh` instead of ReLU?
3. **Add regularization** - Can you add L2 regularization to prevent overfitting?
4. **Implement momentum** - Modify the weight updates to include momentum.
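
As a starting point for the momentum challenge, here is a sketch of one common formulation (often called classical or heavy-ball momentum) on a toy two-parameter loss. The toy loss, the `velocity` variable, and the coefficient `beta` are assumptions for illustration. Adapting the idea to `W1`, `b1`, `W2`, and `b2` is left as the exercise.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return np.array([w[0], 10.0 * w[1]])

w = np.array([5.0, 5.0])        # starting point
velocity = np.zeros_like(w)     # running accumulation of past gradients
lr, beta = 0.05, 0.9            # learning rate and momentum coefficient

for step in range(300):
    # Keep a fraction of the previous step, then add the new gradient step.
    velocity = beta * velocity - lr * grad(w)
    w = w + velocity

print("Final w:", w)
print("Distance from the minimum at (0, 0):", np.linalg.norm(w))
```

With `beta = 0`, the update reduces to plain gradient descent, so the same loop can be used to compare the two behaviors.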

Great work! You now understand the fundamentals of how neural networks learn.

# Instructor-Only Solution
> The following cell contains the complete, correct implementation of the training loop.
> In production this would be hidden from learners.
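
One way to gain confidence in backpropagation formulas like the ones in this solution is a numerical gradient check. The sketch below is self-contained and uses a tiny random problem (the data, shapes, and seed are arbitrary assumptions). It compares the analytic gradient for `W1` against a central-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # 5 examples, 3 features
y = rng.integers(0, 2, size=(5, 1))        # random binary labels
W1 = rng.standard_normal((3, 4)) * 0.5
b1 = np.zeros((1, 4))
W2 = rng.standard_normal((4, 1)) * 0.5
b2 = np.zeros((1, 1))

def loss_fn(W1, b1, W2, b2):
    """Forward pass followed by binary cross-entropy."""
    A1 = np.maximum(0, X @ W1 + b1)
    A2 = 1 / (1 + np.exp(-(A1 @ W2 + b2)))
    return -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))

# Analytic gradient for W1, using the same formulas as the training loop.
Z1 = X @ W1 + b1
A1 = np.maximum(0, Z1)
A2 = 1 / (1 + np.exp(-(A1 @ W2 + b2)))
dZ2 = A2 - y
dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
dW1 = X.T @ dZ1 / X.shape[0]

# Numerical gradient: nudge one weight at a time and measure the loss change.
h = 1e-5
num_dW1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += h
        W_minus[i, j] -= h
        num_dW1[i, j] = (loss_fn(W_plus, b1, W2, b2)
                         - loss_fn(W_minus, b1, W2, b2)) / (2 * h)

max_diff = np.max(np.abs(dW1 - num_dW1))
print(f"Max difference between analytic and numerical gradient: {max_diff:.2e}")
```

A very small difference suggests the analytic gradient matches the loss it is supposed to differentiate. The check can be unreliable if a hidden pre-activation happens to sit almost exactly at the ReLU kink, which is unlikely with random data like this.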

In [None]:
# Full solution to training loop
def train(X, y, hidden_dim=4, epochs=500, lr=0.01):
    W1, b1, W2, b2 = init_weights(X.shape[1], hidden_dim)
    m = X.shape[0]
    losses = []

    for epoch in range(epochs):
        # === Forward pass ===
        Z1 = X @ W1 + b1                # Linear transform for hidden layer
        A1 = relu(Z1)                   # ReLU activation
        Z2 = A1 @ W2 + b2               # Linear transform for output
        A2 = sigmoid(Z2)                # Sigmoid activation (output probability)

        # === Compute loss ===
        loss = -np.mean(y * np.log(A2 + 1e-8) + (1 - y) * np.log(1 - A2 + 1e-8))  # Binary cross-entropy
        losses.append(loss)

        # === Backpropagation ===
        dZ2 = A2 - y                    # Derivative of loss w.r.t. Z2
        dW2 = A1.T @ dZ2 / m            # Gradient for W2
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m  # Gradient for b2

        dA1 = dZ2 @ W2.T                # Backprop into hidden layer
        dZ1 = dA1 * (Z1 > 0)            # Derivative through ReLU
        dW1 = X.T @ dZ1 / m             # Gradient for W1
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m  # Gradient for b1

        # === Update weights ===
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

        if epoch % 50 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.4f}")

    return W1, b1, W2, b2, losses