# Exercise 02: Train a 2-Layer Network on Real Data (Manual Backprop)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shang-vikas/series1-coding-exercises/blob/main/exercises/blog-031/exercise-02.ipynb)

## Setup

In [7]:
# Install required packages using the kernel's Python interpreter
import sys
import subprocess
import importlib

def install_if_missing(package, import_name=None):
    """Install package if it's not already installed."""
    if import_name is None:
        import_name = package

    try:
        importlib.import_module(import_name)
        print(f"✓ {package} is already installed")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✓ {package} installed successfully")

# Install required packages
install_if_missing("numpy")
install_if_missing("scikit-learn", "sklearn")

✓ numpy is already installed
✓ scikit-learn is already installed


This is your "you now understand training" checkpoint.

### Step 1: Load Real Dataset

In [8]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(42)

# Load real dataset
data = load_breast_cancer()
X = data.data
y = data.target.reshape(-1, 1)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize features (important!)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

The task is to predict the diagnosis label:

- 0 = malignant
- 1 = benign

**30 real medical features.**
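Before training, it can help to check how the two labels are distributed. The snippet below is a minimal sketch: `class_balance` is a hypothetical helper (not part of the exercise), and the toy label array only stands in for `data.target`.

```python
import numpy as np

def class_balance(y):
    """Return (counts, fractions) per class for an integer label array.

    Hypothetical helper for illustration; works on shape (n,) or (n, 1).
    """
    labels = np.asarray(y).ravel().astype(int)
    counts = np.bincount(labels)
    return counts, counts / counts.sum()

# Toy labels standing in for `data.target` (0 = malignant, 1 = benign)
toy_y = np.array([0, 1, 1, 0, 1, 1, 1, 0]).reshape(-1, 1)
counts, fractions = class_balance(toy_y)
print("counts:", counts)
print("fractions:", fractions)
```

On the real data, `class_balance(data.target)` should report 212 malignant and 357 benign samples, so a model that always predicts "benign" would already be right about 63% of the time.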

### Step 2: Initialize Network

We'll use:

**Input (30) → Hidden (16) → Output (1)**

In [10]:
input_dim = X_train.shape[1]
hidden_dim = 16

W1 = np.random.randn(input_dim, hidden_dim) * 0.01
b1 = np.zeros((1, hidden_dim))

W2 = np.random.randn(hidden_dim, 1) * 0.01
b2 = np.zeros((1, 1))

lr = 0.05  # learning rate: step size applied to each gradient update

### Step 3: Define Functions

In [11]:
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss(p, y):
    eps = 1e-8
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def accuracy(p, y):
    preds = (p > 0.5).astype(int)
    return np.mean(preds == y)
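Pairing `sigmoid` with the cross-entropy in `compute_loss` gives a very simple gradient, which is why the backward pass in Step 4 can start from `p - y_train`. The derivation below is a per-sample sketch that ignores the small `eps` stabilizer; averaging over the batch only adds the `1/N` factor.

$$
\begin{aligned}
L &= -\bigl[y \log p + (1 - y)\log(1 - p)\bigr], \qquad p = \sigma(z_2) = \frac{1}{1 + e^{-z_2}} \\
\frac{\partial L}{\partial p} &= -\frac{y}{p} + \frac{1 - y}{1 - p} = \frac{p - y}{p\,(1 - p)} \\
\frac{\partial p}{\partial z_2} &= p\,(1 - p) \\
\frac{\partial L}{\partial z_2} &= \frac{\partial L}{\partial p}\cdot\frac{\partial p}{\partial z_2} = p - y
\end{aligned}
$$

The $p\,(1 - p)$ factors cancel, so the gradient at the output is just the prediction error.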

### Step 4: Training Loop (Full Batch)

In [12]:
for epoch in range(500):

    # Forward
    z1 = X_train @ W1 + b1
    a1 = relu(z1)

    z2 = a1 @ W2 + b2
    p = sigmoid(z2)

    loss = compute_loss(p, y_train)

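    # Backward
    # Sigmoid + binary cross-entropy: the gradient w.r.t. z2 simplifies to (p - y)
    dz2 = p - y_train
    dW2 = a1.T @ dz2 / len(X_train)  # average over the batch
    db2 = np.mean(dz2, axis=0, keepdims=True)

    # Backpropagate into the hidden layer; ReLU passes gradient only where z1 > 0
    da1 = dz2 @ W2.T
    dz1 = da1 * relu_derivative(z1)

    dW1 = X_train.T @ dz1 / len(X_train)  # average over the batch
    db1 = np.mean(dz1, axis=0, keepdims=True)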

    # Update
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

    if epoch % 50 == 0:
        train_acc = accuracy(p, y_train)
        print(f"Epoch {epoch} | Loss: {loss:.4f} | Train Acc: {train_acc:.4f}")

Epoch 0 | Loss: 0.6932 | Train Acc: 0.3714
Epoch 50 | Loss: 0.6510 | Train Acc: 0.6330
Epoch 100 | Loss: 0.4344 | Train Acc: 0.9341
Epoch 150 | Loss: 0.2253 | Train Acc: 0.9560
Epoch 200 | Loss: 0.1445 | Train Acc: 0.9714
Epoch 250 | Loss: 0.1116 | Train Acc: 0.9736
Epoch 300 | Loss: 0.0935 | Train Acc: 0.9824
Epoch 350 | Loss: 0.0833 | Train Acc: 0.9824
Epoch 400 | Loss: 0.0767 | Train Acc: 0.9824
Epoch 450 | Loss: 0.0721 | Train Acc: 0.9824
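A falling loss is good evidence that the manual gradients are right, but a finite-difference check is a more direct test. The sketch below re-implements the same forward and backward pass on small random data (an assumption, to keep the check fast) and compares each manual gradient against a central-difference estimate. `loss_fn`, `manual_grads`, and `numerical_grad` are hypothetical helper names, not part of the exercise.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(X, y, W1, b1, W2, b2):
    """Forward pass plus binary cross-entropy, same structure as the training loop."""
    a1 = relu(X @ W1 + b1)
    p = sigmoid(a1 @ W2 + b2)
    eps = 1e-8
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def manual_grads(X, y, W1, b1, W2, b2):
    """Manual backprop, mirroring the backward pass of the training loop."""
    z1 = X @ W1 + b1
    a1 = relu(z1)
    p = sigmoid(a1 @ W2 + b2)
    dz2 = p - y
    dW2 = a1.T @ dz2 / len(X)
    db2 = np.mean(dz2, axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (z1 > 0)
    dW1 = X.T @ dz1 / len(X)
    db1 = np.mean(dz1, axis=0, keepdims=True)
    return dW1, db1, dW2, db2

def numerical_grad(f, param, h=1e-5):
    """Central-difference estimate of df/dparam, perturbing one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(*param.shape):
        old = param[idx]
        param[idx] = old + h
        plus = f()
        param[idx] = old - h
        minus = f()
        param[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * h)
    return grad

# Small random problem (assumed sizes, chosen only to keep the check cheap)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=(20, 1)).astype(float)
W1 = rng.normal(size=(5, 4)) * 0.5
b1 = rng.normal(size=(1, 4)) * 0.1
W2 = rng.normal(size=(4, 1)) * 0.5
b2 = np.zeros((1, 1))

params = [W1, b1, W2, b2]
analytic = manual_grads(X, y, *params)
f = lambda: loss_fn(X, y, *params)
max_diff = max(
    np.max(np.abs(numerical_grad(f, prm) - g))
    for prm, g in zip(params, analytic)
)
print(f"max |numerical - manual| = {max_diff:.2e}")
```

If the backward pass is correct, the reported difference should be tiny compared with the gradients themselves. A sign error or a missing `1/N` factor typically shows up as a difference several orders of magnitude larger.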


### Step 5: Evaluate on Test Set

In [13]:
z1 = X_test @ W1 + b1
a1 = relu(z1)
z2 = a1 @ W2 + b2
p_test = sigmoid(z2)

test_acc = accuracy(p_test, y_test)
print("Test Accuracy:", test_acc)

Test Accuracy: 0.9912280701754386


You should reach **95% or higher test accuracy**.

With no frameworks.

No autograd.

Just the core engine.

This is real data, with:

- Real measurement noise
- Real feature correlations
- Real class imbalance (212 malignant vs. 357 benign)
- A real train/test generalization gap
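With imbalanced classes, a single accuracy number can hide which kind of mistake the model makes. The sketch below counts the four outcome types and reports recall for each class. `confusion_counts` and `recall_per_class` are hypothetical helpers, and the toy arrays only stand in for `p_test` and `y_test`.

```python
import numpy as np

def confusion_counts(p, y, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels, treating class 1 as positive."""
    preds = (np.asarray(p).ravel() > threshold).astype(int)
    truth = np.asarray(y).ravel().astype(int)
    tp = int(np.sum((preds == 1) & (truth == 1)))
    fp = int(np.sum((preds == 1) & (truth == 0)))
    tn = int(np.sum((preds == 0) & (truth == 0)))
    fn = int(np.sum((preds == 0) & (truth == 1)))
    return tp, fp, tn, fn

def recall_per_class(p, y, threshold=0.5):
    """Return (recall for class 0, recall for class 1)."""
    tp, fp, tn, fn = confusion_counts(p, y, threshold)
    recall_neg = tn / (tn + fp) if (tn + fp) else float("nan")
    recall_pos = tp / (tp + fn) if (tp + fn) else float("nan")
    return recall_neg, recall_pos

# Toy predictions and labels standing in for `p_test` and `y_test`
toy_p = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.6]).reshape(-1, 1)
toy_y = np.array([1, 1, 1, 0, 0, 1]).reshape(-1, 1)

tp, fp, tn, fn = confusion_counts(toy_p, toy_y)
recall_neg, recall_pos = recall_per_class(toy_p, toy_y)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"recall class 0: {recall_neg:.2f} | recall class 1: {recall_pos:.2f}")
```

Because class 0 is malignant in this dataset, a false positive here means a malignant case predicted as benign, so the class 0 recall is the number to watch most closely.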

**Now you can:**

- Change hidden size → watch overfitting
- Increase learning rate → watch divergence
- Remove normalization → watch training destabilize
- Add L2 regularization → see generalization improve

Now the system behaves like real ML.
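The L2 regularization experiment from the list above needs only two changes: a penalty term added to the loss, and a matching term added to each weight gradient. The sketch below shows one common formulation. `add_l2` and the strength `lam` are hypothetical names, and leaving the biases unregularized is a convention assumed here, not something the exercise prescribes.

```python
import numpy as np

def add_l2(base_loss, dW1, dW2, W1, W2, lam):
    """Add an L2 penalty (lam / 2) * (||W1||^2 + ||W2||^2).

    Returns the penalized loss and the adjusted weight gradients.
    Bias gradients are left untouched (assumed convention).
    """
    penalty = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return base_loss + penalty, dW1 + lam * W1, dW2 + lam * W2

# Tiny worked example with hand-checkable numbers
W1_toy = np.array([[1.0, -2.0]])
W2_toy = np.array([[3.0]])
loss_reg, dW1_reg, dW2_reg = add_l2(
    base_loss=0.5,
    dW1=np.zeros_like(W1_toy),
    dW2=np.zeros_like(W2_toy),
    W1=W1_toy,
    W2=W2_toy,
    lam=0.1,
)
print(loss_reg, dW1_reg, dW2_reg)
```

In the Step 4 loop, a call like this would sit between the backward pass and the update, so that `W1` and `W2` are pulled toward zero on every step. A small value such as `lam = 1e-3` is a reasonable starting point to experiment with.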

## 💡 Reflection Prompts

- What happens if `hidden_dim = 128`?
- What if you remove ReLU?
- What if you increase `lr` to `0.5`?
- What if you train for 5000 epochs?
- What happens if you initialize weights too large?

**You'll see:**

- Vanishing/exploding behavior
- Overfitting
- Optimization instability

On real data.
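To work through these prompts without editing the notebook cells each time, the training loop can be wrapped in a function whose arguments match the questions. The sketch below does that on small synthetic data so it runs on its own. `train_two_layer` and its parameters are hypothetical, and it uses `np.random.default_rng` rather than the notebook's `np.random.seed`, so its numbers will not match the outputs above.

```python
import numpy as np

def train_two_layer(X, y, hidden_dim=16, lr=0.05, epochs=500,
                    init_scale=0.01, use_relu=True, seed=42):
    """Full-batch training of the 2-layer network; returns the loss per epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(size=(d, hidden_dim)) * init_scale
    b1 = np.zeros((1, hidden_dim))
    W2 = rng.normal(size=(hidden_dim, 1)) * init_scale
    b2 = np.zeros((1, 1))
    eps = 1e-8
    history = []
    for _ in range(epochs):
        # Forward
        z1 = X @ W1 + b1
        a1 = np.maximum(0, z1) if use_relu else z1  # "remove ReLU" = identity
        z2 = a1 @ W2 + b2
        p = 1 / (1 + np.exp(-z2))
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        history.append(float(loss))
        # Backward
        dz2 = p - y
        dW2 = a1.T @ dz2 / n
        db2 = np.mean(dz2, axis=0, keepdims=True)
        da1 = dz2 @ W2.T
        dz1 = da1 * (z1 > 0) if use_relu else da1
        dW1 = X.T @ dz1 / n
        db1 = np.mean(dz1, axis=0, keepdims=True)
        # Update
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1
    return history

# Synthetic stand-in data: label depends on the sign of x0 + x1
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float).reshape(-1, 1)

baseline = train_two_layer(X_toy, y_toy, epochs=300, init_scale=0.1)
wide = train_two_layer(X_toy, y_toy, epochs=300, init_scale=0.1, hidden_dim=128)
linear = train_two_layer(X_toy, y_toy, epochs=300, init_scale=0.1, use_relu=False)

for name, hist in [("baseline", baseline), ("hidden_dim=128", wide), ("no ReLU", linear)]:
    print(f"{name:>15}: loss {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Passing `X_train` and `y_train` instead of the toy arrays, and varying `lr`, `epochs`, or `init_scale`, covers the remaining prompts. Extreme settings may trigger NumPy overflow warnings from `np.exp`, which is itself one of the instabilities the prompts are pointing at.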

## Why This Is Powerful

You just trained a real medical classifier using:

- manual gradient computation
- no frameworks
- no magic

**If you understand this exercise deeply,**

**you understand 90% of deep learning training.**

The rest is engineering scale.