# Building a Neural Network Classifier with TinyTorch 🔥

This notebook demonstrates how to build and train a neural network for binary classification using TinyTorch. We'll walk through each step of the machine learning pipeline, from data preparation to model evaluation, while explaining the key concepts along the way.

## What We'll Cover
1. Data preparation and visualization
2. Building a multi-layer perceptron (MLP)
3. Training the model with gradient descent
4. Evaluating model performance
5. Visualizing the decision boundary

Let's start by installing the required dependencies:

In [None]:
# Install required packages in the current environment
!uv pip install matplotlib scikit-learn

## 1. Data Preparation 📊

We'll use the `make_moons` dataset from scikit-learn, which creates two interleaving half circles. This is a classic binary classification problem: the classes cannot be separated well by a straight line, so it requires a nonlinear decision boundary and makes a good test case for a neural network.

The dataset has two features (x and y coordinates) and a binary label (0 or 1) for each point. We'll generate 1000 samples with some added noise to make the problem more realistic:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons

# Generate the dataset
data = make_moons(n_samples=1000, noise=0.1, random_state=42)
X = 4.0 * (data[0] - np.array([0.5, 0.25]))  # Scale and center the data
y = data[1]

# Visualize the dataset
colors = np.array(["#3057D3", "#D33030"])  # Blue for class 0, Red for class 1
plt.figure(figsize=(5, 3))
plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[y])  # Labels are 0/1, so they index colors directly
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Two Moons Dataset")
plt.tight_layout()
plt.savefig("../assets/classification_data.svg", format="svg")

### Data Preprocessing

Before training our neural network, we need to:
1. Split the data into training and test sets
2. Standardize the features

Standardization (rescaling each feature to zero mean and unit variance) matters for neural networks because it:
- Puts all features on a comparable scale, so none dominates simply because of its units
- Helps with gradient descent convergence
- Makes the training process more stable
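
To make the fit/transform split concrete, here is a minimal NumPy sketch of the computation that `StandardScaler` performs. The toy arrays and their names (`toy_train`, `toy_test`) are made up for illustration. The statistics are estimated on the training split only and then reused for the test split, so no information from the test set leaks into preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two features on very different scales (illustrative values only)
toy_train = rng.normal(loc=[10.0, -3.0], scale=[5.0, 0.1], size=(200, 2))
toy_test = rng.normal(loc=[10.0, -3.0], scale=[5.0, 0.1], size=(50, 2))

# "fit": estimate per-feature statistics on the training split only
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both splits
toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std

print(toy_train_scaled.mean(axis=0))  # close to 0 for each feature
print(toy_train_scaled.std(axis=0))   # close to 1 for each feature
```

The scaled test split will generally not have exactly zero mean and unit variance, which is expected: it is scaled with the training statistics.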

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split dataset into 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train.shape[0]} samples with {X_train.shape[1]} features")
print(f"Test set:     {X_test.shape[0]} samples with {X_test.shape[1]} features")

### Converting to TinyTorch Tensors

Now we'll convert our NumPy arrays to TinyTorch tensors. TinyTorch tensors are the fundamental data structure that supports automatic differentiation, allowing us to compute gradients for training our neural network.
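
To give a feel for what "supports automatic differentiation" means, here is a minimal scalar sketch of reverse-mode autodiff. It is not TinyTorch's implementation: the `Value` class and its attributes are hypothetical names chosen for this illustration, and it only handles addition and multiplication of scalars. The idea is that each result remembers its inputs and a small function that pushes gradients back to them.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow backward."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # replaced by the operation that creates this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's gradient
        # is complete before it is propagated to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

Gradients are accumulated with `+=` because a value such as `x` can feed into several operations. That accumulation is also why gradients need to be reset before each new backward pass during training.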

In [None]:
from tinytorch import Tensor

# Create tinytorch tensor objects
Xt_train = Tensor(X_train_scaled)
yt_train = Tensor(y_train)
Xt_test = Tensor(X_test_scaled)
yt_test = Tensor(y_test)

## 2. Building the Neural Network 🧠

We'll create a Multi-Layer Perceptron (MLP) with the following architecture:
- Input layer: 2 features
- First hidden layer: 24 neurons with ReLU activation
- Second hidden layer: 12 neurons with ReLU activation
- Output layer: 1 neuron with Sigmoid activation

This architecture was chosen because:
1. The hidden layers with ReLU activation can learn complex nonlinear patterns
2. The decreasing layer sizes (24 → 12 → 1) keep the model small, which limits its capacity to overfit
3. The sigmoid output activation squashes values into the (0, 1) range, so the output can be read as the probability of class 1
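
As a rough picture of what this architecture computes, the following NumPy sketch runs a forward pass through layers of the same sizes. It is an illustration under assumptions, not TinyTorch's code: the random initialization, the presence of a bias per neuron, and the helper names (`relu`, `sigmoid`, `forward`) are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 24, 12, 1]  # input, hidden 1, hidden 2, output

# Random weights and zero biases for each layer (illustrative initialization)
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Map a batch of shape (n_samples, 2) to probabilities of shape (n_samples, 1)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)  # hidden layers use ReLU
    return sigmoid(h @ weights[-1] + biases[-1])  # output layer uses sigmoid


batch = rng.normal(size=(5, 2))
probs = forward(batch)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(probs.shape)  # (5, 1)
print(n_params)     # 385 = (2*24 + 24) + (24*12 + 12) + (12*1 + 1)
```

Under these assumptions the network has only 385 trainable parameters, which is small relative to the size of the training set.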

Let's create and inspect our model:

In [None]:
from tinytorch import MLP, Activation

# Define the neural network architecture
mlp = MLP(
    n_input=2,  # Two input features
    layers=[
        (24, Activation.RELU),   # Hidden layer 1: 24 neurons with ReLU
        (12, Activation.RELU),   # Hidden layer 2: 12 neurons with ReLU
        (1, Activation.SIGMOID), # Output layer: 1 neuron with Sigmoid
    ],
)
display(mlp)

## 3. Training the Model 📈

We'll train our model using:
- Binary cross-entropy loss (appropriate for binary classification)
- Full-batch gradient descent (every update uses the entire training set)
- 200 epochs of training
- Learning rate of 0.5
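
Written out, the loss and the update rule listed above look as follows. The notation is ours rather than TinyTorch's: $p_i$ is the predicted probability for sample $i$, $y_i \in \{0, 1\}$ is its label, $N$ is the number of training samples, $\theta$ stands for the model parameters, and $\eta = 0.5$ is the learning rate.

$$
\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta)
$$

The loss is small when the model assigns high probability to the correct class and grows without bound as a confident prediction turns out to be wrong.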

During training, we'll track both loss and accuracy on training and test sets to monitor:
1. How well the model is learning (training metrics)
2. How well it generalizes (test metrics)
3. Whether we're overfitting (gap between training and test performance)
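
The same loop structure (forward pass, loss, gradients, parameter update, metric tracking) can be sketched without any framework. The example below swaps the MLP for plain logistic regression with a hand-derived gradient, so it is a simplified stand-in rather than what TinyTorch does internally. The toy data and helper names (`predict`, `bce`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative only): the label is 1 when x0 + x1 > 0
X_toy = rng.normal(size=(300, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)
X_tr, y_tr, X_te, y_te = X_toy[:200], y_toy[:200], X_toy[200:], y_toy[200:]


def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def bce(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


w, b, lr = np.zeros(2), 0.0, 0.5
history = []
for epoch in range(1, 201):
    # Forward pass and metric tracking on both splits
    p_tr = predict(X_tr, w, b)
    history.append((epoch, bce(y_tr, p_tr), bce(y_te, predict(X_te, w, b))))

    # For sigmoid + BCE, the gradient w.r.t. the pre-activation is (p - y) / N
    err = (p_tr - y_tr) / len(y_tr)
    w -= lr * (X_tr.T @ err)
    b -= lr * err.sum()

first, last = history[0], history[-1]
print(f"epoch {first[0]:03d}: loss[train]={first[1]:.3f}, loss[test]={first[2]:.3f}")
print(f"epoch {last[0]:03d}: loss[train]={last[1]:.3f}, loss[test]={last[2]:.3f}")
```

With all parameters at zero, every prediction is 0.5, so the first recorded loss is log(2) ≈ 0.693 on both splits. A test loss that starts rising while the training loss keeps falling would be the usual sign of overfitting.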

In [None]:
epochs = 200
lr = 5e-1

# Track both loss and accuracy
metrics = {"epoch": [], "train_loss": [], "test_loss": [], "train_acc": [], "test_acc": []}

# Training loop
for i in range(epochs):
    # Forward pass to get probabilities
    y_train_probs = mlp(Xt_train)
    y_test_probs = mlp(Xt_test)

    # Threshold probabilities at 0.5 to get class predictions, then compute accuracy
    y_train_pred = (y_train_probs.data > 0.5).astype(np.float32)
    y_test_pred = (y_test_probs.data > 0.5).astype(np.float32)

    train_acc = np.mean(y_train_pred == yt_train.data)
    test_acc = np.mean(y_test_pred == yt_test.data)

    # Zero gradients before backward pass
    mlp.flush_grads()

    # Binary cross-entropy loss
    neg_logl_train = -(
        yt_train * y_train_probs.log() + (1 - yt_train) * (1 - y_train_probs).log()
    ).sum() / len(yt_train)
    neg_logl_test = -(
        yt_test * y_test_probs.log() + (1 - yt_test) * (1 - y_test_probs).log()
    ).sum() / len(yt_test)

    # Store metrics for plotting
    epoch = i + 1
    train_loss = neg_logl_train.data.item()
    test_loss = neg_logl_test.data.item()

    print(
        f"epoch {epoch:03d}: "
        f"loss[train]={train_loss:.3f}, loss[test]={test_loss:.3f} | "
        f"acc[train]={train_acc:.3f}, acc[test]={test_acc:.3f}"
    )

    metrics["epoch"].append(epoch)
    metrics["train_loss"].append(train_loss)
    metrics["test_loss"].append(test_loss)
    metrics["train_acc"].append(train_acc)
    metrics["test_acc"].append(test_acc)

    # Backward pass to compute gradients
    neg_logl_train.backward()

    # Update parameters using gradient descent
    for param in mlp.parameters:
        param.data -= lr * param.grad

# Visualize training progress
plt.figure(figsize=(8, 3))

# Plot loss curves
plt.subplot(1, 2, 1)
plt.plot(metrics["epoch"], metrics["train_loss"], label="Train")
plt.plot(metrics["epoch"], metrics["test_loss"], label="Test")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.grid()
plt.title("Loss vs Epoch")

# Plot accuracy curves
plt.subplot(1, 2, 2)
plt.plot(metrics["epoch"], metrics["train_acc"], label="Train")
plt.plot(metrics["epoch"], metrics["test_acc"], label="Test")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.title("Accuracy vs Epoch")
plt.grid()
plt.tight_layout()
plt.savefig("../assets/classification_training.svg")

## 4. Visualizing the Decision Boundary 🎨

To understand how our model separates the two classes, we'll visualize its decision boundary. We'll:
1. Create a fine grid of points covering our feature space
2. Get model predictions for each point
3. Plot the probability contours and decision boundary

This visualization helps us see:
- The nonlinear nature of the learned decision boundary
- Areas where the model is confident vs. uncertain
- How well the boundary separates the two classes
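
The grid-and-reshape pattern behind these steps is easier to follow on a tiny example. The sketch below uses a 3 × 2 grid and a made-up probability function in place of the trained model. The names `gx`, `gy`, and `points` are illustrative.

```python
import numpy as np

# A tiny 3 x 2 grid instead of the 150 x 150 grid used for plotting
gx, gy = np.meshgrid(np.linspace(0.0, 2.0, 3), np.linspace(0.0, 1.0, 2))
points = np.c_[gx.ravel(), gy.ravel()]  # one (x, y) row per grid node

# Stand-in for a model: any function mapping (n, 2) points to n probabilities
probs = 1.0 / (1.0 + np.exp(-(points[:, 0] - points[:, 1])))

grid_probs = probs.reshape(gx.shape)  # back to grid layout for contour plots

print(gx.shape)      # (2, 3)
print(points.shape)  # (6, 2)
print(points[:3])    # first grid row: (0, 0), (1, 0), (2, 0)
```

Flattening the grid lets the model treat grid nodes like any other batch of samples, and reshaping the predictions restores the 2D layout that `contourf` and `contour` expect.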

In [None]:
from matplotlib.colors import Normalize

# Create a fine mesh grid
xx, yy = np.meshgrid(np.linspace(-7, 7, 150), np.linspace(-7, 7, 150))
X_mesh = np.c_[xx.ravel(), yy.ravel()]  # (150*150, 2)

# Scale mesh points using the same scaler used for training data
X_mesh_scaled = scaler.transform(X_mesh)

# Get model predictions for mesh points
X_mesh_tensor = Tensor(X_mesh_scaled)
mesh_probs = mlp(X_mesh_tensor)
mesh_probs = mesh_probs.data.reshape(xx.shape)

# Create the visualization
plt.figure(figsize=(5, 3))

# Plot probability contours
norm = Normalize(vmin=0, vmax=1)
levels = np.linspace(0, 1, num=11, endpoint=True)
contour = plt.contourf(xx, yy, mesh_probs, alpha=0.3, cmap="RdBu_r", norm=norm, levels=levels)
plt.colorbar(contour, label="Probability of Class 1")

# Plot training points
class_colors = ["#3057D3", "#D33030"]  # Blue for class 0, Red for class 1
plt.scatter(
    X_train[:, 0],
    X_train[:, 1],
    c=[class_colors[int(label)] for label in y_train],
    s=10,
    alpha=0.8,
)

# Add decision boundary (where probability = 0.5)
plt.contour(xx, yy, mesh_probs, levels=[0.5], colors="black", linestyles="--", linewidths=2)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Neural Network Decision Boundary")
plt.tight_layout()
plt.savefig("../assets/classification_results.svg")

## Summary 📝

In this notebook, we've demonstrated how to:
1. Prepare and preprocess data for neural network training
2. Build a multi-layer perceptron using TinyTorch
3. Train the model using gradient descent and binary cross-entropy loss
4. Monitor training progress and evaluate model performance
5. Visualize the learned decision boundary

The resulting model successfully learns a nonlinear decision boundary that separates the two classes of our moon-shaped dataset. This example showcases the power of neural networks to learn complex patterns in data, all implemented using our minimal TinyTorch framework!