# 06 - XOR Classification with a Neural Network

The XOR problem is a classic test for neural networks: its two classes are not linearly separable, so no single linear layer (even one followed by a sigmoid or threshold) can solve it. Solving it requires nonlinearity and multiple layers, the same ingredients that LLMs and transformers are built from.

In this notebook, you'll scaffold a neural network that solves XOR and see how the same principles apply to much larger models.

## 🔢 The XOR Dataset

The XOR dataset consists of four points:
- Inputs: (0,0), (0,1), (1,0), (1,1)
- Labels: 0, 1, 1, 0

**LLM/NN Context:**
- This is a minimal example of a problem that is not linearly separable, showing why deep models (with nonlinearity) are needed.
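To see why a single linear layer fails, suppose a linear classifier predicts 1 when $w_1 x_1 + w_2 x_2 + b > 0$ and 0 otherwise. Classifying the four XOR points correctly would require all of the following:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the middle two inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last one. No choice of $w_1, w_2, b$ works, and the same holds for a sigmoid output thresholded at 0.5, since $\sigma(z) > 0.5$ exactly when $z > 0$.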

### Task:
- Scaffold code to create the XOR dataset as numpy arrays.

In [None]:
import numpy as np

# TODO: Create XOR input and label arrays (X, y)
# X: shape (4, 2), y: shape (4,)
pass
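One minimal way to fill in the dataset cell, assuming float arrays are wanted so they can go straight into matrix multiplications later on:

```python
import numpy as np

# The four XOR input points, one per row: shape (4, 2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)

# The label is 1 exactly when the two inputs differ: shape (4,)
y = np.array([0, 1, 1, 0], dtype=np.float64)

print(X.shape, y.shape)  # (4, 2) (4,)
```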

## 🧮 Neural Network Architecture

To solve XOR, you need at least one hidden layer with a non-linear activation (e.g., ReLU, sigmoid, or GELU).

**LLM/Transformer Context:**
- LLMs use deep stacks of such layers to model complex relationships between tokens.

### Task:
- Scaffold a function to initialize the weights and biases for a two-layer neural network.
- Add a docstring explaining the role of each parameter.

In [None]:
def init_xor_network(input_dim, hidden_dim, output_dim):
    """
    Initialize weights and biases for a two-layer neural network.
    Args:
        input_dim (int): Number of input features.
        hidden_dim (int): Number of hidden units.
        output_dim (int): Number of output units (1 for XOR: a single logit).
    Returns:
        dict: Parameters (W1, b1, W2, b2)
    """
    # TODO: Initialize weights with small random values (to break symmetry between hidden units); biases can start at zero
    pass
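A minimal sketch of the initializer, assuming NumPy's `default_rng` for reproducibility. The `seed` argument and the standard-normal scale are illustrative choices, not part of the original scaffold. Weights are random so that hidden units start out different (units with identical weights would receive identical gradients and never diverge); biases can safely start at zero.

```python
import numpy as np

def init_xor_network(input_dim, hidden_dim, output_dim, seed=0):
    """Random weights, zero biases. `seed` is an illustrative extra argument."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, size=(input_dim, hidden_dim)),   # input -> hidden
        "b1": np.zeros(hidden_dim),                                  # hidden bias
        "W2": rng.normal(0.0, 1.0, size=(hidden_dim, output_dim)),  # hidden -> output
        "b2": np.zeros(output_dim),                                  # output bias
    }

params = init_xor_network(input_dim=2, hidden_dim=4, output_dim=1)
print({name: p.shape for name, p in params.items()})
# {'W1': (2, 4), 'b1': (4,), 'W2': (4, 1), 'b2': (1,)}
```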

## 🔗 Forward Pass

The forward pass computes the output of the network for a given input.

**LLM/Transformer Context:**
- The feed-forward (MLP) sublayer inside every transformer block performs this same linear → activation → linear computation, just at a much larger scale.

### Task:
- Scaffold a function for the forward pass through the two-layer network (with nonlinearity).
- Add a docstring explaining the computation.

In [None]:
def xor_forward(X, params, activation_fn):
    """
    Forward pass for a two-layer neural network on XOR data.
    Args:
        X (np.ndarray): Input data (batch_size x input_dim)
        params (dict): Network parameters (W1, b1, W2, b2)
        activation_fn (callable): Nonlinear activation function
    Returns:
        np.ndarray: Output logits (batch_size x output_dim)
    """
    # TODO: Implement the forward pass (linear -> activation -> linear)
    pass
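A minimal sketch of the forward pass, using ReLU as `activation_fn` (an illustrative choice). To check it without any training, the usage below plugs in a well-known hand-picked set of weights that solves XOR exactly: the first hidden unit computes $\mathrm{ReLU}(x_1 + x_2)$, the second $\mathrm{ReLU}(x_1 + x_2 - 1)$, and the output is the first minus twice the second.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def xor_forward(X, params, activation_fn):
    """linear -> activation -> linear; returns logits of shape (batch, output_dim)."""
    hidden = activation_fn(X @ params["W1"] + params["b1"])  # (batch, hidden_dim)
    return hidden @ params["W2"] + params["b2"]              # (batch, output_dim)

# Hand-picked weights (not learned) that reproduce XOR exactly
params = {
    "W1": np.array([[1.0, 1.0], [1.0, 1.0]]),
    "b1": np.array([0.0, -1.0]),
    "W2": np.array([[1.0], [-2.0]]),
    "b2": np.array([0.0]),
}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
out = xor_forward(X, params, relu)
print(out.ravel())  # matches the XOR labels 0, 1, 1, 0
```

The hidden layer is what makes this possible: it maps (1, 1) to a point where the second unit switches on and cancels the first, something no single linear layer can do.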

## 🧮 Loss Function: Binary Cross-Entropy

For binary classification, use the binary cross-entropy loss.
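For predicted probabilities $p_i = \sigma(z_i)$ (the sigmoid of the output logit $z_i$) and labels $y_i \in \{0, 1\}$, the loss averaged over $N$ examples is:

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \Big]
$$

Only one of the two terms is active per example, so each example contributes $-\log p_i$ when $y_i = 1$ and $-\log(1 - p_i)$ when $y_i = 0$.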

**LLM/Transformer Context:**
- LLMs use cross-entropy over the vocabulary for next-token prediction; binary cross-entropy is its two-class special case, which is what XOR needs.

### Task:
- Scaffold a function to compute binary cross-entropy loss for predictions and true labels.
- Add a docstring explaining its role.

In [None]:
def binary_cross_entropy_loss(preds, targets):
    """
    Compute binary cross-entropy loss.
    Args:
        preds (np.ndarray): Predicted probabilities in (0, 1), e.g. sigmoid(logits) (batch_size,)
        targets (np.ndarray): True labels (batch_size,)
    Returns:
        float: Average loss
    """
    # TODO: Implement binary cross-entropy loss
    pass
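A minimal NumPy sketch of binary cross-entropy, assuming `preds` are probabilities rather than logits. The `eps` clipping argument is an added safeguard (not in the original scaffold) that keeps `log(0)` from producing infinities when a prediction is exactly 0 or 1.

```python
import numpy as np

def binary_cross_entropy_loss(preds, targets, eps=1e-12):
    """Mean BCE; `eps` clipping is an illustrative guard against log(0)."""
    preds = np.clip(preds, eps, 1.0 - eps)
    per_example = -(targets * np.log(preds) + (1 - targets) * np.log(1 - preds))
    return float(np.mean(per_example))

# A maximally uncertain prediction (0.5) costs log(2), about 0.693, per example
print(binary_cross_entropy_loss(np.array([0.5, 0.5]), np.array([0.0, 1.0])))

# Confident correct predictions cost much less
print(binary_cross_entropy_loss(np.array([0.05, 0.95]), np.array([0.0, 1.0])))
```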

## 🔁 Training Loop (Gradient Descent)

Train the network by updating weights using gradients from backpropagation.
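Concretely, each step of full-batch gradient descent moves every parameter $\theta$ against the gradient of the loss $\mathcal{L}$, scaled by the learning rate $\eta$:

$$
\theta \leftarrow \theta - \eta \, \frac{\partial \mathcal{L}}{\partial \theta}
$$

Backpropagation computes these gradients with the chain rule, starting at the output and working backwards. With a sigmoid output $p_i = \sigma(z_i)$, labels $y_i$, and the mean binary cross-entropy over $N$ examples, the gradient with respect to each output logit $z_i$ simplifies neatly:

$$
\frac{\partial \mathcal{L}}{\partial z_i} = \frac{1}{N}\,(p_i - y_i)
$$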

**LLM/Transformer Context:**
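- LLMs are trained with gradient-based optimizers (variants of stochastic gradient descent, such as Adam) on massive datasets; here, you'll apply plain gradient descent to XOR's four examples.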

### Task:
- Scaffold a function for the training loop: forward pass, loss, backward pass, parameter update.
- Add a docstring explaining each step.

In [None]:
def train_xor_network(X, y, params, activation_fn, loss_fn, lr, epochs):
    """
    Train the XOR neural network using gradient descent.
    Args:
        X (np.ndarray): Input data.
        y (np.ndarray): True labels.
        params (dict): Network parameters.
        activation_fn (callable): Activation function.
        loss_fn (callable): Loss function.
        lr (float): Learning rate.
        epochs (int): Number of training epochs.
    Returns:
        dict: Trained parameters.
    """
    # TODO: Implement the training loop (forward, loss, backward, update)
    pass
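A minimal sketch of the training loop with hand-written backpropagation. It departs from the scaffold's signature: it fixes the hidden activation to tanh and the loss to binary cross-entropy (so the gradients can be written out by hand) instead of taking `activation_fn` and `loss_fn`, and it returns the loss history alongside the trained parameters. The hidden width of 8, learning rate, epoch count, and random seed are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(probs, y, eps=1e-12):
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(probs) + (1 - y) * np.log(1 - probs))))

def train_xor_network(X, y, params, lr=0.5, epochs=10_000):
    """Full-batch gradient descent; tanh hidden layer, sigmoid output, BCE loss.

    Sketch only: activation and loss are hard-coded here (not passed in).
    Returns (trained_params, loss_history); the caller's `params` is not modified.
    """
    params = {name: value.copy() for name, value in params.items()}
    n = X.shape[0]
    history = []
    for _ in range(epochs):
        # 1. Forward pass
        hidden = np.tanh(X @ params["W1"] + params["b1"])              # (n, hidden_dim)
        probs = sigmoid(hidden @ params["W2"] + params["b2"]).ravel()  # (n,)
        # 2. Loss
        history.append(bce(probs, y))
        # 3. Backward pass: for sigmoid + BCE, dL/dlogit = (p - y) / n
        d_logits = ((probs - y) / n)[:, None]                          # (n, 1)
        grads = {"W2": hidden.T @ d_logits, "b2": d_logits.sum(axis=0)}
        d_hidden = (d_logits @ params["W2"].T) * (1.0 - hidden ** 2)   # tanh' = 1 - tanh^2
        grads["W1"] = X.T @ d_hidden
        grads["b1"] = d_hidden.sum(axis=0)
        # 4. Parameter update (all gradients were computed before any update)
        for name, grad in grads.items():
            params[name] -= lr * grad
    return params, history

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
rng = np.random.default_rng(0)
init = {"W1": rng.normal(size=(2, 8)), "b1": np.zeros(8),
        "W2": rng.normal(size=(8, 1)), "b2": np.zeros(1)}

trained, losses = train_xor_network(X, y, init)
hidden = np.tanh(X @ trained["W1"] + trained["b1"])
probs = sigmoid(hidden @ trained["W2"] + trained["b2"]).ravel()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.4f}")
print("predictions:", (probs > 0.5).astype(int))
```

Computing every gradient before touching any parameter matters: `d_hidden` needs the old `W2`, and updating it first would mix two different steps.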

## 📈 Evaluation: Accuracy

After training, evaluate the model's accuracy on the XOR dataset.

**LLM/Transformer Context:**
- Evaluation metrics such as accuracy and perplexity are used to measure LLM performance.

### Task:
- Scaffold a function to compute accuracy given predictions and true labels.
- Add a docstring explaining its use.

In [None]:
def compute_accuracy(preds, targets):
    """
    Compute accuracy for binary predictions.
    Args:
        preds (np.ndarray): Predicted probabilities (threshold at 0.5) or logits (threshold at 0).
        targets (np.ndarray): True labels.
    Returns:
        float: Accuracy (0 to 1)
    """
    # TODO: Implement accuracy computation
    pass
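A minimal sketch of the accuracy computation, assuming a default threshold of 0.5 on probabilities. The `threshold` argument is an added parameter (not in the original scaffold) so the same function also works on raw logits, since $\sigma(z) > 0.5$ exactly when $z > 0$.

```python
import numpy as np

def compute_accuracy(preds, targets, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label.

    Use threshold=0.5 for probabilities, or threshold=0.0 for raw logits.
    """
    predicted_labels = (np.asarray(preds).ravel() > threshold).astype(int)
    true_labels = np.asarray(targets).ravel().astype(int)
    return float(np.mean(predicted_labels == true_labels))

y = np.array([0, 1, 1, 0])
print(compute_accuracy(np.array([0.1, 0.9, 0.8, 0.3]), y))                   # 1.0
print(compute_accuracy(np.array([0.1, 0.9, 0.4, 0.3]), y))                   # 0.75
print(compute_accuracy(np.array([-2.0, 1.5, 0.7, -0.4]), y, threshold=0.0))  # 1.0
```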

## 🧠 Final Summary: Why XOR Matters for LLMs

- XOR demonstrates the need for nonlinearity and multiple layers—core ideas in LLMs and transformers.
- Training and evaluating on XOR builds intuition for how deep models learn complex patterns.
- The same principles scale up to the massive neural networks used in language models.

In the next notebook, you'll explore optimization algorithms that make training deep networks efficient!