# NumPy Neural Network Demonstration

This notebook demonstrates how to use the `numpy-neural-network` library, which we have built from scratch.

We will cover:
1.  **Mathematical Foundations**: A brief overview of the math behind our layers.
2.  **Running Scripts**: How to use the pre-built training scripts.
3.  **Manual Experiment**: How to import library components to build, train, and evaluate a custom model.

## Part 1: Mathematical Foundations

Our network is built on the principle of backpropagation, which is a clever application of the chain rule from calculus.

### Backpropagation (The Chain Rule)

Imagine a simple network: `Loss = L(y_pred, y_true)` where `y_pred = f(z)` and `z = g(x, W)`.

To update our weights `W`, we need to find how the `Loss` changes with respect to `W` (i.e., $\frac{\partial L}{\partial W}$). We use the chain rule:

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y_{pred}} \cdot \frac{\partial y_{pred}}{\partial z} \cdot \frac{\partial z}{\partial W}$$

In our code:
1.  `loss_fn.backward()` computes $\frac{\partial L}{\partial y_{pred}}$ (the *upstream gradient*).
2.  This gradient is passed to the `backward()` method of layer `f`, which multiplies it by its local gradient ($\frac{\partial y_{pred}}{\partial z}$) and passes the result ($\frac{\partial L}{\partial z}$) *downstream*.
3.  This new gradient is received by layer `g`, which computes $\frac{\partial L}{\partial W}$ and $\frac{\partial L}{\partial x}$.
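
The three steps above can be traced by hand on a toy network. The sketch below is not the library's code: it assumes `g` is a matrix multiply, `f` is a sigmoid, and the loss is MSE, all chosen only for illustration. It then checks the analytic $\frac{\partial L}{\partial W}$ against a finite-difference estimate.

```python
import numpy as np

# Hypothetical toy setup (not taken from the library):
# z = g(x, W) = x @ W, y_pred = f(z) = sigmoid(z), L = mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
y_true = rng.normal(size=(4, 2))

def forward(W):
    z = x @ W
    y_pred = 1.0 / (1.0 + np.exp(-z))
    loss = np.mean((y_pred - y_true) ** 2)
    return z, y_pred, loss

z, y_pred, loss = forward(W)

# Step 1: the loss produces dL/dy_pred.
dL_dy = 2.0 * (y_pred - y_true) / y_pred.size
# Step 2: layer f multiplies by its local gradient (sigmoid') to get dL/dz.
dL_dz = dL_dy * y_pred * (1.0 - y_pred)
# Step 3: layer g turns dL/dz into dL/dW and dL/dx.
dL_dW = x.T @ dL_dz
dL_dx = dL_dz @ W.T

# Finite-difference check of dL/dW (central differences).
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_dW[i, j] = (forward(W_plus)[2] - forward(W_minus)[2]) / (2 * eps)

max_err = np.max(np.abs(dL_dW - num_dW))
print(f"max abs difference: {max_err:.2e}")
```

A very small difference indicates that the chain-rule product matches the numerical derivative.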

### Core Layers and Activations

**Linear Layer:**
-   **Forward:** $y = xW + b$
-   **Backward:** Computes $\frac{\partial L}{\partial W} = x^T \cdot \frac{\partial L}{\partial y}$, $\frac{\partial L}{\partial b} = \sum \frac{\partial L}{\partial y}$ (summed over the batch dimension), and $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot W^T$.
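
As a rough illustration of these formulas, here is a minimal NumPy sketch. The function names `linear_forward` and `linear_backward` are hypothetical, and the library's `Linear` class may organize this differently (for example, by caching `x` on the layer object).

```python
import numpy as np

def linear_forward(x, W, b):
    # x: (N, in_dim), W: (in_dim, out_dim), b: (out_dim,)
    return x @ W + b

def linear_backward(dout, x, W):
    # dout is dL/dy with shape (N, out_dim)
    dW = x.T @ dout          # (in_dim, out_dim)
    db = dout.sum(axis=0)    # sum over the batch dimension
    dx = dout @ W.T          # (N, in_dim)
    return dx, dW, db

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3))
b = np.zeros(3)
dout = rng.normal(size=(5, 3))

y = linear_forward(x, W, b)
dx, dW, db = linear_backward(dout, x, W)
print(y.shape, dx.shape, dW.shape, db.shape)
```

Each gradient has the same shape as the quantity it differentiates with respect to, which is a quick sanity check when writing a new layer.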

**ReLU Activation:**
-   **Forward:** $f(x) = \max(0, x)$
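-   **Backward:** The local gradient is 1 for $x > 0$ and 0 otherwise. It acts as a "gate": the upstream gradient passes through unchanged wherever the input was positive and is zeroed everywhere else. The resulting activations are sparse, and because the gradient is not scaled down for active units, ReLU helps mitigate the vanishing gradient problem.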
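
A small sketch of the gating behavior, using hypothetical helper names rather than the library's `ReLU` class. Note that the gate depends on the sign of the *input*, not on the sign of the gradient.

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0, x)

def relu_backward(dout, x):
    # Pass the upstream gradient where the input was positive, zero it elsewhere.
    return np.where(x > 0, dout, 0.0)

x = np.array([[-1.0, 0.0, 2.0]])
dout = np.array([[5.0, -3.0, -4.0]])

out = relu_forward(x)        # [[0., 0., 2.]]
dx = relu_backward(dout, x)  # [[0., 0., -4.]]
print(out, dx)
```

The negative upstream value `-4.0` passes through because its input `2.0` was positive, while `5.0` is blocked because its input was negative.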

**Batch Normalization:**
-   **Forward:** Normalizes activations within a batch: $\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma^2_B + \epsilon}}$. Then scales and shifts: $y = \gamma \hat{x} + \beta$.
-   **Backward:** This is the most complex backward pass, as the gradient must flow back through $\gamma$, $\beta$, and the normalization statistics ($\mu_B$ and $\sigma^2_B$) to the input $x$.
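
To make the backward pass more concrete, the sketch below implements the training-mode forward pass and one commonly used compact form of the gradient. It is an assumption-laden illustration: the function names are hypothetical, running statistics for inference are omitted, and the library's `BatchNorm` may compute the same result step by step instead.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature (biased) batch variance
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std
    out = gamma * x_hat + beta
    cache = (x_hat, gamma, inv_std)
    return out, cache

def batchnorm_backward(dout, cache):
    x_hat, gamma, inv_std = cache
    N = dout.shape[0]
    dgamma = np.sum(dout * x_hat, axis=0)
    dbeta = np.sum(dout, axis=0)
    dx_hat = dout * gamma
    # Accounts for x influencing the output directly and through mu and var.
    dx = (inv_std / N) * (
        N * dx_hat
        - dx_hat.sum(axis=0)
        - x_hat * np.sum(dx_hat * x_hat, axis=0)
    )
    return dx, dgamma, dbeta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
gamma, beta = np.ones(4), np.zeros(4)

out, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(rng.normal(size=(8, 4)), cache)
print(out.mean(axis=0), out.std(axis=0))
```

With $\gamma = 1$ and $\beta = 0$, each output feature should have approximately zero mean and unit standard deviation.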

### Loss Functions

**MSE (Mean Squared Error) Loss:**
-   **Forward:** $L = \frac{1}{N} \sum (y_{pred} - y_{true})^2$. Used for regression.
-   **Backward:** The gradient is $\frac{\partial L}{\partial y_{pred}} = \frac{2}{N} (y_{pred} - y_{true})$.
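
A short worked example of both formulas. The helper names are hypothetical, and the sketch assumes $N$ counts every element of `y_pred`; the library's loss class may normalize by batch size instead.

```python
import numpy as np

def mse_forward(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def mse_backward(y_pred, y_true):
    # Assumes N is the total number of elements being averaged.
    return 2.0 * (y_pred - y_true) / y_pred.size

y_pred = np.array([[1.0], [2.0], [4.0]])
y_true = np.array([[1.0], [3.0], [2.0]])

loss = mse_forward(y_pred, y_true)   # (0 + 1 + 4) / 3
grad = mse_backward(y_pred, y_true)  # (2 / 3) * [0, -1, 2]
print(loss, grad.ravel())
```

The gradient is zero for the sample that is already correct and grows linearly with the size of the error.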

**Softmax Cross-Entropy Loss:**
-   **Forward:** We combine two functions for numerical stability:
    1.  **Softmax:** $P_i = \frac{e^{z_i}}{\sum e^{z_j}}$ (converts raw scores, or *logits* $z$, to probabilities $P$).
    2.  **Cross-Entropy:** $L = -\frac{1}{N} \sum y_i \log(P_i)$ (penalizes low probabilities for the true class $y_i$).
-   **Backward:** When the two are combined, the gradient with respect to the logits is simple and numerically stable: $\frac{\partial L}{\partial z} = \frac{1}{N} (P - Y_{onehot})$. This is the initial *upstream gradient* passed back into the network.
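
One common way to get the numerical stability mentioned above is to subtract the row-wise maximum from the logits before exponentiating. The sketch below uses that trick; it assumes integer class labels and a hypothetical function name, and is not necessarily how `SoftmaxCrossEntropyLoss` is written.

```python
import numpy as np

def softmax_cross_entropy(logits, y):
    # logits: (N, C) raw scores, y: (N,) integer class labels
    N = logits.shape[0]
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)

    loss = -log_probs[np.arange(N), y].mean()

    dlogits = probs.copy()
    dlogits[np.arange(N), y] -= 1.0   # P - Y_onehot
    dlogits /= N
    return loss, dlogits

# Logits near 1000 would overflow a naive exp(); the shifted version stays finite.
logits = np.array([[1000.0, 1001.0, 999.0],
                   [0.0, 0.0, 0.0]])
y = np.array([1, 2])

loss, dlogits = softmax_cross_entropy(logits, y)
print(loss, dlogits.sum(axis=1))
```

Each row of `dlogits` sums to approximately zero, because the probabilities in a row sum to one and exactly one entry has 1 subtracted.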

## Part 2: Running Pre-built Training Scripts

First, we need to add our `src` and `scripts` directories to the Python path so we can import our library.

In [None]:
import sys
import os

# Add the src directory to the path
# Assumes this notebook sits in the repository root, alongside src/ and scripts/
src_path = os.path.abspath(os.path.join(os.getcwd(), 'src'))
if src_path not in sys.path:
    sys.path.insert(0, src_path)

scripts_path = os.path.abspath(os.path.join(os.getcwd(), 'scripts'))
if scripts_path not in sys.path:
    sys.path.insert(0, scripts_path)

# Setup the logger (optional but good practice)
from utils.logger import setup_logger
setup_logger(log_dir="logs", log_file="notebook.log");

You can run the training scripts directly from the command line, or by using `!` in the notebook. These scripts handle all the argument parsing, model building, and training.

In [None]:
# Example: Run the MNIST training script with custom hyperparameters for 5 epochs
!python scripts/train_mnist.py --epochs 5 --lr 0.01 --batch_size 64

In [None]:
# Example: Run the regression script for 10 epochs
!python scripts/train_regression.py --epochs 10 --lr 0.005
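
For readers curious how such flags are typically handled, here is a minimal `argparse` sketch. Only the flag names `--epochs`, `--lr`, and `--batch_size` come from the commands above; the defaults and the `build_parser` helper are hypothetical, and the real scripts may define more options.

```python
import argparse

def build_parser():
    # Defaults here are placeholders, not the scripts' actual defaults.
    parser = argparse.ArgumentParser(description="Hypothetical training CLI sketch")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser

# Parse an explicit argument list so the sketch also runs inside a notebook.
args = build_parser().parse_args(["--epochs", "5", "--lr", "0.01", "--batch_size", "64"])
print(args.epochs, args.lr, args.batch_size)
```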

## Part 3: Manually Building and Training a Model

This section shows how to use the library components to build a custom experiment from scratch.

In [None]:
# 1. Imports
import numpy as np
import matplotlib.pyplot as plt

from model import Sequential
from layers import Linear, ReLU, BatchNorm
from losses import SoftmaxCrossEntropyLoss
from solver import Solver
from utils.data_utils import load_fashion_mnist

%matplotlib inline

In [None]:
# 2. Load Data
# Let's do a simple binary classification problem:
# Class 0: T-shirt/top
# Class 1: Trouser

X_train, y_train, X_val, y_val, X_test, y_test = load_fashion_mnist(filter_classes=[0, 1])
data = {
    'X_train': X_train, 'y_train': y_train,
    'X_val': X_val, 'y_val': y_val
}

print(f"Input shape: {X_train.shape}")
print(f"Num classes: {len(np.unique(y_train))}")

In [None]:
# 3. Define Model
# A simple model for a simple 2-class problem
input_dim = X_train.shape[1] # 784
output_dim = len(np.unique(y_train)) # 2

model = Sequential(
    layers=[
        Linear(in_dim=input_dim, out_dim=50),
        BatchNorm(dim=50),
        ReLU(),
        Linear(in_dim=50, out_dim=output_dim)
    ],
    loss_fn=SoftmaxCrossEntropyLoss(),
    reg=1e-4
)
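
Conceptually, a sequential container runs each layer's `forward()` in order and each layer's `backward()` in reverse order. The sketch below illustrates that idea with hypothetical `Tiny*` classes; it is not the library's `Sequential`, and it leaves out the loss function and the `reg` term.

```python
import numpy as np

class TinyLinear:
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.normal(scale=0.01, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, dout):
        self.dW = self.x.T @ dout
        self.db = dout.sum(axis=0)
        return dout @ self.W.T

class TinyReLU:
    def forward(self, x):
        self.mask = x > 0
        return np.maximum(0, x)

    def backward(self, dout):
        return np.where(self.mask, dout, 0.0)

class TinySequential:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, dout):
        for layer in reversed(self.layers):
            dout = layer.backward(dout)
        return dout

rng = np.random.default_rng(0)
net = TinySequential([TinyLinear(6, 5, rng), TinyReLU(), TinyLinear(5, 2, rng)])

x = rng.normal(size=(8, 6))
scores = net.forward(x)                 # (8, 2)
dx = net.backward(np.ones_like(scores)) # (8, 6)
print(scores.shape, dx.shape)
```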

In [None]:
# 4. Configure Solver
solver = Solver(
    model, data,
    task_type='classification',
    update_rule='sgd_momentum',
    optim_config={'learning_rate': 1e-3, 'momentum': 0.9},
    lr_decay=0.95,
    num_epochs=5,
    batch_size=32,
    print_every=200, # Print every 200 iterations
    verbose=True
)
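
For intuition about the `update_rule` and `lr_decay` settings above, the sketch below shows the classic momentum update and a per-epoch multiplicative decay. Both are assumptions about what the `Solver` does internally; the function name and the `velocity` key are hypothetical.

```python
import numpy as np

def sgd_momentum_step(w, dw, config):
    """One hypothetical momentum update; returns the new weights and config."""
    lr = config["learning_rate"]
    mu = config["momentum"]
    v = config.get("velocity", np.zeros_like(w))
    v = mu * v - lr * dw              # decaying running average of past steps
    config["velocity"] = v
    return w + v, config

w = np.array([1.0, -2.0])
dw = np.array([0.5, -0.5])            # pretend gradient, held constant here
config = {"learning_rate": 0.1, "momentum": 0.9}

w, config = sgd_momentum_step(w, dw, config)   # velocity = [-0.05, 0.05]
w, config = sgd_momentum_step(w, dw, config)   # velocity = [-0.095, 0.095]

# Assumed decay schedule: scale the learning rate once per epoch.
lr_decay = 0.95
config["learning_rate"] *= lr_decay
print(w, config["learning_rate"])
```

With a constant gradient, the second step is larger than the first because the velocity accumulates, which is the effect momentum is meant to provide.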

In [None]:
# 5. Train!
solver.train()

In [None]:
# 6. Plot Results

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.title('Training Loss')
plt.plot(solver.loss_history)
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(1, 2, 2)
plt.title('Training and Validation Accuracy')
plt.plot(solver.val_metric_history, label='Validation')
plt.plot(solver.train_metric_history, label='Train (sub-sampled)')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
# 7. Check Final Test Accuracy
test_acc = solver.check_metric(X_test, y_test)
print(f"Final Test Accuracy on {len(np.unique(y_test))} classes: {test_acc:.4f}")