Neural Networks from Scratch (NumPy)

A modular deep learning library built from scratch using only NumPy. This project implements a sequential model API, a variety of layers (Linear, ReLU, BatchNorm), loss functions (MSE, SoftmaxCrossEntropy), and a robust training Solver to create and train multi-layer perceptrons for both classification and regression.

This project was developed for a Deep Learning course to demonstrate a foundational understanding of neural network mechanics, from forward propagation to backpropagation and optimization.

Features

  • Object-Oriented Design: A clean, "PyTorch-like" API with Layer, Loss, and Sequential base classes.
  • Modular Layers: Easily stack layers, including Linear, ReLU, Sigmoid, and BatchNorm.
  • Robust Training: A Solver class that handles all training, validation, and hyperparameter logic.
  • Optimizers: Includes sgd and sgd_momentum update rules.
  • Versatile: Capable of handling both classification (with SoftmaxCrossEntropyLoss) and regression (with MSELoss) tasks.
  • Utilities: Comes with data loaders for MNIST, Fashion-MNIST, and California Housing, plus a numerical gradient checker for debugging.

Core Concepts & Techniques

  • Backpropagation: All layer gradients are analytically derived and implemented from scratch.
  • Batch Normalization: Implemented as a layer with distinct train and test modes to stabilize training.
  • Numerical Stability: Uses a combined SoftmaxCrossEntropyLoss to prevent overflow/underflow issues.
  • Modular Architecture: The Sequential model is decoupled from the Solver, promoting clean code and reusability.
  • Logging & CLI: All training scripts expose their hyperparameters via argparse and log their results to files.

How It Works

This library is composed of several core modules that work together to train a network.

1. Core Logic & Architecture

The project is built around two main components: the Sequential model and the Solver. (A short end-to-end usage sketch follows at the end of this section.)

  • src/model.py (Sequential): This class acts as a container. You initialize it with a list of Layer objects and a Loss object. It is responsible for:

    • Collecting all learnable parameters (weights, biases, gamma, beta) from its layers into a central model.params dictionary.
    • Performing a full forward pass by calling layer.forward() sequentially.
    • Performing a full backward pass by calling layer.backward() in reverse.
    • Computing the total loss (data loss + regularization).
  • src/solver.py (Solver): This is the training engine. You give it the model and a data dictionary. It handles:

    • The main training loop (epochs, iterations).
    • Creating minibatches of data.
    • Calling model.compute_loss() to get the loss and gradients.
    • Calling the optimizer (e.g., sgd_momentum) to update every parameter in model.params.
    • Tracking loss history, validation metrics, and saving the best model.
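
The update rules the Solver applies (sgd and sgd_momentum) are small, self-contained functions. As a rough sketch, a common formulation looks like the following; the actual function signatures and config keys in src/optimizer.py may differ:

    import numpy as np

    def sgd(w, dw, config=None):
        # Plain gradient descent step.
        config = config or {}
        next_w = w - config.get("learning_rate", 1e-2) * dw
        return next_w, config

    def sgd_momentum(w, dw, config=None):
        # Gradient descent with a velocity term that smooths noisy updates.
        config = config or {}
        lr = config.get("learning_rate", 1e-2)
        mu = config.get("momentum", 0.9)
        v = config.get("velocity", np.zeros_like(w))
        v = mu * v - lr * dw
        next_w = w + v
        config["velocity"] = v
        return next_w, config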

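To make the division of labor concrete, here is a minimal end-to-end sketch that assembles a small classifier and hands it to the Solver. The constructor arguments, data-dictionary keys, and Solver keyword names (e.g., update_rule, num_epochs) are assumptions for illustration; the real signatures live in src/.

    import numpy as np
    from src.layers import Linear, BatchNorm, ReLU
    from src.losses import SoftmaxCrossEntropyLoss
    from src.model import Sequential
    from src.solver import Solver

    # Dummy data standing in for MNIST-style inputs (28*28 = 784 features, 10 classes).
    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((1000, 784)).astype(np.float32)
    y_train = rng.integers(0, 10, size=1000)
    X_val = rng.standard_normal((200, 784)).astype(np.float32)
    y_val = rng.integers(0, 10, size=200)

    # Hypothetical signatures -- check src/ for the actual ones.
    model = Sequential(
        layers=[Linear(784, 256), BatchNorm(256), ReLU(), Linear(256, 10)],
        loss=SoftmaxCrossEntropyLoss(),
    )

    data = {"X_train": X_train, "y_train": y_train,
            "X_val": X_val, "y_val": y_val}

    solver = Solver(model, data,
                    update_rule="sgd_momentum",
                    optim_config={"learning_rate": 1e-2},
                    batch_size=128, num_epochs=10)
    solver.train()
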
2. Mathematical Foundations: Backpropagation

Our network is built on backpropagation, which is a practical application of the chain rule from calculus. To update a weight $W$, we must find how the final loss $L$ changes with respect to $W$ (i.e., $\frac{\partial L}{\partial W}$).

For a simple layer $y = f(x, W)$, the chain rule states:

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial W}$$

Here, $\frac{\partial L}{\partial y}$ is the "upstream gradient" (coming from the next layer) and $\frac{\partial y}{\partial W}$ is the "local gradient" (the derivative of the current layer). Each layer's backward() pass computes its local gradients, multiplies them by the upstream gradient, and passes the result $\frac{\partial L}{\partial x}$ downstream to the previous layer.
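
In code, this chain-rule structure becomes a simple contract: forward() caches whatever the backward pass will need, and backward() multiplies the upstream gradient by the local gradient and returns the downstream gradient. A toy illustration of that contract (the real base class in src/layers.py may be organized differently):

    import numpy as np

    class ScaleLayer:
        """Toy layer y = w * x, used only to illustrate the forward/backward contract."""

        def __init__(self, w):
            self.w = float(w)
            self.cache = None

        def forward(self, x):
            self.cache = x              # save the input for the backward pass
            return self.w * x

        def backward(self, dout):       # dout is the upstream gradient dL/dy
            x = self.cache
            self.dw = np.sum(dout * x)  # dL/dw = sum(dL/dy * dy/dw), with dy/dw = x
            dx = dout * self.w          # dL/dx = dL/dy * dy/dx, passed downstream
            return dx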

3. Core Implementations (The Math)

Linear Layer

  • Forward: $y = xW + b$
  • Backward: The layer receives the upstream gradient $\frac{\partial L}{\partial y}$ and computes three things:
    • $\frac{\partial L}{\partial W} = x^T \cdot \frac{\partial L}{\partial y}$ (Gradient for weights)
    • $\frac{\partial L}{\partial b} = \sum \frac{\partial L}{\partial y}$ (Gradient for biases, summed over the batch)
    • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot W^T$ (Downstream gradient passed to the previous layer)
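
These three products map directly onto NumPy matrix operations. A minimal sketch (function names are illustrative, not necessarily those used in src/layers.py):

    import numpy as np

    def linear_forward(x, W, b):
        # x: (N, D) inputs, W: (D, M) weights, b: (M,) biases
        out = x @ W + b
        cache = (x, W)
        return out, cache

    def linear_backward(dout, cache):
        # dout: upstream gradient dL/dy, shape (N, M)
        x, W = cache
        dW = x.T @ dout           # dL/dW = x^T . dL/dy, shape (D, M)
        db = dout.sum(axis=0)     # dL/db: sum over the batch, shape (M,)
        dx = dout @ W.T           # dL/dx = dL/dy . W^T, shape (N, D)
        return dx, dW, db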

ReLU Activation

  • Forward: $f(x) = \max(0, x)$
  • Backward: The local gradient is a simple gate: it is $1$ if $x > 0$ and $0$ otherwise. This means gradients only flow through neurons that were "active" during the forward pass.
    • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot (x > 0)$
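
The same gating logic in NumPy, as a sketch (illustrative names):

    import numpy as np

    def relu_forward(x):
        out = np.maximum(0, x)
        cache = x                 # keep the input to know which units were active
        return out, cache

    def relu_backward(dout, cache):
        x = cache
        return dout * (x > 0)     # gradient flows only where the input was positive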

Batch Normalization

  • Forward (Train): Normalizes activations within a batch $B$:
    1. $\mu_B = \frac{1}{m} \sum_{i \in B} x_i$ (Find batch mean)
    2. $\sigma^2_B = \frac{1}{m} \sum_{i \in B} (x_i - \mu_B)^2$ (Find batch variance)
    3. $\hat{x_i} = \frac{x_i - \mu_B}{\sqrt{\sigma^2_B + \epsilon}}$ (Normalize)
    4. $y_i = \gamma \hat{x_i} + \beta$ (Scale and shift)
  • Backward: This is the most complex backward pass, as the gradient $\frac{\partial L}{\partial y}$ must be propagated back through $\gamma$, $\beta$, and the normalization statistics ($\mu_B$, $\sigma^2_B$) to the input $x$.
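
A sketch of the train-mode forward pass and a commonly used simplified form of the backward pass (running-statistics bookkeeping for test mode is omitted, and names are illustrative):

    import numpy as np

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        # x: (N, D); gamma, beta: (D,)
        mu = x.mean(axis=0)                       # batch mean
        var = x.var(axis=0)                       # batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)     # normalize
        out = gamma * x_hat + beta                # scale and shift
        cache = (x_hat, gamma, var, eps)
        return out, cache

    def batchnorm_backward(dout, cache):
        x_hat, gamma, var, eps = cache
        N = dout.shape[0]
        dgamma = np.sum(dout * x_hat, axis=0)
        dbeta = np.sum(dout, axis=0)
        dx_hat = dout * gamma
        # Closed form obtained by folding the mean/variance paths together
        dx = (N * dx_hat - dx_hat.sum(axis=0)
              - x_hat * (dx_hat * x_hat).sum(axis=0)) / (N * np.sqrt(var + eps))
        return dx, dgamma, dbeta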

Softmax Cross-Entropy Loss

For numerical stability, we combine the final activation and the loss function.

  • Forward:
    1. Softmax: $P_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$ (Converts raw scores/logits $z$ to probabilities $P$).
    2. Cross-Entropy: $L = - \frac{1}{N} \sum y_i \log(P_i)$ (Calculates loss, where $y_i$ is 1 for the true class).
  • Backward: When combined, the derivative $\frac{\partial L}{\partial z}$ simplifies to a clean, stable expression that is perfect for starting backpropagation:
    • $\frac{\partial L}{\partial z} = \frac{1}{N} (P - Y_{onehot})$ (where $Y_{onehot}$ is the one-hot encoded target vector).
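
Both steps and the simplified gradient fit in a few lines. A numerically stable sketch (the row-wise max is subtracted before exponentiating to avoid overflow; names are illustrative):

    import numpy as np

    def softmax_cross_entropy(z, y):
        # z: logits, shape (N, C); y: integer class labels, shape (N,)
        N = z.shape[0]
        shifted = z - z.max(axis=1, keepdims=True)   # stability: exponentiate non-positive values
        exp = np.exp(shifted)
        P = exp / exp.sum(axis=1, keepdims=True)     # softmax probabilities
        loss = -np.mean(np.log(P[np.arange(N), y]))  # cross-entropy averaged over the batch
        dz = P.copy()
        dz[np.arange(N), y] -= 1                     # P - Y_onehot
        dz /= N                                      # dL/dz, ready to start backpropagation
        return loss, dz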

Project Structure

numpy-neural-network/
├── .gitignore                 # Standard Python .gitignore
├── LICENSE                    # MIT License
├── README.md                  # This readme file
├── requirements.txt           # Project dependencies (numpy, sklearn)
├── notebook.ipynb             # Jupyter Notebook for demonstration
├── logs/                      # Directory for output log files
│   └── .gitkeep
├── src/                       # Main library source code
│   ├── __init__.py
│   ├── layers.py              # Layer implementations (Linear, ReLU, BN)
│   ├── losses.py              # Loss functions (MSE, SoftmaxCrossEntropy)
│   ├── model.py               # Sequential model class
│   ├── optimizer.py           # Update rules (SGD, Momentum)
│   ├── solver.py              # The Solver training class
│   └── utils/                 # Helper modules
│       ├── __init__.py
│       ├── data_utils.py      # Data loading (MNIST, etc.)
│       ├── gradient_check.py  # Numerical gradient checker
│       └── logger.py          # Logging setup
└── scripts/                   # Runnable training scripts
    ├── __init__.py
    ├── check_gradients.py     # Script to debug layer gradients
    ├── train_mnist.py         # Script to train on MNIST
    ├── train_fashion_mnist.py # Script to train on Fashion-MNIST
    └── train_regression.py    # Script to train on California Housing

How to Use

  1. Clone the Repository:

    git clone https://github.com/msmrexe/numpy-neural-network.git
    cd numpy-neural-network
  2. Set up the Environment: (using a virtual environment is recommended)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Run a Training Script: The scripts/ folder contains ready-to-run training scripts. Hyperparameters can be adjusted via command-line arguments.

    Example: Train on MNIST

    python scripts/train_mnist.py --epochs 10 --lr 0.01 --batch_size 128
    • Logs will be saved to logs/train_mnist.log.
    • Progress will be printed to the console.

    Example: Train on California Housing (Regression)

    python scripts/train_regression.py --epochs 30 --lr 0.005
    • Logs will be saved to logs/train_regression.log.
  4. Run the Demonstration Notebook: For a detailed breakdown and manual, step-by-step example of how to use the library, open the Jupyter Notebook:

    jupyter notebook notebook.ipynb
  5. Check Layer Gradients (for Debugging): You can verify that all backward() passes are implemented correctly by running the gradient checker; a sketch of the underlying central-difference check follows this list.

    python scripts/check_gradients.py
    • You should see very small relative errors (e.g., < 1e-7) for all parameters.
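
The checker compares each analytic gradient against a central-difference estimate, $\frac{f(\theta + h) - f(\theta - h)}{2h}$, and reports the relative error. A minimal sketch of that idea (the actual implementation in src/utils/gradient_check.py may differ):

    import numpy as np

    def numerical_gradient(f, x, h=1e-5):
        """Central-difference estimate of df/dx for a scalar-valued f(x)."""
        grad = np.zeros_like(x)
        for idx in np.ndindex(x.shape):
            old = x[idx]
            x[idx] = old + h
            fp = f(x)                     # f(theta + h)
            x[idx] = old - h
            fm = f(x)                     # f(theta - h)
            x[idx] = old                  # restore the original value
            grad[idx] = (fp - fm) / (2 * h)
        return grad

    def relative_error(a, b):
        # Elementwise relative error, guarded against division by zero.
        return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))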

Author

Feel free to connect or reach out if you have any questions!


License

This project is licensed under the MIT License. See the LICENSE file for full details.
