A modular deep learning library built from scratch using only NumPy. This project implements a sequential model API, a variety of layers (Linear, ReLU, BatchNorm), loss functions (MSE, SoftmaxCrossEntropy), and a robust training Solver to create and train multi-layer perceptrons for both classification and regression.
This project was developed for a Deep Learning course to demonstrate a foundational understanding of neural network mechanics, from forward propagation to backpropagation and optimization.
- Object-Oriented Design: A clean, "PyTorch-like" API with `Layer`, `Loss`, and `Sequential` base classes.
- Modular Layers: Easily stack layers, including `Linear`, `ReLU`, `Sigmoid`, and `BatchNorm`.
- Robust Training: A `Solver` class that handles all training, validation, and hyperparameter logic.
- Optimizers: Includes `sgd` and `sgd_momentum` update rules.
- Versatile: Capable of handling both classification (with `SoftmaxCrossEntropyLoss`) and regression (with `MSELoss`) tasks.
- Utilities: Comes with data loaders for MNIST, Fashion-MNIST, and California Housing, plus a numerical gradient checker for debugging.
- Backpropagation: All layer gradients are analytically derived and implemented from scratch.
- Batch Normalization: Implemented as a layer with distinct train and test modes to stabilize training.
- Numerical Stability: Uses a combined `SoftmaxCrossEntropyLoss` to prevent overflow/underflow issues.
- Modular Architecture: The `Sequential` model is decoupled from the `Solver`, promoting clean code and reusability.
- Logging & CLI: All training scripts use `argparse` for hyperparameter tuning and `logging` to save results to files.
This library is composed of several core modules that work together to train a network.
The project is built around two main components: the `Sequential` model and the `Solver`.
- `src/model.py` (`Sequential`): This class acts as a container. You initialize it with a list of `Layer` objects and a `Loss` object. It is responsible for:
  - Collecting all learnable parameters (weights, biases, gamma, beta) from its layers into a central `model.params` dictionary.
  - Performing a full forward pass by calling `layer.forward()` sequentially.
  - Performing a full backward pass by calling `layer.backward()` in reverse order.
  - Computing the total loss (data loss + regularization).
- `src/solver.py` (`Solver`): This is the training engine. You give it the `model` and a `data` dictionary. It handles:
  - The main training loop (epochs, iterations).
  - Creating minibatches of data.
  - Calling `model.compute_loss()` to get the loss and gradients.
  - Calling the optimizer (e.g., `sgd_momentum`) to update every parameter in `model.params`.
  - Tracking loss history, validation metrics, and saving the best model.
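To make this division of labor concrete, here is a minimal usage sketch. The class names match the modules above, but the constructor arguments, data-dictionary keys, and keyword names shown here are assumptions for illustration; the exact signatures in `src/` may differ.

```python
import numpy as np
from src.model import Sequential
from src.solver import Solver
from src.layers import Linear, ReLU
from src.losses import SoftmaxCrossEntropyLoss

# Hypothetical sketch: a 2-layer MLP for 10-class classification on 784-d inputs.
# The layer sizes and constructor arguments are assumptions.
model = Sequential(
    layers=[Linear(784, 256), ReLU(), Linear(256, 10)],
    loss=SoftmaxCrossEntropyLoss(),
)

# The Solver takes the model plus a data dictionary; the key names below
# ("X_train", "y_train", ...) are assumptions based on the description above.
data = {
    "X_train": np.random.randn(1000, 784), "y_train": np.random.randint(0, 10, 1000),
    "X_val":   np.random.randn(100, 784),  "y_val":   np.random.randint(0, 10, 100),
}

solver = Solver(model, data,
                update_rule="sgd_momentum",            # assumed keyword, per the optimizers above
                optim_config={"learning_rate": 1e-2},  # assumed config format
                batch_size=128, num_epochs=10)
solver.train()  # runs the minibatch loop, tracks loss/val history, keeps the best params
```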
Our network is built on backpropagation, which is a practical application of the chain rule from calculus. To update a weight $W$, we must find how the final loss $L$ changes with respect to that weight (i.e., $\frac{\partial L}{\partial W}$). For a simple layer $y = f(x)$, the chain rule lets us compute this from the gradient flowing in from the layer above: each layer's `backward()` pass computes its local gradients, multiplies them by the upstream gradient, and passes the result downstream to the layer before it.
- `Linear` layer
  - Forward: $y = xW + b$
  - Backward: The layer receives the upstream gradient $\frac{\partial L}{\partial y}$ and computes three things:
    - $\frac{\partial L}{\partial W} = x^T \cdot \frac{\partial L}{\partial y}$ (Gradient for weights)
    - $\frac{\partial L}{\partial b} = \sum \frac{\partial L}{\partial y}$ (Gradient for biases, summed over the batch)
    - $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot W^T$ (Downstream gradient to pass to the previous layer)
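As a concrete illustration of these formulas, here is a minimal NumPy sketch of a linear layer's forward and backward passes (written as standalone functions for clarity; the actual `Linear` class in `src/layers.py` wraps this logic and may differ in detail):

```python
import numpy as np

def linear_forward(x, W, b):
    """Forward pass: y = xW + b. Caches x for the backward pass."""
    y = x @ W + b
    cache = x
    return y, cache

def linear_backward(dL_dy, cache, W):
    """Backward pass: turn the upstream gradient dL/dy into dL/dW, dL/db, dL/dx."""
    x = cache
    dW = x.T @ dL_dy          # dL/dW = x^T . dL/dy
    db = dL_dy.sum(axis=0)    # dL/db = sum of dL/dy over the batch
    dx = dL_dy @ W.T          # dL/dx = dL/dy . W^T, passed to the previous layer
    return dW, db, dx
```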
- `ReLU` activation
  - Forward: $f(x) = \max(0, x)$
  - Backward: The local gradient is a simple gate: it is $1$ if $x > 0$ and $0$ otherwise. This means gradients only flow through neurons that were "active" during the forward pass: $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot (x > 0)$
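In NumPy this gate is essentially a one-liner; a minimal standalone sketch (not the library's actual `ReLU` class):

```python
import numpy as np

def relu_forward(x):
    """Forward pass: element-wise max(0, x); cache the input for the backward pass."""
    return np.maximum(0, x), x

def relu_backward(dL_dy, cache):
    """Backward pass: gradients pass through only where the input was positive."""
    x = cache
    return dL_dy * (x > 0)
```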
- `BatchNorm` layer
  - Forward (Train): Normalizes activations within a batch $B$ of size $m$:
    - $\mu_B = \frac{1}{m} \sum_{i \in B} x_i$ (Find batch mean)
    - $\sigma^2_B = \frac{1}{m} \sum_{i \in B} (x_i - \mu_B)^2$ (Find batch variance)
    - $\hat{x_i} = \frac{x_i - \mu_B}{\sqrt{\sigma^2_B + \epsilon}}$ (Normalize)
    - $y_i = \gamma \hat{x_i} + \beta$ (Scale and shift)
  - Backward: This is the most complex backward pass, as the gradient $\frac{\partial L}{\partial y}$ must be propagated back through $\gamma$, $\beta$, and the normalization statistics ($\mu_B$, $\sigma^2_B$) to the input $x$.
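A compact NumPy sketch of the train-mode forward pass and the corresponding backward pass, using the standard closed-form batch-norm gradient. This is illustrative only; the `BatchNorm` layer in `src/layers.py` may organize things differently (for example, it also needs running statistics for test mode, which are omitted here):

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Train-mode forward pass: normalize per batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    y = gamma * x_hat + beta
    cache = (x_hat, gamma, var, eps)
    return y, cache

def batchnorm_backward(dL_dy, cache):
    """Backward pass: gradients for gamma, beta, and the input x."""
    x_hat, gamma, var, eps = cache
    dgamma = np.sum(dL_dy * x_hat, axis=0)
    dbeta = np.sum(dL_dy, axis=0)
    # Closed-form gradient w.r.t. x, folding in the mean/variance dependencies.
    dx_hat = dL_dy * gamma
    dx = (dx_hat - dx_hat.mean(axis=0)
          - x_hat * (dx_hat * x_hat).mean(axis=0)) / np.sqrt(var + eps)
    return dx, dgamma, dbeta
```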
For numerical stability, we combine the final activation and the loss function.
- Forward:
  - Softmax: $P_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$ (Converts raw scores/logits $z$ to probabilities $P$).
  - Cross-Entropy: $L = - \frac{1}{N} \sum y_i \log(P_i)$ (Calculates the loss, where $y_i$ is 1 for the true class).
- Backward: When combined, the derivative $\frac{\partial L}{\partial z}$ simplifies to a clean, stable expression that is perfect for starting backpropagation:
  - $\frac{\partial L}{\partial z} = \frac{1}{N} (P - Y_{onehot})$ (where $Y_{onehot}$ is the one-hot encoded target vector).
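A minimal NumPy sketch of the numerically stable combined forward/backward, which shifts the logits by their row-wise maximum before exponentiating. The `SoftmaxCrossEntropyLoss` in `src/losses.py` follows the same math, though its exact interface is not shown here:

```python
import numpy as np

def softmax_cross_entropy(z, y):
    """z: (N, C) raw logits, y: (N,) integer class labels.
    Returns the mean loss and the gradient dL/dz."""
    N = z.shape[0]
    # Shift by the row-wise max so np.exp never overflows.
    shifted = z - z.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    P = np.exp(log_probs)
    loss = -log_probs[np.arange(N), y].mean()
    # Gradient: (P - Y_onehot) / N
    dz = P.copy()
    dz[np.arange(N), y] -= 1.0
    dz /= N
    return loss, dz
```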
numpy-neural-network/
├── .gitignore # Standard Python .gitignore
├── LICENSE # MIT License
├── README.md # This readme file
├── requirements.txt # Project dependencies (numpy, sklearn)
├── notebook.ipynb # Jupyter Notebook for demonstration
├── logs/ # Directory for output log files
│ └── .gitkeep
├── src/ # Main library source code
│ ├── __init__.py
│ ├── layers.py # Layer implementations (Linear, ReLU, BN)
│ ├── losses.py # Loss functions (MSE, SoftmaxCrossEntropy)
│ ├── model.py # Sequential model class
│ ├── optimizer.py # Update rules (SGD, Momentum)
│ ├── solver.py # The Solver training class
│ └── utils/ # Helper modules
│ ├── __init__.py
│ ├── data_utils.py # Data loading (MNIST, etc.)
│ ├── gradient_check.py # Numerical gradient checker
│ └── logger.py # Logging setup
└── scripts/ # Runnable training scripts
├── __init__.py
├── check_gradients.py # Script to debug layer gradients
├── train_mnist.py # Script to train on MNIST
├── train_fashion_mnist.py # Script to train on Fashion-MNIST
└── train_regression.py # Script to train on California Housing
- Clone the Repository:
  - `git clone https://github.com/msmrexe/numpy-neural-network.git`
  - `cd numpy-neural-network`
- Set up the Environment (a virtual environment is recommended):
  - `python -m venv venv`
  - `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
  - `pip install -r requirements.txt`
- Run a Training Script: The `scripts/` folder contains ready-to-run training scripts. You can use the `argparse` flags to change hyperparameters.
  - Example: Train on MNIST
    - `python scripts/train_mnist.py --epochs 10 --lr 0.01 --batch_size 128`
    - Logs will be saved to `logs/train_mnist.log`.
    - Progress will be printed to the console.
  - Example: Train on California Housing (Regression)
    - `python scripts/train_regression.py --epochs 30 --lr 0.005`
    - Logs will be saved to `logs/train_regression.log`.
- Run the Demonstration Notebook: For a detailed breakdown and a manual, step-by-step example of how to use the library, open the Jupyter Notebook:
  - `jupyter notebook notebook.ipynb`
- Check Layer Gradients (for Debugging): You can verify that all `backward()` passes are implemented correctly by running the gradient checker:
  - `python scripts/check_gradients.py`
  - You should see very small relative errors (e.g., `< 1e-7`) for all parameters.
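For reference, a numerical gradient check of this kind typically compares the analytic gradient against a central-difference estimate. The sketch below shows the idea with standalone helper functions; the actual `src/utils/gradient_check.py` may use a different interface:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued function f."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + h; fp = f(x)   # f(x + h)
        x[idx] = orig - h; fm = f(x)   # f(x - h)
        x[idx] = orig                  # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

def relative_error(analytic, numerical):
    """Maximum relative error between analytic and numerical gradients."""
    return np.max(np.abs(analytic - numerical) /
                  np.maximum(1e-8, np.abs(analytic) + np.abs(numerical)))
```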
Feel free to connect or reach out if you have any questions!
- Maryam Rezaee
- GitHub: @msmrexe
- Email: ms.maryamrezaee@gmail.com
This project is licensed under the MIT License. See the LICENSE file for full details.