Mahesh Chebrolu MS in Computer Science (Conc. Machine Learning) @ George Mason University
A "from-scratch" deep learning implementation and deployment pipeline. This project moves beyond standard high-level frameworks to implement the underlying mathematics of neural networks, optimized for edge deployment through Post-Training Quantization (PTQ).
The goal was to build a robust digit recognition engine using only NumPy for the core logic. By avoiding PyTorch or TensorFlow, this project demonstrates a deep understanding of vectorized backpropagation, weight initialization strategies, and memory optimization techniques.
- Vectorized Math Engine: Implemented Forward and Backward propagation from scratch using Matrix Calculus.
- Mini-Batch Gradient Descent: Batched updates that exploit vectorized matrix operations and converge faster in wall-clock time than per-sample SGD.
- 8-Bit Quantization: Developed a custom mapping logic to compress 64-bit float weights into 1-byte integers, achieving a 92.5% reduction in model size.
- L2 Regularization: Integrated weight decay ($\lambda = 0.1$) to mitigate overfitting, resulting in a near-zero generalization gap.
- Production Deployment: A real-time Streamlit dashboard featuring image preprocessing and dilation for real-world mouse-drawn inputs.
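
The training features listed above can be illustrated with a minimal NumPy sketch of one mini-batch update. The two-layer ReLU/softmax architecture, He initialization, the hidden width of 64, the learning rate, and the $\lambda / 2m$ scaling of the penalty are all assumptions made for illustration; only $\lambda = 0.1$ comes from the text, and the names `init_params`, `iterate_minibatches`, and `train_step` are hypothetical rather than the API of `src/model.py`.

```python
import numpy as np

def init_params(sizes, rng):
    """He-initialize weights for each layer; biases start at zero (assumed scheme)."""
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

def train_step(params, X, y, lr=0.1, lam=0.1):
    """One gradient step on a mini-batch; returns (new_params, loss before the step)."""
    (W1, b1), (W2, b2) = params
    m = X.shape[0]

    # Forward pass: affine -> ReLU -> affine -> stable softmax
    Z1 = X @ W1 + b1
    A1 = np.maximum(Z1, 0.0)
    Z2 = A1 @ W2 + b2
    Z2 -= Z2.max(axis=1, keepdims=True)
    P = np.exp(Z2)
    P /= P.sum(axis=1, keepdims=True)

    # Cross-entropy plus L2 penalty (lam / 2m) * sum of squared weights
    ce = -np.log(P[np.arange(m), y] + 1e-12).mean()
    loss = ce + lam / (2 * m) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backward pass: the gradient of the penalty is (lam / m) * W
    dZ2 = P.copy()
    dZ2[np.arange(m), y] -= 1.0
    dZ2 /= m
    dW2 = A1.T @ dZ2 + (lam / m) * W2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
    dW1 = X.T @ dZ1 + (lam / m) * W1
    db1 = dZ1.sum(axis=0)

    new_params = [(W1 - lr * dW1, b1 - lr * db1), (W2 - lr * dW2, b2 - lr * db2)]
    return new_params, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((256, 784))             # stand-in for normalized pixels
    y = rng.integers(0, 10, size=256)      # stand-in labels
    params = init_params([784, 64, 10], rng)
    for epoch in range(3):
        for Xb, yb in iterate_minibatches(X, y, 32, rng):
            params, loss = train_step(params, Xb, yb)
        print(f"epoch {epoch}: last batch loss {loss:.4f}")
```

Each update touches only `batch_size` rows, so every step is a handful of matrix multiplications rather than a Python loop over samples.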
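During development, we encountered the Quantization Paradox: the 1-byte model slightly outperformed the full-precision version (96.09% vs. 96.00% test accuracy). One plausible explanation is that discretizing the weights acted as an implicit regularizer, filtering out low-magnitude noise, although a 0.09-point difference is small enough that it could also fall within ordinary measurement variation.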
| Metric | Float64 (8-byte) | Int8 (1-byte) | Change |
|---|---|---|---|
| Accuracy (Test Set) | 96.00% | 96.09% | +0.09% |
| Model Size | 383.82 KB | 28.94 KB | -92.5% |
| Inference Latency | Baseline | Optimized | 🔥 |
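
To make the float64-to-int8 mapping concrete, here is a minimal sketch of one common post-training scheme: per-tensor affine quantization with a scale and a zero point. It is not necessarily the custom mapping used in `src/quantize.py`; the function names `quantize_int8` and `dequantize` and the min/max calibration are assumptions for illustration.

```python
import numpy as np

def quantize_int8(w):
    """Affine-map a float array onto the int8 range [-128, 127]."""
    w_min, w_max = float(w.min()), float(w.max())
    # One quantization step; fall back to 1.0 for a constant tensor
    scale = (w_max - w_min) / 255.0 or 1.0
    # Integer offset chosen so that w_min lands on -128
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float64 weights from the int8 codes."""
    return (q.astype(np.float64) - zero_point) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((784, 64)) * 0.05   # stand-in weight matrix
    q, scale, zp = quantize_int8(w)
    err = np.max(np.abs(dequantize(q, scale, zp) - w))
    print(f"bytes: {w.nbytes} -> {q.nbytes}, max round-trip error: {err:.6f}")
```

The dtype change alone shrinks the raw arrays by a factor of 8 (87.5%), and in this scheme the round-trip error stays at roughly half a quantization step. The 92.5% figure in the table is the measured size of the saved model files.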
Problem: High exponents in the Softmax layer led to NaN errors (overflow).
Solution: Implemented Numerically Stable Softmax by subtracting the maximum value from the input vector before exponentiation:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i - \max_j z_j}}{\sum_k e^{z_k - \max_j z_j}}$$
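
The max-subtraction trick described above can be sketched in a few lines of NumPy. The function name `softmax` and the row-wise (batch, classes) layout are assumptions; the project's version in `src/utils.py` may be organized differently.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax that stays finite for large logits."""
    # Subtracting the row maximum makes the largest exponent exactly 0,
    # so np.exp never overflows; the result is mathematically unchanged.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

if __name__ == "__main__":
    logits = np.array([[1000.0, 1001.0, 1002.0]])   # naive exp() would overflow here
    print(softmax(logits))
```

Because the same constant is removed from every logit in a row, it cancels between numerator and denominator, which is why the shifted and unshifted forms agree wherever the naive one is computable.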
Problem: The model achieved 96% test accuracy but struggled with "thin" mouse drawings in the Streamlit app.
Solution: Integrated a Computer Vision Preprocessing Layer using OpenCV. By applying Morphological Dilation, we "fattened" user inputs to match the stroke thickness of the original MNIST training set, significantly increasing real-world reliability.
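
To show what dilation does to a thin stroke, here is a dependency-free NumPy sketch of 3x3 morphological dilation. It is a stand-in for the OpenCV call the app uses, not a copy of it: for a `uint8` image it should match `cv2.dilate(img, np.ones((3, 3), np.uint8), iterations=1)`. The kernel size, the iteration count, and the white-on-black grayscale input are assumptions, and `dilate3x3` is a hypothetical name.

```python
import numpy as np

def dilate3x3(img, iterations=1):
    """Grow bright strokes by one pixel per iteration (3x3 square kernel)."""
    out = img.copy()
    h, w = out.shape
    for _ in range(iterations):
        # Zero padding means pixels outside the image never win the max
        padded = np.pad(out, 1, mode="constant", constant_values=0)
        # Each output pixel is the maximum over its 3x3 neighborhood
        shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
        out = np.max(shifts, axis=0)
    return out

if __name__ == "__main__":
    canvas = np.zeros((28, 28), dtype=np.uint8)
    canvas[5:23, 14] = 255                  # a one-pixel-wide vertical stroke
    thick = dilate3x3(canvas)
    print(int((canvas > 0).sum()), "->", int((thick > 0).sum()), "lit pixels")
```

A one-pixel line becomes three pixels wide after a single pass, which is closer to the pen thickness of MNIST digits.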
MNIST_Engine/
├── app.py # Streamlit Web Interface
├── requirements.txt # Dependency List
├── data/ # MNIST raw byte files (IDX format)
├── models/ # Saved Model Artifacts (.npz)
├── assets/ # Documentation media (GIFs/Images)
└── src/ # Core Package
    ├── model.py # Math Engine (Init, Prop, Save)
    ├── train.py # Training Pipeline & Mini-batch logic
    ├── test.py # Evaluation & Comparison Benchmarks
    ├── quantize.py # PTQ: Float → Int8 conversion
    └── utils.py # Activation functions & Metrics
---
## 🚦 Getting Started
### Prerequisites
- **Python 3.9+**
- Optimized for MacBook (M-series Silicon) via NumPy
### Data setup (required for training)
The pipeline expects MNIST IDX files in a `data/` folder. **Filenames must match exactly** (including spaces and case).
1. **Create the data folder** (from project root):
   ```bash
   mkdir -p data
   cd data
   ```
2. **Download** the official MNIST files from Yann LeCun's site into `data/`: `train-images-idx3-ubyte.gz`, `train-labels-idx1-ubyte.gz`, `t10k-images-idx3-ubyte.gz`, and `t10k-labels-idx1-ubyte.gz`.
3. **Decompress** (in `data/`):
   ```bash
   gzip -d train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz
   ```
4. **Rename** so the loader finds them (run from `data/`):
   ```bash
   mv train-labels-idx1-ubyte "MNIST Train Labels.idx1-ubyte"
   mv train-images-idx3-ubyte "Train Images.idx3-ubyte"
   mv t10k-labels-idx1-ubyte "MNIST t10k labels.idx1-ubyte"
   mv t10k-images-idx3-ubyte "mnist_digits.idx3-ubyte"
   ```
5. **Verify**: from project root, `data/` should contain exactly these four files (no `.gz` left):
   - `MNIST Train Labels.idx1-ubyte`
   - `Train Images.idx3-ubyte`
   - `MNIST t10k labels.idx1-ubyte`
   - `mnist_digits.idx3-ubyte`

Then run training (see below).
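
As a quick check that the renamed files are intact, the IDX format can be parsed with a few lines of Python. This is a sketch of a generic IDX reader, not the project's actual loader; `read_idx` is a hypothetical name. The header layout (two zero bytes, a type code, a dimension count, then big-endian 32-bit sizes) is part of the published IDX format.

```python
import struct
from pathlib import Path

import numpy as np

def read_idx(path):
    """Parse an unsigned-byte IDX file into a NumPy array of its declared shape."""
    raw = Path(path).read_bytes()
    # Magic number: 2 zero bytes, 1 type byte (0x08 = uint8), 1 byte = number of dims
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError(f"{path}: not an unsigned-byte IDX file")
    header_len = 4 + 4 * ndim
    # Dimension sizes are big-endian unsigned 32-bit integers
    dims = struct.unpack(f">{ndim}I", raw[4:header_len])
    data = np.frombuffer(raw, dtype=np.uint8, offset=header_len)
    if data.size != int(np.prod(dims)):
        raise ValueError(f"{path}: payload size does not match header {dims}")
    return data.reshape(dims)
```

For example, `read_idx("data/Train Images.idx3-ubyte").shape` should come out as `(60000, 28, 28)` for the standard training images, and the matching label file should have shape `(60000,)`. A `ValueError` usually indicates a file that is still gzipped or was truncated during download.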
Clone the repo:
```bash
git clone https://github.com/your-username/MNIST_Engine.git
cd MNIST_Engine
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Run from the project root (the folder containing `app.py` and `src/`).

To train & quantize (saves both float64 and int8 models to `models/`; requires MNIST data in `data/`):
```bash
python -m src.train
```
To run benchmarks (compares float64 vs int8 accuracy and size; requires `models/` from training):
```bash
python -m src.test
```
To launch the web app (uses `models/mnist_model_int8.npz`; run train first if missing):
```bash
streamlit run app.py
```