Fast CSV parser in C — 3–5× faster than Python
Memory-mapped, zero-copy, lightweight
Install • Usage • Benchmarks • Platform • Contribute
Parsing large CSV files in Python is slow due to interpreter overhead.
fast-csv-parser uses memory mapping (mmap) and zero-copy parsing to read millions of rows in seconds, not minutes.
- No per-field allocation
- No Python overhead
- CLI + embeddable library
- Built for Linux & WSL2
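
To illustrate the memory-mapping idea, here is a minimal sketch for Linux/WSL2. It is not the project's actual code: `MappedFile`, `map_file`, `unmap_file`, and `count_rows` are hypothetical names chosen for this example. It maps a file read-only and scans it in place, so no read buffer is allocated or copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type for this sketch: a read-only view of a whole file. */
typedef struct {
    const char *data;
    size_t size;
} MappedFile;

/* Map `path` read-only. Returns 0 on success, -1 on failure.
 * Empty files are treated as a failure because mmap rejects length 0. */
static int map_file(const char *path, MappedFile *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Release the mapping and reset the view. */
static void unmap_file(MappedFile *mf) {
    munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}

/* Count newline-terminated rows by scanning the mapped bytes in place. */
static size_t count_rows(const MappedFile *mf) {
    size_t rows = 0;
    const char *p = mf->data;
    const char *end = mf->data + mf->size;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL) break;
        rows++;
        p = nl + 1;
    }
    return rows;
}
```

A caller would run `map_file(path, &mf)`, work on `mf.data` directly, then call `unmap_file(&mf)`. The kernel pages the file in on demand, which is where the memory savings of this approach typically come from.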
| Feature | Status |
|---|---|
| Memory-mapped I/O | Done |
| Zero-copy field extraction | Done |
| Custom delimiter & quote | Done |
| CLI + library | Done |
| Docker build | Done |
| GitHub Actions CI | Done |
| Unit tests | Done |
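
As a rough illustration of zero-copy field extraction with a custom delimiter and quote character, the sketch below splits one line into pointer-plus-length views. `FieldView` and `split_line` are hypothetical names for this example, not the library's API, and the sketch assumes a single line with no embedded newlines. Doubled quotes inside a quoted field are left as-is in the view, since unescaping them would require a copy.

```c
#include <stddef.h>

/* Hypothetical type for this sketch: a field is a view into the input
 * buffer (pointer + length), so nothing is allocated per field. */
typedef struct {
    const char *start;
    size_t len;
} FieldView;

/* Split `line` into at most `max_fields` views and return how many were
 * stored. A quoted field's view excludes the surrounding quote characters. */
static size_t split_line(const char *line, size_t len, char delim, char quote,
                         FieldView *out, size_t max_fields) {
    size_t count = 0;
    size_t i = 0;

    while (count < max_fields) {
        size_t start, end;

        if (i < len && line[i] == quote) {
            start = ++i; /* skip the opening quote */
            while (i < len) {
                if (line[i] == quote) {
                    /* a doubled quote is an escaped quote, keep scanning */
                    if (i + 1 < len && line[i + 1] == quote) {
                        i += 2;
                        continue;
                    }
                    break; /* closing quote */
                }
                i++;
            }
            end = i;
            if (i < len) i++;                        /* skip the closing quote */
            while (i < len && line[i] != delim) i++; /* move to the delimiter */
        } else {
            start = i;
            while (i < len && line[i] != delim) i++;
            end = i;
        }

        out[count].start = line + start;
        out[count].len = end - start;
        count++;

        if (i >= len) break;
        i++; /* skip the delimiter */
    }
    return count;
}
```

Because the views are not NUL-terminated, a caller would print them with a length-limited format such as `printf("%.*s", (int)f.len, f.start)`.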
| OS | Status | Notes |
|---|---|---|
| Linux | Supported | Ubuntu 22.04+ |
| WSL2 | Supported | Recommended for Windows |
| macOS | Experimental | Not tested |
| Windows | Not Supported | Use WSL2 |
Pro tip: On Windows, use WSL2 — full Linux environment, zero changes needed.
# 1. Install WSL2 (run in PowerShell as Admin)
wsl --install -d Ubuntu
# 2. Open WSL terminal
git clone https://github.com/rougebyt/fast-csv-parser.git
cd fast-csv-parser
# 3. Install dependencies
sudo apt update && sudo apt install build-essential python3 python3-pip -y
# 4. Build & run
make
./csvparse examples/sample.csv

git clone https://github.com/rougebyt/fast-csv-parser.git
cd fast-csv-parser
# Install dependencies
sudo apt update
sudo apt install build-essential python3 python3-pip -y
# Build
make

Binary: ./csvparse
docker build -t fast-csv-parser .
# Run tests
docker run --rm fast-csv-parser make test
# Run CLI with local file
docker run --rm -v "$(pwd)/examples:/data" fast-csv-parser ./csvparse /data/sample.csv

# Basic
./csvparse examples/sample.csv
# Custom delimiter
./csvparse data.csv --delimiter ';'
# Custom quote
./csvparse data.csv --quote "'"
# Print only specific columns (0-indexed)
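./csvparse data.csv --columns 0,2

#include <stdio.h>
#include "include/csv_parser.h"

int main(void) {
    CSVParser *parser = csv_parser_new("examples/sample.csv", ',', '"');
    if (parser == NULL) {
        /* Assumes csv_parser_new returns NULL on failure (e.g. file not found) */
        fprintf(stderr, "Failed to open examples/sample.csv\n");
        return 1;
    }

    CSVRow *row;
    /* csv_parser_next returns NULL once all rows are consumed */
    while ((row = csv_parser_next(parser)) != NULL) {
        printf("Name: %s, Age: %s\n",
               row->fields[0], row->fields[1]);
    }

    csv_parser_free(parser);
    return 0;
}

$ make bench

1M rows × 5 cols (~57 MB CSV)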
| Parser | Time | Speedup |
|---|---|---|
| Python csv | 4.75s | 1.0× |
| C Parser | 1.70s | 2.8× |
Realistic speedup under fair I/O conditions.
Zero-copy + `mmap` = still massive memory savings.
See `examples/benchmark.py`.
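
For readers who want to time their own runs from C, here is a small sketch of how such numbers can be measured and compared. It is not what `make bench` or `examples/benchmark.py` actually does: `elapsed_seconds`, `time_call`, `speedup`, and `throughput_mb_per_s` are hypothetical helpers written for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds between two monotonic clock readings. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Wall-clock time of one call to fn(arg), using a clock that is not
 * affected by system time adjustments. */
static double time_call(void (*fn)(void *), void *arg) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}

/* How many times faster the candidate is than the baseline. */
static double speedup(double baseline_s, double candidate_s) {
    return baseline_s / candidate_s;
}

/* Parsing throughput in MB/s for an input of size_mb megabytes. */
static double throughput_mb_per_s(double size_mb, double seconds) {
    return size_mb / seconds;
}
```

Plugging in the table above, `speedup(4.75, 1.70)` is roughly 2.8, and `throughput_mb_per_s(57.0, 1.70)` is roughly 33.5 MB/s for the C parser versus about 12 MB/s for Python's `csv` on the same ~57 MB file.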
src/ → Core parser + CLI
include/ → Header for library use
examples/ → Sample data + benchmark
tests/ → Unit tests
Dockerfile → Containerized build
.github/ → CI/CD
- Fork it
- Create your branch (`git checkout -b feature/fast`)
- Commit (`git commit -m 'Add SIMD support'`)
- Push (`git push origin feature/fast`)
- Open a Pull Request
Moibon Dereje
- GitHub: @rougebyt
- X: @rougeByt
- Email: moibonthebest@gmail.com
MIT © Moibon Dereje