
rougebyt/fast-csv-parser


fast-csv-parser

Fast CSV parser in C: 2.8× faster than Python's csv module in the included benchmark
Memory-mapped, zero-copy, lightweight



Why This Exists

Parsing large CSV files in Python is slow due to interpreter overhead.
fast-csv-parser uses memory mapping (mmap) and zero-copy parsing to read millions of rows in seconds, not minutes.

  • No per-field allocation
  • No Python overhead
  • CLI + embeddable library
  • Built for Linux & WSL2
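The memory-mapping and zero-copy approach described above can be sketched in a few lines of C. This is a minimal illustration assuming a POSIX system; `map_file` and `count_records` are hypothetical helpers, not functions from this project's `csv_parser.h`, and the record count deliberately ignores newlines inside quoted fields.

```c
/* Sketch only: hypothetical helpers, not part of csv_parser.h. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only into memory.
 * Returns NULL on error or for an empty file (mmap rejects length 0). */
static const char *map_file(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return (const char *)p;
}

/* Count records by scanning the mapped bytes in place: no read() into a
 * buffer and no per-line allocation. A last line without a trailing '\n'
 * still counts. Newlines inside quoted fields are NOT handled here. */
static size_t count_records(const char *data, size_t len) {
    if (len == 0)
        return 0;
    assert(data != NULL);
    size_t n = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        n++;
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return n;
}
```

When finished, the caller releases the mapping with `munmap((void *)data, len)`. Nothing is copied out of the file, so memory use stays close to the page cache the kernel already keeps for it.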

Features

Feature                       Status
Memory-mapped I/O             Done
Zero-copy field extraction    Done
Custom delimiter & quote      Done
CLI + library                 Done
Docker build                  Done
GitHub Actions CI             Done
Unit tests                    Done
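Custom delimiter and quote handling, combined with zero-copy extraction, can be illustrated with a small field splitter. This is a sketch of one possible approach, not the project's parser: `FieldSpan` and `split_fields` are hypothetical names. Each field is returned as a pointer into the original line plus a length, so nothing is copied. For quoted fields the surrounding quotes are excluded and doubled quotes (`""`) are left escaped, because unescaping them would require writing a new buffer.

```c
/* Sketch only: FieldSpan and split_fields are hypothetical names. */
#include <assert.h>
#include <stddef.h>

/* A field is a pointer into the original buffer plus a length (zero-copy). */
typedef struct {
    const char *ptr;
    size_t len;
} FieldSpan;

/* Split one record line[0..len) on `delim`, honoring `quote`.
 * Stores up to max_fields spans in out and returns the total field count
 * (which can exceed max_fields). Quoted spans exclude the outer quotes;
 * doubled quotes inside them are left escaped, since unescaping needs a copy. */
static size_t split_fields(const char *line, size_t len, char delim, char quote,
                           FieldSpan *out, size_t max_fields) {
    assert(line != NULL);
    size_t n = 0, i = 0;
    for (;;) {
        const char *start;
        size_t flen;
        if (i < len && line[i] == quote) {
            size_t j = i + 1;
            /* Find the closing quote, skipping escaped (doubled) quotes. */
            while (j < len) {
                if (line[j] == quote) {
                    if (j + 1 < len && line[j + 1] == quote) {
                        j += 2;
                        continue;
                    }
                    break;
                }
                j++;
            }
            start = line + i + 1;
            flen = j - (i + 1);
            i = j + 1;
            /* Tolerate stray bytes between the closing quote and the delimiter. */
            while (i < len && line[i] != delim)
                i++;
        } else {
            size_t j = i;
            while (j < len && line[j] != delim)
                j++;
            start = line + i;
            flen = j - i;
            i = j;
        }
        if (n < max_fields) {
            out[n].ptr = start;
            out[n].len = flen;
        }
        n++;
        if (i >= len)
            break;
        i++; /* step over the delimiter */
    }
    return n;
}
```

For example, splitting `1;'Smith; J.';42` with delimiter `';'` and quote `'\''` yields three spans, the middle one being `Smith; J.` without its quotes. Because the spans point into the input, they remain valid only while the underlying buffer (for instance, an mmap'd file) stays mapped.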

Platform Support

OS        Status           Notes
Linux     Supported        Ubuntu 22.04+
WSL2      Supported        Recommended for Windows
macOS     Experimental     Not tested
Windows   Not supported    Use WSL2

Pro tip: On Windows, use WSL2. It provides a full Linux environment, so no changes are needed.

Windows + WSL2 Setup

# 1. Install WSL2 (run in PowerShell as Admin)
wsl --install -d Ubuntu

# 2. Open WSL terminal
git clone https://github.com/rougebyt/fast-csv-parser.git
cd fast-csv-parser

# 3. Install dependencies
sudo apt update && sudo apt install build-essential python3 python3-pip -y

# 4. Build & run
make
./csvparse examples/sample.csv

Installation

Option 1: Direct (Linux / WSL2)

git clone https://github.com/rougebyt/fast-csv-parser.git
cd fast-csv-parser

# Install dependencies
sudo apt update
sudo apt install build-essential python3 python3-pip -y

# Build
make

Binary: ./csvparse


Option 2: Docker

docker build -t fast-csv-parser .

# Run tests
docker run --rm fast-csv-parser make test

# Run CLI with local file
docker run --rm -v "$(pwd)/examples:/data" fast-csv-parser ./csvparse /data/sample.csv

Usage

CLI

# Basic
./csvparse examples/sample.csv

# Custom delimiter
./csvparse data.csv --delimiter ';'

# Custom quote
./csvparse data.csv --quote "'"

# Print only specific columns (0-indexed)
./csvparse data.csv --columns 0,2
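One way a value such as `--columns 0,2` could be turned into 0-indexed column numbers is sketched in C below. `parse_column_list` is a hypothetical helper, and the real CLI may validate the flag differently; this version accepts only non-negative decimal indices separated by single commas and rejects everything else.

```c
/* Sketch only: parse_column_list is a hypothetical helper; the real CLI may
 * validate --columns differently. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a comma-separated list of 0-indexed column numbers such as "0,2".
 * Writes up to max indices to out and returns how many were parsed, or -1
 * for malformed input (empty items, signs, spaces, trailing comma),
 * out-of-range numbers, or more than max entries. */
static int parse_column_list(const char *spec, size_t *out, int max) {
    assert(spec != NULL && out != NULL);
    int n = 0;
    const char *p = spec;
    while (*p != '\0') {
        if (*p < '0' || *p > '9')
            return -1;
        char *end;
        errno = 0;
        unsigned long v = strtoul(p, &end, 10);
        if (errno == ERANGE || n >= max)
            return -1;
        out[n++] = (size_t)v;
        if (*end == ',') {
            p = end + 1;
            if (*p == '\0')
                return -1; /* trailing comma */
        } else if (*end == '\0') {
            p = end;
        } else {
            return -1;
        }
    }
    return n > 0 ? n : -1;
}
```

Each parsed index would still need to be checked against the number of fields in a row before it is used to select a column.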

As Library

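#include <stdio.h>
#include "include/csv_parser.h"

int main(void) {
    /* Open the file with ',' as the delimiter and '"' as the quote character */
    CSVParser *parser = csv_parser_new("examples/sample.csv", ',', '"');
    if (parser == NULL) {  /* assumes NULL signals that the file could not be opened */
        fprintf(stderr, "could not open examples/sample.csv\n");
        return 1;
    }

    CSVRow *row;
    /* csv_parser_next returns NULL once the input is exhausted */
    while ((row = csv_parser_next(parser)) != NULL) {
        printf("Name: %s, Age: %s\n", row->fields[0], row->fields[1]);
    }

    csv_parser_free(parser);
    return 0;
}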

Benchmarks

$ make bench

1M rows × 5 cols (~57 MB CSV)

Parser        Time     Speedup
Python csv    4.75 s   1.0×
C parser      1.70 s   2.8×

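The 2.8× figure is a realistic speedup measured under fair I/O conditions.
Zero-copy parsing over mmap also avoids per-field allocations, which keeps memory usage low.
See examples/benchmark.py for the benchmark script.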
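The 2.8× figure is the ratio of the two wall-clock times (4.75 s / 1.70 s ≈ 2.79). A minimal C timing harness for the C side of such a measurement might look like the following. It assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC`; `elapsed_seconds` and `time_row_scan` are hypothetical names, and the newline-counting loop is a stand-in workload, not the project's parser or the code in examples/benchmark.py.

```c
/* Sketch only: times a stand-in workload, not the project's parser. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Time one pass over an in-memory buffer that counts '\n' characters.
 * A real benchmark would call the parser here instead. */
static double time_row_scan(const char *data, size_t len, size_t *rows_out) {
    assert(data != NULL && rows_out != NULL);
    struct timespec t0, t1;
    size_t rows = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        rows++;
        p++; /* continue after the newline just found */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *rows_out = rows;
    return elapsed_seconds(t0, t1);
}
```

Using `CLOCK_MONOTONIC` rather than `CLOCK_REALTIME` keeps the measurement unaffected by system clock adjustments during the run.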


Project Structure

src/           → Core parser + CLI
include/       → Header for library use
examples/      → Sample data + benchmark
tests/         → Unit tests
Dockerfile     → Containerized build
.github/       → CI/CD

Contributing

  1. Fork it
  2. Create your branch (git checkout -b feature/fast)
  3. Commit (git commit -m 'Add SIMD support')
  4. Push (git push origin feature/fast)
  5. Open a Pull Request

Author

Moibon Dereje


License

MIT © Moibon Dereje




Built with performance in mind.
