SnapDB v0.10.0 brings the columnar engine to performance parity with pandas/NumPy on analytics, while staying zero-dependency by default (NumPy is optional) and keeping the lightest memory footprint in the field. These notes cover everything since v0.7.0. Every change was adversarially reviewed (findings were reproduced before fixing); CI is green across Linux (Python 3.9–3.13) and Windows.
⚡ Performance — now competitive with pandas
| Workload (100K rows) | Earlier | v0.10.0 | vs pandas |
|---|---|---|---|
| Full-scan aggregate (sum/min/max/avg) | ~58M rows/s | ~530M rows/s | on par |
| Numeric filtered count (`count_where`) | — | ~314M rows/s | on par |
| Row-store bulk insert | ~17K rows/s | ~287K rows/s | ~SQLite/pandas ballpark |
| Memory footprint | — | 2.2 MB | ~5× lighter than pandas |
- NumPy-accelerated aggregates (v0.8.0, #14): `aggregate()` runs over the zero-copy column buffer when NumPy is installed (~13–27×). Exact parity with the pure-Python path; encoded columns and 64-bit-int sums fall through to the exact path. `use_numpy=False` forces pure-Python.
- NumPy-accelerated filters + `count_where()` (v0.9.0, #14): `select_where()` builds masks vectorially; the new `count_where()` does filtered counts with no row materialization (~166× on numeric predicates). f32 and `limit=0` parity fixes.
- ~26× faster row-store bulk insert (v0.10.0, #13): `batch_insert()` grows the file in a single truncate+remap for the whole batch instead of one per slab. On-disk format and durability unchanged.
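
The bulk-insert change above can be illustrated with a small standard-library sketch. This is not SnapDB's code: `GrowableFile`, `SLAB`, and the record layout are hypothetical names chosen for illustration, and the sketch assumes a simple append-only file. It shows only the general idea of reserving space for a whole batch with one truncate+remap instead of growing one slab at a time.

```python
import mmap
import os
import tempfile

SLAB = 4096  # hypothetical growth unit, in bytes


class GrowableFile:
    """Append-only mmap-backed file (illustrative, not SnapDB's class)."""

    def __init__(self, path):
        self.f = open(path, "w+b")
        self.f.truncate(SLAB)
        self.map = mmap.mmap(self.f.fileno(), SLAB)
        self.used = 0    # bytes written so far
        self.remaps = 0  # number of truncate+remap cycles performed

    def _reserve(self, nbytes):
        """Make room for nbytes more, using at most one truncate+remap."""
        needed = self.used + nbytes
        if needed <= len(self.map):
            return
        new_size = -(-needed // SLAB) * SLAB  # round up to whole slabs
        self.map.flush()
        self.map.close()
        self.f.truncate(new_size)
        self.map = mmap.mmap(self.f.fileno(), new_size)
        self.remaps += 1

    def _write(self, record):
        end = self.used + len(record)
        self.map[self.used:end] = record
        self.used = end

    def insert(self, record):
        # Row-at-a-time path: grows by one slab whenever the file fills up.
        self._reserve(len(record))
        self._write(record)

    def batch_insert(self, records):
        # Batch path: reserve space for the whole batch up front.
        self._reserve(sum(len(r) for r in records))
        for r in records:
            self._write(r)

    def close(self):
        self.map.close()
        self.f.close()


def demo():
    records = [bytes([i % 256]) * 64 for i in range(1000)]
    with tempfile.TemporaryDirectory() as d:
        one_by_one = GrowableFile(os.path.join(d, "a.db"))
        for r in records:
            one_by_one.insert(r)
        batched = GrowableFile(os.path.join(d, "b.db"))
        batched.batch_insert(records)
        same = one_by_one.map[:64000] == batched.map[:64000]
        counts = (one_by_one.remaps, batched.remaps)
        one_by_one.close()
        batched.close()
    return counts, same
```

In this toy setup, `demo()` writes the same 64,000 bytes both ways; the row-at-a-time path remaps 15 times and the batch path once, with identical file contents. The real speedup depends on the storage engine and is not something this sketch measures.
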
🛡️ Correctness & durability
- Frame-of-Reference encoding hardened: 7 data-corruption bugs fixed (out-of-range widening, post-activation nulls, NumPy export, delta+FOR combo, `unique_count`, `update`→`None`, duplicate eligibility set).
- Transaction atomicity: `batch_insert()` inside a `transaction()` now rolls back correctly (it previously left phantom rows).
- Cross-platform test hygiene: `pytest tests/` is green on Windows as well as Linux.
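
For readers unfamiliar with Frame-of-Reference (FOR) encoding, here is a minimal sketch of the general technique: values are stored as small unsigned offsets from a base, and the offset array is widened when a new value no longer fits. `ForColumn` and its methods are hypothetical and do not reflect SnapDB's internals. The sketch only shows why widening and null handling are the delicate parts of such an encoder.

```python
from array import array


class ForColumn:
    """Integer column stored as unsigned offsets from a base (illustrative)."""

    _TYPECODES = ("B", "H", "I", "Q")  # narrowest to widest unsigned types

    def __init__(self, values=()):
        self.base = None      # reference value, set by the first non-null
        self.max_offset = 0   # largest offset stored so far
        self.nulls = set()    # positions that hold None
        self.offsets = array("B")
        for v in values:
            self.append(v)

    def _ensure_fits(self, max_offset):
        """Widen the offset array if max_offset overflows the current type."""
        for tc in self._TYPECODES:
            if max_offset < 1 << (8 * array(tc).itemsize):
                break
        else:
            raise OverflowError("value range too wide for this sketch")
        if array(tc).itemsize > self.offsets.itemsize:
            self.offsets = array(tc, self.offsets.tolist())

    def _rebase(self, new_base):
        """Lower the base and shift every stored offset up to match."""
        shift = self.base - new_base
        self._ensure_fits(self.max_offset + shift)
        self.offsets = array(self.offsets.typecode,
                             [o + shift for o in self.offsets])
        self.base = new_base
        self.max_offset += shift

    def append(self, value):
        if value is None:
            self.nulls.add(len(self.offsets))
            self.offsets.append(0)  # placeholder, hidden by the null set
            return
        if self.base is None:
            self.base = value
        elif value < self.base:
            self._rebase(value)
        offset = value - self.base
        self._ensure_fits(offset)
        self.offsets.append(offset)
        self.max_offset = max(self.max_offset, offset)

    def to_list(self):
        """Decode back to Python values, restoring None at null positions."""
        return [None if i in self.nulls else self.base + o
                for i, o in enumerate(self.offsets)]
```

A short usage example: `ForColumn([1000, 1003, None, 1001])` starts with one-byte offsets; appending `71000` forces a wider offset type, and appending `990` lowers the base, yet `to_list()` still round-trips every value including the null.
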
🗺️ Remaining roadmap
Still open: #12 (a low-overhead `sys.monitoring` profiler for observability) and further filter acceleration for string/dict-encoded columns.
Closes #13, #14. Zero runtime dependencies; `pip install snapdb[numpy]` enables the accelerated paths.
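
As a rough illustration of how an optional accelerated path can coexist with a zero-dependency default, the sketch below tries to import NumPy and falls back to pure Python when it is missing. The function names `column_max` and `count_where_gt` are hypothetical and are not SnapDB's API; only the `use_numpy` flag name is borrowed from the notes above. The sketch assumes a float64 column held in an `array.array("d")`.

```python
from array import array

try:
    import numpy as np  # optional accelerator
except ImportError:
    np = None  # pure-Python paths are used instead


def column_max(col, use_numpy=True):
    """Maximum of a float64 column (hypothetical helper)."""
    if use_numpy and np is not None:
        view = np.frombuffer(col, dtype=np.float64)  # view, no copy
        return float(view.max())
    return max(col)


def count_where_gt(col, threshold, use_numpy=True):
    """Count values above threshold without building a list of rows."""
    if use_numpy and np is not None:
        view = np.frombuffer(col, dtype=np.float64)
        return int(np.count_nonzero(view > threshold))  # boolean mask
    return sum(1 for v in col if v > threshold)


col = array("d", [1.5, -2.0, 7.25, 3.0])
fast = (column_max(col), count_where_gt(col, 2.0))
slow = (column_max(col, use_numpy=False),
        count_where_gt(col, 2.0, use_numpy=False))
```

Both tuples hold the same values whether or not NumPy is installed, which is the parity property the notes describe. Max and filtered counts are order-independent, so the two paths agree exactly here; floating-point sums can differ in the last bits between summation orders, which is one reason a library might route some aggregates to an exact path.
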