v0.10.0 — pandas-parity analytics, zero dependencies

@hussain-alsaibai released this 28 Jun 17:10
461b35a

SnapDB v0.10.0 brings the columnar engine to performance parity with pandas/NumPy on analytics workloads, while staying zero-dependency by default (NumPy is optional) and remaining the lightest-memory store in its field. These notes cover everything since v0.7.0. Every change was adversarially reviewed, with findings reproduced before fixing; CI is green on Linux (Python 3.9–3.13) and Windows.

⚡ Performance — now competitive with pandas

| Workload (100K rows) | Earlier | v0.10.0 | vs pandas |
|---|---|---|---|
| Full-scan aggregate (sum/min/max/avg) | ~58M rows/s | ~530M rows/s | on par |
| Numeric filtered count (count_where) | | ~314M rows/s | on par |
| Row-store bulk insert | ~17K rows/s | ~287K rows/s | ~SQLite/pandas ballpark |
| Memory footprint | | 2.2 MB | ~5× lighter than pandas |

  • NumPy-accelerated aggregates (v0.8.0, #14) — aggregate() runs over the zero-copy column buffer when NumPy is installed (~13–27×). Exact parity with the pure-Python path; encoded columns and 64-bit-int sums fall through to the exact path. use_numpy=False forces pure-Python.
  • NumPy-accelerated filters + count_where() (v0.9.0, #14) — select_where() builds its masks with vectorized operations; the new count_where() does filtered counts with no row materialization (~166× on numeric predicates). Also includes parity fixes for f32 columns and limit=0.
  • ~26× faster row-store bulk insert (v0.10.0, #13) — batch_insert() grows the file in a single truncate+remap for the whole batch instead of one per slab. On-disk format and durability unchanged.
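The bulk-insert change can be illustrated with a minimal standard-library sketch. This is not SnapDB's code: the slab size, the 16-byte row layout, and the function names `append_per_slab` and `append_batched` are assumptions made for illustration. It only shows why growing a memory-mapped file once per batch needs fewer truncate+remap cycles than growing it once per slab, while producing the same bytes on disk.

```python
import mmap
import os
import struct
import tempfile


def append_per_slab(path, rows, slab_rows=8):
    """Grow and remap the file once per slab (hypothetical 'before' behaviour)."""
    remaps = 0
    with open(path, "r+b") as f:
        for i in range(0, len(rows), slab_rows):
            slab = b"".join(rows[i:i + slab_rows])
            end = os.fstat(f.fileno()).st_size
            f.truncate(end + len(slab))          # grow the file for this slab
            with mmap.mmap(f.fileno(), 0) as m:  # remap the whole file
                m[end:end + len(slab)] = slab
            remaps += 1
    return remaps


def append_batched(path, rows):
    """Grow and remap the file once for the whole batch."""
    if not rows:
        return 0
    payload = b"".join(rows)
    with open(path, "r+b") as f:
        end = os.fstat(f.fileno()).st_size
        f.truncate(end + len(payload))           # single grow
        with mmap.mmap(f.fileno(), 0) as m:      # single remap
            m[end:end + len(payload)] = payload
            m.flush()
    return 1


def demo():
    # Hypothetical fixed-width rows: one int64 and one float64 (16 bytes).
    rows = [struct.pack("<qd", i, i * 0.5) for i in range(100)]
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
        for p in (a, b):
            open(p, "wb").close()                # start from empty files
        n_slab = append_per_slab(a, rows)
        n_batch = append_batched(b, rows)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            same_bytes = fa.read() == fb.read()
    return n_slab, n_batch, same_bytes
```

Calling `demo()` returns the number of remaps each strategy performed and whether the two files are byte-identical, which mirrors the note that the on-disk format is unchanged.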

🛡️ Correctness & durability

  • Frame-of-Reference encoding hardened — 7 data-corruption bugs fixed (out-of-range widening, post-activation nulls, NumPy export, delta+FOR combo, unique_count, update→None, duplicate eligibility set).
  • Transaction atomicity: batch_insert() inside a transaction() now rolls back correctly (it previously left phantom rows).
  • Cross-platform test hygiene — pytest tests/ is green on Windows as well as Linux.
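For readers unfamiliar with Frame-of-Reference (FOR) encoding, the sketch below shows the general technique and the out-of-range widening case mentioned in the list above. It is a simplified illustration under stated assumptions, not SnapDB's encoder: the helper names `for_encode`, `for_decode`, and `for_append` are hypothetical, only non-null integers are handled, and widening is done by a full re-encode.

```python
from array import array


def for_encode(values):
    """Store values as (base, offsets), using the narrowest unsigned width that fits."""
    base = min(values)
    span = max(values) - base
    for typecode in "BHIQ":                      # try widths from narrow to wide
        bits = array(typecode).itemsize * 8
        if span < (1 << bits):
            return base, array(typecode, (v - base for v in values))
    raise OverflowError("value range does not fit in 64 bits")


def for_decode(base, offsets):
    """Recover the original integers."""
    return [base + off for off in offsets]


def for_append(base, offsets, value):
    """Append a value; re-encode with a new base or width if it falls out of range."""
    off = value - base
    limit = (1 << (offsets.itemsize * 8)) - 1
    if 0 <= off <= limit:
        offsets.append(off)                      # fits the current frame
        return base, offsets
    # Out of range: widen (or rebase) by re-encoding everything.
    return for_encode(for_decode(base, offsets) + [value])
```

A value above the current width and a value below the current base both take the re-encode path, and decoding must return the inserted values unchanged in either case.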

🗺️ Remaining roadmap

#12 (low-overhead sys.monitoring profiler — observability) and further filter acceleration for string/dict-encoded columns.

Closes #13, #14. Zero runtime dependencies; pip install snapdb[numpy] enables the accelerated paths.
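The optional-dependency pattern behind the accelerated paths can be sketched as follows. This is an illustration only, not SnapDB's API: `column_sum` and `count_where_gt` are hypothetical names, and the column is modelled as a standard-library `array('d')`. When NumPy is importable, `np.frombuffer` views the column's buffer without copying; otherwise the pure-Python path runs. The `use_numpy` flag name is borrowed from the notes above. The sketch makes no attempt at bit-exact float parity between the two paths, since summation order can differ.

```python
from array import array

try:
    import numpy as np          # optional accelerator
except ImportError:             # zero-dependency default
    np = None


def column_sum(col, use_numpy=True):
    """Sum a float64 column, using NumPy over the existing buffer when available."""
    if use_numpy and np is not None and col.typecode == "d":
        view = np.frombuffer(col, dtype=np.float64)   # zero-copy view
        return float(view.sum())
    return sum(col)                                   # pure-Python fallback


def count_where_gt(col, threshold, use_numpy=True):
    """Count values above a threshold without building row objects."""
    if use_numpy and np is not None and col.typecode == "d":
        view = np.frombuffer(col, dtype=np.float64)
        return int(np.count_nonzero(view > threshold))  # boolean mask, then count
    return sum(1 for v in col if v > threshold)
```

Passing `use_numpy=False` forces the fallback, which makes it easy to compare both paths on the same column in a test.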