
v0.6.0 — performance, correctness, durability & features


@hussain-alsaibai hussain-alsaibai released this 28 Jun 07:52
· 19 commits to main since this release

SnapDB v0.6.0 is a performance, correctness, durability, and feature release. It is pure Python with zero runtime dependencies (NumPy is optional and used only for zero-copy export). Every change was adversarially reviewed, with each finding reproduced before it was fixed, and CI is green on Linux (Python 3.9–3.13) and Windows, with a benchmark job alongside.

⚡ Performance

  • Delta-encoded column reads are now O(1) per access and O(n) per full scan via a lazy reconstruction cache (previously O(n) and O(n²)); a 20k-row delta-column scan dropped from ~17.2 s to ~11 ms.
  • Vectorized multi-condition select_where() combines per-column bitmasks with C-speed big-integer AND/OR — ~2× faster than a row-predicate select() on selective queries.
  • Vectorized aggregates (array-level sum/min/max) for null-free numeric columns; __slots__ on hot classes.
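
To illustrate why a lazy reconstruction cache changes the complexity of delta-encoded reads, here is a minimal sketch. The class `DeltaColumn` and its methods are hypothetical and are not SnapDB's actual implementation; the sketch assumes integer values and append-only writes. Values are stored as differences from their predecessor, and the decoded list is built once on first read and reused afterwards.

```python
from itertools import accumulate


class DeltaColumn:
    """Hypothetical delta-encoded integer column (illustrative, not SnapDB's class)."""

    __slots__ = ("_deltas", "_last", "_cache")

    def __init__(self, values=()):
        self._deltas = []   # first entry is the first value; the rest are differences
        self._last = 0      # last decoded value, so appends never need a decode
        self._cache = None  # decoded values, built lazily on first read
        for value in values:
            self.append(value)

    def append(self, value):
        self._deltas.append(value - self._last)
        self._last = value
        if self._cache is not None:
            # Keep an existing cache valid instead of throwing it away.
            self._cache.append(value)

    def _decoded(self):
        # One O(n) prefix-sum pass, paid only when the cache is missing.
        if self._cache is None:
            self._cache = list(accumulate(self._deltas))
        return self._cache

    def __len__(self):
        return len(self._deltas)

    def __getitem__(self, i):
        # O(1) once the cache exists.
        return self._decoded()[i]

    def naive_get(self, i):
        # Uncached read for comparison: replays the deltas up to row i,
        # so scanning every row this way costs O(n^2) in total.
        return sum(self._deltas[: i + 1])


col = DeltaColumn([100, 103, 103, 110])
print([col[i] for i in range(len(col))])
```

A real column also has to handle deletes and nulls, which this sketch leaves out; those are the cases where a cache must be invalidated or patched carefully.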

🛡️ Correctness & durability

  • Hash indexes are kept in sync on insert / batch_insert / update / delete (they previously went stale after the first build); unified create_index() for row and columnar storage; find() works without a pre-built index; an emptied index is no longer mistaken for a missing one.
  • Delta delete/null corruption fixed — deleting or nulling a delta-encoded row no longer shifts other rows' values.
  • Transaction rollback now actually undoes writes (and restores indexes).
  • Durable multi-slab persistence — the on-disk bitmap/slab geometry and the last slab's high-water mark are now persisted, so reopening a database larger than one slab no longer loses data.
  • The columnar index now maps each value to a set of matching rows (first-match lookups, consistent under duplicates and deletes and before/after auto-indexing); to_numpy() returns an independent copy by default.
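
The index fixes above come down to updating the index on every write path. The following is a minimal sketch of that idea, not SnapDB's code: `IndexedTable` and all of its methods are hypothetical, it indexes a single column, and it assumes "first match" means the lowest surviving row id.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical row store with one hash index (illustrative only)."""

    def __init__(self, column):
        self.column = column
        self.rows = {}                 # row id -> row dict
        self.index = defaultdict(set)  # value -> set of row ids
        self._next_id = 0

    def insert(self, row):
        rid = self._next_id
        self._next_id += 1
        self.rows[rid] = dict(row)
        self.index[row[self.column]].add(rid)
        return rid

    def update(self, rid, **changes):
        row = self.rows[rid]
        old = row[self.column]
        row.update(changes)
        new = row[self.column]
        if new != old:
            # Move the row id from the old bucket to the new one.
            self._unindex(old, rid)
            self.index[new].add(rid)

    def delete(self, rid):
        row = self.rows.pop(rid)
        self._unindex(row[self.column], rid)

    def _unindex(self, value, rid):
        ids = self.index[value]
        ids.discard(rid)
        if not ids:
            # Drop empty buckets so stale values do not linger in the index.
            del self.index[value]

    def find(self, value):
        # .get() avoids creating a bucket for a value that was never stored.
        ids = self.index.get(value)
        return self.rows[min(ids)] if ids else None


table = IndexedTable("city")
first = table.insert({"city": "Oslo", "n": 1})
second = table.insert({"city": "Oslo", "n": 2})
table.delete(first)
print(table.find("Oslo"))
```

Storing a set per value rather than a single row id is what keeps lookups correct when duplicates exist and one of them is deleted.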

✨ Features

  • #4 Vectorized predicates: select_where([(col, op, value), …], combine="and"|"or") (eq/ne/gt/gte/lt/lte/in/between, projection, limit/offset, dict shorthand).
  • #6 Auto-indexing: SnapDB(auto_index=True, auto_index_threshold=N).
  • #7 Zero-copy NumPy export: to_numpy() / to_numpy(zero_copy=True) / column_buffer() (PEP 688), NumPy optional.
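
As a rough picture of how per-column bitmasks can be combined with big-integer AND/OR, here is a standalone sketch. It is not SnapDB's `select_where()`: the functions `column_mask` and `select_where_sketch` are hypothetical, columns are plain Python lists, `between` is assumed to be inclusive on both ends, and projection, limit/offset, and the dict shorthand are omitted.

```python
import operator

# Comparison operators named as in the release notes.
_COMPARE = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def _predicate(op, operand):
    """Return a one-argument test for a single (op, operand) condition."""
    if op == "in":
        allowed = set(operand)
        return lambda v: v in allowed
    if op == "between":
        low, high = operand  # assumed inclusive on both ends
        return lambda v: low <= v <= high
    compare = _COMPARE[op]
    return lambda v: compare(v, operand)


def column_mask(values, op, operand):
    """Build an int whose bit i is set when row i satisfies the condition."""
    test = _predicate(op, operand)
    # Row 0 must be the lowest bit, so emit the bits in reverse row order.
    bits = "".join("1" if test(v) else "0" for v in reversed(list(values)))
    return int(bits, 2) if bits else 0


def select_where_sketch(columns, conditions, combine="and"):
    """Return the row positions matching all ("and") or any ("or") conditions."""
    n_rows = len(next(iter(columns.values())))
    masks = [column_mask(columns[col], op, val) for col, op, val in conditions]
    if combine == "and":
        result = (1 << n_rows) - 1  # start with every row selected
        for mask in masks:
            result &= mask
    elif combine == "or":
        result = 0
        for mask in masks:
            result |= mask
    else:
        raise ValueError("combine must be 'and' or 'or'")
    # Decode set bits back to row positions, lowest row first.
    return [i for i, bit in enumerate(bin(result)[:1:-1]) if bit == "1"]


columns = {
    "age": [25, 40, 31, 19],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
}
print(select_where_sketch(columns, [("age", "gte", 25), ("city", "eq", "Oslo")]))
```

The AND/OR step touches each mask once as a whole integer, so the cost of combining conditions does not grow with a per-row Python loop; only building and decoding the masks iterate over rows.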

🧰 Tooling & docs

  • benchmarks/bench_suite.py — reproducible, cross-platform benchmark with JSON + Markdown output.
  • GitHub Actions CI: ruff lint, pytest matrix (Linux 3.9–3.13 + Windows), benchmark artifact.
  • README refreshed with honest, reproducible numbers: SnapDB columnar is the lowest-memory engine in the comparison (~2.2 MB per 100K rows vs. SQLite 2.9 MB, pandas 11 MB, dict 22 MB) and ~3.3× faster than in-memory SQLite on full-scan aggregation. NumPy-backed math is faster, and the README states that plainly.

🗺️ Roadmap (tracked issues)

FOR encoding (#11), sys.monitoring profiler (#12), faster row bulk-insert (#13), optional NumPy-accelerated filters/aggregates (#14).

Closes #3, #4, #5, #6, #7. Full diff: see PR #10.