Releases: hussain-alsaibai/snapdb

v0.10.0 — pandas-parity analytics, zero dependencies

28 Jun 17:10
461b35a

SnapDB v0.10.0 brings the columnar engine to performance parity with pandas/NumPy on analytics while staying zero-dependency by default (NumPy is optional) and remaining the lowest-memory store in the comparison. These notes cover everything since v0.7.0. Every change was adversarially reviewed (findings reproduced before fixing); CI is green on Linux (Python 3.9–3.13) and Windows.

⚡ Performance — now competitive with pandas

| Workload (100K rows) | Earlier | v0.10.0 | vs pandas |
|---|---|---|---|
| Full-scan aggregate (sum/min/max/avg) | ~58M rows/s | ~530M rows/s | on par |
| Numeric filtered count (count_where) | | ~314M rows/s | on par |
| Row-store bulk insert | ~17K rows/s | ~287K rows/s | ~SQLite/pandas ballpark |
| Memory footprint | | 2.2 MB | ~5× lighter than pandas |

  • NumPy-accelerated aggregates (v0.8.0, #14) — aggregate() runs over the zero-copy column buffer when NumPy is installed (~13–27×). Exact parity with the pure-Python path; encoded columns and 64-bit-int sums fall through to the exact path. use_numpy=False forces pure-Python.
  • NumPy-accelerated filters + count_where() (v0.9.0, #14) — select_where() builds masks vectorially; the new count_where() does filtered counts with no row materialization (~166× on numeric predicates). f32/limit=0 parity fixes.
  • ~26× faster row-store bulk insert (v0.10.0, #13) — batch_insert() grows the file in a single truncate+remap for the whole batch instead of one per slab. On-disk format and durability unchanged.
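
The single truncate+remap idea behind the bulk-insert speedup can be sketched with the standard library alone. This is a minimal illustration, not SnapDB's code: `append_batch`, `SLAB_SIZE`, and the one-payload-per-slab layout are assumptions made for the example.

```python
import mmap
import os

SLAB_SIZE = 4096  # hypothetical slab size in bytes


def append_batch(path, payloads):
    """Append one payload per slab, growing and mapping the file only once."""
    if not payloads:
        return os.path.getsize(path)
    if any(len(p) > SLAB_SIZE for p in payloads):
        raise ValueError("payload larger than a slab")
    with open(path, "r+b") as f:
        old_size = os.fstat(f.fileno()).st_size
        new_size = old_size + SLAB_SIZE * len(payloads)
        f.truncate(new_size)                        # one growth for the whole batch
        with mmap.mmap(f.fileno(), new_size) as m:  # one map instead of one per slab
            for i, payload in enumerate(payloads):
                start = old_size + i * SLAB_SIZE
                m[start:start + len(payload)] = payload
            m.flush()
    return new_size
```

Growing per slab would repeat the truncate and map calls once for every slab in the batch; here both happen once regardless of batch size.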

🛡️ Correctness & durability

  • Frame-of-Reference encoding hardened — 7 data-corruption bugs fixed (out-of-range widening, post-activation nulls, NumPy export, delta+FOR combo, unique_count, update→None, duplicate eligibility set).
  • Transaction atomicity: batch_insert() inside a transaction() now rolls back correctly (it previously left phantom rows).
  • Cross-platform test hygiene — pytest tests/ is green on Windows as well as Linux.
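
The rollback behaviour described for batch_insert() inside transaction() can be illustrated with a toy table. `TinyTable` is a hypothetical stand-in written for this sketch, and the high-water-mark approach is an assumption about one way to get the behaviour, not a description of SnapDB's internals.

```python
from contextlib import contextmanager


class TinyTable:
    """Hypothetical append-only table used only to illustrate rollback."""

    def __init__(self):
        self.rows = []

    def batch_insert(self, rows):
        self.rows.extend(rows)

    @contextmanager
    def transaction(self):
        mark = len(self.rows)      # remember how many rows existed at the start
        try:
            yield self
        except BaseException:
            del self.rows[mark:]   # drop every row appended inside the block
            raise
```

If the block raises, the rows added by batch_insert() are removed before the exception propagates, so no phantom rows remain.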

🗺️ Remaining roadmap

Still open: #12 (a low-overhead sys.monitoring profiler for observability) and further filter acceleration for string/dict-encoded columns.

Closes #13, #14. Zero runtime dependencies; pip install snapdb[numpy] enables the accelerated paths.
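
An optional accelerator with a pure-Python fallback can be sketched as below. The function name `column_sum` is hypothetical and the `use_numpy` flag only mirrors the switch mentioned in these notes; the NumPy call itself (`np.frombuffer`) is a real API that wraps an existing buffer without copying it.

```python
from array import array

try:
    import numpy as np  # optional accelerator, not required
except ImportError:
    np = None


def column_sum(values, use_numpy=True):
    """Sum an array('d') column, through NumPy when available and requested."""
    if use_numpy and np is not None:
        # frombuffer shares the array's memory, so no per-element copy is made.
        return float(np.frombuffer(values, dtype=np.float64).sum())
    return float(sum(values))
```

Both paths return the same result for this input type, so callers do not need to know which one ran.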

v0.7.0 — Frame-of-Reference + Bit Packing

28 Jun 09:16

What's New

Frame-of-Reference (FOR) Encoding

  • 4–8× memory reduction for bounded numeric columns (ages, scores, ratings)
  • Stores min once, bit-packs deltas into Python int bitmask
  • Auto-detects after sampling threshold (default 50 rows)
  • Auto-fallback when range exceeds 16 bits
  • Per-column via for_columns=[] on ColumnarTable
  • Transparent API — reads return full values, no changes needed
  • Update fallback to raw (same pattern as delta encoding)
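
A minimal sketch of the Frame-of-Reference idea described above: store the minimum once and pack each offset into a fixed number of bits of a single Python int. The function names are hypothetical, and the way the 16-bit fallback is signalled (returning `None`) is an assumption for the example.

```python
MAX_BITS = 16  # fallback threshold quoted in the notes above


def for_encode(values):
    """Return (base, width, packed), or None when the range needs > MAX_BITS."""
    base = min(values)
    width = max(1, (max(values) - base).bit_length())  # bits per stored offset
    if width > MAX_BITS:
        return None                                    # caller keeps raw storage
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - base) << (i * width)
    return base, width, packed


def for_get(base, width, packed, i):
    """Read value i back without unpacking the other values."""
    mask = (1 << width) - 1
    return base + ((packed >> (i * width)) & mask)
```

For ages between 18 and 65 the largest offset is 47, which fits in 6 bits per value in this sketch, compared with 32 bits for a raw i32.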

Impact

| Scenario | Raw | FOR | Reduction |
|---|---|---|---|
| Ages 18-65 (100K rows) | 400 KB | ~88 KB | 4.5× |
| Scores 0-100 (100K rows) | 400 KB | ~88 KB | 4.5× |

Tests

  • 6 new tests added, all passing
  • Zero regressions: 47 existing + 20 dict + 79 delta = 146 tests all pass

Closes

v0.6.0 — performance, correctness, durability & features

28 Jun 07:52

SnapDB v0.6.0 is a performance, correctness, durability, and feature release. It remains pure-Python and zero-dependency at runtime (NumPy is optional and used only for zero-copy export). Every change was adversarially reviewed (findings reproduced before fixing), and CI is green across Linux (Python 3.9–3.13) and Windows, plus a benchmark run.

⚡ Performance

  • Delta-encoded column reads are now O(1) per lookup and O(n) per full scan via a lazy reconstruction cache (previously O(n) and O(n²)); a 20k-row delta-column scan dropped from ~17.2 s to ~11 ms.
  • Vectorized multi-condition select_where() combines per-column bitmasks with C-speed big-integer AND/OR — ~2× faster than a row-predicate select() on selective queries.
  • Vectorized aggregates (array-level sum/min/max) for null-free numeric columns; __slots__ on hot classes.
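
The lazy reconstruction cache for delta-encoded reads can be sketched as a prefix sum that is computed once and then reused. `DeltaColumn` is a hypothetical class for illustration; cache invalidation on writes is left out to keep the example short.

```python
from itertools import accumulate


class DeltaColumn:
    """Hypothetical delta-encoded column with a lazily built value cache."""

    def __init__(self, values):
        self.first = values[0]
        self.deltas = [b - a for a, b in zip(values, values[1:])]
        self._cache = None  # decoded values, built on first read

    def __getitem__(self, i):
        if self._cache is None:
            # One O(n) prefix-sum pass; afterwards each read is an O(1) lookup.
            self._cache = list(accumulate(self.deltas, initial=self.first))
        return self._cache[i]
```

Without the cache, each read would re-sum the deltas up to index i, which makes a full scan quadratic.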

🛡️ Correctness & durability

  • Hash indexes are kept in sync on insert / batch_insert / update / delete (they previously went stale after the first build); unified create_index() for row and columnar storage; find() works without a pre-built index; an emptied index is no longer mistaken for a missing one.
  • Delta delete/null corruption fixed — deleting or nulling a delta-encoded row no longer shifts other rows' values.
  • Transaction rollback now actually undoes writes (and restores indexes).
  • Durable multi-slab persistence — the on-disk bitmap/slab geometry and the last slab's high-water mark are now persisted, so reopening a database larger than one slab no longer loses data.
  • Columnar index is now value → set (first-match, consistent under duplicates/deletes and before/after auto-indexing); default to_numpy() returns an independent copy.
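
Keeping a value → set index in sync with writes can be sketched as follows. `IndexedColumn` is hypothetical, and treating "first match" as the lowest row id is an assumption made for the example.

```python
from collections import defaultdict


class IndexedColumn:
    """Hypothetical column whose value -> row-id index tracks every write."""

    def __init__(self):
        self.values = {}               # row id -> value
        self.index = defaultdict(set)  # value -> set of row ids

    def insert(self, row_id, value):
        self.values[row_id] = value
        self.index[value].add(row_id)

    def delete(self, row_id):
        old = self.values.pop(row_id)
        self.index[old].discard(row_id)
        if not self.index[old]:
            del self.index[old]        # drop empty buckets

    def update(self, row_id, value):
        self.delete(row_id)
        self.insert(row_id, value)

    def find(self, value):
        ids = self.index.get(value)
        return min(ids) if ids else None  # assumed: first match = lowest row id
```

Because every write path touches the index, lookups stay correct under duplicates and deletes without a rebuild.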

✨ Features

  • #4 Vectorized predicates: select_where([(col, op, value), …], combine="and"|"or") (eq/ne/gt/gte/lt/lte/in/between, projection, limit/offset, dict shorthand).
  • #6 Auto-indexing: SnapDB(auto_index=True, auto_index_threshold=N).
  • #7 Zero-copy NumPy export: to_numpy() / to_numpy(zero_copy=True) / column_buffer() (PEP 688), NumPy optional.
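
The bitmask approach behind the vectorized predicates can be illustrated in a few lines. `column_mask` and `select_rows` are hypothetical helpers, not the library's select_where(); only the operator names and the (col, op, value) condition shape are taken from the notes above, and `in`/`between` are omitted for brevity.

```python
import operator

OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
       "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def column_mask(column, op, value):
    """Build an int whose bit i is set when row i satisfies the predicate."""
    test = OPS[op]
    mask = 0
    for i, v in enumerate(column):
        if test(v, value):
            mask |= 1 << i
    return mask


def select_rows(columns, conditions, combine="and"):
    """Return matching row indices for [(col, op, value), ...] conditions."""
    masks = [column_mask(columns[c], op, v) for c, op, v in conditions]
    merged = masks[0]
    for m in masks[1:]:
        # A single big-integer AND/OR merges all rows of two conditions at once.
        merged = (merged & m) if combine == "and" else (merged | m)
    return [i for i in range(merged.bit_length()) if (merged >> i) & 1]
```

Combining conditions costs one integer operation per extra condition rather than one predicate call per row.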

🧰 Tooling & docs

  • benchmarks/bench_suite.py — reproducible, cross-platform benchmark with JSON + Markdown output.
  • GitHub Actions CI: ruff lint, pytest matrix (Linux 3.9–3.13 + Windows), benchmark artifact.
  • README refreshed with honest, reproducible numbers: SnapDB columnar is the lowest-memory engine in the comparison (~2.2 MB per 100K rows vs SQLite 2.9 MB, pandas 11 MB, dict 22 MB) and ~3.3× faster than in-memory SQLite on full-scan aggregation (NumPy-backed math is faster, stated plainly).

🗺️ Roadmap (tracked issues)

FOR encoding (#11), sys.monitoring profiler (#12), faster row bulk-insert (#13), optional NumPy-accelerated filters/aggregates (#14).

Closes #3, #4, #5, #6, #7. Full diff: see PR #10.

SnapDB v0.5.0 — Delta Encoding & Dictionary Encoding

28 Jun 04:49

SnapDB v0.5.0 — Delta Encoding for Monotonic Columns

What's New

🚀 Delta Encoding (Issue #1)

  • Transparent delta encoding for monotonic numeric columns (timestamps, IDs, sequences)
  • 1.2× memory reduction for i64 timestamps on 100K rows (2.29 MB → 1.91 MB)
  • Auto-detects monotonicity after 50-sample warmup
  • Auto-fallback to raw storage if non-monotonic data detected
  • Auto-upgrades delta typecode if deltas overflow current width
  • Per-column control: delta_columns=["timestamp", "seq"]
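
The fallback and width-upgrade rules listed above can be sketched with `array.array`. `delta_encode` is a hypothetical function; it assumes a non-empty input whose deltas fit in 64 bits and treats "monotonic" as non-decreasing.

```python
from array import array

WIDTHS = ["b", "h", "i", "q"]  # signed typecodes, narrowest first


def delta_encode(values):
    """Return (first, deltas) for a non-decreasing sequence, else None."""
    deltas = array("b")
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        if d < 0:
            return None  # not monotonic: keep raw storage instead
        while True:
            try:
                deltas.append(d)
                break
            except OverflowError:
                # Delta does not fit: widen to the next typecode and retry.
                wider = WIDTHS[WIDTHS.index(deltas.typecode) + 1]
                deltas = array(wider, deltas)
    return values[0], deltas
```

Small steps stay in one byte each, and the column only pays for a wider type once a delta actually needs it.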

📊 Memory Efficiency Stack (Combined)

| Feature | Reduction | Version |
|---|---|---|
| Bit-packed booleans | ~8× | v0.3.2 |
| Dictionary encoding | | v0.4.0 |
| Delta encoding | 1.2× | v0.5.0 |

Full Changelog

  • ce89679 docs: update README with delta encoding v0.5.0
  • 46636a3 feat: delta encoding for monotonic numeric columns (#1)
  • 13340d7 docs: update README with dictionary encoding v0.4.0
  • 7a9087d feat: dictionary encoding for low-cardinality string columns (#2)

Roadmap

  • #3 Frame-of-Reference + bit packing
  • #4 Vectorized bitmask predicates
  • #5 sys.monitoring profiling
  • #6 Auto-indexing
  • #7 buffer protocol

SnapDB v0.3.2 — Bit-Packed Booleans & Package Reorganization

28 Jun 04:43

What's New

🚀 Performance

  • Bit-packed boolean storage — Python int bitmask replaces array.array('B'), delivering ~8× memory reduction for boolean columns
  • Precompiled struct format — single struct.pack/unpack per row (1.6–1.9× faster encode/decode)
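
Bit-packed boolean storage in a Python int can be sketched like this. `BitColumn` is a hypothetical class for illustration; one bit per row instead of one byte per row is where the roughly 8× figure comes from.

```python
class BitColumn:
    """Hypothetical boolean column stored as the bits of one Python int."""

    def __init__(self):
        self.bits = 0
        self.length = 0

    def append(self, flag):
        if flag:
            self.bits |= 1 << self.length
        self.length += 1

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return bool((self.bits >> i) & 1)

    def count_true(self):
        # Counts set bits without visiting rows one by one in Python code.
        return bin(self.bits).count("1")
```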

📦 Packaging

  • Proper Python package: snapdb/ directory with __init__.py
  • pyproject.toml — PEP 621 compliant packaging config
  • Test suite — moved to tests/ directory
  • Benchmarks — moved to benchmarks/ directory
  • Modular architecture: core.py, columnar.py, metrics.py, index.py, query.py, wal.py, document_store.py

📊 Benchmarks

  • Comprehensive 5-engine comparison: SnapDB vs DuckDB vs SQLite vs Pure Dict
  • DuckDB DataFrame batch insert added
  • Bit-packed boolean benchmarks included

✅ Correctness

  • 47/47 tests passing
  • All v0.2.0 and v0.3.x features verified

Full Changelog

  • 7bc35da v0.3.2: Bit-packed booleans, package reorganization, pyproject.toml
  • 12881c0 v0.3.2: Precompiled struct format, hash index, lookup optimization
  • b708e9c benchmark: Add DuckDB comparison, comprehensive 5-engine suite
  • 19a605b v0.3.1: Batch insert, optimized columnar, comprehensive benchmarks
  • c30d757 v0.3.0: Columnar engine, metrics, CDC

Roadmap

See open issues for upcoming features.

v0.1.0 — Lightning-fast in-memory database

23 Jun 04:15

SnapDB v0.1.0

Extremely lightweight, lightning-fast in-memory database for Python.

Key Innovations

  • Slab-oriented storage — contiguous pages, CPU cache friendly, no malloc fragmentation
  • Zero-copy reads: memoryview slices into the mmap (1,212,499 reads/sec)
  • Single-file — schema + bitmap + data in one .snap file
  • Pure Python, zero dependencies — stdlib only (mmap, struct, os, json)
  • Fixed-width types: i8/16/32/64, u8/16/32/64, f32/64, bool, bytes:N
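
Fixed-width rows and zero-copy reads can be illustrated with `struct`, `mmap`, and `memoryview` from the standard library. The row layout (`<if?`, little-endian and unpadded) and the helper name are assumptions for this sketch; an anonymous map stands in for the .snap file.

```python
import mmap
import struct

# Hypothetical layout for i32 / f32 / bool columns: 4 + 4 + 1 = 9 bytes per row.
ROW = struct.Struct("<if?")


def write_then_read_row(slot, values, capacity=4):
    """Write one row in place, then decode it from a zero-copy slice."""
    buf = mmap.mmap(-1, ROW.size * capacity)  # anonymous map, not a real file
    try:
        ROW.pack_into(buf, ROW.size * slot, *values)
        with memoryview(buf) as view:
            # Slicing a memoryview yields another view; no bytes are copied.
            with view[ROW.size * slot:ROW.size * (slot + 1)] as raw:
                return ROW.unpack(raw)
    finally:
        buf.close()  # safe: both views were released above
```

Because every row has the same size, the byte offset of a row is a single multiplication, and decoding can be deferred until a caller actually needs Python values.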

Performance (100K rows)

| Operation | Speed |
|---|---|
| Insert | 69,185 rows/sec |
| Read (decoded dict) | 309,698 rows/sec |
| Read (zero-copy raw) | 1,212,499 rows/sec |
| Sequential scan | 389,072 rows/sec |

Quick Start

from snapdb import SnapDB, Schema, ColumnDef

schema = Schema([
    ColumnDef("id", "i32"),
    ColumnDef("temp", "f32"),
    ColumnDef("active", "bool"),
])

with SnapDB("data.snap", schema) as db:
    db.insert({"id": 1, "temp": 25.5, "active": True})
    row = db.get(0)           # decoded dict
    raw = db.get_raw(0)       # zero-copy memoryview
    for idx, row in db.query(lambda r: r["active"]):
        print(idx, row)

Files

  • snapdb.py — Core engine (~400 lines)
  • test_snapdb.py — Unit tests
  • quickbench.py — Benchmark script

Design

SnapDB uses a slab allocator: each slab is a fixed-size page (default 4KB) holding all columns for N rows contiguously. A bitmap tracks live vs deleted rows. The file is memory-mapped for persistence and zero-copy access.
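
The slab geometry and live-row bitmap described above can be sketched as simple arithmetic. The 9-byte row size, the row-major layout inside a slab, and the names used here are assumptions for illustration; only the 4 KB default slab size comes from the text.

```python
SLAB_SIZE = 4096  # default page size stated above
ROW_SIZE = 9      # hypothetical: i32 + f32 + bool, unpadded

ROWS_PER_SLAB = SLAB_SIZE // ROW_SIZE


def row_offset(row_id, data_start=0):
    """Byte offset of a row: pick the slab, then the slot inside it."""
    slab, slot = divmod(row_id, ROWS_PER_SLAB)
    return data_start + slab * SLAB_SIZE + slot * ROW_SIZE


class LiveBitmap:
    """One bit per row: 1 = live, 0 = deleted or never written."""

    def __init__(self):
        self.bits = 0

    def mark_live(self, row_id):
        self.bits |= 1 << row_id

    def mark_deleted(self, row_id):
        self.bits &= ~(1 << row_id)

    def is_live(self, row_id):
        return bool((self.bits >> row_id) & 1)
```

Rows never straddle a slab boundary in this layout, so any leftover bytes at the end of a slab are simply unused.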

No external dependencies. No server. Just a single Python file.