Releases: hussain-alsaibai/snapdb
v0.10.0 — pandas-parity analytics, zero dependencies
SnapDB v0.10.0 brings the columnar engine to performance parity with pandas/NumPy on analytics, while staying zero-dependency by default (NumPy is optional) and keeping the smallest memory footprint of the engines benchmarked. It covers everything since v0.7.0. Every change was adversarially reviewed (findings reproduced before fixing); CI is green across Linux (Python 3.9–3.13) and Windows.
⚡ Performance — now competitive with pandas
| Workload (100K rows) | Earlier | v0.10.0 | vs pandas |
|---|---|---|---|
| Full-scan aggregate (sum/min/max/avg) | ~58M rows/s | ~530M rows/s | on par |
| Numeric filtered count (`count_where`) | — | ~314M rows/s | on par |
| Row-store bulk insert | ~17K rows/s | ~287K rows/s | ~SQLite/pandas ballpark |
| Memory footprint | — | 2.2 MB | ~5× lighter than pandas |
- NumPy-accelerated aggregates (v0.8.0, #14): `aggregate()` runs over the zero-copy column buffer when NumPy is installed (~13–27×). Exact parity with the pure-Python path; encoded columns and 64-bit-int sums fall through to the exact path. `use_numpy=False` forces pure-Python.
- NumPy-accelerated filters + `count_where()` (v0.9.0, #14): `select_where()` builds masks vectorially; the new `count_where()` does filtered counts with no row materialization (~166× on numeric predicates). f32/limit=0 parity fixes.
- ~26× faster row-store bulk insert (v0.10.0, #13): `batch_insert()` grows the file in a single truncate+remap for the whole batch instead of one per slab. On-disk format and durability unchanged.
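The single truncate+remap idea can be illustrated with the standard library alone. This is a minimal sketch, not SnapDB's code: `SLAB_SIZE`, `grow_per_slab`, `grow_once`, and `demo` are hypothetical names, and the real `batch_insert()` may size and map the file differently.

```python
import mmap
import os
import tempfile

SLAB_SIZE = 4096  # assumed slab size in bytes, for illustration only


def grow_per_slab(path, slabs_needed):
    """Baseline: extend the file one slab at a time, remapping after each step."""
    remaps = 0
    with open(path, "r+b") as f:
        for _ in range(slabs_needed):
            size = os.fstat(f.fileno()).st_size
            os.ftruncate(f.fileno(), size + SLAB_SIZE)
            with mmap.mmap(f.fileno(), 0):  # map the whole (grown) file
                remaps += 1
    return remaps


def grow_once(path, slabs_needed):
    """Batch variant: one truncate and one remap cover every new slab."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        os.ftruncate(f.fileno(), size + slabs_needed * SLAB_SIZE)
        with mmap.mmap(f.fileno(), 0):
            return 1


def demo(slabs_needed=8):
    """Run both strategies on a fresh one-slab file; report (remaps, final size)."""
    results = {}
    for grow in (grow_per_slab, grow_once):
        fd, path = tempfile.mkstemp(suffix=".snap")
        os.write(fd, b"\x00" * SLAB_SIZE)
        os.close(fd)
        try:
            results[grow.__name__] = (grow(path, slabs_needed), os.path.getsize(path))
        finally:
            os.remove(path)
    return results
```

Both strategies end with the same file size; the batch variant simply pays the truncate and remap cost once instead of once per slab.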
🛡️ Correctness & durability
- Frame-of-Reference encoding hardened: 7 data-corruption bugs fixed (out-of-range widening, post-activation nulls, NumPy export, delta+FOR combo, `unique_count`, `update→None`, duplicate eligibility set).
- Transaction atomicity: `batch_insert()` inside a `transaction()` now rolls back correctly (it previously left phantom rows).
- Cross-platform test hygiene: `pytest tests/` is green on Windows as well as Linux.
🗺️ Remaining roadmap
#12 (low-overhead sys.monitoring profiler — observability) and further filter acceleration for string/dict-encoded columns.
Closes #13, #14. Zero runtime dependencies; pip install snapdb[numpy] enables the accelerated paths.
v0.7.0 — Frame-of-Reference + Bit Packing
What's New
Frame-of-Reference (FOR) Encoding
- 4–8× memory reduction for bounded numeric columns (ages, scores, ratings)
- Stores `min` once, bit-packs deltas into Python `int` bitmask
- Auto-detects after sampling threshold (default 50 rows)
- Auto-fallback when range exceeds 16 bits
- Per-column via `for_columns=[]` on `ColumnarTable`
- Transparent API: reads return full values, no changes needed
- Update fallback to raw (same pattern as delta encoding)
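To make the encoding concrete, here is a minimal sketch of Frame-of-Reference packing into a single Python `int`. `ForColumn` is a hypothetical class written for illustration; it skips the sampling threshold, the 16-bit fallback, and nulls, and it is not SnapDB's implementation.

```python
class ForColumn:
    """Illustrative Frame-of-Reference column: one base plus fixed-width offsets."""

    def __init__(self, values):
        self.base = min(values)  # the "frame of reference", stored once
        # Bits needed for the largest offset; at least 1 so indexing stays simple.
        self.width = max(1, (max(values) - self.base).bit_length())
        self.count = len(values)
        self.packed = 0
        for i, v in enumerate(values):
            # Place each offset in its own fixed-width slot of the big integer.
            # (Repeated |= on a growing int is quadratic; fine for a sketch.)
            self.packed |= (v - self.base) << (i * self.width)

    def __getitem__(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        mask = (1 << self.width) - 1
        return self.base + ((self.packed >> (i * self.width)) & mask)

    def packed_bits(self):
        """Payload size in bits, excluding the base and Python object overhead."""
        return self.count * self.width
```

For values in 0–100 the offsets fit in 7 bits, so 100,000 rows need about 100,000 × 7 / 8 = 87,500 bytes of payload, compared with 400,000 bytes at 4 bytes per value.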
Impact
| Scenario | Raw | FOR | Reduction |
|---|---|---|---|
| Ages 18-65 (100K rows) | 400 KB | ~88 KB | 4.5× |
| Scores 0-100 (100K rows) | 400 KB | ~88 KB | 4.5× |
Tests
- 6 new tests added, all passing
- Zero regressions: 47 existing + 20 dict + 79 delta = 146 tests all pass
Closes
v0.6.0 — performance, correctness, durability & features
SnapDB v0.6.0 is a performance, correctness, durability, and feature release. Pure-Python and zero-dependency at runtime (NumPy is optional, only for zero-copy export). Every change was adversarially reviewed (findings reproduced before fixing), and CI is green across Linux (Python 3.9–3.13) and Windows, with a benchmark job alongside.
⚡ Performance
- Delta-encoded column reads are now O(1)/O(n) via a lazy reconstruction cache (previously O(n)/O(n²)) — a 20k-row delta-column scan dropped from ~17.2 s to ~11 ms.
- Vectorized multi-condition `select_where()` combines per-column bitmasks with C-speed big-integer `AND`/`OR`, ~2× faster than a row-predicate `select()` on selective queries.
- Vectorized aggregates (array-level `sum`/`min`/`max`) for null-free numeric columns; `__slots__` on hot classes.
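The bitmask combination can be sketched in a few lines of plain Python. The helper names below (`column_mask`, `combine_masks`, `mask_to_rows`, `count_rows`) are hypothetical and only illustrate the technique; SnapDB's internals are not shown in these notes and may build masks differently.

```python
import operator

# Operator names follow the predicate vocabulary listed in these notes.
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def column_mask(values, op, operand):
    """Build an int whose bit i is set when row i satisfies the condition."""
    mask = 0
    for i, v in enumerate(values):
        if OPS[op](v, operand):
            mask |= 1 << i
    return mask


def combine_masks(masks, combine="and"):
    """Fold per-column masks with big-integer AND/OR (one C-level op per pair)."""
    result = masks[0]
    for m in masks[1:]:
        result = result & m if combine == "and" else result | m
    return result


def mask_to_rows(mask):
    """Expand a mask into the list of matching row indices."""
    rows, i = [], 0
    while mask:
        if mask & 1:
            rows.append(i)
        mask >>= 1
        i += 1
    return rows


def count_rows(mask):
    """Count matches without materializing any row."""
    return bin(mask).count("1")
```

With `age = [25, 40, 31, 58]` and `score = [90, 55, 72, 88]`, combining `age > 30` and `score >= 72` with `"and"` selects rows 2 and 3. The point of the design is that the per-row Python work happens once per column, while the combination step is a single integer operation regardless of row count.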
🛡️ Correctness & durability
- Hash indexes are kept in sync on insert / `batch_insert` / update / delete (they previously went stale after the first build); unified `create_index()` for row and columnar storage; `find()` works without a pre-built index; an emptied index is no longer mistaken for a missing one.
- Delta delete/null corruption fixed: deleting or nulling a delta-encoded row no longer shifts other rows' values.
- Transaction rollback now actually undoes writes (and restores indexes).
- Durable multi-slab persistence: the on-disk bitmap/slab geometry and the last slab's high-water mark are now persisted, so reopening a database larger than one slab no longer loses data.
- Columnar index is now `value → set` (first-match, consistent under duplicates/deletes and before/after auto-indexing); default `to_numpy()` returns an independent copy.
✨ Features
- #4 Vectorized predicates: `select_where([(col, op, value), …], combine="and"|"or")` (eq/ne/gt/gte/lt/lte/in/between, projection, limit/offset, dict shorthand).
- #6 Auto-indexing: `SnapDB(auto_index=True, auto_index_threshold=N)`.
- #7 Zero-copy NumPy export: `to_numpy()` / `to_numpy(zero_copy=True)` / `column_buffer()` (PEP 688), NumPy optional.
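The difference between a zero-copy export and an independent copy can be shown with the buffer protocol in the standard library. This sketch uses `array.array` and `memoryview` as stand-ins; `export_demo` is a hypothetical helper and does not call SnapDB's `to_numpy()` or `column_buffer()`.

```python
from array import array


def export_demo():
    """Contrast a zero-copy view with an independent copy of a column buffer."""
    column = array("i", [10, 20, 30])   # stand-in for a typed column
    view = memoryview(column)           # zero-copy: shares the column's memory
    copy = array("i", column)           # independent copy: owns its own memory

    column[0] = 99                      # mutate the underlying column

    # The view observes the write; the copy keeps the old value.
    result = (view[0], copy[0], view.nbytes == column.itemsize * len(column))
    view.release()                      # release so the array can be resized later
    return result
```

The trade-off is the usual one: a view costs nothing to create but tracks later writes, and while a view is held the underlying array cannot be resized, whereas a copy is safe to keep but doubles the memory.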
🧰 Tooling & docs
- `benchmarks/bench_suite.py`: reproducible, cross-platform benchmark with JSON + Markdown output.
- GitHub Actions CI: ruff lint, pytest matrix (Linux 3.9–3.13 + Windows), benchmark artifact.
- README refreshed with honest, reproducible numbers: SnapDB columnar is the lowest-memory engine here (~2.2 MB / 100K rows vs SQLite 2.9, pandas 11, dict 22) and ~3.3× faster than in-memory SQLite on full-scan aggregation (NumPy-backed math is faster, stated plainly).
🗺️ Roadmap (tracked issues)
FOR encoding (#11), sys.monitoring profiler (#12), faster row bulk-insert (#13), optional NumPy-accelerated filters/aggregates (#14).
SnapDB v0.5.0 — Delta Encoding & Dictionary Encoding
SnapDB v0.5.0 — Delta Encoding for Monotonic Columns
What's New
🚀 Delta Encoding (Issue #1)
- Transparent delta encoding for monotonic numeric columns (timestamps, IDs, sequences)
- 1.2× memory reduction for i64 timestamps on 100K rows (2.29 MB → 1.91 MB)
- Auto-detects monotonicity after 50-sample warmup
- Auto-fallback to raw storage if non-monotonic data detected
- Auto-upgrades delta typecode if deltas overflow current width
- Per-column control: `delta_columns=["timestamp", "seq"]`
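As a rough illustration of the behaviour listed above, the sketch below stores a monotonic column as a first value plus deltas and falls back to raw storage when a decreasing value arrives. `DeltaColumn` is a hypothetical class; it omits the 50-sample warmup and the typecode upgrade, and it is not SnapDB's implementation.

```python
from itertools import accumulate


class DeltaColumn:
    """Illustrative delta-encoded column with fallback to raw storage."""

    def __init__(self):
        self.first = None   # first value, stored in full
        self.deltas = []    # non-negative differences between neighbours
        self.raw = None     # becomes a plain list after fallback
        self._last = None   # last appended value, to compute the next delta

    def append(self, value):
        if self.raw is not None:          # already fell back: store as-is
            self.raw.append(value)
            return
        if self.first is None:            # first row seeds the sequence
            self.first = self._last = value
            return
        delta = value - self._last
        if delta < 0:                     # non-monotonic: switch to raw storage
            self.raw = self.to_list() + [value]
            self.deltas = []
            return
        self.deltas.append(delta)
        self._last = value

    def to_list(self):
        """Reconstruct all values with one running sum over the deltas."""
        if self.raw is not None:
            return list(self.raw)
        if self.first is None:
            return []
        return list(accumulate(self.deltas, initial=self.first))
```

Timestamps such as 1000, 1005, 1012 are held as `first=1000` and `deltas=[5, 7]`; the small deltas are what allow a narrower storage type than the full values would need.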
📊 Memory Efficiency Stack (Combined)
| Feature | Reduction | Version |
|---|---|---|
| Bit-packed booleans | ~8× | v0.3.2 |
| Dictionary encoding | 3× | v0.4.0 |
| Delta encoding | 1.2× | v0.5.0 |
Full Changelog
- `ce89679` docs: update README with delta encoding v0.5.0
- `46636a3` feat: delta encoding for monotonic numeric columns (#1)
- `13340d7` docs: update README with dictionary encoding v0.4.0
- `7a9087d` feat: dictionary encoding for low-cardinality string columns (#2)
Roadmap
SnapDB v0.3.2 — Bit-Packed Booleans & Package Reorganization
What's New
🚀 Performance
- Bit-packed boolean storage: Python `int` bitmask replaces `array.array('B')`, delivering ~8× memory reduction for boolean columns
- Precompiled struct format: single `struct.pack`/`unpack` per row (1.6–1.9× faster encode/decode)
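The precompiled-format idea maps directly onto `struct.Struct` from the standard library. The sketch below assumes a three-column row (`i32`, `f32`, `bool`) in little-endian, unpadded layout; that layout and the names `ROW`, `encode_row`, and `decode_row` are illustrative assumptions, not SnapDB's actual on-disk format.

```python
import struct

# Compile the format once: "<" = little-endian without padding,
# "i" = 32-bit int, "f" = 32-bit float, "?" = bool.
ROW = struct.Struct("<if?")


def encode_row(row_id, temp, active):
    """Encode all columns of one row with a single pack call."""
    return ROW.pack(row_id, temp, active)


def decode_row(buf):
    """Decode all columns of one row with a single unpack call."""
    return ROW.unpack(buf)
```

Here `ROW.size` is 9 bytes (4 + 4 + 1). Reusing one compiled `Struct` avoids re-parsing the format string and replaces one call per column with one call per row.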
📦 Packaging
- Proper Python package: `snapdb/` directory with `__init__.py`
- `pyproject.toml`: PEP 621 compliant packaging config
- Test suite: moved to `tests/` directory
- Benchmarks: moved to `benchmarks/` directory
- Modular architecture: `core.py`, `columnar.py`, `metrics.py`, `index.py`, `query.py`, `wal.py`, `document_store.py`
📊 Benchmarks
- Comprehensive 5-engine comparison: SnapDB vs DuckDB vs SQLite vs Pure Dict
- DuckDB DataFrame batch insert added
- Bit-packed boolean benchmarks included
✅ Correctness
- 47/47 tests passing
- All v0.2.0 and v0.3.x features verified
Full Changelog
- `7bc35da` v0.3.2: Bit-packed booleans, package reorganization, pyproject.toml
- `12881c0` v0.3.2: Precompiled struct format, hash index, lookup optimization
- `b708e9c` benchmark: Add DuckDB comparison, comprehensive 5-engine suite
- `19a605b` v0.3.1: Batch insert, optimized columnar, comprehensive benchmarks
- `c30d757` v0.3.0: Columnar engine, metrics, CDC
Roadmap
See open issues for upcoming features.
v0.1.0 — Lightning-fast in-memory database
SnapDB v0.1.0
Extremely lightweight, lightning-fast in-memory database for Python.
Key Innovations
- Slab-oriented storage: contiguous pages, CPU cache friendly, no malloc fragmentation
- Zero-copy reads: `memoryview` slices into `mmap` (1,212,499 reads/sec)
- Single-file: schema + bitmap + data in one `.snap` file
- Pure Python, zero dependencies: stdlib only (`mmap`, `struct`, `os`, `json`)
- Fixed-width types: `i8/16/32/64`, `u8/16/32/64`, `f32/64`, `bool`, `bytes:N`
Performance (100K rows)
| Operation | Speed |
|---|---|
| Insert | 69,185 rows/sec |
| Read (decoded dict) | 309,698 rows/sec |
| Read (zero-copy raw) | 1,212,499 rows/sec |
| Sequential scan | 389,072 rows/sec |
Quick Start
```python
from snapdb import SnapDB, Schema, ColumnDef

schema = Schema([
    ColumnDef("id", "i32"),
    ColumnDef("temp", "f32"),
    ColumnDef("active", "bool"),
])

with SnapDB("data.snap", schema) as db:
    db.insert({"id": 1, "temp": 25.5, "active": True})
    row = db.get(0)       # decoded dict
    raw = db.get_raw(0)   # zero-copy memoryview
    for idx, row in db.query(lambda r: r["active"]):
        print(idx, row)
```
Files
- `snapdb.py`: Core engine (~400 lines)
- `test_snapdb.py`: Unit tests
- `quickbench.py`: Benchmark script
Design
SnapDB uses a slab allocator: each slab is a fixed-size page (default 4KB) holding all columns for N rows contiguously. A bitmap tracks live vs deleted rows. The file is memory-mapped for persistence and zero-copy access.
No external dependencies. No server. Just a single Python file.
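The slab addressing described above comes down to a little arithmetic plus a bitmap. The sketch below assumes a 9-byte fixed-width row stored row by row inside each 4 KB slab; `ROW_SIZE`, `row_offset`, and `LiveBitmap` are hypothetical names, and the real file layout (header, bitmap placement, slab geometry) may differ.

```python
SLAB_SIZE = 4096                        # default page size stated above
ROW_SIZE = 9                            # assumed row width: i32 + f32 + bool
ROWS_PER_SLAB = SLAB_SIZE // ROW_SIZE   # whole rows that fit in one slab


def row_offset(row_index, data_start=0):
    """Byte offset of a row: pick the slab, then the slot inside it."""
    slab, slot = divmod(row_index, ROWS_PER_SLAB)
    return data_start + slab * SLAB_SIZE + slot * ROW_SIZE


class LiveBitmap:
    """Tracks live vs deleted rows, one bit per row, in a Python int."""

    def __init__(self):
        self.bits = 0

    def mark_live(self, i):
        self.bits |= 1 << i

    def mark_deleted(self, i):
        self.bits &= ~(1 << i)

    def is_live(self, i):
        return bool((self.bits >> i) & 1)
```

Because every row has the same width, locating a row needs no per-row index: the offset is computed, and a `memoryview` slice of the mapped file at that offset gives the raw bytes without copying. Deleting a row only clears its bit, so the slot can be reused later.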