28 Jun 17:10

461b35a

v0.10.0 — pandas-parity analytics, zero dependencies

SnapDB v0.10.0 brings the columnar engine to performance parity with pandas/NumPy on analytics while staying zero-dependency by default (NumPy is optional) and keeping the lowest memory footprint of the engines benchmarked. This release covers everything since v0.7.0. Every change was adversarially reviewed (findings reproduced before fixing); CI is green across Linux 3.9–3.13 and Windows.

⚡ Performance — now competitive with pandas

Workload (100K rows)	Earlier	v0.10.0	vs pandas
Full-scan aggregate (`sum`/`min`/`max`/`avg`)	~58M rows/s	~530M rows/s	on par
Numeric filtered count (`count_where`)	—	~314M rows/s	on par
Row-store bulk insert	~17K rows/s	~287K rows/s	~SQLite/pandas ballpark
Memory footprint	—	2.2 MB	~5× lighter than pandas

NumPy-accelerated aggregates (v0.8.0, #14) — aggregate() runs over the zero-copy column buffer when NumPy is installed (~13–27×). Exact parity with the pure-Python path; encoded columns and 64-bit-int sums fall through to the exact path. use_numpy=False forces pure-Python.
NumPy-accelerated filters + count_where() (v0.9.0, #14) — select_where() builds masks vectorially; the new count_where() does filtered counts with no row materialization (~166× on numeric predicates). f32/limit=0 parity fixes.
~26× faster row-store bulk insert (v0.10.0, #13) — batch_insert() grows the file in a single truncate+remap for the whole batch instead of one per slab. On-disk format and durability unchanged.
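
The single truncate+remap idea behind the bulk-insert speedup can be sketched with the standard library alone. This is a minimal illustration, not SnapDB's code: `SLAB_SIZE`, `grow_and_remap`, and `demo` are hypothetical names, and the real engine's file layout is not shown here.

```python
import mmap
import os
import tempfile

SLAB_SIZE = 4096  # assumed slab size, for illustration only


def grow_and_remap(fd, mm, extra_slabs):
    """Close the old mapping, extend the file once, and map the new size."""
    mm.close()
    new_size = os.fstat(fd).st_size + extra_slabs * SLAB_SIZE
    os.ftruncate(fd, new_size)           # one resize for the whole batch
    return mmap.mmap(fd, new_size)       # one remap for the whole batch


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.ftruncate(fd, SLAB_SIZE)      # file starts with a single slab
        mm = mmap.mmap(fd, SLAB_SIZE)
        mm[0:4] = b"head"                # data written before growing
        mm = grow_and_remap(fd, mm, extra_slabs=25)
        size = len(mm)
        head = bytes(mm[0:4])            # earlier data survives the remap
        mm.close()
    return size, head
```

Growing per slab would repeat the close, truncate, and map steps 25 times for the same batch; doing it once moves that cost out of the insert loop.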

🛡️ Correctness & durability

Frame-of-Reference encoding hardened — 7 data-corruption bugs fixed (out-of-range widening, post-activation nulls, NumPy export, delta+FOR combo, unique_count, update→None, duplicate eligibility set).
Transaction atomicity — batch_insert() inside a transaction() now rolls back correctly (it previously left phantom rows).
Cross-platform test hygiene — pytest tests/ is green on Windows as well as Linux.
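
To make the transaction-atomicity fix concrete, here is a minimal sketch of rollback for appended rows. `TinyTable` is a hypothetical in-memory stand-in, not SnapDB's implementation, and it only undoes appends (no updates, deletes, or index restoration).

```python
from contextlib import contextmanager


class TinyTable:
    """Hypothetical in-memory table used only to illustrate rollback."""

    def __init__(self):
        self.rows = []

    def batch_insert(self, rows):
        self.rows.extend(rows)

    @contextmanager
    def transaction(self):
        mark = len(self.rows)        # remember the pre-transaction length
        try:
            yield self
        except Exception:
            del self.rows[mark:]     # drop every row appended since the mark
            raise


table = TinyTable()
table.batch_insert([{"id": 1}])
try:
    with table.transaction():
        table.batch_insert([{"id": 2}, {"id": 3}])
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the two rows inserted inside the transaction are gone
```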

🗺️ Remaining roadmap

#12 (low-overhead sys.monitoring profiler — observability) and further filter acceleration for string/dict-encoded columns.

Closes #13, #14. Zero runtime dependencies; pip install snapdb[numpy] enables the accelerated paths.

28 Jun 09:16

hussain-alsaibai

v0.7.0

8806f95

v0.7.0 — Frame-of-Reference + Bit Packing

What's New

Frame-of-Reference (FOR) Encoding

4–8× memory reduction for bounded numeric columns (ages, scores, ratings)
Stores min once, bit-packs deltas into Python int bitmask
Auto-detects after sampling threshold (default 50 rows)
Auto-fallback when range exceeds 16 bits
Per-column via for_columns=[] on ColumnarTable
Transparent API — reads return full values, no changes needed
Update fallback to raw (same pattern as delta encoding)
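
The encoding described above can be sketched in a few lines: store the minimum once and pack each value's offset from it into a single Python int. The function names `for_encode` and `for_get` are hypothetical, and this sketch omits the sampling threshold, the 16-bit fallback, and null handling.

```python
def for_encode(values):
    """Return (base, width, packed) for a non-empty list of ints."""
    base = min(values)
    # Bits needed for the largest offset; at least 1 so shifts stay distinct.
    width = max((max(values) - base).bit_length(), 1)
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - base) << (i * width)
    return base, width, packed


def for_get(base, width, packed, i):
    """Read value i back by shifting and masking its offset."""
    mask = (1 << width) - 1
    return base + ((packed >> (i * width)) & mask)


ages = [18, 65, 42, 30, 18, 51]
base, width, packed = for_encode(ages)
decoded = [for_get(base, width, packed, i) for i in range(len(ages))]
```

Here the range 18 to 65 spans 47, which fits in 6 bits per value instead of a 32-bit slot. Building the int with repeated `|=` is quadratic for large inputs, so a production encoder would pack in chunks.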

Impact

Scenario	Raw	FOR	Reduction
Ages 18-65 (100K rows)	400 KB	~88 KB	4.5×
Scores 0-100 (100K rows)	400 KB	~88 KB	4.5×

Tests

6 new tests added, all passing
Zero regressions: 47 existing + 20 dict + 79 delta = 146 tests all pass

Closes

28 Jun 07:52

hussain-alsaibai

v0.6.0

cc8653d

v0.6.0 — performance, correctness, durability & features

SnapDB v0.6.0 is a performance, correctness, durability, and feature release. It is pure Python and zero-dependency at runtime (NumPy is optional, only for zero-copy export). Every change was adversarially reviewed (findings reproduced before fixing), and CI is green across Linux (3.9–3.13) and Windows and also produces a benchmark artifact.

⚡ Performance

Delta-encoded column reads are now O(1) per lookup and O(n) per full scan via a lazy reconstruction cache (previously O(n) per lookup and O(n²) per scan); a 20k-row delta-column scan dropped from ~17.2 s to ~11 ms.
Vectorized multi-condition select_where() combines per-column bitmasks with C-speed big-integer AND/OR — ~2× faster than a row-predicate select() on selective queries.
Vectorized aggregates (array-level sum/min/max) for null-free numeric columns; __slots__ on hot classes.
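
A lazy reconstruction cache for delta-encoded reads might look like the sketch below. `DeltaColumn` is a hypothetical class, not SnapDB's; it assumes a non-empty initial list and rebuilds the cache with one prefix-sum pass after any write.

```python
from itertools import accumulate


class DeltaColumn:
    """Hypothetical delta-encoded column with a lazy prefix-sum cache."""

    def __init__(self, values):
        self.first = values[0]
        self.last = values[-1]
        self.deltas = [b - a for a, b in zip(values, values[1:])]
        self._cache = None               # rebuilt on demand

    def append(self, value):
        self.deltas.append(value - self.last)
        self.last = value
        self._cache = None               # invalidate; next read rebuilds

    def _values(self):
        if self._cache is None:
            # One O(n) pass reconstructs every value from the deltas.
            self._cache = list(accumulate(self.deltas, initial=self.first))
        return self._cache

    def __getitem__(self, i):
        return self._values()[i]         # O(1) once the cache is warm


col = DeltaColumn([100, 105, 107])
col.append(112)
scan = [col[i] for i in range(4)]        # one rebuild, then cached lookups
```

Without the cache, each `col[i]` would re-sum the first `i` deltas, which is what makes a full scan quadratic.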

🛡️ Correctness & durability

Hash indexes are kept in sync on insert / batch_insert / update / delete (they previously went stale after the first build); unified create_index() for row and columnar storage; find() works without a pre-built index; an emptied index is no longer mistaken for a missing one.
Delta delete/null corruption fixed — deleting or nulling a delta-encoded row no longer shifts other rows' values.
Transaction rollback now actually undoes writes (and restores indexes).
Durable multi-slab persistence — the on-disk bitmap/slab geometry and the last slab's high-water mark are now persisted, so reopening a database larger than one slab no longer loses data.
Columnar index is now value → set (first-match, consistent under duplicates/deletes and before/after auto-indexing); default to_numpy() returns an independent copy.

✨ Features

#4 Vectorized predicates: select_where([(col, op, value), …], combine="and"|"or") (eq/ne/gt/gte/lt/lte/in/between, projection, limit/offset, dict shorthand).
#6 Auto-indexing: SnapDB(auto_index=True, auto_index_threshold=N).
#7 Zero-copy NumPy export: to_numpy() / to_numpy(zero_copy=True) / column_buffer() (PEP 688), NumPy optional.
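
The bitmask approach behind vectorized predicates can be illustrated as follows. This is a sketch under assumptions: `select_where_sketch` and `column_mask` are hypothetical helpers over plain lists, they cover only the comparison operators, and they return row indices rather than rows.

```python
import operator

OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
       "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def column_mask(column, op, value):
    """Build an int whose bit i is set when row i satisfies the predicate."""
    test = OPS[op]
    mask = 0
    for i, v in enumerate(column):
        if test(v, value):
            mask |= 1 << i
    return mask


def select_where_sketch(columns, conditions, combine="and"):
    masks = [column_mask(columns[c], op, v) for c, op, v in conditions]
    result = masks[0]
    for m in masks[1:]:
        # A single big-integer AND/OR merges all rows at once.
        result = result & m if combine == "and" else result | m
    # Expand the set bits back into row indices.
    return [i for i in range(result.bit_length()) if (result >> i) & 1]


columns = {"age": [25, 40, 31, 17], "score": [90, 55, 72, 88]}
both = select_where_sketch(columns, [("age", "gte", 18), ("score", "gt", 70)])
either = select_where_sketch(
    columns, [("age", "lt", 18), ("score", "lt", 60)], combine="or")
```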

🧰 Tooling & docs

benchmarks/bench_suite.py — reproducible, cross-platform benchmark with JSON + Markdown output.
GitHub Actions CI: ruff lint, pytest matrix (Linux 3.9–3.13 + Windows), benchmark artifact.
README refreshed with honest, reproducible numbers: SnapDB columnar is the lowest-memory engine here (~2.2 MB per 100K rows vs SQLite 2.9 MB, pandas 11 MB, dict 22 MB) and ~3.3× faster than in-memory SQLite on full-scan aggregation (NumPy-backed math is faster, stated plainly).

🗺️ Roadmap (tracked issues)

FOR encoding (#11), sys.monitoring profiler (#12), faster row bulk-insert (#13), optional NumPy-accelerated filters/aggregates (#14).

Closes #3, #4, #5, #6, #7. Full diff: see PR #10.

28 Jun 04:49

hussain-alsaibai

v0.5.0

b476898

SnapDB v0.5.0 — Delta Encoding & Dictionary Encoding

SnapDB v0.5.0 — Delta Encoding for Monotonic Columns

What's New

🚀 Delta Encoding (Issue #1)

Transparent delta encoding for monotonic numeric columns (timestamps, IDs, sequences)
1.2× memory reduction for i64 timestamps on 100K rows (2.29 MB → 1.91 MB)
Auto-detects monotonicity after 50-sample warmup
Auto-fallback to raw storage if non-monotonic data detected
Auto-upgrades delta typecode if deltas overflow current width
Per-column control: delta_columns=["timestamp", "seq"]
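
The warm-up, auto-detect, and fallback behavior listed above can be sketched like this. `DeltaOrRaw` is a hypothetical class; it stores deltas in a plain list and leaves out the typecode upgrade on overflow.

```python
class DeltaOrRaw:
    """Hypothetical column: delta-encoded while input stays monotonic."""

    def __init__(self, warmup=50):
        self.warmup = warmup
        self.mode = "warmup"             # "warmup" -> "delta" or "raw"
        self.raw = []
        self.first = self.last = None
        self.deltas = []

    def append(self, value):
        if self.mode == "delta":
            if value >= self.last:
                self.deltas.append(value - self.last)
                self.last = value
                return
            # Non-monotonic value: materialize and switch to raw storage.
            self.raw = self.to_list()
            self.deltas = []
            self.mode = "raw"
        self.raw.append(value)
        if self.mode == "warmup" and len(self.raw) >= self.warmup:
            self._decide()

    def _decide(self):
        if all(a <= b for a, b in zip(self.raw, self.raw[1:])):
            self.first, self.last = self.raw[0], self.raw[-1]
            self.deltas = [b - a for a, b in zip(self.raw, self.raw[1:])]
            self.raw = []
            self.mode = "delta"
        else:
            self.mode = "raw"

    def to_list(self):
        if self.mode != "delta":
            return list(self.raw)
        out = [self.first]
        for d in self.deltas:
            out.append(out[-1] + d)
        return out


col = DeltaOrRaw(warmup=3)
for ts in [10, 20, 30, 40]:
    col.append(ts)
mode_after_warmup = col.mode             # monotonic so far
col.append(5)                            # breaks monotonicity
```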

📊 Memory Efficiency Stack (Combined)

Feature	Reduction	Version
Bit-packed booleans	~8×	v0.3.2
Dictionary encoding	3×	v0.4.0
Delta encoding	1.2×	v0.5.0

Full Changelog

ce89679 docs: update README with delta encoding v0.5.0
46636a3 feat: delta encoding for monotonic numeric columns (#1)
13340d7 docs: update README with dictionary encoding v0.4.0
7a9087d feat: dictionary encoding for low-cardinality string columns (#2)

Roadmap

#3 Frame-of-Reference + bit packing
#4 Vectorized bitmask predicates
#5 sys.monitoring profiling
#6 Auto-indexing
#7 buffer protocol

28 Jun 04:43

hussain-alsaibai

v0.3.2

66bce9e

SnapDB v0.3.2 — Bit-Packed Booleans & Package Reorganization

What's New

🚀 Performance

Bit-packed boolean storage — Python int bitmask replaces array.array('B'), delivering ~8× memory reduction for boolean columns
Precompiled struct format — single struct.pack/unpack per row (1.6–1.9× faster encode/decode)
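
Bit-packed boolean storage can be sketched with a single Python int, one bit per row. `BitColumn` is a hypothetical class for illustration; updates, deletes, and nulls are not handled.

```python
class BitColumn:
    """Hypothetical boolean column stored as bits of one Python int."""

    def __init__(self):
        self.bits = 0
        self.length = 0

    def append(self, flag):
        if flag:
            self.bits |= 1 << self.length    # set the bit for this row
        self.length += 1

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return bool((self.bits >> i) & 1)

    def count_true(self):
        # Portable popcount; int.bit_count() needs Python 3.10+.
        return bin(self.bits).count("1")


col = BitColumn()
for flag in [True, False, True, True]:
    col.append(flag)
```

One bit per value instead of one byte per value in `array.array('B')` is where the roughly 8× figure comes from.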

📦 Packaging

Proper Python package — snapdb/ directory with __init__.py
pyproject.toml — PEP 621 compliant packaging config
Test suite — moved to tests/ directory
Benchmarks — moved to benchmarks/ directory
Modular architecture — core.py, columnar.py, metrics.py, index.py, query.py, wal.py, document_store.py

📊 Benchmarks

Comprehensive 5-engine comparison: SnapDB vs DuckDB vs SQLite vs Pure Dict
DuckDB DataFrame batch insert added
Bit-packed boolean benchmarks included

✅ Correctness

47/47 tests passing
All v0.2.0 and v0.3.x features verified

Full Changelog

7bc35da v0.3.2: Bit-packed booleans, package reorganization, pyproject.toml
12881c0 v0.3.2: Precompiled struct format, hash index, lookup optimization
b708e9c benchmark: Add DuckDB comparison, comprehensive 5-engine suite
19a605b v0.3.1: Batch insert, optimized columnar, comprehensive benchmarks
c30d757 v0.3.0: Columnar engine, metrics, CDC

Roadmap

See open issues for upcoming features.

23 Jun 04:15

hussain-alsaibai

v0.1.0

1935876

v0.1.0 — Lightning-fast in-memory database

SnapDB v0.1.0

Extremely lightweight, lightning-fast in-memory database for Python.

Key Innovations

Slab-oriented storage — contiguous pages, CPU cache friendly, no malloc fragmentation
Zero-copy reads — memoryview slices into mmap: 1,212,499 reads/sec
Single-file — schema + bitmap + data in one .snap file
Pure Python, zero dependencies — stdlib only (mmap, struct, os, json)
Fixed-width types — i8/16/32/64, u8/16/32/64, f32/64, bool, bytes:N
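
Zero-copy reads of fixed-width rows can be illustrated with `mmap`, `memoryview`, and a precompiled `struct.Struct`. The row layout (`<if?` for an i32, an f32, and a bool) and the anonymous mapping are assumptions for this sketch; the actual .snap file format is not shown here.

```python
import mmap
import struct

ROW = struct.Struct("<if?")              # assumed layout: i32, f32, bool

buf = mmap.mmap(-1, ROW.size * 4)        # anonymous mapping with 4 row slots
ROW.pack_into(buf, ROW.size * 2, 7, 25.5, True)   # write row 2 in place

view = memoryview(buf)
raw = view[ROW.size * 2: ROW.size * 3]   # slice shares memory, no copy
row = ROW.unpack(raw)                    # decode only when values are needed

# Release exported views before closing the mapping.
raw.release()
view.release()
buf.close()
```

Because every row has the same width, locating row `n` is a multiplication rather than a search, and the raw slice can be handed out without decoding.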

Performance (100K rows)

Operation	Speed
Insert	69,185 rows/sec
Read (decoded dict)	309,698 rows/sec
Read (zero-copy raw)	1,212,499 rows/sec
Sequential scan	389,072 rows/sec

Quick Start

from snapdb import SnapDB, Schema, ColumnDef

schema = Schema([
    ColumnDef("id", "i32"),
    ColumnDef("temp", "f32"),
    ColumnDef("active", "bool"),
])

with SnapDB("data.snap", schema) as db:
    db.insert({"id": 1, "temp": 25.5, "active": True})
    row = db.get(0)           # decoded dict
    raw = db.get_raw(0)       # zero-copy memoryview
    for idx, row in db.query(lambda r: r["active"]):
        print(idx, row)

Files

snapdb.py — Core engine (~400 lines)
test_snapdb.py — Unit tests
quickbench.py — Benchmark script

Design

SnapDB uses a slab allocator: each slab is a fixed-size page (default 4 KB) holding all columns for N rows contiguously. A bitmap tracks live vs deleted rows. The file is memory-mapped for persistence and zero-copy access.
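
The addressing this design implies can be sketched as simple arithmetic plus a liveness bitmap. `locate`, `LiveBitmap`, and the 9-byte row width are hypothetical, and the sketch ignores any per-slab or file header.

```python
SLAB_SIZE = 4096      # default slab size from the text
ROW_SIZE = 9          # assumed row width in bytes, for illustration


def locate(row_index, row_size=ROW_SIZE, slab_size=SLAB_SIZE):
    """Map a row index to (slab number, byte offset inside that slab)."""
    rows_per_slab = slab_size // row_size
    slab, slot = divmod(row_index, rows_per_slab)
    return slab, slot * row_size


class LiveBitmap:
    """Hypothetical bitmap: bit i is 1 while row i is live."""

    def __init__(self):
        self.bits = 0

    def mark_live(self, i):
        self.bits |= 1 << i

    def mark_deleted(self, i):
        self.bits &= ~(1 << i)

    def is_live(self, i):
        return bool((self.bits >> i) & 1)


bitmap = LiveBitmap()
bitmap.mark_live(0)
bitmap.mark_live(1)
bitmap.mark_deleted(0)                   # row 0's slot can be skipped on scan
```

With 9-byte rows a 4096-byte slab holds 455 rows, so row 455 is the first row of slab 1.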

No external dependencies. No server. Just a single Python file.

Releases: hussain-alsaibai/snapdb
