v0.8.5 — flat parallel index builds (15×), snapshot read path

@shuruheel shuruheel released this 12 Jun 22:10

Sixth fork release — matches the crate published to crates.io and the wheel on PyPI (this tag is the exact publish commit).

Highlights

  • Flat in-RAM parallel HNSW bulk build (::hnsw create): contiguous vector slab + integer-ID adjacency + per-node-lock parallel insertion (the hnswlib/pgvector/Lucene layout), serialised once into the unchanged on-disk tuple format. Measured: 294 s → 19 s (15×) on the 40k×384 synthetic repro; 89 s → 8.1 s (11×) on a real-embedding corpus, recall unchanged. MNESTIC_INDEX_BUILD_THREADS controls workers.
  • FTS bulk build (::fts create): dead del-pass removed, parallel tokenisation, exact doc-stats seeding — ~2×.
  • Plain-snapshot read path (RocksDB): read-only scripts skip the pessimistic transaction and read through a plain snapshot (standard MVCC read pattern à la TiKV/CockroachDB). Keyed point read p50 28.5 → 23.9 µs (−16%), p99 −19%. Isolation semantics pinned by tests.
  • Batched HNSW neighbour reads: search-path neighbour vectors fetch via one RocksDB MultiGet per expansion step instead of serial point gets (wins on cold-cache / larger-than-RAM corpora).
  • ::describe fix: the op existed upstream but was never wired into the grammar — now reachable, and read-only-guarded like its sibling sys ops.
  • Added bulk-build test coverage following a three-review post-ship audit (list-of-vectors columns, F64+Cosine recall guard, multi-column-PK FTS doc-stats equality).
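
To make the flat-layout idea from the HNSW bullet concrete, here is a minimal Python sketch. It is not the fork's code and not a real HNSW: it builds a single-layer neighbour graph, and a brute-force scan stands in for the greedy graph search. The class name `FlatGraph`, the `max_degree` parameter and the pruning rule are all hypothetical. It only illustrates the three ingredients named above: one contiguous vector slab, adjacency lists keyed by integer ID, and one lock per node so that workers can insert concurrently.

```python
import math
import random
import threading
from array import array
from concurrent.futures import ThreadPoolExecutor


class FlatGraph:
    """Toy single-layer neighbour graph held in flat, integer-indexed arrays."""

    def __init__(self, vectors, max_degree=4):
        self.n = len(vectors)
        self.dim = len(vectors[0])
        self.max_degree = max_degree
        # One contiguous float slab: node i occupies slab[i*dim : (i+1)*dim].
        self.slab = array("f", (x for v in vectors for x in v))
        # Adjacency is a list of integer-ID lists, guarded by one lock per node.
        self.adj = [[] for _ in range(self.n)]
        self.locks = [threading.Lock() for _ in range(self.n)]

    def dist(self, a, b):
        d = self.dim
        return math.dist(self.slab[a * d:(a + 1) * d], self.slab[b * d:(b + 1) * d])

    def _link(self, a, b):
        # Only node a's lock is held at any time, so workers cannot deadlock.
        with self.locks[a]:
            if b not in self.adj[a]:
                self.adj[a].append(b)
                # Keep the max_degree closest neighbours; ties break on ID.
                self.adj[a].sort(key=lambda j: (self.dist(a, j), j))
                del self.adj[a][self.max_degree:]

    def insert(self, i):
        # Assumption: a brute-force scan over lower IDs replaces the HNSW search.
        nearest = sorted(range(i), key=lambda j: (self.dist(i, j), j))
        for j in nearest[: self.max_degree]:
            self._link(i, j)
            self._link(j, i)

    def build(self, workers):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(self.insert, range(self.n)))
        return self.adj


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(60)]
graph = FlatGraph(vectors).build(workers=4)
```

Because each node keeps the closest `max_degree` of every neighbour it is ever offered, the finished graph in this toy does not depend on the order in which workers run. A real HNSW build gives no such guarantee, which is presumably why the release reports recall rather than graph equality.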
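
The batched neighbour-read change can be pictured with a similarly small sketch. `CountingStore` is a hypothetical in-memory stand-in, not the RocksDB API, and the `("vec", id)` key layout is invented for illustration. The point is only that one batched call per expansion step replaces one call per neighbour.

```python
class CountingStore:
    """In-memory stand-in for a key-value store that counts read calls."""

    def __init__(self, data):
        self._data = dict(data)
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self._data.get(key)

    def multi_get(self, keys):
        # One call, regardless of how many keys are requested.
        self.calls += 1
        return [self._data.get(k) for k in keys]


def neighbour_vectors_serial(store, ids):
    # Old shape: one point read per neighbour.
    return [store.get(("vec", i)) for i in ids]


def neighbour_vectors_batched(store, ids):
    # New shape: a single batched read for the whole expansion step.
    return store.multi_get([("vec", i) for i in ids])


data = {("vec", i): [float(i), float(i) + 0.5] for i in range(10)}
neighbours = [2, 3, 5, 7, 8]

serial_store = CountingStore(data)
batched_store = CountingStore(data)
serial_result = neighbour_vectors_serial(serial_store, neighbours)
batched_result = neighbour_vectors_batched(batched_store, neighbours)
```

Both functions return the same vectors. What changes is the number of round trips to the store, which matters most when the reads miss the cache.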

Note: the corrupt-database tooling (::repair_corrupt, panic-free ::index create) landed after this publish and ships in 0.8.6.

Full detail in CHANGELOG-FORK.md.