Releases: shuruheel/mnestic

v0.8.5 — flat parallel index builds (15×), snapshot read path

12 Jun 22:10

Sixth fork release — matches the crate published to crates.io and the wheel on PyPI (this tag is the exact publish commit).

Highlights

  • Flat in-RAM parallel HNSW bulk build (::hnsw create): contiguous vector slab + integer-ID adjacency + per-node-lock parallel insertion (the hnswlib/pgvector/Lucene layout), serialised once into the unchanged on-disk tuple format. Measured: 294 s → 19 s (15×) on the 40k×384 synthetic repro; 89 s → 8.1 s (11×) on a real-embedding corpus, with recall unchanged. MNESTIC_INDEX_BUILD_THREADS sets the number of build worker threads.
  • FTS bulk build (::fts create): dead deletion pass removed, tokenisation parallelised, doc-stats seeded exactly; roughly 2× faster.
  • Plain-snapshot read path (RocksDB): read-only scripts skip the pessimistic transaction and read through a plain snapshot (standard MVCC read pattern à la TiKV/CockroachDB). Keyed point read p50 28.5 → 23.9 µs (−16%), p99 −19%. Isolation semantics pinned by tests.
  • Batched HNSW neighbour reads: on the search path, neighbour vectors are fetched with one RocksDB MultiGet per expansion step instead of serial point gets (a win on cold-cache and larger-than-RAM corpora).
  • ::describe fix: the op existed upstream but was never wired into the grammar. It is now reachable and read-only-guarded like its sibling sys ops.
  • Added bulk-build test coverage from a three-review post-ship audit (list-of-vectors columns, F64+Cosine recall guard, multi-column-PK FTS doc-stats equality).
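A minimal sketch of the flat layout described above, assuming a single-layer graph: brute-force nearest-neighbour selection stands in for HNSW's layered greedy search, and FlatGraph, build_parallel and worker_count are hypothetical names, not mnestic's API. It shows the three ingredients the notes name: one contiguous f32 slab, adjacency lists of integer node IDs, and one lock per node so workers can insert in parallel. Reading MNESTIC_INDEX_BUILD_THREADS as a plain positive integer is an assumption about its format.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical flat graph: every vector lives in one contiguous slab,
/// adjacency lists hold integer node IDs, and each node has its own lock.
struct FlatGraph {
    dim: usize,
    slab: Vec<f32>,            // node i occupies slab[i * dim..(i + 1) * dim]
    adj: Vec<Mutex<Vec<u32>>>, // adj[i] = neighbour IDs of node i
}

impl FlatGraph {
    fn new(dim: usize, slab: Vec<f32>) -> Self {
        assert!(dim > 0 && slab.len() % dim == 0, "slab must hold whole vectors");
        let n = slab.len() / dim;
        let adj = (0..n).map(|_| Mutex::new(Vec::new())).collect();
        FlatGraph { dim, slab, adj }
    }

    fn len(&self) -> usize {
        self.adj.len()
    }

    fn vector(&self, i: usize) -> &[f32] {
        &self.slab[i * self.dim..(i + 1) * self.dim]
    }

    fn dist2(&self, a: usize, b: usize) -> f32 {
        self.vector(a).iter().zip(self.vector(b)).map(|(x, y)| (x - y) * (x - y)).sum()
    }

    /// Link node i to its m nearest nodes and add the reverse edges.
    /// Brute force stands in for the HNSW greedy search here.
    fn insert(&self, i: usize, m: usize) {
        let mut cands: Vec<usize> = (0..self.len()).filter(|&j| j != i).collect();
        cands.sort_by(|&a, &b| self.dist2(i, a).total_cmp(&self.dist2(i, b)).then(a.cmp(&b)));
        cands.truncate(m);
        // Hold at most one node lock at a time, so no lock ordering is needed.
        {
            let mut own = self.adj[i].lock().unwrap();
            for &j in &cands {
                if !own.contains(&(j as u32)) {
                    own.push(j as u32);
                }
            }
        }
        for &j in &cands {
            let mut back = self.adj[j].lock().unwrap();
            if !back.contains(&(i as u32)) {
                back.push(i as u32);
            }
        }
    }
}

/// Worker count: MNESTIC_INDEX_BUILD_THREADS if it parses as a positive
/// integer (format assumed), otherwise the available parallelism.
fn worker_count() -> usize {
    std::env::var("MNESTIC_INDEX_BUILD_THREADS")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or_else(|| thread::available_parallelism().map(|n| n.get()).unwrap_or(1))
}

/// Insert all nodes in parallel; worker t handles nodes t, t + threads, ...
fn build_parallel(g: &FlatGraph, m: usize, threads: usize) {
    let n = g.len();
    let threads = threads.max(1);
    thread::scope(|s| {
        for t in 0..threads {
            s.spawn(move || {
                let mut i = t;
                while i < n {
                    g.insert(i, m);
                    i += threads;
                }
            });
        }
    });
}
```

Because each insert holds at most one node lock at a time, no lock ordering is needed to avoid deadlock. The trade-off in this sketch is that edge order within a list depends on thread scheduling, so only the neighbour sets are deterministic.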

Note: the corrupt-database tooling (::repair_corrupt, panic-free ::index create) landed after this publish and ships in 0.8.6.

Full detail in CHANGELOG-FORK.md.

v0.8.4 — per-leg RRF detail + 0.8.3 concurrency fix

12 Jun 22:10

Fifth fork release.

Highlights

  • Per-leg retrieval detail: ReciprocalRankFusion(..., detailed: true) (and HybridSearch) emit one row per (item, contributing list) pair, with columns [item, fused_score, list_id, leg_rank, leg_score], exposing exactly which legs retrieved each result and at what rank. This powers downstream "why retrieved" explanations.
  • Fixed a 0.8.3 concurrent-write regression: the durable FTS doc-stats counter introduced for O(1) avgdl was a single hot row that serialized all writes to FTS-indexed relations; it's now off the hot path.
  • PyPI family: the embedded Python wheel is published as mnestic.
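The detailed output can be illustrated with a small sketch of standard reciprocal rank fusion, where each item scores Σ 1/(k + rank) over the legs that returned it. The 1-based ranks, the constant k = 60 used in the example, the tie-breaking, and the names rrf_detailed/DetailRow are assumptions, not mnestic's implementation; only the column list mirrors the notes.

```rust
use std::collections::HashMap;

/// One detailed row, mirroring the columns in the notes:
/// [item, fused_score, list_id, leg_rank, leg_score].
#[derive(Debug, Clone, PartialEq)]
struct DetailRow {
    item: String,
    fused_score: f64,
    list_id: usize,
    leg_rank: usize, // 1-based rank within that leg
    leg_score: f64,  // the leg's own score (distance, BM25, ...)
}

/// Reciprocal rank fusion with per-leg detail. `legs[l]` is a best-first
/// list of (item, leg_score); `k` is the RRF smoothing constant.
fn rrf_detailed(legs: &[Vec<(String, f64)>], k: f64) -> Vec<DetailRow> {
    // Pass 1: fused score per item, summed over every leg that returned it.
    let mut fused: HashMap<&str, f64> = HashMap::new();
    for leg in legs {
        for (pos, (item, _)) in leg.iter().enumerate() {
            *fused.entry(item.as_str()).or_insert(0.0) += 1.0 / (k + (pos + 1) as f64);
        }
    }
    // Pass 2: one row per (item, contributing leg).
    let mut rows = Vec::new();
    for (list_id, leg) in legs.iter().enumerate() {
        for (pos, (item, leg_score)) in leg.iter().enumerate() {
            rows.push(DetailRow {
                item: item.clone(),
                fused_score: fused[item.as_str()],
                list_id,
                leg_rank: pos + 1,
                leg_score: *leg_score,
            });
        }
    }
    // Best fused score first; ties broken by item, then list, for stable output.
    rows.sort_by(|a, b| {
        b.fused_score
            .total_cmp(&a.fused_score)
            .then_with(|| a.item.cmp(&b.item))
            .then(a.list_id.cmp(&b.list_id))
    });
    rows
}
```

An item retrieved by two legs appears twice, once per leg, with the same fused_score but its own leg_rank and leg_score, which is what a "why retrieved" explanation needs.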

Full detail in CHANGELOG-FORK.md.

v0.8.3 — BM25 FTS + native 3-way fused recall

12 Jun 22:10
cf789cd

Fourth fork release, validated end-to-end on the mnestic-benchmarks hybrid suite (vs SQLite/DuckDB/LanceDB/Kuzu).

Highlights

  • Okapi BM25 is the new default FTS scorer (a behaviour change; tf/tf_idf remain selectable and byte-identical to upstream). It adds term-frequency saturation, document-length normalization, and OR-queries that sum per-term contributions instead of taking the max. Fused recall 0.75 → 0.954.
  • O(1) avgdl via a durable per-index doc-stats counter — removes a per-query full index scan; decomposed-path p50 927 → 175 ms, cold p99 2,900 → 258 ms. Legacy indexes self-migrate on first write.
  • Native 3-way fused recall: a new typed GraphLeg on HybridSearch runs vector+FTS+graph in one call and one transaction with bounded-hop min-distance ranking, at 41.55 ms p50 (~4× faster than the hand-decomposed path). It is injection-safe: seeds are passed as params.
  • Read-path latency baseline (benches/read_path.rs): parse/compile adds a fixed ~20–85 µs, which is significant for point reads and negligible for retrieval.
  • Python: hybrid_search accepts graph_legs.
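A sketch of the BM25 and avgdl changes above, assuming the common Okapi parameters k1 = 1.2 and b = 0.75 and a Lucene-style idf; mnestic's actual constants and idf variant are not stated in these notes. DocStats is a hypothetical stand-in for the durable doc-stats counter: keeping a document count and a total length up to date on each write turns avgdl into a single division at query time instead of an index scan.

```rust
use std::collections::HashMap;

/// Running corpus statistics, updated on every write (hypothetical).
#[derive(Default)]
struct DocStats {
    n_docs: u64,
    total_len: u64,
}

impl DocStats {
    fn add_doc(&mut self, len: u64) {
        self.n_docs += 1;
        self.total_len += len;
    }
    fn remove_doc(&mut self, len: u64) {
        self.n_docs -= 1;
        self.total_len -= len;
    }
    /// O(1): no scan over the index.
    fn avgdl(&self) -> f64 {
        if self.n_docs == 0 { 0.0 } else { self.total_len as f64 / self.n_docs as f64 }
    }
}

const K1: f64 = 1.2; // assumed term-frequency saturation parameter
const B: f64 = 0.75; // assumed length-normalisation strength

/// Okapi BM25 contribution of one term in one document.
fn bm25_term(tf: f64, doc_len: f64, df: f64, stats: &DocStats) -> f64 {
    let n = stats.n_docs as f64;
    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
    let norm = K1 * (1.0 - B + B * doc_len / stats.avgdl());
    idf * tf * (K1 + 1.0) / (tf + norm)
}

/// OR query: sum the per-term contributions rather than taking the max.
fn bm25_or(query: &[&str], doc: &[&str], df: &HashMap<&str, u64>, stats: &DocStats) -> f64 {
    query
        .iter()
        .map(|t| {
            let tf = doc.iter().filter(|w| *w == t).count() as f64;
            if tf == 0.0 {
                return 0.0;
            }
            // The term occurs in this document, so it has a df entry.
            bm25_term(tf, doc.len() as f64, df[t] as f64, stats)
        })
        .sum()
}
```

As tf grows, each term's contribution approaches idf · (k1 + 1) instead of growing without bound, and the same tf scores lower in a longer-than-average document.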

Full detail in CHANGELOG-FORK.md.

v0.8.2 — non-blocking HNSW index builds

30 May 20:57

Third fork release. Makes HNSW index builds non-blocking for readers.

Non-blocking HNSW index builds

Building/rebuilding an HNSW index (::hnsw create) used to hold the base relation's exclusive write lock for the entire build, so every concurrent read blocked until it finished — in production, 10–20+ minutes (76 min for a 151K × 1536 index). The stall was cozo's per-relation lock, not RocksDB.

The build is now done off-lock on RocksDB: the heavy graph construction runs under a read-only snapshot with no relation lock held, and the lock is taken only briefly to set up and to publish. The finished graph is bulk-published via SstFileWriter/IngestExternalFile, ingested before its metadata is committed (so a reader can never observe an index before its keys exist); rows mutated during the build are folded in by a short reconcile pass under a brief final lock.
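The snapshot, build, reconcile sequence can be sketched with a toy in-memory relation guarded by one Mutex, assuming a log of mutated keys is available. Everything here (Base, write, snapshot, build, reconcile_and_publish) is hypothetical and stands in for the RocksDB snapshot, SstFileWriter/IngestExternalFile publish and relation lock described above; it shows only the lock discipline: a short lock, a long unlocked build, then a short lock to fold in concurrent writes and publish.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Toy base relation with a log of mutated keys and an optional
/// published secondary index (value -> key). Hypothetical.
#[derive(Default)]
struct Base {
    rows: BTreeMap<u32, String>,
    log: Vec<u32>,
    index: Option<BTreeMap<String, u32>>,
}

/// Ordinary writes take the lock briefly and log the key they touched.
fn write(db: &Mutex<Base>, key: u32, val: Option<&str>) {
    let mut g = db.lock().unwrap();
    match val {
        Some(v) => {
            g.rows.insert(key, v.to_string());
        }
        None => {
            g.rows.remove(&key);
        }
    }
    g.log.push(key);
}

/// Step 1 (brief lock): capture a snapshot and the current log position.
fn snapshot(db: &Mutex<Base>) -> (BTreeMap<u32, String>, usize) {
    let g = db.lock().unwrap();
    (g.rows.clone(), g.log.len())
}

/// Step 2 (no lock held): the expensive build runs against the snapshot.
fn build(snap: &BTreeMap<u32, String>) -> BTreeMap<String, u32> {
    snap.iter().map(|(k, v)| (v.clone(), *k)).collect()
}

/// Step 3 (brief final lock): fold in keys mutated since the snapshot,
/// then publish the finished index in a single assignment.
fn reconcile_and_publish(db: &Mutex<Base>, mut idx: BTreeMap<String, u32>, since: usize) {
    let mut g = db.lock().unwrap();
    let touched: Vec<u32> = g.log[since..].to_vec();
    for key in touched {
        idx.retain(|_, k| *k != key); // drop any stale entry for this key
        if let Some(v) = g.rows.get(&key) {
            idx.insert(v.clone(), key); // re-add the row's current value
        }
    }
    g.index = Some(idx);
}
```

Since publishing is one assignment under the final lock, a reader of this toy sees either no index or a complete, reconciled one, which is the property the ingest-before-metadata ordering provides in the real build.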

Measured (release): a 40,000-vector build takes ~5.6 s, during which 90,507 concurrent reads of the same relation completed, the slowest in 0.8 ms — previously those reads would have queued behind the whole ~5.6 s build.

Default-on and transparent (same ::hnsw create). RocksDB only; other backends keep the in-transaction build unchanged via the new Storage::ingest_sorted fallback. No mnestic-rocks change (stays 0.1.8).

All 169 inherited lib tests + integration/feature suites pass; cargo clippy -p mnestic -- -D warnings clean. See CHANGELOG-FORK.md for full detail.

v0.8.1: mnestic 0.8.1

30 May 20:03

One-call hybrid retrieval, a ~3x faster HNSW index build, the maintained
mnestic-rocks bridge fork, and a blocking clippy CI gate.

New

  • HybridSearch: DbInstance::hybrid_search / Db::hybrid_search (+ *_script) run
    HNSW + FTS (+ optional graph traversal), fuse with RRF, optionally diversify
    with MMR — in one typed call. Read-only; values passed as params, identifiers
    validated against injection.
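The optional MMR step can be sketched as the classic greedy rule: repeatedly pick the candidate maximising λ · relevance − (1 − λ) · (max cosine similarity to anything already picked). Cosine similarity as the redundancy measure, the lambda parameter and the mmr function name are assumptions for illustration; mnestic's MMR parameters may differ.

```rust
/// Cosine similarity between two equal-length vectors (0 if either is zero).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy Maximal Marginal Relevance: choose up to `k` candidates, trading
/// relevance (weight `lambda`) against similarity to already-picked items.
/// Returns candidate indices in pick order.
fn mmr(relevance: &[f32], vecs: &[Vec<f32>], lambda: f32, k: usize) -> Vec<usize> {
    let mut picked: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    while picked.len() < k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &i) in remaining.iter().enumerate() {
            // Redundancy is 0 before anything has been picked.
            let redundancy = if picked.is_empty() {
                0.0
            } else {
                picked
                    .iter()
                    .map(|&j| cosine(&vecs[i], &vecs[j]))
                    .fold(f32::NEG_INFINITY, f32::max)
            };
            let score = lambda * relevance[i] - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        picked.push(remaining.remove(best_pos));
    }
    picked
}
```

With lambda = 1.0 this reduces to plain relevance ordering; lowering it pushes near-duplicates of earlier picks down the list.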

Performance

  • HNSW index build ~3.1x faster (20k x 128: 135s -> 43.6s, release): the build
    now constructs the graph in the in-RAM temp store + shares one VectorCache
    across the build, instead of round-tripping through the transaction's
    WriteBatchWithIndex overlay. Built graph is byte-identical.

Bridge

  • Forked cozorocks -> mnestic-rocks (v0.1.8); importable name stays cozorocks.

Maintenance

  • Blocking clippy CI gate (-D warnings); document-features future-incompatibility warning cleared.

Deferred (designed): lock-free out-of-transaction build + IngestExternalFile
atomic publish; native in-RAM graph; LangChain/LlamaIndex adapter.

Full changelog: CHANGELOG-FORK.md.

mnestic 0.8.0

30 May 17:37

First release of mnestic, an independently maintained fork of CozoDB tuned as a substrate for agentic memory. Built on upstream 481af05 — 30 commits ahead of cozo 0.7.6. The importable crate name stays cozo, so existing CozoDB code works unchanged.

```toml
[dependencies]
mnestic = "0.8.0"
```

Fixes

  • Equality pushdown: *rel[k, ..], k == <value> now compiles to a keyed stored_prefix_join instead of a full scan (~28–29× faster single-row primary-key lookups, measured at 5k rows). Numeric equalities keep cross-type op_eq semantics.
  • Parser fix (#281) — identifiers that start with a keyword literal (nullable_column, trueValue, falsey) now parse correctly.
  • Unreleased upstream fixes included: the fork point is 30 commits ahead of the published 0.7.6, including the stored_prefix_join correctness fix.
  • env_logger moved to a dev-dependency for a slimmer dependency graph (#287).
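To illustrate why the equality pushdown helps, here is a toy model assuming a stored relation behaves like a map sorted by its key tuple: the unoptimised plan visits every row and filters, while the pushed-down plan seeks to the k prefix and reads only matching rows. Rel, full_scan and prefix_lookup are hypothetical; the real change lives in mnestic's query compiler and is not shown here.

```rust
use std::collections::BTreeMap;

/// Toy stored relation, sorted by its key tuple (k, seq).
type Rel = BTreeMap<(i64, i64), String>;

/// Unoptimised plan: visit every row, keep those with k == value.
/// Returns the matches and the number of rows visited.
fn full_scan(rel: &Rel, value: i64) -> (Vec<String>, usize) {
    let mut visited = 0;
    let mut out = Vec::new();
    for ((k, _), v) in rel {
        visited += 1;
        if *k == value {
            out.push(v.clone());
        }
    }
    (out, visited)
}

/// Pushed-down plan: seek to the (value, _) key prefix and read only that range.
fn prefix_lookup(rel: &Rel, value: i64) -> (Vec<String>, usize) {
    let out: Vec<String> = rel
        .range((value, i64::MIN)..=(value, i64::MAX))
        .map(|(_, v)| v.clone())
        .collect();
    let visited = out.len();
    (out, visited)
}
```

Both plans return the same rows; the difference is that the scan's cost grows with the relation size while the prefix seek's cost grows only with the number of matches.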

New — hybrid retrieval for agentic memory

Datalog-composable fixed rules:

  • ReciprocalRankFusion (alias RRF) — fuse vector (HNSW) + full-text (FTS) + graph-traversal result lists into one ranking.
  • MaximalMarginalRelevance (alias MMR) — diversity-aware reranking that avoids near-duplicate recalls.
  • rand_ulid() / ulid_timestamp() — lexicographically-sortable identifiers for time-ordered scans (#296).
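A sketch of the standard ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford base32 characters), which is what makes these identifiers sort by creation time. The helpers ulid_encode and ulid_timestamp_ms are hypothetical and take the random bits as an argument; they are not the signatures of rand_ulid() or ulid_timestamp(), and the decoder skips Crockford's I/L/O aliases.

```rust
/// Crockford base32 alphabet, in ascending ASCII order.
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a ULID from a 48-bit millisecond timestamp and 80 random bits.
fn ulid_encode(ts_ms: u64, random: u128) -> String {
    assert!(ts_ms < (1u64 << 48), "timestamp must fit in 48 bits");
    // Timestamp in the top 48 bits, randomness in the low 80.
    let value: u128 = ((ts_ms as u128) << 80) | (random & ((1u128 << 80) - 1));
    // 26 chars x 5 bits = 130 bits; the first char carries only the top 3 bits.
    (0..26)
        .map(|i| CROCKFORD[((value >> (125 - 5 * i)) & 31) as usize] as char)
        .collect()
}

/// Recover the millisecond timestamp from a ULID string (case-insensitive).
fn ulid_timestamp_ms(ulid: &str) -> Option<u64> {
    if ulid.len() != 26 || ulid.as_bytes()[0] > b'7' {
        return None; // wrong length, or the value would overflow 128 bits
    }
    let mut value: u128 = 0;
    for c in ulid.bytes() {
        let digit = CROCKFORD.iter().position(|&a| a == c.to_ascii_uppercase())? as u128;
        value = (value << 5) | digit;
    }
    Some((value >> 80) as u64)
}
```

Because the timestamp occupies the most significant bits and the alphabet is in ascending ASCII order, a plain string comparison of two ULIDs orders them by timestamp first.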

Full detail in CHANGELOG-FORK.md. mnestic is not the official CozoDB and is not affiliated with or endorsed by its original authors; all credit for the original design belongs to Ziyang Hu and the Cozo Project Authors.