Skip to content

v0.6.0

Choose a tag to compare

@mbachaud mbachaud released this 28 May 21:12
· 26 commits to master since this release

0.6.0 — 2026-05-28

Substantial release covering corpus-scale retrieval, bench rebuilding,
storage audits, and a stack of stability + portability fixes. Headline
work: EnterpriseRAG-Bench (Layer 3) shipped with 100q variant-A
results (recall@10 = 28% on the 850K-gene v2 fixture); two scaling-wall
bugs (regex ReDoS at ingest, SQL-variable cap at retrieval) fixed and
cross-validated on x86 + ARM64 hardware; ~400 lines of new bench docs.

  • fix(tagger): eliminate catastrophic backtracking in _KV_PAIR_PATTERN
    (PR #155 / PR #162).
    Pre-fix, (\w+(?:_\w+)*) had redundant
    nested-quantifier ambiguity that triggered catastrophic backtracking on
    underscore-heavy content. A single worker spinning on
    tagger.py:439's _KV_PAIR_PATTERN.finditer(content[:5000]) hung the
    EnterpriseRAG-Bench-Onyx-full corpus build for 60+ minutes on a single
    google-drive shared-drives file (underscore-rich JSON keys like
    expected_doc_ids, data_source_id). The fix (\w+) is functionally
    identical (same match set, since \w includes _) but has no nested
    quantifier. Verified on the 3 worst-offender files from the bench corpus:
    0.40-0.52 ms each, down from >60 min hung. 200-underscore stress test:
    0.02 ms. Cross-validated under two independent py-spy investigations
    on different hardware classes (x86 Ryzen + RTX 3080 Ti on 2026-05-19,
    ARM64 Grace + GB10 on 2026-05-27) — same line, same root cause, same fix.

  • perf(ddl): skip FTS5 orphan cleanup when delta is < 5% of gene count
    (PR #156 / PR #162).
    The previous cleanup ran
    DELETE FROM genes_fts WHERE gene_id NOT IN (SELECT gene_id FROM genes)
    — an O(N·M) correlated subquery that hung the daemon's first-query
    response for hours on the 850K-gene / 105-shard EnterpriseRAG-Bench
    fixture. On a single 18K-gene shard with ~40 orphans (0.2% noise) the
    NOT IN scan pegged a core for 5-10 minutes against a cold OS-cache
    page set. Orphan FTS5 entries are harmless at query time (downstream
    gene_id joins return NULL and filter out before delivery), so
    skipping cleanup for trivial deltas costs nothing in retrieval quality
    and unblocks first-query latency entirely. For the rare significant-drift
    case (delta ≥ 5%), the rewritten query uses indexed NOT EXISTS instead
    of NOT IN, turning O(N²) into O(N log N). Daemon /health response
    on the 850K-gene fixture went from "hangs forever" to milliseconds.

  • feat(build): salvage already-complete shards on rebuild (PR #157 /
    PR #162).
    Adds _try_salvage_complete_shard() which opens an existing
    shard .db read-only, verifies the genes table has 100% dense
    backfill coverage and no live WAL sidecar, and returns the same
    result-dict shape that _build_one_shard would normally produce —
    letting the parent's _commit_shard_result re-register the shard via
    INSERT OR REPLACE. Designed for the kill+restart cycle: if
    build_fixture_matrix.py is interrupted (Ctrl+C, OOM, planned restart),
    fully-complete shards on disk are re-registered in seconds instead of
    rebuilt from scratch. Verified at scale: 21 of 22 already-complete
    shards re-registered into a fresh main.genome.db in 2 min 19 sec
    (vs ~13 hours to rebuild from scratch) during a mid-build restart of
    the EnterpriseRAG-Bench-Onyx-full 850K-gene build.

  • fix(knowledge_store): batch IN-clause queries to stay under SQLite cap
    (PR #163).
    SQLite caps WHERE col IN (?, ?, ...) placeholders at
    SQLITE_LIMIT_VARIABLE_NUMBER (999 legacy, 2000 on the Python 3.12 /
    SQLite 3.50 builds we ship to, 32766 on newer compile defaults). Four
    call sites on the gene_scores fan-out path in query_docs
    (_apply_authority_boosts, sema-boost embedding lookup, party-attribution
    lookup, access-rate epigenetics lookup) build the IN clause from a
    caller-determined candidate set that can exceed the cap in production.
    Observed in the 2026-05-28 v2-fixture 100q bench: 3 of 29 queries had a
    per-shard query raise OperationalError: too many SQL variables, which
    the daemon's per-shard try/except swallowed as "shard X query failed;
    skipping" — biasing recall@K by silently dropping shards. Variants where
    SPLADE or the prefilter narrows gene_scores don't hit this; only the
    no-filter SPLADE-off path produces sets large enough to blow up. Adds
    _iter_in_batches(items, batch_size=500) helper and refactors the four
    hot sites. Includes TDD'd regression test at
    tests/test_knowledge_store_batched_in.py that probes the runtime cap
    via conn.getlimit(SQLITE_LIMIT_VARIABLE_NUMBER) and exercises at
    cap, cap + 1, and 4*cap + 7 boundaries.

  • fix(mcp): registry handshake is best-effort, don't kill subprocess on
    failure (PR #169).
    On Windows, claude -p MCP attempts were failing
    with "Connection closed" after ~2 s even when helix was alive on
    http://127.0.0.1:11437. Root cause: _register_with_registry() was
    called synchronously before mcp.run() entered the stdio handshake; an
    exception from register_participant() (auto-heartbeat thread init,
    etc.) propagated out of main() and killed the MCP subprocess before
    the host could complete its handshake. The registry is not load-bearing
    for tool calls — tool calls proxy directly to the helix HTTP API.
    Registry is only used by helix_announce + dashboards. This patch
    wraps _register_with_registry() in a try/except inside main():
    happy path unchanged, failure path logs the exception and continues
    to mcp.run() rather than exiting. Closes #167.

  • feat(bench): add --isolated flag to bench_claude_matrix for
    leak-free measurement (PR #170).
    When set, the claude -p sub-agent
    is launched with --tools "" (all built-in tools disabled),
    --strict-mcp-config, and --mcp-config '{"mcpServers":{}}' (no MCP
    servers). Pair with a sterile --cwd (e.g. F:/tmp/bench_sandbox) to
    also block CLAUDE.md auto-discovery. Isolates retrieval-driven answer
    quality from filesystem-tool access. Records isolated + claude_cwd
    in the per-run JSON so post-hoc analysis can distinguish leak-free runs
    from contaminated runs. Brings shipped code into agreement with
    shipped docs (docs/benchmarks/BENCHMARKS.md §"Layer 3 —
    EnterpriseRAG-Bench" and BENCHMARK_RATIONALE.md addendum already
    described this isolation mode). Closes #168.

  • docs(benchmarks): add Layer 3 (EnterpriseRAG-Bench) + EnterpriseRAG
    fixtures (PR #166).
    ~400 lines across four files. BENCHMARKS.md
    gets a new "Layer 3 — EnterpriseRAG-Bench" section covering the
    2026-05-20→21 bench investigation rebuild (isolated=True mode,
    +32.4 pp helix lift, 65% hallucination reduction), cross-corpus results
    (60% recall@10 @ 10K → 71% @ 50K → 28% @ 850K), the expression-budget
    clamp fix (4%→43% correctness), Wall-1 / Wall-2 scaling-wall framing,
    the v2 100q variant-A result table, and cross-host validation of the
    tagger fix. GENOME_FIXTURE_MATRIX.md gets a new EnterpriseRAG-Bench
    fixtures section (5-row fixture table, shared 9-source-root scope,
    excluded-from-ingest list, auto-subsharding behavior, path-portability
    gotcha, branch/PR routing). BENCHMARK_RATIONALE.md gets an addendum
    on how Layer 3 answered the rationale's NIAH-doesn't-fit problems.
    MULTI_VALID_GOLD.md gets an EnterpriseRAG-Bench gold-path matching
    section (schema diff, _rel_after_sources normalization, prefix-tolerant
    match fix).