v0.6.0
0.6.0 — 2026-05-28
Substantial release covering corpus-scale retrieval, bench rebuilding,
storage audits, and a stack of stability + portability fixes. Headline
work: EnterpriseRAG-Bench (Layer 3) shipped with 100q variant-A
results (recall@10 = 28% on the 850K-gene v2 fixture); two scaling-wall
bugs (regex ReDoS at ingest, SQL-variable cap at retrieval) fixed and
cross-validated on x86 + ARM64 hardware; ~400 lines of new bench docs.
- fix(tagger): eliminate catastrophic backtracking in _KV_PAIR_PATTERN
(PR #155 / PR #162). Pre-fix, (\w+(?:_\w+)*) had redundant
nested-quantifier ambiguity that triggered catastrophic backtracking on
underscore-heavy content. A single worker spinning on
tagger.py:439's _KV_PAIR_PATTERN.finditer(content[:5000]) hung the
EnterpriseRAG-Bench-Onyx-full corpus build for 60+ minutes on a single
google-drive shared-drives file (underscore-rich JSON keys like
expected_doc_ids, data_source_id). The fix (\w+) is functionally
identical (same match set, since \w includes _) but has no nested
quantifier. Verified on the 3 worst-offender files from the bench corpus:
0.40-0.52 ms each, down from >60 min hung. 200-underscore stress test:
0.02 ms. Cross-validated under two independent py-spy investigations
on different hardware classes (x86 Ryzen + RTX 3080 Ti on 2026-05-19,
ARM64 Grace + GB10 on 2026-05-27): same line, same root cause, same fix.
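A minimal sketch of the equivalence claim above, not the actual tagger code. Only the two key groups are taken from this entry; the `\s*[:=]\s*(\S+)` tail and the `kv_pairs` helper are assumptions made for illustration. The pre-fix pattern is deliberately never run on the underscore-heavy input, since that is the case that hangs.

```python
import re

# Key groups as quoted in the entry above; the separator/value tail is a
# hypothetical stand-in for the rest of _KV_PAIR_PATTERN.
OLD_KV = re.compile(r"(\w+(?:_\w+)*)\s*[:=]\s*(\S+)")  # nested quantifier
NEW_KV = re.compile(r"(\w+)\s*[:=]\s*(\S+)")           # single quantifier


def kv_pairs(pattern, text):
    """Return (key, value) tuples for every match of `pattern` in `text`."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]


# On input where every key is followed by a separator, both patterns agree.
sample = "expected_doc_ids: 7 data_source_id=42 plain = x"
same_matches = kv_pairs(OLD_KV, sample) == kv_pairs(NEW_KV, sample)

# Underscore-heavy token with no separator after it: the failing-match case.
# Only the fixed pattern is run here; the old one can backtrack exponentially.
stress = "k_" * 200 + "k no separator here"
stress_hits = kv_pairs(NEW_KV, stress)
```

Because \w already matches the underscore, the optional (?:_\w+)* group only adds alternative ways to split the same key, which is what the regex engine explores when the rest of the match fails.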
- perf(ddl): skip FTS5 orphan cleanup when delta is < 5% of gene count
(PR #156 / PR #162). The previous cleanup ran
DELETE FROM genes_fts WHERE gene_id NOT IN (SELECT gene_id FROM genes)
— an O(N·M) correlated subquery that hung the daemon's first-query
response for hours on the 850K-gene / 105-shard EnterpriseRAG-Bench
fixture. On a single 18K-gene shard with ~40 orphans (0.2% noise) the
NOT IN scan pegged a core for 5-10 minutes against a cold OS-cache
page set. Orphan FTS5 entries are harmless at query time (downstream
gene_id joins return NULL and filter out before delivery), so
skipping cleanup for trivial deltas costs nothing in retrieval quality
and unblocks first-query latency entirely. For the rare significant-drift
case (delta ≥ 5%), the rewritten query uses indexed NOT EXISTS instead
of NOT IN, turning O(N²) into O(N log N). Daemon /health response
on the 850K-gene fixture went from "hangs forever" to milliseconds.
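The skip-or-clean decision described above could look roughly like the sketch below. It is illustrative only: `cleanup_fts_orphans` is a hypothetical name, "delta" is assumed to mean the row-count difference between the index and the genes table, and an ordinary table stands in for the FTS5 virtual table so the example runs on any SQLite build.

```python
import sqlite3

ORPHAN_DELTA_THRESHOLD = 0.05  # the 5% figure from the entry above


def cleanup_fts_orphans(conn):
    """Delete orphan index rows; return how many were removed (0 if skipped)."""
    gene_count = conn.execute("SELECT COUNT(*) FROM genes").fetchone()[0]
    fts_count = conn.execute("SELECT COUNT(*) FROM genes_fts").fetchone()[0]
    delta = fts_count - gene_count  # assumed definition of "delta"
    if delta < ORPHAN_DELTA_THRESHOLD * gene_count:
        return 0  # trivial drift: leave the orphans in place
    cur = conn.execute(
        "DELETE FROM genes_fts WHERE NOT EXISTS ("
        "SELECT 1 FROM genes WHERE genes.gene_id = genes_fts.gene_id)"
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE genes_fts (gene_id INTEGER, body TEXT)")  # stand-in
conn.executemany("INSERT INTO genes VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO genes_fts VALUES (?, '')", [(i,) for i in range(102)])

skipped = cleanup_fts_orphans(conn)  # 2 orphans, under 5% of 100 genes
conn.executemany(
    "INSERT INTO genes_fts VALUES (?, '')", [(i,) for i in range(200, 210)]
)
removed = cleanup_fts_orphans(conn)  # 12 orphans, over the threshold
remaining = conn.execute("SELECT COUNT(*) FROM genes_fts").fetchone()[0]
```

The NOT EXISTS form lets SQLite probe the genes primary key once per index row instead of scanning a subquery result for each row.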
- feat(build): salvage already-complete shards on rebuild (PR #157 /
PR #162). Adds _try_salvage_complete_shard(), which opens an existing
shard.db read-only, verifies the genes table has 100% dense
backfill coverage and no live WAL sidecar, and returns the same
result-dict shape that _build_one_shard would normally produce,
letting the parent's _commit_shard_result re-register the shard via
INSERT OR REPLACE. Designed for the kill+restart cycle: if
build_fixture_matrix.py is interrupted (Ctrl+C, OOM, planned restart),
fully-complete shards on disk are re-registered in seconds instead of
rebuilt from scratch. Verified at scale: 21 of 22 already-complete
shards re-registered into a fresh main.genome.db in 2 min 19 sec
(vs ~13 hours to rebuild from scratch) during a mid-build restart of
the EnterpriseRAG-Bench-Onyx-full 850K-gene build.
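A rough sketch of the salvage check described above, under stated assumptions: the function name `try_salvage_shard`, the `dense_embedding` column, and the returned keys are hypothetical, and the real result-dict shape is not documented in this entry. It shows the three checks in order: no live WAL sidecar, read-only open, and full dense coverage.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def try_salvage_shard(db_path):
    """Return a small result dict for a complete shard, else None."""
    path = Path(db_path)
    wal = Path(str(path) + "-wal")
    if not path.exists():
        return None
    if wal.exists() and wal.stat().st_size > 0:
        return None  # live WAL sidecar: the shard may be mid-write
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        # COUNT(column) skips NULLs, so dense == total means 100% coverage.
        total, dense = conn.execute(
            "SELECT COUNT(*), COUNT(dense_embedding) FROM genes"
        ).fetchone()
    except sqlite3.Error:
        return None  # missing table or unreadable file: rebuild instead
    finally:
        conn.close()
    if total == 0 or dense != total:
        return None  # dense backfill is not at 100%
    return {"shard_path": str(path), "gene_count": total}


def _demo_shard(directory, name, embeddings):
    """Create a tiny shard file with one row per embedding (None = missing)."""
    db = os.path.join(directory, name)
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, dense_embedding BLOB)"
    )
    conn.executemany(
        "INSERT INTO genes (dense_embedding) VALUES (?)", [(e,) for e in embeddings]
    )
    conn.commit()
    conn.close()
    return db


with tempfile.TemporaryDirectory() as tmp:
    complete = try_salvage_shard(_demo_shard(tmp, "a.db", [b"\x01", b"\x02"]))
    partial = try_salvage_shard(_demo_shard(tmp, "b.db", [b"\x01", None]))
```

Returning None for every doubtful case keeps the fallback simple: anything that is not provably complete is rebuilt as before.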
- fix(knowledge_store): batch IN-clause queries to stay under SQLite cap
(PR #163). SQLite caps WHERE col IN (?, ?, ...) placeholders at
SQLITE_LIMIT_VARIABLE_NUMBER (999 legacy, 2000 on the Python 3.12 /
SQLite 3.50 builds we ship to, 32766 on newer compile defaults). Four
call sites on the gene_scores fan-out path in query_docs
(_apply_authority_boosts, sema-boost embedding lookup, party-attribution
lookup, access-rate epigenetics lookup) build the IN clause from a
caller-determined candidate set that can exceed the cap in production.
Observed in the 2026-05-28 v2-fixture 100q bench: 3 of 29 queries had a
per-shard query raise OperationalError: too many SQL variables, which
the daemon's per-shard try/except swallowed as "shard X query failed;
skipping" — biasing recall@K by silently dropping shards. Variants where
SPLADE or the prefilter narrows gene_scores don't hit this; only the
no-filter SPLADE-off path produces sets large enough to blow up. Adds
_iter_in_batches(items, batch_size=500) helper and refactors the four
hot sites. Includes TDD'd regression test at
tests/test_knowledge_store_batched_in.py that probes the runtime cap
via conn.getlimit(SQLITE_LIMIT_VARIABLE_NUMBER) and exercises at
cap, cap + 1, and 4*cap + 7 boundaries.
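The batching approach above can be sketched as follows. This is an illustration rather than the shipped helper: `iter_in_batches` mirrors the signature quoted in this entry, while `fetch_scores` and the `gene_scores` table layout are hypothetical.

```python
import sqlite3


def iter_in_batches(items, batch_size=500):
    """Yield successive lists of at most `batch_size` items."""
    items = list(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_scores(conn, gene_ids, batch_size=500):
    """Look up scores for `gene_ids` without exceeding the placeholder cap."""
    scores = {}
    for batch in iter_in_batches(gene_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = (
            "SELECT gene_id, score FROM gene_scores "
            f"WHERE gene_id IN ({placeholders})"
        )
        for gene_id, score in conn.execute(sql, batch):
            scores[gene_id] = score
    return scores


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_scores (gene_id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO gene_scores VALUES (?, ?)", [(i, i / 10) for i in range(70_000)]
)

# 70,000 ids exceeds each cap quoted above, so a single IN clause would be
# rejected on those builds; batches of 500 stay well under all of them.
scores = fetch_scores(conn, range(70_000))
batch_sizes = [len(b) for b in iter_in_batches(range(1201), 500)]
```

A fixed batch size of 500 sits below even the legacy 999 limit, so the helper does not need to query the runtime cap on the hot path.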
- fix(mcp): registry handshake is best-effort, don't kill subprocess on
failure (PR #169). On Windows, claude -p MCP attempts were failing
with "Connection closed" after ~2 s even when helix was alive on
http://127.0.0.1:11437. Root cause: _register_with_registry() was
called synchronously before mcp.run() entered the stdio handshake; an
exception from register_participant() (auto-heartbeat thread init,
etc.) propagated out of main() and killed the MCP subprocess before
the host could complete its handshake. The registry is not load-bearing
for tool calls — tool calls proxy directly to the helix HTTP API.
Registry is only used by helix_announce + dashboards. This patch
wraps _register_with_registry() in a try/except inside main():
happy path unchanged, failure path logs the exception and continues
to mcp.run() rather than exiting. Closes #167.
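The best-effort pattern described above, reduced to a minimal sketch. The `register` and `run` callables are placeholders for _register_with_registry() and mcp.run(), and the logger name is hypothetical; the actual main() is not reproduced here.

```python
import logging

log = logging.getLogger("helix.mcp")  # hypothetical logger name


def main(register, run):
    """Attempt registry registration, then always enter the serve loop."""
    try:
        register()
    except Exception:
        # The registry is optional for tool calls, so log and carry on.
        log.exception("registry registration failed; continuing to serve")
    return run()


def _failing_register():
    raise RuntimeError("heartbeat thread init failed")


# Registration fails, yet the serve step still runs and returns normally.
result = main(_failing_register, lambda: "serving")
```

Catching Exception rather than BaseException keeps Ctrl+C and interpreter shutdown working as before.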
- feat(bench): add --isolated flag to bench_claude_matrix for
leak-free measurement (PR #170). When set, the claude -p sub-agent
is launched with --tools "" (all built-in tools disabled),
--strict-mcp-config, and --mcp-config '{"mcpServers":{}}' (no MCP
servers). Pair with a sterile --cwd (e.g. F:/tmp/bench_sandbox) to
also block CLAUDE.md auto-discovery. Isolates retrieval-driven answer
quality from filesystem-tool access. Records isolated + claude_cwd
in the per-run JSON so post-hoc analysis can distinguish leak-free runs
from contaminated runs. Brings shipped code into agreement with
shipped docs (docs/benchmarks/BENCHMARKS.md §"Layer 3 —
EnterpriseRAG-Bench" and BENCHMARK_RATIONALE.md addendum already
described this isolation mode). Closes #168.
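One way the isolated launch could be assembled, as a sketch only. The three flags and the two record keys come from this entry; `build_claude_argv` and `run_record` are hypothetical names, and the real bench_claude_matrix code may build its command differently. Nothing is executed here; the argv list would be handed to subprocess.run with cwd set to the sandbox directory.

```python
import json


def build_claude_argv(prompt, isolated=False):
    """Build the sub-agent command line; isolation flags only when requested."""
    argv = ["claude", "-p", prompt]
    if isolated:
        argv += [
            "--tools", "",            # empty string: no built-in tools
            "--strict-mcp-config",
            "--mcp-config", json.dumps({"mcpServers": {}}),  # no MCP servers
        ]
    return argv


def run_record(isolated, cwd):
    """Fields stored in the per-run JSON so runs can be separated later."""
    return {"isolated": isolated, "claude_cwd": cwd}


argv = build_claude_argv("What is the retention policy?", isolated=True)
record = run_record(True, "F:/tmp/bench_sandbox")
```

Passing the arguments as a list avoids shell quoting issues with the empty --tools value and the JSON payload.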
- docs(benchmarks): add Layer 3 (EnterpriseRAG-Bench) + EnterpriseRAG
fixtures (PR #166). ~400 lines across four files. BENCHMARKS.md
gets a new "Layer 3 — EnterpriseRAG-Bench" section covering the
2026-05-20→21 bench investigation rebuild (isolated=True mode,
+32.4 pp helix lift, 65% hallucination reduction), cross-corpus results
(60% recall@10 @ 10K → 71% @ 50K → 28% @ 850K), the expression-budget
clamp fix (4%→43% correctness), Wall-1 / Wall-2 scaling-wall framing,
the v2 100q variant-A result table, and cross-host validation of the
tagger fix. GENOME_FIXTURE_MATRIX.md gets a new EnterpriseRAG-Bench
fixtures section (5-row fixture table, shared 9-source-root scope,
excluded-from-ingest list, auto-subsharding behavior, path-portability
gotcha, branch/PR routing). BENCHMARK_RATIONALE.md gets an addendum
on how Layer 3 answered the rationale's NIAH-doesn't-fit problems.
MULTI_VALID_GOLD.md gets an EnterpriseRAG-Bench gold-path matching
section (schema diff, _rel_after_sources normalization, prefix-tolerant
match fix).