
v0.5.0 — Retrieval Stack Upgrade + README v2


@mbachaud released this 09 May 00:02 · 117 commits to master since this release

What's new in v0.5.0

All new retrieval features are dark-shipped — feature-flagged off by default. Enable them individually as you need them.

Retrieval improvements

  • BM25 pre-filter (tier-0): restricts the scoring corpus to FTS5 top-200 before the 9-tier scorer runs. ~85× cheaper SEMA cosine scan, better noise rejection. Enable: bm25_prefilter_enabled = true in helix.toml

  • Sub-query decomposition: broad queries (multi_hop/default) are decomposed into 3 point-fact sub-queries, run in parallel, merged with cross-query hit weighting. Converts 12-gene diluted BROAD results into targeted TIGHT/FOCUSED results. Enable: query_decomposition_enabled = true

  • D8 complete — intent taxonomy + entity graph:

    • IntentClass enum on PromoterTags with heuristic _classify_intent() at ingest
    • intent_router.py for LLM-free template decomposition
    • Entity graph wired as Tier 5b (+0.5 score boost per matched entity node)
    • SR gate-benched at N=50, confirmed live
    • Enable entity graph: entity_graph_retrieval_enabled = true
  • BGE-M3 dense vectors + ANN threshold: bgem3_codec.py provides asymmetric query/passage encoding and Matryoshka 256D truncation; query_genes_ann() replaces the TIGHT/FOCUSED/BROAD step function with a cosine-similarity threshold gate. Enable: dense_embedding_enabled = true (run scripts/backfill_bgem3.py first)
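
To illustrate the last item, here is a minimal sketch of Matryoshka-style truncation followed by a cosine-similarity threshold gate, in plain Python. It is not the code in bgem3_codec.py or query_genes_ann(): the function names, the 0.5 threshold, and the toy 4-dimensional vectors (truncated to 2 dimensions rather than 256) are all assumptions chosen for illustration, and asymmetric query/passage encoding is not shown.

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components, then L2-normalize the result."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return [0.0] * len(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def threshold_gate(query_vec, passages, dims, threshold):
    """Return (passage_id, score) pairs with score >= threshold, best first."""
    q = truncate_and_normalize(query_vec, dims)
    hits = []
    for pid, vec in passages.items():
        score = cosine(q, truncate_and_normalize(vec, dims))
        if score >= threshold:
            hits.append((pid, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors; real embeddings would have far more dimensions.
    passages = {
        "a": [1.0, 0.0, 0.3, 0.1],
        "b": [0.6, 0.8, 0.0, 0.9],
        "c": [0.0, 1.0, 0.5, 0.2],
    }
    query = [1.0, 0.0, 0.7, 0.7]
    for pid, score in threshold_gate(query, passages, dims=2, threshold=0.5):
        print(pid, round(score, 3))
```

A gate of this kind returns however many passages clear the threshold, so the result count follows the query rather than a fixed size.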

Infrastructure

  • helix_context/_asgi.py entry point — server.py is now importable without opening a database connection (fixes pytest collection failures in worktrees and fresh clones)
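
The general pattern behind this change can be sketched as follows: keep module import free of side effects and open the database only when something asks for it. This is an illustration, not the actual contents of server.py or _asgi.py. `get_connection`, `close_connection`, `is_connected`, and the in-memory SQLite database are hypothetical stand-ins.

```python
import sqlite3
from typing import Optional

# Module-level state starts empty: importing this module opens nothing.
_connection: Optional[sqlite3.Connection] = None


def get_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database on first use and reuse the connection afterwards."""
    global _connection
    if _connection is None:
        _connection = sqlite3.connect(path)
    return _connection


def close_connection() -> None:
    """Close the cached connection, if one was opened."""
    global _connection
    if _connection is not None:
        _connection.close()
        _connection = None


def is_connected() -> bool:
    """Report whether a connection has been opened yet."""
    return _connection is not None


if __name__ == "__main__":
    # Only an explicit entry point triggers the connection; a dedicated
    # module such as _asgi.py could play this role (assumption).
    print(is_connected())
    conn = get_connection()
    print(conn.execute("SELECT 1").fetchone()[0])
    close_connection()
```

With this layout, a test collector can import the module and inspect its functions without a database being present.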

Documentation

  • README v2: benchmark-led structure, hardware-specified bench data, full 9-tier Mermaid pipeline diagram, collapsible terminal walkthrough
  • docs/api/ — HTTP endpoint and MCP tool reference
  • docs/archive/ — internal research/sprint/design docs reorganised

Benchmarks (Ryzen 7 5800X · 48 GB DDR4 · RTX 3080 Ti 12 GB · gemma4:e4b)

  • Token savings: 28.7× on WAL checkpoint queries · 20.1× on port lookups · 5.4× median across 15 query types
  • GPQA diamond: +4 pp accuracy (Helix ON 26% vs OFF 22%, N=100)
  • Dim-lock axis-4: 34% recall@1 vs 8% single-axis

Breaking changes: none. All new flags default to off, so existing behavior is unchanged.


🤖 Generated with Claude Code