v0.2.0b2 — Retrieval authority boosts
Fix pass release
Patch release focused on retrieval quality. The genome already contained the right content, but retrieval could not distinguish a gene that is "about X" from one that "mentions X in passing". As a result, authoritative sources like BENCHMARK_NOTES.md were ranking alongside tangential files like oom_prevent.py.
Changes
Authority boosts (new)
Three post-rank signals added in _apply_authority_boosts():
- Source authority (+2.0): query term in source_id path
  - BENCHMARK_NOTES.md outranks bench_needle.py for query "benchmark"
  - context_manager.py outranks unrelated Python files for "context manager"
- Domain primacy (+1.5): query term in top-3 promoter domains
  - Primary domains = what the gene is ABOUT
  - Gene whose top-3 domains are [biged, fleet, skills] answering "biged fleet" → boost
  - Gene whose top domain is python that mentions biged in content → no boost
- Creation recency (+0.5): gene created in last 48 hours
  - Bootstraps newly-ingested concepts before they build co-activation history
  - Helps today's work surface in tomorrow's queries
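The three signals can be pictured roughly as follows. This is a minimal sketch, not the code inside _apply_authority_boosts(): the GeneMeta record, the authority_boost helper, case-insensitive substring matching on source_id, and applying each signal at most once per gene are all assumptions made for illustration. Only the boost values and the 48-hour window come from the notes above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Boost values and window as stated in these release notes.
SOURCE_AUTHORITY_BOOST = 2.0
DOMAIN_PRIMACY_BOOST = 1.5
CREATION_RECENCY_BOOST = 0.5
RECENCY_WINDOW = timedelta(hours=48)


@dataclass
class GeneMeta:
    """Hypothetical per-gene metadata; the real schema is not shown here."""
    source_id: str
    top_domains: list[str]  # promoter domains, strongest first
    created_at: datetime


def authority_boost(meta: GeneMeta, query_terms: list[str], now: datetime) -> float:
    """Sum the signals that fire for one gene (assumed: each fires at most once)."""
    terms = [t.lower() for t in query_terms]
    source = meta.source_id.lower()
    primary = {d.lower() for d in meta.top_domains[:3]}

    boost = 0.0
    # Source authority: a query term appears in the source_id path.
    if any(t in source for t in terms):
        boost += SOURCE_AUTHORITY_BOOST
    # Domain primacy: a query term is one of the top-3 promoter domains.
    if any(t in primary for t in terms):
        boost += DOMAIN_PRIMACY_BOOST
    # Creation recency: the gene was created within the last 48 hours.
    if now - meta.created_at <= RECENCY_WINDOW:
        boost += CREATION_RECENCY_BOOST
    return boost
```

Under these assumptions, a gene sourced from BENCHMARK_NOTES.md gets +2.0 for the query "benchmark", while one sourced from bench_needle.py gets nothing from the source signal because "benchmark" is not a substring of its path.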
IDF cap lowered 5.0 → 3.0
The old 5.0 cap over-boosted tangential rare-term matches. A gene containing "monetization" at low document frequency could get +5.0 just for containing the term, even if the gene is really about pricing rather than the queried topic. The new 3.0 cap reduces this noise.
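To make the effect of the cap concrete, here is a small sketch. The IDF formula itself is not given in these notes, so the natural-log form below, the capped_idf name, and its parameters are assumptions; only the cap values 5.0 and 3.0 come from the release.

```python
import math

IDF_CAP = 3.0  # lowered from 5.0 in this release


def capped_idf(total_genes: int, doc_freq: int, cap: float = IDF_CAP) -> float:
    """Clamp a rare-term IDF weight to at most `cap`.

    Assumed formula: ln(total_genes / doc_freq). The real scoring
    function may differ; only the clamping step is the point here.
    """
    raw = math.log(total_genes / max(doc_freq, 1))
    return min(max(raw, 0.0), cap)
```

With this assumed formula, a term that appears in 2 of 10,000 genes has a raw weight above 8, so it hit the full 5.0 under the old cap and now contributes at most 3.0. Common terms whose raw weight is already below the cap are unaffected.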
Implementation notes
- All boosts are additive: they only raise the ceiling on already-scored genes
- Never adds new candidates (no false positives from the fix itself)
- Single batched SQL fetch for all three signals — negligible latency cost
- Called after IDF anchoring, before score-gated expression
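The batching and additive-only properties above might look something like this. It is a sketch using the standard sqlite3 module; the genes table, its column names, and the fetch_gene_meta and add_boosts helpers are hypothetical and not taken from the project.

```python
import sqlite3


def fetch_gene_meta(conn: sqlite3.Connection, gene_ids: list[str]) -> dict[str, tuple]:
    """Fetch metadata for all candidates in one query instead of one per signal.

    Assumes a hypothetical table genes(gene_id, source_id, domains, created_at).
    """
    if not gene_ids:
        return {}
    placeholders = ",".join("?" for _ in gene_ids)
    rows = conn.execute(
        "SELECT gene_id, source_id, domains, created_at "
        f"FROM genes WHERE gene_id IN ({placeholders})",
        list(gene_ids),
    ).fetchall()
    return {row[0]: row[1:] for row in rows}


def add_boosts(scores: dict[str, float], boosts: dict[str, float]) -> dict[str, float]:
    """Apply boosts additively to genes that already have a score.

    Iterating over `scores` (not `boosts`) means no new candidate can
    appear, and clamping at zero means no score can go down.
    """
    return {gid: s + max(boosts.get(gid, 0.0), 0.0) for gid, s in scores.items()}
```

Because the result is keyed by the already-scored candidates, a boost computed for a gene outside that set is simply ignored, which matches the "never adds new candidates" note.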
All 179 tests passing.
🤖 Generated with Claude Code