v0.2.0b2 — Retrieval authority boosts

@mbachaud mbachaud released this 10 Apr 07:05
· 473 commits to master since this release

Fix pass release

Patch release focused on retrieval quality. The genome contained the right content, but retrieval could not distinguish "about X" from "mentions X in passing": authoritative sources like BENCHMARK_NOTES.md were ranking alongside tangential files like oom_prevent.py.

Changes

Authority boosts (new)

Three post-rank signals added in _apply_authority_boosts():

  1. Source authority (+2.0) — query term in source_id path

    • BENCHMARK_NOTES.md outranks bench_needle.py for query "benchmark"
    • context_manager.py outranks unrelated Python files for "context manager"
  2. Domain primacy (+1.5) — query term in top-3 promoter domains

    • Primary domains = what the gene is ABOUT
    • Gene whose top-3 domains are [biged, fleet, skills] answering "biged fleet" → boost
    • Gene whose top domain is python that mentions biged in content → no boost
  3. Creation recency (+0.5) — gene created in last 48 hours

    • Bootstraps newly-ingested concepts before they build co-activation history
    • Helps today's work surface in tomorrow's queries
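
A minimal sketch of how the three signals could combine, for illustration only. The boost values and the 48-hour window come from the list above; the `Gene` record, the `authority_boost` helper, substring matching on the path, and applying each boost at most once per gene are all assumptions, not the actual body of _apply_authority_boosts().

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Boost values are taken from the release notes; everything else is assumed.
SOURCE_AUTHORITY_BOOST = 2.0
DOMAIN_PRIMACY_BOOST = 1.5
RECENCY_BOOST = 0.5
RECENCY_WINDOW = timedelta(hours=48)


@dataclass
class Gene:
    """Hypothetical record shape; the real schema is not shown in the notes."""
    source_id: str
    domains: list[str]  # promoter domains, assumed ordered most-primary first
    created_at: datetime


def authority_boost(gene: Gene, query: str, now: datetime) -> float:
    """Return the additive boost for one already-scored gene."""
    terms = query.lower().split()
    boost = 0.0

    # 1. Source authority: any query term appears in the source_id path.
    path = gene.source_id.lower()
    if any(term in path for term in terms):
        boost += SOURCE_AUTHORITY_BOOST

    # 2. Domain primacy: any query term is one of the top-3 promoter domains.
    top3 = {d.lower() for d in gene.domains[:3]}
    if any(term in top3 for term in terms):
        boost += DOMAIN_PRIMACY_BOOST

    # 3. Creation recency: gene was created within the last 48 hours.
    if now - gene.created_at <= RECENCY_WINDOW:
        boost += RECENCY_BOOST

    return boost
```

Under these assumptions, a month-old gene sourced from BENCHMARK_NOTES.md gets +2.0 for the query "benchmark" while one sourced from bench_needle.py gets nothing, which matches the ranking described in the first example.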

IDF cap lowered 5.0 → 3.0

The old 5.0 cap over-boosted tangential rare-term matches. A gene containing "monetization" at low document frequency could get +5.0 just for having the term, even if the gene is about pricing rather than the topic actually being queried. The new 3.0 cap reduces this noise.
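
To illustrate the effect of the cap, here is a small sketch. The notes do not show the project's IDF formula, so a plain log(N / df) is assumed, and `capped_idf` is a hypothetical helper name.

```python
import math

IDF_CAP = 3.0  # lowered from 5.0 in this release


def capped_idf(total_genes: int, doc_freq: int, cap: float = IDF_CAP) -> float:
    """Plain log(N / df) IDF clipped to `cap` (formula assumed, not confirmed)."""
    if doc_freq <= 0:
        return 0.0  # term never seen: contribute nothing
    return min(math.log(total_genes / doc_freq), cap)
```

With 10,000 genes and a term that appears in only 2 of them, the raw value is about 8.5, so the term contributed the full 5.0 under the old cap and contributes 3.0 under the new one. Common terms whose raw value is already below 3.0 are unaffected.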

Implementation notes

  • All boosts are additive; they only raise the ceiling on already-scored genes
  • Never adds new candidates (no false positives from the fix itself)
  • Single batched SQL fetch for all three signals — negligible latency cost
  • Called after IDF anchoring, before score-gated expression
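
The first three notes can be sketched as follows, assuming a SQLite-style store. The `genes` table, its column names, and both helper functions are hypothetical; only the ideas of a single batched fetch and additive boosts on existing candidates come from the notes.

```python
import sqlite3


def fetch_boost_inputs(conn: sqlite3.Connection, gene_ids: list[str]) -> dict:
    """Fetch the inputs for all three signals in one query (assumed schema)."""
    if not gene_ids:
        return {}
    placeholders = ",".join("?" for _ in gene_ids)
    rows = conn.execute(
        "SELECT gene_id, source_id, domains, created_at "
        f"FROM genes WHERE gene_id IN ({placeholders})",
        list(gene_ids),
    ).fetchall()
    # Map gene_id -> (source_id, domains, created_at)
    return {row[0]: tuple(row[1:]) for row in rows}


def apply_boosts(scores: dict[str, float], boosts: dict[str, float]) -> dict[str, float]:
    """Add boosts to already-scored genes; never introduce new candidates."""
    return {gid: score + boosts.get(gid, 0.0) for gid, score in scores.items()}
```

Iterating over `scores` rather than `boosts` is what keeps the candidate set fixed: a gene that was not scored earlier in the pipeline cannot enter the results through a boost.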

All 179 tests passing.

🤖 Generated with Claude Code