Query Expansion
web search gets one extra trick news doesn't: for queries with enough words to benefit from it, Mnemolis asks the LLM for a genuinely differently-worded version of the same question, searches SearXNG with both phrasings, and merges the raw result pools before scoring decides what's actually relevant. The idea is straightforward — SearXNG's own ranking depends heavily on the exact words you used, and a real, equally-valid phrasing can surface results the first wording missed entirely.
Expansion runs only for queries with 3 or more words (_MIN_WORDS). Shorter queries don't have enough room for a genuinely different phrasing to exist: there are only so many ways to rephrase a two-word query before you're just repeating it with different filler.
The LLM's response isn't trusted blindly. Three checks run before an alternate phrasing is used at all:
- Not empty — a blank or failed LLM response is discarded, not treated as "no alternate available" silently
- Not absurdly longer than the original — if the rephrasing comes back more than twice the original's word count, it's discarded as unreliable rather than searched
- Not identical to the original — if the "rephrasing" is just the same query back, there's no point searching it again
Any failure here means query expansion simply doesn't happen for that query: the primary search still runs and returns normally. Expansion is a pure bonus, never a requirement.
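The word-count gate and the three checks can be sketched roughly as follows. This is a minimal illustration, not Mnemolis's actual code: only `_MIN_WORDS` comes from the text, while `should_expand`, `validate_alternate`, `_MAX_LENGTH_RATIO`, and the case- and whitespace-insensitive identity comparison are assumptions made for the example.

```python
from typing import Optional

_MIN_WORDS = 3          # named in the text above
_MAX_LENGTH_RATIO = 2   # hypothetical name for the "twice the word count" limit


def should_expand(query: str) -> bool:
    """Expansion is only attempted for queries with enough words."""
    return len(query.split()) >= _MIN_WORDS


def validate_alternate(original: str, alternate: Optional[str]) -> Optional[str]:
    """Return a usable alternate phrasing, or None if any check fails."""
    # Check 1: not empty (covers a failed call that produced None or blanks).
    if alternate is None:
        return None
    candidate = alternate.strip()
    if not candidate:
        return None

    # Check 2: not more than twice the original's word count.
    if len(candidate.split()) > _MAX_LENGTH_RATIO * len(original.split()):
        return None

    # Check 3: not identical to the original. Ignoring case and extra
    # whitespace here is an assumption, not documented behavior.
    if " ".join(candidate.lower().split()) == " ".join(original.lower().split()):
        return None

    return candidate
```

A caller would treat a `None` return as "skip expansion" and carry on with the primary results alone.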
A successfully-generated alternate phrasing is cached in the routing cache (altquery:{query}), the same way other LLM-backed routing decisions are — see Caching — so a repeated query doesn't pay the LLM cost twice within the cache's TTL.
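As a rough sketch of that caching behavior, assuming a simple in-memory store: the `altquery:{query}` key format comes from the text, but `TTLCache`, `cached_alternate`, and the `generate` callback are hypothetical stand-ins for the real routing cache and LLM call.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_alternate(query: str, cache: TTLCache,
                     generate: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the cached alternate phrasing, calling the LLM only on a miss."""
    key = f"altquery:{query}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    alternate = generate(query)
    if alternate:  # only successfully generated phrasings are cached
        cache.set(key, alternate)
    return alternate
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.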
            Original query (3+ words)
                        │
                        ▼
      Fetch SearXNG with the ORIGINAL query
                        │
                        ▼
        Ask LLM for an alternate phrasing
                        │
            ┌───────────┴───────────┐
            ▼ none/invalid          ▼ valid
      Use only the        Fetch SearXNG with the
      original results    ALTERNATE phrasing too
            │                       │
            │                       ▼
            │         Merge both raw result pools,
            │         dedupe by normalized URL
            │                       │
            └───────────┬───────────┘
                        ▼
      Score EVERY result (original-search
      results AND alternate-search results
      alike) against the ORIGINAL query only
                        │
                        ▼
             Filter & rank as usual
The merge step deduplicates using the same normalized URL comparison used elsewhere, so a result that happens to surface in both searches doesn't get counted twice.
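A minimal sketch of such a merge, under stated assumptions: the text does not spell out the normalization rules, so the ones here (lowercased host, fragment dropped, trailing slash stripped) are illustrative guesses, and `normalize_url`, `merge_results`, and the `{"url": ...}` result shape are hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Keep the query string, drop the fragment, lowercase scheme and host.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_results(primary: Iterable[Dict[str, str]],
                  alternate: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Concatenate both pools, keeping the first occurrence of each URL."""
    merged: List[Dict[str, str]] = []
    seen = set()
    for result in list(primary) + list(alternate):
        key = normalize_url(result["url"])
        if key in seen:
            continue  # already surfaced by the other search
        seen.add(key)
        merged.append(result)
    return merged
```

Listing the primary pool first means that when both searches return the same page, the primary search's copy is the one that is kept.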
This is the detail that makes the whole feature trustworthy rather than just noisy. Every result from both searches gets scored against the query you actually typed — never against the LLM's rephrasing. A result only survives into the final response because it's genuinely relevant to what was actually asked, not because it happened to match the wording of an LLM-generated alternate phrasing. The alternate phrasing's only job is to surface a wider net of raw candidates; it has zero influence over which of those candidates is judged relevant.
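To make that separation concrete, here is a small sketch in which the ranking function never even receives the alternate phrasing. The scorer is a deliberately crude word-overlap stand-in, not Mnemolis's real relevance scoring, and `toy_score`, `rank_candidates`, the `threshold` value, and the `found_by` field are all invented for the illustration.

```python
from typing import Dict, List


def toy_score(query: str, text: str) -> float:
    """Fraction of query words present in the text (stand-in scorer)."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def rank_candidates(original_query: str,
                    candidates: List[Dict[str, str]],
                    threshold: float = 0.5) -> List[Dict[str, str]]:
    """Score every candidate against the ORIGINAL query only.

    The alternate phrasing is not a parameter, so it cannot influence
    which candidates are judged relevant.
    """
    scored = [(toy_score(original_query, c["title"]), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]


candidates = [
    {"title": "faucet repair for a dripping tap", "found_by": "alternate"},
    {"title": "dripping faucet causes", "found_by": "original"},
    {"title": "stop the leak fast plumber ads", "found_by": "alternate"},
]
ranked = rank_candidates("repair dripping faucet", candidates)
```

In this example the first alternate-search result is kept because it matches the typed query, while the third is dropped even though the alternate search surfaced it.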
If the alternate fetch fails (network issue, SearXNG hiccup), the failure is non-fatal: the primary search's results still stand on their own and the response proceeds normally. Query expansion is additive, never a point of fragility for the base case.
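One way to get that behavior is to wrap only the second fetch in error handling, as in the sketch below. `gather_candidates` and the `fetch` callback are hypothetical names, catching every `Exception` is an assumption about how broad the handling is, and deduplication is left out to keep the example short.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("query_expansion_sketch")

Result = Dict[str, str]


def gather_candidates(query: str,
                      alternate: Optional[str],
                      fetch: Callable[[str], List[Result]]) -> List[Result]:
    """Fetch primary results, then add alternate results if possible."""
    # The primary fetch is not wrapped: its errors are handled as usual.
    results = list(fetch(query))
    if alternate is None:
        return results
    try:
        extra = list(fetch(alternate))
    except Exception as exc:  # broad on purpose: expansion must never break the base case
        logger.warning("alternate fetch failed, using primary results only: %s", exc)
        return results
    return results + extra
```

Because the fallback returns the untouched primary list, a failed alternate fetch looks to the caller exactly like a query that was never expanded.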
news searches your own RSS feeds, which are a small, fixed, already-curated set of sources — there's no equivalent to "SearXNG's ranking might miss something with different wording," because there's no external search ranking involved at all. The relevant scoring problem for news is "which of my existing articles is actually about this," which Confidence-Aware Fusion already handles directly; there's nothing a second, differently-worded search would surface that a single pass over your own feed wouldn't already see.