How Recall Works
A `recall(query, k)` call runs a multi-stage pipeline in `@bastra-recall/core` (`search.ts`). Each stage emits a start and a stop event, so per-stage latency can be streamed and logged.
query.parse → [cache.hit] → bm25.search → vector.search → rrf.fuse → hops.expand → staleness.rank → done
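The start/stop events can be pictured as a thin wrapper around each stage. This is a minimal sketch, not the code in `search.ts`: the `StageEvent` shape, the `runStage` helper and the listener callback are hypothetical names chosen for illustration.

```typescript
// Hypothetical event shape: one "start" and one "stop" event per stage.
type StageEvent =
  | { stage: string; phase: "start"; at: number }
  | { stage: string; phase: "stop"; at: number; durationMs: number };

type Listener = (event: StageEvent) => void;

// Hypothetical helper: runs one pipeline stage and emits start/stop events
// around it, so a listener can stream or log per-stage latency.
async function runStage<T>(
  stage: string,
  listener: Listener,
  fn: () => Promise<T> | T,
): Promise<T> {
  const start = Date.now();
  listener({ stage, phase: "start", at: start });
  try {
    return await fn();
  } finally {
    // "stop" fires even if the stage throws, so latency is always reported.
    const stop = Date.now();
    listener({ stage, phase: "stop", at: stop, durationMs: stop - start });
  }
}
```

A listener that pushes events into an array (or writes them to a log stream) is enough to reconstruct a per-stage latency breakdown for one `recall`.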
- `query.parse`: tokenize and validate the query.
- `cache.hit` (optional): a 30-second LRU query cache (max 100 entries). On a hit, the rest of the pipeline is skipped and `done` fires immediately. The cache is invalidated on any vault change.
- `bm25.search`: full-text search over a MiniSearch index. Field weights: `recall_when` ×5, `title` ×4, `tags` ×3, then `summary`, `topic_path`, `body`. `recall_when` is the highest-weighted field, which is why good `recall_when` phrases at save time matter most.
- `vector.search` (hybrid only): embedding similarity, if an embedding provider is configured (Ollama `embeddinggemma` or OpenAI). Skipped when no embeddings exist, in which case recall is BM25-only.
- `rrf.fuse` (hybrid only): Reciprocal Rank Fusion (RRF) merges the BM25 and vector rankings into one score.
- `hops.expand` (if `expand_hops=1`): after the direct hits, pull in their 1-hop neighbors via `related_via` links with a reduced score, tagged `hop: "1-hop"`.
- `staleness.rank`: re-rank by lifecycle. Each memory's score is multiplied by a freshness factor: fresh ×1.0, aging ×0.85, stale ×0.5, expired ×0.2. Expiration windows are type-specific (e.g. lessons age more slowly than project facts).
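The fusion and staleness steps can be sketched as two small functions. This sketch assumes the textbook RRF formula, where each ranking contributes 1 / (k + rank) per id, with the commonly used constant k = 60; the constant Bastra actually uses is not documented here. Raw RRF scores are small fractions, so the score ranges documented for `recall` (≥ ~100 for a strong match) imply additional scaling that this sketch does not reproduce. The freshness factors are the ones listed above; `rrfFuse`, `applyStaleness` and `Lifecycle` are hypothetical names.

```typescript
type Lifecycle = "fresh" | "aging" | "stale" | "expired";

// Freshness multipliers from the staleness.rank description.
const FRESHNESS: Record<Lifecycle, number> = {
  fresh: 1.0,
  aging: 0.85,
  stale: 0.5,
  expired: 0.2,
};

// Textbook Reciprocal Rank Fusion: every ranking adds 1 / (k + rank) for
// each id it contains (rank is 1-based). Assumption: k = 60.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const fused = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      fused.set(id, (fused.get(id) ?? 0) + contribution);
    });
  }
  return fused;
}

// Multiply each fused score by the memory's freshness factor and sort
// descending. Memories with no known lifecycle are treated as fresh
// (an assumption of this sketch).
function applyStaleness(
  scores: Map<string, number>,
  lifecycle: Map<string, Lifecycle>,
): Array<{ id: string; score: number }> {
  return Array.from(scores.entries())
    .map(([id, score]) => ({
      id,
      score: score * FRESHNESS[lifecycle.get(id) ?? "fresh"],
    }))
    .sort((a, b) => b.score - a.score);
}
```

For example, fusing a BM25 ranking `["a", "b", "c"]` with a vector ranking `["b", "a"]` gives `a` and `b` identical fused scores; marking `a` as stale halves its score, so `b` ranks first. Because RRF uses only ranks, BM25 scores and cosine similarities never have to be put on a common scale, which is the usual reason to prefer it for hybrid search.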
The result is the top-k hits, each with a score.
| Score | Meaning |
|---|---|
| ≥ ~100 | Strong match. Call `load_memory` and apply it before acting. |
| 30–100 | Read the summary; load only if directly relevant. |
| < 30 | Noise. `recall` drops these by default (the score floor). |
The score floor (`BASTRA_RECALL_FLOOR`, default 30) is applied in `recallHandler`: hits below it never leave the daemon, so low-scoring tail noise doesn't cost context. Override it per call with `min_score`.
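The floor logic can be sketched as follows, assuming that a per-call `min_score` takes precedence over `BASTRA_RECALL_FLOOR`, which in turn overrides the default of 30, and that hits exactly at the floor are kept (only scores below 30 count as noise). `Hit`, `resolveFloor` and `applyFloor` are hypothetical names, not the actual `recallHandler`. The environment is passed in as a plain object so the logic is easy to test; in Node you would pass `process.env`.

```typescript
interface Hit {
  id: string;
  score: number;
}

const DEFAULT_FLOOR = 30;

// Precedence (assumed): per-call min_score, then the BASTRA_RECALL_FLOOR
// environment variable, then the default of 30.
function resolveFloor(
  env: Record<string, string | undefined>,
  minScore?: number,
): number {
  if (minScore !== undefined) return minScore;
  const raw = env.BASTRA_RECALL_FLOOR;
  // Treat a missing, empty or non-numeric value as "use the default".
  if (raw === undefined || raw.trim() === "") return DEFAULT_FLOOR;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : DEFAULT_FLOOR;
}

// Drop hits below the floor before serialization, so tail noise never
// reaches the client.
function applyFloor(hits: Hit[], floor: number): Hit[] {
  return hits.filter((hit) => hit.score >= floor);
}
```

With the variable unset, `resolveFloor(process.env)` yields 30, and passing `min_score: 10` on a call would let lower-scoring hits through for that call only.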
`recall` returns lean payloads by default to keep Claude's context small. The assistant judges relevance from the lean candidates, then calls `load_memory` only for what it needs: a two-step flow.
| Field | lean (default) | `verbosity: "full"` |
|---|---|---|
| per hit | `id`, `title`, `type`, `scope`, `summary`, `score` | + `matched_terms`, `mode`, `hop`, `topic_path` |
| `summary` | truncated to ~160 chars (word boundary + …) | full (≤400 chars) |
| top-level `stages` block | omitted | included |
Use `verbosity: "full"` for debugging or for UIs that render the extra fields (e.g. the Mac-App). On the Claude Code MCP path, the forwarder additionally drops its synthesized `stages` block and sets `expand_hops=0`, so exactly k hits are returned.
Measured saving on a real 141-memory vault: ~32% per `recall` compared with `verbosity: "full"`, mostly from dropping `matched_terms` and `stages`. See `scripts/measure-recall-payload.ts`.
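The lean projection in the table above can be sketched as a pure function over a full hit. The field names come from the table; the `FullHit`/`LeanHit` interfaces, the `toLean` and `truncateSummary` helpers, and the exact truncation rule (cut at the last space within 160 characters, then append `…`) are assumptions of this sketch, not the daemon's implementation.

```typescript
interface LeanHit {
  id: string;
  title: string;
  type: string;
  scope: string;
  summary: string;
  score: number;
}

// A full hit adds the fields listed for verbosity: "full".
interface FullHit extends LeanHit {
  matched_terms: string[];
  mode: string;
  hop?: string;
  topic_path: string;
}

const LEAN_SUMMARY_CHARS = 160;

// Truncate at a word boundary and append an ellipsis.
function truncateSummary(summary: string, limit = LEAN_SUMMARY_CHARS): string {
  if (summary.length <= limit) return summary;
  const slice = summary.slice(0, limit);
  // If the next character is a space, the slice already ends on a word boundary.
  const boundary = summary[limit] === " " ? limit : slice.lastIndexOf(" ");
  // No usable space: fall back to a hard cut at the limit.
  const cut = boundary > 0 ? slice.slice(0, boundary) : slice;
  return cut.replace(/\s+$/, "") + "…";
}

// Keep only the lean fields; matched_terms, mode, hop and topic_path are dropped.
function toLean(hit: FullHit): LeanHit {
  return {
    id: hit.id,
    title: hit.title,
    type: hit.type,
    scope: hit.scope,
    summary: truncateSummary(hit.summary),
    score: hit.score,
  };
}
```

Dropping whole fields such as `matched_terms` is where most of the saving comes from; the summary cut mainly bounds the worst case for long summaries.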
`load_memory(id)` returns a memory's full content. It is also lean by default: essential frontmatter plus the body with the auto-related section stripped. `verbosity: "full"` returns the complete frontmatter (`related_via` cosines, `source`, `confidence`, …) and the raw body.
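Stripping the auto-related section from the body can be sketched as removing one markdown section by heading. This assumes the section is an ordinary markdown heading; its actual heading text is not documented here, so it is passed in as a parameter, and `stripSection` is a hypothetical name. The sketch does not account for `#` lines inside fenced code blocks.

```typescript
// Remove a markdown section, from its heading line up to (not including) the
// next heading of the same or higher level, or to the end of the body.
// Assumption: the auto-related section is a normal markdown heading whose
// text is supplied by the caller.
function stripSection(body: string, heading: string): string {
  const out: string[] = [];
  let skipLevel = 0; // 0 = not skipping; otherwise the level of the skipped heading
  for (const line of body.split("\n")) {
    const match = /^(#{1,6})\s+(.*?)\s*$/.exec(line);
    if (match) {
      const level = match[1].length;
      // A heading of the same or higher level ends the skipped section.
      if (skipLevel > 0 && level <= skipLevel) skipLevel = 0;
      if (skipLevel === 0 && match[2] === heading) {
        skipLevel = level;
        continue;
      }
    }
    if (skipLevel === 0) out.push(line);
  }
  // Collapse trailing blank lines left behind by the removed section.
  return out.join("\n").replace(/\n+$/, "\n");
}
```

Calling it with a hypothetical heading such as `"Related"` removes that section and its link list while keeping every other section of the body intact.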