docs(memory): architecture doc for SessionRetriever

jddunn · jddunn · commit de57958f1a55 · 2026-04-20T18:43:11.000-07:00
diff --git a/docs/architecture/session-retriever.md b/docs/architecture/session-retriever.md
@@ -0,0 +1,57 @@
+# Session-Level Hierarchical Retriever
+
+## What this is
+
+`SessionRetriever` implements two-stage hierarchical retrieval at session granularity. It pairs with `SessionSummarizer` (which generates per-session summaries at ingest time) and `SessionSummaryStore` (which indexes those summaries in a dedicated vector collection). At retrieval time, it selects top-K sessions by summary similarity, then takes top-M chunks per selected session from the standard `MemoryStore`.
+
+The effect is a coverage mechanism: by construction, retrieved chunks span multiple distinct sessions. Single-stage retrieval tends to cluster on the single most-relevant session, missing multi-session evidence. This retriever forces diversity without re-ranking hacks.
+
+## Mental model
+
+Parallel to `HydeRetriever` (hypothesis-driven) and `ProspectiveMemoryManager` (time/event-triggered), `SessionRetriever` is a query-time retrieval strategy under `memory/retrieval/`. All three are opt-in; callers wire them up when their use case benefits.
+
+## Two-stage flow
+
+1. **Stage 1**: `summaryStore.querySessions(query, topK=K)` selects the top-K sessions by cosine similarity between the query embedding and the indexed session summaries.
+2. **Stage 2**: a single `memoryStore.query(query, topK=K*M*3)` over-fetches candidates so post-filtering has enough per-session representatives.
+3. **Post-filter**: keep only traces whose `bench-session:<id>` tag (configurable via `sessionTagPrefix`) matches a Stage-1 session.
+4. **Group by session**, take top-M chunks per session (already sorted by cognitive score).
+5. **Optional rerank** over the merged pool via an injected `RerankerService` (0.7 cognitive + 0.3 neural blend, matching `CognitiveMemoryManager.retrieve`).
+6. **Truncate** to `recallTopK` (default 10).
+
+## Fallbacks
+
+- **Stage 1 empty**: no sessions indexed for the scope. Fall through to plain `memoryStore.query` and return its top-`recallTopK`. Diagnostics tag: `escalations: ['session-retriever:stage1-empty']`.
+- **Stage 2 post-filter empty**: Stage-2 pool had no chunks tagged for Stage-1 sessions. Return raw Stage-2 top-`recallTopK` without session filtering. Diagnostics tag: `escalations: ['session-retriever:stage2-empty']`.
+
+## When to use
+
+- Long-term conversational memory where answers span multiple sessions (LongMemEval multi-session, LOCOMO multi-hop).
+- Deployments where per-session topical coherence is high and session boundaries are semantically meaningful.
+- Configurations with an LLM budget for ingest-time summary generation (`SessionSummarizer` call per unique session).
+
+## When NOT to use
+
+- Single-session question answering where `CognitiveMemoryManager.retrieve` already surfaces the right chunks.
+- Deployments without ingest-time summarization (no `SessionSummarizer`). SessionRetriever would fall through to plain retrieval every call.
+- Very short sessions (< 5 turns) where the summary and chunks are essentially the same content.
+
+## References
+
+- **xMemory** ([arxiv 2602.02007v3](https://arxiv.org/abs/2602.02007v3), 2026) — four-level hierarchy (raw → episode → semantic → theme) with two-stage retrieval. Ablation on LoCoMo shows hierarchy alone beats Naive RAG BLEU 27.9→31.8, F1 36.4→40.8 before any retrieval optimization.
+- **TACITREE** ([EMNLP 2025](https://aclanthology.org/2025.emnlp-main.580.pdf), UCSD) — hierarchical tree for multi-session personalized conversation. Level-based retrieval progressively refines from abstract summaries to detail.
+- **Anthropic contextual retrieval** (Sep 2024) — the pattern `SessionSummarizer` implements at session granularity. `SessionRetriever` is the retrieval-time counterpart.
+
+## Performance characteristics
+
+- **Stage 1 cost**: one embedding (reusable via a shared `CachedEmbedder`) plus one vector search per query. Bounded by `topK=K`.
+- **Stage 2 cost**: one `MemoryStore.query` per query with `topK = K × M × 3` (over-fetch multiplier). Typical: 45 at the defaults.
+- **Optional rerank cost**: one Cohere `rerank-v3.5` call over the merged K×M pool (~15 documents at defaults). Approximately $0.0001 per query.
+- **Fallback cost**: Stage-1 empty → plain `MemoryStore.query` (no extra cost). Stage-2 empty → raw Stage-2 pool (no extra cost).
+
+## Related modules
+
+- [`src/memory/retrieval/session/SessionSummaryStore.ts`](../../src/memory/retrieval/session/SessionSummaryStore.ts)
+- [`src/memory/retrieval/session/SessionRetriever.ts`](../../src/memory/retrieval/session/SessionRetriever.ts)
+- [`src/memory/ingest/SessionSummarizer.ts`](../../src/memory/ingest/SessionSummarizer.ts) — summary generation (companion)
+- [`src/memory/retrieval/store/MemoryStore.ts`](../../src/memory/retrieval/store/MemoryStore.ts) — underlying trace store used by Stage 2