|
| 1 | +# Session-Level Hierarchical Retriever |
| 2 | + |
| 3 | +## What this is |
| 4 | + |
| 5 | +`SessionRetriever` implements two-stage hierarchical retrieval at session granularity. It pairs with `SessionSummarizer` (which generates per-session summaries at ingest time) and `SessionSummaryStore` (which indexes those summaries in a dedicated vector collection). At retrieval time, it selects top-K sessions by summary similarity, then takes top-M chunks per selected session from the standard `MemoryStore`. |
| 6 | + |
| 7 | +The effect is a coverage mechanism: by construction, retrieved chunks span multiple distinct sessions. Single-stage retrieval tends to cluster on the single most-relevant session, missing multi-session evidence. This retriever forces diversity without re-ranking hacks. |
| 8 | + |
| 9 | +## Mental model |
| 10 | + |
| 11 | +Parallel to `HydeRetriever` (hypothesis-driven) and `ProspectiveMemoryManager` (time/event-triggered), `SessionRetriever` is a query-time retrieval strategy under `memory/retrieval/`. All three are opt-in; callers wire them up when their use case benefits. |
| 12 | + |
| 13 | +## Two-stage flow |
| 14 | + |
| 15 | +1. **Stage 1**: `summaryStore.querySessions(query, topK=K)` selects the top-K sessions by cosine similarity between the query embedding and the indexed session summaries. |
| 16 | +2. **Stage 2**: a single `memoryStore.query(query, topK=K*M*3)` over-fetches candidates so post-filtering has enough per-session representatives. |
| 17 | +3. **Post-filter**: keep only traces whose `bench-session:<id>` tag (configurable via `sessionTagPrefix`) matches a Stage-1 session. |
| 18 | +4. **Group by session**, take top-M chunks per session (already sorted by cognitive score). |
| 19 | +5. **Optional rerank** over the merged pool via an injected `RerankerService` (0.7 cognitive + 0.3 neural blend, matching `CognitiveMemoryManager.retrieve`). |
| 20 | +6. **Truncate** to `recallTopK` (default 10). |
| 21 | + |
| 22 | +## Fallbacks |
| 23 | + |
| 24 | +- **Stage 1 empty**: no sessions indexed for the scope. Fall through to plain `memoryStore.query` and return its top-`recallTopK`. Diagnostics tag: `escalations: ['session-retriever:stage1-empty']`. |
| 25 | +- **Stage 2 post-filter empty**: Stage-2 pool had no chunks tagged for Stage-1 sessions. Return raw Stage-2 top-`recallTopK` without session filtering. Diagnostics tag: `escalations: ['session-retriever:stage2-empty']`. |
| 26 | + |
| 27 | +## When to use |
| 28 | + |
| 29 | +- Long-term conversational memory where answers span multiple sessions (LongMemEval multi-session, LOCOMO multi-hop). |
| 30 | +- Deployments where per-session topical coherence is high and session boundaries are semantically meaningful. |
| 31 | +- Configurations with an LLM budget for ingest-time summary generation (`SessionSummarizer` call per unique session). |
| 32 | + |
| 33 | +## When NOT to use |
| 34 | + |
| 35 | +- Single-session question answering where `CognitiveMemoryManager.retrieve` already surfaces the right chunks. |
| 36 | +- Deployments without ingest-time summarization (no `SessionSummarizer`). SessionRetriever would fall through to plain retrieval every call. |
| 37 | +- Very short sessions (< 5 turns) where the summary and chunks are essentially the same content. |
| 38 | + |
| 39 | +## References |
| 40 | + |
| 41 | +- **xMemory** ([arxiv 2602.02007v3](https://arxiv.org/abs/2602.02007v3), 2026) — four-level hierarchy (raw → episode → semantic → theme) with two-stage retrieval. Ablation on LoCoMo shows hierarchy alone beats Naive RAG BLEU 27.9→31.8, F1 36.4→40.8 before any retrieval optimization. |
| 42 | +- **TACITREE** ([EMNLP 2025](https://aclanthology.org/2025.emnlp-main.580.pdf), UCSD) — hierarchical tree for multi-session personalized conversation. Level-based retrieval progressively refines from abstract summaries to detail. |
| 43 | +- **Anthropic contextual retrieval** (Sep 2024) — the pattern `SessionSummarizer` implements at session granularity. `SessionRetriever` is the retrieval-time counterpart. |
| 44 | + |
| 45 | +## Performance characteristics |
| 46 | + |
| 47 | +- **Stage 1 cost**: one embedding (reusable via a shared `CachedEmbedder`) plus one vector search per query. Bounded by `topK=K`. |
| 48 | +- **Stage 2 cost**: one `MemoryStore.query` per query with `topK = K × M × 3` (over-fetch multiplier). Typical: 45 at the defaults. |
| 49 | +- **Optional rerank cost**: one Cohere `rerank-v3.5` call over the merged K×M pool (~15 documents at defaults). Approximately $0.0001 per query. |
| 50 | +- **Fallback cost**: Stage-1 empty → plain `MemoryStore.query` (no extra cost). Stage-2 empty → raw Stage-2 pool (no extra cost). |
| 51 | + |
| 52 | +## Related modules |
| 53 | + |
| 54 | +- [`src/memory/retrieval/session/SessionSummaryStore.ts`](../../src/memory/retrieval/session/SessionSummaryStore.ts) |
| 55 | +- [`src/memory/retrieval/session/SessionRetriever.ts`](../../src/memory/retrieval/session/SessionRetriever.ts) |
| 56 | +- [`src/memory/ingest/SessionSummarizer.ts`](../../src/memory/ingest/SessionSummarizer.ts) — summary generation (companion) |
| 57 | +- [`src/memory/retrieval/store/MemoryStore.ts`](../../src/memory/retrieval/store/MemoryStore.ts) — underlying trace store used by Stage 2 |
0 commit comments