Context
We already have a similarity graph: `chunk_links` stores pre-computed cosine similarity edges between chunks across sessions (top-3 neighbors per chunk, min similarity 0.35). This is traversed at query time via `_expand_cross_session()` with a 0.7 discount factor.
The current edges are untyped — they say "these are related" but not HOW (supports, contradicts, supersedes, extends, etc.). The question: does adding edge types improve retrieval quality enough to justify the cost?
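For concreteness, the query-time hop can be sketched as follows. This is a hypothetical stand-in for `_expand_cross_session()`, not the real implementation: the in-memory dict shapes for `seed_results` and `chunk_links` are assumptions, and only the constants (0.7 discount, top-3 expansion, 0.35 minimum similarity) come from the doc.

```python
# Assumed constants from the doc; names are illustrative.
CROSS_LINK_DISCOUNT = 0.7   # score discount applied to hopped-in chunks
CROSS_LINK_MAX_EXPAND = 3   # top-3 pre-computed neighbors per chunk
MIN_SIMILARITY = 0.35       # build-time floor, re-checked here defensively

def expand_cross_session(seed_results, chunk_links):
    """Expand seed hits one hop along pre-computed similarity edges.

    seed_results: {chunk_id: retrieval_score}
    chunk_links:  {chunk_id: [(neighbor_id, cosine_similarity), ...]}
    """
    expanded = dict(seed_results)
    for chunk_id, score in seed_results.items():
        for neighbor_id, sim in chunk_links.get(chunk_id, [])[:CROSS_LINK_MAX_EXPAND]:
            if sim < MIN_SIMILARITY:
                continue
            # Hopped chunks inherit a discounted fraction of the seed's score.
            hop_score = score * sim * CROSS_LINK_DISCOUNT
            if hop_score > expanded.get(neighbor_id, 0.0):
                expanded[neighbor_id] = hop_score
    return expanded
```

Note that nothing in this traversal looks at why two chunks are linked, which is exactly the gap typed edges would address.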
What we already have (5 expansion mechanisms)
- Cross-session chunk links — `chunk_links` table, cosine similarity, pre-computed at build
- Same-session context — surrounding turns via `turn_index` lookup
- Knowledge node source expansion — `source_sessions`/`source_turns` back-references
- Topic clustering — Jaccard-based clusters in `cluster` table
- Hybrid RRF — BM25 + embedding fusion with auto-boost
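The last mechanism, reciprocal rank fusion, is standard and worth pinning down since the decision criteria compare against it. A minimal sketch (the `k=60` constant is the common default from the RRF literature, not necessarily what the codebase uses, and auto-boost is omitted):

```python
def rrf_fuse(bm25_ranking, embed_ranking, k=60):
    """Fuse two ranked lists of doc ids: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in (bm25_ranking, embed_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; docs appearing in both lists float to the top.
    return sorted(scores, key=scores.get, reverse=True)
```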
Proposed evaluation
Phase 1: Measure the gap (no code changes)
Add a "multi-hop" category to CodeMemo with 10-15 questions that require connecting information across sessions where the connection isn't obvious from keywords or embeddings alone. Examples:
- "We decided X in March, then reversed it in April. What's the current state?"
- "Agent A proposed a fix. Agent B found a problem with it. What was the problem?"
- Questions where following a typed edge (supersedes, contradicts) would help but generic similarity wouldn't
Run these against the current system. If flat retrieval already scores >85%, typed edges may not be worth the cost.
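One possible shape for the new CodeMemo category, sketched below; the field names, the substring-match scorer, and the `edge_type_hint` field are all assumptions about how the benchmark might record which typed edge a question exercises:

```python
from dataclasses import dataclass

@dataclass
class MultiHopCase:
    question: str
    expected_answer: str     # substring the retrieved answer must contain
    sessions: list           # session ids that must be connected to answer
    edge_type_hint: str      # typed edge that would help: supersedes, contradicts, ...

CASES = [
    MultiHopCase(
        question="We decided X in March, then reversed it in April. What's the current state?",
        expected_answer="reversed",
        sessions=["2026-03", "2026-04"],
        edge_type_hint="supersedes",
    ),
]

def score(answers):
    """Fraction of cases whose answer contains the expected substring."""
    hits = sum(
        1 for case, ans in zip(CASES, answers) if case.expected_answer in ans.lower()
    )
    return hits / len(CASES)
```

Recording `edge_type_hint` per case would let Phase 2 report accuracy per edge type, not just the aggregate.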
Phase 2: Prototype typed edges (if Phase 1 shows a gap)
Option A — Classify at build time: During build_cross_session_links(), run a lightweight classifier on each (source, target) pair to assign a type. Could be rule-based (temporal ordering → supersedes, high similarity + different conclusion → contradicts) or a small model.
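A rule-based classifier for Option A could be as small as the sketch below. The 0.75 contradiction threshold, the one-week supersede gap, and the `conclusion`/`timestamp` fields on each chunk are all assumptions chosen to illustrate the rules named above, not tuned values:

```python
SUPERSEDE_GAP = 7 * 24 * 3600  # assumed: a week's gap suggests revision, not restatement

def classify_edge(source, target, similarity):
    """Assign a type to one pre-computed (source, target) link, no model call.

    source/target: {"timestamp": float (epoch seconds), "conclusion": str}
    similarity: the pair's stored cosine similarity
    """
    same_conclusion = source["conclusion"] == target["conclusion"]
    # Rule from the doc: high similarity + different conclusion -> contradicts.
    if similarity >= 0.75 and not same_conclusion:
        return "contradicts"
    # Rule from the doc: temporal ordering -> supersedes (target revisits much later).
    if target["timestamp"] - source["timestamp"] > SUPERSEDE_GAP:
        return "supersedes"
    if same_conclusion:
        return "supports"
    return "extends"
```

Rules like these run in microseconds per pair, so Option A's cost is dominated by extracting a per-chunk "conclusion" signal, which is where a small local model might still be needed.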
Option B — Classify at query time: Keep untyped edges, but when expanding, use query intent to filter. A "what changed" query only follows edges where timestamps differ significantly. A "what supports" query only follows high-similarity same-conclusion edges.
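Option B's intent filter might look like the sketch below. The intent keywords, the one-day "significant" timestamp gap, and the edge dict shape are illustrative assumptions:

```python
import re

MIN_TEMPORAL_GAP = 24 * 3600  # assumed: "significantly different" timestamps = 1 day

def edge_passes(query, edge):
    """Decide at query time whether to follow an untyped edge.

    edge: {"similarity": float, "ts_delta": float, "same_conclusion": bool}
    """
    if re.search(r"\bwhat changed\b|\bcurrent state\b", query, re.I):
        # "What changed" queries only follow edges spanning a real time gap.
        return edge["ts_delta"] >= MIN_TEMPORAL_GAP
    if re.search(r"\bsupports?\b", query, re.I):
        # "What supports" queries only follow high-similarity, same-conclusion edges.
        return edge["similarity"] >= 0.75 and edge["same_conclusion"]
    return True  # no recognized intent: fall back to untyped behavior
```

The appeal of Option B is that it needs no schema change and can be shipped or reverted without rebuilding the graph; the risk is that intent detection from keywords is brittle.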
Option C — Add link_type column to chunk_links: Enrichment model outputs edge type alongside knowledge nodes in the same pass. No extra LLM call — just an extra field in the extraction prompt.
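Option C's schema change is a one-line, backward-compatible migration, since SQLite's `ALTER TABLE ... ADD COLUMN` leaves existing rows with a NULL `link_type`. A sketch against the documented `chunk_links` schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# chunk_links schema as documented in this doc.
conn.execute("CREATE TABLE chunk_links (source_id TEXT, target_id TEXT, similarity REAL)")
# Proposed migration: nullable column, so pre-migration rows stay valid (untyped).
conn.execute("ALTER TABLE chunk_links ADD COLUMN link_type TEXT")
conn.execute(
    "INSERT INTO chunk_links VALUES (?, ?, ?, ?)",
    ("c1", "c2", 0.82, "supersedes"),
)
row = conn.execute(
    "SELECT link_type FROM chunk_links WHERE source_id = 'c1'"
).fetchone()
```

Query-time expansion can then treat NULL as "untyped, always follow", which makes the rollout incremental.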
Speed constraints (local-first)
- Current `build_cross_session_links()` time: needs benchmarking
- Budget: typed edge classification should add <20% to build time
- Query-time expansion must stay <50ms for the full graph hop
- No external API calls for edge classification — must run on local models or rules
- `CROSS_LINK_MAX_EXPAND=3` means we're only traversing 3 edges per result — typed filtering on 3 edges is essentially free
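Since the build time "needs benchmarking", a minimal harness for the <20% budget check could look like this; `bench` and `within_budget` are illustrative helpers, not existing code:

```python
import time

def bench(fn, repeats=5):
    """Median wall-clock seconds for fn(), to smooth out cold-cache noise."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return sorted(samples)[len(samples) // 2]

def within_budget(baseline_s, typed_s, budget=0.20):
    """True if the typed-edge build adds less than `budget` over the untyped baseline."""
    return (typed_s - baseline_s) / baseline_s < budget
```

Running `bench` on the current `build_cross_session_links()` before prototyping gives the baseline that the <20% and <50ms thresholds below are measured against.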
Decision criteria
| Metric | Threshold to ship |
| --- | --- |
| Multi-hop accuracy improvement | >5% on new CodeMemo category |
| Build time increase | <20% |
| Query latency increase | <10ms |
| Cold start impact | <2s additional |
If we can't hit all four, the current untyped graph is good enough.
References
- r/AIMemory thread on knowledge graph expectations (2026-04-06)
- Current expansion: `core.py:1160-1321` (build), `core.py:2068` (query-time expand)
- `chunk_links` schema: `(source_id TEXT, target_id TEXT, similarity REAL)`