ADR: one linked-artifact substrate, three operators (traverse / expand / purpose-index) #43
thorwhalen started this conversation in Ideas
Decision record: one linked-artifact substrate, three operators
Date: 2026-06-12 · Status: accepted (maintainer-confirmed) · Scope: ir + raglab · Refs: epic #38; research reports 12 (Retrieval over Linked Structures), 13 (Retrieval-Time Context Expansion), 14 (Purpose-Centric Memory) in the semantic_search series.
Context
Three capabilities are being added to the stack: (1) retrieval over a linked structure of derived artifacts (synopsis→chunks routing, cross-references, possibly cyclic graphs) with pluggable query-time traversal; (2) retrieval-time context expansion (expand(hit) -> passage over NEXT/PREV/PARENT/CHILD segment relationships); (3) a purpose-centric memory overlay persisting agent extractions across runs. A coverage pass found: capability 1 partial-weak (the surface→artifact hop exists; no edges, no traversal), capability 2 partial (chunk adjacency metadata exists; no operator, "bare chunk or whole doc" only), capability 3 missing (strong precedents: the calibration store view, eval cases, XDG layout).
Decisions
Unifying frame: all three are operators over one linked-artifact substrate: traverse (multi-node walk), expand (single-hit neighborhood; the degenerate traverse), and purpose-indexing (a query dimension on node identity). We build a shared typed-edge "links" view on CorpusStore (the calibration-view growth pattern) with node identity (source, artifact_id), not three unrelated subsystems.
Boundary revision to epic #38 ("Epic: evolve ir toward the ir_09 Composable Search Agent — layering decision + role map"): the role map assigned "graph retrievers → new layer". Revised: the traversal primitive (traverse(query, store, policy) with operator-enforced safety: visited-set, depth cap, node budget) lands in ir. It improves single-shot agent-free search, passing epic #38's own decision rule, exactly like fuse_hits. What stays in the agent layer: when to traverse, LLM-in-the-loop routing/decomposition, and any policy that loops on model calls. Report 12's frame: all five traversal families (recursive retrieval, routing, RAPTOR collapsed-tree, PPR, beam walks) are one operator parameterized by an injected WalkPolicy Protocol; "safety primitives live in WalkState and are enforced by traverse itself, so a buggy policy cannot cause an infinite loop."
Flat top-k + rerank stays the default. Report 12's evidence: a strong flat retriever beats most graph methods on simple lookup (e.g. VanillaRAG 60.8 vs GraphRAG-local 45.5 on PopQA; GraphRAG-global ≈57× time / ≈210× tokens per query); only the hardest multi-hop and global/sense-making queries justify traversal (HippoRAG 2 +13.9 recall on 2Wiki). Every traversal policy must beat flat+rerank on our own eval before promotion. First policy: pure-vector summary-routing/collapsed-tree (no LLM in the query loop); PPR later; LLM-guided walks cost-gated.
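To make the "operator-enforced safety" idea concrete, here is a minimal sketch of a traverse operator whose visited-set, depth cap, and node budget live in a WalkState that the operator itself checks. It is an illustration under stated assumptions, not the ir implementation: the `links` mapping stands in for the links view on the store, the `seeds` and `state` parameters, the `admit` method, the `next_nodes` method name, and the `FollowEdgeTypes` policy are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Protocol

NodeId = tuple[str, str]   # (source, artifact_id), the node identity from the ADR
Edge = tuple[str, NodeId]  # (edge_type, target); this edge shape is an assumption


@dataclass
class WalkState:
    """Safety bookkeeping owned by traverse, never by the policy."""
    max_depth: int = 2
    node_budget: int = 50
    visited: set[NodeId] = field(default_factory=set)

    def admit(self, node: NodeId, depth: int) -> bool:
        """Record the node and return True only if every limit allows it."""
        if node in self.visited or depth > self.max_depth:
            return False
        if len(self.visited) >= self.node_budget:
            return False
        self.visited.add(node)
        return True


class WalkPolicy(Protocol):
    def next_nodes(self, query: str, node: NodeId,
                   edges: Iterable[Edge]) -> Iterable[NodeId]:
        """Choose which neighbors of node to follow."""


class FollowEdgeTypes:
    """Toy policy (hypothetical): follow every edge whose type is allowed."""

    def __init__(self, *edge_types: str) -> None:
        self.edge_types = set(edge_types)

    def next_nodes(self, query, node, edges):
        return [target for edge_type, target in edges
                if edge_type in self.edge_types]


def traverse(query: str, links: Mapping[NodeId, list[Edge]], policy: WalkPolicy,
             seeds: Iterable[NodeId], state: WalkState | None = None) -> list[NodeId]:
    """Breadth-first walk. Limits are checked here, so a policy that keeps
    proposing already-seen or too-deep nodes still terminates."""
    state = state or WalkState()
    queue = deque((seed, 0) for seed in seeds)
    reached: list[NodeId] = []
    while queue:
        node, depth = queue.popleft()
        if not state.admit(node, depth):
            continue
        reached.append(node)
        for target in policy.next_nodes(query, node, links.get(node, [])):
            queue.append((target, depth + 1))
    return reached


# A small cyclic link graph: synopsis -> chunks, chunks point back to the synopsis.
links = {
    ("docs", "synopsis"): [("CHILD", ("docs", "c1")), ("CHILD", ("docs", "c2"))],
    ("docs", "c1"): [("NEXT", ("docs", "c2")), ("PARENT", ("docs", "synopsis"))],
    ("docs", "c2"): [("PREV", ("docs", "c1")), ("PARENT", ("docs", "synopsis"))],
}
policy = FollowEdgeTypes("CHILD", "NEXT", "PARENT")
reached = traverse("licensing terms", links, policy, seeds=[("docs", "synopsis")])
```

Because only admitted nodes are expanded and each node is admitted at most once, the PARENT edges back to the synopsis cannot cause a loop, whatever the policy returns.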
Vocabulary: "links" ≠ ef's artifact graph. ef.artifact_graph is a content-addressed derivation/lineage DAG (build-time producer graph; the documented heavy upgrade for incremental indexing). The new substrate is the semantic link graph between corpus artifacts (NEXT/PREV/PARENT/CHILD/REF/CITES), may be cyclic, and is traversed at query time. We keep content-addressed id conventions compatible so a later unification stays open, but we do not adopt ef.artifact_graph now.
Expansion is an ir hit-operation beside fuse_hits/best_per_artifact (report 13's expand(hit, corpus) -> Passage with injectable NeighborhoodPolicy), composing retrieve → expand → rerank and extending the existing disclosure seam rather than creating a parallel mechanism.
Purpose memory lives in raglab (PurposeStore, report 14): raglab is the only layer that sees purpose (goals, refinements, judgements), and the design notes ban memory classes from ef/vd. dol-backed MutableMapping facade; write/read/consolidate/decay as injected strategies; extraction provenance points at (source, artifact_id) units. ir stays signal-only. The PurposeStore write-record schema is co-designed with the agent run-log (the budget-governor milestone) so per-run observability and cross-run memory share one record shape.
vd stays graph-free. No relationship modeling in the Collection protocol (design-notes principle: facade, not framework; the protocol must stay minimal for the vd-js mirror). Link metadata must merely survive vd round-trips (provenance principle).
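As a rough illustration of the PurposeStore decision above, the sketch below shows a MutableMapping facade with an injected decay strategy and records whose provenance points at (source, artifact_id) units. Everything beyond those three ideas is an assumption made for the example: the `PurposeRecord` fields, the `by_purpose` and `apply_decay` method names, the string record keys, and the `halve_and_prune` strategy are hypothetical, and a plain dict stands in for the dol-backed store.

```python
from __future__ import annotations

from collections.abc import Iterator, MutableMapping
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class PurposeRecord:
    """One agent extraction; the field set is an illustrative assumption."""
    purpose: str                 # the goal the extraction served
    extraction: str              # what the agent extracted
    provenance: tuple[str, str]  # (source, artifact_id) of the unit it came from
    run_id: str                  # ties the record to an agent run
    weight: float = 1.0


# A decay strategy returns the updated record, or None to drop it.
DecayStrategy = Callable[[PurposeRecord], Optional[PurposeRecord]]


class PurposeStore(MutableMapping):
    """Facade over any MutableMapping backend (a dict here, standing in for dol)."""

    def __init__(self, backend: MutableMapping | None = None,
                 decay: DecayStrategy = lambda record: record) -> None:
        self._backend = {} if backend is None else backend
        self._decay = decay

    def __getitem__(self, key: str) -> PurposeRecord:
        return self._backend[key]

    def __setitem__(self, key: str, record: PurposeRecord) -> None:
        if not isinstance(record, PurposeRecord):
            raise TypeError("PurposeStore only holds PurposeRecord values")
        self._backend[key] = record

    def __delitem__(self, key: str) -> None:
        del self._backend[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._backend)

    def __len__(self) -> int:
        return len(self._backend)

    def by_purpose(self, purpose: str) -> list[PurposeRecord]:
        """Purpose as a query dimension: all records written for one purpose."""
        return [r for r in self._backend.values() if r.purpose == purpose]

    def apply_decay(self) -> None:
        """Run the injected decay strategy over every record."""
        for key in list(self._backend):
            updated = self._decay(self._backend[key])
            if updated is None:
                del self._backend[key]
            else:
                self._backend[key] = updated


def halve_and_prune(record: PurposeRecord, floor: float = 0.3) -> Optional[PurposeRecord]:
    """Hypothetical decay: halve the weight, drop the record below the floor."""
    decayed = replace(record, weight=record.weight / 2)
    return decayed if decayed.weight >= floor else None


store = PurposeStore(decay=halve_and_prune)
store["rec-1"] = PurposeRecord(
    purpose="summarize licensing",
    extraction="The project is MIT licensed.",
    provenance=("repo", "chunk-3"),
    run_id="run-1",
)
store.apply_decay()  # weight 1.0 -> 0.5, record kept
```

Keeping the strategies as injected callables means the facade itself stays a thin mapping, which is in the spirit of the "facade, not framework" principle the ADR cites for vd.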
Sequencing: expansion (cheapest, zero deps, immediately improves what the agent reads) → links view + traverse v1 → budget governor + run-log (raglab) → PurposeStore v0.
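Since expansion comes first in this sequence, here is a minimal sketch of what an expand(hit, corpus) -> Passage operator with an injectable NeighborhoodPolicy could look like. It is not report 13's design: the `Segment` and `Passage` shapes, the `select` method name, the one-neighbor-per-edge-type `links` field, and the `Window` policy are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Protocol

NodeId = tuple[str, str]  # (source, artifact_id)


@dataclass(frozen=True)
class Segment:
    node: NodeId
    text: str
    links: Mapping[str, NodeId]  # edge type -> neighbor, e.g. {"NEXT": (...)}


@dataclass(frozen=True)
class Passage:
    nodes: tuple[NodeId, ...]  # which segments were stitched together, in order
    text: str


class NeighborhoodPolicy(Protocol):
    def select(self, hit: NodeId, corpus: Mapping[NodeId, Segment]) -> list[NodeId]:
        """Return the ordered segments that should surround the hit."""


class Window:
    """Hypothetical policy: up to `radius` PREV and `radius` NEXT segments."""

    def __init__(self, radius: int = 1) -> None:
        self.radius = radius

    def select(self, hit, corpus):
        before = self._walk(hit, "PREV", corpus)[::-1]
        after = self._walk(hit, "NEXT", corpus)
        # dict.fromkeys keeps order and removes duplicates if the links cycle.
        return list(dict.fromkeys(before + [hit] + after))

    def _walk(self, start, edge_type, corpus):
        found, node, seen = [], start, {start}
        for _ in range(self.radius):
            node = corpus[node].links.get(edge_type)
            if node is None or node not in corpus or node in seen:
                break
            seen.add(node)
            found.append(node)
        return found


def expand(hit: NodeId, corpus: Mapping[NodeId, Segment],
           policy: NeighborhoodPolicy | None = None) -> Passage:
    """Turn a single hit into a passage using the injected neighborhood policy."""
    nodes = (policy or Window()).select(hit, corpus)
    return Passage(tuple(nodes), "\n".join(corpus[n].text for n in nodes))


c0, c1, c2 = ("doc", "c0"), ("doc", "c1"), ("doc", "c2")
corpus = {
    c0: Segment(c0, "Intro.", {"NEXT": c1}),
    c1: Segment(c1, "Details.", {"PREV": c0, "NEXT": c2}),
    c2: Segment(c2, "Summary.", {"PREV": c1}),
}
passage = expand(c1, corpus)              # the hit plus one neighbor on each side
bare = expand(c1, corpus, Window(radius=0))  # radius 0 reproduces the bare chunk
```

With the radius as a policy parameter, "bare chunk" and "wider context" become points on one dial rather than separate code paths, which is the gap the coverage pass flagged for capability 2.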
Rejected alternatives
ir.discover/qh single-shot users; duplicates store access.