feat(dsl): SYS EXPLAIN REMEMBER dry-runs the pipeline #151
Merged
KailasMahavarkar merged 1 commit into main on Apr 20, 2026
Conversation
Step 2 of the retrieval-observability effort. `SYS EXPLAIN REMEMBER "q"`
runs the gather + fuse + temporal pipeline without materializing nodes,
running the reranker, expanding nucleus, or mutating recall counts.
Returns a plan listing candidate slot ids with per-signal scores, plus
the same `meta["signals"]` telemetry shipped in Step 1.
Usage:
gs.execute('SYS EXPLAIN REMEMBER "European capitals" LIMIT 3')
# kind="plan"
# data.candidates = [
# {slot, id, fused_score, vector_sim, bm25_score, recency_score,
# graph_score, co_bonus, recall_boost},
# ...
# ]
# meta.signals = {fusion, recency, stages, reranker, nucleus, ...}
Implementation:
- RememberQuery handler gains an internal `_plan_only=False` kwarg. When
True, returns after fusion + temporal filter with a `Result(kind="plan",
data={candidates}, meta={signals})`. Skips: materialization, rerank,
nucleus expansion, recall-count bumps, similarity buffer.
- SYS EXPLAIN handler (sys/queries.py) dispatches `RememberQuery` inners
to `self._executor._remember(inner, _plan_only=True)` via a new
`_executor` back-reference on `SystemExecutor`. Wired in store.py at
construction.
- Empty-store and empty-gather branches also respect `_plan_only` and
return plan-shaped results.
Why:
- Callers tuning fusion weights or debugging "why did this rank where it
did" needed the signal breakdown without paying recall-count mutation
cost on every inspection.
- Foundation for Step 3 (ANSWER verb) which will reuse the dry-run path
to fetch candidates before handing them to a reader LLM.
Tests:
test_sys_explain_remember_returns_plan_without_side_effects
test_sys_explain_remember_empty_store_returns_empty_plan
Full suite: 1758 passed, 101 skipped (+2 new), zero regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:
graphstore-dsl SKILL:
- ANSWER verb in the reads table + dedicated subsection explaining
reader resolution + error capture semantics
- Full per-node signal list (old doc predicted _graph_score / _recall
before they shipped; now they do and we have _co_bonus, _recall_boost,
_rank_stage, _fusion_score, _rerank_score)
- meta["signals"] telemetry block documented
- SYS EXPLAIN REMEMBER dry-run subsection
- Added ANSWER + SYS EXPLAIN rows to query-generation pattern table
graphstore-builder SKILL:
- q.answer(...) row in the reads table
- Debugging section expanded: full signal list + meta block JSON +
q.sys.explain(inner) dry-run example + q.answer() end-to-end
example + named-reader A/B + reader resolution order
website/docs/dsl/reference.md:
- ANSWER examples (bare + USING "reader")
- New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
ANSWER retrieval-augmented synthesis
website/docs/query-builder.md:
- q.answer(...) row in reads table
- Retrieval-pipeline observability section
- Retrieval + reader synthesis section with named-reader A/B pattern
README.md:
- REMEMBER section expanded: per-node scores + meta["signals"] +
SYS EXPLAIN REMEMBER dry-run example
- New ANSWER section showing reader wiring, cited_slots, named readers
Every remaining claim verified against code. Em dash sweep clean (Rule 9).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
* feat(dsl): ANSWER verb - retrieval + pluggable reader LLM
Step 3 of the retrieval-observability effort. Full-loop: retrieve with
REMEMBER + synthesize with a user-configured reader callable.
Grammar:
answer_q: "ANSWER" STRING at_clause? tokens_clause?
limit_clause? where_clause? using_reader?
using_reader: "USING" STRING
Usage:
def my_reader(prompt: str, max_tokens: int = 1000) -> str: ...
gs = GraphStore(reader=my_reader)
gs.execute('ANSWER "What is the capital of France?" LIMIT 3')
# Result(
# kind="answer",
# data={
# "answer": "Paris",
# "cited_slots": ["n0", "n1", "n2"],
# "candidates": [<full REMEMBER nodes>],
# "reader": None,
# },
# count=1,
# meta={"signals": <REMEMBER's full meta block>},
# )
Named reader registry for A/B'ing reader LLMs:
gs = GraphStore(readers={"fast": a, "careful": b})
gs.execute('ANSWER "q" LIMIT 3 USING "careful"')
q.answer("q", limit=3, using="fast")
Implementation:
- Grammar rule `answer_q` mirrors `remember_q` shape + adds optional
`USING "reader-name"` suffix.
- New `AnswerQuery` dataclass in ast_nodes.py.
- Transformer wires `answer_q` + `using_reader` Lark rules.
- Handler `_answer` in intelligence.py:
- Resolves reader: `USING name` looks up `self._readers[name]`;
else falls back to `self._reader` (default); else to sole entry
of `_readers` if exactly one; else raise GraphStoreError.
- Builds equivalent RememberQuery with same limit / where / at /
tokens; calls the real `_remember` (bumps recall counts; intentional).
- Formats retrieved passages as numbered context blocks with source
ids. Empty retrieval still surfaces "(no retrieved context)" to
reader so it can say "no information available".
- Reader exception caught: returns Result with data["error"] and
empty answer. Callers inspect without try/except on execute.
- Builder `q.answer(text, limit, tokens, at, where, using)` added to
reads.py. Registered on `q` namespace. Parser-roundtrip-verified.
- GraphStore gains `reader` and `readers` kwargs. Validated callable.
Held as live refs on the executor; not in the config layer
(callables are not serialisable).
Zero LLM dependency in core. graphstore ships no HTTP client, no
litellm, no openai. Bring-your-own reader.
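The reader resolution order, context formatting, and error-capture semantics described above can be sketched as below. This is an illustrative mock-up under the rules stated in this message; `resolve_reader`, `format_context`, and `call_reader` are hypothetical helper names, not the actual functions in intelligence.py:

```python
class GraphStoreError(Exception):
    pass

def resolve_reader(using, default_reader, readers):
    # 1) USING "name" -> explicit registry lookup; unknown names raise.
    if using is not None:
        if using not in readers:
            raise GraphStoreError(f"unknown reader: {using!r}")
        return readers[using]
    # 2) Fall back to the default reader (GraphStore(reader=...)).
    if default_reader is not None:
        return default_reader
    # 3) Else the sole registry entry, if there is exactly one.
    if len(readers) == 1:
        return next(iter(readers.values()))
    raise GraphStoreError("no reader configured")

def format_context(passages):
    # Retrieved passages become numbered context blocks with source ids.
    # Empty retrieval still surfaces a marker so the reader can answer
    # "no information available" rather than hallucinate.
    if not passages:
        return "(no retrieved context)"
    return "\n\n".join(f"[{i}] ({slot}) {text}"
                       for i, (slot, text) in enumerate(passages, 1))

def call_reader(reader, prompt, max_tokens=1000):
    # Reader exceptions are captured into the result instead of raised,
    # so callers inspect data["error"] without try/except on execute.
    try:
        return {"answer": reader(prompt, max_tokens=max_tokens),
                "error": None}
    except Exception as exc:
        return {"answer": "", "error": str(exc)}
```

Because the reader is any `(prompt, max_tokens) -> str` callable, the named registry makes A/B comparisons a one-argument swap at query time.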
Tests (tests/test_answer.py):
- test_answer_end_to_end
- test_answer_without_reader_raises
- test_answer_picks_named_reader_via_using
- test_answer_unknown_named_reader_raises
- test_answer_reader_exception_surfaced_in_result
- test_answer_builder_roundtrip_matches_string_dsl
- test_answer_builder_compiles_full_surface
- test_answer_on_empty_store_still_calls_reader
Also: test_query_coverage.py EXPECTED_VERBS += "answer".
Full suite: 1766 passed, 101 skipped, zero regressions.
Next (Step 4): temporal anchor extraction at query time. Auto-add AT
clauses when the question has a date. Targets temporal F1 (weakest
LoCoMo category for us).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: surface-sync for REMEMBER telemetry + EXPLAIN + ANSWER (Steps 1-3)
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:
graphstore-dsl SKILL:
- ANSWER verb in the reads table + dedicated subsection explaining
reader resolution + error capture semantics
- Full per-node signal list (old doc predicted _graph_score / _recall
before they shipped; now they do and we have _co_bonus, _recall_boost,
_rank_stage, _fusion_score, _rerank_score)
- meta["signals"] telemetry block documented
- SYS EXPLAIN REMEMBER dry-run subsection
- Added ANSWER + SYS EXPLAIN rows to query-generation pattern table
graphstore-builder SKILL:
- q.answer(...) row in the reads table
- Debugging section expanded: full signal list + meta block JSON +
q.sys.explain(inner) dry-run example + q.answer() end-to-end
example + named-reader A/B + reader resolution order
website/docs/dsl/reference.md:
- ANSWER examples (bare + USING "reader")
- New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
ANSWER retrieval-augmented synthesis
website/docs/query-builder.md:
- q.answer(...) row in reads table
- Retrieval-pipeline observability section
- Retrieval + reader synthesis section with named-reader A/B pattern
README.md:
- REMEMBER section expanded: per-node scores + meta["signals"] +
SYS EXPLAIN REMEMBER dry-run example
- New ANSWER section showing reader wiring, cited_slots, named readers
Every remaining claim verified against code. Em dash sweep clean (Rule 9).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)
Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)
Breaking changes: none. All additions are additive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
* chore(release): bump v0.3.0 -> v0.4.0
v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)
Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)
Breaking changes: none. All additions are additive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): bump pyproject.toml version to 0.4.0
Missed in 07c9986. Pairs with src/graphstore/__init__.py bump.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 2 of retrieval-observability. `SYS EXPLAIN REMEMBER "q"` runs gather + fuse + temporal filter without materializing nodes, running the reranker, expanding nucleus, or mutating recall counts. Returns candidate slot ids with per-signal scores + the Step 1 telemetry block.
Example
```python
r = gs.execute('SYS EXPLAIN REMEMBER "European capitals" LIMIT 3')
r.kind # 'plan'
r.data
{
"verb": "REMEMBER",
"query": "European capitals",
"limit": 3,
"candidates": [
{"slot": 9, "id": "mem1", "fused_score": 0.42,
"vector_sim": 0.52, "bm25_score": 0.0, "recency_score": 1.0,
"graph_score": 0.0, "co_bonus": 0.0, "recall_boost": 0.0},
...
]
}
r.meta["signals"] # same shape as real REMEMBER: fusion / recency /
# stages / reranker / nucleus / sentence-query-expansion
```
Why
Tuning fusion weights or diagnosing "why did this rank where it did" previously required running a real REMEMBER, which bumps `recall_count` on every candidate. EXPLAIN is side-effect free, so it is safe to call repeatedly during interactive debugging.
Also the foundation for Step 3 (`ANSWER` verb), which will consume the same candidate plan before handing context to a reader LLM.
Full suite
1758 passed, 101 skipped (+2 new), zero regressions.
Follow-up
Step 3: `ANSWER` verb. Retrieves + formats context + calls a user-configured reader callable (no bundled LLM client). Returns `{answer, cited_slots, signals}`. Reuses the plan-only path internally to grab candidates before generation.
🤖 Generated with Claude Code