feat(dsl): SYS EXPLAIN REMEMBER dry-runs the pipeline#151

Merged
KailasMahavarkar merged 1 commit into main from
feat/explain-remember
Apr 20, 2026
Conversation

@KailasMahavarkar
Contributor

Step 2 of retrieval-observability. `SYS EXPLAIN REMEMBER "q"` runs gather + fuse + temporal filter without materializing nodes, running the reranker, expanding nucleus, or mutating recall counts. Returns candidate slot ids with per-signal scores + the Step 1 telemetry block.

Example

```python
r = gs.execute('SYS EXPLAIN REMEMBER "European capitals" LIMIT 3')

r.kind  # 'plan'
r.data
# {
#   "verb": "REMEMBER",
#   "query": "European capitals",
#   "limit": 3,
#   "candidates": [
#     {"slot": 9, "id": "mem1", "fused_score": 0.42,
#      "vector_sim": 0.52, "bm25_score": 0.0, "recency_score": 1.0,
#      "graph_score": 0.0, "co_bonus": 0.0, "recall_boost": 0.0},
#     ...
#   ]
# }

r.meta["signals"]  # same shape as real REMEMBER: fusion / recency /
                   # stages / reranker / nucleus / sentence-query-expansion
```
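Because the plan is plain data, it can be post-processed without touching the store again. A minimal sketch, assuming the candidate shape from the example; `dominant_signals` and `SIGNAL_KEYS` are hypothetical names for illustration and are not part of graphstore:

```python
# Hypothetical helper: operates on a plan dict shaped like the example's r.data.
SIGNAL_KEYS = ("vector_sim", "bm25_score", "recency_score",
               "graph_score", "co_bonus", "recall_boost")


def dominant_signals(plan: dict) -> list:
    """Return (id, fused_score, non-zero signal names) per candidate, best first."""
    rows = []
    for cand in plan["candidates"]:
        active = [key for key in SIGNAL_KEYS if cand.get(key, 0.0) > 0.0]
        rows.append((cand["id"], cand["fused_score"], active))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Literal copy of the single candidate shown in the example.
plan = {
    "verb": "REMEMBER",
    "query": "European capitals",
    "limit": 3,
    "candidates": [
        {"slot": 9, "id": "mem1", "fused_score": 0.42,
         "vector_sim": 0.52, "bm25_score": 0.0, "recency_score": 1.0,
         "graph_score": 0.0, "co_bonus": 0.0, "recall_boost": 0.0},
    ],
}
summary = dominant_signals(plan)
```

For the candidate shown, only `vector_sim` and `recency_score` are non-zero, which is the kind of breakdown the dry run is meant to surface.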

Why

Tuning fusion weights or diagnosing "why did this rank where it did" required running real REMEMBER, which bumps `recall_count` on every candidate. EXPLAIN is side-effect free - safe to call repeatedly during interactive debugging.

Also the foundation for Step 3 (`ANSWER` verb), which will consume the same candidate plan before handing context to a reader LLM.

Implementation

  • `RememberQuery` handler gains internal `_plan_only=False` kwarg. When True: returns after fusion + temporal filter as `Result(kind="plan", data={candidates}, meta={signals})`. Skips materialization, rerank, nucleus, recall-count bumps, similarity buffer.
  • `SYS EXPLAIN` handler dispatches `RememberQuery` inners via `self._executor._remember(inner, _plan_only=True)`.
  • New `_executor` back-reference on `SystemExecutor`, wired from `store.py` at construction.
  • Empty-store and empty-gather branches also respect `_plan_only`.
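The early-return pattern described in the list can be sketched as follows. This is a toy model, not the real handler: `ToyExecutor`, the word-overlap scoring, and the `"nodes"` result kind are assumptions made for illustration; only `_plan_only`, `kind="plan"`, and the `candidates` / `signals` keys come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    kind: str
    data: dict
    meta: dict = field(default_factory=dict)


class ToyExecutor:
    """Toy stand-in for the real executor; scoring is illustrative only."""

    def __init__(self, memories: dict):
        self.memories = memories
        self.recall_count = {mem_id: 0 for mem_id in memories}

    def _remember(self, query: str, limit: int = 3, _plan_only: bool = False) -> Result:
        # Stand-in for gather + fuse: score each memory by shared-word overlap.
        words = set(query.lower().split())
        scored = [
            {"id": mem_id, "fused_score": len(words & set(text.lower().split()))}
            for mem_id, text in self.memories.items()
        ]
        scored.sort(key=lambda cand: cand["fused_score"], reverse=True)
        candidates = scored[:limit]
        meta = {"signals": {"stages": ["gather", "fuse"]}}

        if _plan_only:
            # Dry run: return before any materialization or mutation.
            return Result(kind="plan", data={"candidates": candidates}, meta=meta)

        for cand in candidates:
            self.recall_count[cand["id"]] += 1  # side effect of a real REMEMBER
        nodes = [{"id": c["id"], "text": self.memories[c["id"]]} for c in candidates]
        return Result(kind="nodes", data={"nodes": nodes}, meta=meta)


ex = ToyExecutor({"mem1": "Paris is the capital of France",
                  "mem2": "Rust is a language"})
plan = ex._remember("capital of France", _plan_only=True)
counts_after_plan = dict(ex.recall_count)  # snapshot taken after the dry run
real = ex._remember("capital of France")
```

The dry run leaves `recall_count` untouched, while the real call increments it for every returned candidate.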

Tests

  • `test_sys_explain_remember_returns_plan_without_side_effects` - asserts plan shape + no recall-count mutation
  • `test_sys_explain_remember_empty_store_returns_empty_plan` - degenerate case

Full suite

1758 passed, 101 skipped (+2 new), zero regressions.

Follow-up

Step 3: `ANSWER` verb. Retrieves + formats context + calls reader LLM via `litellm`. Returns `{answer, cited_slots, signals}`. Reuses the plan-only path internally to grab candidates before generation.

🤖 Generated with Claude Code

Step 2 of the retrieval-observability effort. `SYS EXPLAIN REMEMBER "q"`
runs the gather + fuse + temporal pipeline without materializing nodes,
running the reranker, expanding nucleus, or mutating recall counts.
Returns a plan listing candidate slot ids with per-signal scores, plus
the same `meta["signals"]` telemetry shipped in Step 1.

Usage:

    gs.execute('SYS EXPLAIN REMEMBER "European capitals" LIMIT 3')
    #  kind="plan"
    #  data.candidates = [
    #    {slot, id, fused_score, vector_sim, bm25_score, recency_score,
    #     graph_score, co_bonus, recall_boost},
    #    ...
    #  ]
    #  meta.signals = {fusion, recency, stages, reranker, nucleus, ...}

Implementation:

- RememberQuery handler gains an internal `_plan_only=False` kwarg. When
  True, returns after fusion + temporal filter with a `Result(kind="plan",
  data={candidates}, meta={signals})`. Skips: materialization, rerank,
  nucleus expansion, recall-count bumps, similarity buffer.
- SYS EXPLAIN handler (sys/queries.py) dispatches `RememberQuery` inners
  to `self._executor._remember(inner, _plan_only=True)` via a new
  `_executor` back-reference on `SystemExecutor`. Wired in store.py at
  construction.
- Empty-store and empty-gather branches also respect `_plan_only` and
  return plan-shaped results.
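
The back-reference wiring can be pictured roughly like this. A hedged sketch only: the class bodies, the `explain` method name, and the `Store` constructor are simplified assumptions; the source confirms only that `SystemExecutor` holds an `_executor` back-reference and calls `_remember(inner, _plan_only=True)`.

```python
class RememberQuery:
    """Minimal stand-in for the AST node."""

    def __init__(self, text: str):
        self.text = text


class Executor:
    """Stand-in for the main executor; records calls for inspection."""

    def __init__(self):
        self.calls = []

    def _remember(self, query: RememberQuery, _plan_only: bool = False) -> dict:
        self.calls.append((query.text, _plan_only))
        return {"kind": "plan" if _plan_only else "nodes", "query": query.text}


class SystemExecutor:
    def __init__(self):
        self._executor = None  # back-reference, assigned by the store

    def explain(self, inner) -> dict:
        # Only REMEMBER inners take the dry-run path in this sketch.
        if isinstance(inner, RememberQuery):
            if self._executor is None:
                raise RuntimeError("SystemExecutor is not wired to an executor")
            return self._executor._remember(inner, _plan_only=True)
        raise NotImplementedError(type(inner).__name__)


class Store:
    def __init__(self):
        self.executor = Executor()
        self.system = SystemExecutor()
        self.system._executor = self.executor  # wired at construction


store = Store()
explained = store.system.explain(RememberQuery("European capitals"))
```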

Why:

- Callers tuning fusion weights or debugging "why did this rank where it
  did" needed the signal breakdown without paying recall-count mutation
  cost on every inspection.
- Foundation for Step 3 (ANSWER verb) which will reuse the dry-run path
  to fetch candidates before handing them to a reader LLM.

Tests:
    test_sys_explain_remember_returns_plan_without_side_effects
    test_sys_explain_remember_empty_store_returns_empty_plan

Full suite: 1758 passed, 101 skipped (+2 new), zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar merged commit 77ba749 into main Apr 20, 2026
4 checks passed
KailasMahavarkar deleted the feat/explain-remember branch April 20, 2026 07:02
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* feat(dsl): ANSWER verb - retrieval + pluggable reader LLM

Step 3 of the retrieval-observability effort. Full-loop: retrieve with
REMEMBER + synthesize with a user-configured reader callable.

Grammar:
    answer_q: "ANSWER" STRING at_clause? tokens_clause?
              limit_clause? where_clause? using_reader?
    using_reader: "USING" STRING

Usage:

    def my_reader(prompt: str, max_tokens: int = 1000) -> str: ...

    gs = GraphStore(reader=my_reader)
    gs.execute('ANSWER "What is the capital of France?" LIMIT 3')
    # Result(
    #   kind="answer",
    #   data={
    #     "answer": "Paris",
    #     "cited_slots": ["n0", "n1", "n2"],
    #     "candidates": [<full REMEMBER nodes>],
    #     "reader": None,
    #   },
    #   count=1,
    #   meta={"signals": <REMEMBER's full meta block>},
    # )

Named reader registry for A/B'ing reader LLMs:

    gs = GraphStore(readers={"fast": a, "careful": b})
    gs.execute('ANSWER "q" LIMIT 3 USING "careful"')
    q.answer("q", limit=3, using="fast")

Implementation:

- Grammar rule `answer_q` mirrors `remember_q` shape + adds optional
  `USING "reader-name"` suffix.
- New `AnswerQuery` dataclass in ast_nodes.py.
- Transformer wires `answer_q` + `using_reader` Lark rules.
- Handler `_answer` in intelligence.py:
    - Resolves reader: `USING name` looks up `self._readers[name]`;
      else falls back to `self._reader` (default); else to sole entry
      of `_readers` if exactly one; else raise GraphStoreError.
    - Builds equivalent RememberQuery with same limit / where / at /
      tokens; calls real `_remember` (bumps recall counts - intentional).
    - Formats retrieved passages as numbered context blocks with source
      ids. Empty retrieval still surfaces "(no retrieved context)" to
      reader so it can say "no information available".
    - Reader exception caught: returns Result with data["error"] and
      empty answer. Callers inspect without try/except on execute.
- Builder `q.answer(text, limit, tokens, at, where, using)` added to
  reads.py. Registered on `q` namespace. Parser-roundtrip-verified.
- GraphStore gains `reader` and `readers` kwargs. Validated callable.
  Held as live refs on the executor; not in the config layer
  (callables are not serialisable).
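
The reader resolution order described above can be sketched as a standalone function. The function name, the error messages, and the local `GraphStoreError` class are assumptions for illustration; the order of the four branches follows the description.

```python
from typing import Callable, Dict, Optional

Reader = Callable[..., str]


class GraphStoreError(Exception):
    """Local stand-in for the library's error type."""


def resolve_reader(using: Optional[str],
                   default: Optional[Reader],
                   named: Dict[str, Reader]) -> Reader:
    # 1. An explicit USING "name" must match a registered reader.
    if using is not None:
        if using not in named:
            raise GraphStoreError(f"unknown reader: {using!r}")
        return named[using]
    # 2. Otherwise fall back to the default reader.
    if default is not None:
        return default
    # 3. Otherwise use the sole registered reader, if there is exactly one.
    if len(named) == 1:
        (reader,) = named.values()
        return reader
    # 4. Nothing usable is configured.
    raise GraphStoreError("no reader configured")


def fast(prompt: str, max_tokens: int = 1000) -> str:
    return "fast"


def careful(prompt: str, max_tokens: int = 1000) -> str:
    return "careful"


chosen = resolve_reader("careful", None, {"fast": fast, "careful": careful})
```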

Zero LLM dependency in core. graphstore ships no HTTP client, no
litellm, no openai. Bring-your-own reader.
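
Since the reader is just a callable, the context formatting and error capture can be illustrated without any LLM. A minimal sketch under stated assumptions: the prompt layout, the `format_context` / `answer` names, and the passage dict shape are hypothetical; the `"(no retrieved context)"` placeholder, the `data["error"]` key, and the `(prompt, max_tokens)` reader signature come from the description above.

```python
def format_context(passages: list) -> str:
    """Number each passage and tag it with its source id."""
    if not passages:
        return "(no retrieved context)"
    return "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )


def answer(question: str, passages: list, reader, max_tokens: int = 1000) -> dict:
    prompt = f"Context:\n{format_context(passages)}\n\nQuestion: {question}"
    data = {"answer": "", "cited_slots": [p["id"] for p in passages]}
    try:
        data["answer"] = reader(prompt, max_tokens=max_tokens)
    except Exception as exc:
        # Surface the failure in the result instead of raising to the caller.
        data["error"] = str(exc)
    return data


seen_prompts = []


def echo_reader(prompt: str, max_tokens: int = 1000) -> str:
    seen_prompts.append(prompt)
    return "Paris"


def failing_reader(prompt: str, max_tokens: int = 1000) -> str:
    raise RuntimeError("upstream timeout")


passages = [{"id": "n0", "text": "Paris is the capital of France."}]
ok = answer("What is the capital of France?", passages, echo_reader)
empty = answer("What is the capital of France?", [], echo_reader)
failed = answer("What is the capital of France?", passages, failing_reader)
```

A real reader would wrap whichever client the caller prefers; the store only needs the callable.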

Tests (tests/test_answer.py):
- test_answer_end_to_end
- test_answer_without_reader_raises
- test_answer_picks_named_reader_via_using
- test_answer_unknown_named_reader_raises
- test_answer_reader_exception_surfaced_in_result
- test_answer_builder_roundtrip_matches_string_dsl
- test_answer_builder_compiles_full_surface
- test_answer_on_empty_store_still_calls_reader

Also: test_query_coverage.py EXPECTED_VERBS += "answer".

Full suite: 1766 passed, 101 skipped, zero regressions.

Next (Step 4): temporal anchor extraction at query time. Auto-add AT
clauses when the question has a date. Targets temporal F1 (weakest
LoCoMo category for us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface-sync for REMEMBER telemetry + EXPLAIN + ANSWER (Steps 1-3)

We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* chore(release): bump v0.3.0 -> v0.4.0

v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump pyproject.toml version to 0.4.0

Missed in 07c9986. Pairs with src/graphstore/__init__.py bump.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>