feat(remember): full signal telemetry in Result.meta + per-node scores #150
Merged
KailasMahavarkar merged 1 commit into main on Apr 20, 2026
Conversation
Step 1 of the retrieval-observability effort. Make REMEMBER self-describing
so callers can see why a result looks the way it does without reading
handler source.
Per-node scores on every returned node (additive, no removals):
- _remember_score: fused final score, or reranker score when rerank ran
- _vector_sim: max sentence cosine similarity
- _bm25_score: normalised FTS5 score
- _recency_score: exp(-age/half_life)
- _graph_score (NEW): normalised entity-degree contribution
- _co_bonus (NEW): co-occurrence bonus when found by both vec and bm25
- _recall_boost (NEW): log1p(__recall_count__) * 0.05 nudge
- _rank_stage (NEW): "fusion" or "rerank"
- _fusion_score (NEW): pre-rerank base score, preserved when rerank ran
- _rerank_score (NEW): raw reranker score, when rerank ran
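As a rough illustration of how these components might combine (the weights and the additive shape below are assumptions for illustration, not the shipped fusion formula; only the recency-decay and recall-boost definitions follow the list above):

```python
import math

def recency_score(age_days: float, half_life_days: float) -> float:
    # _recency_score as described above: exp(-age/half_life)
    return math.exp(-age_days / half_life_days)

def recall_boost(recall_count: int) -> float:
    # _recall_boost as described above: log1p(__recall_count__) * 0.05
    return math.log1p(recall_count) * 0.05

def fuse(vector_sim, bm25, recency, graph, co_bonus, boost,
         weights=(0.5, 0.3, 0.1, 0.1)):  # hypothetical weights, for sketch only
    w_vec, w_bm25, w_rec, w_graph = weights
    base = (w_vec * vector_sim + w_bm25 * bm25
            + w_rec * recency + w_graph * graph)
    # bonuses are additive nudges on top of the weighted base
    return base + co_bonus + boost

score = fuse(0.82, 0.61, recency_score(30.0, 7300.0), 0.4,
             co_bonus=0.05, boost=recall_boost(3))
```

The point of the sketch is only that each `_*` field is an independent, inspectable term, so a surprising final score can be traced to one component.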
Rich Result.meta["signals"] block (always populated, zero-break):
- fusion: {method, weights, graph_signal_enabled}
- recency: {half_life_days}
- sentence_query_expansion: {enabled, num_sentences}
- stages: {gathered_vec, gathered_bm25, union, cap_applied, after_cap, before_rerank, final}
- reranker: {ran, model, error}
- nucleus: {enabled}
This is the instrumentation foundation for Step 2 (EXPLAIN REMEMBER) and
Step 3 (ANSWER verb). Both need per-candidate scores + pipeline-stage
counts to be useful.
Also updates the existing test_remember_includes_score_breakdown to
assert the full breakdown (the prior comment "removed in pipeline
refactor" was stale; the signals exist again under their original names).
Two new tests:
- test_remember_meta_signals_telemetry: full meta block shape
- test_remember_graph_signal_reflected_in_meta: config drives meta
Full suite: 1756 passed, 101 skipped, zero regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
We shipped three retrieval-observability features in #150/#151/#152 but the skills, docs, and README said nothing about them. An LLM loading graphstore-dsl today wouldn't know ANSWER exists; a human reading the README wouldn't either. Fix:
graphstore-dsl SKILL:
- ANSWER verb in the reads table + dedicated subsection explaining reader resolution + error capture semantics
- Full per-node signal list (old doc predicted _graph_score / _recall before they shipped; now they do and we have _co_bonus, _recall_boost, _rank_stage, _fusion_score, _rerank_score)
- meta["signals"] telemetry block documented
- SYS EXPLAIN REMEMBER dry-run subsection
- Added ANSWER + SYS EXPLAIN rows to query-generation pattern table
graphstore-builder SKILL:
- q.answer(...) row in the reads table
- Debugging section expanded: full signal list + meta block JSON + q.sys.explain(inner) dry-run example + q.answer() end-to-end example + named-reader A/B + reader resolution order
website/docs/dsl/reference.md:
- ANSWER examples (bare + USING "reader")
- New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and ANSWER retrieval-augmented synthesis
website/docs/query-builder.md:
- q.answer(...) row in reads table
- Retrieval-pipeline observability section
- Retrieval + reader synthesis section with named-reader A/B pattern
README.md:
- REMEMBER section expanded: per-node scores + meta["signals"] + SYS EXPLAIN REMEMBER dry-run example
- New ANSWER section showing reader wiring, cited_slots, named readers
Every remaining claim verified against code. Em dash sweep clean (Rule 9).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
* feat(dsl): ANSWER verb - retrieval + pluggable reader LLM
Step 3 of the retrieval-observability effort. Full-loop: retrieve with
REMEMBER + synthesize with a user-configured reader callable.
Grammar:
answer_q: "ANSWER" STRING at_clause? tokens_clause?
limit_clause? where_clause? using_reader?
using_reader: "USING" STRING
Usage:
def my_reader(prompt: str, max_tokens: int = 1000) -> str: ...
gs = GraphStore(reader=my_reader)
gs.execute('ANSWER "What is the capital of France?" LIMIT 3')
# Result(
# kind="answer",
# data={
# "answer": "Paris",
# "cited_slots": ["n0", "n1", "n2"],
# "candidates": [<full REMEMBER nodes>],
# "reader": None,
# },
# count=1,
# meta={"signals": <REMEMBER's full meta block>},
# )
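The error-capture behaviour (a failing reader surfaces in data["error"] rather than raising) means callers branch on the payload instead of wrapping execute in try/except. A minimal consumer sketch, with the Result modeled as a plain dict rather than the library's actual class:

```python
# Sketch only: the Result here is a stand-in dict shaped like the example
# above, not graphstore's real Result type.
def handle_answer(result: dict) -> str:
    data = result["data"]
    if data.get("error"):
        # reader raised; answer is empty and the error is carried in-band
        return f"reader failed: {data['error']}"
    cited = ", ".join(data["cited_slots"])
    return f"{data['answer']} (sources: {cited})"
```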
Named reader registry for A/B'ing reader LLMs:
gs = GraphStore(readers={"fast": a, "careful": b})
gs.execute('ANSWER "q" LIMIT 3 USING "careful"')
q.answer("q", limit=3, using="fast")
Implementation:
- Grammar rule `answer_q` mirrors `remember_q` shape + adds optional
`USING "reader-name"` suffix.
- New `AnswerQuery` dataclass in ast_nodes.py.
- Transformer wires `answer_q` + `using_reader` Lark rules.
- Handler `_answer` in intelligence.py:
- Resolves reader: `USING name` looks up `self._readers[name]`;
else falls back to `self._reader` (default); else to sole entry
of `_readers` if exactly one; else raise GraphStoreError.
- Builds an equivalent RememberQuery with the same limit / where / at /
tokens; calls the real `_remember` (bumps recall counts; intentional).
- Formats retrieved passages as numbered context blocks with source
ids. Empty retrieval still surfaces "(no retrieved context)" to
reader so it can say "no information available".
- Reader exception caught: returns Result with data["error"] and
empty answer. Callers inspect without try/except on execute.
- Builder `q.answer(text, limit, tokens, at, where, using)` added to
reads.py. Registered on `q` namespace. Parser-roundtrip-verified.
- GraphStore gains `reader` and `readers` kwargs. Validated callable.
Held as live refs on the executor; not in the config layer
(callables are not serialisable).
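The resolution order described above can be sketched as a standalone function (the name `resolve_reader` and the signature are illustrative, not the actual internals):

```python
from typing import Callable, Dict, Optional

Reader = Callable[..., str]

class GraphStoreError(Exception):
    pass

def resolve_reader(using: Optional[str],
                   default: Optional[Reader],
                   readers: Dict[str, Reader]) -> Reader:
    # 1. explicit USING "name" must hit the named registry
    if using is not None:
        if using not in readers:
            raise GraphStoreError(f"unknown reader: {using!r}")
        return readers[using]
    # 2. fall back to the default reader if one was configured
    if default is not None:
        return default
    # 3. a registry with exactly one entry is unambiguous
    if len(readers) == 1:
        return next(iter(readers.values()))
    # 4. nothing to resolve
    raise GraphStoreError("no reader configured")
```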
Zero LLM dependency in core. graphstore ships no HTTP client, no
litellm, no openai. Bring-your-own reader.
Tests (tests/test_answer.py):
- test_answer_end_to_end
- test_answer_without_reader_raises
- test_answer_picks_named_reader_via_using
- test_answer_unknown_named_reader_raises
- test_answer_reader_exception_surfaced_in_result
- test_answer_builder_roundtrip_matches_string_dsl
- test_answer_builder_compiles_full_surface
- test_answer_on_empty_store_still_calls_reader
Also: test_query_coverage.py EXPECTED_VERBS += "answer".
Full suite: 1766 passed, 101 skipped, zero regressions.
Next (Step 4): temporal anchor extraction at query time. Auto-add AT
clauses when the question has a date. Targets temporal F1 (weakest
LoCoMo category for us).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: surface-sync for REMEMBER telemetry + EXPLAIN + ANSWER (Steps 1-3)
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:
graphstore-dsl SKILL:
- ANSWER verb in the reads table + dedicated subsection explaining
reader resolution + error capture semantics
- Full per-node signal list (old doc predicted _graph_score / _recall
before they shipped; now they do and we have _co_bonus, _recall_boost,
_rank_stage, _fusion_score, _rerank_score)
- meta["signals"] telemetry block documented
- SYS EXPLAIN REMEMBER dry-run subsection
- Added ANSWER + SYS EXPLAIN rows to query-generation pattern table
graphstore-builder SKILL:
- q.answer(...) row in the reads table
- Debugging section expanded: full signal list + meta block JSON +
q.sys.explain(inner) dry-run example + q.answer() end-to-end
example + named-reader A/B + reader resolution order
website/docs/dsl/reference.md:
- ANSWER examples (bare + USING "reader")
- New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
ANSWER retrieval-augmented synthesis
website/docs/query-builder.md:
- q.answer(...) row in reads table
- Retrieval-pipeline observability section
- Retrieval + reader synthesis section with named-reader A/B pattern
README.md:
- REMEMBER section expanded: per-node scores + meta["signals"] +
SYS EXPLAIN REMEMBER dry-run example
- New ANSWER section showing reader wiring, cited_slots, named readers
Every remaining claim verified against code. Em dash sweep clean (Rule 9).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
v0.4 ships the retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)
Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)
Breaking changes: none. All additions are additive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request on Apr 20, 2026
* chore(release): bump v0.3.0 -> v0.4.0
v0.4 ships the retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)
Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)
Breaking changes: none. All additions are additive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): bump pyproject.toml version to 0.4.0
Missed in 07c9986. Pairs with src/graphstore/__init__.py bump.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 of the retrieval-observability effort. Makes `REMEMBER` self-describing so callers can see why a result looks the way it does without reading handler source. Foundation for Step 2 (`EXPLAIN REMEMBER`) and Step 3 (`ANSWER` verb).
Per-node scores (additive, no removals)
Every returned node carries:
- `_remember_score`: fused final score, or reranker score when rerank ran
- `_vector_sim`: max sentence cosine similarity
- `_bm25_score`: normalised FTS5 score
- `_recency_score`: exp(-age/half_life)
- `_graph_score`: normalised entity-degree contribution
- `_co_bonus`: co-occurrence bonus when found by both vec and bm25
- `_recall_boost`: log1p(__recall_count__) * 0.05 nudge
- `_rank_stage`: "fusion" or "rerank"
- `_fusion_score`: pre-rerank base score, preserved when rerank ran
- `_rerank_score`: raw reranker score, when rerank ran
Rich `Result.meta["signals"]` (always populated)
```python
{
"fusion": {"method": "weighted", "weights": [...], "graph_signal_enabled": True},
"recency": {"half_life_days": 7300.0},
"sentence_query_expansion": {"enabled": True, "num_sentences": 1},
"stages": {
"gathered_vec": int, "gathered_bm25": int,
"union": int, "cap_applied": bool,
"after_cap": int, "before_rerank": int,
"final": int,
},
"reranker": {"ran": bool, "model": str|None, "error": str|None},
"nucleus": {"enabled": bool},
}
```
Why this matters
We burned a session debugging why REMEMBER returned "Melanie" as content; the answer lived 200 lines deep in handler code. With this telemetry the story is: look at `meta["signals"]["stages"]` and the per-node `_*` fields. Done.
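That workflow can be sketched as a small helper (the Result is modeled as a plain dict here; the field names follow the telemetry above, but the structure is a stand-in, not the library's actual Result class):

```python
# Summarise the pipeline stages plus the per-node score breakdown from a
# Result-shaped dict, for quick eyeballing during a debugging session.
def summarise(result: dict) -> str:
    stages = result["meta"]["signals"]["stages"]
    lines = [
        f"stages: vec={stages['gathered_vec']} bm25={stages['gathered_bm25']} "
        f"union={stages['union']} final={stages['final']}"
    ]
    for node in result["data"]:
        lines.append(
            f"  {node.get('id', '?')}: score={node['_remember_score']:.3f} "
            f"stage={node['_rank_stage']}"
        )
    return "\n".join(lines)
```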
Tests
Full suite
```
1756 passed, 101 skipped
```
Zero regressions.
Next
Step 2: extend `SYS EXPLAIN REMEMBER "q"` to dry-run and emit the same telemetry without side effects. Step 3: add `ANSWER` verb that consumes the meta for attribution.
🤖 Generated with Claude Code