feat(remember): full signal telemetry in Result.meta + per-node scores#150

Merged
KailasMahavarkar merged 1 commit into main from
feat/remember-signal-telemetry
Apr 20, 2026
Conversation

@KailasMahavarkar
Contributor

Step 1 of the retrieval-observability effort. Makes `REMEMBER` self-describing so callers can see why a result looks the way it does without reading handler source. Foundation for Step 2 (`EXPLAIN REMEMBER`) and Step 3 (`ANSWER` verb).

Per-node scores (additive, no removals)

Every returned node carries:

| Field | Meaning |
| --- | --- |
| `_remember_score` | fused final (or reranker score when rerank ran) |
| `_vector_sim` | max sentence cosine |
| `_bm25_score` | normalised FTS5 score |
| `_recency_score` | `exp(-age/half_life)` |
| `_graph_score` | new: normalised entity-degree contribution |
| `_co_bonus` | new: bonus when found by both vec and bm25 |
| `_recall_boost` | new: `log1p(recall_count) * 0.05` |
| `_rank_stage` | new: `"fusion"` or `"rerank"` |
| `_fusion_score` | new: pre-rerank base, preserved when rerank ran |
| `_rerank_score` | new: raw reranker score, when rerank ran |
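Two of these components have closed-form definitions in the table, so they can be recomputed directly. A minimal sketch, using only the formulas as stated above (the function names and sample ages are illustrative, not from the codebase):

```python
import math

def recency_score(age_days: float, half_life_days: float = 7300.0) -> float:
    # exp(-age/half_life): 1.0 for a brand-new node, decaying toward 0
    return math.exp(-age_days / half_life_days)

def recall_boost(recall_count: int) -> float:
    # log1p(recall_count) * 0.05: small nudge for frequently recalled nodes
    return math.log1p(recall_count) * 0.05

print(recency_score(0.0))     # 1.0
print(recency_score(7300.0))  # ~0.368, i.e. exp(-1) after one half_life of days
print(recall_boost(0))        # 0.0 (never recalled: no boost)
```

Note the default `half_life_days=7300.0` mirrors the value shown in the `recency` block of `meta["signals"]` below.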

Rich `Result.meta["signals"]` (always populated)

```python
{
"fusion": {"method": "weighted", "weights": [...], "graph_signal_enabled": True},
"recency": {"half_life_days": 7300.0},
"sentence_query_expansion": {"enabled": True, "num_sentences": 1},
"stages": {
"gathered_vec": int, "gathered_bm25": int,
"union": int, "cap_applied": bool,
"after_cap": int, "before_rerank": int,
"final": int,
},
"reranker": {"ran": bool, "model": str|None, "error": str|None},
"nucleus": {"enabled": bool},
}
```

Why this matters

We burned a session debugging "why does REMEMBER return 'Melanie' as content?". The answer lived 200 lines deep in handler code. With this telemetry, the diagnosis becomes a lookup: check `meta["signals"]["stages"]` plus the `_*` fields on each node. Done.
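That debugging flow looks roughly like this. The result object's exact class isn't shown in this PR, so the sketch mocks the documented shape with a plain dict; field names come from the tables above, all values are made up:

```python
# Hypothetical REMEMBER result, shaped like the documented telemetry surface.
result = {
    "nodes": [
        {"content": "Melanie", "_remember_score": 0.81, "_rank_stage": "fusion",
         "_vector_sim": 0.74, "_bm25_score": 0.55, "_co_bonus": 0.05},
    ],
    "meta": {"signals": {"stages": {
        "gathered_vec": 40, "gathered_bm25": 40, "union": 63,
        "cap_applied": False, "after_cap": 63, "before_rerank": 63, "final": 5,
    }}},
}

# Step 1: did the pipeline gather/cap/rerank the way you expected?
stages = result["meta"]["signals"]["stages"]
print(f"vec={stages['gathered_vec']} bm25={stages['gathered_bm25']} "
      f"union={stages['union']} final={stages['final']}")

# Step 2: why did each node rank where it did?
for node in result["nodes"]:
    print(node["content"], node["_rank_stage"], node["_remember_score"])
```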

Tests

  • `test_remember_includes_score_breakdown` updated to assert the full surface (prior comment claimed scores were removed in pipeline refactor - stale, they're back)
  • `test_remember_meta_signals_telemetry` new - full meta shape + key presence
  • `test_remember_graph_signal_reflected_in_meta` new - config drives meta

Full suite

```
1756 passed, 101 skipped
```

Zero regressions.

Next

Step 2: extend `SYS EXPLAIN REMEMBER "q"` to dry-run and emit the same telemetry without side effects. Step 3: add `ANSWER` verb that consumes the meta for attribution.

🤖 Generated with Claude Code

Step 1 of the retrieval-observability effort. Make REMEMBER self-describing
so callers can see why a result looks the way it does without reading
handler source.

Per-node scores on every returned node (additive, no removals):

    _remember_score   (fused final, or reranker score when rerank ran)
    _vector_sim       (max sentence cosine)
    _bm25_score       (normalised FTS5 score)
    _recency_score    (exp(-age/half_life))
    _graph_score      NEW: normalised entity-degree contribution
    _co_bonus         NEW: co-occurrence bonus when found by vec AND bm25
    _recall_boost     NEW: log1p(__recall_count__) * 0.05 nudge
    _rank_stage       NEW: "fusion" or "rerank"
    _fusion_score     NEW: pre-rerank base score, preserved when rerank ran
    _rerank_score     NEW: raw reranker score, when rerank ran

Rich Result.meta["signals"] block (always populated, zero-break):

    fusion: {method, weights, graph_signal_enabled}
    recency: {half_life_days}
    sentence_query_expansion: {enabled, num_sentences}
    stages: {gathered_vec, gathered_bm25, union, cap_applied,
             after_cap, before_rerank, final}
    reranker: {ran, model, error}
    nucleus: {enabled}

This is the instrumentation foundation for Step 2 (EXPLAIN REMEMBER) and
Step 3 (ANSWER verb). Both need per-candidate scores + pipeline-stage
counts to be useful.

Also updates the existing test_remember_includes_score_breakdown to
assert the full breakdown (prior comment "removed in pipeline refactor"
was stale - signals exist again under their original names).

Two new tests:
    test_remember_meta_signals_telemetry   full meta block shape
    test_remember_graph_signal_reflected_in_meta   config drives meta

Full suite: 1756 passed, 101 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@KailasMahavarkar KailasMahavarkar merged commit f36ed47 into main Apr 20, 2026
4 checks passed
@KailasMahavarkar KailasMahavarkar deleted the feat/remember-signal-telemetry branch April 20, 2026 06:56
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* feat(dsl): ANSWER verb - retrieval + pluggable reader LLM

Step 3 of the retrieval-observability effort. Full-loop: retrieve with
REMEMBER + synthesize with a user-configured reader callable.

Grammar:
    answer_q: "ANSWER" STRING at_clause? tokens_clause?
              limit_clause? where_clause? using_reader?
    using_reader: "USING" STRING

Usage:

    def my_reader(prompt: str, max_tokens: int = 1000) -> str: ...

    gs = GraphStore(reader=my_reader)
    gs.execute('ANSWER "What is the capital of France?" LIMIT 3')
    # Result(
    #   kind="answer",
    #   data={
    #     "answer": "Paris",
    #     "cited_slots": ["n0", "n1", "n2"],
    #     "candidates": [<full REMEMBER nodes>],
    #     "reader": None,
    #   },
    #   count=1,
    #   meta={"signals": <REMEMBER's full meta block>},
    # )

Named reader registry for A/B'ing reader LLMs:

    gs = GraphStore(readers={"fast": a, "careful": b})
    gs.execute('ANSWER "q" LIMIT 3 USING "careful"')
    q.answer("q", limit=3, using="fast")

Implementation:

- Grammar rule `answer_q` mirrors `remember_q` shape + adds optional
  `USING "reader-name"` suffix.
- New `AnswerQuery` dataclass in ast_nodes.py.
- Transformer wires `answer_q` + `using_reader` Lark rules.
- Handler `_answer` in intelligence.py:
    - Resolves reader: `USING name` looks up `self._readers[name]`;
      else falls back to `self._reader` (default); else to sole entry
      of `_readers` if exactly one; else raise GraphStoreError.
    - Builds equivalent RememberQuery with same limit / where / at /
      tokens; calls real `_remember` (bumps recall counts - intentional).
    - Formats retrieved passages as numbered context blocks with source
      ids. Empty retrieval still surfaces "(no retrieved context)" to
      reader so it can say "no information available".
    - Reader exception caught: returns Result with data["error"] and
      empty answer. Callers inspect without try/except on execute.
- Builder `q.answer(text, limit, tokens, at, where, using)` added to
  reads.py. Registered on `q` namespace. Parser-roundtrip-verified.
- GraphStore gains `reader` and `readers` kwargs. Validated callable.
  Held as live refs on the executor; not in the config layer
  (callables are not serialisable).
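The reader-resolution order described above (explicit `USING` name, then the default reader, then a sole registered reader, then error) can be sketched as a standalone function. Names here are illustrative; the actual handler lives in intelligence.py and may differ in detail:

```python
from typing import Callable, Dict, Optional

class GraphStoreError(Exception):
    """Stand-in for graphstore's error type."""

def resolve_reader(
    using: Optional[str],
    default: Optional[Callable],
    readers: Dict[str, Callable],
) -> Callable:
    # USING "name" must match a registered named reader, or it's an error.
    if using is not None:
        try:
            return readers[using]
        except KeyError:
            raise GraphStoreError(f"unknown reader: {using!r}")
    # Otherwise fall back to the default reader, if one was configured.
    if default is not None:
        return default
    # A single registered named reader is unambiguous, so use it.
    if len(readers) == 1:
        return next(iter(readers.values()))
    raise GraphStoreError("no reader configured")
```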

Zero LLM dependency in core. graphstore ships no HTTP client, no
litellm, no openai. Bring-your-own reader.

Tests (tests/test_answer.py):
- test_answer_end_to_end
- test_answer_without_reader_raises
- test_answer_picks_named_reader_via_using
- test_answer_unknown_named_reader_raises
- test_answer_reader_exception_surfaced_in_result
- test_answer_builder_roundtrip_matches_string_dsl
- test_answer_builder_compiles_full_surface
- test_answer_on_empty_store_still_calls_reader

Also: test_query_coverage.py EXPECTED_VERBS += "answer".

Full suite: 1766 passed, 101 skipped, zero regressions.

Next (Step 4): temporal anchor extraction at query time. Auto-add AT
clauses when the question has a date. Targets temporal F1 (weakest
LoCoMo category for us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface-sync for REMEMBER telemetry + EXPLAIN + ANSWER (Steps 1-3)

We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* chore(release): bump v0.3.0 -> v0.4.0

v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump pyproject.toml version to 0.4.0

Missed in 07c9986. Pairs with src/graphstore/__init__.py bump.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>