feat(remember): full signal telemetry in Result.meta + per-node scores#150

Merged
KailasMahavarkar merged 1 commit into main from
feat/remember-signal-telemetry
Apr 20, 2026
Conversation

@KailasMahavarkar
Contributor

Step 1 of the retrieval-observability effort. Makes `REMEMBER` self-describing so callers can see why a result looks the way it does without reading handler source. Foundation for Step 2 (`EXPLAIN REMEMBER`) and Step 3 (`ANSWER` verb).

Per-node scores (additive, no removals)

Every returned node carries:

| Field | Meaning |
| --- | --- |
| `_remember_score` | fused final (or reranker score when rerank ran) |
| `_vector_sim` | max sentence cosine |
| `_bm25_score` | normalised FTS5 score |
| `_recency_score` | `exp(-age/half_life)` |
| `_graph_score` | new: normalised entity-degree contribution |
| `_co_bonus` | new: bonus when found by both vec and bm25 |
| `_recall_boost` | new: `log1p(recall_count) * 0.05` |
| `_rank_stage` | new: `"fusion"` or `"rerank"` |
| `_fusion_score` | new: pre-rerank base, preserved when rerank ran |
| `_rerank_score` | new: raw reranker score, when rerank ran |
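Two of these components have closed-form definitions in the table, so they can be recomputed directly. A minimal sketch, using only the formulas as stated above (the function names and sample ages are illustrative, not from the codebase):

```python
import math

def recency_score(age_days: float, half_life_days: float = 7300.0) -> float:
    # exp(-age/half_life): 1.0 for a brand-new node, decaying toward 0
    return math.exp(-age_days / half_life_days)

def recall_boost(recall_count: int) -> float:
    # log1p(recall_count) * 0.05: small nudge for frequently recalled nodes
    return math.log1p(recall_count) * 0.05

print(recency_score(0.0))     # 1.0
print(recency_score(7300.0))  # ~0.368, i.e. exp(-1) after one half_life of days
print(recall_boost(0))        # 0.0 (never recalled: no boost)
```

Note the default `half_life_days=7300.0` mirrors the value shown in the `recency` block of `meta["signals"]` below.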

Rich `Result.meta["signals"]` (always populated)

```python
{
"fusion": {"method": "weighted", "weights": [...], "graph_signal_enabled": True},
"recency": {"half_life_days": 7300.0},
"sentence_query_expansion": {"enabled": True, "num_sentences": 1},
"stages": {
"gathered_vec": int, "gathered_bm25": int,
"union": int, "cap_applied": bool,
"after_cap": int, "before_rerank": int,
"final": int,
},
"reranker": {"ran": bool, "model": str|None, "error": str|None},
"nucleus": {"enabled": bool},
}
```

Why this matters

We burned a session debugging "why does REMEMBER return 'Melanie' as content?". The answer lived 200 lines deep in handler code. With this telemetry, the diagnosis becomes a lookup: check `meta["signals"]["stages"]` plus the `_*` fields on each node. Done.
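That debugging flow looks roughly like this. The result object's exact class isn't shown in this PR, so the sketch mocks the documented shape with a plain dict; field names come from the tables above, all values are made up:

```python
# Hypothetical REMEMBER result, shaped like the documented telemetry surface.
result = {
    "nodes": [
        {"content": "Melanie", "_remember_score": 0.81, "_rank_stage": "fusion",
         "_vector_sim": 0.74, "_bm25_score": 0.55, "_co_bonus": 0.05},
    ],
    "meta": {"signals": {"stages": {
        "gathered_vec": 40, "gathered_bm25": 40, "union": 63,
        "cap_applied": False, "after_cap": 63, "before_rerank": 63, "final": 5,
    }}},
}

# Step 1: did the pipeline gather/cap/rerank the way you expected?
stages = result["meta"]["signals"]["stages"]
print(f"vec={stages['gathered_vec']} bm25={stages['gathered_bm25']} "
      f"union={stages['union']} final={stages['final']}")

# Step 2: why did each node rank where it did?
for node in result["nodes"]:
    print(node["content"], node["_rank_stage"], node["_remember_score"])
```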

Tests

  • `test_remember_includes_score_breakdown` updated to assert the full surface (prior comment claimed scores were removed in pipeline refactor - stale, they're back)
  • `test_remember_meta_signals_telemetry` new - full meta shape + key presence
  • `test_remember_graph_signal_reflected_in_meta` new - config drives meta

Full suite

```
1756 passed, 101 skipped
```

Zero regressions.

Next

Step 2: extend `SYS EXPLAIN REMEMBER "q"` to dry-run and emit the same telemetry without side effects. Step 3: add `ANSWER` verb that consumes the meta for attribution.

🤖 Generated with Claude Code

Step 1 of the retrieval-observability effort. Make REMEMBER self-describing
so callers can see why a result looks the way it does without reading
handler source.

Per-node scores on every returned node (additive, no removals):

    _remember_score   (fused final, or reranker score when rerank ran)
    _vector_sim       (max sentence cosine)
    _bm25_score       (normalised FTS5 score)
    _recency_score    (exp(-age/half_life))
    _graph_score      NEW: normalised entity-degree contribution
    _co_bonus         NEW: co-occurrence bonus when found by vec AND bm25
    _recall_boost     NEW: log1p(__recall_count__) * 0.05 nudge
    _rank_stage       NEW: "fusion" or "rerank"
    _fusion_score     NEW: pre-rerank base score, preserved when rerank ran
    _rerank_score     NEW: raw reranker score, when rerank ran

Rich Result.meta["signals"] block (always populated, zero-break):

    fusion: {method, weights, graph_signal_enabled}
    recency: {half_life_days}
    sentence_query_expansion: {enabled, num_sentences}
    stages: {gathered_vec, gathered_bm25, union, cap_applied,
             after_cap, before_rerank, final}
    reranker: {ran, model, error}
    nucleus: {enabled}

This is the instrumentation foundation for Step 2 (EXPLAIN REMEMBER) and
Step 3 (ANSWER verb). Both need per-candidate scores + pipeline-stage
counts to be useful.

Also updates the existing test_remember_includes_score_breakdown to
assert the full breakdown (prior comment "removed in pipeline refactor"
was stale - signals exist again under their original names).

Two new tests:
    test_remember_meta_signals_telemetry   full meta block shape
    test_remember_graph_signal_reflected_in_meta   config drives meta

Full suite: 1756 passed, 101 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@KailasMahavarkar KailasMahavarkar merged commit f36ed47 into main Apr 20, 2026
4 checks passed
@KailasMahavarkar KailasMahavarkar deleted the feat/remember-signal-telemetry branch April 20, 2026 06:56
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* feat(dsl): ANSWER verb - retrieval + pluggable reader LLM

Step 3 of the retrieval-observability effort. Full-loop: retrieve with
REMEMBER + synthesize with a user-configured reader callable.

Grammar:
    answer_q: "ANSWER" STRING at_clause? tokens_clause?
              limit_clause? where_clause? using_reader?
    using_reader: "USING" STRING

Usage:

    def my_reader(prompt: str, max_tokens: int = 1000) -> str: ...

    gs = GraphStore(reader=my_reader)
    gs.execute('ANSWER "What is the capital of France?" LIMIT 3')
    # Result(
    #   kind="answer",
    #   data={
    #     "answer": "Paris",
    #     "cited_slots": ["n0", "n1", "n2"],
    #     "candidates": [<full REMEMBER nodes>],
    #     "reader": None,
    #   },
    #   count=1,
    #   meta={"signals": <REMEMBER's full meta block>},
    # )

Named reader registry for A/B'ing reader LLMs:

    gs = GraphStore(readers={"fast": a, "careful": b})
    gs.execute('ANSWER "q" LIMIT 3 USING "careful"')
    q.answer("q", limit=3, using="fast")

Implementation:

- Grammar rule `answer_q` mirrors `remember_q` shape + adds optional
  `USING "reader-name"` suffix.
- New `AnswerQuery` dataclass in ast_nodes.py.
- Transformer wires `answer_q` + `using_reader` Lark rules.
- Handler `_answer` in intelligence.py:
    - Resolves reader: `USING name` looks up `self._readers[name]`;
      else falls back to `self._reader` (default); else to sole entry
      of `_readers` if exactly one; else raise GraphStoreError.
    - Builds equivalent RememberQuery with same limit / where / at /
      tokens; calls real `_remember` (bumps recall counts - intentional).
    - Formats retrieved passages as numbered context blocks with source
      ids. Empty retrieval still surfaces "(no retrieved context)" to
      reader so it can say "no information available".
    - Reader exception caught: returns Result with data["error"] and
      empty answer. Callers inspect without try/except on execute.
- Builder `q.answer(text, limit, tokens, at, where, using)` added to
  reads.py. Registered on `q` namespace. Parser-roundtrip-verified.
- GraphStore gains `reader` and `readers` kwargs. Validated callable.
  Held as live refs on the executor; not in the config layer
  (callables are not serialisable).
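The reader-resolution order described above (explicit `USING` name, then the default reader, then a sole registered reader, then error) can be sketched as a standalone function. Names here are illustrative; the actual handler lives in intelligence.py and may differ in detail:

```python
from typing import Callable, Dict, Optional

class GraphStoreError(Exception):
    """Stand-in for graphstore's error type."""

def resolve_reader(
    using: Optional[str],
    default: Optional[Callable],
    readers: Dict[str, Callable],
) -> Callable:
    # USING "name" must match a registered named reader, or it's an error.
    if using is not None:
        try:
            return readers[using]
        except KeyError:
            raise GraphStoreError(f"unknown reader: {using!r}")
    # Otherwise fall back to the default reader, if one was configured.
    if default is not None:
        return default
    # A single registered named reader is unambiguous, so use it.
    if len(readers) == 1:
        return next(iter(readers.values()))
    raise GraphStoreError("no reader configured")
```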

Zero LLM dependency in core. graphstore ships no HTTP client, no
litellm, no openai. Bring-your-own reader.

Tests (tests/test_answer.py):
- test_answer_end_to_end
- test_answer_without_reader_raises
- test_answer_picks_named_reader_via_using
- test_answer_unknown_named_reader_raises
- test_answer_reader_exception_surfaced_in_result
- test_answer_builder_roundtrip_matches_string_dsl
- test_answer_builder_compiles_full_surface
- test_answer_on_empty_store_still_calls_reader

Also: test_query_coverage.py EXPECTED_VERBS += "answer".

Full suite: 1766 passed, 101 skipped, zero regressions.

Next (Step 4): temporal anchor extraction at query time. Auto-add AT
clauses when the question has a date. Targets temporal F1 (weakest
LoCoMo category for us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface-sync for REMEMBER telemetry + EXPLAIN + ANSWER (Steps 1-3)

We shipped three retrieval-observability features in #150/#151/#152 but
the skills, docs, and README said nothing about them. An LLM loading
graphstore-dsl today wouldn't know ANSWER exists; a human reading the
README wouldn't either. Fix:

graphstore-dsl SKILL:
  - ANSWER verb in the reads table + dedicated subsection explaining
    reader resolution + error capture semantics
  - Full per-node signal list (old doc predicted _graph_score / _recall
    before they shipped; now they do and we have _co_bonus, _recall_boost,
    _rank_stage, _fusion_score, _rerank_score)
  - meta["signals"] telemetry block documented
  - SYS EXPLAIN REMEMBER dry-run subsection
  - Added ANSWER + SYS EXPLAIN rows to query-generation pattern table

graphstore-builder SKILL:
  - q.answer(...) row in the reads table
  - Debugging section expanded: full signal list + meta block JSON +
    q.sys.explain(inner) dry-run example + q.answer() end-to-end
    example + named-reader A/B + reader resolution order

website/docs/dsl/reference.md:
  - ANSWER examples (bare + USING "reader")
  - New subsections on signal scores, SYS EXPLAIN REMEMBER dry-run, and
    ANSWER retrieval-augmented synthesis

website/docs/query-builder.md:
  - q.answer(...) row in reads table
  - Retrieval-pipeline observability section
  - Retrieval + reader synthesis section with named-reader A/B pattern

README.md:
  - REMEMBER section expanded: per-node scores + meta["signals"] +
    SYS EXPLAIN REMEMBER dry-run example
  - New ANSWER section showing reader wiring, cited_slots, named readers

Every remaining claim verified against code. Em dash sweep clean (Rule 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar added a commit that referenced this pull request Apr 20, 2026
* chore(release): bump v0.3.0 -> v0.4.0

v0.4 ships retrieval observability triangle:
- REMEMBER signal telemetry + rich meta["signals"] (#150)
- SYS EXPLAIN REMEMBER dry-run (#151)
- ANSWER verb with pluggable reader LLM (#152)

Plus:
- Skills split: graphstore-dsl (runtime) + graphstore-builder (Python) (#148)
- Skill-guided LLM ingest adapter + LoCoMo wiring fix (#149)
- Docusaurus docs site @ graphstore-docs.orkait.com (#142-147)

Breaking changes: none. All additions are additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump pyproject.toml version to 0.4.0

Missed in 07c9986. Pairs with src/graphstore/__init__.py bump.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>