
feat: v0.15 finishers — diagnostics (#78) + delta context packs (#81) + value-per-token (#74)#121

Merged
mohanagy merged 3 commits into
mainfrom
feat/v0.15-finishers
May 11, 2026

Conversation

@mohanagy
Owner

@mohanagy mohanagy commented May 11, 2026

Closes #78
Closes #81
Closes #74

Summary

Closes out the v0.15 substrate work in one bundled PR (as requested). Three issues landed:

What's in the bundle

#78 Context-pack quality diagnostics

New scorer computeContextPackDiagnostics(pack) with 9 weighted rules:

| Rule | Severity | Weight | When |
| --- | --- | --- | --- |
| missing_required_evidence | error | 2 | coverage.missing_required.length > 0 |
| missing_required_semantic | warn | 1 | coverage.missing_semantic.length > 0 |
| zero_claims | warn | 1 | claims.length === 0 |
| undersized_retrieval | warn | 1 | 0 < nodes.length < 3 |
| budget_underutilized | info | 1 | token_count < 25% of budget, on budgets >= 500 tokens |
| missing_snippets | warn | 1 | > 50% of nodes lack a source snippet |
| low_avg_match_score | warn | 1 | mean match_score < 0.30 |
| orphan_nodes | warn | 1 | more than 1 node but zero relationships |
| no_graph_signals | info | 1 | both god_nodes and bridge_nodes empty |

Wired into the stdio context_pack response (explain branch + delta branch).
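The weighted scoring and ordering can be sketched as follows. This is a minimal sketch, not the code in src/runtime/context-pack-diagnostics.ts: the rule kinds, severities, weights, and the budget_underutilized threshold come from the table above, while the score formula (1 minus the triggered share of total weight) and the helper names qualityScore, sortWarnings, and budgetUnderutilized are hypothetical.

```typescript
// Hypothetical sketch of weighted scoring. Kinds, severities, and weights
// are copied from the rule table; the score formula is an assumption.
type Severity = 'error' | 'warn' | 'info'

const RULES: Record<string, { severity: Severity; weight: number }> = {
  missing_required_evidence: { severity: 'error', weight: 2 },
  missing_required_semantic: { severity: 'warn', weight: 1 },
  zero_claims: { severity: 'warn', weight: 1 },
  undersized_retrieval: { severity: 'warn', weight: 1 },
  budget_underutilized: { severity: 'info', weight: 1 },
  missing_snippets: { severity: 'warn', weight: 1 },
  low_avg_match_score: { severity: 'warn', weight: 1 },
  orphan_nodes: { severity: 'warn', weight: 1 },
  no_graph_signals: { severity: 'info', weight: 1 },
}

const SEVERITY_RANK: Record<Severity, number> = { error: 0, warn: 1, info: 2 }
const TOTAL_WEIGHT = Object.values(RULES).reduce((sum, rule) => sum + rule.weight, 0)

// Assumed formula: 1 minus the triggered share of total rule weight, floored at 0.
function qualityScore(triggered: string[]): number {
  const lost = triggered.reduce((sum, kind) => sum + (RULES[kind]?.weight ?? 0), 0)
  return Math.max(0, 1 - lost / TOTAL_WEIGHT)
}

// Deterministic order: error, then warn, then info; ties broken by kind name.
// Every kind passed in must exist in RULES.
function sortWarnings(triggered: string[]): string[] {
  return [...triggered].sort((a, b) => {
    const bySeverity = SEVERITY_RANK[RULES[a].severity] - SEVERITY_RANK[RULES[b].severity]
    if (bySeverity !== 0) return bySeverity
    return a < b ? -1 : a > b ? 1 : 0
  })
}

// budget_underutilized: token_count < 25% of a budget of at least 500 tokens.
// skipForDelta mirrors the PR's skipBudgetUnderutilization option.
function budgetUnderutilized(tokenCount: number, budget: number, skipForDelta = false): boolean {
  return !skipForDelta && budget >= 500 && tokenCount < 0.25 * budget
}
```

With these weights, a pack that trips missing_required_evidence and zero_claims loses 3 of 10 weight points and scores 0.7 under the assumed formula.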

#81 Delta-only context packs via stdio

  • New parameter on context_pack: delta_session_id. When set, the response ships only nodes not already received by this session, plus referenced_ids[] and bytes_saved.
  • New MCP tool context_pack_session_reset — clears a delta session so the next call ships the full pack again.
  • 3 new helpers on StdioToolHelpers: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by an LRU Map<sessionId, Set<nodeId>> (256 sessions, same bound as prompt sessions).
  • Reuses computeDeltaContextPack (it already had the both-endpoints relationship filter from PR #106).
  • Diagnostics on delta responses skip the budget_underutilized rule (deltas are small-by-design after dedup).
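A minimal sketch of the delta flow, assuming simplified pack shapes: getNodeIds, recordNodeIds, clearNodeIds, and deltaPack are hypothetical stand-ins for the StdioToolHelpers methods and computeDeltaContextPack, not their real signatures. The 256-session LRU bound comes from the PR text. The sketch reads "both-endpoints filter" as "keep a relationship only when both endpoints are in the shipped delta", which may not match the actual rule, and it omits bytes_saved.

```typescript
// Hypothetical sketch of per-session delta packs; not the real helper API.
interface PackNode { id: string }
interface PackRelationship { from: string; to: string }
interface Pack { nodes: PackNode[]; relationships: PackRelationship[] }

const MAX_SESSIONS = 256 // same bound the PR cites for prompt sessions

// Map iterates in insertion order, so re-inserting on access and evicting
// the first key gives simple LRU behaviour.
const sessions = new Map<string, Set<string>>()

function getNodeIds(sessionId: string): Set<string> | undefined {
  const ids = sessions.get(sessionId)
  if (ids) {
    sessions.delete(sessionId)
    sessions.set(sessionId, ids) // mark as most recently used
  }
  return ids
}

function recordNodeIds(sessionId: string, nodeIds: Iterable<string>): void {
  const ids = getNodeIds(sessionId) ?? new Set<string>()
  for (const id of nodeIds) ids.add(id)
  sessions.set(sessionId, ids)
  if (sessions.size > MAX_SESSIONS) {
    const oldest = sessions.keys().next().value
    if (oldest !== undefined) sessions.delete(oldest)
  }
}

function clearNodeIds(sessionId: string): void {
  sessions.delete(sessionId) // next call ships the full pack again
}

function deltaPack(sessionId: string, full: Pack) {
  const seen = getNodeIds(sessionId) ?? new Set<string>()
  const nodes = full.nodes.filter((n) => !seen.has(n.id))
  const shipped = new Set(nodes.map((n) => n.id))
  // Assumed filter: both endpoints must be in this delta.
  const relationships = full.relationships.filter((r) => shipped.has(r.from) && shipped.has(r.to))
  // Nodes the client already holds are referenced by id instead of re-sent.
  const referenced_ids = full.nodes.filter((n) => seen.has(n.id)).map((n) => n.id)
  recordNodeIds(sessionId, shipped)
  return { nodes, relationships, referenced_ids }
}
```

On the first call for a session everything ships; later calls ship only unseen nodes, and clearNodeIds plays the role of context_pack_session_reset.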

#74 Value-per-token budget selector

  • New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options).
  • Greedy density heuristic: sort by score / token_cost desc, pick the prefix that fits within budget. Deterministic tie-breaking: score desc → cost asc → id asc.
  • Optional pinZeroCost (default true): zero-cost items always included; flip to exclude.
  • Skips items whose individual cost > budget and non-finite scores/costs.
  • Returns selected payloads, total_cost, remaining_budget, and per-candidate ranking[] with rank/density/included flags for diagnostics.
  • Pure helper for now — adopting inside retrieve.ts's candidate selection is a follow-up once we have a benchmark to A/B against.
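The selector described above can be sketched roughly like this. It is a sketch under assumptions, not the contents of src/runtime/value-per-token.ts: the candidate fields, the pinZeroCost default, the tie-break order, and the returned fields follow the PR text, while the exact types are guesses. It follows the "pick the prefix" wording literally and stops at the first candidate that would overflow; a skip-and-continue variant would be a one-line change. Here ranking covers only the eligible candidates.

```typescript
// Hypothetical sketch of a greedy value-per-token selector.
interface Candidate<T> {
  id: string
  score: number
  token_cost: number // >= 0; zero-cost items get infinite density
  payload: T
}

interface RankingEntry {
  id: string
  rank: number
  density: number
  included: boolean
}

const compareNumbers = (x: number, y: number): number => (x < y ? -1 : x > y ? 1 : 0)

const densityOf = (c: { score: number; token_cost: number }): number =>
  c.token_cost === 0 ? Infinity : c.score / c.token_cost

function selectByValuePerToken<T>(
  candidates: Candidate<T>[],
  options: { budget: number; pinZeroCost?: boolean },
) {
  const budget = Math.max(0, options.budget) // negative budgets clamp to 0
  const pinZeroCost = options.pinZeroCost ?? true

  // Drop non-finite values, items that cannot fit alone, and (optionally) zero-cost items.
  const eligible = candidates.filter(
    (c) =>
      Number.isFinite(c.score) &&
      Number.isFinite(c.token_cost) &&
      c.token_cost >= 0 &&
      c.token_cost <= budget &&
      (pinZeroCost || c.token_cost > 0),
  )

  // density desc -> score desc -> cost asc -> id asc
  const sorted = [...eligible].sort(
    (a, b) =>
      compareNumbers(densityOf(b), densityOf(a)) ||
      compareNumbers(b.score, a.score) ||
      compareNumbers(a.token_cost, b.token_cost) ||
      (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  )

  // Take the longest prefix of the sorted list that fits in the budget.
  const selected: T[] = []
  const ranking: RankingEntry[] = []
  let total = 0
  let open = true
  sorted.forEach((c, index) => {
    const included = open && total + c.token_cost <= budget
    if (included) {
      total += c.token_cost
      selected.push(c.payload)
    } else {
      open = false
    }
    ranking.push({ id: c.id, rank: index + 1, density: densityOf(c), included })
  })

  return { selected, total_cost: total, remaining_budget: budget - total, ranking }
}
```

For example, with a 10-token budget, a pinned zero-cost item and a 2-token item of density 3 are taken, and a 10-token item of density 1 ends the prefix because it would push the total to 12.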

Not in this PR (deferred to v0.16)

Test plan

  • npm run typecheck clean
  • npm run build clean
  • npm run test:run — 1760/1760 pass (103 files, +27 new tests)
  • MCP tool count = 26 in the full profile; schema-budget test stays under the 12,000-byte ceiling

Summary by CodeRabbit

  • New Features

    • Pack quality diagnostics: pack health score, ordered warnings, and measurable signals (snippet coverage, match scores, budget utilization); diagnostics now included in pack responses.
    • Incremental "delta" context-pack delivery with per-session deduplication and a session-reset tool.
    • Value-per-token selector to greedily pick items by score/token density within a budget.
  • Tests

    • Comprehensive unit tests covering diagnostics and value-per-token behaviors and edge cases.


…5 slice 1)

Adds a deterministic structural quality scorer for compiled context-packs. Returns a 0-1 quality_score, a list of triggered warnings with kind/severity/message/detail, and the raw signals used to compute the score (node_count, claim_count, snippet_coverage, avg_match_score, budget_utilization, etc.).

Rules implemented (each weighted into the score):

- missing_required_evidence (error, weight 2) — pack lacks a required evidence class

- missing_required_semantic (warn)        — pack lacks a required semantic category

- zero_claims (warn)                      — claims array is empty

- undersized_retrieval (warn)             — fewer than 3 nodes returned

- budget_underutilized (info)             — token_count < 25% of budget on a >= 500-token request

- missing_snippets (warn)                 — > 50% of nodes lack a source snippet

- low_avg_match_score (warn)              — mean match_score < 0.30 (when scores exist)

- orphan_nodes (warn)                     — more than 1 node but zero relationships

- no_graph_signals (info)                 — both god_nodes and bridge_nodes empty

Surface points:

- New contracts file src/contracts/context-pack-diagnostics.ts with ContextPackDiagnosticKind / Severity / Warning / Signals / Diagnostics types.

- New runtime helper src/runtime/context-pack-diagnostics.ts exporting computeContextPackDiagnostics(pack, options?). Pure, deterministic, no I/O — fully unit-testable against synthetic CompiledContextPack inputs.

- contextPackFromRetrieveResult is now exported from retrieve.ts (was private) so the stdio handler can construct the full pack shape from a RetrieveResult and feed it to the scorer.

- stdio context_pack tool response now includes a diagnostics field on the explain branch. Impact and review branches use different pack taxonomies and will land in a follow-up.

Verified: typecheck + build clean, 1749/1749 tests pass (+16 new). No public API surface changes outside the additive diagnostics field.
@coderabbitai

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

This PR adds deterministic context-pack quality diagnostics (contracts, engine, retrieval export, stdio tool/server integration, and tests) and a greedy value-per-token selector with types and unit tests.

Changes

Quality Diagnostics Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Data Contracts | src/contracts/context-pack-diagnostics.ts | Introduces ContextPackDiagnosticKind, ContextPackDiagnosticSeverity, ContextPackDiagnosticWarning, ContextPackQualitySignals, and ContextPackDiagnostics. |
| Diagnostics Engine | src/runtime/context-pack-diagnostics.ts | Implements computeContextPackDiagnostics(pack, options) with rule weights, thresholds, computeSignals, triggered-warnings assembly, deterministic severity/kind sorting, computeQualityScore, and the skipBudgetUnderutilization option. |
| Public API Exposure | src/runtime/retrieve.ts | Exports contextPackFromRetrieveResult for building CompiledContextPack inputs for diagnostics. |
| Stdio Server Session State | src/runtime/stdio-server.ts | Adds a per-delta_session_id contextPackNodeIds store, initializer, lazy ensure helper, and record/get/clear callbacks with eviction. |
| Tool Definition | src/runtime/stdio/definitions.ts | Extends the context_pack MCP tool schema and description with an optional delta_session_id input. |
| Tool Integration | src/runtime/stdio/tools.ts | Imports diagnostics, computes diagnostics for full and delta packs (delta uses skipBudgetUnderutilization), records newly shipped node IDs, and adds the context_pack_session_reset tool case. |
| Tests / Documentation | tests/unit/context-pack-diagnostics.test.ts | Comprehensive Vitest coverage validating all diagnostic kinds, severities, ordering, quality_score behavior, edge cases (NaN match scores, zero nodes), and skipBudgetUnderutilization behavior. |

Value-per-Token Selector

| Layer | File(s) | Summary |
| --- | --- | --- |
| Types / Options | src/runtime/value-per-token.ts | Adds ValuePerTokenCandidate, ValuePerTokenResult, and ValuePerTokenOptions types. |
| Implementation | src/runtime/value-per-token.ts | Adds selectByValuePerToken implementing the density heuristic (score/token_cost), zero-cost handling, deterministic tie-breaking, greedy selection under a token budget, and ranking metadata. |
| Tests | tests/unit/value-per-token.test.ts | Unit tests for density preference, zero-cost pinning, budget enforcement, filtering of non-finite values, deterministic ties, negative budget clamping, and empty input. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

  • mohanagy/graphify-ts#64 — Touches retrieval and context-pack tooling; changes may interact with exported retrieve helper and tool wiring here.
  • mohanagy/graphify-ts#106 — Related work on per-session delta context-pack deduplication and node-ID collection that complements the delta_session_id plumbing added here.

Poem

I’m a rabbit who tallies snippets and scores,
I hop through packs, check thresholds and doors,
I warn and I sort from error to mild,
“Tidy your graph!” says this ever-prudent child. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: three bundled features for v0.15 (diagnostics, delta context packs, value-per-token), each clearly referenced by issue number. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | Pull request description comprehensively covers all three bundled features with clear structure, testing evidence, and explicit issue references. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/context-pack-diagnostics.ts`:
- Around line 141-145: The condition that gates the low-average-match-score
warning incorrectly excludes the worst-case value 0; update the predicate in the
diagnostics check (the if that references signals,
UNDERSIZED_RETRIEVAL_THRESHOLD, and LOW_AVG_MATCH_SCORE) so it accepts zero
values (e.g., replace the strict > 0 check with a non-strict check such as >= 0
or remove the > 0 clause) while keeping the existing Number.isNaN guard and the
rest of the logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4eff0ec6-c351-4d67-970b-49256ac64e83

📥 Commits

Reviewing files that changed from the base of the PR and between 21c8b60 and af2f5e6.

📒 Files selected for processing (5)
  • src/contracts/context-pack-diagnostics.ts
  • src/runtime/context-pack-diagnostics.ts
  • src/runtime/retrieve.ts
  • src/runtime/stdio/tools.ts
  • tests/unit/context-pack-diagnostics.test.ts

Comment thread src/runtime/context-pack-diagnostics.ts
…lector (#74)

Extends PR #121 to cover the rest of v0.15 in one slice as requested:

## #81 Delta-only context packs via stdio

- New context_pack parameter: delta_session_id. When set, the response ships only nodes the session hasn't received yet, plus referenced_ids[] for dropped nodes and bytes_saved.

- New MCP tool context_pack_session_reset to clear a delta session and force the next call to ship the full pack.

- 3 new StdioToolHelpers methods: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by a per-MCP-process Map<sessionId, Set<nodeId>> in StdioSessionState with the same LRU bound as the prompt-session store (256 sessions).

- Reuses the existing computeDeltaContextPack helper (already had the both-endpoints relationship filter fix from PR #106) and collectPackNodeIds for recording shipped ids.

- Diagnostics on delta responses skip the budget_underutilized rule since a delta pack is small-by-design after dedup.

## #74 Value-per-token budget selector

- New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options).

- Greedy density heuristic: sort by score / token_cost descending, pick the prefix that fits within budget. Tie-break: score desc, cost asc, id asc (deterministic).

- Optional pinZeroCost (default true): zero-cost candidates are always included; set false to exclude them entirely.

- Skips items whose individual cost exceeds the budget (cannot fit by definition) and items with non-finite scores or costs.

- Returns selected payload list, total_cost, remaining_budget, and per-candidate ranking[] with rank/density/included for diagnostics.

- Pure helper for now — adopting it inside retrieve.ts's candidate-selection pipeline is a follow-up once we have a benchmark to A/B against the current selector.

## Tests

- 11 new value-per-token tests covering density preference, zero-cost gating, budget overflow skip, non-finite filtering, ranking shape, tie-break determinism, negative budget clamp, empty input.

- MCP tool count increased to 26 (full profile). mcp-schema-budget test stays under the 12,000-byte ceiling after tightening the new tool descriptions.

- Verified: typecheck + build clean, 1760/1760 pass.

## Not in this PR (deferred to v0.16 slice train)

- #76 multi-resolution context representations — needs a new representation layer, structurally invasive.

- #79 PR-impact coverage calibration — needs real PRs to calibrate against, not a code-only delivery.

- #80 cache-aware prompt layout measurement — purely measurement work; sort_key bands already shipped.
@mohanagy mohanagy changed the title feat(#78): context-pack quality diagnostics + bad-run detection (v0.15 slice 1) feat: v0.15 finishers — diagnostics (#78) + delta context packs (#81) + value-per-token (#74) May 11, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/stdio/tools.ts (1)

804-807: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or explicitly gate delta_session_id for non-explain packs.

delta_session_id is advertised on context_pack, but the review and impact branches return before that parameter is ever read. A call like task: "review" plus delta_session_id silently falls back to a full pack instead of a delta pack.

Suggested guard
       const resolvedBudget = budget ?? 3000
+      const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId'])
+      if (deltaSessionId && task !== 'explain') {
+        return helpers.failure(id, helpers.jsonrpcInvalidParams, 'delta_session_id is currently supported only for task=explain')
+      }
       const plannerBudget = Math.max(resolvedBudget, 3)
       const initialPlan = buildTaskContextPlan({
         task_kind: task,
         prompt,
         budget: plannerBudget,
@@
-      const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId'])
       if (deltaSessionId) {

Also applies to: 820-843, 857-876, 893-894

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 804 - 807, The code currently
accepts a delta_session_id in the context_pack but never reads or gates it for
non-explain tasks, causing unexpected full-pack behavior for task === 'review'
or 'impact'; add an explicit guard after reading task (where
helpers.stringParam(toolArguments, 'task') is used) that checks for the presence
of delta_session_id via helpers.stringParam(toolArguments, 'delta_session_id')
and if found while task !== 'explain' immediately return helpers.failure(id,
helpers.jsonrpcInvalidParams, 'delta_session_id is only allowed when task is
"explain"'); apply the same guard in the other branches/handlers that parse
context_pack (the blocks covering the ranges you noted) so delta_session_id is
rejected unless task === 'explain'.
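The suggested guard amounts to a single up-front check. In the minimal sketch below, validateDeltaSession and the ParamError shape are hypothetical; -32602 is the JSON-RPC 2.0 "Invalid params" code, which helpers.jsonrpcInvalidParams presumably maps to (an assumption), and the alias lookup mirrors the stringParamAlias call in the suggested diff.

```typescript
// Hypothetical sketch: reject delta_session_id for tasks whose branches never read it.
type Task = 'explain' | 'review' | 'impact'

const JSONRPC_INVALID_PARAMS = -32602 // JSON-RPC 2.0 "Invalid params"

interface ParamError {
  code: number
  message: string
}

function validateDeltaSession(task: Task, args: Record<string, unknown>): ParamError | null {
  // Accept both the snake_case and camelCase spellings, like the suggested alias lookup.
  const raw = args.delta_session_id ?? args.deltaSessionId
  const deltaSessionId = typeof raw === 'string' && raw.length > 0 ? raw : undefined
  if (deltaSessionId !== undefined && task !== 'explain') {
    return {
      code: JSONRPC_INVALID_PARAMS,
      message: 'delta_session_id is currently supported only for task=explain',
    }
  }
  return null
}
```

Running the check once, before the task branches split, means review and impact can no longer silently ignore the parameter, and no per-branch guard is needed.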
🧹 Nitpick comments (1)
src/runtime/value-per-token.ts (1)

36-38: 💤 Low value

Consider clarifying the token_cost constraint in the comment.

The comment states "Must be > 0" but immediately mentions zero-cost items as a special case. The implementation correctly handles token_cost === 0 by assigning infinite density and conditionally including via pinZeroCost. Consider rewording to something like: "Non-negative. Items with token_cost === 0 are treated as 'pinned' and sorted first."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/value-per-token.ts` around lines 36 - 38, Update the comment for
the token_cost field in value-per-token.ts to state that token_cost is
non-negative (>= 0) and that items with token_cost === 0 are treated as pinned
and sorted first via the pinZeroCost logic; locate the doc comment above the
token_cost property and replace "Must be > 0; zero-cost items are pinned (always
included)" with wording like "Non-negative. Items with token_cost === 0 are
treated as 'pinned' and sorted first (handled by pinZeroCost)."

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e446405-fe5f-44d4-91c8-49a8e52e4080

📥 Commits

Reviewing files that changed from the base of the PR and between af2f5e6 and 046c10e.

📒 Files selected for processing (5)
  • src/runtime/stdio-server.ts
  • src/runtime/stdio/definitions.ts
  • src/runtime/stdio/tools.ts
  • src/runtime/value-per-token.ts
  • tests/unit/value-per-token.test.ts

Comment on lines +878 to +885
const fullPack = contextPackFromRetrieveResult(retrieval)
const compactPack = compactRetrieveResult(retrieval)
const metadata = contextMetadata(retrieval)
storeExpandableHandles(prompt, task, initialPlan.evidence.recipe_id, metadata.expandable, helpers)
// Slice #78: emit context-pack quality diagnostics so callers can
// detect bad runs (missing required evidence, zero claims, weak
// retrieval, etc.) without re-implementing the heuristics.
const diagnostics = computeContextPackDiagnostics(fullPack)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Score the full pack, not just the delta fragment.

computeContextPackDiagnostics(deltaResult.delta_pack, ...) evaluates only newly shipped nodes. After the first turn, that can falsely trigger missing-evidence or zero-claims warnings even when the omitted nodes are already present in the client’s cached session.

Suggested fix
       const diagnostics = computeContextPackDiagnostics(fullPack)
@@
-          diagnostics: computeContextPackDiagnostics(deltaResult.delta_pack, { skipBudgetUnderutilization: true }),
+          diagnostics,

Also applies to: 895-919

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 878 - 885, The diagnostics call is
scoring only the delta fragment; change computeContextPackDiagnostics to be
invoked with the full context pack (the variable fullPack produced by
contextPackFromRetrieveResult(retrieval)) instead of deltaResult.delta_pack so
diagnostics reflect the whole session state; update all other occurrences (the
similar call around the 895-919 region) to pass the fullPack variable as well
and ensure any helper variables (compactPack, metadata) remain unchanged.
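A toy illustration of the false positive, using the undersized_retrieval rule (0 < nodes.length < 3) from the rule table. diagnoseTurn and the id-only pack shape are hypothetical simplifications, not the real tools.ts code.

```typescript
// Hypothetical sketch: one rule, node ids only, to show why the delta
// fragment is the wrong input for diagnostics.
function undersizedRetrieval(nodeIds: string[]): boolean {
  return nodeIds.length > 0 && nodeIds.length < 3
}

function diagnoseTurn(fullNodeIds: string[], seen: Set<string>) {
  const deltaNodeIds = fullNodeIds.filter((id) => !seen.has(id))
  return {
    deltaNodeIds,
    // Suggested fix: score the full pack once and reuse the result in the delta branch.
    fromFullPack: undersizedRetrieval(fullNodeIds),
    // Original delta call: sees only newly shipped nodes.
    fromDeltaFragment: undersizedRetrieval(deltaNodeIds),
  }
}
```

On a later turn where four of five nodes are already cached, the delta holds one node and trips the warning, while the full pack of five nodes does not.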

CodeRabbit caught that the predicate excluded the worst-possible case (every node scoring exactly 0). The '> 0' clause meant a pack with three zero-scored nodes silently passed the rule, but that is precisely the kind of retrieval the warning was supposed to catch.

Fix: drop the '> 0' clause. The NaN guard above already covers the 'no scored nodes' case (where avg_match_score is NaN). Added a test pinning the avg=0 case.
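The before/after predicates can be sketched like this. The constant names come from the CodeRabbit comment and their values from the rule table (0.30 and 3 nodes), but the node-count gate, the Signals shape, and the helper names are assumptions about how the real check is written.

```typescript
// Hypothetical reconstruction of the low_avg_match_score predicate.
const UNDERSIZED_RETRIEVAL_THRESHOLD = 3
const LOW_AVG_MATCH_SCORE = 0.3

interface Signals {
  node_count: number
  avg_match_score: number // NaN when no node carries a match score
}

// Mean of the finite scores, or NaN when there are none.
function avgMatchScore(scores: Array<number | undefined>): number {
  const present = scores.filter((x): x is number => typeof x === 'number' && Number.isFinite(x))
  return present.length === 0 ? NaN : present.reduce((a, b) => a + b, 0) / present.length
}

// Before: the `> 0` clause lets an all-zero pack slip through.
function lowAvgMatchScoreBefore(s: Signals): boolean {
  return (
    s.node_count >= UNDERSIZED_RETRIEVAL_THRESHOLD &&
    !Number.isNaN(s.avg_match_score) &&
    s.avg_match_score > 0 &&
    s.avg_match_score < LOW_AVG_MATCH_SCORE
  )
}

// After: the NaN guard alone handles "no scored nodes".
function lowAvgMatchScoreAfter(s: Signals): boolean {
  return (
    s.node_count >= UNDERSIZED_RETRIEVAL_THRESHOLD &&
    !Number.isNaN(s.avg_match_score) &&
    s.avg_match_score < LOW_AVG_MATCH_SCORE
  )
}
```

A pack with three nodes that all score 0 now triggers the warning, while a pack with no scored nodes still does not, because its average is NaN.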

Labels

None yet

Projects

None yet

1 participant