feat: v0.15 finishers — diagnostics (#78) + delta context packs (#81) + value-per-token (#74) by mohanagy · Pull Request #121 · mohanagy/graphify-ts

mohanagy · 2026-05-11T07:28:29Z

Closes #78
Closes #81
Closes #74

Summary

Closes out the v0.15 substrate work in one bundled PR (as requested). Three issues landed:

- #78 (Add context-pack quality diagnostics and bad-run detection): structural quality scorer
- #81 (Delta-only context packs between runs): delta-only context packs via stdio with per-session dedup
- #74 (Add budgeted context selection using value-per-token scoring): density-greedy value-per-token budget selector

What's in the bundle

#78 Context-pack quality diagnostics

New scorer computeContextPackDiagnostics(pack) with 9 weighted rules:

| Rule | Severity | Weight | When |
|---|---|---|---|
| `missing_required_evidence` | error | 2 | coverage.missing_required.length > 0 |
| `missing_required_semantic` | warn | 1 | coverage.missing_semantic.length > 0 |
| `zero_claims` | warn | 1 | claims.length === 0 |
| `undersized_retrieval` | warn | 1 | 0 < nodes.length < 3 |
| `budget_underutilized` | info | 1 | budget >= 500 tokens and token_count < 25% of budget |
| `missing_snippets` | warn | 1 | more than 50% of nodes lack a source snippet |
| `low_avg_match_score` | warn | 1 | mean match_score < 0.30 (when at least one node has a score) |
| `orphan_nodes` | warn | 1 | nodes.length > 1 and zero relationships |
| `no_graph_signals` | info | 1 | god_nodes and bridge_nodes are both empty |

Wired into the stdio context_pack response (explain branch + delta branch).
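A minimal sketch of how a weighted rule table like the one above could be folded into a 0-1 score. It assumes a simplified pack shape: SketchPack, diagnosticsSketch, and the other names here are hypothetical, not the real CompiledContextPack or computeContextPackDiagnostics. It also assumes quality_score is the share of total rule weight that did not fire, and that low_avg_match_score is gated on node count; the PR does not spell out either detail.

```typescript
// Hypothetical, simplified stand-in for CompiledContextPack; field names follow the rule table.
interface SketchNode {
  id: string
  match_score?: number
  snippet?: string
}

interface SketchPack {
  nodes: SketchNode[]
  relationships: unknown[]
  claims: unknown[]
  coverage: { missing_required: string[]; missing_semantic: string[] }
  token_count: number
  budget: number
  god_nodes: string[]
  bridge_nodes: string[]
}

type Severity = 'error' | 'warn' | 'info'

interface SketchWarning {
  kind: string
  severity: Severity
}

const SEVERITY_RANK: Record<Severity, number> = { error: 0, warn: 1, info: 2 }

function diagnosticsSketch(pack: SketchPack, options: { skipBudgetUnderutilization?: boolean } = {}) {
  const nodeCount = pack.nodes.length
  const scores = pack.nodes
    .map((node) => node.match_score)
    .filter((s): s is number => typeof s === 'number' && Number.isFinite(s))
  // NaN when no node carries a score, which keeps low_avg_match_score from firing.
  const avgMatch = scores.length > 0 ? scores.reduce((sum, s) => sum + s, 0) / scores.length : NaN
  const snippetCoverage = nodeCount > 0 ? pack.nodes.filter((node) => node.snippet).length / nodeCount : 0
  const utilization = pack.budget > 0 ? pack.token_count / pack.budget : 0

  // [kind, severity, weight, triggered]: one row per rule in the table.
  const rules: Array<[string, Severity, number, boolean]> = [
    ['missing_required_evidence', 'error', 2, pack.coverage.missing_required.length > 0],
    ['missing_required_semantic', 'warn', 1, pack.coverage.missing_semantic.length > 0],
    ['zero_claims', 'warn', 1, pack.claims.length === 0],
    ['undersized_retrieval', 'warn', 1, nodeCount > 0 && nodeCount < 3],
    ['budget_underutilized', 'info', 1, pack.budget >= 500 && utilization < 0.25],
    ['missing_snippets', 'warn', 1, nodeCount > 0 && snippetCoverage < 0.5],
    // Assumed: the average only counts once there are at least 3 nodes.
    ['low_avg_match_score', 'warn', 1, nodeCount >= 3 && !Number.isNaN(avgMatch) && avgMatch < 0.3],
    ['orphan_nodes', 'warn', 1, nodeCount > 1 && pack.relationships.length === 0],
    ['no_graph_signals', 'info', 1, pack.god_nodes.length === 0 && pack.bridge_nodes.length === 0],
  ]
  // A skipped rule is removed entirely, so it counts toward neither side of the ratio.
  const active = rules.filter(
    ([kind]) => !(options.skipBudgetUnderutilization && kind === 'budget_underutilized'),
  )

  const totalWeight = active.reduce((sum, [, , weight]) => sum + weight, 0)
  const triggered = active.filter(([, , , hit]) => hit)
  const triggeredWeight = triggered.reduce((sum, [, , weight]) => sum + weight, 0)

  // Errors first, then warnings, then info; kind name (code-point order) breaks ties.
  const warnings: SketchWarning[] = triggered
    .map(([kind, severity]) => ({ kind, severity }))
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
        (a.kind < b.kind ? -1 : a.kind > b.kind ? 1 : 0),
    )

  return {
    // Assumed formula: share of rule weight that did not fire.
    quality_score: 1 - triggeredWeight / totalWeight,
    warnings,
    signals: {
      node_count: nodeCount,
      claim_count: pack.claims.length,
      snippet_coverage: snippetCoverage,
      avg_match_score: avgMatch,
      budget_utilization: utilization,
    },
  }
}
```

Under this assumed formula, an empty pack that trips missing_required_evidence (weight 2) plus zero_claims, budget_underutilized, and no_graph_signals (weight 1 each) scores 1 - 5/10 = 0.5.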

#81 Delta-only context packs via stdio

- New parameter on context_pack: delta_session_id. When set, the response ships only the nodes this session has not already received, plus referenced_ids[] (the IDs of the dropped nodes) and bytes_saved.
- New MCP tool context_pack_session_reset: clears a delta session so the next call ships the full pack again.
- 3 new helpers on StdioToolHelpers: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by an LRU Map<sessionId, Set<nodeId>> capped at 256 sessions (the same bound as prompt sessions).
- Reuses computeDeltaContextPack, which already has the both-endpoints relationship filter from PR #106 (the v0.16 finishers bundle).
- Diagnostics on delta responses skip the budget_underutilized rule, since deltas are small by design after dedup.
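The per-session bookkeeping can be pictured as an insertion-ordered Map used as an LRU cache. This is a sketch under assumptions: ContextPackSessionStore and deltaNodes are hypothetical stand-ins for getContextPackNodeIds / recordContextPackNodeIds / clearContextPackNodeIds and computeDeltaContextPack, bytes_saved is assumed to be the UTF-8 size difference between the full and delta node lists, and the both-endpoints relationship filter is left out.

```typescript
const MAX_SESSIONS = 256 // same bound the PR cites for prompt sessions

// Hypothetical stand-in for the per-process store behind the three helpers.
// A Map iterates in insertion order, so re-inserting on access keeps the
// least recently used session at the front.
class ContextPackSessionStore {
  private readonly sessions = new Map<string, Set<string>>()

  get(sessionId: string): Set<string> {
    const ids = this.sessions.get(sessionId)
    if (!ids) return new Set()
    this.touch(sessionId, ids)
    return new Set(ids) // copy so callers cannot mutate stored state
  }

  record(sessionId: string, nodeIds: Iterable<string>): void {
    const ids = this.sessions.get(sessionId) ?? new Set<string>()
    for (const id of nodeIds) ids.add(id)
    this.touch(sessionId, ids)
    // Evict least recently used sessions once over the cap.
    while (this.sessions.size > MAX_SESSIONS) {
      const oldest = this.sessions.keys().next()
      if (oldest.done) break
      this.sessions.delete(oldest.value)
    }
  }

  clear(sessionId: string): void {
    this.sessions.delete(sessionId)
  }

  private touch(sessionId: string, ids: Set<string>): void {
    this.sessions.delete(sessionId)
    this.sessions.set(sessionId, ids)
  }
}

interface NodeLike {
  id: string
}

// Ship only unseen nodes; already-sent ones are referenced by id instead.
function deltaNodes<N extends NodeLike>(nodes: N[], alreadySent: Set<string>) {
  const fresh = nodes.filter((node) => !alreadySent.has(node.id))
  const referenced_ids = nodes.filter((node) => alreadySent.has(node.id)).map((node) => node.id)
  const utf8Bytes = (value: unknown) => new TextEncoder().encode(JSON.stringify(value)).length
  return { nodes: fresh, referenced_ids, bytes_saved: utf8Bytes(nodes) - utf8Bytes(fresh) }
}
```

In this sketch a second call for the same session with nodes a, b, c, after a and b were recorded, ships only c and lists a and b in referenced_ids; calling clear() makes the next call ship everything again, which is the role context_pack_session_reset plays.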

#74 Value-per-token budget selector

- New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options).
- Greedy density heuristic: sort by score / token_cost descending, then take the prefix that fits within the budget. Tie-breaking is deterministic: score desc → cost asc → id asc.
- Optional pinZeroCost (default true): zero-cost items are always included; set it to false to exclude them.
- Skips items whose individual cost exceeds the budget, as well as items with non-finite scores or costs.
- Returns the selected payloads, total_cost, remaining_budget, and a per-candidate ranking[] with rank, density, and included fields for diagnostics.
- Pure helper for now; adopting it inside retrieve.ts's candidate selection is a follow-up once there is a benchmark to A/B against.
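A sketch of the selector as described above, not the code in src/runtime/value-per-token.ts. Field names (score, token_cost, total_cost, remaining_budget, ranking, rank, density, included, pinZeroCost) come from the PR text; the payload field, the budget option name, and the choice to rank only eligible candidates are assumptions.

```typescript
interface ValuePerTokenCandidate<T> {
  id: string
  score: number
  /** Non-negative. Items with token_cost === 0 are pinned (see pinZeroCost). */
  token_cost: number
  payload: T // assumed field name
}

interface ValuePerTokenOptions {
  budget: number // assumed option name
  pinZeroCost?: boolean
}

interface RankingEntry {
  id: string
  rank: number
  density: number
  included: boolean
}

interface ValuePerTokenResult<T> {
  selected: T[]
  total_cost: number
  remaining_budget: number
  ranking: RankingEntry[]
}

function selectByValuePerToken<T>(
  candidates: ValuePerTokenCandidate<T>[],
  options: ValuePerTokenOptions,
): ValuePerTokenResult<T> {
  const budget = Math.max(0, options.budget) // negative budgets clamp to 0
  const pinZeroCost = options.pinZeroCost ?? true

  // Drop non-finite or negative inputs and anything that cannot fit on its own.
  const eligible = candidates.filter(
    (c) =>
      Number.isFinite(c.score) &&
      Number.isFinite(c.token_cost) &&
      c.token_cost >= 0 &&
      c.token_cost <= budget &&
      (c.token_cost > 0 || pinZeroCost),
  )

  const scored = eligible.map((c) => ({
    c,
    density: c.token_cost === 0 ? Infinity : c.score / c.token_cost,
  }))

  // density desc, then score desc, cost asc, id asc. Comparing densities
  // explicitly avoids Infinity - Infinity = NaN for pinned zero-cost items.
  scored.sort((a, b) => {
    if (a.density !== b.density) return a.density > b.density ? -1 : 1
    if (a.c.score !== b.c.score) return b.c.score - a.c.score
    if (a.c.token_cost !== b.c.token_cost) return a.c.token_cost - b.c.token_cost
    return a.c.id < b.c.id ? -1 : a.c.id > b.c.id ? 1 : 0
  })

  const selected: T[] = []
  const ranking: RankingEntry[] = []
  let total = 0
  let open = true // greedy prefix: stop at the first item that no longer fits
  scored.forEach(({ c, density }, index) => {
    const fits = open && total + c.token_cost <= budget
    if (fits) {
      selected.push(c.payload)
      total += c.token_cost
    } else {
      open = false
    }
    ranking.push({ id: c.id, rank: index + 1, density, included: fits })
  })

  return { selected, total_cost: total, remaining_budget: budget - total, ranking }
}
```

Stopping at the first item that does not fit is what "pick the prefix" implies; a variant that keeps scanning for smaller items would pack the budget more tightly, at the cost of a ranking that is harder to read.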

Not in this PR (deferred to v0.16)

- #76 (Add multi-resolution context representations for nodes and slices): multi-resolution context representations need a new representation layer; structurally invasive.
- #79 (Better PR-impact coverage scoring): PR-impact coverage calibration needs real PRs to calibrate against, so it is not a code-only delivery.
- #80 (Cache-aware prompt layout to minimize Claude cache invalidation): cache-aware prompt layout measurement is purely measurement work; the sort_key bands already shipped earlier.

Test plan

- npm run typecheck: clean
- npm run build: clean
- npm run test:run: 1760/1760 pass (103 files, +27 new tests)
- MCP tool count = 26 in the full profile; the schema-budget test stays under the 12,000-byte ceiling
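One way a schema-budget check like this could work is to serialize the tool list and measure its UTF-8 size. The ToolDefinition shape mirrors MCP's name / description / inputSchema fields; the function names are hypothetical, and the reset tool's description and input schema below are guesses, not the definitions in src/runtime/stdio/definitions.ts.

```typescript
interface ToolDefinition {
  name: string
  description: string
  inputSchema: Record<string, unknown>
}

const SCHEMA_BYTE_CEILING = 12000

// UTF-8 size of the serialized tool list, roughly what a tools/list response carries.
function schemaBytes(tools: ToolDefinition[]): number {
  return new TextEncoder().encode(JSON.stringify(tools)).length
}

function withinSchemaBudget(tools: ToolDefinition[]): boolean {
  return schemaBytes(tools) < SCHEMA_BYTE_CEILING
}

// Hypothetical definition; the real description and schema may differ.
const resetTool: ToolDefinition = {
  name: 'context_pack_session_reset',
  description: 'Clear a delta session so the next context_pack call ships the full pack.',
  inputSchema: {
    type: 'object',
    properties: { delta_session_id: { type: 'string' } },
    required: ['delta_session_id'],
  },
}
```

Measuring bytes rather than string length matters once descriptions contain non-ASCII characters, since those take more than one byte in UTF-8.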

Summary by CodeRabbit

New Features
- Pack quality diagnostics: pack health score, ordered warnings, and measurable signals (snippet coverage, match scores, budget utilization); diagnostics now included in pack responses.
- Incremental "delta" context-pack delivery with per-session deduplication and a session-reset tool.
- Value-per-token selector to greedily pick items by score/token density within a budget.
Tests
- Comprehensive unit tests covering diagnostics and value-per-token behaviors and edge cases.

…5 slice 1)

Adds a deterministic structural quality scorer for compiled context-packs. Returns a 0-1 quality_score, a list of triggered warnings with kind/severity/message/detail, and the raw signals used to compute the score (node_count, claim_count, snippet_coverage, avg_match_score, budget_utilization, etc.).

Rules implemented (each weighted into the score):
- missing_required_evidence (error, weight 2): pack lacks a required evidence class
- missing_required_semantic (warn): pack lacks a required semantic category
- zero_claims (warn): claims array is empty
- undersized_retrieval (warn): fewer than 3 nodes returned
- budget_underutilized (info): token_count < 25% of budget on a >= 500-token request
- missing_snippets (warn): > 50% of nodes lack a source snippet
- low_avg_match_score (warn): mean match_score < 0.30 (when scores exist)
- orphan_nodes (warn): > 1 nodes but zero relationships
- no_graph_signals (info): both god_nodes and bridge_nodes empty

Surface points:
- New contracts file src/contracts/context-pack-diagnostics.ts with ContextPackDiagnosticKind / Severity / Warning / Signals / Diagnostics types.
- New runtime helper src/runtime/context-pack-diagnostics.ts exporting computeContextPackDiagnostics(pack, options?). Pure, deterministic, no I/O; fully unit-testable against synthetic CompiledContextPack inputs.
- contextPackFromRetrieveResult is now exported from retrieve.ts (was private) so the stdio handler can construct the full pack shape from a RetrieveResult and feed it to the scorer.
- stdio context_pack tool response now includes a diagnostics field on the explain branch. Impact and review branches use different pack taxonomies and will land in a follow-up.

Verified: typecheck + build clean, 1749/1749 tests pass (+16 new). No public API surface changes outside the additive diagnostics field.

coderabbitai · 2026-05-11T07:28:43Z


Walkthrough

This PR adds deterministic context-pack quality diagnostics (contracts, engine, retrieval export, stdio tool/server integration, and tests) and a greedy value-per-token selector with types and unit tests.

Changes

Quality Diagnostics Implementation

| Layer | File(s) | Summary |
|---|---|---|
| Data Contracts | `src/contracts/context-pack-diagnostics.ts` | Introduces `ContextPackDiagnosticKind`, `ContextPackDiagnosticSeverity`, `ContextPackDiagnosticWarning`, `ContextPackQualitySignals`, and `ContextPackDiagnostics`. |
| Diagnostics Engine | `src/runtime/context-pack-diagnostics.ts` | Implements `computeContextPackDiagnostics(pack, options)` with rule weights, thresholds, `computeSignals`, triggered-warnings assembly, deterministic severity/kind sorting, `computeQualityScore`, and `skipBudgetUnderutilization` option. |
| Public API Exposure | `src/runtime/retrieve.ts` | Exports `contextPackFromRetrieveResult` for building CompiledContextPack inputs for diagnostics. |
| Stdio Server Session State | `src/runtime/stdio-server.ts` | Adds per-`delta_session_id` `contextPackNodeIds` store, initializer, lazy ensure helper, and record/get/clear callbacks with eviction. |
| Tool Definition | `src/runtime/stdio/definitions.ts` | Extends `context_pack` MCP tool schema and description with optional `delta_session_id` input. |
| Tool Integration | `src/runtime/stdio/tools.ts` | Imports diagnostics, computes diagnostics for full and delta packs (delta uses skipBudgetUnderutilization), records newly shipped node IDs, and adds `context_pack_session_reset` tool case. |
| Tests / Documentation | `tests/unit/context-pack-diagnostics.test.ts` | Comprehensive Vitest coverage validating all diagnostic kinds, severities, ordering, quality_score behavior, edge cases (NaN match scores, zero nodes), and skipBudgetUnderutilization behavior. |

Value-per-Token Selector

| Layer | File(s) | Summary |
|---|---|---|
| Types / Options | `src/runtime/value-per-token.ts` | Adds `ValuePerTokenCandidate`, `ValuePerTokenResult`, and `ValuePerTokenOptions` types. |
| Implementation | `src/runtime/value-per-token.ts` | Adds `selectByValuePerToken` implementing density heuristic (score/token_cost), zero-cost handling, deterministic sorting tie-breakers, greedy selection under a token budget, and ranking metadata. |
| Tests | `tests/unit/value-per-token.test.ts` | Unit tests for density preference, zero-cost pinning, budget enforcement, filtering non-finite values, deterministic ties, negative budget clamping, and empty input. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Add context-pack quality diagnostics and bad-run detection #78 — Similar diagnostics functionality referenced by earlier issue and may be addressed by these diagnostic contracts and engine.

Possibly related PRs

mohanagy/graphify-ts#64 — Touches retrieval and context-pack tooling; changes may interact with exported retrieve helper and tool wiring here.
mohanagy/graphify-ts#106 — Related work on per-session delta context-pack deduplication and node-ID collection that complements the delta_session_id plumbing added here.

Poem

I’m a rabbit who tallies snippets and scores,
I hop through packs, check thresholds and doors,
I warn and I sort from error to mild,
“Tidy your graph!” says this ever-prudent child. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: three bundled features for v0.15 (diagnostics, delta context packs, value-per-token), each clearly referenced by issue number. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | Pull request description comprehensively covers all three bundled features with clear structure, testing evidence, and explicit issue references. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/context-pack-diagnostics.ts`:
- Around line 141-145: The condition that gates the low-average-match-score
warning incorrectly excludes the worst-case value 0; update the predicate in the
diagnostics check (the if that references signals,
UNDERSIZED_RETRIEVAL_THRESHOLD, and LOW_AVG_MATCH_SCORE) so it accepts zero
values (e.g., replace the strict > 0 check with a non-strict check such as >= 0
or remove the > 0 clause) while keeping the existing Number.isNaN guard and the
rest of the logic intact.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4eff0ec6-c351-4d67-970b-49256ac64e83

📥 Commits

Reviewing files that changed from the base of the PR and between 21c8b60 and af2f5e6.

📒 Files selected for processing (5)

src/contracts/context-pack-diagnostics.ts
src/runtime/context-pack-diagnostics.ts
src/runtime/retrieve.ts
src/runtime/stdio/tools.ts
tests/unit/context-pack-diagnostics.test.ts

…lector (#74)

Extends PR #121 to cover the rest of v0.15 in one slice, as requested:

## #81 Delta-only context packs via stdio

- New context_pack parameter: delta_session_id. When set, the response ships only nodes the session hasn't received yet, plus referenced_ids[] for dropped nodes and bytes_saved.
- New MCP tool context_pack_session_reset to clear a delta session and force the next call to ship the full pack.
- 3 new StdioToolHelpers methods: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by a per-MCP-process Map<sessionId, Set<nodeId>> in StdioSessionState with the same LRU bound as the prompt-session store (256 sessions).
- Reuses the existing computeDeltaContextPack helper (already had the both-endpoints relationship filter fix from PR #106) and collectPackNodeIds for recording shipped ids.
- Diagnostics on delta responses skip the budget_underutilized rule since a delta pack is small-by-design after dedup.

## #74 Value-per-token budget selector

- New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options).
- Greedy density heuristic: sort by score / token_cost descending, pick the prefix that fits within budget. Tie-break: score desc, cost asc, id asc (deterministic).
- Optional pinZeroCost (default true): zero-cost candidates are always included; set false to exclude them entirely.
- Skips items whose individual cost exceeds the budget (cannot fit by definition) and items with non-finite scores or costs.
- Returns selected payload list, total_cost, remaining_budget, and per-candidate ranking[] with rank/density/included for diagnostics.
- Pure helper for now; adopting it inside retrieve.ts's candidate-selection pipeline is a follow-up once we have a benchmark to A/B against the current selector.

## Tests

- 11 new value-per-token tests covering density preference, zero-cost gating, budget overflow skip, non-finite filtering, ranking shape, tie-break determinism, negative budget clamp, empty input.
- MCP tool count increased to 26 (full profile). mcp-schema-budget test stays under the 12,000-byte ceiling after tightening the new tool descriptions.
- Verified: typecheck + build clean, 1760/1760 pass.

## Not in this PR (deferred to v0.16 slice train)

- #76 multi-resolution context representations: needs a new representation layer, structurally invasive.
- #79 PR-impact coverage calibration: needs real PRs to calibrate against, not a code-only delivery.
- #80 cache-aware prompt layout measurement: purely measurement work; sort_key bands already shipped.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/runtime/stdio/tools.ts (1)

804-807: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or explicitly gate delta_session_id for non-explain packs.

delta_session_id is advertised on context_pack, but the review and impact branches return before that parameter is ever read. A call like task: "review" plus delta_session_id silently falls back to a full pack instead of a delta pack.

Suggested guard

       const resolvedBudget = budget ?? 3000
+      const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId'])
+      if (deltaSessionId && task !== 'explain') {
+        return helpers.failure(id, helpers.jsonrpcInvalidParams, 'delta_session_id is currently supported only for task=explain')
+      }
       const plannerBudget = Math.max(resolvedBudget, 3)
       const initialPlan = buildTaskContextPlan({
         task_kind: task,
         prompt,
         budget: plannerBudget,
@@
-      const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId'])
       if (deltaSessionId) {

Also applies to: 820-843, 857-876, 893-894

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 804 - 807, The code currently
accepts a delta_session_id in the context_pack but never reads or gates it for
non-explain tasks, causing unexpected full-pack behavior for task === 'review'
or 'impact'; add an explicit guard after reading task (where
helpers.stringParam(toolArguments, 'task') is used) that checks for the presence
of delta_session_id via helpers.stringParam(toolArguments, 'delta_session_id')
and if found while task !== 'explain' immediately return helpers.failure(id,
helpers.jsonrpcInvalidParams, 'delta_session_id is only allowed when task is
"explain"'); apply the same guard in the other branches/handlers that parse
context_pack (the blocks covering the ranges you noted) so delta_session_id is
rejected unless task === 'explain'.

🧹 Nitpick comments (1)

src/runtime/value-per-token.ts (1)
36-38: 💤 Low value

Consider clarifying the token_cost constraint in the comment.

The comment states "Must be > 0" but immediately mentions zero-cost items as a special case. The implementation correctly handles token_cost === 0 by assigning infinite density and conditionally including via pinZeroCost. Consider rewording to something like: "Non-negative. Items with token_cost === 0 are treated as 'pinned' and sorted first."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/value-per-token.ts` around lines 36 - 38, Update the comment for
the token_cost field in value-per-token.ts to state that token_cost is
non-negative (>= 0) and that items with token_cost === 0 are treated as pinned
and sorted first via the pinZeroCost logic; locate the doc comment above the
token_cost property and replace "Must be > 0; zero-cost items are pinned (always
included)" with wording like "Non-negative. Items with token_cost === 0 are
treated as 'pinned' and sorted first (handled by pinZeroCost)."

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/stdio/tools.ts`:
- Around line 878-885: The diagnostics call is scoring only the delta fragment;
change computeContextPackDiagnostics to be invoked with the full context pack
(the variable fullPack produced by contextPackFromRetrieveResult(retrieval))
instead of deltaResult.delta_pack so diagnostics reflect the whole session
state; update all other occurrences (the similar call around the 895-919 region)
to pass the fullPack variable as well and ensure any helper variables
(compactPack, metadata) remain unchanged.

---

Outside diff comments:
In `@src/runtime/stdio/tools.ts`:
- Around line 804-807: The code currently accepts a delta_session_id in the
context_pack but never reads or gates it for non-explain tasks, causing
unexpected full-pack behavior for task === 'review' or 'impact'; add an explicit
guard after reading task (where helpers.stringParam(toolArguments, 'task') is
used) that checks for the presence of delta_session_id via
helpers.stringParam(toolArguments, 'delta_session_id') and if found while task
!== 'explain' immediately return helpers.failure(id,
helpers.jsonrpcInvalidParams, 'delta_session_id is only allowed when task is
"explain"'); apply the same guard in the other branches/handlers that parse
context_pack (the blocks covering the ranges you noted) so delta_session_id is
rejected unless task === 'explain'.

---

Nitpick comments:
In `@src/runtime/value-per-token.ts`:
- Around line 36-38: Update the comment for the token_cost field in
value-per-token.ts to state that token_cost is non-negative (>= 0) and that
items with token_cost === 0 are treated as pinned and sorted first via the
pinZeroCost logic; locate the doc comment above the token_cost property and
replace "Must be > 0; zero-cost items are pinned (always included)" with wording
like "Non-negative. Items with token_cost === 0 are treated as 'pinned' and
sorted first (handled by pinZeroCost)."


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9e446405-fe5f-44d4-91c8-49a8e52e4080

📥 Commits

Reviewing files that changed from the base of the PR and between af2f5e6 and 046c10e.

📒 Files selected for processing (5)

src/runtime/stdio-server.ts
src/runtime/stdio/definitions.ts
src/runtime/stdio/tools.ts
src/runtime/value-per-token.ts
tests/unit/value-per-token.test.ts

coderabbitai · 2026-05-11T07:41:32Z

+      const fullPack = contextPackFromRetrieveResult(retrieval)
      const compactPack = compactRetrieveResult(retrieval)
      const metadata = contextMetadata(retrieval)
      storeExpandableHandles(prompt, task, initialPlan.evidence.recipe_id, metadata.expandable, helpers)
+      // Slice #78: emit context-pack quality diagnostics so callers can
+      // detect bad runs (missing required evidence, zero claims, weak
+      // retrieval, etc.) without re-implementing the heuristics.
+      const diagnostics = computeContextPackDiagnostics(fullPack)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Score the full pack, not just the delta fragment.

computeContextPackDiagnostics(deltaResult.delta_pack, ...) evaluates only newly shipped nodes. After the first turn, that can falsely trigger missing-evidence or zero-claims warnings even when the omitted nodes are already present in the client’s cached session.

Suggested fix

       const diagnostics = computeContextPackDiagnostics(fullPack)
@@
-          diagnostics: computeContextPackDiagnostics(deltaResult.delta_pack, { skipBudgetUnderutilization: true }),
+          diagnostics,

Also applies to: 895-919
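A toy illustration of the false positive, reusing the undersized_retrieval threshold from the rule table (fewer than 3 nodes). The ShippedPack shape, the constant name, and the helper are hypothetical; the point is only that a delta fragment can look thin even when the full pack the client now holds is not.

```typescript
interface ShippedPack {
  nodeIds: string[]
}

const UNDERSIZED_THRESHOLD = 3 // hypothetical name; value from the rule table

function isUndersized(pack: ShippedPack): boolean {
  return pack.nodeIds.length > 0 && pack.nodeIds.length < UNDERSIZED_THRESHOLD
}

// Turn 2: the client already holds a, b, c from turn 1.
const fullPack: ShippedPack = { nodeIds: ['a', 'b', 'c', 'd'] }
const alreadySent = new Set(['a', 'b', 'c'])
const deltaPack: ShippedPack = { nodeIds: fullPack.nodeIds.filter((id) => !alreadySent.has(id)) }

console.log(isUndersized(fullPack)) // false: the session really has 4 nodes
console.log(isUndersized(deltaPack)) // true: only 'd' was shipped this turn
```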

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 878 - 885, The diagnostics call is scoring only the delta fragment; change computeContextPackDiagnostics to be invoked with the full context pack (the variable fullPack produced by contextPackFromRetrieveResult(retrieval)) instead of deltaResult.delta_pack so diagnostics reflect the whole session state; update all other occurrences (the similar call around the 895-919 region) to pass the fullPack variable as well and ensure any helper variables (compactPack, metadata) remain unchanged.

CodeRabbit caught that the predicate excluded the worst-possible case (every node scoring exactly 0). The '> 0' clause meant a pack with three zero-scored nodes silently passed the rule, but that is precisely the kind of retrieval the warning was supposed to catch. Fix: drop the '> 0' clause. The NaN guard above already covers the 'no scored nodes' case (where avg_match_score is NaN). Added a test pinning the avg=0 case.
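A sketch of the predicate after the fix. UNDERSIZED_RETRIEVAL_THRESHOLD and LOW_AVG_MATCH_SCORE are named in the review comment; their values here are taken from the rule table, and the node-count gate and the MatchSignals shape are assumptions about how the check is structured.

```typescript
// Values assumed from the rule table (< 3 nodes, mean < 0.30).
const UNDERSIZED_RETRIEVAL_THRESHOLD = 3
const LOW_AVG_MATCH_SCORE = 0.3

interface MatchSignals {
  node_count: number
  avg_match_score: number // NaN when no node carries a match score
}

// After the fix there is no `avg_match_score > 0` clause, so an all-zero pack is flagged.
// The NaN guard alone handles the "no scored nodes" case.
function isLowAvgMatchScore(signals: MatchSignals): boolean {
  return (
    signals.node_count >= UNDERSIZED_RETRIEVAL_THRESHOLD &&
    !Number.isNaN(signals.avg_match_score) &&
    signals.avg_match_score < LOW_AVG_MATCH_SCORE
  )
}
```

With the old `> 0` clause, `{ node_count: 3, avg_match_score: 0 }` slipped through; in this version it returns true, while `avg_match_score: NaN` still returns false.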

coderabbitai Bot reviewed May 11, 2026


Comment thread src/runtime/context-pack-diagnostics.ts

mohanagy changed the title ~~feat(#78): context-pack quality diagnostics + bad-run detection (v0.15 slice 1)~~ feat: v0.15 finishers — diagnostics (#78) + delta context packs (#81) + value-per-token (#74) May 11, 2026

coderabbitai Bot reviewed May 11, 2026


mohanagy merged commit 0e8df5a into main May 11, 2026
7 checks passed

mohanagy mentioned this pull request May 11, 2026

chore(release): v0.15.0 — context-pack quality signals #122

Merged


mohanagy merged 3 commits into main from feat/v0.15-finishers
