feat: v0.15 finishers — diagnostics (#78) + delta context packs (#81) + value-per-token (#74)#121
Conversation
…5 slice 1) Adds a deterministic structural quality scorer for compiled context-packs. Returns a 0-1 quality_score, a list of triggered warnings with kind/severity/message/detail, and the raw signals used to compute the score (node_count, claim_count, snippet_coverage, avg_match_score, budget_utilization, etc.). Rules implemented (each weighted into the score): - missing_required_evidence (error, weight 2) — pack lacks a required evidence class - missing_required_semantic (warn) — pack lacks a required semantic category - zero_claims (warn) — claims array is empty - undersized_retrieval (warn) — fewer than 3 nodes returned - budget_underutilized (info) — token_count < 25% of budget on a >= 500-token request - missing_snippets (warn) — > 50% of nodes lack a source snippet - low_avg_match_score (warn) — mean match_score < 0.30 (when scores exist) - orphan_nodes (warn) — > 1 nodes but zero relationships - no_graph_signals (info) — both god_nodes and bridge_nodes empty Surface points: - New contracts file src/contracts/context-pack-diagnostics.ts with ContextPackDiagnosticKind / Severity / Warning / Signals / Diagnostics types. - New runtime helper src/runtime/context-pack-diagnostics.ts exporting computeContextPackDiagnostics(pack, options?). Pure, deterministic, no I/O — fully unit-testable against synthetic CompiledContextPack inputs. - contextPackFromRetrieveResult is now exported from retrieve.ts (was private) so the stdio handler can construct the full pack shape from a RetrieveResult and feed it to the scorer. - stdio context_pack tool response now includes a diagnostics field on the explain branch. Impact and review branches use different pack taxonomies and will land in a follow-up. Verified: typecheck + build clean, 1749/1749 tests pass (+16 new). No public API surface changes outside the additive diagnostics field.
📝 WalkthroughWalkthroughThis PR adds deterministic context-pack quality diagnostics (contracts, engine, retrieval export, stdio tool/server integration, and tests) and a greedy value-per-token selector with types and unit tests. ChangesQuality Diagnostics Implementation
Value-per-Token Selector
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/runtime/context-pack-diagnostics.ts`:
- Around line 141-145: The condition that gates the low-average-match-score
warning incorrectly excludes the worst-case value 0; update the predicate in the
diagnostics check (the if that references signals,
UNDERSIZED_RETRIEVAL_THRESHOLD, and LOW_AVG_MATCH_SCORE) so it accepts zero
values (e.g., replace the strict > 0 check with a non-strict check such as >= 0
or remove the > 0 clause) while keeping the existing Number.isNaN guard and the
rest of the logic intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4eff0ec6-c351-4d67-970b-49256ac64e83
📒 Files selected for processing (5)
src/contracts/context-pack-diagnostics.tssrc/runtime/context-pack-diagnostics.tssrc/runtime/retrieve.tssrc/runtime/stdio/tools.tstests/unit/context-pack-diagnostics.test.ts
…lector (#74) Extends PR #121 to cover the rest of v0.15 in one slice as requested: ## #81 Delta-only context packs via stdio - New context_pack parameter: delta_session_id. When set, the response ships only nodes the session hasn't received yet, plus referenced_ids[] for dropped nodes and bytes_saved. - New MCP tool context_pack_session_reset to clear a delta session and force the next call to ship the full pack. - 3 new StdioToolHelpers methods: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by a per-MCP-process Map<sessionId, Set<nodeId>> in StdioSessionState with the same LRU bound as the prompt-session store (256 sessions). - Reuses the existing computeDeltaContextPack helper (already had the both-endpoints relationship filter fix from PR #106) and collectPackNodeIds for recording shipped ids. - Diagnostics on delta responses skip the budget_underutilized rule since a delta pack is small-by-design after dedup. ## #74 Value-per-token budget selector - New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options). - Greedy density heuristic: sort by score / token_cost descending, pick the prefix that fits within budget. Tie-break: score desc, cost asc, id asc (deterministic). - Optional pinZeroCost (default true): zero-cost candidates are always included; set false to exclude them entirely. - Skips items whose individual cost exceeds the budget (cannot fit by definition) and items with non-finite scores or costs. - Returns selected payload list, total_cost, remaining_budget, and per-candidate ranking[] with rank/density/included for diagnostics. - Pure helper for now — adopting it inside retrieve.ts's candidate-selection pipeline is a follow-up once we have a benchmark to A/B against the current selector. ## Tests - 11 new value-per-token tests covering density preference, zero-cost gating, budget overflow skip, non-finite filtering, ranking shape, tie-break determinism, negative budget clamp, empty input. - MCP tool count increased to 26 (full profile). mcp-schema-budget test stays under the 12,000-byte ceiling after tightening the new tool descriptions. - Verified: typecheck + build clean, 1760/1760 pass. ## Not in this PR (deferred to v0.16 slice train) - #76 multi-resolution context representations — needs a new representation layer, structurally invasive. - #79 PR-impact coverage calibration — needs real PRs to calibrate against, not a code-only delivery. - #80 cache-aware prompt layout measurement — purely measurement work; sort_key bands already shipped.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/runtime/stdio/tools.ts (1)
804-807:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject or explicitly gate
delta_session_idfor non-explainpacks.
delta_session_idis advertised oncontext_pack, but thereviewandimpactbranches return before that parameter is ever read. A call liketask: "review"plusdelta_session_idsilently falls back to a full pack instead of a delta pack.Suggested guard
const resolvedBudget = budget ?? 3000 + const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId']) + if (deltaSessionId && task !== 'explain') { + return helpers.failure(id, helpers.jsonrpcInvalidParams, 'delta_session_id is currently supported only for task=explain') + } const plannerBudget = Math.max(resolvedBudget, 3) const initialPlan = buildTaskContextPlan({ task_kind: task, prompt, budget: plannerBudget, @@ - const deltaSessionId = helpers.stringParamAlias(toolArguments, ['delta_session_id', 'deltaSessionId']) if (deltaSessionId) {Also applies to: 820-843, 857-876, 893-894
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/stdio/tools.ts` around lines 804 - 807, The code currently accepts a delta_session_id in the context_pack but never reads or gates it for non-explain tasks, causing unexpected full-pack behavior for task === 'review' or 'impact'; add an explicit guard after reading task (where helpers.stringParam(toolArguments, 'task') is used) that checks for the presence of delta_session_id via helpers.stringParam(toolArguments, 'delta_session_id') and if found while task !== 'explain' immediately return helpers.failure(id, helpers.jsonrpcInvalidParams, 'delta_session_id is only allowed when task is "explain"'); apply the same guard in the other branches/handlers that parse context_pack (the blocks covering the ranges you noted) so delta_session_id is rejected unless task === 'explain'.
🧹 Nitpick comments (1)
src/runtime/value-per-token.ts (1)
36-38: 💤 Low valueConsider clarifying the
token_costconstraint in the comment.The comment states "Must be > 0" but immediately mentions zero-cost items as a special case. The implementation correctly handles
token_cost === 0by assigning infinite density and conditionally including viapinZeroCost. Consider rewording to something like: "Non-negative. Items withtoken_cost === 0are treated as 'pinned' and sorted first."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/value-per-token.ts` around lines 36 - 38, Update the comment for the token_cost field in value-per-token.ts to state that token_cost is non-negative (>= 0) and that items with token_cost === 0 are treated as pinned and sorted first via the pinZeroCost logic; locate the doc comment above the token_cost property and replace "Must be > 0; zero-cost items are pinned (always included)" with wording like "Non-negative. Items with token_cost === 0 are treated as 'pinned' and sorted first (handled by pinZeroCost)."
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/runtime/stdio/tools.ts`:
- Around line 878-885: The diagnostics call is scoring only the delta fragment;
change computeContextPackDiagnostics to be invoked with the full context pack
(the variable fullPack produced by contextPackFromRetrieveResult(retrieval))
instead of deltaResult.delta_pack so diagnostics reflect the whole session
state; update all other occurrences (the similar call around the 895-919 region)
to pass the fullPack variable as well and ensure any helper variables
(compactPack, metadata) remain unchanged.
---
Outside diff comments:
In `@src/runtime/stdio/tools.ts`:
- Around line 804-807: The code currently accepts a delta_session_id in the
context_pack but never reads or gates it for non-explain tasks, causing
unexpected full-pack behavior for task === 'review' or 'impact'; add an explicit
guard after reading task (where helpers.stringParam(toolArguments, 'task') is
used) that checks for the presence of delta_session_id via
helpers.stringParam(toolArguments, 'delta_session_id') and if found while task
!== 'explain' immediately return helpers.failure(id,
helpers.jsonrpcInvalidParams, 'delta_session_id is only allowed when task is
"explain"'); apply the same guard in the other branches/handlers that parse
context_pack (the blocks covering the ranges you noted) so delta_session_id is
rejected unless task === 'explain'.
---
Nitpick comments:
In `@src/runtime/value-per-token.ts`:
- Around line 36-38: Update the comment for the token_cost field in
value-per-token.ts to state that token_cost is non-negative (>= 0) and that
items with token_cost === 0 are treated as pinned and sorted first via the
pinZeroCost logic; locate the doc comment above the token_cost property and
replace "Must be > 0; zero-cost items are pinned (always included)" with wording
like "Non-negative. Items with token_cost === 0 are treated as 'pinned' and
sorted first (handled by pinZeroCost)."
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9e446405-fe5f-44d4-91c8-49a8e52e4080
📒 Files selected for processing (5)
src/runtime/stdio-server.tssrc/runtime/stdio/definitions.tssrc/runtime/stdio/tools.tssrc/runtime/value-per-token.tstests/unit/value-per-token.test.ts
| const fullPack = contextPackFromRetrieveResult(retrieval) | ||
| const compactPack = compactRetrieveResult(retrieval) | ||
| const metadata = contextMetadata(retrieval) | ||
| storeExpandableHandles(prompt, task, initialPlan.evidence.recipe_id, metadata.expandable, helpers) | ||
| // Slice #78: emit context-pack quality diagnostics so callers can | ||
| // detect bad runs (missing required evidence, zero claims, weak | ||
| // retrieval, etc.) without re-implementing the heuristics. | ||
| const diagnostics = computeContextPackDiagnostics(fullPack) |
There was a problem hiding this comment.
Score the full pack, not just the delta fragment.
computeContextPackDiagnostics(deltaResult.delta_pack, ...) evaluates only newly shipped nodes. After the first turn, that can falsely trigger missing-evidence or zero-claims warnings even when the omitted nodes are already present in the client’s cached session.
Suggested fix
const diagnostics = computeContextPackDiagnostics(fullPack)
@@
- diagnostics: computeContextPackDiagnostics(deltaResult.delta_pack, { skipBudgetUnderutilization: true }),
+ diagnostics,Also applies to: 895-919
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/runtime/stdio/tools.ts` around lines 878 - 885, The diagnostics call is
scoring only the delta fragment; change computeContextPackDiagnostics to be
invoked with the full context pack (the variable fullPack produced by
contextPackFromRetrieveResult(retrieval)) instead of deltaResult.delta_pack so
diagnostics reflect the whole session state; update all other occurrences (the
similar call around the 895-919 region) to pass the fullPack variable as well
and ensure any helper variables (compactPack, metadata) remain unchanged.
CodeRabbit caught that the predicate excluded the worst-possible case (every node scoring exactly 0). The '> 0' clause meant a pack with three zero-scored nodes silently passed the rule, but that is precisely the kind of retrieval the warning was supposed to catch. Fix: drop the '> 0' clause. The NaN guard above already covers the 'no scored nodes' case (where avg_match_score is NaN). Added a test pinning the avg=0 case.
Closes #78
Closes #81
Closes #74
Summary
Closes out the v0.15 substrate work in one bundled PR (as requested). Three issues landed:
What's in the bundle
#78 Context-pack quality diagnostics
New scorer
computeContextPackDiagnostics(pack)with 9 weighted rules:missing_required_evidencemissing_required_semanticzero_claimsundersized_retrievalbudget_underutilizedmissing_snippetslow_avg_match_scoreorphan_nodesno_graph_signalsWired into the stdio
context_packresponse (explain branch + delta branch).#81 Delta-only context packs via stdio
context_pack:delta_session_id. When set, the response ships only nodes not already received by this session, plusreferenced_ids[]andbytes_saved.context_pack_session_reset— clears a delta session so the next call ships the full pack again.StdioToolHelpers:getContextPackNodeIds,recordContextPackNodeIds,clearContextPackNodeIds. Backed by an LRUMap<sessionId, Set<nodeId>>(256 sessions, same bound as prompt sessions).computeDeltaContextPack(already had the both-endpoints relationship filter from PR feat: bundle — v0.16 finishers + 1c-ii.a framework propagation (#79, #80, #81, #72) #106).budget_underutilizedrule (deltas are small-by-design after dedup).#74 Value-per-token budget selector
src/runtime/value-per-token.tsexportingselectByValuePerToken(candidates, options).score / token_costdesc, pick the prefix that fits within budget. Deterministic tie-breaking: score desc → cost asc → id asc.pinZeroCost(default true): zero-cost items always included; flip to exclude.ranking[]with rank/density/included flags for diagnostics.retrieve.ts's candidate selection is a follow-up once we have a benchmark to A/B against.Not in this PR (deferred to v0.16)
Test plan
npm run typecheckcleannpm run buildcleannpm run test:run— 1760/1760 pass (103 files, +27 new tests)Summary by CodeRabbit
New Features
Tests