feat: bundle — v0.16 finishers + 1c-ii.a framework propagation (#79, #80, #81, #72) #106
… (slice 1c-ii.a) When SPI symbols carry a framework_role (set by slice 3b's NestJS detector), the projector now surfaces `framework`, `framework_role`, and `node_kind` on the projected ExtractionNode. Maps SPI roles back onto the legacy extractor's shape so downstream consumers can route framework-aware UX without re-classifying. Full byte-equivalence on demo-repo's framework-specific synthetic nodes (e.g. NestJS route nodes minted as a separate symbol with route_path) remains slice 1c-iii.
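A minimal sketch of the propagation described above, assuming hypothetical `SpiSymbol` and `ExtractionNode` shapes and a made-up role table. The real role names and the role-to-kind mapping live in `src/pipeline/spi/projector.ts` and are not shown in this PR text, so everything named here is illustrative.

```typescript
// Hypothetical shapes: the real SPI symbol and ExtractionNode types are not shown here.
interface SpiSymbol {
  id: string
  label: string
  framework_role?: string // assumed naming, e.g. "nestjs.controller"
}

interface ExtractionNode {
  id: string
  label: string
  framework?: string
  framework_role?: string
  node_kind?: string
}

// Assumed role -> legacy node_kind table; the entries are placeholders, not the real mapping.
const ROLE_TO_KIND: Record<string, { framework: string; node_kind: string }> = {
  "nestjs.controller": { framework: "nestjs", node_kind: "controller" },
  "nestjs.provider": { framework: "nestjs", node_kind: "provider" },
  "nestjs.module": { framework: "nestjs", node_kind: "module" },
}

function projectSymbol(symbol: SpiSymbol): ExtractionNode {
  const node: ExtractionNode = { id: symbol.id, label: symbol.label }
  const role = symbol.framework_role
  // Symbols without a role project exactly as before.
  if (role === undefined) return node
  const mapped = ROLE_TO_KIND[role]
  // Unknown roles are passed through without guessing a framework or kind.
  if (mapped === undefined) return { ...node, framework_role: role }
  return { ...node, framework: mapped.framework, framework_role: role, node_kind: mapped.node_kind }
}
```

Passing unknown roles through unchanged is a design choice of this sketch: it keeps the projector from inventing a `node_kind` that downstream consumers would then route on.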
Adds `coverage_score` (numeric, [0, 1]) and `uncovered_hotspots` (ChangedNode[]) to PrImpactResult and CompactPrImpactResult. Coverage score = ratio of high-impact changed nodes whose label appears in the review bundle. uncovered_hotspots is the corresponding list of ChangedNode entries that didn't make it. By convention coverage_score = 1.0 when there are no high-impact nodes (no coverage gap to score). The compact result preserves both fields verbatim so MCP and CLI clients can audit pack coverage without round-tripping through the verbose payload.
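The scoring rule can be sketched as follows. This is an illustration under assumptions, not the code in `pr-impact.ts`: `ChangedNode` is reduced to a hypothetical one-field shape, and `computeCoverage` is a name invented for the example.

```typescript
// Hypothetical minimal shape; the real ChangedNode carries more fields.
interface ChangedNode {
  label: string
}

interface Coverage {
  coverage_score: number
  uncovered_hotspots: ChangedNode[]
}

function computeCoverage(
  changedNodes: readonly ChangedNode[],
  highImpactLabels: readonly string[],
  reviewBundleLabels: readonly string[],
): Coverage {
  const highImpact = new Set(highImpactLabels)
  const inBundle = new Set(reviewBundleLabels)
  // Only high-impact changed nodes count towards the ratio.
  const hotspots = changedNodes.filter((n) => highImpact.has(n.label))
  // Convention from the commit message: no high-impact nodes means no gap to score.
  if (hotspots.length === 0) return { coverage_score: 1, uncovered_hotspots: [] }
  const uncovered = hotspots.filter((n) => !inBundle.has(n.label))
  return {
    coverage_score: (hotspots.length - uncovered.length) / hotspots.length,
    uncovered_hotspots: uncovered,
  }
}
```

For example, with high-impact nodes `a` and `b` and a review bundle containing only `a`, the score is 0.5 and `b` is the single uncovered hotspot.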
Most of #80's mechanics already lived in buildContextPrompt: deterministic sort_key ordering for stable sections, separate stable_prefix vs dynamic_suffix rendering, stable_prefix_tokens / reused_context_tokens / effective_prompt_tokens metrics, and a session-aware delta payload. What was missing was (a) the convention for sort_key prefixes that maximises Anthropic's automatic prompt-cache reuse, and (b) regression tests asserting the prefix is byte-stable across follow-ups. Adds JSDoc on ContextPromptStableSection documenting the recommended sort_key bands (01_workspace_*, 10_communities_*, 20_evidence_*, 90_anchor_*). Adds tests/unit/context-prompt-cache-stability.test.ts pinning: byte-identical stable_prefix on two consecutive calls with the same anchor, deterministic ordering regardless of input order, anchor-shifted prefixes still keep the workspace+communities portion shared, follow-up calls with a prior session_state report non-zero reused_context_tokens, and the stable prefix never embeds an ISO timestamp (cache-invalidation regression guard).
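The properties those tests pin can be illustrated with a small stand-in renderer. `StableSection`, `renderStablePrefix`, and `sharedPrefixLength` are hypothetical names for this sketch; only the sort_key bands come from the commit message, and the real rendering in `buildContextPrompt` is certainly richer.

```typescript
// Hypothetical stand-in for ContextPromptStableSection.
interface StableSection {
  sort_key: string
  body: string
}

// Sort by sort_key with plain code-unit comparison so the output never
// depends on input order or on the host locale.
function renderStablePrefix(sections: readonly StableSection[]): string {
  const ordered = [...sections].sort((a, b) =>
    a.sort_key < b.sort_key ? -1 : a.sort_key > b.sort_key ? 1 : 0,
  )
  return ordered.map((s) => s.body).join("\n\n")
}

// Length of the leading run two prompts share, i.e. the part a prefix cache could reuse.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0
  while (i < a.length && i < b.length && a[i] === b[i]) i++
  return i
}

// Regression guard: a timestamp in the prefix would change it on every call.
const ISO_TIMESTAMP = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/

const base: StableSection[] = [
  { sort_key: "20_evidence_auth", body: "evidence: auth flow" },
  { sort_key: "01_workspace_root", body: "workspace: demo-repo" },
  { sort_key: "10_communities_core", body: "communities: core, api" },
]
const withAnchorA = renderStablePrefix([...base, { sort_key: "90_anchor_a", body: "anchor: AuthService" }])
const withAnchorB = renderStablePrefix([{ sort_key: "90_anchor_b", body: "anchor: UserController" }, ...base])
```

Because the anchor band sorts last (`90_`), changing the anchor only alters the tail of the prefix, so the workspace and communities sections stay shared between the two renders.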
Adds computeDeltaContextPack(pack, previouslySentNodeIds) — a pure helper that filters a CompiledContextPack down to only the nodes the agent has not yet received in the current session, plus an explicit referenced_ids list of the dropped handles. Drops relationships whose endpoints were filtered out, on the basis that the receiver already has the source/target node and can reconstruct the edge from session state if needed. Returns bytes_saved so callers can verify the second-call payload trends down across a multi-turn session. Plus collectPackNodeIds(pack) for callers to build their session-state record after each call. This is the standalone, side-effect-free building block of #81. Wiring into the stdio context_pack tool's session state (the per-session handle store, response shape, and reset flow) is intentionally a follow-up — that surgery touches enough of the stdio session infrastructure that bundling it here would balloon the diff past safe-review size for one PR. The helper has full coverage so the consuming follow-up PR can wire it in with a one-line change.
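A rough sketch of the helper pair, assuming a heavily simplified pack shape (`PackNode`, `PackRelationship`, and `ContextPack` are hypothetical stand-ins for `CompiledContextPack`). It drops an edge only when both endpoints were filtered out, which is the semantic the review later in this thread settles on, and it approximates `bytes_saved` with JSON string length rather than true byte size.

```typescript
// Hypothetical, reduced shapes; the real CompiledContextPack has many more fields.
interface PackNode {
  id: string
}
interface PackRelationship {
  from: string
  to: string
}
interface ContextPack {
  nodes: PackNode[]
  relationships: PackRelationship[]
}
interface DeltaPack extends ContextPack {
  referenced_ids: string[]
  bytes_saved: number
}

// Ids a caller would record in its session state after each response.
function collectPackNodeIds(pack: ContextPack): Set<string> {
  return new Set(pack.nodes.map((n) => n.id))
}

function computeDeltaContextPack(pack: ContextPack, previouslySent: ReadonlySet<string>): DeltaPack {
  const nodes = pack.nodes.filter((n) => !previouslySent.has(n.id))
  const referenced_ids = pack.nodes.filter((n) => previouslySent.has(n.id)).map((n) => n.id)
  const referenced = new Set(referenced_ids)
  // Keep mixed edges: they say how a new node connects to a known one.
  const relationships = pack.relationships.filter(
    (r) => !(referenced.has(r.from) && referenced.has(r.to)),
  )
  const delta: ContextPack = { nodes, relationships }
  // Approximation: character count of the JSON encoding, not UTF-8 bytes.
  const bytes_saved = Math.max(0, JSON.stringify(pack).length - JSON.stringify(delta).length)
  return { ...delta, referenced_ids, bytes_saved }
}
```

A caller would typically union `collectPackNodeIds(pack)` into its per-session set after each response and pass that set as `previouslySent` on the next call.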
Walkthrough
This PR propagates SPI framework_role into projected ExtractionNodes (with NestJS role→kind mapping), adds PR-impact coverage metrics (coverage_score and uncovered_hotspots) recalculated after compaction, and introduces a context-pack delta helper plus prompt-cache stability documentation and tests.
Changes
- Framework Role Propagation to Extracted Nodes
- PR Impact Coverage Scoring
- Context and Delta Optimization
Estimated code review effort: 3 (Moderate), ~22 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/runtime/context-pack-delta.ts`:
- Around line 76-87: The filter currently drops any relationship if either
endpoint is in referencedIds; update the predicate used for keptRelationships
(iterating pack.relationships) to only exclude relationships when both endpoints
are referenced. Specifically, build referencedSet from referencedIds, derive
fromId/toId as done now, and change the condition to return false only when
fromId and toId are both non-null and referencedSet.has(fromId) &&
referencedSet.has(toId); otherwise keep the relationship so mixed
(new↔referenced) edges are preserved.
In `@src/runtime/pr-impact.ts`:
- Around line 1117-1133: Recompute coverageScore and uncoveredHotspots after the
reviewBundle has been compacted: build reviewBundleLabels from the
post-compaction reviewBundle.nodes, create highImpactLabelSet from
highImpactNodes, then recalculate totalHighImpact, coveredHighImpact,
coverageScore (default 1 when totalHighImpact === 0) and uncoveredHotspots by
filtering changedNodes.map(n => n.serialized) against highImpactLabelSet and the
recomputed reviewBundleLabels; update the existing
coverage_score/uncovered_hotspots assignments to use these recomputed values so
compaction-dropped nodes aren’t falsely counted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b43d6abe-30e9-4085-b101-2b775464ed43
📒 Files selected for processing (9)
- src/infrastructure/context-prompt.ts
- src/pipeline/spi/projector.ts
- src/runtime/context-pack-delta.ts
- src/runtime/pr-impact.ts
- tests/unit/context-pack-delta.test.ts
- tests/unit/context-prompt-cache-stability.test.ts
- tests/unit/pr-impact-coverage.test.ts
- tests/unit/pr-impact.test.ts
- tests/unit/spi-projector.test.ts
Two valid catches from the bundle's review:
1. #81 delta-helper edge filter was too aggressive. The original logic dropped a relationship if EITHER endpoint was previously sent (referenced). That's wrong: a mixed edge (one new endpoint, one referenced) carries novel information about how the new node connects to the known one and must be kept. The fix: only drop relationships when BOTH endpoints are already in the receiver's session. The corresponding test was rewritten to assert the new semantic explicitly: 4 input edges (both-new, mixed×2, both-referenced) → 3 kept, 1 dropped.
2. #79 compact coverage_score and uncovered_hotspots inherited the verbose result's values verbatim. If compactReviewBundle drops a high-impact node during compaction, the compact result would silently claim coverage that no longer existed in the compact bundle. The fix: recompute coverage in compactPrImpactResult against the post-compaction review bundle. Builds the compactedReviewBundle once, derives compactedReviewLabels from its nodes, then recomputes compactCoverageScore + compactUncoveredHotspots from the high-impact set. New regression test constructs a 12-hotspot fixture and asserts compact.coverage_score matches the labels actually present in compact.review_bundle.nodes, i.e., honest reporting matches the compact payload.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/runtime/pr-impact.ts (1)
1189-1220: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Align compact coverage metrics with the compact high-impact list.
Line 1189 computes coverage using the full result.risk_summary.high_impact_nodes, but Line 1219 truncates risk_summary.high_impact_nodes. This can produce a coverage_score/uncovered_hotspots set that cannot be reconciled with the compact payload’s own high-impact list.
💡 Proposed fix

    - const compactHighImpactSet = new Set(result.risk_summary.high_impact_nodes)
    + const compactHighImpactNodes = result.risk_summary.high_impact_nodes.slice(0, MAX_COMPACT_HIGH_IMPACT_NODES)
    + const compactHighImpactSet = new Set(compactHighImpactNodes)
    @@
      risk_summary: {
        ...result.risk_summary,
    -   high_impact_nodes: result.risk_summary.high_impact_nodes.slice(0, MAX_COMPACT_HIGH_IMPACT_NODES),
    +   high_impact_nodes: compactHighImpactNodes,
      },

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/pr-impact.ts` around lines 1189 - 1220, The coverage and uncovered-hotspots calculations should use the same truncated high-impact list that's put into the compact payload: replace uses of result.risk_summary.high_impact_nodes when building compactHighImpactSet/compactTotalHighImpact/compactCoveredHighImpact/compactCoverageScore/compactUncoveredHotspots with the truncated array result.risk_summary.high_impact_nodes.slice(0, MAX_COMPACT_HIGH_IMPACT_NODES); ensure you still check membership against compactedReviewLabels and use the same MAX_COMPACT_HIGH_IMPACT_NODES symbol so the computed compactCoverageScore and compactUncoveredHotspots align with the risk_summary.high_impact_nodes included in the returned object.
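The alignment the reviewer asks for can be sketched like this. It is an illustration only: the value of `MAX_COMPACT_HIGH_IMPACT_NODES` is assumed, hotspots are reduced to plain labels instead of `ChangedNode` entries, and `compactCoverage` is a hypothetical name.

```typescript
// Assumed value for illustration; the real constant is defined in pr-impact.ts.
const MAX_COMPACT_HIGH_IMPACT_NODES = 2

interface CompactCoverage {
  high_impact_nodes: string[]
  coverage_score: number
  uncovered_hotspots: string[] // simplified to labels in this sketch
}

function compactCoverage(
  highImpactNodes: readonly string[],
  compactedReviewLabels: ReadonlySet<string>,
): CompactCoverage {
  // Truncate once, then use the same array for both the payload and the score.
  const compactHighImpactNodes = highImpactNodes.slice(0, MAX_COMPACT_HIGH_IMPACT_NODES)
  const uncovered = compactHighImpactNodes.filter((label) => !compactedReviewLabels.has(label))
  const total = compactHighImpactNodes.length
  return {
    high_impact_nodes: compactHighImpactNodes,
    coverage_score: total === 0 ? 1 : (total - uncovered.length) / total,
    uncovered_hotspots: uncovered,
  }
}
```

With four high-impact labels and a compacted bundle containing the first two, this reports 1.0 over the two labels it actually ships, where scoring against the untruncated list would report 0.5 for nodes the compact payload never mentions.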
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e494d3de-8a05-4644-974e-c179ce3bf4b8
📒 Files selected for processing (4)
- src/runtime/context-pack-delta.ts
- src/runtime/pr-impact.ts
- tests/unit/context-pack-delta.test.ts
- tests/unit/pr-impact-coverage.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/runtime/context-pack-delta.ts
- tests/unit/context-pack-delta.test.ts
…lector (#74)
Extends PR #121 to cover the rest of v0.15 in one slice as requested:
## #81 Delta-only context packs via stdio
- New context_pack parameter: delta_session_id. When set, the response ships only nodes the session hasn't received yet, plus referenced_ids[] for dropped nodes and bytes_saved.
- New MCP tool context_pack_session_reset to clear a delta session and force the next call to ship the full pack.
- 3 new StdioToolHelpers methods: getContextPackNodeIds, recordContextPackNodeIds, clearContextPackNodeIds. Backed by a per-MCP-process Map<sessionId, Set<nodeId>> in StdioSessionState with the same LRU bound as the prompt-session store (256 sessions).
- Reuses the existing computeDeltaContextPack helper (already had the both-endpoints relationship filter fix from PR #106) and collectPackNodeIds for recording shipped ids.
- Diagnostics on delta responses skip the budget_underutilized rule since a delta pack is small-by-design after dedup.
## #74 Value-per-token budget selector
- New module src/runtime/value-per-token.ts exporting selectByValuePerToken(candidates, options).
- Greedy density heuristic: sort by score / token_cost descending, pick the prefix that fits within budget. Tie-break: score desc, cost asc, id asc (deterministic).
- Optional pinZeroCost (default true): zero-cost candidates are always included; set false to exclude them entirely.
- Skips items whose individual cost exceeds the budget (cannot fit by definition) and items with non-finite scores or costs.
- Returns selected payload list, total_cost, remaining_budget, and per-candidate ranking[] with rank/density/included for diagnostics.
- Pure helper for now; adopting it inside retrieve.ts's candidate-selection pipeline is a follow-up once we have a benchmark to A/B against the current selector.
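The selector bullets above can be read as the following sketch. The type names, the `budget` option name, and the choice to rank only eligible candidates are assumptions made for illustration; the real signature in `src/runtime/value-per-token.ts` may differ. "Pick the prefix" is taken literally here: selection stops at the first candidate that no longer fits.

```typescript
// Hypothetical shapes; the real option and result types may differ.
interface Candidate<T> {
  id: string
  score: number
  token_cost: number
  payload: T
}
interface SelectOptions {
  budget: number
  pinZeroCost?: boolean
}
interface RankingEntry {
  id: string
  rank: number
  density: number
  included: boolean
}
interface Selection<T> {
  selected: T[]
  total_cost: number
  remaining_budget: number
  ranking: RankingEntry[]
}

function selectByValuePerToken<T>(candidates: readonly Candidate<T>[], options: SelectOptions): Selection<T> {
  const budget = Math.max(0, options.budget) // negative budgets clamp to zero
  const pinZeroCost = options.pinZeroCost ?? true
  // Drop non-finite inputs, items that can never fit, and (optionally) zero-cost items.
  const eligible = candidates.filter(
    (c) =>
      Number.isFinite(c.score) &&
      Number.isFinite(c.token_cost) &&
      c.token_cost >= 0 &&
      c.token_cost <= budget &&
      (c.token_cost > 0 || pinZeroCost),
  )
  // Zero-cost items get infinite density so they always sort first.
  const density = (c: Candidate<T>): number => (c.token_cost === 0 ? Infinity : c.score / c.token_cost)
  const ordered = [...eligible].sort((a, b) => {
    const byDensity = density(b) - density(a)
    if (byDensity !== 0 && !Number.isNaN(byDensity)) return byDensity
    if (b.score !== a.score) return b.score - a.score
    if (a.token_cost !== b.token_cost) return a.token_cost - b.token_cost
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  })
  const selected: T[] = []
  let total_cost = 0
  let open = true // becomes false at the first candidate that does not fit
  const ranking = ordered.map((c, i) => {
    const included = open && total_cost + c.token_cost <= budget
    if (included) {
      total_cost += c.token_cost
      selected.push(c.payload)
    } else {
      open = false
    }
    return { id: c.id, rank: i + 1, density: density(c), included }
  })
  return { selected, total_cost, remaining_budget: budget - total_cost, ranking }
}
```

A greedy density pass is a heuristic and does not solve the knapsack problem optimally, which fits the note that adoption waits on a benchmark against the current selector.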
## Tests
- 11 new value-per-token tests covering density preference, zero-cost gating, budget overflow skip, non-finite filtering, ranking shape, tie-break determinism, negative budget clamp, empty input.
- MCP tool count increased to 26 (full profile). mcp-schema-budget test stays under the 12,000-byte ceiling after tightening the new tool descriptions.
- Verified: typecheck + build clean, 1760/1760 pass.
## Not in this PR (deferred to v0.16 slice train)
- #76 multi-resolution context representations: needs a new representation layer, structurally invasive.
- #79 PR-impact coverage calibration: needs real PRs to calibrate against, not a code-only delivery.
- #80 cache-aware prompt layout measurement: purely measurement work; sort_key bands already shipped.
… + value-per-token (#74) (#121)
* feat(#78): context-pack quality diagnostics + bad-run detection (v0.15 slice 1)
Adds a deterministic structural quality scorer for compiled context-packs. Returns a 0-1 quality_score, a list of triggered warnings with kind/severity/message/detail, and the raw signals used to compute the score (node_count, claim_count, snippet_coverage, avg_match_score, budget_utilization, etc.).
Rules implemented (each weighted into the score):
- missing_required_evidence (error, weight 2): pack lacks a required evidence class
- missing_required_semantic (warn): pack lacks a required semantic category
- zero_claims (warn): claims array is empty
- undersized_retrieval (warn): fewer than 3 nodes returned
- budget_underutilized (info): token_count < 25% of budget on a >= 500-token request
- missing_snippets (warn): > 50% of nodes lack a source snippet
- low_avg_match_score (warn): mean match_score < 0.30 (when scores exist)
- orphan_nodes (warn): > 1 nodes but zero relationships
- no_graph_signals (info): both god_nodes and bridge_nodes empty
Surface points:
- New contracts file src/contracts/context-pack-diagnostics.ts with ContextPackDiagnosticKind / Severity / Warning / Signals / Diagnostics types.
- New runtime helper src/runtime/context-pack-diagnostics.ts exporting computeContextPackDiagnostics(pack, options?). Pure, deterministic, no I/O; fully unit-testable against synthetic CompiledContextPack inputs.
- contextPackFromRetrieveResult is now exported from retrieve.ts (was private) so the stdio handler can construct the full pack shape from a RetrieveResult and feed it to the scorer.
- stdio context_pack tool response now includes a diagnostics field on the explain branch. Impact and review branches use different pack taxonomies and will land in a follow-up.
Verified: typecheck + build clean, 1749/1749 tests pass (+16 new). No public API surface changes outside the additive diagnostics field.
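A reduced sketch of how such a rule-based scorer might look, covering four of the nine rules. The `PackSignals` shape, the per-severity weights for warn and info, and the score formula are all assumptions made for illustration (the source only states that the error rule has weight 2); `diagnose` is a hypothetical name, not `computeContextPackDiagnostics`.

```typescript
type Severity = "error" | "warn" | "info"

interface Warning {
  kind: string
  severity: Severity
  message: string
}

// Hypothetical, pre-extracted signals; the real scorer derives these from the pack.
interface PackSignals {
  node_count: number
  claim_count: number
  token_count: number
  budget: number
  match_scores: number[]
}

// Assumed weights: only "error = 2" is stated in the source.
const WEIGHT: Record<Severity, number> = { error: 2, warn: 1, info: 0.5 }
// Total weight of the four rules in this sketch (warn + warn + info + warn).
const MAX_PENALTY = 3.5

function diagnose(s: PackSignals): { quality_score: number; warnings: Warning[] } {
  const warnings: Warning[] = []
  if (s.claim_count === 0) {
    warnings.push({ kind: "zero_claims", severity: "warn", message: "claims array is empty" })
  }
  if (s.node_count < 3) {
    warnings.push({ kind: "undersized_retrieval", severity: "warn", message: "fewer than 3 nodes returned" })
  }
  if (s.budget >= 500 && s.token_count < 0.25 * s.budget) {
    warnings.push({ kind: "budget_underutilized", severity: "info", message: "under 25% of budget used" })
  }
  // NaN marks "no scored nodes"; an average of exactly 0 must still fire the rule.
  const avg =
    s.match_scores.length === 0 ? NaN : s.match_scores.reduce((sum, x) => sum + x, 0) / s.match_scores.length
  if (!Number.isNaN(avg) && avg < 0.3) {
    warnings.push({ kind: "low_avg_match_score", severity: "warn", message: "mean match_score below 0.30" })
  }
  const penalty = warnings.reduce((sum, w) => sum + WEIGHT[w.severity], 0)
  return { quality_score: Math.max(0, 1 - penalty / MAX_PENALTY), warnings }
}
```

Keeping the function pure over plain signals is what makes it testable against synthetic inputs, as the commit message notes for the real helper.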
* feat(v0.15): delta-only context-pack stdio (#81) + value-per-token selector (#74)
* fix(#78): low_avg_match_score must fire on avg=0 (CodeRabbit)
CodeRabbit caught that the predicate excluded the worst-possible case (every node scoring exactly 0). The '> 0' clause meant a pack with three zero-scored nodes silently passed the rule, but that is precisely the kind of retrieval the warning was supposed to catch. Fix: drop the '> 0' clause. The NaN guard above already covers the 'no scored nodes' case (where avg_match_score is NaN). Added a test pinning the avg=0 case.
Closes part of #79, #80, #81. Advances #72 (slice 1c-ii.a). Built on top of #105 (post-merge to main).
Commits in this PR (each its own logical unit)
1. `adf95ce` — feat(#72): SPI projector → propagate framework_role (slice 1c-ii.a)
When SPI symbols carry a `framework_role` (set by slice 3b's NestJS detector), the projector now surfaces `framework`, `framework_role`, and `node_kind` on the projected ExtractionNode. Maps SPI roles back onto the legacy extractor's shape so downstream consumers can route framework-aware UX without re-classifying. Full byte-equivalence on demo-repo (slice 1c-ii.b through .e — porting Express / Next.js / React Router / Redux extractor logic) remains future work.
2. `20ad12f` — feat(#79): PR-impact coverage scoring
Adds `coverage_score` (numeric, [0, 1]) and `uncovered_hotspots` (`ChangedNode[]`) to `PrImpactResult` and `CompactPrImpactResult`. Coverage score = ratio of high-impact changed nodes whose label appears in the review bundle. Convention: 1.0 when there are no high-impact nodes (no coverage gap to score). The compact result preserves both fields verbatim so MCP and CLI clients can audit pack coverage without round-tripping through the verbose payload.
3. `939c71e` — feat(#80): document + regression-test cache-aware prompt layout
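Most of #80's mechanics already lived in `buildContextPrompt` (deterministic sort_key ordering, separate stable_prefix vs dynamic_suffix, `stable_prefix_tokens` / `reused_context_tokens` / `effective_prompt_tokens` metrics, session-aware delta payload). What was missing: (a) a convention for sort_key prefixes that maximises Anthropic's automatic prompt-cache reuse, and (b) regression tests asserting that the prefix is byte-stable across follow-ups.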
4. `273fa64` — feat(#81): standalone delta-pack helper
Adds `computeDeltaContextPack(pack, previouslySentNodeIds)` — pure side-effect-free filter that returns the input pack with overlapping nodes + their relationships removed, plus an explicit `referenced_ids` list of dropped handles and a `bytes_saved` measurement. Plus `collectPackNodeIds(pack)` for callers to record what the agent received after each call.
Test plan
What's intentionally deferred
Per the bundle's "minimum viable per issue" scope, each issue is partially closed by this PR. The remaining work for each:
Refs #79, #80, #81, #72.