Add slice-v1 retrieval and benchmark tooling by mohanagy · Pull Request #144 · mohanagy/graphify-ts

mohanagy · 2026-05-11T18:55:57Z

Summary

add opt-in slice-v1 retrieval plus MCP/CLI surface support and regression coverage
extend resolution: 'sketch' with deterministic env/config and side-effect hints
add real-workspace benchmark tooling, graph stats helpers, and value-per-token calibration reporting

Verification

npm run typecheck
npm run build
npm run test:run
bash docs/benchmarks/2026-05-11-spi-vs-legacy/run.sh

Benchmark notes

bundled fixture still does not show token reduction: legacy 947 vs SPI cold 1002
bundled fixture still shows smaller graph output: 67,254 bytes -> 45,589 bytes
calibration stayed honest on the fixture: 0 helps / 7 no material change / 0 hurts-or-expands

Notes

slice-v1 is opt-in and experimental
real-workspace benchmark flow uses env vars only; no private paths or artifacts are committed

Summary by CodeRabbit

New Features
- Opt-in task‑conditioned slicing retrieval mode ("slice‑v1") exposed in CLI and stdio tools with validation.
- Sketch mode improved with deterministic env/config reads and side‑effect/latency hints.
- Real‑workspace benchmark flow: runner, bundling, and consolidated reporting with node/edge metrics and calibration buckets.
Documentation
- Updated benchmark reporting, reproducibility docs, example prompts, and a report template for real‑workspace comparisons.

Add opt-in slice-v1 retrieval, richer sketch semantics, real-workspace benchmark helpers, and calibration/diagnostic coverage for the post-v0.21 payoff layer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-11T18:56:10Z

📝 Walkthrough

Walkthrough

This PR adds an opt-in task-conditioned retrieval mode "slice-v1" that anchors and traverses a KnowledgeGraph to produce ordered slice node lists and metadata, exposes retrieval_strategy/slice through CLI/stdio/command paths with validation, enriches sketch output with deterministic env/side-effect hints, and introduces real-workspace benchmark tooling (edge-counts, multi-workspace runner/summarizer, probe calibration) plus tests and docs.

Changes

Task-Conditioned Slice-v1 Retrieval Strategy

Layer / File(s)	Summary
Slice Metadata & Contracts `src/contracts/context-pack.ts`	Adds `ContextPackRetrievalStrategy` union and slice metadata interfaces; `CompiledContextPack` extended with optional `retrieval_strategy` and `slice`.
CLI Parser `src/cli/parser.ts`	Adds `PackCliOptions.retrievalStrategy`, updates usage/help, parses `--retrieval-strategy [default
Context-Pack Command Forwarding `src/infrastructure/context-pack-command.ts`	Expands retrieveContext options typing to accept `retrievalStrategy`; runContextPackCommand forwards it in non-review tasks and rejects it for `task=review`.
MCP Tool Schema `src/runtime/stdio/definitions.ts`	Adds `retrieval_strategy` property to `retrieve` and `context_pack` tool input schemas with enum `['default','slice-v1']` and adjusts descriptions.
Stdio Handlers `src/runtime/stdio/tools.ts`	Adds `parseRetrievalStrategyParam` helper; validate `retrieval_strategy` param, reject invalid values with jsonrpcInvalidParams, disallow for `task=review`, and forward validated `retrievalStrategy` into retrieval calls.
Slicing Policy & Algorithm `src/runtime/retrieve/slicing.ts`	New `SliceScoredNode` shape and `sliceCandidatesForRetrieve` implementing anchor selection (up to 2 anchors), intent→policy mapping, barrel suppression, BFS traversal with relation allowlists, helper neighbor expansion, path recording, and returning `ordered_ids` + metadata or `null`.
Retrieve Integration `src/runtime/retrieve.ts`	`RetrieveOptions` gains `retrievalStrategy`; `RetrieveResult` includes `retrieval_strategy` and `slice`; adds `scoredNodeFromGraph`; implements slice-v1 branch in `retrieveContext` using `sliceCandidatesForRetrieve`, preserves lexical slice during semantic rerank by filtering semantic IDs, injects slice metadata into results, and propagates strategy/slice in compact outputs and compiled packs.
Deterministic Sketch Emission `src/runtime/context-pack-resolution.ts`	Adds internal `sideEffectHints` helper; separates `readsEnv` from `config`, computes side-effect and latency-sensitive hints, expands behavior_sketch emission condition, and appends `reads env:`, `side effects:`, and `latency-sensitive:` lines when present.
Slice-v1 Unit Tests `tests/unit/retrieve-slice-v1.test.ts`	Tests explain/debug/impact slice modes using a deterministic buildSliceGraph fixture; asserts anchor-bounded selection, barrel suppression, direction-aware traversal, and presence of slice metadata fields.
Surface & Integration Tests `tests/unit/retrieve-slice-surface.test.ts`, `tests/unit/stdio-slice-surface.test.ts`	Tests CLI/pack parser accepts/rejects `--retrieval-strategy`, runContextPackCommand behavior for explain/reject for review, stdio retrieve/context_pack validate/forward/disallow strategy, and temp-graph fixture setup/cleanup.
Sketch Enhancement Tests `tests/unit/context-pack-resolution-sketch.test.ts`	Adds tests verifying behavior_sketch includes deterministic `reads env` and `config` markers, side-effect and latency-sensitive hints for inferred HTTP/LLM/DB operations, and that contains-only relationships do not spuriously infer side-effects.

Real-Workspace Benchmark Infrastructure

Layer / File(s)	Summary
Changelog & README `CHANGELOG.md`, `README.md`	Document Unreleased slice-v1, deterministic sketch changes, safe CLI/MCP exposure, and updated README benchmark wording (removes generic pack-token claim; reports framework correctness, retrieval expansion, graph size −32%, cache-hit rebuild −27%, and no explain-pack token win for fixture).
Graph Stats CLI `docs/benchmarks/.../graph-stats.mjs`	New Node.js CLI that reads a `graph.json` path, safely parses JSON, computes `node_count` and `edge_count` (0 defaults), and prints JSON to stdout; usage/parse exit codes implemented.
Runner: edge_count integration `docs/benchmarks/.../run.sh`	run.sh now uses `graph-stats.mjs` for node/edge counting across legacy/spi-cold/spi-warm, writes `edge_count` into per-variant JSON, logs edges, and uses timestamped RESULTS_DIR with optional override.
Probe Extensions `docs/benchmarks/.../probe.mjs`	Adds resolution token estimates and `top_files`, runs additional retrieval with `retrievalStrategy: 'slice-v1'` (value-per-token selection), records `slice` deltas, classifies calibration buckets (helps/hurts_or_expands/no_material_change), and includes `calibration` in final output.
Real-Workspace Orchestration `docs/benchmarks/.../run-real-workspace.sh`, `summarize-real-workspaces.mjs`	run-real-workspace.sh validates prompts/workspace inputs, runs per-workspace run.sh with env forwarding into timestamped results bundle; summarize-real-workspaces.mjs orders workspaces (backend, monorepo, lexicographic fallback), reads/parses per-workspace `summary.json`, flattens `objective_metrics` (build_time_ms, graph_size_bytes, node_count, edge_count), and emits aggregated JSON with qualitative notes.
Prompts & Report Template `docs/benchmarks/.../prompts.real-workspace.example.json`, `REAL_WORKSPACE_REPORT_TEMPLATE.md`	Adds example prompts JSON (schema_version:1 and typed prompt entries) and a report template covering workspace matrix, strategy/resolution comparisons, retrieval-level comparisons, calibration checklist, and privacy/disclaimer guidance.
Benchmark README `docs/benchmarks/.../README.md`	Documents that variant JSON includes `edge_count` and documents the Real-workspace matrix runner (env vars, defaults, produced artifacts).
Benchmark Tests `tests/unit/benchmark-graph-stats.test.ts`, `tests/unit/benchmark-real-workspace.test.ts`, `tests/unit/benchmark-probe-calibration.test.ts`	Adds unit tests for graph-stats (edges present, edges omitted, malformed JSON error), integration-style tests for summarize-real-workspaces ordering/metrics and fail-fast validations, prompt/template schema checks, and calibration classifier tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

mohanagy/graphify-ts#24: Overlaps in retrieve pipeline and scoring/compaction logic touched by this PR.
mohanagy/graphify-ts#123: Related context-pack resolution/sketch rendering changes.
mohanagy/graphify-ts#142: Overlaps in benchmark/probe tooling and resolution handling.

"🐰 I hop and mark each anchored node,
I trace the paths where slices roam,
I count the edges, guard private code,
My sketches hum of env and home.
Benchmarks gather workspaces near—
A tidy slice, the result is clear."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main changes: adding slice-v1 retrieval and benchmark tooling, which aligns with the core objectives.
Description check	✅ Passed	The PR description covers all required template sections with concrete details: summary lists three distinct changes, verification includes four specific commands, testing checkboxes are shown, and focused scope is documented.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch payoff-v0-22

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/infrastructure/context-pack-command.ts (1)
221-244: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

--retrieval-strategy is ignored for review packs.

The new flag is forwarded for explain/impact only. runContextPackCommand() returns early for task === 'review', so graphify-ts pack --task review --retrieval-strategy slice-v1 succeeds but has no effect. Either reject that combination or thread it into the review bundle path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/infrastructure/context-pack-command.ts` around lines 221 - 244,
runContextPackCommand() currently ignores options.retrievalStrategy for task ===
'review' because it returns early; update the review path to either validate and
reject the combination or pass retrievalStrategy into the review bundle
retrieval call: locate runContextPackCommand(), the early-return branch that
handles task === 'review', and the code that builds retrieval options for
dependencies.retrieveContext (and any call sites of retrieveContext or
buildCommunityLabels/compactImpactResult used for review bundles) and ensure
options.retrievalStrategy (and options.retrievalLevel if relevant) are forwarded
into the retrieval options for review processing, or add an explicit validation
that throws a clear error when task === 'review' and retrievalStrategy is set so
callers know the flag is unsupported.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs`:
- Line 11: Wrap the JSON.parse call for graph (currently: const graph =
JSON.parse(readFileSync(graphPath, 'utf8'))) in a try/catch so malformed JSON
doesn't crash the CLI: read the file into a string first (using
readFileSync(graphPath, 'utf8')), then try to JSON.parse that string, catch
SyntaxError (or any error), print a clear error via console.error including
graphPath and the error.message, and exit with process.exit(1); ensure the
caught error path does not allow further processing of the invalid graph
variable.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs`:
- Around line 148-161: The classification logic mislabels non-neutral cases:
change the hurts/expands branch condition in the block that constructs note
(uses tokenDelta, qualityDelta, labelDelta, summary) from (tokenDelta > 0 &&
qualityDelta <= 0) to use OR so it catches either token expansions or quality
regressions; specifically, update the condition guarding
summary.hurts_or_expands.push(note) to (tokenDelta > 0 || qualityDelta < 0) so
quality regressions with token savings and token expansions with quality gains
are correctly classified.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh`:
- Around line 8-22: The current run_workspace function and PROMPTS_FILE
assignment can allow typos in GRAPHIFY_BENCH_REAL_PROMPTS,
GRAPHIFY_BENCH_BACKEND, or GRAPHIFY_BENCH_MONOREPO to silently pass through;
update run_workspace (and the PROMPTS_FILE resolution) to validate inputs and
fail fast: check that PROMPTS_FILE exists (-f) and that the workspace_path and
any required backend/monorepo paths exist (-d or -f as appropriate), print a
clear error message naming the missing variable (e.g.,
GRAPHIFY_BENCH_REAL_PROMPTS, GRAPHIFY_BENCH_BACKEND, GRAPHIFY_BENCH_MONOREPO)
and return non-zero/exit so run.sh is not invoked with invalid paths; keep
checks adjacent to the PROMPTS_FILE assignment and at the top of run_workspace
for quick failure.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs`:
- Line 26: The JSON.parse call inside workspaceNames.map can throw on malformed
summary files; wrap the readFileSync/JSON.parse logic used in workspaceNames.map
so each iteration uses a try-catch that captures errors reading/parsing
join(bundleDir, name, 'summary.json') and rethrows or logs a new Error that
includes the workspace name and the file path (e.g., mention name and
'summary.json') so you can identify which workspace failed; update the mapping
to return either the parsed object or propagate the enriched error for later
handling.

In `@src/runtime/context-pack-resolution.ts`:
- Around line 379-385: The side-effect inference currently uses behaviorEdges
(which includes structural labels like 'contains' and 'method') causing false
positives; instead build an execution-only edge set (e.g., execBehaviorEdges =
relationLabels(node, relationIndex, 'outgoing', ['calls']) or another allowlist
of execution-only labels) and pass that execBehaviorEdges into sideEffectHints()
(replace the current sideEffectHints(behaviorEdges) call) so side-effect tags
(queue_event, db_write, etc.) are derived only from executable/calls edges.

In `@src/runtime/retrieve.ts`:
- Around line 1835-1867: The semantic/rerank path currently rebuilds candidates
from all eligible nodes and ignores the lexical slice; update
retrieveContextAsync (and the code paths that call
buildRetrieveResultFromOrderedCandidates) so that when options.retrievalStrategy
=== 'slice-v1' you detect and propagate the lexical slice (lexicalResult.slice
or the sliced.ordered_ids + sliced.metadata returned by
sliceCandidatesForRetrieve) into the semantic/rerank flow and use that
ordered_ids list to constrain the candidate pool before reranking; ensure
sliced.metadata is forwarded into buildRetrieveResultFromOrderedCandidates and
that any fallback to scoredNodeFromGraph only occurs for IDs within the slice so
the slice-v1 constraint and its metadata are honored end-to-end.

In `@src/runtime/retrieve/slicing.ts`:
- Around line 182-185: The traversal code currently drops any neighbor not
already present in scoredCandidates by calling scoredById.get(neighborId) and
continuing when missing; change this so the slice can expand beyond the initial
candidate pool by performing a graph-backed neighbor lookup when scoredById
lacks neighborId (e.g., query the graph index / adjacency map for neighborId,
construct a candidate object for that node, compute its score via the existing
scoring routine, insert it into scoredCandidates/scoredById, and then proceed
with traversal); apply the same change to the other traversal branch that uses
scoredById (the block flagged around the second occurrence) so both paths can
pull in graph-adjacent evidence rather than treating scoredCandidates as a hard
boundary.

In `@src/runtime/stdio/tools.ts`:
- Around line 888-891: The code currently parses retrieval_strategy via
parseRetrievalStrategyParam and returns failure when invalid, but the review
flow (task: 'review') bypasses this check so context_pack can silently accept
unsupported values; update the review branch to validate retrieval_strategy
early: call parseRetrievalStrategyParam(helpers, toolArguments) (or reuse the
existing contextPackStrategy check) inside the review handling path and if it
returns 'invalid' call helpers.failure(id, helpers.jsonrpcInvalidParams,
'retrieval_strategy must be one of default, slice-v1'), or alternatively pass
the validated contextPackStrategy into the review flow so review honors
slice-v1. Ensure you reference parseRetrievalStrategyParam, contextPackStrategy,
helpers.failure and task: 'review' when making the change.

---

Outside diff comments:
In `@src/infrastructure/context-pack-command.ts`:
- Around line 221-244: runContextPackCommand() currently ignores
options.retrievalStrategy for task === 'review' because it returns early; update
the review path to either validate and reject the combination or pass
retrievalStrategy into the review bundle retrieval call: locate
runContextPackCommand(), the early-return branch that handles task === 'review',
and the code that builds retrieval options for dependencies.retrieveContext (and
any call sites of retrieveContext or buildCommunityLabels/compactImpactResult
used for review bundles) and ensure options.retrievalStrategy (and
options.retrievalLevel if relevant) are forwarded into the retrieval options for
review processing, or add an explicit validation that throws a clear error when
task === 'review' and retrievalStrategy is set so callers know the flag is
unsupported.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 70fe1576-27ea-49e9-abf2-0305a581fe5a

📥 Commits

Reviewing files that changed from the base of the PR and between d91ca67 and 965fde8.

📒 Files selected for processing (24)

CHANGELOG.md
README.md
docs/benchmarks/2026-05-11-spi-vs-legacy/README.md
docs/benchmarks/2026-05-11-spi-vs-legacy/REAL_WORKSPACE_REPORT_TEMPLATE.md
docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/prompts.real-workspace.example.json
docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
docs/benchmarks/2026-05-11-spi-vs-legacy/run.sh
docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
src/cli/parser.ts
src/contracts/context-pack.ts
src/infrastructure/context-pack-command.ts
src/runtime/context-pack-resolution.ts
src/runtime/retrieve.ts
src/runtime/retrieve/slicing.ts
src/runtime/stdio/definitions.ts
src/runtime/stdio/tools.ts
tests/unit/benchmark-graph-stats.test.ts
tests/unit/benchmark-real-workspace.test.ts
tests/unit/context-pack-resolution-sketch.test.ts
tests/unit/retrieve-slice-surface.test.ts
tests/unit/retrieve-slice-v1.test.ts
tests/unit/stdio-slice-surface.test.ts

Tighten benchmark script failures, keep slice-v1 constrained through async retrieval, reject unsupported review retrieval_strategy usage, and limit sketch side-effect inference to execution edges. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai

🧹 Nitpick comments (3)

tests/unit/benchmark-graph-stats.test.ts (1)

33-43: ⚡ Quick win

Add an explicit non-array edges regression test.

You already cover the “missing edges” fallback. Adding a case where edges exists but is not an array would harden this contract and prevent silent regressions.

Proposed test addition

+  it('falls back to zero edges when graph.json has a non-array edges value', () => {
+    withTempGraph({
+      nodes: [{ id: 'a' }, { id: 'b' }],
+      edges: { source: 'a', target: 'b' },
+    }, (graphPath) => {
+      const output = execFileSync('node', [
+        'docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs',
+        graphPath,
+      ], { cwd: process.cwd(), encoding: 'utf8' })
+      expect(JSON.parse(output)).toEqual({ node_count: 2, edge_count: 0 })
+    })
+  })

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/benchmark-graph-stats.test.ts` around lines 33 - 43, Add a new
unit test (next to the existing "falls back to zero edges when graph.json omits
the edges array") that uses withTempGraph to write a graph JSON where edges is
present but not an array (e.g., edges: null or edges: {} or a string) and then
calls the same script
('docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs') via execFileSync;
assert JSON.parse(output) equals { node_count: 2, edge_count: 0 } to ensure the
graph-stats.mjs code path that reads edges treats non-array values the same as
missing arrays.

src/runtime/context-pack-resolution.ts (1)

382-383: ⚡ Quick win

Stabilize env/config label ordering for deterministic sketch output.

relationLabels(...) preserves relationship insertion order, so output can drift if upstream edge order varies. Sorting readsEnv and config before rendering will make this deterministic.

Proposed patch

-  const config = relationLabels(node, relationIndex, 'outgoing', ['uses_config'])
-  const readsEnv = relationLabels(node, relationIndex, 'outgoing', ['reads_env'])
+  const config = relationLabels(node, relationIndex, 'outgoing', ['uses_config']).sort((a, b) => a.localeCompare(b))
+  const readsEnv = relationLabels(node, relationIndex, 'outgoing', ['reads_env']).sort((a, b) => a.localeCompare(b))

Also applies to: 402-407

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/context-pack-resolution.ts` around lines 382 - 383, The output
can vary because relationLabels(node, relationIndex, 'outgoing',
['uses_config']) and relationLabels(node, relationIndex, 'outgoing',
['reads_env']) preserve insertion order; to make sketch output deterministic,
sort the arrays returned by relationLabels (e.g., sort config and readsEnv)
before any rendering/serialization. Locate places where config and readsEnv are
used (the calls to relationLabels and subsequent rendering logic around the
variables named config and readsEnv, plus the similar block around lines
402-407) and replace direct use with a stableSortedConfig and
stableSortedReadsEnv (or sort in-place) so downstream code always consumes a
consistently ordered list.

tests/unit/benchmark-real-workspace.test.ts (1)

135-137: ⚡ Quick win

Replace deprecated toThrowError matcher with toThrow across test files.

Vitest 4 treats toThrowError as a deprecated alias. Replace with toThrow to keep matcher usage current.

Also applies to tests/unit/benchmark-graph-stats.test.ts:53

Suggested diff

tests/unit/benchmark-real-workspace.test.ts:
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS'),
       }))
...
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('GRAPHIFY_BENCH_BACKEND'),
       }))
...
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('monorepo'),
       }))

tests/unit/benchmark-graph-stats.test.ts:
-      ], { cwd: process.cwd(), encoding: 'utf8', stdio: 'pipe' })).toThrowError(
+      ], { cwd: process.cwd(), encoding: 'utf8', stdio: 'pipe' })).toThrow(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/benchmark-real-workspace.test.ts` around lines 135 - 137, Replace
the deprecated matcher call expect(...).toThrowError(...) with
expect(...).toThrow(...) in the failing tests: locate the occurrences of
expect(...).toThrowError(expect.objectContaining({ stderr:
expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS') })) in the
benchmark-real-workspace test and the similar expect(...).toThrowError(...) in
benchmark-graph-stats.test.ts (the exact invocation uses expect.objectContaining
and expect.stringContaining) and swap toThrowError to toThrow while keeping the
objectContaining argument unchanged so the assertion semantics remain the same.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/runtime/context-pack-resolution.ts`:
- Around line 382-383: The output can vary because relationLabels(node,
relationIndex, 'outgoing', ['uses_config']) and relationLabels(node,
relationIndex, 'outgoing', ['reads_env']) preserve insertion order; to make
sketch output deterministic, sort the arrays returned by relationLabels (e.g.,
sort config and readsEnv) before any rendering/serialization. Locate places
where config and readsEnv are used (the calls to relationLabels and subsequent
rendering logic around the variables named config and readsEnv, plus the similar
block around lines 402-407) and replace direct use with a stableSortedConfig and
stableSortedReadsEnv (or sort in-place) so downstream code always consumes a
consistently ordered list.

In `@tests/unit/benchmark-graph-stats.test.ts`:
- Around line 33-43: Add a new unit test (next to the existing "falls back to
zero edges when graph.json omits the edges array") that uses withTempGraph to
write a graph JSON where edges is present but not an array (e.g., edges: null or
edges: {} or a string) and then calls the same script
('docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs') via execFileSync;
assert JSON.parse(output) equals { node_count: 2, edge_count: 0 } to ensure the
graph-stats.mjs code path that reads edges treats non-array values the same as
missing arrays.

In `@tests/unit/benchmark-real-workspace.test.ts`:
- Around line 135-137: Replace the deprecated matcher call
expect(...).toThrowError(...) with expect(...).toThrow(...) in the failing
tests: locate the occurrences of
expect(...).toThrowError(expect.objectContaining({ stderr:
expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS') })) in the
benchmark-real-workspace test and the similar expect(...).toThrowError(...) in
benchmark-graph-stats.test.ts (the exact invocation uses expect.objectContaining
and expect.stringContaining) and swap toThrowError to toThrow while keeping the
objectContaining argument unchanged so the assertion semantics remain the same.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 76a15b07-9bac-422f-b668-9af5e8bf3c51

📥 Commits

Reviewing files that changed from the base of the PR and between 965fde8 and debf9ba.

📒 Files selected for processing (18)

docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
src/infrastructure/context-pack-command.ts
src/runtime/benchmark/probe-calibration.ts
src/runtime/context-pack-resolution.ts
src/runtime/retrieve.ts
src/runtime/retrieve/slicing.ts
src/runtime/stdio/tools.ts
tests/unit/benchmark-graph-stats.test.ts
tests/unit/benchmark-probe-calibration.test.ts
tests/unit/benchmark-real-workspace.test.ts
tests/unit/context-pack-resolution-sketch.test.ts
tests/unit/retrieve-semantic.test.ts
tests/unit/retrieve-slice-surface.test.ts
tests/unit/retrieve-slice-v1.test.ts
tests/unit/stdio-slice-surface.test.ts

✅ Files skipped from review due to trivial changes (2)

tests/unit/benchmark-probe-calibration.test.ts
tests/unit/retrieve-slice-surface.test.ts

🚧 Files skipped from review as they are similar to previous changes (7)

docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
tests/unit/retrieve-slice-v1.test.ts
docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
src/runtime/retrieve/slicing.ts
src/runtime/retrieve.ts

Add slice-v1 retrieval and benchmark tooling

965fde8

Add opt-in slice-v1 retrieval, richer sketch semantics, real-workspace benchmark helpers, and calibration/diagnostic coverage for the post-v0.21 payoff layer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

mohanagy merged commit c681f2f into main May 11, 2026
7 checks passed

coderabbitai Bot mentioned this pull request May 11, 2026

Fix production retrieval correctness #146

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add slice-v1 retrieval and benchmark tooling#144

Add slice-v1 retrieval and benchmark tooling#144
mohanagy merged 2 commits into
mainfrom
payoff-v0-22

mohanagy commented May 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 11, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mohanagy commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Benchmark notes

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mohanagy commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading