v0.10.1 — honest numbers + lean MCP profile #28

Merged

mohanagy merged 6 commits into main from feat/v0.10.1-honest-numbers-lean-profile on May 1, 2026
Conversation

mohanagy (Owner) commented May 1, 2026

Summary

Single PR landing five connected changes for v0.10.1:

  1. GRAPHIFY_TOOL_PROFILE env var — defaults to core (6 tools) instead of advertising all 21. Cuts ~16–22K of cache_creation_input_tokens per Claude Code session (a config sketch follows this list).
  2. Honest benchmark numbers — replaces the discredited 384× / 897× strawman headline (computed against an internal-only baseline) with the measured 2026-04-30 native_agent comparison.
  3. compare --baseline-mode native_agent — runs the user's --exec twice (with and without graphify config files snapshot-renamed) and reports Anthropic-billed usage blocks verbatim. Atomic try/finally restore guarantees no project state is left behind even if the runner crashes.
  4. Public benchmark artifact — committed docs/benchmarks/2026-04-30-govalidate/ with both raw claude --output-format json outputs and a verify.sh reproducer.
  5. Release — bumps package.json to 0.10.1 and dates the CHANGELOG entry.
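For reference, a minimal sketch of the generated .mcp.json entry with the new default. The env block is the part this PR adds; the server name, command, and args here are illustrative assumptions, not the exact template output:

```json
{
  "mcpServers": {
    "graphify": {
      "command": "npx",
      "args": ["@mohammednagy/graphify-ts", "mcp"],
      "env": { "GRAPHIFY_TOOL_PROFILE": "core" }
    }
  }
}
```

Setting the value to "full" opts back into the legacy 21-tool surface.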

Headline numbers (measured 2026-04-30 against the GoValidate codebase)

| Metric | Baseline (no graphify) | Graphify (core profile) | Δ |
| --- | --- | --- | --- |
| Tool-call turns | 9 | 3 | 3× fewer |
| Latency | 96,368 ms | 34,744 ms | ~2.77× faster |
| Total input tokens (Anthropic-reported) | 615,190 | 233,508 | 2.63× less |
| Cost per session | $0.62 | $0.70 | +13% on cold start; amortizes on multi-question sessions |
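The cost delta is straight arithmetic from the table: $0.70 / $0.62 ≈ 1.13, so a single-question cold start costs roughly 13% more even though the graphify run bills 2.63× fewer total input tokens. The premium comes from the MCP tool-definition cache writes, which are billed at the higher cache-creation rate.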

All numbers come from claude --output-format json usage fields, not local prompt-token estimates. Reproduce with bash docs/benchmarks/2026-04-30-govalidate/verify.sh.
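If you'd rather recompute the totals directly, here is a minimal TypeScript sketch of the same sum verify.sh performs. Field names follow the committed session JSON; treat the exact nesting as an assumption checked against the artifacts:

```ts
import { readFileSync } from 'node:fs'

interface SessionResult {
  num_turns: number
  duration_ms: number
  total_cost_usd: number
  usage: {
    input_tokens: number
    cache_creation_input_tokens: number
    cache_read_input_tokens: number
    output_tokens: number
  }
}

const load = (path: string): SessionResult => JSON.parse(readFileSync(path, 'utf8'))

// Total input = the three input-side usage fields summed, as billed by Anthropic.
const totalInput = (s: SessionResult): number =>
  s.usage.input_tokens + s.usage.cache_creation_input_tokens + s.usage.cache_read_input_tokens

const dir = 'docs/benchmarks/2026-04-30-govalidate'
const baseline = load(`${dir}/baseline-session.json`)
const graphify = load(`${dir}/graphify-session.json`)

console.log('baseline_total_input_tokens :', totalInput(baseline)) // expect 615190
console.log('graphify_total_input_tokens :', totalInput(graphify)) // expect 233508
console.log('input_token_reduction       :', `${(totalInput(baseline) / totalInput(graphify)).toFixed(2)}x`)
```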

CHANGELOG entry

Added

  • GRAPHIFY_TOOL_PROFILE env var: defaults to core (6 tools — retrieve, impact, call_chain, community_overview, pr_impact, graph_stats); set to full to opt into the legacy 21-tool surface. The Claude / Cursor / VS Code Copilot install templates now write env: { GRAPHIFY_TOOL_PROFILE: "core" } into the generated .mcp.json. Cuts cache_creation_input_tokens by roughly 16–22K on a fresh Claude Code session start.
  • compare --baseline-mode native_agent: runs --exec twice (snapshot-renamed and restored) and reports Anthropic-billed usage blocks verbatim. Atomic try/finally restore.
  • Public benchmark artifact: docs/benchmarks/2026-04-30-govalidate/ with both raw claude --output-format json outputs and a verify.sh reproducer.

Changed

  • Honest benchmark numbers: replaced 384× / 397× / 897× strawman headlines with measured 3× fewer turns, ~2.8× faster, 2.6× fewer total input tokens in the README, examples/why-graphify.md, and the claude install PreToolUse hook payload.
  • Compare summary framing: synthetic full/bounded baselines are now explicitly disclosed as "synthetic prompt-token estimate (cl100k_base)".

Fixed

  • Cold-start cost regression: lean core MCP tool profile by default cuts cache_creation_input_tokens overhead by ~16–22K tokens per fresh session.

Test plan

  • npm run clean && npm run build — clean
  • npm run typecheck — clean
  • npm run test:run — 1234/1234 passing across 64 test files
  • Eval gate against examples/demo-repo — Recall 100%, MRR 1.000, Snippet coverage 100% (CI thresholds ≥95 / ≥0.95 / ≥95)
  • compare --baseline-mode native_agent smoke test — emits 3× turns / 2.77× faster / 2.63× tokens with both Anthropic-reported usage blocks; report.json shape matches the spec
  • Tool profile count — core: 6, full: 21
  • Hook payload — decoded base64 contains "3x fewer turns", no stale 384x/897x
  • docs/benchmarks/2026-04-30-govalidate/verify.sh — exits 0, prints 615190 / 233508 totals
  • npm pack --dry-run — mohammednagy-graphify-ts-0.10.1.tgz builds cleanly

What this PR does NOT do

  • Does not delete baseline_mode: 'full' or 'bounded' ('native_agent' is added alongside; the default stays 'full' for one more minor release).
  • Does not change MCP_PROTOCOL_VERSION.
  • Does not bump EXTRACTOR_CACHE_VERSION (extraction shape unchanged).
  • Does not lower the eval CI thresholds.
  • Does not auto-publish to npm (the release workflow on the v0.10.1 tag handles publish).

Summary by CodeRabbit

  • New Features

    • Added GRAPHIFY_TOOL_PROFILE env var (defaults to "core") to control MCP tool availability.
    • Added compare --baseline-mode native_agent for real A/B runs that report Anthropic usage.
  • Documentation

    • Updated benchmarks and examples with reproducible baseline vs Graphify “core” metrics (turns, latency, tokens, cost).
    • Published public benchmark artifacts and a verify.sh reproducer.

mohanagy added 5 commits May 1, 2026 06:46
Ship a lean MCP tool surface by default. The Claude Code session-start
overhead from advertising all 21 tools writes ~16-22K tokens to
cache_creation_input_tokens (priced at 1.25x input rate) on a fresh
session. Most agents only call retrieve, impact, call_chain,
community_overview, pr_impact, and graph_stats — gating the other 15
behind GRAPHIFY_TOOL_PROFILE=full reclaims the cache budget.

- Add CORE_TOOL_NAMES, McpToolProfile, activeMcpTools, and
  resolveToolProfileFromEnv to runtime/stdio/definitions.ts.
- Wire tools/list and tools/call through the active profile in
  runtime/stdio-server.ts; non-core calls in core mode return
  JSONRPC_METHOD_NOT_FOUND with a documented hint pointing at
  GRAPHIFY_TOOL_PROFILE=full in .mcp.json.
- Default the generated .mcp.json (Claude / Cursor / VS Code Copilot)
  to env: { GRAPHIFY_TOOL_PROFILE: 'core' } via installMcpServer.
- Tests: stdio-tool-profile.test.ts covers profile selection and
  gating end-to-end; install.test.ts asserts the env block; the
  existing stdio-server.test.ts opts into 'full' for behavior tests.
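A compact sketch of how the pieces named in this commit could fit together (the exported names match the commit message; the exact signatures, the fallback behavior, and the tool-table shape are assumptions):

```ts
export type McpToolProfile = 'core' | 'full'

export const CORE_TOOL_NAMES: ReadonlySet<string> = new Set([
  'retrieve', 'impact', 'call_chain', 'community_overview', 'pr_impact', 'graph_stats',
])

export const isCoreToolName = (name: string): boolean => CORE_TOOL_NAMES.has(name)

// Anything other than an explicit 'full' opts into the lean default (assumed fallback).
export function resolveToolProfileFromEnv(env = process.env): McpToolProfile {
  return env.GRAPHIFY_TOOL_PROFILE === 'full' ? 'full' : 'core'
}

// tools/list advertises only the active profile's tools, so a fresh session
// never pays cache_creation tokens for the 15 gated definitions.
export function activeMcpTools<T extends { name: string }>(
  allTools: readonly T[],
  profile: McpToolProfile,
): T[] {
  return profile === 'full' ? [...allTools] : allTools.filter((t) => isCoreToolName(t.name))
}
```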

The previously-published 384x retrieve-compression headline was computed
against an internal-only baseline_mode='full' prompt that no real agent
ever sends. The credible measurement is the 2026-04-30 native_agent
comparison against a production NestJS+Next.js codebase: 3x fewer
tool-call turns, ~2.8x faster end-to-end latency, and 2.6x fewer total
input tokens as billed by Anthropic. The cold-start cost premium is
disclosed honestly (~+13% on a single-question session, amortizing on
multi-question sessions); v0.10.1 also flips the default to the lean
6-tool 'core' profile so cold-start cost approaches parity.

- src/infrastructure/install.ts: replace the RETRIEVE_FIRST_MESSAGE
  string baked into the .claude PreToolUse / .gemini BeforeTool /
  .codex hook payloads with the measured copy ('3x fewer turns,
  ~2.8x faster on a real production codebase'). The base64 encoding
  is regenerated automatically at install time.
- README.md: replace the Benchmarks section with the measured table
  and the cold-start cost honesty disclosure; cite the public
  artifact at docs/benchmarks/2026-04-30-govalidate/ that lands in
  the next commit.
- examples/why-graphify.md: replace the headline efficiency section
  and the Benchmark Summary with the same measured numbers; drop
  the stale '17 MCP tools' line and document the core/full profile
  split.
- Tests: install-templates.test.ts and why-graphify-doc.test.ts
  enforce no '384x'/'397x'/'897x' substrings in the public copy
  and the install hook payload.

The existing 'full' and 'bounded' baseline modes both build synthetic
baseline prompts from the project corpus, which no real agent ever
sends. Their reduction_ratio is a cl100k_base estimate, not an
Anthropic-billed measurement. native_agent runs the user's --exec
command twice — once with graphify-out/graph.json, .mcp.json, CLAUDE.md,
and .claude/ snapshot-renamed out of the working directory (baseline),
once with them restored (graphify) — and reports the usage blocks from
'claude --output-format json' verbatim.

- src/infrastructure/compare.ts: extend CompareBaselineMode with
  'native_agent' alongside the existing modes. Add
  executeNativeAgentCompare, parseAnthropicResultEvent, and the
  NativeAgentCompareReport / NativeAgentRunner types. Atomic
  rename / try-finally restore guarantees no project state is left
  behind even if the baseline runner crashes; a probe test verifies
  the snapshot doesn't hide graphify-out/compare/<ts>/. Add an
  explicit synthetic-baseline disclosure line to formatCompareSummary
  for full/bounded so a reader cannot mistake the synthetic ratio
  for an Anthropic-billed measurement.
- src/cli/parser.ts and src/cli/main.ts: accept and document the
  new mode. Default stays 'full' for backward compat.
- tests/fixtures/mock-claude-runner.mjs: deterministic mock that
  emits the Anthropic JSON shape so the smoke test runs without a
  real model call.
- tests/unit/compare-native-agent.test.ts: covers success path,
  crash safety, bare-project absent state, exec_command redaction,
  runner_error fallback, snapshot scope (compare/<ts> stays
  writable), and stream-json parsing.
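A minimal sketch of the crash-safe hide/restore loop described above, assuming Node's synchronous fs renames; the helper name and snapshot suffix are illustrative, and the real logic lives in executeNativeAgentCompare:

```ts
import { existsSync, renameSync } from 'node:fs'

// The graphify artifacts the baseline run must not see.
const ARTIFACTS = ['graphify-out/graph.json', '.mcp.json', 'CLAUDE.md', '.claude']

// Hypothetical helper: snapshot-rename artifacts away, run the baseline,
// and restore them even if the runner throws mid-flight.
export async function withArtifactsHidden<T>(run: () => Promise<T>): Promise<T> {
  const restores: Array<[snapshot: string, original: string]> = []
  try {
    for (const original of ARTIFACTS) {
      if (!existsSync(original)) continue // bare projects: nothing to hide, nothing to restore
      const snapshot = `${original}.graphify-baseline-snapshot`
      renameSync(original, snapshot)
      restores.push([snapshot, original])
    }
    return await run()
  } finally {
    // finally runs whether run() resolved or rejected, so no project state is left behind.
    for (const [snapshot, original] of restores.reverse()) renameSync(snapshot, original)
  }
}
```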

The headline numbers in the README, examples/why-graphify.md, and the
install hook payload all reference the 2026-04-30 native_agent
measurement against the GoValidate codebase. Commit the raw evidence
so anyone can reproduce the totals from the same files graphify-ts
ships in its package directory.

- docs/benchmarks/2026-04-30-govalidate/baseline-session.json:
  the 'claude --output-format json' result event from the no-graphify
  run, with the answer body redacted (the question is internal).
- docs/benchmarks/2026-04-30-govalidate/graphify-session.json: the
  graphify-enabled run, same shape.
- docs/benchmarks/2026-04-30-govalidate/verify.sh: bash + node
  reproducer that prints the per-file usage blocks and computes
  the headline reductions (3x turns, 2.77x latency, 2.63x input
  tokens). Uses $DIR rather than absolute paths so the script
  reproduces from any checkout.
- docs/benchmarks/2026-04-30-govalidate/README.md: narrative,
  setup, headline table, sum-of-fields explanation, and
  reproduction recipe for running the same comparison on a
  reader's own codebase.
- tests/unit/benchmark-artifact.test.ts: ensures the README cites
  numbers that are computable from the committed JSON, asserts no
  stale '384x'/'897x' marketing claims in the README, and runs
  verify.sh end-to-end (skipped only when jq is missing).

Bump package.json from 0.10.0 to 0.10.1 and document the changes that
shipped in this PR:

  Added: GRAPHIFY_TOOL_PROFILE env var (core profile = 6 tools);
         compare --baseline-mode native_agent with Anthropic-reported
         usage; public benchmark artifact under
         docs/benchmarks/2026-04-30-govalidate/.

  Changed: replace 384x/897x strawman headline with measured 3x/2.8x
           numbers everywhere (README, examples/why-graphify.md,
           install hook payload); compare summary now labels synthetic
           full/bounded ratios explicitly.

  Fixed:   cold-start cost regression (cache_creation overhead from the
           full 21-tool MCP surface) by shipping core as the default.

Eval gate stays green (Recall 100%, MRR 1.000, Snippet coverage 100%).
coderabbitai (bot) commented May 1, 2026

Caution: Review failed (pull request was closed or merged during review).

📝 Walkthrough

Adds a tool-profile system (core/full), a new compare --baseline-mode native_agent flow that captures real Anthropic usage by running the user command twice (baseline with artifacts hidden, then graphify), commits benchmark artifacts and a verifier, and updates docs/tests and installer to default GRAPHIFY_TOOL_PROFILE=core.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Version & Changelog**<br>CHANGELOG.md, package.json | Bump package to v0.10.1 and add changelog entry documenting GRAPHIFY_TOOL_PROFILE, native_agent baseline mode, and new benchmark workflow. |
| **Documentation & Benchmarks**<br>README.md, examples/why-graphify.md, docs/benchmarks/2026-04-30-govalidate/* | Add reproducible A/B benchmark README, raw JSON artifacts, verify.sh, update README/examples with measured metrics and cold-start disclosure. |
| **CLI**<br>src/cli/main.ts, src/cli/parser.ts | Extend baselineMode type and CLI help to accept/document 'native_agent' (executes --exec twice and reports Anthropic usage). |
| **Compare implementation**<br>src/infrastructure/compare.ts | Add native_agent baseline mode, artifact snapshot/restore logic, parsing of trailing Anthropic JSON events, new types/helpers (parseAnthropicResultEvent, executeNativeAgentCompare, formatNativeAgentCompareSummary), and native-agent summary formatting. |
| **Installer**<br>src/infrastructure/install.ts | Install templates now inject `env: { GRAPHIFY_TOOL_PROFILE: "core" }` into generated MCP server configs and update pre-tool instructions. |
| **MCP tool profiles & runtime gating**<br>src/runtime/stdio/definitions.ts, src/runtime/stdio-server.ts | Introduce McpToolProfile, CORE_TOOL_NAMES, activeMcpTools(), resolveToolProfileFromEnv(), isCoreToolName(); tools/list and tools/call now gate available tools by profile and return JSON-RPC not-found for disabled tools. |
| **Tests & Fixtures**<br>tests/fixtures/mock-claude-runner.mjs, tests/unit/* | Add mock claude runner and multiple tests: native-agent compare tests, benchmark artifact verifier tests, stdio tool-profile/unit tests, install/template tests, CLI help update test, and suite setup to control GRAPHIFY_TOOL_PROFILE. |

Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as "CLI\n(compare --baseline-mode native_agent)"
    participant FS as "Filesystem\n(artifact snapshot/restore)"
    participant Baseline as "Baseline Env\n(hidden artifacts)"
    participant Graphify as "Graphify Env\n(restored artifacts)"
    participant API as "Anthropic API"
    participant Reporter as "Reporter / Formatter"

    User->>CLI: run compare --baseline-mode native_agent --exec <cmd>
    CLI->>FS: snapshot graphify artifacts
    CLI->>FS: hide/remove artifacts
    CLI->>Baseline: run user --exec (baseline)
    Baseline->>API: invoke model (no MCP tools)
    API-->>Baseline: response + trailing JSON usage
    Baseline->>CLI: emit/parse Anthropic usage
    CLI->>FS: restore artifacts
    CLI->>Graphify: run user --exec (graphify)
    Graphify->>API: invoke model (with MCP tools)
    API-->>Graphify: response + trailing JSON usage
    Graphify->>CLI: emit/parse Anthropic usage
    CLI->>Reporter: compute reductions (turns, tokens, cost)
    Reporter-->>User: display native-agent compare summary
```
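The "emit/parse Anthropic usage" steps in the diagram scan the runner's stdout for the trailing JSON result event. Here is a sketch of that parse, assuming the event shape shown in the committed session artifacts; the real helper is parseAnthropicResultEvent in src/infrastructure/compare.ts, and this simplified stand-in deliberately uses a different name:

```ts
interface AnthropicUsage {
  input_tokens: number
  cache_creation_input_tokens: number
  cache_read_input_tokens: number
  output_tokens: number
}

interface AnthropicResultEvent {
  num_turns: number
  duration_ms: number
  total_cost_usd: number
  usage: AnthropicUsage
}

// Walk stdout from the last line backwards and return the first JSON object
// carrying a usage block; undefined lets the caller fall back to a runner_error report.
export function parseTrailingResultEvent(stdout: string): AnthropicResultEvent | undefined {
  for (const line of stdout.trim().split('\n').reverse()) {
    try {
      const event: unknown = JSON.parse(line)
      if (typeof event === 'object' && event !== null && 'usage' in event) {
        return event as AnthropicResultEvent
      }
    } catch {
      // not a JSON line; keep scanning upwards
    }
  }
  return undefined
}
```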

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

  • feat: support Gemini compare usage capture #19: Extends compare logic to parse runner stdout and capture provider-reported usage — closely related to the native_agent usage-capture additions in src/infrastructure/compare.ts.
  • Feature/competitive roadmap #9: Modifies MCP tool definitions and stdio handling — relevant because this PR adds tool-profile gating that filters/controls those tools.
  • feat: ship workflow hardening #25: Updates MCP tool-selection logic and definitions — overlaps with the new CORE_TOOL_NAMES, activeMcpTools, and stdio-server gating.

Poem

🐰 I hid the graph, then ran it twice,
measured tokens, turns, and price.
Six tools light, or full to call,
snapshots dance and benchmarks fall.
A hop for truth — metrics aligned! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 3.03%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'v0.10.1 — honest numbers + lean MCP profile' directly summarizes the main changes: version bump, honest benchmark replacement, and lean MCP profile introduction. |
| Description check | ✅ Passed | The PR description fully covers all required template sections with concrete details on changes, testing procedures, and results, plus comprehensive context on implementation and validation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |





mohanagy (Owner, Author) commented May 1, 2026

Final acceptance check

```text
=== EVAL GATE ===
Recall:           100.0%
MRR:              1.000
Snippet coverage: 100.0%
- backend: recall 100%, MRR 1.000, snippets 100%, grounded 100%
- general: recall 100%, MRR 1.000, snippets 100%, grounded 100%

=== TOOL PROFILE COUNT ===
core: 6 full: 21

=== HOOK PAYLOAD CHECK ===
OK: hook contains "3x fewer turns", no stale 384x/897x claims

=== COMPARE NATIVE_AGENT SMOKE ===
[graphify compare] completed 1 native_agent question(s)
- "What is the cluster module?"
    num_turns: baseline 9 → graphify 3 (3x fewer)
    latency:   baseline 96368ms → graphify 34744ms (2.77x faster)
    input_tokens (Anthropic-reported): baseline 615190 → graphify 233508 (2.63x less)

=== verify.sh (against committed evidence) ===
baseline_total_input_tokens : 615190
graphify_total_input_tokens : 233508
input_token_reduction       : 2.63x
num_turns_reduction         : 3x
latency_reduction           : 2.77x
baseline_total_cost_usd     : $0.62
graphify_total_cost_usd     : $0.70

=== package.json + npm pack ===
version: 0.10.1
filename: mohammednagy-graphify-ts-0.10.1.tgz
total files: 302  package size: 335.0 kB
```

All ship-readiness criteria from the plan are satisfied:

  • ✅ Eval gate green (Recall ≥95, MRR ≥0.95, Snippet coverage ≥95)
  • ✅ core profile returns exactly 6 tools, full returns 21
  • ✅ Decoded hook contains 3x fewer turns, contains no 384x / 897x
  • ✅ compare --baseline-mode native_agent produces a report with both Anthropic-reported usage blocks and computed reductions matching the public artifact
  • ✅ docs/benchmarks/2026-04-30-govalidate/verify.sh exits 0 on the committed evidence and prints totals matching the README
  • ✅ package.json is 0.10.1, CHANGELOG.md has the dated ## [0.10.1] - 2026-05-01 entry
  • ✅ npm pack --dry-run succeeds

Note: the public artifact uses 233,508 total input tokens for the graphify run (the exact sum of 13 + 92,833 + 140,662 from the committed usage block) rather than the original spec's headline 234,308. This was a discrepancy in the spec — using the JSON-derived sum makes the README, why-graphify.md, the CHANGELOG, and verify.sh self-consistent.

coderabbitai (bot) left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/fixtures/mock-claude-runner.mjs (1)

23-57: ⚡ Quick win

Reduce benchmark-number drift by sourcing fixture payloads from artifact JSON.

The baseline/graphify numeric blocks are duplicated from docs/benchmarks/2026-04-30-govalidate/*.json. Loading those files here would keep smoke fixtures aligned automatically as artifacts evolve.

Proposed refactor sketch

```diff
 import { existsSync, readFileSync } from 'node:fs'
+import { dirname, resolve } from 'node:path'
+import { fileURLToPath } from 'node:url'
@@
-const baseline = {
-  ...
-}
-
-const graphify = {
-  ...
-}
+const here = dirname(fileURLToPath(import.meta.url))
+const benchmarkDir = resolve(here, '../../docs/benchmarks/2026-04-30-govalidate')
+const baselineArtifact = JSON.parse(readFileSync(resolve(benchmarkDir, 'baseline-session.json'), 'utf8'))
+const graphifyArtifact = JSON.parse(readFileSync(resolve(benchmarkDir, 'graphify-session.json'), 'utf8'))
+
+const baseline = {
+  ...baselineArtifact,
+  result: `mock baseline answer for prompt of length ${prompt.length}`,
+}
+const graphify = {
+  ...graphifyArtifact,
+  result: `mock graphify answer for prompt of length ${prompt.length}`,
+}
```
🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.

In `@tests/fixtures/mock-claude-runner.mjs` around lines 23 - 57, Replace the
duplicated hard-coded numeric fixture blocks by loading the corresponding
artifact JSON(s) at runtime and mapping their fields into the existing baseline
and graphify objects: read and parse the benchmark artifact JSON(s) referenced
in docs/benchmarks/2026-04-30-govalidate, extract fields for duration_ms,
duration_api_ms, num_turns, result (use prompt.length when composing result
strings), session_id, total_cost_usd and the usage subfields (input_tokens,
cache_creation_input_tokens, cache_read_input_tokens, output_tokens), and assign
them into the existing baseline and graphify variables (keeping their keys
unchanged); add a small fallback/default behavior if the artifact file is
missing or malformed so tests still run.
```
🤖 Prompt for all review comments with AI agents

```text
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/benchmarks/2026-04-30-govalidate/README.md`:
- Around line 26-29: The fenced code block containing the token calculations
(the lines starting with "baseline_total_input_tokens = 14 + 40,648 + 574,528 =
615,190" and "graphify_total_input_tokens = 13 + 92,833 + 140,662 = 233,508")
needs a language tag to satisfy markdownlint MD040; edit the README.md fenced
block to start with ```text instead of ``` so the block is explicitly marked as
plain text.
- Around line 57-60: Update the README reproduction steps to use the current CLI
flow: replace the invalid command string "graphify-ts claude install --project
/path/to/your/repo" with instructions to cd into the target repository and run
"graphify-ts claude install" (i.e., run the install command from the target repo
directory rather than passing a --project flag); update the two-line snippet
under the graph generation steps so it reflects "graphify-ts generate
/path/to/your/repo" followed by the corrected install invocation.

In `@examples/why-graphify.md`:
- Line 22: Update the paragraph that tells users to set
GRAPHIFY_TOOL_PROFILE=full in .mcp.json to also mention alternative MCP config
locations for non-Claude installs: explicitly list .cursor/mcp.json and
.vscode/mcp.json so Cursor and Copilot users are directed to the correct file,
and ensure the sentence that names Cursor and Copilot references these
alternative paths (e.g., "set GRAPHIFY_TOOL_PROFILE=full in .mcp.json (or
.cursor/mcp.json / .vscode/mcp.json for Cursor/Copilot installs)").

In `@src/infrastructure/install.ts`:
- Around line 499-505: Existing server config env is always overwritten with env
= { GRAPHIFY_TOOL_PROFILE: 'core' } which drops user custom entries and silently
downgrades a preexisting GRAPHIFY_TOOL_PROFILE='full'; instead, detect and merge
with any existing env block before writing: read the current server config's env
(e.g., existingEnv), build mergedEnv = { GRAPHIFY_TOOL_PROFILE: 'core',
...existingEnv } so existing keys (including a user-set GRAPHIFY_TOOL_PROFILE)
override the default, then use mergedEnv for serverConfig in place of the
hardcoded env; apply this in the code that creates serverConfig (the block
referencing env, serverConfig, isVscode, npxCommand, npxArgs).

In `@src/runtime/stdio-server.ts`:
- Around line 553-558: Update the error text returned in the branch that checks
if a tool is disabled (the block that uses toolName, isCoreToolName(profile),
failure(id, JSONRPC_METHOD_NOT_FOUND, ...)) to use a generic, profile-agnostic
message; replace the hard-coded reference to "'core' profile" and ".mcp.json"
with something like "Tool '<toolName>' is not enabled in the active profile."
and optionally suggest checking the application's MCP/configuration, ensuring
the change is made where isCoreToolName is evaluated so all callers (Cursor,
Copilot, etc.) get the generic message.
```
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f7b4a1d-7a20-4f03-9ea5-3365b6703b88

📥 Commits

Reviewing files that changed from the base of the PR and between b607208 and 298d142.

📒 Files selected for processing (23)
  • CHANGELOG.md
  • README.md
  • docs/benchmarks/2026-04-30-govalidate/README.md
  • docs/benchmarks/2026-04-30-govalidate/baseline-session.json
  • docs/benchmarks/2026-04-30-govalidate/graphify-session.json
  • docs/benchmarks/2026-04-30-govalidate/verify.sh
  • examples/why-graphify.md
  • package.json
  • src/cli/main.ts
  • src/cli/parser.ts
  • src/infrastructure/compare.ts
  • src/infrastructure/install.ts
  • src/runtime/stdio-server.ts
  • src/runtime/stdio/definitions.ts
  • tests/fixtures/mock-claude-runner.mjs
  • tests/unit/benchmark-artifact.test.ts
  • tests/unit/cli.test.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/install-templates.test.ts
  • tests/unit/install.test.ts
  • tests/unit/stdio-server.test.ts
  • tests/unit/stdio-tool-profile.test.ts
  • tests/unit/why-graphify-doc.test.ts

Comment threads: docs/benchmarks/2026-04-30-govalidate/README.md (outdated, ×2), examples/why-graphify.md (outdated), src/infrastructure/install.ts, src/runtime/stdio-server.ts.

mohanagy added 1 commit May 1, 2026
- src/infrastructure/install.ts: real bug — installMcpServer was
  overwriting the env block on re-install, silently downgrading a
  user-customized GRAPHIFY_TOOL_PROFILE=full back to 'core' and
  dropping unrelated user-set env keys (e.g. HTTP_PROXY). Now reads
  the existing server config's env (if any) and merges with the
  defaults so user values win. Test in install.test.ts covers the
  reinstall round-trip.
- src/runtime/stdio-server.ts: gating error message no longer
  hardcodes "'core' profile" / ".mcp.json" — it's now profile- and
  client-agnostic and lists the three supported MCP config locations
  (.mcp.json / .cursor/mcp.json / .vscode/mcp.json) so Cursor and
  Copilot users see the right path.
- examples/why-graphify.md: extend the GRAPHIFY_TOOL_PROFILE=full
  pointer to mention .cursor/mcp.json and .vscode/mcp.json
  alongside .mcp.json.
- docs/benchmarks/2026-04-30-govalidate/README.md: tag the math
  fenced block as 'text' (markdownlint MD040); rewrite the
  reproduction recipe to use the actual CLI flow ('cd /path/to/repo
  && graphify-ts claude install') instead of the invalid
  '--project /path/to/repo' flag.
- tests/fixtures/mock-claude-runner.mjs: load duration_ms,
  num_turns, total_cost_usd, and the usage block from
  docs/benchmarks/2026-04-30-govalidate/{baseline,graphify}-session.json
  at runtime so the smoke fixture and the public artifact stay in
  sync. Falls back to inline defaults when the artifact is missing.

All 1236 tests pass; native_agent smoke test still emits 3x/2.77x/2.63x
matching the public artifact's verify.sh output.
mohanagy (Owner, Author) commented May 1, 2026

Addressed CodeRabbit feedback in 811adc5

| # | Comment | Fix |
| --- | --- | --- |
| 1 | docs/benchmarks/.../README.md: MD040 — math fenced block missing language tag | Tagged the fenced block as `text` |
| 2 | docs/benchmarks/.../README.md: reproduction step uses the non-existent --project flag | Rewrote to `cd /path/to/your/repo && graphify-ts generate . && graphify-ts claude install` |
| 3 | examples/why-graphify.md only mentions .mcp.json for opting into the full profile | Now lists .mcp.json / .cursor/mcp.json / .vscode/mcp.json |
| 4 | src/infrastructure/install.ts overwrote the env block on re-install, silently downgrading a user-set GRAPHIFY_TOOL_PROFILE=full and dropping other user-set env keys | Real bug — fixed via merge: `env = { GRAPHIFY_TOOL_PROFILE: 'core', ...existingEnv }`. Test added in install.test.ts covers the round-trip (sketch below) |
| 5 | src/runtime/stdio-server.ts gating error hardcodes 'core' profile and .mcp.json | Generic message naming the active profile; lists all three supported MCP config locations |
| 6 | (Nitpick) tests/fixtures/mock-claude-runner.mjs duplicated benchmark numbers | Now loads from docs/benchmarks/2026-04-30-govalidate/*.json at runtime; falls back to inline defaults if the artifact is missing |
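The spread order in the fix for comment 4 is what makes user values win. A minimal sketch, with the function name and type alias as illustrative stand-ins:

```ts
type McpServerEnv = Record<string, string>

// Defaults first, existing entries last: a user-set GRAPHIFY_TOOL_PROFILE=full
// (or an unrelated key like HTTP_PROXY) survives a re-install.
export function mergeServerEnv(existingEnv?: McpServerEnv): McpServerEnv {
  return { GRAPHIFY_TOOL_PROFILE: 'core', ...existingEnv }
}

// mergeServerEnv()                                    => { GRAPHIFY_TOOL_PROFILE: 'core' }
// mergeServerEnv({ GRAPHIFY_TOOL_PROFILE: 'full' })   => { GRAPHIFY_TOOL_PROFILE: 'full' }
// mergeServerEnv({ HTTP_PROXY: 'http://proxy:8080' }) => both keys preserved
```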

Skipping: the docstring-coverage warning (3.13% vs CodeRabbit's 80% target) — the codebase doesn't follow that convention and adding boilerplate JSDoc to every function would be noise.

Verification re-run:

  • npm run typecheck — clean
  • npm run test:run — 1236/1236 passing (added 2 new tests for the env-merge fix)
  • Smoke test still emits 3x turns / 2.77x faster / 2.63x tokens with the artifact-loaded fixtures

mohanagy (Owner, Author) commented May 1, 2026

@coderabbitai review

coderabbitai (bot) commented May 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant