feat(#82): cut MCP core-profile schema overhead by 30% (#105)
Tightens the 6-tool core profile that ships by default with the
graphify-ts MCP server, dropping the JSON payload emitted on
`tools/list` from 4,271 bytes (~1,068 tokens) to 2,982 bytes
(~746 tokens): a 30% reduction, exactly the issue's acceptance
target.
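The byte-to-token figures above follow the common ~4-bytes-per-token heuristic. A quick sketch of that conversion (an approximation, not the tokenizer's exact count):

```typescript
// Rough token estimate from payload size, using the ~4 bytes/token
// heuristic behind the figures above (an approximation, not an exact
// tokenizer count).
function estimateTokens(bytes: number): number {
  return Math.round(bytes / 4)
}

console.log(estimateTokens(4271)) // → 1068 (pre-change payload)
console.log(estimateTokens(2982)) // → 746 (post-change payload)
```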
Per-tool measurements after the change:

* `graph_stats`: 119 bytes (unchanged; already minimal)
* `community_overview`: 197 bytes (was 263)
* `call_chain`: 511 bytes (was 599)
* `pr_impact`: 519 bytes (was 828)
* `impact`: 539 bytes (was 887)
* `retrieve`: 1,080 bytes (was 1,558)
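Numbers like these can be reproduced with a measurement helper along the following lines (a sketch; the sample definition is a placeholder, not the real `graph_stats` schema):

```typescript
// Measure one tool's serialized schema size in UTF-8 bytes, mirroring
// how the per-tool numbers above were gathered. The sample tool is a
// placeholder, not the real graph_stats definition.
interface ToolDefinition {
  name: string
  description: string
  inputSchema: {
    type: 'object'
    properties: Record<string, unknown>
    required?: string[]
  }
}

function toolBytes(tool: ToolDefinition): number {
  // Buffer.byteLength counts UTF-8 bytes; String#length would count
  // UTF-16 code units and undercount non-ASCII descriptions.
  return Buffer.byteLength(JSON.stringify(tool), 'utf8')
}

const sample: ToolDefinition = {
  name: 'graph_stats',
  description: 'Summary counts for the indexed graph.',
  inputSchema: { type: 'object', properties: {} },
}

console.log(toolBytes(sample))
```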
Where the savings came from:
* Tightened every tool description: dropped trailing "Use this..."
marketing sentences, replaced "Analyze the blast radius of changing
a node" with "Blast-radius for a node:", etc. Preserves
discoverability — every tool still describes what it does and what
it returns; only the editorializing is gone.
* De-advertised the deprecated `compact` flag on impact, pr_impact,
  and retrieve. The handler still validates and accepts `compact`
  on receipt for backward compat with existing clients (see the
  runtime tolerance check in tools.ts), but it no longer counts
  against the cold-start cache_creation_input_tokens budget.
* Dropped the redundant "Optional:" prefix on parameter descriptions;
  parameters not in `required` are optional by JSON Schema definition.
* Tightened the verbose-flag descriptions: "Optional: return the
legacy verbose X payload instead of the default compact response."
becomes "Return verbose payload (default: compact)."
* Surfaced the `retrieval_level` argument the retrieve tool already
  accepted (#75) so the override is now discoverable in tools/list. Cost:
  50 bytes; benefit: closes the discoverability gap in #75-iv.
CI/test hardening:
* tests/unit/mcp-schema-budget.test.ts pins the JSON payload size
of both the core profile (ceiling 3,100 bytes) and the full
profile (ceiling 12,000 bytes). A future tool addition or
description regression that pushes either profile over the
ceiling fails CI with a clear message.
* tests/unit/stdio-pr-impact.test.ts updated to assert that
  `compact` is NOT advertised in the schema, locking in the
  intentional removal.
* tests/unit/why-graphify-doc.test.ts gains a doc-honesty pin on
the new measured numbers in README.md (~750 tokens, ~3,000 bytes,
30% reduction). Future PRs that change the README's claim must
update this regex in the same PR.
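The budget test's failure mode can be sketched as follows, using the ceilings from this PR (the tool list here is placeholder data, not the exported definitions the real test imports):

```typescript
// CI budget sketch: fail loudly when a profile's tools/list payload
// exceeds its byte ceiling. Ceilings match this PR; the tool list is
// placeholder data, not the real profile.
function payloadBytes(tools: ReadonlyArray<unknown>): number {
  return Buffer.byteLength(JSON.stringify({ tools }), 'utf8')
}

function assertUnderBudget(
  profile: string,
  tools: ReadonlyArray<unknown>,
  ceiling: number,
): void {
  const bytes = payloadBytes(tools)
  if (bytes > ceiling) {
    throw new Error(
      `${profile} profile payload is ${bytes} bytes, ${bytes - ceiling} over the ${ceiling}-byte ceiling`,
    )
  }
}

const placeholderCore = [{ name: 'graph_stats', description: 'Summary counts.' }]
assertUnderBudget('core', placeholderCore, 3_100)
assertUnderBudget('full', placeholderCore, 12_000)
```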
README updates:
* "Honest disclosure" section: rewrites the cold-start premium
paragraph with the actual measured post-#82 numbers and an
explicit pointer to the new regression test.
* "Honest summary" callout: replaces the stale ~13% premium phrase
with the measured token-overhead figure and a reference to the
pull request.
* Roadmap: flips the "🔜 Tighter cold-start MCP overhead" item to
"✅" with the measured 30% drop.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/stdio-pr-impact.test.ts (1)
Lines 335-339: ⚡ Quick win: broaden the de-advertised `compact` assertions to all affected tools. Line 339 currently locks this in only for `pr_impact`. Since #82 de-advertised `compact` on multiple schemas, add matching assertions for `impact` and `retrieve` to prevent partial regressions.

Suggested test expansion:
```diff
-  const tool = (tools?.result as { tools: Array<{ name: string; inputSchema: { properties: Record<string, unknown> } }> }).tools.find(
-    (entry) => entry.name === 'pr_impact',
-  )
+  const toolList = (tools?.result as { tools: Array<{ name: string; inputSchema: { properties: Record<string, unknown> } }> }).tools
+  const byName = new Map(toolList.map((entry) => [entry.name, entry]))

-  expect(tool?.inputSchema.properties.budget).toEqual(expect.objectContaining({ type: 'number' }))
-  expect(tool?.inputSchema.properties.verbose).toEqual(expect.objectContaining({ type: 'boolean' }))
+  const tool = byName.get('pr_impact')
+  expect(tool?.inputSchema.properties.budget).toEqual(expect.objectContaining({ type: 'number' }))
+  expect(tool?.inputSchema.properties.verbose).toEqual(expect.objectContaining({ type: 'boolean' }))
   expect(tool?.inputSchema.properties.compact).toBeUndefined()
+  expect(byName.get('impact')?.inputSchema.properties.compact).toBeUndefined()
+  expect(byName.get('retrieve')?.inputSchema.properties.compact).toBeUndefined()
```
📒 Files selected for processing (5)

* README.md
* src/runtime/stdio/definitions.ts
* tests/unit/mcp-schema-budget.test.ts
* tests/unit/stdio-pr-impact.test.ts
* tests/unit/why-graphify-doc.test.ts
```typescript
function payloadBytes(tools: ReadonlyArray<unknown>): number {
  return JSON.stringify({ tools }).length
}
```
Use Buffer.byteLength() to correctly measure UTF-8 bytes instead of string length.
Line 23 uses .length, which counts UTF-16 code units, not UTF-8 bytes. While current tool descriptions are ASCII-only (so the difference is zero today), the byte ceilings have tight margins (~118 bytes for core, ~460 for full). Adding non-ASCII characters to tool descriptions in the future would silently undercount bytes and allow a regression to pass. Use Buffer.byteLength() for correct UTF-8 byte measurement:
Proposed fix:

```diff
 function payloadBytes(tools: ReadonlyArray<unknown>): number {
-  return JSON.stringify({ tools }).length
+  return Buffer.byteLength(JSON.stringify({ tools }), 'utf8')
 }
```
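The gap this fix closes is easy to demonstrate; one non-ASCII character is enough:

```typescript
// '≈' (U+2248) is one UTF-16 code unit but three UTF-8 bytes, so
// String#length undercounts any payload that contains it.
const payload = JSON.stringify({ description: '≈ 3,000 bytes' })

console.log(payload.length)                     // → 31 (UTF-16 code units)
console.log(Buffer.byteLength(payload, 'utf8')) // → 33 (UTF-8 bytes)
```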
Closes #82.
CI hardening (the issue's third deliverable)
`tests/unit/mcp-schema-budget.test.ts` pins the JSON payload size of both profiles:
A future tool addition or description regression that pushes either profile over the ceiling fails CI with a clear message identifying which profile and by how much. The test also asserts that the core profile contains exactly the 6 documented tools and that every tool has a non-empty description.
`tests/unit/stdio-pr-impact.test.ts` updated to assert `compact` is NOT advertised in the schema, locking in the intentional removal.
Doc-honesty pins (the issue's fourth deliverable)
`tests/unit/why-graphify-doc.test.ts` gains a regex pin requiring the README's "Honest disclosure" section to contain:
A future PR that changes any of these numbers must update both the README and the regex in the same commit.
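As a sketch, such a pin can be as simple as a regex over the README text (the pattern and snippet below are illustrative, not copied from the real test):

```typescript
// Doc-honesty pin sketch: the README's measured claim must keep matching
// this regex, so editing the numbers without updating the test fails CI.
// Pattern and snippet are illustrative, not the real test's contents.
const claimPattern = /~750 tokens.*~3,000 bytes.*30% reduction/s

const readmeSnippet =
  'Cold-start schema overhead is now ~750 tokens (~3,000 bytes), a 30% reduction.'

console.log(claimPattern.test(readmeSnippet)) // → true
```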
What's deferred
The shared `$ref` schema-fragment idea from the issue body is intentionally NOT implemented here: Claude's MCP client doesn't expand `$ref` in tool input schemas at this time, so the trade-off (less readable definitions, no token savings) didn't pencil out. If MCP transport-level dedup ever lands client-side, that's a follow-up.
This also refreshes the README's "Honest disclosure" section linked from the issue. The README's old "~13%" cold-start premium claim was tied to the pre-#82 ~5K-token overhead; the next benchmark refresh will re-measure the new percentage against the no-graph baseline.