perf: add schema caching to avoid repeated reflection #685
Conversation
Thank you! This makes sense in principle, and we hadn't considered the effect that lack of a cache would have on the stateless hosting model. Will review expediently.

@findleyr also don't feel attacked by the AI-generated performance test results report and PR description 😅. I believe the principle of this change is exactly right, but happy for you to take over implementation or re-implement if you like the idea too. I have enabled maintainer edits.
mcp/schema_cache.go
Outdated
```go
// resetForTesting clears the cache. Only for use in tests.
func (c *schemaCache) resetForTesting() {
	c.byType = sync.Map{}
}
```
Call sync.Map.Clear?
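Since Go 1.23, `sync.Map` has a `Clear` method, so the reset helper can drop the map reassignment. A minimal sketch of the suggested change, with `schemaCache` reduced to just its `byType` field:

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCache is reduced here to just the byType field for illustration.
type schemaCache struct {
	byType sync.Map
}

// resetForTesting clears the cache. Only for use in tests.
// sync.Map.Clear (Go 1.23+) deletes all entries in place and is safe
// for concurrent use, unlike replacing the field with a fresh sync.Map.
func (c *schemaCache) resetForTesting() {
	c.byType.Clear()
}

func main() {
	var c schemaCache
	c.byType.Store("key", 1)
	c.resetForTesting()
	_, ok := c.byType.Load("key")
	fmt.Println(ok) // false: the entry is gone
}
```

Reassigning `c.byType = sync.Map{}` also copies a `sync.Map` by value, which the package documents as invalid after first use; `Clear` sidesteps that entirely.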
mcp/schema_cache_benchmark_test.go
Outdated
```go
// BenchmarkAddTool_TypedHandler measures performance of AddTool with typed handlers.
// This simulates the stateless server pattern where new servers are created per request.
func BenchmarkAddTool_TypedHandler(b *testing.B) {
```
nit: remove underscores in benchmark and test names
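Following the nit, the benchmark would become `BenchmarkAddToolTypedHandler`. A standalone sketch of the renamed benchmark, using hypothetical `server`/`addTool` stand-ins (not the SDK's real API) that still model the new-server-per-iteration stateless pattern:

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical stand-ins for *mcp.Server and AddTool, for illustration only.
type server struct{ tools map[string]struct{} }

func newServer() *server { return &server{tools: map[string]struct{}{}} }

func addTool(s *server, name string) { s.tools[name] = struct{}{} }

// BenchmarkAddToolTypedHandler uses the conventional camel-case name
// (no underscores). Each iteration creates a fresh server, mirroring
// the stateless hosting pattern the original benchmark simulates.
func BenchmarkAddToolTypedHandler(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := newServer()
		addTool(s, "get_me")
	}
}

func main() {
	// testing.Benchmark lets us run the benchmark outside "go test".
	r := testing.Benchmark(BenchmarkAddToolTypedHandler)
	fmt.Println(r.N > 0)
}
```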
mcp/server.go
Outdated
```go
// Apply defaults directly to the struct.
// This is more efficient than re-marshaling the map with defaults applied.
if inputResolved != nil {
	if err := inputResolved.ApplyDefaults(&in); err != nil {
```
The jsonschema ApplyDefaults function is just wrong when applied to structs. I'm going to remove that behavior, as I did for validating against structs.
Consider where the default is 1 and a zero is explicitly passed in the JSON. ApplyDefaults will overwrite that zero with a 1. That's wrong.
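A minimal sketch of that failure mode; `applyDefaults` below is a hypothetical stand-in (not the real jsonschema API) for what default-application does to an already-decoded struct, where an explicit zero is indistinguishable from an absent field:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type In struct {
	Count int `json:"count"` // schema default: 1
}

// applyDefaults mimics applying schema defaults to a decoded struct.
// After json.Unmarshal, "count omitted" and "count: 0" both leave
// Count == 0, so the default clobbers an explicitly-passed zero.
func applyDefaults(in *In) {
	if in.Count == 0 {
		in.Count = 1 // wrong when the caller explicitly sent 0
	}
}

func decode(data string) In {
	var in In
	json.Unmarshal([]byte(data), &in)
	applyDefaults(&in)
	return in
}

func main() {
	fmt.Println(decode(`{}`).Count)           // 1: default correctly applied
	fmt.Println(decode(`{"count": 0}`).Count) // 1: explicit zero clobbered
}
```

Applying defaults to the raw JSON map, before decoding into the struct, avoids this because field presence is still observable there.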
As I detail in my comments, I don't believe the tool call implementation will work as it stands. But you also haven't motivated it, as you have for the schema cache. Maybe it isn't necessary, especially given the latency of current LLMs. It should have been in a second PR anyway, since it's an independent change.
Thanks for the review @jba, I'll definitely be happy to split; I agree. Regarding motivation:

We do have sufficient motivation, beyond user perception of latency, to keep our fixed startup costs down. Consider that this adds fixed-cost latency to every agent that runs our MCP server by default (say, Copilot Coding Agent on Actions runners), which adds up. And it isn't per-request: all clients need to fetch all the initial routes at startup, and since I don't believe all client SDKs make those requests in parallel, the effect is multiplicative in the number of server features offered. I'll redo the benchmarking with just the cache changes and see, but I suspect we will leave the shim in place to avoid the problem, gradually moving away from the tool-for-type method more permanently. Appreciate the quick response.
This change adds a global schema cache that dramatically reduces the cost of registering tools in stateless server patterns (new server per request).

Key improvements:
- Cache schemas by reflect.Type for typed handlers
- Cache resolved schemas by pointer for pre-defined schemas
- 132x faster tool registration after first call
- 51x fewer allocations per AddTool call
- 32x less memory per AddTool call

Benchmarks:
- BenchmarkAddToolTypedHandler: 1,223 ns/op vs 161,463 ns/op (no cache)
- BenchmarkAddToolTypedHandler: 21 allocs vs 1,072 allocs (no cache)

This benefits integrators like github-mcp-server automatically, without any code changes required.
Force-pushed 46310a7 to 988974d.
Thanks for the thorough review @jba! I've addressed all feedback.

The PR is now squashed to a single commit with just the schema caching optimization. Force-pushed the cleaned branch.
Comprehensive Benchmark Report: Schema Caching Performance Impact

Executive Summary

This PR's schema caching fix fully resolves the performance regression introduced when migrating from mcp-go to go-sdk, while also providing significant memory improvements.
Test Configurations

1. MCP Protocol Latency Tests (n=30)

Operation: initialize
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.42ms | 16.38ms | baseline | ✅ |
| gosdk | 21.20ms | 25.61ms | +58% | 🔴 REGRESSION |
| gosdk_shimmed | 11.89ms | 14.45ms | -11% | ✅ FIXED |
| gosdk_optimized | 11.73ms | 15.22ms | -13% | 🏆 WINNER |
Operation: tools/list
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 15.44ms | 24.84ms | baseline | ✅ |
| gosdk | 24.74ms | 31.06ms | +60% | 🔴 REGRESSION |
| gosdk_shimmed | 15.13ms | 21.46ms | -2% | ✅ FIXED |
| gosdk_optimized | 15.07ms | 19.40ms | -2% | 🏆 WINNER |
Operation: prompts/list
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.01ms | 15.56ms | baseline | ✅ |
| gosdk | 22.33ms | 29.35ms | +72% | 🔴 REGRESSION |
| gosdk_shimmed | 11.72ms | 17.78ms | -10% | ✅ FIXED |
| gosdk_optimized | 11.41ms | 16.30ms | -12% | 🏆 WINNER |
2. Stress Tests (n=100)
Operation: initialize
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.12ms | 18.95ms | baseline | ✅ |
| gosdk | 21.96ms | 29.70ms | +67% | 🔴 REGRESSION |
| gosdk_shimmed | 11.63ms | 15.39ms | -11% | ✅ FIXED |
| gosdk_optimized | 11.73ms | 18.05ms | -11% | 🏆 WINNER |
3. Memory Allocation Comparison (pprof)
| Configuration | Total Allocations | vs main | vs broken gosdk |
|---|---|---|---|
| main | 343 MB | baseline | — |
| gosdk (broken) | 1186.82 MB | +246% | baseline |
| gosdk_optimized | 149.43 MB | -56% 🏆 | -87% 🏆 |
Root Cause - Top Allocations in Broken gosdk
| Function | Size | % | Issue |
|---|---|---|---|
| reflect.New | 105.57 MB | 8.9% | 🚨 Schema reflection on every request |
| slices.AppendSeq | 85.50 MB | 7.2% | String collection for schema |
| encoding/json.Unmarshal | 66.01 MB | 5.6% | Schema parsing |
| jsonschema.checkStructure | 24.50 MB | 2.1% | 🚨 Schema validation |
Top Allocations in gosdk_optimized (After Fix)
| Function | Size | % | Notes |
|---|---|---|---|
| bytes.growSlice | 7.82 MB | 5.2% | Normal JSON operations |
| runtime.allocm | 5.51 MB | 3.7% | Goroutine stacks |
| encoding/json.Marshal | 5.10 MB | 3.4% | Normal serialization |
No more reflect.New, jsonschema.checkStructure, or repeated schema generation!
4. Tool Call Tests (n=25, includes GitHub API latency)
⚠️ Times include GitHub API round-trip (~200-750ms)
Tool: get_me
| Configuration | P50 | P99 | Avg | Errors |
|---|---|---|---|---|
| main | 215.33ms 🏆 | 411.90ms | 232.58ms | 0 |
| gosdk | 230.62ms | 414.06ms | 241.64ms | 0 |
| gosdk_shimmed | 226.53ms | 282.34ms | 228.62ms | 0 |
| gosdk_optimized | 219.92ms | 412.27ms | 235.64ms | 0 |
Tool: list_issues
| Configuration | P50 | P99 | Avg | Errors |
|---|---|---|---|---|
| main | 758.72ms | 1114.68ms | 798.82ms | 0 |
| gosdk | 754.56ms | 1093.81ms | 758.48ms | 0 |
| gosdk_shimmed | 710.04ms 🏆 | 1725.47ms | 781.53ms | 0 |
| gosdk_optimized | 745.10ms | 928.19ms 🏆 | 744.38ms 🏆 | 0 |
Tool call performance is equivalent across all configurations - variance is primarily from GitHub API response times.
5. Visual Summary
Latency (P50 initialize) - Lower is Better
================================================================================
main (mcp-go) █████████████████████████████████ 13.42ms (baseline)
gosdk (broken) ██████████████████████████████████████████████████████ 21.20ms (+58%)
gosdk_shimmed █████████████████████████████ 11.89ms (-11%) ✅
gosdk_optimized █████████████████████████████ 11.73ms (-13%) 🏆
================================================================================
Memory Allocations - Lower is Better
================================================================================
main (mcp-go) ██████████████████ 343 MB
gosdk (broken) ██████████████████████████████████████████████████████████████ 1187 MB (+246%)
gosdk_optimized █████████ 149 MB (-56% vs main, -87% vs broken) 🏆
================================================================================
Winners Summary
| Category | Winner | Value |
|---|---|---|
| initialize latency | gosdk_optimized | 11.73ms |
| tools/list latency | gosdk_optimized | 15.07ms |
| prompts/list latency | gosdk_optimized | 11.41ms |
| Memory efficiency | gosdk_optimized | 149.43 MB |
| get_me tool call | main | 215.33ms |
| list_issues P99 | gosdk_optimized | 928.19ms |
Conclusion
This PR's schema caching approach fully resolves the performance regression and provides:
- ✅ 13% faster latency than mcp-go baseline for MCP protocol operations
- ✅ 87% less memory than broken go-sdk (1187 MB → 149 MB)
- ✅ 56% less memory than mcp-go baseline (343 MB → 149 MB)
- ✅ Equivalent tool call performance (within 2-5%)
- ✅ Lowest P99 variance for complex operations
The fix eliminates repeated JSON schema generation via reflection by caching schemas after first generation.
Strongly recommend merging this fix. 🚀
Summary
This PR adds schema caching that dramatically reduces the cost of tool registration in stateless server patterns.
Problem
In stateless deployments like github/github-mcp-server, a new *mcp.Server is created for each incoming request. This means AddTool is called repeatedly for the same tools, causing:

- jsonschema.ForType() reflection on every call
- schema.Resolve() on every call

Solution

Add a schemaCache that stores:

- auto-generated schemas, keyed by reflect.Type (for typed handlers)
- resolved schemas, keyed by pointer (for pre-defined schemas)

The cache is a global sync.Map.

Benchmark Results
Files Changed
- mcp/schema_cache.go - new cache implementation
- mcp/server.go - modified setSchema to use the cache
- mcp/schema_cache_test.go - unit tests for caching behavior
- mcp/schema_cache_benchmark_test.go - benchmarks

Impact for Integrators
Automatic - no code changes required. Integrators using:
- typed handlers (AddTool[In, Out]) → cache by type
- pre-defined schemas (Tool{InputSchema: schema}) → cache by pointer

Both patterns benefit from caching after the first call.
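The by-type half of the cache can be sketched as follows; `schema` and `schemaFor` are illustrative stand-ins (not the PR's actual identifiers) for the resolved schema type and the lookup that tool registration would perform:

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// schema stands in for the real resolved schema type; the actual cache
// would store what reflection plus Resolve produces.
type schema struct{ Kind string }

// Global cache keyed by reflect.Type. sync.Map suits this read-heavy,
// append-mostly access pattern without explicit locking.
var byType sync.Map // reflect.Type -> *schema

var generations int // counts how often generation actually runs

// schemaFor returns the cached schema for T, generating it at most a
// handful of times even under races: concurrent first calls may both
// generate, but LoadOrStore makes every caller see one stored value.
func schemaFor[T any]() *schema {
	t := reflect.TypeFor[T]()
	if s, ok := byType.Load(t); ok {
		return s.(*schema)
	}
	generations++
	s, _ := byType.LoadOrStore(t, &schema{Kind: t.Kind().String()})
	return s.(*schema)
}

type in struct{ Name string }

func main() {
	// Simulates a stateless host registering the same tool on a fresh
	// server for each of 1000 requests: generation runs once.
	for i := 0; i < 1000; i++ {
		schemaFor[in]()
	}
	fmt.Println(generations)
}
```

The by-pointer half would be analogous, with the user-supplied *Tool schema pointer as the key instead of the reflected type.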
Real-World Performance Validation
The following benchmarks were conducted using github/github-mcp-server (a production MCP server with ~130 tools) to validate the performance impact in a real-world scenario.

Test Environment
Configurations Tested
Latency Test Results (n=30)
Operation: initialize

Operation: tools/list

Operation: prompts/list

Stress Test Results (n=100)
Memory/Allocation Comparison (from pprof)
Top Allocation Sources - go-sdk WITHOUT cache (broken)
- jsonschema.UnmarshalJSON
- encoding/json.Unmarshal
- jsonschema.resolve
- jsonschema.MarshalJSON

Root Cause Analysis

The google/jsonschema-go library regenerates JSON schemas on every request instead of caching them. In a server with ~130 tools, this causes:

Key Findings