When a tool returns a large response over the Streamable HTTP client transport, the client takes ~5s to receive a ~4 MB body that curl/HttpClient.ofString() reads in ~0.4s. The bottleneck is the client-side SSE body reading in ResponseSubscribers.sseToBodySubscriber, not the server, network, or JSON parsing.
Environment
Server: Spring AI 2.0.0 mcp-spring-webmvc (HttpServletStreamableServerTransportProvider) returning a single large SSE message event: one compact-JSON data: line (~4.17 MB)
Measurements
McpSyncClient.callTool (this SDK): ~5 s
Same payload via curl / HttpClient BodyHandlers.ofString(): ~0.4–0.75 s
Jackson parse of the received JSON: < 70 ms
→ The SDK path is ~10–13× slower than a plain one-shot read of the identical bytes.
Root cause analysis
The response body is effectively a single huge data: line (compact JSON has no newlines). ResponseSubscribers.sseToBodySubscriber uses HttpResponse.BodySubscribers.fromLineSubscriber(...); assembling that one ~4 MB line through the JDK line subscriber is the cost.
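For anyone trying to reproduce this, a body of the same shape can be generated locally. The following is only a sketch: the class and method names are hypothetical, and the JSON is filler that loosely mimics a text tool result, not the actual payload from my server.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical repro helper: builds a body shaped like the one described above. */
final class LargeSsePayload {

    /** One SSE "message" event whose single data: line carries jsonBytes of compact JSON. */
    static byte[] singleEvent(int jsonBytes) {
        // Filler JSON (assumption): only its size and lack of newlines matter here.
        String prefix = "{\"jsonrpc\":\"2.0\",\"id\":1,\"result\":{\"content\":[{\"type\":\"text\",\"text\":\"";
        String suffix = "\"}]}}";
        int filler = Math.max(0, jsonBytes - prefix.length() - suffix.length());
        StringBuilder sb = new StringBuilder(jsonBytes + 32);
        sb.append("event: message\n");
        sb.append("data: ").append(prefix).append("x".repeat(filler)).append(suffix);
        sb.append("\n\n"); // blank line terminates the event
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Length in bytes of the longest line, to confirm the body is one huge line. */
    static int longestLine(byte[] body) {
        int longest = 0;
        int current = 0;
        for (byte b : body) {
            if (b == '\n') {
                longest = Math.max(longest, current);
                current = 0;
            } else {
                current++;
            }
        }
        return Math.max(longest, current);
    }
}
```

Serving `LargeSsePayload.singleEvent(4_000_000)` as a `text/event-stream` response should give a line subscriber the same kind of input: nearly the whole body arrives as one line.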
Things I tried:
Changing SseLineSubscriber demand from upstream().request(1) to request(Long.MAX_VALUE) → no improvement (so it isn't per-line backpressure).
Replacing the body subscriber with a streaming byte-level SSE parser (BodySubscribers.fromSubscriber, unbounded demand, accumulate ByteBuffers, split on \n\n event boundaries, decode each event once) → ~0.4 s (≈13× faster than the current path).
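The byte-level approach from the second experiment can be sketched roughly as follows. This is a simplified illustration, not the SDK's code and not the exact patch I benchmarked: `SseEventSplitter` is a hypothetical name, it assumes LF-only line endings (no CR or CRLF handling), and it drops any bytes left after the last boundary when the stream completes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.function.Consumer;

/** Hypothetical sketch: splits a byte stream into SSE events on "\n\n" boundaries. */
final class SseEventSplitter implements Flow.Subscriber<List<ByteBuffer>> {
    private final Consumer<String> onEvent; // receives the raw text of one complete event
    private byte[] buf = new byte[8192];    // bytes of the event that is still incomplete
    private int len = 0;                    // number of valid bytes in buf
    private int scanFrom = 0;               // bytes before this index were already scanned

    SseEventSplitter(Consumer<String> onEvent) {
        this.onEvent = onEvent;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        subscription.request(Long.MAX_VALUE); // unbounded demand, as in the experiment
    }

    @Override
    public void onNext(List<ByteBuffer> chunks) {
        for (ByteBuffer chunk : chunks) {
            accept(chunk);
        }
    }

    @Override
    public void onError(Throwable error) {
        // A real transport would propagate the failure to its caller.
    }

    @Override
    public void onComplete() {
        // Assumption: trailing bytes without a terminating blank line are discarded.
    }

    /** Appends one chunk and emits every event whose boundary is now complete. */
    void accept(ByteBuffer chunk) {
        int n = chunk.remaining();
        if (len + n > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, len + n));
        }
        chunk.get(buf, len, n);
        len += n;

        int start = 0; // start of the current, not yet emitted event
        // Only new bytes are scanned; i - 1 may point at the last old byte so that
        // a boundary split across two chunks is still found.
        for (int i = Math.max(scanFrom, 1); i < len; i++) {
            if (i > start && buf[i] == '\n' && buf[i - 1] == '\n') {
                if (i - 1 > start) {
                    // Decode once per event; boundaries are ASCII, so no UTF-8
                    // sequence is ever cut in half.
                    onEvent.accept(new String(buf, start, i - 1 - start, StandardCharsets.UTF_8));
                }
                start = i + 1;
            }
        }
        // Keep only the unfinished tail for the next chunk.
        System.arraycopy(buf, start, buf, 0, len - start);
        len -= start;
        scanFrom = len;
    }
}
```

Wrapped with `HttpResponse.BodySubscribers.fromSubscriber(new SseEventSplitter(handler))`, the handler would be called as soon as each blank-line boundary arrives, so earlier events are delivered before a later large one has been fully received.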
Important constraint (must stay streaming)
A whole-body ofString read fixes the speed but breaks progress: the server interleaves notifications/progress on the same POST response SSE stream before the final result. So the fix must remain a streaming parser that emits each SSE event as its boundary arrives (a byte-level parser does this while still avoiding the line-assembly cost).
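To make the interleaving concrete, here is a sketch of what such a POST response stream could look like and how each event's fields might be parsed once its boundary has been seen. The names `SseFields`, `SseEvent` and `EXAMPLE_STREAM` are hypothetical, and the progress token and values in the sample are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parses the fields of already-split SSE events. */
final class SseFields {

    record SseEvent(String event, String data) {}

    /** Illustrative stream: one progress notification, then the final result. */
    static final String EXAMPLE_STREAM =
            "event: message\n"
            + "data: {\"jsonrpc\":\"2.0\",\"method\":\"notifications/progress\","
            + "\"params\":{\"progressToken\":\"t1\",\"progress\":1,\"total\":2}}\n\n"
            + "event: message\n"
            + "data: {\"jsonrpc\":\"2.0\",\"id\":1,\"result\":{}}\n\n";

    /** Parses one event (the text between two blank-line boundaries). */
    static SseEvent parse(String rawEvent) {
        String event = "message"; // default event type when no event: field is present
        List<String> dataLines = new ArrayList<>();
        for (String line : rawEvent.split("\n")) {
            if (line.isEmpty() || line.startsWith(":")) {
                continue; // blank line or comment
            }
            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) {
                value = value.substring(1); // one optional space after the colon
            }
            switch (field) {
                case "event" -> event = value;
                case "data" -> dataLines.add(value);
                default -> { } // id and retry are ignored in this sketch
            }
        }
        // Multiple data: lines are joined with newlines.
        return new SseEvent(event, String.join("\n", dataLines));
    }

    /** Convenience for the example only: splits a complete body, so it is NOT streaming. */
    static List<SseEvent> parseStream(String body) {
        List<SseEvent> events = new ArrayList<>();
        for (String raw : body.split("\n\n")) {
            if (!raw.isBlank()) {
                events.add(parse(raw));
            }
        }
        return events;
    }
}
```

With a whole-body read, both events in `EXAMPLE_STREAM` only become visible after the last byte has arrived. A streaming parser would hand the progress event to `parse` while the large result is still in flight.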
Questions
Is there a recommended approach/workaround for large tool responses on the client that we're missing?
Would you accept a PR replacing fromLineSubscriber with a streaming byte-level SSE parser in ResponseSubscribers.sseToBodySubscriber (preserving incremental event emission)?
… application/json response mode)? That would avoid SSE framing on the server side, but clients receiving SSE responses would still benefit from this fix.
Happy to open a PR with the streaming parser + a benchmark if that's welcome.
Additional environment details: the SDK is io.modelcontextprotocol.sdk:mcp-core / mcp 2.0.0 (latest), the client is HttpClientStreamableHttpTransport (McpSyncClient.callTool), and the measurements are steady-state over 3 runs.