Problem
The SDK spawns 9 goroutines across the mcp/ package that never call recover() in a deferred function. A panic in any of these goroutines (from user-provided tool handlers, middleware, or runtime errors like nil pointer dereferences) crashes the entire host process.
As a library, the SDK doesn't own the process, doesn't control the supervisor, and doesn't know what else runs in the same address space. A panicking goroutine takes down the consumer's HTTP server, metrics exporters, graceful shutdown hooks, and any other in-flight work.
Affected goroutines
| File | Line | Purpose |
| --- | --- | --- |
| mcp/server.go | 976 | Server.Run session handler |
| mcp/transport.go | 215 | Cancel propagation |
| mcp/transport.go | 405 | Transport goroutine |
| mcp/cmd.go | 78 | Command signal handling |
| mcp/sse.go | 421 | SSE connection handler |
| mcp/streamable.go | 1816 | handleSSE response processing |
| mcp/streamable.go | 2000 | handleJSON response processing |
| mcp/streamable.go | 2011 | handleSSE request processing |
| mcp/shared.go | 608 | Shared logic goroutine |
Why this matters for MCP servers specifically
MCP servers commonly run embedded inside other processes: as sidecars, within web services, or spawned by AI coding tools. A tool handler with a nil pointer bug shouldn't be able to kill the host. The failure mode should be an error response to the client, not process death.
Proposal
Happy to submit a PR that adds panic recovery to these goroutines, if that approach aligns with the project's philosophy. Questions:
- Do you prefer per-goroutine recovery, or a middleware/wrapper approach?
- Should recovery log via the SDK's structured logger, or stdlib log?
- Any goroutines in the list above that you'd intentionally leave unprotected?