fix(mcp): protect connection on non-fatal client side timeout error #1231
Description
This PR enhances MCP client resilience by implementing error filtering to prevent connection collapse from recoverable client-side errors. The changes address "unknown request id" errors that occur when responses arrive after client-side timeouts, which previously caused unnecessary connection termination.
The implementation adds a non-fatal error pattern system in the MCP client's message handler. When errors match predefined patterns like "unknown request id", they are logged and ignored rather than terminating the connection. This maintains connection stability while still allowing genuine server errors to propagate normally.
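The filtering described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `NON_FATAL_ERROR_PATTERNS`, `is_non_fatal`, and `handle_message` are hypothetical, and the sketch assumes the handler receives exceptions as ordinary messages and matches patterns by case-insensitive substring.

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)

# Hypothetical pattern list. The PR description names "unknown request id"
# as one pattern; any others would be added here.
NON_FATAL_ERROR_PATTERNS = ("unknown request id",)


def is_non_fatal(
    error: BaseException,
    patterns: Iterable[str] = NON_FATAL_ERROR_PATTERNS,
) -> bool:
    """Return True if the error text matches a known recoverable pattern."""
    message = str(error).lower()
    return any(pattern in message for pattern in patterns)


def handle_message(message: object) -> None:
    """Hypothetical message handler: log and drop non-fatal errors,
    re-raise everything else so genuine failures still propagate."""
    if isinstance(message, Exception):
        if is_non_fatal(message):
            logger.warning("Ignoring non-fatal MCP error: %s", message)
            return
        raise message
    # Non-error messages would be processed normally here.
```

With this shape, a late response that triggers an "unknown request id" error is logged and dropped, while an unrelated exception such as a `ConnectionError` is still raised to the caller.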
Also included is a new md file to help demystify the decisions made regarding the MCPClient. As part of this PR review, I encourage reviewers to read the _MCP_CLIENT_ARCHITECTURE.md file.
After this PR, we need to improve our mocking posture. test_mcp_client currently tests private methods, seemingly because testing through a public entry point is too difficult at the moment. We have integration test coverage, but we need to make sure our unit tests also provide the coverage we expect.
Related Issues
#1221
#922
#1169
Documentation PR
N/A
Type of Change
Bug fix
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.