fix: prevent MCP client hanging on JSON-RPC error responses #792

zxkane · 2025-09-04T03:33:29Z

Add timeout support to call_tool_sync and call_tool_async methods
Only apply timeout when read_timeout_seconds is explicitly provided to maintain backward compatibility
Handle TimeoutError exceptions with proper error responses
Add comprehensive tests for timeout scenarios and backward compatibility

Fixes issue where MCP client would hang indefinitely when remote streamable HTTP servers responded with JSON-RPC error codes like -32603, caused by response.aread() waiting for stream completion in the MCP SDK's HTTP transport layer.
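The timeout behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `call_with_optional_timeout` and its `call` parameter are hypothetical names, and the returned dictionary only assumes a tool-result shape with `toolUseId`, `status`, and `content` fields. The timeout is applied only when `read_timeout_seconds` is passed, so callers that omit it keep the previous behaviour.

```python
import asyncio
from datetime import timedelta
from typing import Any, Awaitable, Callable, Optional


async def call_with_optional_timeout(
    call: Callable[[], Awaitable[Any]],  # hypothetical: wraps the actual tool invocation
    tool_use_id: str,
    read_timeout_seconds: Optional[timedelta] = None,
) -> dict:
    """Await `call()`, bounding the wait only if a timeout was explicitly given."""
    try:
        if read_timeout_seconds is None:
            # No timeout requested: wait as long as the call takes (old behaviour).
            result = await call()
        else:
            result = await asyncio.wait_for(
                call(), timeout=read_timeout_seconds.total_seconds()
            )
    except asyncio.TimeoutError:
        # Convert the timeout into an error result instead of hanging or raising.
        return {
            "toolUseId": tool_use_id,
            "status": "error",
            "content": [{"text": f"Tool call timed out after {read_timeout_seconds}"}],
        }
    return {
        "toolUseId": tool_use_id,
        "status": "success",
        "content": [{"text": str(result)}],
    }
```

Returning an error result rather than re-raising lets the agent loop continue and report the failure to the model, which is the behaviour the description calls "proper error responses".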

🤖 Generated with Claude Code

Description

I have an agentic application built on Strands Agent v1.7.0, which connects to a streamable HTTP MCP server. The MCP server is hosted on the AWS AgentCore runtime.

Sometimes the MCP server responds to a request with a JSON-RPC error message like the one below, and the MCP client then hangs indefinitely without returning either a result or an error.

{"jsonrpc":"2.0","error":{"code":-32603,"message":"An internal error occurred while processing the request."},"id":"92a9759e-cc7f-4090-91f0-c45e50abfaff"}
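For reference, a response like the one above can be classified as an error with a few lines of Python. This is only a sketch of how a JSON-RPC 2.0 response is distinguished from a success; `parse_jsonrpc_response` is a hypothetical helper, not part of Strands or the MCP SDK. In JSON-RPC 2.0, code -32603 is the predefined "Internal error" code.

```python
import json
from typing import Any, Dict


def parse_jsonrpc_response(raw: str) -> Dict[str, Any]:
    """Classify a JSON-RPC 2.0 response as a success or an error."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        # An error response carries an `error` object instead of `result`.
        error = message["error"]
        return {
            "ok": False,
            "id": message.get("id"),
            "code": error.get("code"),
            "message": error.get("message"),
        }
    return {"ok": True, "id": message.get("id"), "result": message.get("result")}


raw = (
    '{"jsonrpc":"2.0","error":{"code":-32603,'
    '"message":"An internal error occurred while processing the request."},'
    '"id":"92a9759e-cc7f-4090-91f0-c45e50abfaff"}'
)
parsed = parse_jsonrpc_response(raw)
print(parsed["ok"], parsed["code"])
```

A client that receives such a message already has everything it needs to fail the pending request, which is why an indefinite hang here is unexpected.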

Code snippet that uses the Strands MCPClient to connect to the MCP server:

    def _setup_oauth2_mcp_client(self) -> MCPClient:
        """Set up MCP client with OAuth2 authentication via AgentCore Identity."""
        if not AGENTCORE_IDENTITY_AVAILABLE:
            raise RuntimeError("AgentCore Identity SDK not available for OAuth2 authentication")
        
        oauth2_provider_id = os.environ.get("OAUTH2_PROVIDER_ID")
        mcp_server_url = os.environ.get("MCP_SERVER_URL")
        
        if not oauth2_provider_id:
            raise ValueError("OAUTH2_PROVIDER_ID environment variable not set")
        
        if not mcp_server_url:
            raise ValueError("MCP_SERVER_URL environment variable not set for OAuth2 authentication")
        
        # Store the MCP URL from environment for OAuth2 mode
        self.mcp_url = mcp_server_url
        
        @requires_access_token(
            provider_name=oauth2_provider_id,
            scopes=["mcp-server/read", "mcp-server/write"],  # MCP-specific scopes - adjust as needed
            auth_flow="M2M",      # M2M authentication flow
            force_authentication=False
        )
        async def get_authenticated_mcp_client(*, access_token: str):
            headers = {"Authorization": f"Bearer {access_token}"}
            
            # Get timeout configuration from environment with reasonable defaults
            http_timeout = float(os.environ.get('HTTP_TIMEOUT', 30))
            sse_read_timeout = float(os.environ.get('SSE_READ_TIMEOUT', 300))
            
            logger.debug(f"Setting up OAuth2 MCP client with timeouts: http={http_timeout}s, sse_read={sse_read_timeout}s")
            
            return MCPClient(lambda: streamablehttp_client(
                url=mcp_server_url,
                headers=headers,
                timeout=http_timeout,  # HTTP operations timeout
                sse_read_timeout=sse_read_timeout  # SSE read timeout
            ))
        
        # Execute the async function and return the client
        try:
            # Check if we're already in an event loop (e.g., FastAPI lifespan context)
            try:
                asyncio.get_running_loop()
                # We're in an event loop, need to run in a separate thread
                logger.debug("Running event loop detected, using threaded execution")
                
                import threading
                import queue
                result_queue = queue.Queue()
                
                def run_oauth_in_thread():
                    """Run OAuth2 setup in a separate thread with its own event loop."""
                    try:
                        # Create new event loop for this thread
                        new_loop = asyncio.new_event_loop()
                        asyncio.set_event_loop(new_loop)
                        try:
                            result = new_loop.run_until_complete(get_authenticated_mcp_client())
                            result_queue.put(('success', result))
                        finally:
                            new_loop.close()
                    except Exception as e:
                        result_queue.put(('error', e))
                
                # Run OAuth2 setup in separate thread
                oauth_thread = threading.Thread(target=run_oauth_in_thread, daemon=True)
                oauth_thread.start()
                oauth_thread.join(timeout=30)  # 30 second timeout
                
                if oauth_thread.is_alive():
                    raise TimeoutError("OAuth2 authentication timed out after 30 seconds")
                
                try:
                    status, result = result_queue.get_nowait()
                    if status == 'error':
                        raise result
                    return result
                except queue.Empty:
                    raise RuntimeError("OAuth2 thread completed but no result received")
                
            except RuntimeError as e:
                if "no running event loop" in str(e).lower():
                    # No event loop running, safe to use asyncio.run
                    logger.debug("No running event loop detected, using asyncio.run")
                    return asyncio.run(get_authenticated_mcp_client())
                else:
                    raise
                    
        except Exception as e:
            logger.error(f"Failed to setup OAuth2 MCP client: {e}")
            # Chain the original exception so its traceback is preserved
            raise RuntimeError(f"OAuth2 MCP client setup failed: {e}") from e

Related Issues

The hang might be caused by an issue in the MCP SDK.

This fix provides defense at the application layer.
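As an illustration of what an application-layer defense can look like on the synchronous side, the sketch below bounds the wait on a coroutine running in a background event loop. It assumes a design where the client drives its session from a dedicated thread; `start_background_loop` and `run_sync` are hypothetical helpers, not the Strands implementation.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Coroutine, Optional, Tuple


def start_background_loop() -> Tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Start an event loop in a daemon thread and return both."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def run_sync(
    loop: asyncio.AbstractEventLoop,
    coro: Coroutine[Any, Any, Any],
    timeout: Optional[float] = None,
) -> Any:
    """Run `coro` on the background loop, waiting at most `timeout` seconds."""
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        # With timeout=None this blocks until completion, as before.
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Cancel the stuck task so it does not linger on the loop.
        future.cancel()
        raise TimeoutError(f"call did not complete within {timeout} seconds") from None
```

The caller stays in control even if the underlying transport never completes the request. The trade-off is that a timeout only masks the root cause, so the upstream behaviour still needs a fix.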

Documentation PR

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

src/strands/tools/mcp/mcp_client.py


zxkane had a problem deploying to manual-approval September 4, 2025 03:33 — with GitHub Actions Failure

dbschmigelski requested changes Sep 18, 2025

src/strands/tools/mcp/mcp_client.py (outdated, resolved)

zxkane force-pushed the main branch from 770afd4 to e0fa596 Compare September 23, 2025 07:26

zxkane requested a deployment to manual-approval September 23, 2025 07:26 — with GitHub Actions Waiting

zxkane requested a review from dbschmigelski September 23, 2025 07:30

mattvaughan mentioned this pull request Sep 30, 2025

fix: Notify futures when MCP connection encounters an error #951

Open

7 tasks


2 participants
