@zxkane commented Sep 4, 2025

  • Add timeout support to call_tool_sync and call_tool_async methods
  • Only apply timeout when read_timeout_seconds is explicitly provided to maintain backward compatibility
  • Handle TimeoutError exceptions with proper error responses
  • Add comprehensive tests for timeout scenarios and backward compatibility

Fixes an issue where the MCP client would hang indefinitely when a remote streamable HTTP server responded with a JSON-RPC error code such as -32603. The hang was caused by response.aread() waiting for stream completion in the MCP SDK's HTTP transport layer.
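
As a rough illustration of the behavior described in the bullets above, the sketch below applies a deadline only when `read_timeout_seconds` is explicitly provided and turns a timeout into an error result instead of hanging. It is not the code in this PR: the helper name `call_with_optional_timeout` and the shape of the returned dict are assumptions made for the example.

```python
import asyncio
from datetime import timedelta
from typing import Any, Awaitable, Callable, Optional


async def call_with_optional_timeout(
    call: Callable[[], Awaitable[Any]],
    read_timeout_seconds: Optional[timedelta] = None,
) -> dict:
    """Await call(); enforce a deadline only if one was explicitly given.

    Hypothetical helper: the name and the result shape are illustrative.
    """
    try:
        if read_timeout_seconds is None:
            # No timeout supplied: keep the original, unbounded behavior.
            result = await call()
        else:
            result = await asyncio.wait_for(
                call(), timeout=read_timeout_seconds.total_seconds()
            )
    except asyncio.TimeoutError:
        # Report the timeout as an error result rather than letting it propagate.
        return {
            "status": "error",
            "content": [{"text": "Tool call timed out"}],
        }
    return {"status": "success", "content": [{"text": str(result)}]}
```

Callers that never pass `read_timeout_seconds` take the first branch, so their behavior is unchanged. That is the backward-compatibility property the PR describes.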

🤖 Generated with Claude Code

Description

I have an agentic application built on Strands Agent v1.7.0, which connects to a streamable HTTP MCP server. The MCP server is hosted on the AWS AgentCore runtime.

Occasionally the MCP server responds to a request with the JSON-RPC error message below, after which the MCP client hangs indefinitely without returning either a result or an error.

{"jsonrpc":"2.0","error":{"code":-32603,"message":"An internal error occurred while processing the request."},"id":"92a9759e-cc7f-4090-91f0-c45e50abfaff"}
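
For reference, a payload like the one above can be recognized as a JSON-RPC error response using only the standard library. This is a minimal sketch; `extract_jsonrpc_error` is a hypothetical helper and is not part of Strands or the MCP SDK.

```python
import json
from typing import Optional, Tuple

INTERNAL_ERROR = -32603  # "Internal error" in the JSON-RPC 2.0 specification


def extract_jsonrpc_error(raw: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) if raw is a JSON-RPC error response, else None."""
    payload = json.loads(raw)
    error = payload.get("error")
    if not isinstance(error, dict):
        # A successful response carries "result" instead of "error".
        return None
    return error.get("code"), error.get("message")
```

A client that sees `INTERNAL_ERROR` here already has a complete response and has no reason to keep waiting on the stream.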

Code snippet that uses the Strands MCPClient to connect to the MCP server:

    def _setup_oauth2_mcp_client(self) -> MCPClient:
        """Set up MCP client with OAuth2 authentication via AgentCore Identity."""
        if not AGENTCORE_IDENTITY_AVAILABLE:
            raise RuntimeError("AgentCore Identity SDK not available for OAuth2 authentication")
        
        oauth2_provider_id = os.environ.get("OAUTH2_PROVIDER_ID")
        mcp_server_url = os.environ.get("MCP_SERVER_URL")
        
        if not oauth2_provider_id:
            raise ValueError("OAUTH2_PROVIDER_ID environment variable not set")
        
        if not mcp_server_url:
            raise ValueError("MCP_SERVER_URL environment variable not set for OAuth2 authentication")
        
        # Store the MCP URL from environment for OAuth2 mode
        self.mcp_url = mcp_server_url
        
        @requires_access_token(
            provider_name=oauth2_provider_id,
            scopes=["mcp-server/read", "mcp-server/write"],  # MCP-specific scopes - adjust as needed
            auth_flow="M2M",      # M2M authentication flow
            force_authentication=False
        )
        async def get_authenticated_mcp_client(*, access_token: str):
            headers = {"Authorization": f"Bearer {access_token}"}
            
            # Get timeout configuration from environment with reasonable defaults
            http_timeout = float(os.environ.get('HTTP_TIMEOUT', 30))
            sse_read_timeout = float(os.environ.get('SSE_READ_TIMEOUT', 300))
            
            logger.debug(f"Setting up OAuth2 MCP client with timeouts: http={http_timeout}s, sse_read={sse_read_timeout}s")
            
            return MCPClient(lambda: streamablehttp_client(
                url=mcp_server_url,
                headers=headers,
                timeout=http_timeout,  # HTTP operations timeout
                sse_read_timeout=sse_read_timeout  # SSE read timeout
            ))
        
        # Execute the async function and return the client
        try:
            # Check if we're already in an event loop (e.g., FastAPI lifespan context)
            try:
                asyncio.get_running_loop()
                # We're in an event loop, need to run in a separate thread
                logger.debug("Running event loop detected, using threaded execution")
                
                import threading
                import queue
                result_queue = queue.Queue()
                
                def run_oauth_in_thread():
                    """Run OAuth2 setup in a separate thread with its own event loop."""
                    try:
                        # Create new event loop for this thread
                        new_loop = asyncio.new_event_loop()
                        asyncio.set_event_loop(new_loop)
                        try:
                            result = new_loop.run_until_complete(get_authenticated_mcp_client())
                            result_queue.put(('success', result))
                        finally:
                            new_loop.close()
                    except Exception as e:
                        result_queue.put(('error', e))
                
                # Run OAuth2 setup in separate thread
                oauth_thread = threading.Thread(target=run_oauth_in_thread, daemon=True)
                oauth_thread.start()
                oauth_thread.join(timeout=30)  # 30 second timeout
                
                if oauth_thread.is_alive():
                    raise TimeoutError("OAuth2 authentication timed out after 30 seconds")
                
                try:
                    status, result = result_queue.get_nowait()
                    if status == 'error':
                        raise result
                    return result
                except queue.Empty:
                    raise RuntimeError("OAuth2 thread completed but no result received")
                
            except RuntimeError as e:
                if "no running event loop" in str(e).lower():
                    # No event loop running, safe to use asyncio.run
                    logger.debug("No running event loop detected, using asyncio.run")
                    return asyncio.run(get_authenticated_mcp_client())
                else:
                    raise
                    
        except Exception as e:
            logger.error(f"Failed to setup OAuth2 MCP client: {e}")
            raise RuntimeError(f"OAuth2 MCP client setup failed: {e}") from e
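
The "run the coroutine in a separate thread when a loop is already running" pattern in the snippet above can be written more compactly. The sketch below is one possible variant, not the author's code: `run_coroutine_blocking` is a hypothetical name. It keeps loop detection separate from the work itself, so no exception message has to be string-matched.

```python
import asyncio
import concurrent.futures
from typing import Any, Awaitable, Callable


def run_coroutine_blocking(
    coro_factory: Callable[[], Awaitable[Any]], timeout: float = 30.0
) -> Any:
    """Run coro_factory() to completion from synchronous code.

    Hypothetical helper. If this thread has no running event loop,
    asyncio.run is used directly. Otherwise the coroutine runs on a
    worker thread that gets its own event loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread.
        return asyncio.run(coro_factory())

    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        # Create the coroutine inside the worker so it binds to that loop.
        future = executor.submit(lambda: asyncio.run(coro_factory()))
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"Operation timed out after {timeout} seconds") from None
    finally:
        # Do not block on the worker; after a timeout it may still be running.
        executor.shutdown(wait=False)
```

As in the original snippet, waiting on the worker blocks the calling thread, so an event loop running in that thread is stalled until the call returns or times out. After a timeout the worker is abandoned, not cancelled.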

Related Issues

The hang may be caused by an issue in the MCP SDK.

This fix provides defense at the application layer.

Documentation PR

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
