
Remote MCP servers stuck in "starting" state when upstream returns errors #4459

@JAORMX

Description


When a remote MCP server returns errors (e.g., HTTP 500), ToolHive keeps retrying the health check indefinitely without surfacing the failure to the user. The server remains stuck in starting status and the user gets no indication that something is wrong.

Steps to reproduce

  1. Run a remote MCP server that is returning errors (e.g., HTTP 500):

    thv run --name google-drive https://mcp.stacklok.dev/google-drive/mcp
    
  2. Check server status:

    thv list -a
    
  3. Observe the server stays in starting state indefinitely.

  4. With debug logging enabled, the proxy logs show repeated failed attempts:

    level:DEBUG,msg:Server returned status,status_code:500,attempt:43
    

Expected behavior

  • ToolHive should detect that the remote server is consistently returning errors and transition the server to an unhealthy state.
  • An error message should be surfaced to the user so they can act on it (e.g., check the remote server, retry later, etc.).

Actual behavior

  • The server stays in starting status forever.
  • No error is shown to the user unless they manually inspect debug logs.
  • thv list (without -a) doesn't even show the server since it never reaches running.

Additional context

  • This is a pre-existing issue, reproduced as far back as v0.12.0.
  • The root cause identified in the initial investigation was the upstream MCP server itself being down (returning 500s), but ToolHive should handle this failure mode gracefully regardless.
  • Consensus is to use the existing unhealthy state and surface a clear error message to the user.

Labels

  • bug: Something isn't working
  • cli: Changes that impact CLI functionality
  • proxy
