Problem
Our project mounts several external MCPs inside our project MCP. When an external MCP goes through a transient health or remount event, our project workflowserver may still hold a cached view of the full tool list, while Wayflow’s direct MCP query temporarily receives only a partial tool list.
Example:
Expected/cached tools: a, b, c
Wayflow MCP query returns: b, c
Workflow step requires: a
Result: step fails because Wayflow believes tool a does not exist
This appears to be transient rather than a true configuration error. The external MCP may become healthy again shortly afterward.
Observed behavior
When Wayflow calls a tool that is missing from the MCP server’s current tool list, the MCP server returns a valid HTTP 200 response. Because of that, the existing MCP Transport-level RetryPolicy does not treat this as a transport failure and therefore does not retry.
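To illustrate why a status-based retry never fires in this case, here is a minimal sketch. The set of retryable status codes, the function names, and the JSON-RPC error body are all assumptions made for illustration; the observation above only establishes that the server answers with HTTP 200.

```python
import json

# Hypothetical: status codes a transport-level policy might retry on.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def transport_should_retry(status_code: int) -> bool:
    """Decide from the HTTP status alone, as a transport-level policy would."""
    return status_code in RETRYABLE_STATUS


def body_reports_error(body: str) -> bool:
    """Look inside the payload for an application-level error (assumed shape)."""
    payload = json.loads(body)
    return "error" in payload or bool(payload.get("result", {}).get("isError"))


# Illustrative response for a call to a tool the server does not currently list.
status = 200
body = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "Unknown tool: a"}}
)

# The status check sees a success, so no retry is scheduled,
# even though the payload says the call failed.
retry_by_transport = transport_should_retry(status)
failed_semantically = body_reports_error(body)
```

Under these assumptions `retry_by_transport` is false while `failed_semantically` is true, which is the gap this request is about.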
Impact
Workflows can fail immediately even though the missing-tool condition may be caused by a temporary external MCP or mount issue. This is especially problematic when our project workflowserver still reports the expected tool from its cached view while Wayflow’s live MCP query sees only a partial list.
Request
Add a Wayflow-side mechanism that either treats this condition as retryable or makes it detectable by callers.
Possible approaches:
1. Retry missing expected tools
   When a workflow step references a tool that is expected/configured but is absent from the current MCP tool list, retry the MCP tool-list query a few times before failing.
   Make retry count/backoff configurable for assistant developers.
2. Return a specific retryable error/status
   If Wayflow determines that a required tool is missing from the MCP list, return a distinct error code/status that our project can detect.
   Our project can then retry from its side instead of treating the workflow step as a permanent failure.
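The first approach, retrying the tool-list query with configurable count and backoff, could look roughly like the sketch below. This is not Wayflow's API: `ToolListRetryConfig`, `resolve_required_tool`, and the `list_tools` callable are hypothetical names, and exponential backoff is only one possible policy.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ToolListRetryConfig:
    """Hypothetical knobs an assistant developer could set."""
    max_attempts: int = 3
    initial_delay_s: float = 0.5
    backoff_factor: float = 2.0


def resolve_required_tool(
    list_tools: Callable[[], Iterable[str]],
    required: str,
    config: ToolListRetryConfig = ToolListRetryConfig(),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-query the tool list until `required` appears.

    Returns the number of attempts used; raises LookupError if the tool
    is still missing after `config.max_attempts` queries.
    """
    delay = config.initial_delay_s
    for attempt in range(1, config.max_attempts + 1):
        if required in set(list_tools()):
            return attempt
        if attempt < config.max_attempts:
            # Wait before the next query, growing the delay each time.
            sleep(delay)
            delay *= config.backoff_factor
    raise LookupError(
        f"tool {required!r} still missing after {config.max_attempts} attempts"
    )
```

Here `list_tools` stands in for whatever performs the live MCP tool-list query, and `sleep` is injectable so the backoff can be tested without real waiting.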
Expected behavior
If a required MCP tool is missing due to a transient partial tool-list response, Wayflow should either retry tool-list resolution before failing or return a specific retryable missing-tool error/status, so the workflow does not fail immediately on temporary external MCP or remount issues.
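For the second option, a distinct retryable missing-tool error that the caller can detect, a minimal sketch might look like this. `MissingToolError`, its `code` value, `require_tool`, and `run_step_with_retry` are hypothetical names chosen for illustration, not existing Wayflow or MCP identifiers.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class MissingToolError(Exception):
    """Hypothetical error: a required tool is absent from the live MCP tool list."""

    code = "MCP_TOOL_MISSING"  # hypothetical status code a caller could match on
    retryable = True

    def __init__(self, tool_name: str, available: Iterable[str]) -> None:
        self.tool_name = tool_name
        self.available = tuple(available)
        super().__init__(
            f"tool {tool_name!r} not in live MCP tool list {self.available}"
        )


def require_tool(tool_name: str, available: Iterable[str]) -> None:
    """Raise the distinct error instead of a generic step failure."""
    names = tuple(available)
    if tool_name not in names:
        raise MissingToolError(tool_name, names)


def run_step_with_retry(
    step: Callable[[], T],
    max_attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Caller-side retry that reacts only to the retryable missing-tool error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except MissingToolError as err:
            if not err.retryable or attempt == max_attempts:
                raise
            sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")
```

Any other exception still propagates on the first attempt, so genuine configuration errors are not masked by the retry.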
Notes
Our team’s investigation found that the current transport retry policy will not help, because the MCP server returns HTTP 200 for the missing-tool case. A new Wayflow-level mechanism would likely be needed to catch this semantic error and allow retry configuration around it.