refactor: shared MCP error handling with Diagnostic integration #227

Merged: spinje merged 2 commits into main from refactor/mcp-error-handling, Apr 5, 2026

@spinje spinje commented Apr 5, 2026

Summary

Extract MCP SDK error handling from MCPNode into shared mcp/errors.py. Both MCPNode.exec_fallback and MCP discovery now use the same describe_mcp_error() function, which unwraps ExceptionGroups and produces structured Diagnostics with actionable suggestions.

Fixes the wall-of-text error output from pflow mcp sync where triple logger.exception() calls dumped full ExceptionGroup tracebacks for expected failures (auth expired, SSL, unreachable servers).

Changes

  • New: src/pflow/mcp/errors.py with unwrap_exception_group() + describe_mcp_error() returning Diagnostic
  • Simplified: MCPNode.exec_fallback — deleted 5 helper methods (~160 lines), replaced with single call to shared function
  • Fixed: discovery.py — removed triple logger.exception(), removed redundant try/except, clean error messages via describe_mcp_error()
  • Fixed: registrar.py — removed logger.exception(), returns Diagnostic in sync result dict for CLI rendering
  • Added: --verbose flag on pflow mcp sync for technical error details
  • Updated: mcp/CLAUDE.md — added errors.py to file tree

Explanation

The MCP SDK wraps all exceptions in anyio ExceptionGroups. Previously, logger.exception() at 3 levels (discovery, discover_tools, registrar) each dumped the full ExceptionGroup traceback (~40 lines each), producing a wall of unreadable text for simple failures like expired auth tokens.

MCPNode already had ~170 lines of error handling code (ExceptionGroup unwrapping, HTTP status classification, regex safety nets) that solved the same problem for tool execution. This code was trapped as instance methods on the node class despite not using self meaningfully.

The fix extracts this logic to a shared module that:

  1. Unwraps ExceptionGroups recursively to find the leaf exception
  2. Classifies by exception type name (httpx is a transitive dep — can't use isinstance)
  3. Returns a Diagnostic with title, message, suggestions, and technical_details context
  4. Integrates with the existing diagnostic rendering system (format_diagnostic with verbose support)
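
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/pflow/mcp/errors.py: the Diagnostic dataclass here is a hypothetical stand-in for pflow's real class, the titles are invented, and exception groups are detected by duck typing (an `exceptions` attribute), which is an assumption. Only the user-facing messages are taken from the output shown in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Diagnostic:
    """Hypothetical stand-in for pflow's Diagnostic (the real class differs)."""
    title: str
    message: str
    suggestions: List[str] = field(default_factory=list)
    context: Dict[str, Any] = field(default_factory=dict)


def unwrap_exception_group(exc: BaseException) -> BaseException:
    """Follow the first child of nested exception groups down to a leaf."""
    while True:
        # Assumption: anything with a non-empty `exceptions` sequence is a group.
        children = getattr(exc, "exceptions", None)
        if not children:
            return exc
        exc = children[0]


def describe_mcp_error(exc: BaseException) -> Diagnostic:
    root = unwrap_exception_group(exc)
    # Classify by class name so httpx never has to be imported here.
    name = type(root).__name__
    # str(root), not str(exc), so the detail is the leaf error message.
    context = {"technical_details": str(root)}
    if name == "HTTPStatusError":
        status = getattr(getattr(root, "response", None), "status_code", None)
        if status == 401:
            return Diagnostic(
                "Authentication Error",
                "Authentication failed.",
                ["Check your API credentials or token for this server"],
                context,
            )
        return Diagnostic("HTTP Error", f"HTTP error {status}.", [], context)
    if "SSL" in name:
        return Diagnostic(
            "SSL Error", "SSL certificate verification failed.", [], context
        )
    return Diagnostic("MCP Error", str(root) or name, [], context)
```

Matching on the class name keeps the module free of a direct httpx import, at the cost of not recognizing subclasses with different names.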

Before (pflow mcp sync --all):

ERROR: Error during HTTP discovery for supabase
  + Exception Group Traceback (most recent call last):
  ... (120+ lines of nested ExceptionGroup tracebacks per failure) ...
  ✗ supabase: Tool discovery failed for supabase: unhandled errors in a TaskGroup (1 sub-exception)

After:

  ✗ supabase: Authentication failed.
    → Check your API credentials or token for this server
  ✗ test-http: SSL certificate verification failed.
  ✓ Railway: 14 discovered, 14 registered

After (--verbose):

  ✗ supabase: Authentication failed.
    → Check your API credentials or token for this server
    Detail: Client error '401 Unauthorized' for url 'https://mcp.supabase.com/...'

Testing

  • 21 new tests across test_mcp_errors.py (shared function) and test_mcp_discovery_critical.py (registrar→Diagnostic integration)
  • 2 existing tests updated for new message format
  • 4601 total tests passing, make check clean
make test && make check

claude Bot commented Apr 5, 2026

Code Review: PR #227 — Shared MCP Error Handling with Diagnostic Integration

Overall Assessment

Clean, well-motivated refactor. Extracting ~160 lines of error-handling instance methods from MCPNode into a shared module is the right call — the original code didn't use self meaningfully and was duplicated across discovery. The before/after error output shown in the PR description is a clear UX win. Tests are thorough and test the right things.


Suggestions — optional improvements

1. Inconsistent diagnostic rendering between _sync_all_servers and _sync_single_server

_sync_single_server uses the project's format_diagnostic() renderer (which handles indentation, severity, verbose mode, etc.), while _sync_all_servers hand-rolls its own inline rendering with prefixes and Detail: labels. This means the two paths will diverge as format_diagnostic() evolves.

src/pflow/cli/commands/mcp.py, _sync_all_servers:

# Current: inline rendering
if diagnostic.suggestions:
    for suggestion in diagnostic.suggestions:
        click.echo(f"    → {suggestion}", err=True)
context = diagnostic.context or {}
technical = context.get("technical_details")
if verbose and technical:
    click.echo(f"    Detail: {technical[:200]}", err=True)

The inline rendering is defensible here because _sync_all_servers has a compact multi-server summary format ( ✗ server: message) that doesn't match format_diagnostic()'s full-block rendering. This is a design choice, not a bug — just be aware that the two rendering paths will need to stay in sync manually.

2. unwrap_exception_group always takes the first exception

errors.py:24 — when an ExceptionGroup contains multiple exceptions, only exceptions[0] is inspected. This is documented in tests (test_multi_child_takes_first) and matches the previous MCPNode behavior, so it's fine for the MCP SDK's typical single-child groups. Just noting that multi-exception groups from other sources would lose information silently.

3. Minor: technical_details truncation inconsistency

In _sync_all_servers, technical details are truncated to 200 chars (technical[:200]), and in _describe_http_status_error's fallback path, response text is truncated to 200 chars (exc.response.text[:200]). But describe_mcp_error stores the full str(exc) in technical_details without truncation. The truncation happens at different layers for different paths. Not a bug, but worth noting for consistency.

4. Consider: _STATUS_MAP could use None status guard

errors.py:145: status in _STATUS_MAP works fine when status is None (returns False), and the 500 <= status < 600 check on line 156 is guarded by if status and .... This is correct but relies on None being falsy in the and chain. The current code reads fine.
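
For illustration, a version with an explicit None guard might look like the sketch below. The map entries, the 404 wording, and the function name classify_status are hypothetical; the real _STATUS_MAP in errors.py may hold different data.

```python
from typing import Optional, Tuple

# Hypothetical entries; only the 401 wording appears in this PR's output.
_STATUS_MAP = {
    401: ("Authentication failed.", "Check your API credentials or token for this server"),
    404: ("Resource not found.", "Check the server URL"),
}


def classify_status(status: Optional[int]) -> Tuple[str, Optional[str]]:
    """Return (message, suggestion) for an HTTP status that may be None."""
    if status in _STATUS_MAP:  # None is simply not a key, so this is safe
        return _STATUS_MAP[status]
    if status is None:  # explicit guard instead of relying on falsiness
        return ("HTTP error.", None)
    if 500 <= status < 600:
        return ("Server error.", None)
    return (f"HTTP error {status}.", None)
```

Handling None before the range comparison also avoids ever formatting the string "HTTP error None.".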


What's done well

  • Test quality is excellent — 21 tests covering unwrapping, all HTTP status codes, SSL, timeouts, ExceptionGroup nesting, McpError extraction, and the registrar→Diagnostic integration path. Tests verify behavior, not implementation.
  • The e.__cause__ unwrapping in registrar.py correctly bypasses the RuntimeError wrapping added by discover_tools, getting to the original MCP SDK exception. Good comment explaining why.
  • Removing the redundant try/except + logger.exception() in _discover_async_http — this was a no-op that dumped tracebacks before re-raising to the caller's handler.
  • logger.debug with exc_info=True instead of logger.exception — preserves debuggability without spamming users.
  • String-based type checking (name == "HTTPStatusError") correctly avoids importing httpx as a direct dependency (it's transitive via the MCP SDK). Well-documented pattern.
  • The --verbose flag is a good addition — users get clean output by default, full details on demand.

No critical issues or warnings. This is ready to merge.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a centralized error handling and diagnostic system for the MCP SDK, replacing fragmented error logic across the codebase. It adds a new errors.py module to unwrap ExceptionGroups and generate structured Diagnostic objects containing actionable suggestions and technical details. The CLI sync command is updated with a --verbose flag to expose these details. Feedback focuses on removing a redundant error description call in discovery.py and ensuring that full exception tracebacks are preserved in MCPNode logs for better debuggability.

Comment on lines +78 to +82
from pflow.mcp.errors import describe_mcp_error

diagnostic = describe_mcp_error(e)
logger.debug("Discovery failed for %s: %s", server_name, diagnostic.message, exc_info=True)
raise RuntimeError(f"Tool discovery failed for {server_name}: {diagnostic.message}") from e

medium

The describe_mcp_error function is called here to generate a diagnostic, but the diagnostic object itself is discarded. Only its message is used to create a RuntimeError. The calling function, registrar.sync_server, then unwraps the original exception from __cause__ and calls describe_mcp_error again to create the final diagnostic that is shown to the user.

This initial call to describe_mcp_error is redundant. You can simplify this block to just log the debug information and re-raise the exception with a generic wrapper message. The registrar is already responsible for creating the detailed, user-facing error message.

Suggested change
- from pflow.mcp.errors import describe_mcp_error
- diagnostic = describe_mcp_error(e)
- logger.debug("Discovery failed for %s: %s", server_name, diagnostic.message, exc_info=True)
- raise RuntimeError(f"Tool discovery failed for {server_name}: {diagnostic.message}") from e
+ logger.debug("Discovery failed for %s", server_name, exc_info=True)
+ raise RuntimeError(f"Tool discovery failed for {server_name}") from e

Comment on lines 475 to 482
logger.debug(
    error_msg,
    extra={
        "server": prep_res.get("server"),
        "tool": prep_res.get("tool"),
        "exception_type": type(exc).__name__,
        "full_exception": exc_str[:500],  # Full exception for debugging
    },
)

medium

The refactoring to use describe_mcp_error is great for cleaning up the error messages. However, in this logger.debug call, the full exception traceback is no longer logged. The previous implementation logged full_exception, which was valuable for debugging.

To restore this important debugging information, you can add exc_info=True to the logger.debug call. This will attach the full exception information and stack trace to the log record.

Suggested change
- logger.debug(
-     error_msg,
-     extra={
-         "server": prep_res.get("server"),
-         "tool": prep_res.get("tool"),
-         "exception_type": type(exc).__name__,
-         "full_exception": exc_str[:500],  # Full exception for debugging
-     },
- )
+ logger.debug(
+     error_msg,
+     exc_info=True,
+     extra={
+         "server": prep_res.get("server"),
+         "tool": prep_res.get("tool"),
+         "exception_type": type(exc).__name__,
+     },
+ )
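
As a standalone illustration of what exc_info=True does, the sketch below logs from inside an except block and inspects the resulting record. The logger name, the ListHandler class, and run_tool are hypothetical and exist only for this demo.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("mcp_exc_info_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def run_tool():
    # Stands in for a failing MCP tool call.
    raise ValueError("boom")


try:
    run_tool()
except ValueError as exc:
    # exc_info=True attaches the active exception and its traceback to the
    # record, while the record itself stays at DEBUG level.
    logger.debug(
        "MCP tool failed",
        exc_info=True,
        extra={"exception_type": type(exc).__name__},
    )

record = handler.records[0]
```

The record carries the full (type, value, traceback) tuple, so a debug-level handler can print the stack trace while default output stays quiet.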

claude Bot commented Apr 5, 2026

Code Review — PR #227: Shared MCP Error Handling

Great refactor. Extracting ~160 lines of duplicated error handling from MCPNode into a shared module with Diagnostic integration is a clear win. The before/after output improvement is significant. Tests are solid and well-structured.

Critical — must fix before merge

None.

Warnings — should be addressed

1. _describe_http_status_error can produce "HTTP error None." for edge cases
src/pflow/mcp/errors.py:166-176

If response exists but status_code is somehow missing (or response is None), the fallback path produces "HTTP error None.". In practice this shouldn't happen since this branch only runs for HTTPStatusError, but the defensive path should be clean:

# Current:
message=f"HTTP error {status}: {response_text}" if response_text else f"HTTP error {status}.",

# Suggested — handle status=None:
if status is None:
    return Diagnostic(
        severity=Severity.ERROR,
        source="mcp",
        title="HTTP Error",
        message=f"HTTP error: {str(exc)[:200]}",
        context={"technical_details": technical_details},
    )

2. Inconsistent diagnostic rendering between _sync_all_servers and _sync_single_server
src/pflow/cli/commands/mcp.py:401-412 vs mcp.py:448-453

_sync_all_servers manually renders suggestions/technical_details inline, while _sync_single_server uses format_diagnostic(). Same data, two rendering paths. This means the output format differs between pflow mcp sync --all and pflow mcp sync <name> for the same error. Consider extracting a small helper or using format_diagnostic in both places (with appropriate compact mode for the multi-server case).
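
One possible shape for such a helper is sketched below. The name render_compact, its signature, and MAX_DETAIL_CHARS are hypothetical; the line formats copy the "After" output in the PR description. It returns lines rather than printing, so either sync path could pass them to click.echo.

```python
from typing import Any, Dict, List, Optional

# Hypothetical named constant replacing the bare [:200] slice.
MAX_DETAIL_CHARS = 200


def render_compact(
    server: str,
    message: str,
    suggestions: List[str],
    context: Optional[Dict[str, Any]],
    verbose: bool = False,
) -> List[str]:
    """Build the compact multi-server summary lines for one failed server."""
    lines = [f"  ✗ {server}: {message}"]
    lines.extend(f"    → {suggestion}" for suggestion in suggestions)
    technical = (context or {}).get("technical_details")
    if verbose and technical:
        lines.append(f"    Detail: {str(technical)[:MAX_DETAIL_CHARS]}")
    return lines
```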

3. describe_mcp_error is called twice for the same error in the discovery→registrar path

  • discovery.py:78 calls describe_mcp_error(e) to build the RuntimeError message
  • registrar.py:119 unwraps e.__cause__ and calls describe_mcp_error(original) again

The double-call isn't a bug (both produce the same result), but it's subtly confusing about where the "canonical" error handling lives. Consider having discover_tools attach the diagnostic to the RuntimeError (e.g., as an attribute) so the registrar can reuse it instead of re-deriving it. Or, since registrar.py already unwraps to __cause__, skip the describe_mcp_error call in discover_tools and just log the raw exception at debug level.
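
The first option (attach the diagnostic to the wrapper) could look roughly like this. DiscoveryError, the placeholder dict, and the simplified discover_tools and sync_server bodies are all hypothetical; in the real code the diagnostic would come from describe_mcp_error(e).

```python
class DiscoveryError(RuntimeError):
    """Hypothetical RuntimeError subclass carrying a precomputed diagnostic."""

    def __init__(self, message, diagnostic=None):
        super().__init__(message)
        self.diagnostic = diagnostic


def discover_tools(server_name):
    try:
        # Stands in for the MCP SDK call that fails.
        raise ConnectionError("connection refused")
    except Exception as e:
        # Placeholder for describe_mcp_error(e); computed once, here.
        diagnostic = {"message": "Could not reach server."}
        raise DiscoveryError(
            f"Tool discovery failed for {server_name}", diagnostic
        ) from e


def sync_server(server_name):
    try:
        discover_tools(server_name)
    except RuntimeError as e:
        # `raise ... from e` stores the original exception on __cause__.
        original = e.__cause__ or e
        # Reuse the attached diagnostic instead of deriving it again.
        diagnostic = getattr(e, "diagnostic", None)
        return {"error": diagnostic, "cause_type": type(original).__name__}
    return {"error": None, "cause_type": None}
```

Because DiscoveryError subclasses RuntimeError, existing `except RuntimeError` handlers keep working unchanged.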

Suggestions — optional improvements

4. unwrap_exception_group silently takes only the first exception
src/pflow/mcp/errors.py:22-25

This is documented in tests (test_multi_child_takes_first), which is good. But if an ExceptionGroup has multiple exceptions of different types (e.g., a 401 and a connection error), the second one is silently dropped. This is fine for the MCP SDK's typical single-exception groups, but a brief inline comment explaining the "first wins" choice would help future readers understand this isn't an oversight.
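
To make the trade-off concrete, here is a sketch that walks every leaf of a nested group and then applies the "first wins" rule on top. Both function names are hypothetical, and groups are again detected by duck typing on an `exceptions` attribute, which is an assumption rather than the behavior of the real unwrap_exception_group.

```python
from typing import Iterator


def iter_leaf_exceptions(exc: BaseException) -> Iterator[BaseException]:
    """Yield every non-group exception in a possibly nested group, depth-first."""
    children = getattr(exc, "exceptions", None)
    if not children:
        yield exc
        return
    for child in children:
        yield from iter_leaf_exceptions(child)


def first_leaf(exc: BaseException) -> BaseException:
    # "First wins" is deliberate: MCP SDK groups typically hold one child,
    # so any later leaves are dropped rather than reported.
    return next(iter_leaf_exceptions(exc))
```

A caller that wanted to surface dropped errors could log the remaining leaves at debug level while still reporting only the first one to the user.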

5. Removal of full_exception from debug log extras
src/pflow/nodes/mcp/node.py (exec_fallback)

The old code logged "full_exception": exc_str[:500] in the debug extras. The new code drops this entirely. The technical_details field in the Diagnostic captures some of this, but it's not logged. If someone is debugging MCP errors via log output, they lose the exception string. Consider keeping exc_info=True on the debug log call (like discovery.py:115 already does) for stack trace preservation.

6. Minor: technical[:200] truncation in CLI
src/pflow/cli/commands/mcp.py:412

The 200-char truncation is hardcoded. For some URLs or error messages this might cut off meaningful info. Not urgent, but a named constant or slightly more generous limit (300?) would be more forgiving.


Overall: Clean, well-tested refactor that solves a real UX problem. The test suite (21 new tests covering unwrapping, classification, and integration) covers the important paths well. Ship after addressing #1 and considering #2-3.

@spinje spinje force-pushed the refactor/mcp-error-handling branch 2 times, most recently from da8d832 to c3d184f Compare April 5, 2026 09:42
Extract MCP SDK error handling from MCPNode into shared mcp/errors.py.
Both MCPNode.exec_fallback and MCP discovery now use the same
describe_mcp_error() function, which unwraps ExceptionGroups and
produces structured Diagnostics with actionable suggestions.

Fixes the wall-of-text error output from `pflow mcp sync` where
triple logger.exception() calls dumped full ExceptionGroup tracebacks
for expected failures (auth expired, SSL, unreachable servers).

- New: src/pflow/mcp/errors.py — unwrap_exception_group + describe_mcp_error
- Simplified: MCPNode.exec_fallback (deleted 5 helper methods, ~160 lines)
- Fixed: discovery.py — removed triple logger.exception, clean error messages
- Fixed: registrar.py — returns Diagnostic in sync result for CLI rendering
- Added: --verbose flag on `pflow mcp sync` for technical error details
- Tests: 21 new tests, 4601 total passing
@spinje spinje force-pushed the refactor/mcp-error-handling branch 3 times, most recently from f0497f6 to f7f0d80 Compare April 5, 2026 10:31
@spinje spinje merged commit 190ddbf into main Apr 5, 2026
9 checks passed
… ExceptionGroups [skip review]

Review fixes:
- exec_fallback now appends first suggestion to error message, restoring
  the remediation hint that was lost when extracting to shared function
- technical_details uses str(root) instead of str(exc), so --verbose
  shows the actual error instead of "unhandled errors in a TaskGroup"
- Added exc_info=True to MCPNode debug logger for full traceback at debug level