refactor: shared MCP error handling with Diagnostic integration#227
Conversation
Code Review: PR #227 — Shared MCP Error Handling with Diagnostic IntegrationOverall AssessmentClean, well-motivated refactor. Extracting ~160 lines of error-handling instance methods from MCPNode into a shared module is the right call — the original code didn't use Suggestions — optional improvements1. Inconsistent diagnostic rendering between
# Current: inline rendering
if diagnostic.suggestions:
for suggestion in diagnostic.suggestions:
click.echo(f" → {suggestion}", err=True)
context = diagnostic.context or {}
technical = context.get("technical_details")
if verbose and technical:
click.echo(f" Detail: {technical[:200]}", err=True)The inline rendering is defensible here because 2.
3. Minor: In 4. Consider:
What's done well
No critical issues or warnings. This is ready to merge. |
There was a problem hiding this comment.
Code Review
This pull request introduces a centralized error handling and diagnostic system for the MCP SDK, replacing fragmented error logic across the codebase. It adds a new errors.py module to unwrap ExceptionGroups and generate structured Diagnostic objects containing actionable suggestions and technical details. The CLI sync command is updated with a --verbose flag to expose these details. Feedback focuses on removing a redundant error description call in discovery.py and ensuring that full exception tracebacks are preserved in MCPNode logs for better debuggability.
| from pflow.mcp.errors import describe_mcp_error | ||
|
|
||
| diagnostic = describe_mcp_error(e) | ||
| logger.debug("Discovery failed for %s: %s", server_name, diagnostic.message, exc_info=True) | ||
| raise RuntimeError(f"Tool discovery failed for {server_name}: {diagnostic.message}") from e |
There was a problem hiding this comment.
The describe_mcp_error function is called here to generate a diagnostic, but the diagnostic object itself is discarded. Only its message is used to create a RuntimeError. The calling function, registrar.sync_server, then unwraps the original exception from __cause__ and calls describe_mcp_error again to create the final diagnostic that is shown to the user.
This initial call to describe_mcp_error is redundant. You can simplify this block to just log the debug information and re-raise the exception with a generic wrapper message. The registrar is already responsible for creating the detailed, user-facing error message.
| from pflow.mcp.errors import describe_mcp_error | |
| diagnostic = describe_mcp_error(e) | |
| logger.debug("Discovery failed for %s: %s", server_name, diagnostic.message, exc_info=True) | |
| raise RuntimeError(f"Tool discovery failed for {server_name}: {diagnostic.message}") from e | |
| logger.debug("Discovery failed for %s", server_name, exc_info=True) | |
| raise RuntimeError(f"Tool discovery failed for {server_name}") from e |
| logger.debug( | ||
| error_msg, | ||
| extra={ | ||
| "server": prep_res.get("server"), | ||
| "tool": prep_res.get("tool"), | ||
| "exception_type": type(exc).__name__, | ||
| "full_exception": exc_str[:500], # Full exception for debugging | ||
| }, | ||
| ) |
There was a problem hiding this comment.
The refactoring to use describe_mcp_error is great for cleaning up the error messages. However, in this logger.debug call, the full exception traceback is no longer logged. The previous implementation logged full_exception, which was valuable for debugging.
To restore this important debugging information, you can add exc_info=True to the logger.debug call. This will attach the full exception information and stack trace to the log record.
| logger.debug( | |
| error_msg, | |
| extra={ | |
| "server": prep_res.get("server"), | |
| "tool": prep_res.get("tool"), | |
| "exception_type": type(exc).__name__, | |
| "full_exception": exc_str[:500], # Full exception for debugging | |
| }, | |
| ) | |
| logger.debug( | |
| error_msg, | |
| exc_info=True, | |
| extra={ | |
| "server": prep_res.get("server"), | |
| "tool": prep_res.get("tool"), | |
| "exception_type": type(exc).__name__, | |
| }, | |
| ) |
Code Review — PR #227: Shared MCP Error HandlingGreat refactor. Extracting ~160 lines of duplicated error handling from MCPNode into a shared module with Diagnostic integration is a clear win. The before/after output improvement is significant. Tests are solid and well-structured. Critical — must fix before mergeNone. Warnings — should be addressed1. If # Current:
message=f"HTTP error {status}: {response_text}" if response_text else f"HTTP error {status}.",
# Suggested — handle status=None:
if status is None:
return Diagnostic(
severity=Severity.ERROR,
source="mcp",
title="HTTP Error",
message=f"HTTP error: {str(exc)[:200]}",
context={"technical_details": technical_details},
)2. Inconsistent diagnostic rendering between
3.
The double-call isn't a bug (both produce the same result), but it's subtly confusing about where the "canonical" error handling lives. Consider having Suggestions — optional improvements4. This is documented in tests ( 5. Removal of The old code logged 6. Minor: The 200-char truncation is hardcoded. For some URLs or error messages this might cut off meaningful info. Not urgent, but a named constant or slightly more generous limit (300?) would be more forgiving. Overall: Clean, well-tested refactor that solves a real UX problem. The test suite (21 new tests covering unwrapping, classification, and integration) covers the important paths well. Ship after addressing #1 and considering #2-3. |
da8d832 to
c3d184f
Compare
Extract MCP SDK error handling from MCPNode into shared mcp/errors.py. Both MCPNode.exec_fallback and MCP discovery now use the same describe_mcp_error() function, which unwraps ExceptionGroups and produces structured Diagnostics with actionable suggestions. Fixes the wall-of-text error output from `pflow mcp sync` where triple logger.exception() calls dumped full ExceptionGroup tracebacks for expected failures (auth expired, SSL, unreachable servers). - New: src/pflow/mcp/errors.py — unwrap_exception_group + describe_mcp_error - Simplified: MCPNode.exec_fallback (deleted 5 helper methods, ~160 lines) - Fixed: discovery.py — removed triple logger.exception, clean error messages - Fixed: registrar.py — returns Diagnostic in sync result for CLI rendering - Added: --verbose flag on `pflow mcp sync` for technical error details - Tests: 21 new tests, 4601 total passing
f0497f6 to
f7f0d80
Compare
… ExceptionGroups [skip review] Two review fixes: - exec_fallback now appends first suggestion to error message, restoring the remediation hint that was lost when extracting to shared function - technical_details uses str(root) instead of str(exc), so --verbose shows the actual error instead of "unhandled errors in a TaskGroup" - Added exc_info=True to MCPNode debug logger for full traceback at debug level
Summary
Extract MCP SDK error handling from MCPNode into shared
mcp/errors.py. Both MCPNode.exec_fallback and MCP discovery now use the samedescribe_mcp_error()function, which unwraps ExceptionGroups and produces structured Diagnostics with actionable suggestions.Fixes the wall-of-text error output from
pflow mcp syncwhere triplelogger.exception()calls dumped full ExceptionGroup tracebacks for expected failures (auth expired, SSL, unreachable servers).Changes
src/pflow/mcp/errors.py—unwrap_exception_group()+describe_mcp_error()returningDiagnosticMCPNode.exec_fallback— deleted 5 helper methods (~160 lines), replaced with single call to shared functiondiscovery.py— removed triplelogger.exception(), removed redundant try/except, clean error messages viadescribe_mcp_error()registrar.py— removedlogger.exception(), returnsDiagnosticin sync result dict for CLI rendering--verboseflag onpflow mcp syncfor technical error detailsmcp/CLAUDE.md— addederrors.pyto file treeExplanation
The MCP SDK wraps all exceptions in
anyioExceptionGroups. Previously,logger.exception()at 3 levels (discovery, discover_tools, registrar) each dumped the full ExceptionGroup traceback (~40 lines each), producing a wall of unreadable text for simple failures like expired auth tokens.MCPNode already had ~170 lines of error handling code (ExceptionGroup unwrapping, HTTP status classification, regex safety nets) that solved the same problem for tool execution. This code was trapped as instance methods on the node class despite not using
selfmeaningfully.The fix extracts this logic to a shared module that:
Diagnosticwith title, message, suggestions, and technical_details contextformat_diagnosticwith verbose support)Before (
pflow mcp sync --all):After:
After (
--verbose):Testing
test_mcp_errors.py(shared function) andtest_mcp_discovery_critical.py(registrar→Diagnostic integration)make checkclean