fix: surface shell stderr when exit_code=0 (fixes #55) #56
Based on real-world workflow building experience, add practical guidance:

- Standardize all commands to use 'pflow' (remove 'uv run' prefix)
- Add BSD sed compatibility notes with specific examples (avoid .*?, use -E not -r)
- Add tip for testing shell pipelines independently before integration
- Clarify trace file structure debugging with shared_after access method
- Clarify batch items requirement (must be JSON array)

These minimal changes (~10 lines) address common pain points without restructuring the document.
Shell node stderr was hidden when exit_code=0, making pipeline failures invisible. This caused silent failures where workflows appeared successful but produced wrong/empty results.

Changes:

- Add has_stderr/stderr detection in build_execution_steps()
- Show ⚠️ workflow indicator when any node has stderr
- Show ⚠ node indicator for nodes with stderr
- Display stderr content in new warning section
- Include has_stderr/stderr in JSON output for agents

The fix is always-on (no flag required) since hidden errors are a debugging nightmare.
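To make the failure mode concrete, here is a minimal sketch (not taken from the pflow codebase) of a pipeline that reports exit code 0 even though one stage failed. The helper name `run_pipeline` and the path `/nonexistent/input.txt` are illustrative assumptions; the only library call used is the standard `subprocess.run`.

```python
import subprocess


def run_pipeline(command: str) -> subprocess.CompletedProcess:
    """Run a shell command string with sh and capture both streams as text."""
    return subprocess.run(
        ["sh", "-c", command],
        capture_output=True,
        text=True,
        check=False,  # inspect the exit code ourselves instead of raising
    )


# `cat` fails on the missing file, but without `pipefail` the pipeline's exit
# status comes from the last command, `wc -l`, which succeeds on empty input.
result = run_pipeline("cat /nonexistent/input.txt | wc -l")

print("exit code:", result.returncode)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

A caller that looks only at the exit code sees success and a line count of 0; the only evidence of the problem is the text on stderr, which is what this change surfaces.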
- Remove redundant status check in _format_node_status_line (has_stderr already implies status=completed)
- Update docstring to mention JSON output for programmatic detection
- Update test to document the invariant properly
The original test used `sed -E 's/.*?/bad/'` which fails on BSD sed (macOS) but works on GNU sed (Linux). Changed to a portable subshell that writes to stderr while passing data through, which works on both platforms.
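The exact command used in the updated test is not shown here, so the following is only a sketch of the idea under that assumption: a subshell that writes a line to stderr and then copies stdin to stdout unchanged. It relies on POSIX `sh` features only, so it should behave the same on macOS and Linux. The names `PASSTHROUGH_WITH_WARNING` and `run_passthrough` are hypothetical.

```python
import subprocess

# Subshell: emit a warning on stderr, then pass stdin through to stdout.
# Hypothetical stand-in for the command used in the real test.
PASSTHROUGH_WITH_WARNING = '(echo "simulated warning" >&2; cat)'


def run_passthrough(data: str) -> subprocess.CompletedProcess:
    """Feed data through the subshell and capture stdout, stderr, and exit code."""
    return subprocess.run(
        ["sh", "-c", PASSTHROUGH_WITH_WARNING],
        input=data,
        capture_output=True,
        text=True,
        check=False,
    )


proc = run_passthrough("line one\nline two\n")
print(proc.returncode, repr(proc.stdout), repr(proc.stderr))
```

Because the command avoids `sed` entirely, the test no longer depends on which regex dialect the platform's `sed` accepts.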
Code Review - PR #56: Shell Stderr Visibility Fix

This is an excellent fix that addresses a real debugging pain point. The implementation is clean, well-tested, and follows project conventions. I have a few suggestions for improvement.

✅ Strengths
🔴 Critical: Must Fix Before Merge

1. Inconsistent status indicators logic

Location:

    if step.get("has_stderr") and status == "completed":
        return f" ⚠ {node_id} {timing}{tag_str}"

Issue: This only checks

Fix: Remove the redundant status check since

    if step.get("has_stderr"):
        return f" ⚠ {node_id} {timing}{tag_str}"

2. Silent failure on non-string stderr types

Location:

    if status == "completed" and exit_code == 0 and stderr and isinstance(stderr, str) and stderr.strip():

Issue: If a shell node somehow produces non-string stderr (dict, list, etc.), this silently ignores it. While unlikely, this could hide bugs in node implementations.

Fix: Add explicit type validation that fails fast:

    # Validate stderr type if present
    if "stderr" in node_output and node_output["stderr"] and not isinstance(node_output["stderr"], str):
        # This indicates a bug in the shell node implementation
        logger.warning(f"Node {node_id} produced non-string stderr: {type(node_output['stderr'])}")

    # Only process valid string stderr
    if status == "completed" and exit_code == 0 and isinstance(stderr, str) and stderr.strip():
        step["has_stderr"] = True
        step["stderr"] = stderr.strip()
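For readers who want to try the logic under discussion in isolation, here is a self-contained sketch assembled from the conditions quoted in this review. It is not the project's actual code: the function names `annotate_stderr` and `format_node_status_line`, the `duration_ms` field, and the `✓` marker for clean steps are assumptions made for illustration.

```python
from typing import Any


def annotate_stderr(step: dict[str, Any], node_output: dict[str, Any]) -> dict[str, Any]:
    """Flag a step whose node exited with code 0 but still wrote to stderr."""
    stderr = node_output.get("stderr")
    if (
        step.get("status") == "completed"
        and node_output.get("exit_code") == 0
        and isinstance(stderr, str)  # non-string stderr is ignored, as in the PR
        and stderr.strip()
    ):
        step["has_stderr"] = True
        step["stderr"] = stderr.strip()
    return step


def format_node_status_line(step: dict[str, Any]) -> str:
    """Render one status line; has_stderr alone selects the warning marker."""
    timing = f"({step.get('duration_ms', 0)}ms)"
    if step.get("has_stderr"):
        return f" ⚠ {step['node_id']} {timing}"
    return f" ✓ {step['node_id']} {timing}"  # hypothetical marker for clean steps


flagged = annotate_stderr(
    {"node_id": "count", "status": "completed", "duration_ms": 12},
    {"exit_code": 0, "stderr": "cat: input.txt: No such file or directory\n"},
)
clean = annotate_stderr(
    {"node_id": "ok", "status": "completed"},
    {"exit_code": 0, "stderr": ""},
)
odd = annotate_stderr(
    {"node_id": "odd", "status": "completed"},
    {"exit_code": 0, "stderr": {"unexpected": "dict"}},
)

print(format_node_status_line(flagged))
print(format_node_status_line(clean))
```

Since `has_stderr` is only ever set inside a branch that already requires `status == "completed"`, the formatter does not need to re-check the status, which is the invariant behind Critical #1.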
Code Review Response

Thanks for the thorough review! I've addressed the following items:

✅ Fixed

- Critical #1: Redundant status check
- Suggestion #9: Docstring improvement

⏭️ Skipped (with reasoning)

- Critical #2: Silent failure on non-string stderr
- Warning #3: Magic number 300
- Warning #4: JSON status inconsistency
- Warning #5: Two warning sources
- Suggestions #6-8: Out of scope or already covered

Commit: 6262926
Summary
Shell node stderr was hidden when `exit_code=0`, making pipeline failures invisible. This caused silent failures where workflows appeared successful but produced wrong/empty results.

Before:
After:
Changes
- `has_stderr`/`stderr` detection in `build_execution_steps()` for shell nodes with `exit_code=0`
- `_display_stderr_warnings()` function to show stderr content
- `_display_workflow_completion_status()` helper for status indicators

File Stats
Testing
- `make test` - all 3916 tests pass
- `make check` - lint, type check, format all pass

Notes
- `has_stderr: true` and `stderr: "..."` for AI agents
- `--debug` flag (see braindump for details)
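As a rough illustration of how an agent might consume these fields, the sketch below scans a JSON payload for steps flagged with `has_stderr`. Only the `has_stderr` and `stderr` field names come from this PR; the top-level `steps` key, the `node_id` field, and the function name `nodes_with_stderr` are assumptions, since the full output schema is not shown here.

```python
import json


def nodes_with_stderr(raw_json: str) -> list[tuple[str, str]]:
    """Return (node_id, stderr) pairs for every step flagged with has_stderr."""
    payload = json.loads(raw_json)
    steps = payload.get("steps", [])  # hypothetical key, not confirmed by the PR
    return [
        (step.get("node_id", "<unknown>"), step.get("stderr", ""))
        for step in steps
        if step.get("has_stderr")
    ]


# Hypothetical payload shaped only for this example.
sample = json.dumps(
    {
        "steps": [
            {"node_id": "fetch", "status": "completed"},
            {
                "node_id": "count-lines",
                "status": "completed",
                "has_stderr": True,
                "stderr": "cat: input.txt: No such file or directory",
            },
        ]
    }
)

warnings = nodes_with_stderr(sample)
for node_id, stderr in warnings:
    print(f"{node_id}: {stderr}")
```

Checking a boolean flag rather than parsing terminal output keeps the agent's logic independent of how the warning is rendered for humans.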