Skip to content

fix: surface shell stderr when exit_code=0 (fixes #55)#56

Merged
spinje merged 4 commits into
mainfrom
fix/shell-stderr-visibility
Jan 9, 2026
Merged

fix: surface shell stderr when exit_code=0 (fixes #55)#56
spinje merged 4 commits into
mainfrom
fix/shell-stderr-visibility

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Jan 9, 2026

Summary

Shell node stderr was hidden when exit_code=0, making pipeline failures invisible. This caused silent failures where workflows appeared successful but produced wrong/empty results.

Before:

✓ Workflow completed in 0.4s
Nodes executed (1):
  ✓ extract-urls (6ms)
Workflow executed successfully

After:

⚠️ Workflow completed in 0.367s
Nodes executed (1):
  ⚠ extract-urls (5ms)

⚠️  Shell stderr (exit code 0):
  • extract-urls: sed: 1: "s/.*?/bad/": RE error: repetition-operator operand invalid

Workflow executed successfully

Changes

  • execution_state.py: Add has_stderr/stderr detection in build_execution_steps() for shell nodes with exit_code=0
  • main.py:
    • Add _display_stderr_warnings() function to show stderr content
    • Add _display_workflow_completion_status() helper for status indicators
    • Show ⚠️ workflow indicator when any node has stderr
    • Show ⚠ node indicator for nodes with stderr
  • tests: 28 new tests covering CLI output, JSON output, and edge cases

File Stats

 .../braindump-stderr-visibility-overlap.md         | 130 +++
 src/pflow/cli/main.py                              |  63 +-
 src/pflow/execution/execution_state.py             |  14 +
 tests/test_cli/test_shell_stderr_warnings.py       | 462 +++++++++
 tests/test_execution/test_execution_state.py       | 240 +++++
 5 files changed, 896 insertions(+), 7 deletions(-)

Testing

  • Run make test - all 3916 tests pass
  • Run make check - lint, type check, format all pass
  • Manual verification of all bug report scenarios

Notes

  • The fix is always-on (no flag required) since hidden errors are a debugging nightmare
  • JSON output includes has_stderr: true and stderr: "..." for AI agents
  • Complements Task 108's planned --debug flag (see braindump for details)

spinje added 4 commits January 9, 2026 12:00
Based on real-world workflow building experience, add practical guidance:
- Standardize all commands to use 'pflow' (remove 'uv run' prefix)
- Add BSD sed compatibility notes with specific examples (avoid .*?, use -E not -r)
- Add tip for testing shell pipelines independently before integration
- Clarify trace file structure debugging with shared_after access method
- Clarify batch items requirement (must be JSON array)

These minimal changes (~10 lines) address common pain points without
restructuring the document.
Shell node stderr was hidden when exit_code=0, making pipeline failures
invisible. This caused silent failures where workflows appeared successful
but produced wrong/empty results.

Changes:
- Add has_stderr/stderr detection in build_execution_steps()
- Show ⚠️ workflow indicator when any node has stderr
- Show ⚠ node indicator for nodes with stderr
- Display stderr content in new warning section
- Include has_stderr/stderr in JSON output for agents

The fix is always-on (no flag required) since hidden errors are a
debugging nightmare.
- Remove redundant status check in _format_node_status_line
  (has_stderr already implies status=completed)
- Update docstring to mention JSON output for programmatic detection
- Update test to document the invariant properly
The original test used `sed -E 's/.*?/bad/'` which fails on BSD sed
(macOS) but works on GNU sed (Linux). Changed to a portable subshell
that writes to stderr while passing data through, which works on both
platforms.
@claude
Copy link
Copy Markdown

claude Bot commented Jan 9, 2026

Code Review - PR #56: Shell Stderr Visibility Fix

This is an excellent fix that addresses a real debugging pain point. The implementation is clean, well-tested, and follows project conventions. I have a few suggestions for improvement.


Strengths

  1. Solves a real problem: Hidden stderr from pipeline failures is genuinely a debugging nightmare. The motivation is crystal clear.

  2. Excellent test coverage: 28 tests covering CLI output, JSON output, edge cases, truncation, multiline handling, etc. The tests are focused and test the right things.

  3. Good separation of concerns:

    • Detection logic in execution_state.py
    • Display logic in main.py
    • Clear helper functions with single responsibilities
  4. Defensive coding: Proper type checks (isinstance(stderr, str)), whitespace stripping, empty string checks

  5. User experience: Warning indicators (⚠) are intuitive and the stderr display format is clear

  6. Documentation: The braindump explains overlap with Task 108 well, showing thoughtful consideration of future work


🔴 Critical — Must Fix Before Merge

1. Inconsistent status indicators logic

Location: src/pflow/cli/main.py:694-695

if step.get("has_stderr") and status == "completed":
    return f"  ⚠ {node_id} {timing}{tag_str}"

Issue: This only checks status == "completed", but the detection logic in execution_state.py:139 already filters to only set has_stderr for completed nodes. This check is redundant and creates confusion about where the filtering happens.

Fix: Remove the redundant status check since has_stderr already implies completed:

if step.get("has_stderr"):
    return f"  ⚠ {node_id} {timing}{tag_str}"

2. Silent failure on non-string stderr types

Location: src/pflow/execution/execution_state.py:139

if status == "completed" and exit_code == 0 and stderr and isinstance(stderr, str) and stderr.strip():

Issue: If a shell node somehow produces non-string stderr (dict, list, etc.), this silently ignores it. While unlikely, this could hide bugs in node implementations.

Fix: Add explicit type validation that fails fast:

# Validate stderr type if present
if "stderr" in node_output and node_output["stderr"] and not isinstance(node_output["stderr"], str):
    # This indicates a bug in the shell node implementation
    logger.warning(f"Node {node_id} produced non-string stderr: {type(node_output['stderr'])}")

# Only process valid string stderr
if status == "completed" and exit_code == 0 and isinstance(stderr, str) and stderr.strip():
    step["has_stderr"] = True
    step["stderr"] = stderr.strip()

⚠️ Warnings — Should Be Addressed

3. Magic number: 300 character truncation

Location: src/pflow/cli/main.py:752-755

# Truncate long stderr to 300 chars
stderr_preview = stderr[:300]
if len(stderr) > 300:
    stderr_preview += "..."

Issue:

  • Hard-coded magic number 300 differs from _truncate_error_message which uses 200
  • No consistency with Task 108's planned truncation
  • No rationale for why 300 specifically

Fix: Extract to a named constant and document the choice:

# At top of file
STDERR_PREVIEW_MAX_LENGTH = 300  # Show more context than error messages (200) since stderr often contains stack traces

# In function
stderr_preview = stderr[:STDERR_PREVIEW_MAX_LENGTH]
if len(stderr) > STDERR_PREVIEW_MAX_LENGTH:
    stderr_preview += "..."

4. JSON output status inconsistency

Location: Mentioned in braindump but not addressed

Issue: The braindump correctly identifies that JSON output shows status: "success" even when stderr warnings are present. This could mislead agents making decisions based on status.

Question for user: Should the status field reflect stderr warnings? Options:

  • Keep as "success" (current) - warnings don't change success/failure
  • Add "success_with_warnings" - explicit indication for agents
  • Add separate has_warnings boolean field at top level

Recommendation: Add a top-level has_warnings boolean to the JSON output for programmatic detection, keeping status as-is for backward compatibility.

5. Potential UX confusion: Two warning sources

Location: src/pflow/cli/main.py:805-812

Issue: Now we have two places that show ⚠️:

  • "degraded" status (line 806)
  • stderr warnings (line 810)

Users might be confused about the difference between "completed with warnings" vs "completed" with stderr warnings.

Fix: Add a brief comment explaining when each applies, or consider if stderr should trigger degraded status instead of being separate.


💡 Suggestions — Optional Improvements

6. Consider full stderr in trace file reference

Location: src/pflow/cli/main.py:758

When stderr is truncated, you could hint to users where to find the full output:

click.echo(f"  • {node_id}: {indented}", err=True)
if len(stderr) > STDERR_PREVIEW_MAX_LENGTH:
    click.echo(f"    (full output in trace file: ~/.pflow/debug/workflow-trace-*.json)", err=True)

7. Test quality: Could add negative test for batch nodes

The tests thoroughly cover shell nodes but don't verify that batch nodes with stderr DON'T trigger warnings (since batch nodes might have different stderr handling).

Add one test:

def test_no_stderr_warning_for_batch_nodes(self, tmp_path):
    """Batch nodes should not trigger stderr warnings (have different error handling)."""
    # Test that batch nodes with stderr field don't show in stderr warnings section

8. Documentation: CLI agent instructions are verbose

Location: src/pflow/cli/resources/cli-agent-instructions.md

The diff shows you removed all uv run prefixes from commands (good!) but the file is still very long. Consider if the shell command testing guidance (lines 242-247) should be in a separate "Best Practices" doc.

9. Minor: Docstring could mention JSON output

Location: src/pflow/execution/execution_state.py:133-137

The comment mentions "helps surface hidden errors" but doesn't mention it also adds has_stderr to JSON output for agents. Consider updating:

# Detect shell nodes with stderr output but exit_code=0
# This helps surface hidden errors from shell pipeline failures
# Also adds has_stderr flag to JSON output for programmatic detection

🎯 Summary

Priority Actions:

  1. Remove redundant status check in _format_node_status_line (Critical docs: Set up Mintlify documentation (Task 93) #1)
  2. Add explicit type validation for stderr (Critical feat: unify LLM usage via Simon Willison's llm library (Task 95) #2)
  3. Extract magic number to named constant (Warning feat: Smart output display for registry run #3)
  4. Decide on JSON status behavior (Warning feat: add smart output display for registry run (closes #3) #4)

Overall Assessment: Strong implementation that's ready to merge after addressing the two critical issues. The feature is well-motivated, thoroughly tested, and the code is clean and maintainable. Great job!

Lines of Code Review:

  • src/pflow/execution/execution_state.py:133-141 ✅ Detection logic is solid
  • src/pflow/cli/main.py:732-758 ⚠️ Magic number needs extraction
  • src/pflow/cli/main.py:690-698 🔴 Redundant check
  • tests/test_cli/test_shell_stderr_warnings.py ✅ Excellent coverage
  • tests/test_execution/test_execution_state.py ✅ Good unit tests

Recommendation: Request changes for the two critical issues, then merge. This is solid work.

@spinje
Copy link
Copy Markdown
Owner Author

spinje commented Jan 9, 2026

Code Review Response

Thanks for the thorough review! I've addressed the following items:

✅ Fixed

Critical #1: Redundant status check

  • Removed the status == "completed" check in _format_node_status_line
  • Updated comment to clarify that has_stderr already implies completed status
  • Updated test to document this invariant properly

Suggestion #9: Docstring improvement

  • Updated comment in execution_state.py to mention JSON output for programmatic detection

⏭️ Skipped (with reasoning)

Critical #2: Silent failure on non-string stderr

  • The isinstance(stderr, str) check is already defensive
  • Shell nodes always produce string stderr (from subprocess)
  • Adding logger.warning for an impossible case is over-engineering

Warning #3: Magic number 300

  • Minor style issue, comment already explains purpose
  • Constant extraction adds minimal value for this project's philosophy

Warning #4: JSON status inconsistency

  • execution.steps[].has_stderr already provides programmatic detection
  • Agents can easily check any(step.get("has_stderr") for step in steps)
  • Could be a future enhancement if users request it

Warning #5: Two warning sources

  • "degraded" (batch failures) and "stderr" (shell warnings) serve different purposes
  • The distinction is clear from context

Suggestions #6-8: Out of scope or already covered


Commit: 6262926

@spinje spinje merged commit 9e83032 into main Jan 9, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant