fix: surface shell stderr when exit_code=0 (fixes #55) by spinje · Pull Request #56 · spinje/pflow

spinje · 2026-01-09T21:57:00Z

Summary

Shell node stderr was hidden when exit_code=0, making pipeline failures invisible. This caused silent failures where workflows appeared successful but produced wrong/empty results.

Before:

✓ Workflow completed in 0.4s
Nodes executed (1):
  ✓ extract-urls (6ms)
Workflow executed successfully

After:

⚠️ Workflow completed in 0.367s
Nodes executed (1):
  ⚠ extract-urls (5ms)

⚠️  Shell stderr (exit code 0):
  • extract-urls: sed: 1: "s/.*?/bad/": RE error: repetition-operator operand invalid

Workflow executed successfully

Changes

execution_state.py: Add has_stderr/stderr detection in build_execution_steps() for shell nodes with exit_code=0
main.py:
- Add _display_stderr_warnings() function to show stderr content
- Add _display_workflow_completion_status() helper for status indicators
- Show ⚠️ workflow indicator when any node has stderr
- Show ⚠ node indicator for nodes with stderr
tests: 28 new tests covering CLI output, JSON output, and edge cases

File Stats

 .../braindump-stderr-visibility-overlap.md         | 130 +++
 src/pflow/cli/main.py                              |  63 +-
 src/pflow/execution/execution_state.py             |  14 +
 tests/test_cli/test_shell_stderr_warnings.py       | 462 +++++++++
 tests/test_execution/test_execution_state.py       | 240 +++++
 5 files changed, 896 insertions(+), 7 deletions(-)

Testing

Run make test - all 3916 tests pass
Run make check - lint, type check, format all pass
Manual verification of all bug report scenarios

Notes

The fix is always-on (no flag required) since hidden errors are a debugging nightmare
JSON output includes has_stderr: true and stderr: "..." for AI agents
Complements Task 108's planned --debug flag (see braindump for details)

Based on real-world workflow building experience, add practical guidance: - Standardize all commands to use 'pflow' (remove 'uv run' prefix) - Add BSD sed compatibility notes with specific examples (avoid .*?, use -E not -r) - Add tip for testing shell pipelines independently before integration - Clarify trace file structure debugging with shared_after access method - Clarify batch items requirement (must be JSON array) These minimal changes (~10 lines) address common pain points without restructuring the document.

Shell node stderr was hidden when exit_code=0, making pipeline failures invisible. This caused silent failures where workflows appeared successful but produced wrong/empty results. Changes: - Add has_stderr/stderr detection in build_execution_steps() - Show ⚠️ workflow indicator when any node has stderr - Show ⚠ node indicator for nodes with stderr - Display stderr content in new warning section - Include has_stderr/stderr in JSON output for agents The fix is always-on (no flag required) since hidden errors are a debugging nightmare.

- Remove redundant status check in _format_node_status_line (has_stderr already implies status=completed) - Update docstring to mention JSON output for programmatic detection - Update test to document the invariant properly

The original test used `sed -E 's/.*?/bad/'` which fails on BSD sed (macOS) but works on GNU sed (Linux). Changed to a portable subshell that writes to stderr while passing data through, which works on both platforms.

claude · 2026-01-09T21:59:46Z

Code Review - PR #56: Shell Stderr Visibility Fix

This is an excellent fix that addresses a real debugging pain point. The implementation is clean, well-tested, and follows project conventions. I have a few suggestions for improvement.

✅ Strengths

Solves a real problem: Hidden stderr from pipeline failures is genuinely a debugging nightmare. The motivation is crystal clear.
Excellent test coverage: 28 tests covering CLI output, JSON output, edge cases, truncation, multiline handling, etc. The tests are focused and test the right things.
Good separation of concerns:
- Detection logic in execution_state.py
- Display logic in main.py
- Clear helper functions with single responsibilities
Defensive coding: Proper type checks (isinstance(stderr, str)), whitespace stripping, empty string checks
User experience: Warning indicators (⚠) are intuitive and the stderr display format is clear
Documentation: The braindump explains overlap with Task 108 well, showing thoughtful consideration of future work

🔴 Critical — Must Fix Before Merge

1. Inconsistent status indicators logic

Location: src/pflow/cli/main.py:694-695

if step.get("has_stderr") and status == "completed":
    return f"  ⚠ {node_id} {timing}{tag_str}"

Issue: This only checks status == "completed", but the detection logic in execution_state.py:139 already filters to only set has_stderr for completed nodes. This check is redundant and creates confusion about where the filtering happens.

Fix: Remove the redundant status check since has_stderr already implies completed:

if step.get("has_stderr"):
    return f"  ⚠ {node_id} {timing}{tag_str}"

2. Silent failure on non-string stderr types

Location: src/pflow/execution/execution_state.py:139

if status == "completed" and exit_code == 0 and stderr and isinstance(stderr, str) and stderr.strip():

Issue: If a shell node somehow produces non-string stderr (dict, list, etc.), this silently ignores it. While unlikely, this could hide bugs in node implementations.

Fix: Add explicit type validation that fails fast:

# Validate stderr type if present
if "stderr" in node_output and node_output["stderr"] and not isinstance(node_output["stderr"], str):
    # This indicates a bug in the shell node implementation
    logger.warning(f"Node {node_id} produced non-string stderr: {type(node_output['stderr'])}")

# Only process valid string stderr
if status == "completed" and exit_code == 0 and isinstance(stderr, str) and stderr.strip():
    step["has_stderr"] = True
    step["stderr"] = stderr.strip()

⚠️ Warnings — Should Be Addressed

3. Magic number: 300 character truncation

Location: src/pflow/cli/main.py:752-755

# Truncate long stderr to 300 chars
stderr_preview = stderr[:300]
if len(stderr) > 300:
    stderr_preview += "..."

Issue:

Hard-coded magic number 300 differs from _truncate_error_message which uses 200
No consistency with Task 108's planned truncation
No rationale for why 300 specifically

Fix: Extract to a named constant and document the choice:

# At top of file
STDERR_PREVIEW_MAX_LENGTH = 300  # Show more context than error messages (200) since stderr often contains stack traces

# In function
stderr_preview = stderr[:STDERR_PREVIEW_MAX_LENGTH]
if len(stderr) > STDERR_PREVIEW_MAX_LENGTH:
    stderr_preview += "..."

4. JSON output status inconsistency

Location: Mentioned in braindump but not addressed

Issue: The braindump correctly identifies that JSON output shows status: "success" even when stderr warnings are present. This could mislead agents making decisions based on status.

Question for user: Should the status field reflect stderr warnings? Options:

Keep as "success" (current) - warnings don't change success/failure
Add "success_with_warnings" - explicit indication for agents
Add separate has_warnings boolean field at top level

Recommendation: Add a top-level has_warnings boolean to the JSON output for programmatic detection, keeping status as-is for backward compatibility.

5. Potential UX confusion: Two warning sources

Location: src/pflow/cli/main.py:805-812

Issue: Now we have two places that show ⚠️:

"degraded" status (line 806)
stderr warnings (line 810)

Users might be confused about the difference between "completed with warnings" vs "completed" with stderr warnings.

Fix: Add a brief comment explaining when each applies, or consider if stderr should trigger degraded status instead of being separate.

💡 Suggestions — Optional Improvements

6. Consider full stderr in trace file reference

Location: src/pflow/cli/main.py:758

When stderr is truncated, you could hint to users where to find the full output:

click.echo(f"  • {node_id}: {indented}", err=True)
if len(stderr) > STDERR_PREVIEW_MAX_LENGTH:
    click.echo(f"    (full output in trace file: ~/.pflow/debug/workflow-trace-*.json)", err=True)

7. Test quality: Could add negative test for batch nodes

The tests thoroughly cover shell nodes but don't verify that batch nodes with stderr DON'T trigger warnings (since batch nodes might have different stderr handling).

Add one test:

def test_no_stderr_warning_for_batch_nodes(self, tmp_path):
    """Batch nodes should not trigger stderr warnings (have different error handling)."""
    # Test that batch nodes with stderr field don't show in stderr warnings section

8. Documentation: CLI agent instructions are verbose

Location: src/pflow/cli/resources/cli-agent-instructions.md

The diff shows you removed all uv run prefixes from commands (good!) but the file is still very long. Consider if the shell command testing guidance (lines 242-247) should be in a separate "Best Practices" doc.

9. Minor: Docstring could mention JSON output

Location: src/pflow/execution/execution_state.py:133-137

The comment mentions "helps surface hidden errors" but doesn't mention it also adds has_stderr to JSON output for agents. Consider updating:

# Detect shell nodes with stderr output but exit_code=0
# This helps surface hidden errors from shell pipeline failures
# Also adds has_stderr flag to JSON output for programmatic detection

🎯 Summary

Priority Actions:

Remove redundant status check in _format_node_status_line (Critical docs: Set up Mintlify documentation (Task 93) #1)
Add explicit type validation for stderr (Critical feat: unify LLM usage via Simon Willison's llm library (Task 95) #2)
Extract magic number to named constant (Warning feat: Smart output display for registry run #3)
Decide on JSON status behavior (Warning feat: add smart output display for registry run (closes #3) #4)

Overall Assessment: Strong implementation that's ready to merge after addressing the two critical issues. The feature is well-motivated, thoroughly tested, and the code is clean and maintainable. Great job!

Lines of Code Review:

src/pflow/execution/execution_state.py:133-141 ✅ Detection logic is solid
src/pflow/cli/main.py:732-758 ⚠️ Magic number needs extraction
src/pflow/cli/main.py:690-698 🔴 Redundant check
tests/test_cli/test_shell_stderr_warnings.py ✅ Excellent coverage
tests/test_execution/test_execution_state.py ✅ Good unit tests

Recommendation: Request changes for the two critical issues, then merge. This is solid work.

spinje · 2026-01-09T22:06:57Z

Code Review Response

Thanks for the thorough review! I've addressed the following items:

✅ Fixed

Critical #1: Redundant status check

Removed the status == "completed" check in _format_node_status_line
Updated comment to clarify that has_stderr already implies completed status
Updated test to document this invariant properly

Suggestion #9: Docstring improvement

Updated comment in execution_state.py to mention JSON output for programmatic detection

⏭️ Skipped (with reasoning)

Critical #2: Silent failure on non-string stderr

The isinstance(stderr, str) check is already defensive
Shell nodes always produce string stderr (from subprocess)
Adding logger.warning for an impossible case is over-engineering

Warning #3: Magic number 300

Minor style issue, comment already explains purpose
Constant extraction adds minimal value for this project's philosophy

Warning #4: JSON status inconsistency

execution.steps[].has_stderr already provides programmatic detection
Agents can easily check any(step.get("has_stderr") for step in steps)
Could be a future enhancement if users request it

Warning #5: Two warning sources

"degraded" (batch failures) and "stderr" (shell warnings) serve different purposes
The distinction is clear from context

Suggestions #6-8: Out of scope or already covered

Commit: 6262926

spinje added 4 commits January 9, 2026 12:00

fix: use portable command for pipeline stderr test

a8680a2

The original test used `sed -E 's/.*?/bad/'` which fails on BSD sed (macOS) but works on GNU sed (Linux). Changed to a portable subshell that writes to stderr while passing data through, which works on both platforms.

spinje merged commit 9e83032 into main Jan 9, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: surface shell stderr when exit_code=0 (fixes #55)#56

fix: surface shell stderr when exit_code=0 (fixes #55)#56
spinje merged 4 commits into
mainfrom
fix/shell-stderr-visibility

spinje commented Jan 9, 2026

Uh oh!

claude Bot commented Jan 9, 2026

Uh oh!

spinje commented Jan 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

spinje commented Jan 9, 2026

Summary

Changes

File Stats

Testing

Notes

Uh oh!

claude Bot commented Jan 9, 2026

Code Review - PR #56: Shell Stderr Visibility Fix

✅ Strengths

🔴 Critical — Must Fix Before Merge

1. Inconsistent status indicators logic

2. Silent failure on non-string stderr types

⚠️ Warnings — Should Be Addressed

3. Magic number: 300 character truncation

4. JSON output status inconsistency

5. Potential UX confusion: Two warning sources

💡 Suggestions — Optional Improvements

6. Consider full stderr in trace file reference

7. Test quality: Could add negative test for batch nodes

8. Documentation: CLI agent instructions are verbose

9. Minor: Docstring could mention JSON output

🎯 Summary

Uh oh!

spinje commented Jan 9, 2026

Code Review Response

✅ Fixed

⏭️ Skipped (with reasoning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant