Skip to content

fix: prevent SIGPIPE from killing process when subprocess ignores stdin (fixes #25)#26

Merged
spinje merged 2 commits into
mainfrom
fix/silent-workflow-failure
Dec 30, 2025
Merged

fix: prevent SIGPIPE from killing process when subprocess ignores stdin (fixes #25)#26
spinje merged 2 commits into
mainfrom
fix/silent-workflow-failure

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Dec 30, 2025

Summary

Fixes a critical bug where shell nodes would cause silent exit 141 when the subprocess doesn't consume all its stdin data. This happened when:

  • Large stdin data (>16KB on macOS, >64KB on Linux) was passed to a shell command
  • The command didn't read stdin (e.g., echo '[]' in a conditional branch)
  • SIGPIPE was set to SIG_DFL, which immediately terminates Python

Root Cause

The signal handler signal.signal(signal.SIGPIPE, signal.SIG_DFL) caused Python to terminate with exit 141 when subprocess.run() tried to write to a pipe that was closed because the subprocess exited without consuming stdin.

Example trigger: A workflow with conditional shell logic:

case '${describe_images}' in
  *[Ff]alse*) echo '[]' ;;    # Doesn't read stdin → SIGPIPE
  *) grep | jq ;;              # Reads all stdin → works fine
esac

The Fix

Change SIG_DFL to SIG_IGN:

signal.signal(signal.SIGPIPE, signal.SIG_IGN)

With SIG_IGN, subprocess.run() handles broken pipes gracefully internally.

Changes

File Changes
src/pflow/cli/main.py SIG_DFLSIG_IGN with detailed comment
tests/test_nodes/test_shell/test_shell_sigpipe.py 9 unit tests
tests/test_integration/test_sigpipe_regression.py 7 integration tests
.taskmaster/knowledge/pitfalls.md Corrected SIGPIPE guidance

Testing

# Run the new regression tests
make test

# All 16 SIGPIPE tests pass
pytest tests/test_nodes/test_shell/test_shell_sigpipe.py tests/test_integration/test_sigpipe_regression.py -v

Test Coverage

The tests verify:

  • ✅ Large stdin (20KB+) with commands that ignore stdin
  • ✅ Partial stdin consumption (head -n 1)
  • ✅ Conditional shell commands (the exact bug pattern)
  • ✅ Template resolution with boolean False
  • ✅ Meta-test verifying SIGPIPE is set to SIG_IGN

Verified the tests catch regression

# With SIG_DFL (the bug): Exit 141
# With SIG_IGN (the fix): Exit 0

Fixes #25

Change SIGPIPE handler from SIG_DFL to SIG_IGN to prevent silent exit 141
when shell nodes don't consume all their stdin data.

Root cause: When a subprocess (e.g., `echo '[]'`) doesn't read its stdin,
and the parent process has >16KB of data to write, SIGPIPE is sent when
the subprocess exits. With SIG_DFL, this immediately terminates Python
with no error message, no trace file, complete silence.

With SIG_IGN, subprocess.run() handles broken pipes gracefully internally.

Includes:
- 9 unit tests for shell node SIGPIPE handling
- 7 integration tests for workflow-level regression
- Updated pitfalls.md with corrected SIGPIPE guidance

Fixes #25
@claude
Copy link
Copy Markdown

claude Bot commented Dec 30, 2025

Code Review - PR #26: SIGPIPE Fix

Overall Assessment: ✅ APPROVED - Excellent bug fix with outstanding test coverage and documentation.

This PR fixes a critical silent failure bug (exit 141) with a minimal, surgical one-line change backed by comprehensive testing and stellar documentation. The fix is correct, the tests are well-designed, and the knowledge capture is exemplary.


Critical — must fix before merge

None - This PR is ready to merge as-is.


Warnings — should be addressed

1. Minor: Test file docstring references PR #XXX

Location: tests/test_nodes/test_shell/test_shell_sigpipe.py:6

The Bug (fixed in PR #XXX):

Issue: Placeholder PR number wasn't updated.

Fix:

The Bug (fixed in PR #26):

Impact: Low - just documentation clarity.


Suggestions — optional improvements

1. Consider extracting LARGE_DATA_SIZE constant to shared test utilities

Context: Both test files define similar constants for large data size:

  • test_shell_sigpipe.py: LARGE_DATA_SIZE = 20 * 1024
  • test_sigpipe_regression.py: LARGE_DATA_LINES = 1000

Suggestion: Since these magic numbers (20KB) are critical for triggering the bug condition, consider extracting to a shared constant in tests/shared/ with comprehensive documentation explaining the OS pipe buffer sizes.

Benefit: Single source of truth if buffer sizes need adjustment for different platforms.

Counter-argument: Current duplication is minimal and tests are self-contained. Not worth the abstraction complexity.

Recommendation: Leave as-is for now. The duplication is acceptable.


Positive Highlights

🌟 Outstanding Test Design

The test suite is exemplary:

  1. Tests at the right level: Both unit tests (shell node) and integration tests (full workflows)
  2. Critical data sizes: 20KB specifically chosen to exceed macOS pipe buffer (16KB) - smaller tests would pass even with the bug
  3. Real-world reproduction: test_image_description_workflow_pattern reproduces the EXACT bug scenario
  4. Edge case coverage: Binary data, timeouts, error exits, sequential shells
  5. Meta-test: test_sigpipe_is_ignored will catch if someone accidentally reverts the fix
  6. Comprehensive comments: Every test explains WHY it matters and WHAT it catches

This is test-driven development done right. These tests would have caught the bug before shipping.

🌟 Excellent Documentation

  1. Detailed issue tracking: Issue Shell node causes silent exit 141 when subprocess doesn't consume stdin #25 has perfect root cause analysis
  2. Knowledge capture: Updated .taskmaster/knowledge/pitfalls.md with failure patterns and lessons learned
  3. Code comments: The 6-line comment in main.py explains the subtle distinction between SIG_DFL and SIG_IGN
  4. Test documentation: Every test class and method has detailed docstrings explaining the failure scenario

🌟 Minimal, Surgical Fix

The actual fix is one line (SIG_DFL → SIG_IGN), which is the hallmark of understanding the root cause deeply.

🌟 Security Consideration

The fix preserves proper error handling - SIG_IGN still allows Python to catch BrokenPipeError on direct writes, maintaining the ability to handle pipe failures gracefully.


Code Quality Assessment

Type Safety: ✅ Excellent

  • All test code follows modern Python patterns
  • Proper type hints where needed
  • No type safety concerns

Error Handling: ✅ Excellent

  • Tests verify both success and error paths
  • test_command_exits_with_error_large_stdin ensures error codes still propagate correctly

Performance: ✅ Excellent

  • Lazy singleton pattern for test data (get_large_stdin_data()) avoids regenerating 20KB strings
  • Test data sizes are minimal to trigger bug (20KB, not 1MB)

Maintainability: ✅ Excellent

  • Tests are self-documenting
  • Clear separation of concerns (unit vs integration)
  • Platform-specific tests properly marked with @pytest.mark.skipif

Security: ✅ No concerns

  • Fix improves reliability without introducing vulnerabilities
  • Proper subprocess handling maintained

Architecture Alignment

Follows project principles:

  • Boring and obvious: Simple one-line fix, not clever
  • Fight complexity: Minimal change with maximum impact
  • Test as you go: 16 comprehensive tests committed with the fix
  • Document decisions: Excellent knowledge capture in pitfalls.md

Testing Coverage

Unit tests (9 tests in test_shell_sigpipe.py):

  • ✅ Large stdin ignored by command
  • ✅ Partial stdin consumption (head -n 1)
  • ✅ Conditional branches (the bug pattern)
  • ✅ Template resolution with booleans
  • ✅ Sequential unconsumed stdin
  • ✅ Error exits with large stdin
  • ✅ Timeout handling
  • ✅ Binary data handling
  • ✅ Meta-test for signal configuration

Integration tests (7 tests in test_sigpipe_regression.py):

  • ✅ Boolean parameter workflows (skip_processing=false)
  • ✅ Control test (skip_processing=true)
  • ✅ Default parameter values
  • ✅ Shell echo with large stdin
  • ✅ Shell head with large stdin
  • ✅ Real-world pattern (image description workflow)
  • ✅ Multiple sequential conditional shells

Coverage: Excellent - tests cover the bug scenario, edge cases, and future regression prevention.


Recommendations

  1. Merge immediately - This fixes a critical silent failure bug
  2. Release note this fix - Users may have hit this and thought their workflows were broken
  3. 📝 Consider adding to troubleshooting docs - Help users identify if they've hit exit 141 in the past

Final Verdict

APPROVED

This is high-quality software engineering:

  • Deep root cause understanding
  • Minimal, correct fix
  • Outstanding test coverage that would have prevented the bug
  • Exemplary documentation for future maintainers
  • Zero technical debt introduced

One minor docstring typo (PR #XXX) is the only blemish, and it's trivial.

Ship it! 🚢

@spinje spinje merged commit 066f108 into main Dec 30, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shell node causes silent exit 141 when subprocess doesn't consume stdin

1 participant