Skip to content

fix: strip trailing newlines from shell stdout by default#51

Merged
spinje merged 2 commits into
mainfrom
fix/shell-stdout-trailing-newline
Jan 8, 2026
Merged

fix: strip trailing newlines from shell stdout by default#51
spinje merged 2 commits into
mainfrom
fix/shell-stdout-trailing-newline

Conversation

@spinje
Copy link
Copy Markdown
Owner

@spinje spinje commented Jan 8, 2026

Summary

Shell commands like echo, date, hostname add trailing newlines for terminal formatting. When using ${node.stdout} in file paths, this creates silently corrupted filenames with embedded newlines (e.g., myfile\n.txt).

This fix adds a strip_newline parameter to the shell node (default: true) that strips trailing newlines from text stdout, matching bash $() convention.

Changes

  • Add strip_newline parameter to shell node (default: true)
  • Strip trailing newlines from text stdout in post()
  • Extract _store_output() helper to reduce cyclomatic complexity
  • Update 3 tests that expected trailing newlines
  • Add 4 high-value tests for strip_newline behavior
  • Update agent instructions documentation with opt-out note
 src/pflow/cli/resources/cli-agent-instructions.md  |  2 +-
 .../instructions/mcp-agent-instructions.md         |  2 +-
 src/pflow/nodes/shell/shell.py                     | 58 +++++++++++++++-------
 tests/test_nodes/test_shell/test_shell.py          | 56 +++++++++++++++++++++
 tests/test_nodes/test_shell/test_shell_binary.py   |  6 +--
 5 files changed, 100 insertions(+), 24 deletions(-)

Behavior

Scenario Before After
echo hello${node.stdout} hello\n hello
File path ${node.stdout}.txt hello\n.txt (corrupted) hello.txt
strip_newline: false N/A hello\n (opt-out)
Multi-line output N/A Internal \n preserved

Testing

  • make test - All 3857 tests pass
  • make check - All linting/type checks pass
  • Manual CLI verification with hex dump confirms fix

Fixes #48

spinje added 2 commits January 8, 2026 12:00
Shell commands like echo, date, hostname add trailing newlines for terminal
formatting. When using ${node.stdout} in file paths, this creates corrupted
filenames with embedded newlines (e.g., "myfile\n.txt").

Changes:
- Add strip_newline parameter to shell node (default: true)
- Strip trailing newlines from text stdout (matches bash $() convention)
- Extract _store_output() helper to reduce post() complexity
- Update 3 tests expecting trailing newlines
- Add 4 high-value tests for strip_newline behavior
- Update agent instructions with opt-out documentation

Users can disable stripping with strip_newline: false for edge cases
where raw output is needed.

Fixes #48
Code review fixes:
- Remove default from _store_output() method signature (warning #4)
- Add explicit strip_newline=False for stderr call
- Fix type hints: str | bytes with isinstance narrowing (warning #5)
- Update docstrings to clarify stdout-only behavior (warning #3)
- Add test for binary data + strip_newline interaction (critical #1)
- Add test for empty stdout edge case (critical #2)
- Update agent instructions to clarify stderr is never modified

All code review items addressed except Windows CRLF (documented as Unix-first).
@claude
Copy link
Copy Markdown

claude Bot commented Jan 8, 2026

Code Review - PR #51: Strip trailing newlines from shell stdout

Summary

This PR addresses a real bug where shell commands add trailing newlines that corrupt filenames when using ${node.stdout} in file paths. The fix is sensible and follows bash $() convention. Overall implementation is solid, but there are a few areas for improvement.


Strengths

  1. Real problem solved: Silently corrupted filenames with embedded newlines is a genuine UX bug
  2. Good default choice: Matching bash $() convention is intuitive for CLI users
  3. Escape hatch provided: strip_newline: false for edge cases where raw output is needed
  4. Clean refactoring: _store_output() helper reduces cyclomatic complexity
  5. Test quality: Tests focus on the right edge cases (multi-line, multiple trailing, opt-out)
  6. Documentation updated: Agent instructions reflect the new parameter

🔴 Critical — must fix before merge

1. Missing test for binary data + strip_newline interaction

Issue: The _store_output() method has a logical structure where strip_newline is only applied when is_binary=False. However, there's no test verifying that binary data is never accidentally stripped.

Why critical: If someone refactors this code without understanding the constraint, they could introduce a bug where binary data gets corrupted by string operations.

Suggested fix: Add a test to test_shell_binary.py:

def test_strip_newline_ignored_for_binary_stdout(self):
    """strip_newline parameter has no effect on binary data.

    Binary data should never be processed as text,
    even if strip_newline=true (the default).
    """
    shared = {}
    # Binary data ending with 0x0a (newline byte)
    run_shell_node(
        shared,
        command="printf '\\x89PNG\\r\\n\\x1a\\n'",  # PNG header
        strip_newline=True  # Explicit true, should have no effect
    )

    # Binary data unchanged (base64 encoded)
    assert shared["stdout_is_binary"] is True
    decoded = base64.b64decode(shared["stdout"])
    assert decoded.endswith(b'\n'), "Binary newline byte was stripped"

2. Empty stdout edge case not tested

Issue: When stdout is an empty string, value.rstrip('\n') returns ''. While this works correctly, it's an important edge case for a default-on feature.

Why critical: Empty output is common (e.g., grep with no matches, conditional commands). We should verify the behavior is correct.

Suggested fix: Add to TestStripNewline class:

def test_empty_stdout_unchanged(self):
    """Empty stdout remains empty (edge case for rstrip).

    Common scenario: commands with conditional output.
    """
    shared = {}
    run_shell_node(shared, command="echo -n ''")  # Empty output

    assert shared["stdout"] == ""
    assert shared["stdout_is_binary"] is False

⚠️ Warnings — should be addressed

3. Inconsistent stderr handling not documented

Issue: The code comment says "Handle stderr encoding (never strip newlines)" but this design decision is not documented anywhere user-facing.

Why it matters: Users may wonder why strip_newline only affects stdout. The asymmetry should be explained.

Suggested fix: Add to the docstring in shell.py (line 108 area):

- Params: strip_newline: bool  # Strip trailing newlines from stdout only (optional, default true)

Note: stderr is never stripped, as error messages are typically consumed by
error handlers rather than used in template composition.

And add a note to the agent instructions (line 883 area):

- Shell output with trailing `\n` is automatically stripped from **stdout only**
  (disable with `strip_newline: false`). stderr is never modified.

4. Default parameter placement inconsistency

Issue: In _store_output() method signature, strip_newline: bool = False has a default of False, but the actual default behavior of the node is True.

Code reference (line 351):

def _store_output(self, shared: dict, key: str, value: Any, is_binary: bool, strip_newline: bool = False) -> None:

Why it matters: This is confusing for future maintainers. The method signature default doesn't match the node's default. Someone might call _store_output() without the parameter and get unexpected behavior.

Suggested fix: Remove the default from the method signature since callers always pass it explicitly:

def _store_output(self, shared: dict, key: str, value: Any, is_binary: bool, strip_newline: bool) -> None:

Or if you want defensive programming, make the default match the node default:

def _store_output(self, shared: dict, key: str, value: Any, is_binary: bool, strip_newline: bool = True) -> None:

But I recommend no default since the method is internal and always called explicitly.


5. Type hint for value parameter could be more specific

Issue: value: Any in _store_output() is too permissive. The code assumes it's either str or bytes.

Suggested fix:

from typing import Union

def _store_output(
    self,
    shared: dict,
    key: str,
    value: Union[str, bytes],  # More specific
    is_binary: bool,
    strip_newline: bool
) -> None:

💡 Suggestions — optional improvements

6. Consider \r\n Windows line endings

Context: The current implementation only strips \n (Unix), but shell commands on Windows or cross-platform tools might produce \r\n (CRLF).

Current behavior:

  • "hello\r\n""hello\r" (carriage return left behind)

Suggestion: If you want full cross-platform support, use .rstrip('\r\n') instead of .rstrip('\n'). However, this changes the "match bash convention" rationale.

Recommendation: Document this as a known limitation or accept it as intended Unix-first behavior. If you do change it, add a test:

def test_windows_line_endings_stripped(self):
    """Windows CRLF line endings are fully stripped."""
    shared = {}
    run_shell_node(shared, command="printf 'hello\\r\\n'")
    assert shared["stdout"] == "hello"  # No \r\n

7. Test naming could be more descriptive

Minor: Test names like test_default_strips_trailing_newline are good, but test_opt_out_preserves_trailing_newline could be clearer about when you'd use this.

Suggestion: Consider renaming:

  • test_opt_out_preserves_trailing_newlinetest_strip_newline_false_preserves_exact_output

📊 Test Coverage Assessment

Current tests (56 new lines):

  • ✅ Default stripping behavior
  • ✅ Opt-out with strip_newline=false
  • ✅ Multi-line preservation
  • ✅ Multiple trailing newlines
  • ❌ Binary data interaction (critical gap)
  • ❌ Empty stdout edge case (critical gap)
  • ❌ Windows CRLF (acceptable gap if documented)

Overall test quality: Good focus on real-world edge cases. The two critical gaps should be filled.


🔒 Security & Safety

  • ✅ No security concerns introduced
  • ✅ No credentials or secrets in code
  • ✅ Type hints preserved
  • ✅ Error handling unchanged

🏗️ Architecture & Maintainability

  • ✅ Follows single-responsibility principle (_store_output() helper)
  • ✅ Backwards compatible (default behavior change is justified by bug fix)
  • ✅ Clear separation of concerns (binary encoding vs text stripping)
  • ⚠️ Minor inconsistency in method signature default (see warning feat: add smart output display for registry run (closes #3) #4)

📝 Documentation

  • ✅ Agent instructions updated with opt-out note
  • ✅ Docstring comments added
  • ⚠️ Missing explanation for stderr asymmetry (see warning feat: Smart output display for registry run #3)
  • ℹ️ No architecture docs for shell node to update (checked architecture/core-node-packages/)

Final Recommendation

Approve with changes: Fix the two critical test gaps, address the _store_output() signature inconsistency, and document the stderr behavior. The core implementation is solid.

Action items before merge:

  1. Add test for binary data + strip_newline interaction
  2. Add test for empty stdout edge case
  3. Fix _store_output() method signature default (remove or make it True)
  4. Document stderr never-strip behavior in docstring + agent instructions
  5. Run make test and make check to verify all tests pass

Great work on solving a real UX bug with a sensible default! The implementation is clean and the test cases show good edge-case thinking. 🎯

@spinje
Copy link
Copy Markdown
Owner Author

spinje commented Jan 8, 2026

Code Review Response

Thanks for the thorough review! All critical and warning items have been addressed:

✅ Critical Issues Fixed

#1 Binary data + strip_newline test: Added TestStripNewlineWithBinary::test_strip_newline_ignored_for_binary_stdout

#2 Empty stdout test: Added TestStripNewline::test_empty_stdout_unchanged

✅ Warnings Addressed

#3 stderr asymmetry documented:

  • Updated docstring: "Strip trailing newlines from stdout only (optional, default true). stderr is never stripped."
  • Updated agent instructions: "Shell **stdout** with trailing \\n` is automatically stripped... stderr is never modified."`

#4 Default parameter inconsistency: Removed default from _store_output(), now all callers pass strip_newline explicitly

#5 Type hint specificity: Changed Any to str | bytes with isinstance() narrowing to satisfy mypy

⏭️ Deferred

#6 Windows CRLF: This is Unix-first following bash convention. Documented as known behavior.

Test Results

  • All 3859 tests pass ✅
  • make check passes ✅

@claude
Copy link
Copy Markdown

claude Bot commented Jan 8, 2026

Code Review: Strip Trailing Newlines from Shell Stdout

Summary

This PR addresses a legitimate and well-documented bug where trailing newlines from shell commands corrupt filenames in template variable substitution. The implementation is clean, well-tested, and follows the project's quality standards.


Approval - Ready to Merge

This is high-quality work with excellent documentation and testing. No critical issues found.


Warnings — Should be addressed

1. Stderr Stripping Inconsistency

Location: src/pflow/nodes/shell/shell.py:565-583

The PR correctly never strips stderr, but the behavior could be made more consistent by offering strip_newline for stderr too (even if default is false).

Current behavior:

  • stdout: strip_newline=true by default
  • stderr: always strip_newline=false (hardcoded)

Consideration: What if a user wants to strip stderr newlines? Currently impossible.

Suggested improvement:

# In prep():
strip_newline_stdout = self.params.get("strip_newline", True)
strip_newline_stderr = self.params.get("strip_newline_stderr", False)

# In post():
self._store_output(
    shared, "stderr", exec_res["stderr"],
    exec_res.get("stderr_is_binary", False),
    strip_newline=prep_res.get("strip_newline_stderr", False)
)

This maintains backward compatibility (stderr behavior unchanged) while offering future flexibility.

Priority: Low - can be addressed in a follow-up if users request it.


2. Documentation Could Be More Prominent

Location: src/pflow/cli/resources/cli-agent-instructions.md:883

The documentation update is accurate but buried in a bullet list. Consider adding a dedicated callout or example showing the file path use case, since that's the primary motivation.

Priority: Medium - helps users discover this feature.


Suggestions — Optional improvements

1. Test Coverage: Edge Case for Windows Line Endings

Location: tests/test_nodes/test_shell/test_shell.py:682-746

The tests cover Unix newlines (\n) but not Windows (\r\n) or mixed cases. While unlikely in shell output, it's a potential edge case.

Note: .rstrip('\n') only strips \n, not \r\n. Consider .rstrip() (all whitespace) or .rstrip('\r\n') for more robust handling.

Priority: Very Low - Unix shells almost never produce \r\n.


2. Helper Method Could Be Static

Location: src/pflow/nodes/shell/shell.py:351-373

The _store_output() helper doesn't use self and could be a static method. This signals that the method has no side effects on instance state.

Priority: Very Low - cosmetic improvement.


3. Scratchpad Files Could Be Gitignored

Location: scratchpads/stdout-newline-bug/*

The scratchpad directory contains excellent debugging artifacts. Current choice to keep them is fine - this is valuable historical context. Consider documenting the scratchpad retention policy in CLAUDE.md.

Priority: Very Low - documentation clarification.


Strengths of This PR

✅ Excellent Problem Documentation

The scratchpads/stdout-newline-bug/README.md is exemplary with clear root cause analysis, reproduction steps with hex dumps, impact assessment with real-world examples, and comparison of solution options. This is exactly the kind of reasoning documentation the project needs.

✅ Thoughtful Default Choice

Making strip_newline=true the default is the right call - it matches bash command substitution convention, prevents silent data corruption, handles the 99% use case (file paths, variable substitution), and provides opt-out for edge cases.

✅ Comprehensive Test Coverage

The tests cover default behavior, opt-out, multiline output (internal newlines preserved), multiple trailing newlines, empty output, and binary data. This is high-quality testing focused on behavior, not just coverage metrics.

✅ Cyclomatic Complexity Reduction

Extracting _store_output() improves post() readability - good refactoring discipline.

✅ Backward Compatibility

Existing tests that expected \n were updated (3 tests), proving this is a breaking change, but the old behavior was a bug, the new default matches user expectations, and opt-out preserves old behavior if needed. This is the right tradeoff for a pre-1.0 project.


Security & Safety

No security concerns - the change only affects output processing, not command execution
No credential exposure - stripping newlines doesn't interact with sensitive data
Binary data handling correct - is_binary check prevents string operations on bytes


Performance

Negligible impact - .rstrip('\n') is O(k) where k = trailing newlines (typically 1)
No additional allocations - Python string methods are optimized


Architecture Alignment

Follows project patterns:

  • Single responsibility (_store_output() helper)
  • Clear parameter naming (strip_newline)
  • Consistent with ignore_errors pattern (boolean opt-out)

Matches CLAUDE.md guidelines:

  • Simple, obvious code (no clever abstractions)
  • Tests document behavior
  • Defensive error handling (binary data protected)

Final Recommendation

APPROVE ✅ - This PR should be merged as-is.

The warnings and suggestions are optional enhancements that can be addressed in follow-up PRs if desired. The core fix is solid, well-tested, and solves a real user pain point.

Excellent work on:

  1. Thorough problem investigation
  2. Choosing the right default
  3. Comprehensive testing
  4. Clear documentation

This is the kind of quality the project needs. 🎯

@spinje spinje merged commit 6a1ba84 into main Jan 8, 2026
7 checks passed
@spinje
Copy link
Copy Markdown
Owner Author

spinje commented Jan 8, 2026

Potential edge case consideration

The current implementation uses .rstrip("\n") which strips all trailing newlines:

if strip_newline:
    text_value = text_value.rstrip("\n")

This means data\n\n\n becomes data (all three newlines removed).

Scenarios where this might matter:

  1. Intentional blank lines at end of output - If a command outputs data\n\n (with an intentional blank line), both newlines are stripped
  2. File content generation - Standard Unix convention is for text files to end with a newline; this would strip it

Current mitigation: Users can set strip_newline: false to preserve exact output.

Alternative consideration: Strip only ONE trailing newline (the shell-added artifact) instead of all:

if strip_newline and text_value.endswith("\n"):
    text_value = text_value[:-1]

This is probably not worth changing since:

  • 99% of use cases want all trailing newlines gone (filenames, values passed between nodes)
  • The strip_newline: false escape hatch exists
  • Stripping just one newline might leave unexpected whitespace in edge cases

Just documenting this for future reference in case someone reports unexpected behavior with multi-line outputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shell node stdout contains trailing newlines, corrupting file paths

1 participant