@inureyes
Member

Summary

  • Add new --fail-fast / -k option for pdsh compatibility
  • Stop execution immediately when any node fails (connection error or non-zero exit code)
  • Cancel pending commands when failure is detected
  • Report which node caused the failure with detailed error message

Changes

CLI (src/cli.rs)

  • Added --fail-fast / -k flag with help text explaining pdsh compatibility
  • Updated examples in help text to include fail-fast mode examples

Executor (src/executor/parallel.rs)

  • Added fail_fast field to ParallelExecutor struct
  • Added with_fail_fast() builder method
  • Implemented execute_with_fail_fast() method using:
    • tokio::sync::watch channel for cancellation signaling
    • tokio::select! to race between semaphore acquisition and cancellation
    • Early return for cancelled tasks with appropriate error message
    • Detection and reporting of the first failure that triggered fail-fast
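The cancellation flow described above can be sketched without tokio. This is a std-only analogue, not the PR's code: an `AtomicBool` stands in for the `tokio::sync::watch` channel, and a fixed pool of worker threads stands in for the semaphore. The names `run_fail_fast` and `Outcome` are hypothetical. As in the description, only commands that have not started yet are cancelled; commands already running finish normally.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-node outcome; the PR's actual result type is not shown.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Success,
    Failed(i32),
    Cancelled,
}

/// Runs `run` for each node on at most `parallelism` worker threads.
/// The first non-zero exit code sets a shared flag, and every node that
/// is still waiting in the queue is then recorded as `Cancelled`.
fn run_fail_fast<F>(nodes: Vec<String>, parallelism: usize, run: F) -> Vec<(String, Outcome)>
where
    F: Fn(&str) -> i32 + Send + Sync + 'static,
{
    let cancelled = Arc::new(AtomicBool::new(false));
    let queue = Arc::new(Mutex::new(nodes.into_iter().collect::<VecDeque<String>>()));
    let results = Arc::new(Mutex::new(Vec::new()));
    let run = Arc::new(run);

    let workers: Vec<_> = (0..parallelism.max(1))
        .map(|_| {
            let cancelled = Arc::clone(&cancelled);
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            let run = Arc::clone(&run);
            thread::spawn(move || loop {
                // Take the next pending node; the lock is released right away.
                let next = queue.lock().unwrap().pop_front();
                let Some(node) = next else { break };

                let outcome = if cancelled.load(Ordering::SeqCst) {
                    // A failure was already seen: do not start this command.
                    Outcome::Cancelled
                } else {
                    match run(&node) {
                        0 => Outcome::Success,
                        code => {
                            // Signal every other worker to stop picking up work.
                            cancelled.store(true, Ordering::SeqCst);
                            Outcome::Failed(code)
                        }
                    }
                };
                results.lock().unwrap().push((node, outcome));
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
    let out = results.lock().unwrap().clone();
    out
}
```

With `parallelism` set to 1 the order is deterministic: nodes before the failing one succeed, the failing node is reported as `Failed`, and all later nodes are `Cancelled`. With higher parallelism, commands already in flight when the failure occurs still run to completion.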

Integration (src/commands/exec.rs, src/app/dispatcher.rs)

  • Pass fail_fast parameter through the execution pipeline

Tests (tests/fail_fast_test.rs)

  • Added 10 unit tests covering:
    • CLI flag parsing (-k and --fail-fast)
    • Builder API for ParallelExecutor
    • Result classification (success, failure, error)
    • Flag combinations with other options
    • Parallelism settings compatibility

Documentation (README.md)

  • Added fail-fast to features list
  • Added usage examples for fail-fast mode

Test plan

  • Run cargo test fail_fast - All 10 tests pass
  • Run cargo clippy -- -D warnings - No warnings
  • Run cargo fmt - Code is properly formatted
  • Verify -k flag does not conflict with existing options
  • Verify help text shows fail-fast examples

Related issues

@inureyes added the type:enhancement, status:review, priority:medium, and pdsh-compat labels on Dec 16, 2025
@inureyes
Member Author

Security & Performance Review

Analysis Summary

  • Scope: changed-files
  • Languages: Rust
  • Total issues: 2
  • Critical: 0 | High: 0 | Medium: 1 | Low: 1

Prioritized Fix Roadmap

MEDIUM

  • UTF-8 String Slicing Panic (src/executor/parallel.rs:350-351): The code uses byte indexing (&first_line[..47]) to truncate error messages. If an error message contains multi-byte UTF-8 characters (e.g., Chinese, Japanese, emoji) and byte 47 lands in the middle of a character, this will cause a panic at runtime. This can crash the application when handling error messages from internationalized systems.

LOW

  • Code duplication in error message truncation: The same truncation pattern appears in the normal execution path. While not a bug, this could be refactored for consistency.

Progress Log

  • Currently: Fixing UTF-8 string slicing issue

Technical Details

Issue: UTF-8 String Boundary Violation

Current code:

let short_error = if first_line.len() > 50 {
    format!("{}...", &first_line[..47])
} else {
    first_line.to_string()
};

Problem: first_line.len() returns byte count, not character count. Slicing at byte index 47 can split a multi-byte UTF-8 character.

Proof of panic:

let s = format!("{}\u{4e2d}\u{6587}", "x".repeat(45)); // 45 ASCII bytes + "中文"
let _ = &s[..47]; // PANICS: byte 47 is inside '中' (bytes 45..48)

Fix approach: Use floor_char_boundary() (stable since Rust 1.91) or manually find a safe boundary with is_char_boundary().
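For toolchains where floor_char_boundary() is unavailable, the manual approach can be sketched as below. The helper name `truncate_for_display` and its `max_bytes` parameter are hypothetical, not taken from the PR; the sketch only uses `str::is_char_boundary`, which has long been stable.

```rust
/// Truncates `line` to at most `max_bytes` bytes (ellipsis included),
/// never cutting inside a multi-byte UTF-8 character.
fn truncate_for_display(line: &str, max_bytes: usize) -> String {
    const ELLIPSIS: &str = "...";
    if line.len() <= max_bytes {
        return line.to_string();
    }
    // Reserve room for the ellipsis, then step back to the nearest boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes.saturating_sub(ELLIPSIS.len());
    while !line.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}{}", &line[..end], ELLIPSIS)
}
```

Calling `truncate_for_display(first_line, 50)` from both the fail-fast path and the normal execution path would also remove the duplicated truncation logic noted under LOW.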


Reviewer: AI Code Reviewer

@inureyes
Member Author

Review Complete

Progress Log

  • UTF-8 String Slicing Panic - Fixed in commit bbfa50e8
    • Changed &first_line[..47] to &first_line[..first_line.floor_char_boundary(47)]
    • Ensures safe string truncation for all Unicode content
  • Code duplication - LOW priority, not addressed (suggestion only)
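Note that the fix keeps the 50 and 47 limits in bytes, so a message made of 3-byte CJK characters is shortened after roughly 15 characters. If the limit is meant as a character count, a variant along the following lines may fit better. This is only a sketch under that assumption; `truncate_chars` is a hypothetical name and is not part of the PR.

```rust
/// Truncates `line` to at most `max_chars` Unicode scalar values
/// (ellipsis included), counting characters rather than bytes.
fn truncate_chars(line: &str, max_chars: usize) -> String {
    const ELLIPSIS: &str = "...";
    if line.chars().count() <= max_chars {
        return line.to_string();
    }
    // Keep room for the three ellipsis characters.
    let keep = max_chars.saturating_sub(ELLIPSIS.len());
    let head: String = line.chars().take(keep).collect();
    format!("{}{}", head, ELLIPSIS)
}
```

Counting scalar values is still an approximation of display width (wide characters and combining marks differ), so the byte-based fix remains a reasonable choice when the goal is only to avoid the panic.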

Verification

  • cargo clippy -- -D warnings - Passes
  • cargo test --test fail_fast_test - All 10 tests pass

Summary

The PR implements a solid --fail-fast / -k option for pdsh compatibility. The implementation correctly uses:

  • tokio::sync::watch channel for cancellation signaling
  • tokio::select! with biased selection for proper cancellation handling
  • futures::future::select_all for efficient task completion monitoring

One bug was found and fixed: potential panic on non-ASCII error messages when truncating for display.

Overall Assessment: The implementation is well-designed and follows async Rust best practices. Ready for merge after the UTF-8 fix.

@inureyes added the status:done label and removed the status:review label on Dec 16, 2025
@inureyes self-assigned this on Dec 16, 2025
@inureyes force-pushed the feature/issue-94-fail-fast-option branch from 8524cbd to 2962620 on December 16, 2025 at 15:04
…ible)

Add a new --fail-fast / -k option that stops execution immediately when
any node fails (connection error or non-zero exit code). This provides
pdsh compatibility and is useful for critical operations where partial
execution is unacceptable.

Implementation details:
- Uses tokio::sync::watch channel for cancellation signaling
- Uses tokio::select! to race between execution and cancellation
- Cancels pending tasks waiting in semaphore queue
- Reports which node caused the failure with error details
- Works with existing parallel execution infrastructure

Features:
- Short option -k for pdsh compatibility
- Long option --fail-fast for clarity
- Can be combined with --require-all-success and --check-all-nodes
- Supports all parallelism settings (including --parallel N)

Closes #94

Priority: MEDIUM
Issue: String slicing with byte index could panic on multi-byte UTF-8 chars

The error message truncation in execute_with_fail_fast() used direct
byte indexing (&first_line[..47]) which can panic if the index falls
in the middle of a multi-byte UTF-8 character.

Fixed by using floor_char_boundary(47) to find the largest valid char
boundary at or before byte 47, ensuring safe string truncation for all
Unicode content including CJK characters and emoji.

- Add --fail-fast / -k option description in manpage OPTIONS section
- Add fail-fast examples in manpage EXAMPLES section
- Add fail-fast mode implementation details in ARCHITECTURE.md
@inureyes force-pushed the feature/issue-94-fail-fast-option branch from 2962620 to 149666c on December 16, 2025 at 15:13
… cache

The test helper was picking up stale release binaries in CI, causing tests to
fail because the old binary lacked the --connect-timeout option. Changed
the helper to prefer the debug binary, since `cargo test` builds debug binaries.
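
The lookup order described in this commit can be sketched as follows. The helper's real name and signature are not shown in the conversation, so `find_test_binary`, `target_dir`, and `name` are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Returns the first existing binary under `target_dir`, checking the
/// debug profile before release so a stale release build is not picked up.
fn find_test_binary(target_dir: &Path, name: &str) -> Option<PathBuf> {
    ["debug", "release"]
        .iter()
        .map(|profile| target_dir.join(profile).join(name))
        .find(|candidate| candidate.is_file())
}
```

For integration tests of a binary target, Cargo also exposes the path of the freshly built executable through the compile-time `CARGO_BIN_EXE_<name>` environment variable, which may avoid searching the target directory at all.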
@inureyes merged commit 08d0a96 into main on Dec 16, 2025
3 checks passed
@inureyes deleted the feature/issue-94-fail-fast-option branch on December 16, 2025 at 15:26
Labels

pdsh-compat, priority:medium, status:done, type:enhancement

Development

Successfully merging this pull request may close these issues.

feat: Add --fail-fast option to stop on first failure (pdsh -k compatibility)

2 participants