Expand interop coverage and tune super-linter#1905
Merged
Conversation
3 tasks
oferchen
added a commit
that referenced
this pull request
Apr 30, 2026
…hs (#3456) A Windows oc-rsync sender that builds a `FileEntry` whose path was constructed via `Path::join` / `PathBuf::push` (the normal case for recursive transfers) emitted the native string with `\` separators verbatim on the wire. A POSIX rsync receiver decoding those bytes treated every `\` as part of a single filename, producing one literal filename per source file instead of the expected directory hierarchy. Upstream rsync 3.4.1 writes filename bytes verbatim in `flist.c:send_file_entry()` (lines 534-570). The wire format remains `/`-separated because every upstream build either runs on a POSIX kernel or under Cygwin's POSIX layer, which presents `/`-separated paths to the application before this code is reached. oc-rsync targets native Win32 directly, so the sender must perform the separator normalisation explicitly. Before this fix on Windows: let mut path = PathBuf::from(\"subdir\"); path.push(\"file.txt\"); let entry = FileEntry::new_file(path, 1024, 0o644); entry.name_bytes() == b\"subdir\\\\file.txt\" // wrong After: entry.name_bytes() == b\"subdir/file.txt\" // matches upstream The fix introduces a single-purpose helper `crates/protocol/src/flist/wire_path.rs::path_bytes_to_wire` that mirrors the existing identity-on-Unix / convert-on-Windows pattern of `wire_mode.rs`. The helper is applied at every wire-encode call site: - `FileEntry::name_bytes()` (filename emission via `write_entry`) - `write_symlink_target()` (symlink target emission) - `strip_leading_slashes()` Windows branch trims both `/` and `\` The `name_bytes()` accessor signature changes from `&[u8]` to `Cow<'_, [u8]>` so the helper can borrow on Unix (zero allocation) and own only when conversion is required on Windows. Sort comparators borrow the inner slice via `Deref` and continue to work unchanged. Tests added: - `wire_path` module: 8 unit tests covering forward-slash identity, empty path, dot path, Unix borrow, Unix backslash preservation, Windows translation, mixed separators. - `flist::write` regression tests: assert the wire-encoded filename contains no `\` byte and that a writer-then-reader roundtrip yields `subdir/file.txt` regardless of host platform. - Symlink-target regression test asserting no `\` byte appears in the encoded target bytes. Companion audit: docs/audits/windows-path.md cites upstream flist.c:534-570 and util1.c:955-961. Closes #1905 Refs #1939
oferchen
added a commit
that referenced
this pull request
May 1, 2026
…3488) * docs(fast_io): design session-level io_uring ring pool (#1409) (#3444) * docs(fast_io): audit io_uring SQPOLL and DEFER_TASKRUN for socket I/O (#1267) (#3445) * docs(ci): audit Windows ACL/xattr CI matrix gaps (#1869) (#3446) * docs(fast_io): audit MADV_WILLNEED prefault for mmap'd basis files (#1662) (#3447) * docs(batch): investigate zstd as batch-compatible compression alternative (#1685) (#3448) * docs(fast_io): document mmap page-fault impact on io_uring SQPOLL (#1661) (#3449) * docs: audit --delete-during ordering vs upstream rsync 3.4.1 (#3453) Captures the audit findings from #1893 in docs/architecture/delete-during.md and lays the groundwork for the documentation work tracked in #1894. Contrasts upstream's per-directory interleaved deletion (generator.c::recv_generator + delete_in_dir) against oc-rsync's batched pre-transfer sweep (crates/transfer/src/receiver/transfer.rs:532), enumerates the resulting differences in phase ordering, determinism, filter evaluation, and error semantics, and lists the follow-up actions: CHANGELOG / man-page note, interop test for concurrent new+deleted entries, .rsync-filter investigation, and a possible --delete-strict-order opt-in. Refs #1893, #1894. * feat: add --jump-host proxy-jump CLI flag (#3454) Expose OpenSSH ProxyJump (-J) as --jump-host. Comma-separated [user@]HOST[:PORT] hops are forwarded to the remote shell as `ssh -J <value>` before the destination operand when the configured remote shell is OpenSSH. Empty values are rejected at every layer. Note: only the long form `--jump-host` is provided. Upstream rsync 3.4.1 binds `-J` to `--omit-link-times` (options.c:647), so reusing the short flag would break wire compatibility. Refs #1881 * docs: audit --iconv inert critical gap (#3455) Document the parse-but-never-applied architectural dead-end where the CLI accepts --iconv, validates and stores IconvSetting on ClientConfig, and forwards the option to the remote peer over SSH, but the local process never receives a FilenameConverter. ServerConfigBuilder::iconv() exists with zero call sites; receiver and generator already pull a converter from connection.iconv but always observe None. Severity: critical. Filenames with non-ASCII bytes silently bypass transcoding on every local-side path (file-list ingest, file-list emit, filter matching, daemon module serving) when --iconv is supplied. Companion to docs/audits/iconv-pipeline.md, narrowing on the single missing IconvSetting -> FilenameConverter bridge that currently makes every gap in that broader pipeline simultaneously unobservable. Tracks tasks #1909, #1910, #1918. Remediation path enumerated against existing #1911-#1919 task chain. * fix: normalise Windows backslash to forward slash in wire-encoded paths (#3456) A Windows oc-rsync sender that builds a `FileEntry` whose path was constructed via `Path::join` / `PathBuf::push` (the normal case for recursive transfers) emitted the native string with `\` separators verbatim on the wire. A POSIX rsync receiver decoding those bytes treated every `\` as part of a single filename, producing one literal filename per source file instead of the expected directory hierarchy. Upstream rsync 3.4.1 writes filename bytes verbatim in `flist.c:send_file_entry()` (lines 534-570). The wire format remains `/`-separated because every upstream build either runs on a POSIX kernel or under Cygwin's POSIX layer, which presents `/`-separated paths to the application before this code is reached. oc-rsync targets native Win32 directly, so the sender must perform the separator normalisation explicitly. Before this fix on Windows: let mut path = PathBuf::from(\"subdir\"); path.push(\"file.txt\"); let entry = FileEntry::new_file(path, 1024, 0o644); entry.name_bytes() == b\"subdir\\\\file.txt\" // wrong After: entry.name_bytes() == b\"subdir/file.txt\" // matches upstream The fix introduces a single-purpose helper `crates/protocol/src/flist/wire_path.rs::path_bytes_to_wire` that mirrors the existing identity-on-Unix / convert-on-Windows pattern of `wire_mode.rs`. The helper is applied at every wire-encode call site: - `FileEntry::name_bytes()` (filename emission via `write_entry`) - `write_symlink_target()` (symlink target emission) - `strip_leading_slashes()` Windows branch trims both `/` and `\` The `name_bytes()` accessor signature changes from `&[u8]` to `Cow<'_, [u8]>` so the helper can borrow on Unix (zero allocation) and own only when conversion is required on Windows. Sort comparators borrow the inner slice via `Deref` and continue to work unchanged. Tests added: - `wire_path` module: 8 unit tests covering forward-slash identity, empty path, dot path, Unix borrow, Unix backslash preservation, Windows translation, mixed separators. - `flist::write` regression tests: assert the wire-encoded filename contains no `\` byte and that a writer-then-reader roundtrip yields `subdir/file.txt` regardless of host platform. - Symlink-target regression test asserting no `\` byte appears in the encoded target bytes. Companion audit: docs/audits/windows-path.md cites upstream flist.c:534-570 and util1.c:955-961. Closes #1905 Refs #1939 * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids (#3450) * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids Closes #1879 and #1880. The previous `host_contains_colon` heuristic in `SshCommand::target_argument` treated any string containing `:` as IPv6 and wrapped it in brackets. That silently accepted malformed input such as `2001:db8:::1` or `garbage:input` and forwarded the unparseable literal to ssh. Replace it with strict `Ipv6Addr::from_str` validation via a new `parse_host_for_ssh` returning a `HostKind` enum (`Hostname` / `Ipv4` / `Ipv6 { addr, zone }`). The parser also recognises RFC 4007 scoped zone identifiers (e.g. `fe80::1%eth0`) and re-emits them inside the bracket form (`[addr%zone]`) per upstream rsync convention. Zone ids are rejected when empty or when they contain whitespace or `]`. Before: `host_contains_colon("2001:db8:::1") == true` -> emits `[2001:db8:::1]`. After: `parse_host_for_ssh("2001:db8:::1")` returns `Err(InvalidIpv6)`, so the input is passed through unchanged and ssh surfaces the resolution failure instead of receiving a malformed bracketed literal. Includes unit tests for: bare IPv4, bare hostname, bracketed IPv6, IPv6 with zone (`fe80::1%eth0` and `[fe80::1%en0]`), malformed multi-`::` IPv6, zone with whitespace (rejected), and direct `parse_host_for_ssh` classification across all `HostKind` and `BuildError` variants. * style: apply cargo fmt to ssh ipv6 validation * feat: add AppleDouble filter and optional xattr merge for macOS (#3457) * feat: add AppleDouble filter for macOS sidecar exclusion (#1907) Add the `--apple-double-skip` option which appends a perishable `._*` exclusion to the filter chain. macOS writes AppleDouble sidecar files on filesystems that cannot represent extended attributes natively (FAT, exFAT, NFS, SMB) to carry FinderInfo, resource forks, and xattrs. Replicating them across machines clutters destinations with stale metadata. The filter half of #1907 ships now. Upstream rsync 3.4.1 has no equivalent flag, so the option uses the descriptive `--apple-double-skip` name. The optional xattr merge half is deferred: the existing `apple-fs` crate does not expose AppleDouble parsing or xattr merging primitives, so a follow-up task will land that piece alongside the required parser. - New `filters::apple_double` module with the canonical `._*` pattern. - `FilterSet::from_rules_with_apple_double` mirrors the `from_rules_with_cvs` convenience constructor. - CLI plumbing through `ParsedArgs`, the command builder, the filter-rules collector, and the workflow runner. - Integration tests under `crates/filters/tests/apple_double.rs` covering top-level/nested matching, perishable semantics, override precedence, and clear-rule interaction. - Frontend tests cover argument parsing and an end-to-end transfer scenario that confirms `._foo` files are dropped while their parents copy through. - Man page and `--help` text updated. Refs #1907 * style: apply cargo fmt to apple-double filter * refactor: use shell-words crate for ssh remote-shell parsing (#3451) * refactor: use shell-words crate for ssh remote-shell parsing Replace the bespoke byte-level state machine in `parse_remote_shell` with `shell_words::split`, the well-supported POSIX shell tokenizer. - Tokenization is delegated to the `shell-words` crate (1.1). - The wrapper retains the three behaviours callers rely on: empty/whitespace input -> `Empty`, interior NUL -> `InteriorNull`, non-UTF-8 input -> `InvalidEncoding`. - The historical `UnterminatedEscape`/`UnterminatedSingleQuote`/ `UnterminatedDoubleQuote` variants collapse into a single `Parse(String)` variant carrying the `shell_words::ParseError` description. No external code matched on the removed variants. - Adds a property test corpus (`crates/rsync_io/tests/parse_remote_shell.rs`) asserting parity with `shell_words::split` over arbitrary RSYNC_RSH-style inputs, plus explicit unit tests for the historical behaviours (quoting, escapes, whitespace, NUL byte, invalid Unicode). Tasks: #1877, #1878. * style: apply cargo fmt to ssh shell-words tokenizer tests * chore: update Cargo.lock for shell-words dependency Cargo.lock was not updated when shell-words = "1.1" was added to crates/rsync_io/Cargo.toml in cc01bbc, causing the MSRV 1.88 workflow to fail with --locked. * fix(ssh): reject trailing backslash escape in remote-shell tokenizer shell_words::split silently accepts a dangling trailing backslash, but upstream rsync (and the previous bespoke tokenizer) reject it. Add a small validator that walks the string tracking single- and double-quote regions so that backslashes inside single quotes are not flagged, and returns true only when the input ends with an unescaped backslash that has no character to escape. This restores the test parser_rejects_trailing_escape that fails on the shell-words crate alone. * test: allow stricter Parse rejection vs shell_words in parity property The bespoke remote-shell tokenizer rejects dangling trailing backslashes, matching upstream rsync's `tokens = parse_arguments(...)`. The new shell-words-based wrapper preserves that strictness via has_trailing_escape, but `shell_words::split` silently absorbs the trailing backslash and returns Ok. The parity property test was therefore failing on inputs ending in a bare `\\` because parse_remote_shell returned Err(Parse) while shell_words returned Ok. The wrapper is intentionally a tightening, not a parity match in both directions. Add an explicit (Err(Parse), Ok(_)) arm so the property test encodes the contract: we may reject inputs shell_words accepts, but never the other way around. * chore: disable coverage workflow on push/pull_request (#3460) Nightly llvm-cov 22.1 SIGSEGVs deterministically inside llvm::coverage::CoverageMapping::getInstantiationGroups, breaking both master and PR runs. Restrict the workflow to manual workflow_dispatch until a known-good nightly is pinned via RUSTUP_TOOLCHAIN. * fix: avoid mmap basis when paired with io_uring writer (#3459) * fix: avoid mmap basis when paired with io_uring writer DeltaApplicator opened its basis file via MapFile::open_adaptive, which selects mmap for files >= 1 MiB on Unix. The applicator has no production caller today, but is the obvious next target for an io_uring-output rewrite. Wiring an IoUringWriter to the applicator without changing the basis path would submit mmap-backed pointers to io_uring SQEs on block-ref copies >= the writer batch threshold, exposing two failure modes: 1. Cold-page faults on the basis file are serviced under the SQE submission thread (worse, the SQPOLL kernel thread when SQPOLL is enabled), turning a "free" zero-copy write into a synchronous fault and stalling other in-flight SQEs on the same poller. 2. Concurrent truncation of the basis file raises SIGBUS while the kernel is dereferencing the page on our behalf - recovery from in-kernel SIGBUS is not signal-safe. Upstream rsync 3.4.1 deliberately avoids mmap(2) for basis files for the same truncation reason (fileio.c:214-217). Our BufferedMap matches that decision. Closes the F1 hazard from docs/audits/mmap-iouring-co-usage.md (#1660) by introducing BasisWriterKind in DeltaApplyConfig: when the writer is io_uring-backed, DeltaApplicator::new opens the basis via the new MapFile::open_adaptive_buffered (forces AdaptiveMapStrategy::Buffered regardless of size). Standard / std-write writers retain the existing adaptive selection. Non-Unix is unaffected since BufferedMap is the only basis strategy there. Tests cover both directions: io_uring writer + 2 MiB basis stays on BufferedMap, standard writer + 2 MiB basis still picks mmap on Unix. docs/design/basis-file-io-policy.md updated with implementation status noting that the io_uring-pairing axis is now wired; remaining hazard columns (--inplace, --append, --copy-devices, sparse, network FS) are future work tracked in the same doc. Refs: #1906 (this task), #1660 (audit). * fix: drop redundant Write import in applicator tests The test module brings in `Write` via `use super::*;` which re-exports the file-level `use std::io::{self, Read, Write};`. The explicit `use std::io::Write as _;` is therefore unused and trips clippy under `-D warnings`. * feat: bridge IconvSetting to FilenameConverter (#3458) Adds IconvSetting::resolve_converter() to map the CLI-side iconv setting onto a transfer-side FilenameConverter, and wires the converter through apply_common_server_flags so every SSH and daemon ServerConfig builder populates ConnectionConfig.iconv when --iconv is in effect. Without this bridge, the CLI parsed --iconv, validated it, forwarded it to the remote peer over SSH, but the local file-list reader and writer hooks always observed None and silently passed raw bytes through. Also rejects --iconv=LOCAL,REMOTE with a hard error when the iconv cargo feature is compiled out, rather than silently no-opping. Closes #1911 Closes #1915 * perf: wire IoUringDiskBatch into disk_commit hot path (#3452) * perf: wire IoUringDiskBatch into disk_commit hot path Task #1901. The disk-commit thread previously created an IoUringDiskBatch via try_create_disk_batch() but bound it to `_disk_batch`, leaving every write to fall through to the buffered path (thread.rs:127). All disk writes therefore bypassed io_uring batching even when the policy was Auto or Enabled on Linux 5.6+. Introduce a Writer enum in disk_commit/writer.rs that dispatches between ReusableBufWriter (existing 256 KB buffered path) and a new IoUring variant that borrows the disk thread's persistent IoUringDiskBatch. The IoUring variant is cfg-gated to all(target_os = "linux", feature = "io_uring") so non-Linux builds never reference it. In disk_commit/thread.rs, drop the underscore from the batch binding and thread `disk_batch.as_mut()` into both process_file and process_whole_file on every iteration of the message loop. The batch is created once per thread and reused across all files via begin_file/commit_file. In disk_commit/process.rs, replace the unconditional ReusableBufWriter construction with a make_writer() helper that registers the file with the batch when (a) the batch is present and (b) sparse mode is disabled. Sparse writes require Seek, which IoUringDiskBatch does not implement, so sparse mode keeps using the buffered writer via buffered_for_sparse(). Semantics preserved: - Temp-file commit ordering: Writer::finish() calls commit_file(do_fsync) which flushes + fsyncs + detaches the file before commit_file() runs the rename. Buffered path drops the writer (closing the file) before rename, identical to prior behaviour. - fsync: do_fsync flag flows through Writer::flush_and_sync (buffered) and Writer::finish (io_uring). The io_uring path makes flush_and_sync a no-op so commit_file performs both atomically in a single SQE. - Error propagation: io::Result is preserved end-to-end with file_path context attached on fsync/commit errors. * fix: gate unused Writer::finish parameters on non-Linux builds The do_fsync and file_path parameters on Writer::finish are only consumed by the io_uring arm. On non-Linux (or with the io_uring feature disabled) the Buffered arm ignores them, which under -D warnings escalates to a build failure. * fix(disk_commit): force buffered writer for append mode The IoUringDiskBatch writes via SQEs with absolute offsets starting at 0, ignoring the file position set by `file.seek(append_offset)` in open_output_file. This caused --append-mode transfers under io_uring to overwrite the existing file prefix with zeros, surfacing as the standalone:append interop test failing with NUL-byte corruption. Gate Writer::IoUring off whenever begin.append_offset > 0 and route those transfers through ReusableBufWriter, which honors the seek via standard Write::write_all. upstream: receiver.c:307-308 - file position is the source of truth in append mode; any writer that ignores it cannot be used. * docs(compress): convert restating comments to rustdoc in lz4 (#2026) (#3462) Remove inline comments inside lz4 codec function bodies and tests that simply echo the next statement. All public-item rustdoc and upstream rsync references (token.c) remain intact. * fix(filters): propagate P (protect) filter modifier in sender-side filter exchange (#3463) oc-rsync was emitting `P pattern` directly on the wire while also attaching the `r` (receiver-side) modifier derived from the rule's `applies_to_receiver` flag. Upstream rsync rejects `Pr` because the `P` prefix already specifies the side and `r` after a side-specifying prefix triggers the "invalid modifier" error in exclude.c:1270-1271. Upstream's `get_rule_prefix()` (exclude.c:1536-1572) only ever emits `+`, `-`, or `:` as the leading character; `P` and `R` exist only as parser sugar that lower to plain include/exclude rules carrying the `FILTRULE_RECEIVER_SIDE` flag. The wire encoding for a P rule is `-r pattern`; the wire encoding for an R rule is `+r pattern`. Normalize Protect/Risk in `build_modern_prefix` so the wire stream matches upstream byte-for-byte. The receiver-side flag is forced when serializing Protect/Risk rule types so that hand-built `FilterRuleWireFormat` values without `receiver_side=true` still serialize correctly. Reproduced and verified the fix against upstream rsync 3.4.1 daemon in the rsync-profile podman container with `--filter='P *.log'` push: protected files now survive `--delete`. Removes `standalone:delete-filter-protect` from KNOWN_FAILURES and DASHBOARD_ENTRIES. upstream: exclude.c:1536-1542 upstream: exclude.c:1569-1572 upstream: exclude.c:1201-1206 * docs(checksums): convert restating comments to rustdoc in rolling (#2018) (#3461) Cleans up the `crates/checksums/src/rolling/` test modules: - Drop decorative banner divider in front of the golden-tests block; replace with a short paragraph that describes intent and references upstream checksum.c. - Remove restating-the-code comments (e.g. "Verify it matches manual computation", "Verify rolling produces correct checksum after shift") whose information is already conveyed by the surrounding code or test name. - Sharpen wire-format comments to call out their connection to upstream rsync's packed (s2 << 16) | s1 layout. Production source files in `rolling/` already use proper rustdoc on every public item, so no `///` conversions are required there. SAFETY comments and SIMD weight-table explanations are preserved verbatim. No runtime behaviour changes. * fix(checksums): apply checksum_seed to MD4 strong checksum (upstream parity) (#3465) Upstream rsync's `get_checksum2()` (checksum.c:358-396) appends the negotiated `checksum_seed` after the data when computing MD4 block strong sums: memcpy(buf1, buf, len); if (checksum_seed) { SIVAL(buf1, len, checksum_seed); len += 4; } Our MD4 path was unconditionally calling `Md4::digest(data)` and ignoring the seed, so block-level strong checksums diverged from upstream whenever the session negotiated a non-zero seed. This silently hurt delta-transfer match quality across rsync<->oc-rsync interop and inflated `hash_hits` / `false_alarms` counters in delta-stats output. Changes: * Add `Md4::digest_with_seed(seed, data)` mirroring upstream's append-seed- after-data semantics. A zero seed is a no-op (matches upstream's `if (checksum_seed)` guard). * Add `SignatureAlgorithm::Md4Seeded { seed }` variant. Introducing a new variant rather than mutating the unit `Md4` keeps the 30+ existing call-sites that match on `Md4` working. * `ChecksumFactory::signature_algorithm()` now selects `Md4Seeded` when MD4 (or the legacy `None`/protocol<27 default) runs with a non-zero seed, and keeps unit `Md4` for the seed=0 case so existing wire goldens stay byte- identical. * Extend exhaustive matches in `signature::algorithm`, the parallel checksum executor, and the delta-transfer bench to cover `Md4Seeded`. Tests: * `md4_seeded_appends_seed_after_data` - verifies the byte layout matches building `(data || seed_le_bytes)` and hashing with plain MD4. * `md4_seeded_zero_seed_matches_unseeded` - guards the upstream zero-seed short-circuit. * `md4_seeded_negative_seed_is_le_two_complement` - confirms i32 wire encoding for negative seeds. * `md4_seeded_signature_matches_upstream_format` - parity at the `SignatureAlgorithm` layer. * Three `ChecksumFactory` tests covering MD4/None x seed=0/non-zero. upstream: checksum.c:358-396 `get_checksum2()` MD4 branch * fix(filters): correct R-before-P ordering in delete-filter-risk interop test (#3464) * fix(filters): correct R-before-P ordering in delete-filter-risk interop test The standalone:delete-filter-risk interop test had the filter rules in the wrong order: --filter='P *.log' --filter='P *.sh' --filter='R *.log'. Upstream rsync evaluates filter rules first-match-wins (exclude.c:1038-1065 check_filter()). For R (risk) to override P (protect) on *.log, it must be listed BEFORE the P rule. With R last, the P rule matched first and the *.log files were never deleted. Verified that upstream rsync 3.4.1 itself produces the same "wrong" outcome with the original P-first ordering, confirming the test expectation - not the filter engine - was incorrect. oc-rsync's filter engine already mirrors upstream semantics; the unit test risk_rule_allows_deletion_before_protection in crates/filters/src/tests.rs explicitly documents the required R-before-P order. Reordered both directions (up->oc and oc->up) to use --filter='R *.log' --filter='P *.log' --filter='P *.sh' and updated the function-level comment to cite exclude.c:1201-1207 (R rflags) and exclude.c:1038-1065 (first-match-wins). Verified end-to-end in the rsync-profile container that both upstream->oc and oc->upstream daemon pushes now produce the expected outcome: - source files transferred - risky.log + subdir/nested.log deleted (R overrides P) - keeper.sh preserved (P with no preceding R) - destonly.txt deleted (no protection) Removed the entry from KNOWN_FAILURES and DASHBOARD_ENTRIES in tools/ci/known_failures.conf. upstream: exclude.c:1201-1207 'R' = FILTRULE_INCLUDE|FILTRULE_RECEIVER_SIDE upstream: exclude.c:1038-1065 check_filter() first-match-wins * fix(filters): suppress redundant r/s modifier on Risk/Protect wire prefix The wire encoder unconditionally emitted `s`/`r` modifiers when the sender_side/receiver_side flags were set on a FilterRuleWireFormat. This produced output like `Rr *.log` for a Risk rule (which oc-rsync internally tags as receiver-side). Upstream rsync's parser rejects that with `invalid modifier 'r' at position 1 in filter rule`, breaking the delete-filter-risk interop scenario in oc->upstream direction with exit 12. Upstream's filter parser (exclude.c:1180-1207) sets prefix_specifies_side when the rule type char is R/P (and S/H), and exclude.c:1269-1278 rejects an explicit `r`/`s` modifier in that case. Match upstream's get_rule_prefix (exclude.c:1525-1587), which never emits the redundant flag for those prefixes. Suppress `s`/`r` emission when the rule type is Risk or Protect. Add unit tests at the prefix builder level and wire-level tests including a Risk roundtrip to lock in the format. * docs(protocol): convert restating comments to rustdoc in varint (#1999) (#3466) Remove decorative banner comments from the varint test module and tighten internal comments in `read_varlong` to delete pure restatement while preserving upstream rsync C source references (`io.c:read_varlong`, `memcpy(u.b, b2+1, min_bytes-1)`). No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in zlib (#2029) (#3467) Remove decorative and restating-the-code comments from the zlib module's tests, while preserving every upstream rsync C source reference and comments that explain non-obvious test design rationale. Non-test files (decoder.rs, encoder.rs, helpers.rs, level.rs, mod.rs) already use proper rustdoc and required no changes. * docs: clean up comments in core::timeout (#2014) (#3471) * docs(signature): convert restating comments to rustdoc (#2034) (#3474) Drops inline comments that merely echo the immediately following code in block_size.rs, pipelined_gen.rs, and async_gen.rs. Reworks the few that remain to focus on the WHY (upstream-derived phase 2 short-circuit, sqrt bound derivation, BATCH_SIZE cross-file consistency, disk I/O thread cap). Comment-only change with no functional impact. * docs(metadata): convert restating comments to rustdoc in chmod (#2022) (#3473) Remove section-banner comments and pure-restatement comments from chmod test modules. Convert a stranded `///` doc-comment in `upstream_compatibility` to a proper `//!` module doc. Preserve all comments that reference upstream rsync semantics or that explain the non-obvious conditional-X / assign-empty-perms / octal-validation rules. No functional code changes. * feat(ci): add upstream rsync testsuite harness (#3470) Adds tools/ci/run_upstream_testsuite.sh, which runs upstream rsync's testsuite/*.test scripts against oc-rsync. Mirrors upstream's runtests.sh contract: exports $RSYNC, $TOOLDIR, $srcdir, $suitedir, $scratchdir, $TLS_ARGS, $POSIXLY_CORRECT, and $ECHO_* per test. Difference vs upstream runtests.sh: $RSYNC points at oc-rsync, not upstream. Upstream sources are still needed for helper tools (tls, getgroups, lsh.sh) and config artefacts (config.h, shconfig); the harness fetches and configures them on first run. Known failures live in tools/ci/upstream_testsuite_known_failures.conf and are grouped by category (lsh.sh-routed remote-shell tests, daemon-mode tests, ACL/xattr, devices, chown, hardlinks INC_RECURSE, atimes/crtimes, etc). Tests in this list are reported XFAIL on failure and UPASS on unexpected success so the list stays self-curating. Wires up risk area #4 from the audit: protocol edge-case compatibility against upstream's own conformance suite. Usage: tools/ci/run_upstream_testsuite.sh # all tests WHICHTESTS=00-hello.test tools/ci/... # one test PRESERVE_SCRATCH=yes tools/ci/... # keep per-test dirs * docs: clean up comments in protocol::error_recovery (#1985) (#3472) Remove restatement and decorative banner comments that echo the code without adding value. Replace two banner-style remarks with focused notes that explain non-obvious behavior (over-receive handling, why HashMap re-insert is the desired semantic, the conservative default classification, and the resume-vs-retry policy on transient errors). Scope is limited to crates/protocol/src/error_recovery/. No functional changes; tests and public APIs are unchanged. Refs #1985 * docs(checksums): convert restating comments to rustdoc in strong (#2020) (#3469) Audit of `crates/checksums/src/strong/` (excluding `strategy/`, `md4_tests.rs`, `xxh64_tests.rs` per task scope). Production source files are already well-documented from prior cleanup passes: every public type, trait, function, and method carries rustdoc; upstream rsync references (`upstream: checksum.c:get_checksum2()` and similar) are preserved on the seed-handling paths in `md4.rs` and `md5.rs`; no decorative banners, debug checkpoints, or commented-out code remain in scope. The single substantive change in this pass adds a quirk comment to the private `detect()` helper in `openssl_support.rs` that captures non-obvious upstream behaviour: MD4 may be absent on OpenSSL builds without the legacy provider, so the MD4 probe is best-effort and overall detection succeeds as long as MD5 works. This explains the otherwise-mysterious `if let Some(md4) = ...` whose result is deliberately discarded. No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in strategy (#2028) (#3468) Remove decorative banner/divider comments from the compress::strategy module. The module's rustdoc was already comprehensive; this cleanup removes the remaining `// ---- Section ----` test-grouping banners that violate the no-decorative-divider rule from CLAUDE.md. All upstream rsync C source references and load-bearing internal comments are preserved verbatim. No production code, function bodies, signatures, types, or imports were modified. * docs: add v0.6 known limitations and security hardening notes Adds a "Known Limitations / Architectural Trade-offs" section to README.md covering PBUF_RING 5.19+ requirement, io_uring buffer pool ceiling, SSH double-compression interaction, single-thread delta computation, daemon TLS gap, and Windows IOCP wiring status (#1868). Updates SECURITY.md supported-versions table to 0.6.x and adds a "Hardening Notes" section covering recycle_buffer release-mode bounds-check gap, bgid u16 namespace exhaustion, SSH double-compression amplification surface, daemon TLS-in-front guidance (stunnel / SSH tunnel / reverse proxy), and daemon module hardening defaults. Cross-links the two documents so operators planning a deployment have a single source of truth for the documented trade-offs. * docs: comment cleanup in protocol::flist::hardlink Remove restating-the-code comments from the hardlink test suite and fix two malformed upstream references in types.rs that used `// upstream:` inside rustdoc blocks (rendered literally in generated docs). Preserved: all upstream rsync source references, non-obvious WHY explanations (FxHash collision rationale, swapped-value asymmetry, weak-mixing input categorisation), and module-level documentation. * docs(filters): audit .rsync-filter per-directory inheritance (#2050) Compare oc-rsync's per-directory merge file inheritance against upstream rsync 3.4.1. Documents the push/pop lifecycle, modifier semantics (n, e, p, s, r, !), and parent-dirscan behaviour. Identifies three behavioural bugs in FilterChain (n modifier ignored, !/clear not propagated to scopes, include-only scopes cannot override outer excludes), one intentional divergence (parent_dirscan), and one untested area (nested dir-merge declarations).
oferchen
added a commit
that referenced
this pull request
May 1, 2026
* docs(fast_io): design session-level io_uring ring pool (#1409) (#3444) * docs(fast_io): audit io_uring SQPOLL and DEFER_TASKRUN for socket I/O (#1267) (#3445) * docs(ci): audit Windows ACL/xattr CI matrix gaps (#1869) (#3446) * docs(fast_io): audit MADV_WILLNEED prefault for mmap'd basis files (#1662) (#3447) * docs(batch): investigate zstd as batch-compatible compression alternative (#1685) (#3448) * docs(fast_io): document mmap page-fault impact on io_uring SQPOLL (#1661) (#3449) * docs: audit --delete-during ordering vs upstream rsync 3.4.1 (#3453) Captures the audit findings from #1893 in docs/architecture/delete-during.md and lays the groundwork for the documentation work tracked in #1894. Contrasts upstream's per-directory interleaved deletion (generator.c::recv_generator + delete_in_dir) against oc-rsync's batched pre-transfer sweep (crates/transfer/src/receiver/transfer.rs:532), enumerates the resulting differences in phase ordering, determinism, filter evaluation, and error semantics, and lists the follow-up actions: CHANGELOG / man-page note, interop test for concurrent new+deleted entries, .rsync-filter investigation, and a possible --delete-strict-order opt-in. Refs #1893, #1894. * feat: add --jump-host proxy-jump CLI flag (#3454) Expose OpenSSH ProxyJump (-J) as --jump-host. Comma-separated [user@]HOST[:PORT] hops are forwarded to the remote shell as `ssh -J <value>` before the destination operand when the configured remote shell is OpenSSH. Empty values are rejected at every layer. Note: only the long form `--jump-host` is provided. Upstream rsync 3.4.1 binds `-J` to `--omit-link-times` (options.c:647), so reusing the short flag would break wire compatibility. Refs #1881 * docs: audit --iconv inert critical gap (#3455) Document the parse-but-never-applied architectural dead-end where the CLI accepts --iconv, validates and stores IconvSetting on ClientConfig, and forwards the option to the remote peer over SSH, but the local process never receives a FilenameConverter. ServerConfigBuilder::iconv() exists with zero call sites; receiver and generator already pull a converter from connection.iconv but always observe None. Severity: critical. Filenames with non-ASCII bytes silently bypass transcoding on every local-side path (file-list ingest, file-list emit, filter matching, daemon module serving) when --iconv is supplied. Companion to docs/audits/iconv-pipeline.md, narrowing on the single missing IconvSetting -> FilenameConverter bridge that currently makes every gap in that broader pipeline simultaneously unobservable. Tracks tasks #1909, #1910, #1918. Remediation path enumerated against existing #1911-#1919 task chain. * fix: normalise Windows backslash to forward slash in wire-encoded paths (#3456) A Windows oc-rsync sender that builds a `FileEntry` whose path was constructed via `Path::join` / `PathBuf::push` (the normal case for recursive transfers) emitted the native string with `\` separators verbatim on the wire. A POSIX rsync receiver decoding those bytes treated every `\` as part of a single filename, producing one literal filename per source file instead of the expected directory hierarchy. Upstream rsync 3.4.1 writes filename bytes verbatim in `flist.c:send_file_entry()` (lines 534-570). The wire format remains `/`-separated because every upstream build either runs on a POSIX kernel or under Cygwin's POSIX layer, which presents `/`-separated paths to the application before this code is reached. oc-rsync targets native Win32 directly, so the sender must perform the separator normalisation explicitly. Before this fix on Windows: let mut path = PathBuf::from(\"subdir\"); path.push(\"file.txt\"); let entry = FileEntry::new_file(path, 1024, 0o644); entry.name_bytes() == b\"subdir\\\\file.txt\" // wrong After: entry.name_bytes() == b\"subdir/file.txt\" // matches upstream The fix introduces a single-purpose helper `crates/protocol/src/flist/wire_path.rs::path_bytes_to_wire` that mirrors the existing identity-on-Unix / convert-on-Windows pattern of `wire_mode.rs`. The helper is applied at every wire-encode call site: - `FileEntry::name_bytes()` (filename emission via `write_entry`) - `write_symlink_target()` (symlink target emission) - `strip_leading_slashes()` Windows branch trims both `/` and `\` The `name_bytes()` accessor signature changes from `&[u8]` to `Cow<'_, [u8]>` so the helper can borrow on Unix (zero allocation) and own only when conversion is required on Windows. Sort comparators borrow the inner slice via `Deref` and continue to work unchanged. Tests added: - `wire_path` module: 8 unit tests covering forward-slash identity, empty path, dot path, Unix borrow, Unix backslash preservation, Windows translation, mixed separators. - `flist::write` regression tests: assert the wire-encoded filename contains no `\` byte and that a writer-then-reader roundtrip yields `subdir/file.txt` regardless of host platform. - Symlink-target regression test asserting no `\` byte appears in the encoded target bytes. Companion audit: docs/audits/windows-path.md cites upstream flist.c:534-570 and util1.c:955-961. Closes #1905 Refs #1939 * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids (#3450) * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids Closes #1879 and #1880. The previous `host_contains_colon` heuristic in `SshCommand::target_argument` treated any string containing `:` as IPv6 and wrapped it in brackets. That silently accepted malformed input such as `2001:db8:::1` or `garbage:input` and forwarded the unparseable literal to ssh. Replace it with strict `Ipv6Addr::from_str` validation via a new `parse_host_for_ssh` returning a `HostKind` enum (`Hostname` / `Ipv4` / `Ipv6 { addr, zone }`). The parser also recognises RFC 4007 scoped zone identifiers (e.g. `fe80::1%eth0`) and re-emits them inside the bracket form (`[addr%zone]`) per upstream rsync convention. Zone ids are rejected when empty or when they contain whitespace or `]`. Before: `host_contains_colon("2001:db8:::1") == true` -> emits `[2001:db8:::1]`. After: `parse_host_for_ssh("2001:db8:::1")` returns `Err(InvalidIpv6)`, so the input is passed through unchanged and ssh surfaces the resolution failure instead of receiving a malformed bracketed literal. Includes unit tests for: bare IPv4, bare hostname, bracketed IPv6, IPv6 with zone (`fe80::1%eth0` and `[fe80::1%en0]`), malformed multi-`::` IPv6, zone with whitespace (rejected), and direct `parse_host_for_ssh` classification across all `HostKind` and `BuildError` variants. * style: apply cargo fmt to ssh ipv6 validation * feat: add AppleDouble filter and optional xattr merge for macOS (#3457) * feat: add AppleDouble filter for macOS sidecar exclusion (#1907) Add the `--apple-double-skip` option which appends a perishable `._*` exclusion to the filter chain. macOS writes AppleDouble sidecar files on filesystems that cannot represent extended attributes natively (FAT, exFAT, NFS, SMB) to carry FinderInfo, resource forks, and xattrs. Replicating them across machines clutters destinations with stale metadata. The filter half of #1907 ships now. Upstream rsync 3.4.1 has no equivalent flag, so the option uses the descriptive `--apple-double-skip` name. The optional xattr merge half is deferred: the existing `apple-fs` crate does not expose AppleDouble parsing or xattr merging primitives, so a follow-up task will land that piece alongside the required parser. - New `filters::apple_double` module with the canonical `._*` pattern. - `FilterSet::from_rules_with_apple_double` mirrors the `from_rules_with_cvs` convenience constructor. - CLI plumbing through `ParsedArgs`, the command builder, the filter-rules collector, and the workflow runner. - Integration tests under `crates/filters/tests/apple_double.rs` covering top-level/nested matching, perishable semantics, override precedence, and clear-rule interaction. - Frontend tests cover argument parsing and an end-to-end transfer scenario that confirms `._foo` files are dropped while their parents copy through. - Man page and `--help` text updated. Refs #1907 * style: apply cargo fmt to apple-double filter * refactor: use shell-words crate for ssh remote-shell parsing (#3451) * refactor: use shell-words crate for ssh remote-shell parsing Replace the bespoke byte-level state machine in `parse_remote_shell` with `shell_words::split`, the well-supported POSIX shell tokenizer. - Tokenization is delegated to the `shell-words` crate (1.1). - The wrapper retains the three behaviours callers rely on: empty/whitespace input -> `Empty`, interior NUL -> `InteriorNull`, non-UTF-8 input -> `InvalidEncoding`. - The historical `UnterminatedEscape`/`UnterminatedSingleQuote`/ `UnterminatedDoubleQuote` variants collapse into a single `Parse(String)` variant carrying the `shell_words::ParseError` description. No external code matched on the removed variants. - Adds a property test corpus (`crates/rsync_io/tests/parse_remote_shell.rs`) asserting parity with `shell_words::split` over arbitrary RSYNC_RSH-style inputs, plus explicit unit tests for the historical behaviours (quoting, escapes, whitespace, NUL byte, invalid Unicode). Tasks: #1877, #1878. * style: apply cargo fmt to ssh shell-words tokenizer tests * chore: update Cargo.lock for shell-words dependency Cargo.lock was not updated when shell-words = "1.1" was added to crates/rsync_io/Cargo.toml in cc01bbc, causing the MSRV 1.88 workflow to fail with --locked. * fix(ssh): reject trailing backslash escape in remote-shell tokenizer shell_words::split silently accepts a dangling trailing backslash, but upstream rsync (and the previous bespoke tokenizer) reject it. Add a small validator that walks the string tracking single- and double-quote regions so that backslashes inside single quotes are not flagged, and returns true only when the input ends with an unescaped backslash that has no character to escape. This restores the test parser_rejects_trailing_escape that fails on the shell-words crate alone. * test: allow stricter Parse rejection vs shell_words in parity property The bespoke remote-shell tokenizer rejects dangling trailing backslashes, matching upstream rsync's `tokens = parse_arguments(...)`. The new shell-words-based wrapper preserves that strictness via has_trailing_escape, but `shell_words::split` silently absorbs the trailing backslash and returns Ok. The parity property test was therefore failing on inputs ending in a bare `\\` because parse_remote_shell returned Err(Parse) while shell_words returned Ok. The wrapper is intentionally a tightening, not a parity match in both directions. Add an explicit (Err(Parse), Ok(_)) arm so the property test encodes the contract: we may reject inputs shell_words accepts, but never the other way around. * chore: disable coverage workflow on push/pull_request (#3460) Nightly llvm-cov 22.1 SIGSEGVs deterministically inside llvm::coverage::CoverageMapping::getInstantiationGroups, breaking both master and PR runs. Restrict the workflow to manual workflow_dispatch until a known-good nightly is pinned via RUSTUP_TOOLCHAIN. * fix: avoid mmap basis when paired with io_uring writer (#3459) * fix: avoid mmap basis when paired with io_uring writer DeltaApplicator opened its basis file via MapFile::open_adaptive, which selects mmap for files >= 1 MiB on Unix. The applicator has no production caller today, but is the obvious next target for an io_uring-output rewrite. Wiring an IoUringWriter to the applicator without changing the basis path would submit mmap-backed pointers to io_uring SQEs on block-ref copies >= the writer batch threshold, exposing two failure modes: 1. Cold-page faults on the basis file are serviced under the SQE submission thread (worse, the SQPOLL kernel thread when SQPOLL is enabled), turning a "free" zero-copy write into a synchronous fault and stalling other in-flight SQEs on the same poller. 2. Concurrent truncation of the basis file raises SIGBUS while the kernel is dereferencing the page on our behalf - recovery from in-kernel SIGBUS is not signal-safe. Upstream rsync 3.4.1 deliberately avoids mmap(2) for basis files for the same truncation reason (fileio.c:214-217). Our BufferedMap matches that decision. Closes the F1 hazard from docs/audits/mmap-iouring-co-usage.md (#1660) by introducing BasisWriterKind in DeltaApplyConfig: when the writer is io_uring-backed, DeltaApplicator::new opens the basis via the new MapFile::open_adaptive_buffered (forces AdaptiveMapStrategy::Buffered regardless of size). Standard / std-write writers retain the existing adaptive selection. Non-Unix is unaffected since BufferedMap is the only basis strategy there. Tests cover both directions: io_uring writer + 2 MiB basis stays on BufferedMap, standard writer + 2 MiB basis still picks mmap on Unix. docs/design/basis-file-io-policy.md updated with implementation status noting that the io_uring-pairing axis is now wired; remaining hazard columns (--inplace, --append, --copy-devices, sparse, network FS) are future work tracked in the same doc. Refs: #1906 (this task), #1660 (audit). * fix: drop redundant Write import in applicator tests The test module brings in `Write` via `use super::*;` which re-exports the file-level `use std::io::{self, Read, Write};`. The explicit `use std::io::Write as _;` is therefore unused and trips clippy under `-D warnings`. * feat: bridge IconvSetting to FilenameConverter (#3458) Adds IconvSetting::resolve_converter() to map the CLI-side iconv setting onto a transfer-side FilenameConverter, and wires the converter through apply_common_server_flags so every SSH and daemon ServerConfig builder populates ConnectionConfig.iconv when --iconv is in effect. Without this bridge, the CLI parsed --iconv, validated it, forwarded it to the remote peer over SSH, but the local file-list reader and writer hooks always observed None and silently passed raw bytes through. Also rejects --iconv=LOCAL,REMOTE with a hard error when the iconv cargo feature is compiled out, rather than silently no-opping. Closes #1911 Closes #1915 * perf: wire IoUringDiskBatch into disk_commit hot path (#3452) * perf: wire IoUringDiskBatch into disk_commit hot path Task #1901. The disk-commit thread previously created an IoUringDiskBatch via try_create_disk_batch() but bound it to `_disk_batch`, leaving every write to fall through to the buffered path (thread.rs:127). All disk writes therefore bypassed io_uring batching even when the policy was Auto or Enabled on Linux 5.6+. Introduce a Writer enum in disk_commit/writer.rs that dispatches between ReusableBufWriter (existing 256 KB buffered path) and a new IoUring variant that borrows the disk thread's persistent IoUringDiskBatch. The IoUring variant is cfg-gated to all(target_os = "linux", feature = "io_uring") so non-Linux builds never reference it. In disk_commit/thread.rs, drop the underscore from the batch binding and thread `disk_batch.as_mut()` into both process_file and process_whole_file on every iteration of the message loop. The batch is created once per thread and reused across all files via begin_file/commit_file. In disk_commit/process.rs, replace the unconditional ReusableBufWriter construction with a make_writer() helper that registers the file with the batch when (a) the batch is present and (b) sparse mode is disabled. Sparse writes require Seek, which IoUringDiskBatch does not implement, so sparse mode keeps using the buffered writer via buffered_for_sparse(). Semantics preserved: - Temp-file commit ordering: Writer::finish() calls commit_file(do_fsync) which flushes + fsyncs + detaches the file before commit_file() runs the rename. Buffered path drops the writer (closing the file) before rename, identical to prior behaviour. - fsync: do_fsync flag flows through Writer::flush_and_sync (buffered) and Writer::finish (io_uring). The io_uring path makes flush_and_sync a no-op so commit_file performs both atomically in a single SQE. - Error propagation: io::Result is preserved end-to-end with file_path context attached on fsync/commit errors. * fix: gate unused Writer::finish parameters on non-Linux builds The do_fsync and file_path parameters on Writer::finish are only consumed by the io_uring arm. On non-Linux (or with the io_uring feature disabled) the Buffered arm ignores them, which under -D warnings escalates to a build failure. * fix(disk_commit): force buffered writer for append mode The IoUringDiskBatch writes via SQEs with absolute offsets starting at 0, ignoring the file position set by `file.seek(append_offset)` in open_output_file. This caused --append-mode transfers under io_uring to overwrite the existing file prefix with zeros, surfacing as the standalone:append interop test failing with NUL-byte corruption. Gate Writer::IoUring off whenever begin.append_offset > 0 and route those transfers through ReusableBufWriter, which honors the seek via standard Write::write_all. upstream: receiver.c:307-308 - file position is the source of truth in append mode; any writer that ignores it cannot be used. * docs(compress): convert restating comments to rustdoc in lz4 (#2026) (#3462) Remove inline comments inside lz4 codec function bodies and tests that simply echo the next statement. All public-item rustdoc and upstream rsync references (token.c) remain intact. * fix(filters): propagate P (protect) filter modifier in sender-side filter exchange (#3463) oc-rsync was emitting `P pattern` directly on the wire while also attaching the `r` (receiver-side) modifier derived from the rule's `applies_to_receiver` flag. Upstream rsync rejects `Pr` because the `P` prefix already specifies the side and `r` after a side-specifying prefix triggers the "invalid modifier" error in exclude.c:1270-1271. Upstream's `get_rule_prefix()` (exclude.c:1536-1572) only ever emits `+`, `-`, or `:` as the leading character; `P` and `R` exist only as parser sugar that lower to plain include/exclude rules carrying the `FILTRULE_RECEIVER_SIDE` flag. The wire encoding for a P rule is `-r pattern`; the wire encoding for an R rule is `+r pattern`. Normalize Protect/Risk in `build_modern_prefix` so the wire stream matches upstream byte-for-byte. The receiver-side flag is forced when serializing Protect/Risk rule types so that hand-built `FilterRuleWireFormat` values without `receiver_side=true` still serialize correctly. Reproduced and verified the fix against upstream rsync 3.4.1 daemon in the rsync-profile podman container with `--filter='P *.log'` push: protected files now survive `--delete`. Removes `standalone:delete-filter-protect` from KNOWN_FAILURES and DASHBOARD_ENTRIES. upstream: exclude.c:1536-1542 upstream: exclude.c:1569-1572 upstream: exclude.c:1201-1206 * docs(checksums): convert restating comments to rustdoc in rolling (#2018) (#3461) Cleans up the `crates/checksums/src/rolling/` test modules: - Drop decorative banner divider in front of the golden-tests block; replace with a short paragraph that describes intent and references upstream checksum.c. - Remove restating-the-code comments (e.g. "Verify it matches manual computation", "Verify rolling produces correct checksum after shift") whose information is already conveyed by the surrounding code or test name. - Sharpen wire-format comments to call out their connection to upstream rsync's packed (s2 << 16) | s1 layout. Production source files in `rolling/` already use proper rustdoc on every public item, so no `///` conversions are required there. SAFETY comments and SIMD weight-table explanations are preserved verbatim. No runtime behaviour changes. * fix(checksums): apply checksum_seed to MD4 strong checksum (upstream parity) (#3465) Upstream rsync's `get_checksum2()` (checksum.c:358-396) appends the negotiated `checksum_seed` after the data when computing MD4 block strong sums: memcpy(buf1, buf, len); if (checksum_seed) { SIVAL(buf1, len, checksum_seed); len += 4; } Our MD4 path was unconditionally calling `Md4::digest(data)` and ignoring the seed, so block-level strong checksums diverged from upstream whenever the session negotiated a non-zero seed. This silently hurt delta-transfer match quality across rsync<->oc-rsync interop and inflated `hash_hits` / `false_alarms` counters in delta-stats output. Changes: * Add `Md4::digest_with_seed(seed, data)` mirroring upstream's append-seed- after-data semantics. A zero seed is a no-op (matches upstream's `if (checksum_seed)` guard). * Add `SignatureAlgorithm::Md4Seeded { seed }` variant. Introducing a new variant rather than mutating the unit `Md4` keeps the 30+ existing call-sites that match on `Md4` working. * `ChecksumFactory::signature_algorithm()` now selects `Md4Seeded` when MD4 (or the legacy `None`/protocol<27 default) runs with a non-zero seed, and keeps unit `Md4` for the seed=0 case so existing wire goldens stay byte- identical. * Extend exhaustive matches in `signature::algorithm`, the parallel checksum executor, and the delta-transfer bench to cover `Md4Seeded`. Tests: * `md4_seeded_appends_seed_after_data` - verifies the byte layout matches building `(data || seed_le_bytes)` and hashing with plain MD4. * `md4_seeded_zero_seed_matches_unseeded` - guards the upstream zero-seed short-circuit. * `md4_seeded_negative_seed_is_le_two_complement` - confirms i32 wire encoding for negative seeds. * `md4_seeded_signature_matches_upstream_format` - parity at the `SignatureAlgorithm` layer. * Three `ChecksumFactory` tests covering MD4/None x seed=0/non-zero. upstream: checksum.c:358-396 `get_checksum2()` MD4 branch * fix(filters): correct R-before-P ordering in delete-filter-risk interop test (#3464) * fix(filters): correct R-before-P ordering in delete-filter-risk interop test The standalone:delete-filter-risk interop test had the filter rules in the wrong order: --filter='P *.log' --filter='P *.sh' --filter='R *.log'. Upstream rsync evaluates filter rules first-match-wins (exclude.c:1038-1065 check_filter()). For R (risk) to override P (protect) on *.log, it must be listed BEFORE the P rule. With R last, the P rule matched first and the *.log files were never deleted. Verified that upstream rsync 3.4.1 itself produces the same "wrong" outcome with the original P-first ordering, confirming the test expectation - not the filter engine - was incorrect. oc-rsync's filter engine already mirrors upstream semantics; the unit test risk_rule_allows_deletion_before_protection in crates/filters/src/tests.rs explicitly documents the required R-before-P order. Reordered both directions (up->oc and oc->up) to use --filter='R *.log' --filter='P *.log' --filter='P *.sh' and updated the function-level comment to cite exclude.c:1201-1207 (R rflags) and exclude.c:1038-1065 (first-match-wins). Verified end-to-end in the rsync-profile container that both upstream->oc and oc->upstream daemon pushes now produce the expected outcome: - source files transferred - risky.log + subdir/nested.log deleted (R overrides P) - keeper.sh preserved (P with no preceding R) - destonly.txt deleted (no protection) Removed the entry from KNOWN_FAILURES and DASHBOARD_ENTRIES in tools/ci/known_failures.conf. upstream: exclude.c:1201-1207 'R' = FILTRULE_INCLUDE|FILTRULE_RECEIVER_SIDE upstream: exclude.c:1038-1065 check_filter() first-match-wins * fix(filters): suppress redundant r/s modifier on Risk/Protect wire prefix The wire encoder unconditionally emitted `s`/`r` modifiers when the sender_side/receiver_side flags were set on a FilterRuleWireFormat. This produced output like `Rr *.log` for a Risk rule (which oc-rsync internally tags as receiver-side). Upstream rsync's parser rejects that with `invalid modifier 'r' at position 1 in filter rule`, breaking the delete-filter-risk interop scenario in oc->upstream direction with exit 12. Upstream's filter parser (exclude.c:1180-1207) sets prefix_specifies_side when the rule type char is R/P (and S/H), and exclude.c:1269-1278 rejects an explicit `r`/`s` modifier in that case. Match upstream's get_rule_prefix (exclude.c:1525-1587), which never emits the redundant flag for those prefixes. Suppress `s`/`r` emission when the rule type is Risk or Protect. Add unit tests at the prefix builder level and wire-level tests including a Risk roundtrip to lock in the format. * docs(protocol): convert restating comments to rustdoc in varint (#1999) (#3466) Remove decorative banner comments from the varint test module and tighten internal comments in `read_varlong` to delete pure restatement while preserving upstream rsync C source references (`io.c:read_varlong`, `memcpy(u.b, b2+1, min_bytes-1)`). No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in zlib (#2029) (#3467) Remove decorative and restating-the-code comments from the zlib module's tests, while preserving every upstream rsync C source reference and comments that explain non-obvious test design rationale. Non-test files (decoder.rs, encoder.rs, helpers.rs, level.rs, mod.rs) already use proper rustdoc and required no changes. * docs: clean up comments in core::timeout (#2014) (#3471) * docs(signature): convert restating comments to rustdoc (#2034) (#3474) Drops inline comments that merely echo the immediately following code in block_size.rs, pipelined_gen.rs, and async_gen.rs. Reworks the few that remain to focus on the WHY (upstream-derived phase 2 short-circuit, sqrt bound derivation, BATCH_SIZE cross-file consistency, disk I/O thread cap). Comment-only change with no functional impact. * docs(metadata): convert restating comments to rustdoc in chmod (#2022) (#3473) Remove section-banner comments and pure-restatement comments from chmod test modules. Convert a stranded `///` doc-comment in `upstream_compatibility` to a proper `//!` module doc. Preserve all comments that reference upstream rsync semantics or that explain the non-obvious conditional-X / assign-empty-perms / octal-validation rules. No functional code changes. * feat(ci): add upstream rsync testsuite harness (#3470) Adds tools/ci/run_upstream_testsuite.sh, which runs upstream rsync's testsuite/*.test scripts against oc-rsync. Mirrors upstream's runtests.sh contract: exports $RSYNC, $TOOLDIR, $srcdir, $suitedir, $scratchdir, $TLS_ARGS, $POSIXLY_CORRECT, and $ECHO_* per test. Difference vs upstream runtests.sh: $RSYNC points at oc-rsync, not upstream. Upstream sources are still needed for helper tools (tls, getgroups, lsh.sh) and config artefacts (config.h, shconfig); the harness fetches and configures them on first run. Known failures live in tools/ci/upstream_testsuite_known_failures.conf and are grouped by category (lsh.sh-routed remote-shell tests, daemon-mode tests, ACL/xattr, devices, chown, hardlinks INC_RECURSE, atimes/crtimes, etc). Tests in this list are reported XFAIL on failure and UPASS on unexpected success so the list stays self-curating. Wires up risk area #4 from the audit: protocol edge-case compatibility against upstream's own conformance suite. Usage: tools/ci/run_upstream_testsuite.sh # all tests WHICHTESTS=00-hello.test tools/ci/... # one test PRESERVE_SCRATCH=yes tools/ci/... # keep per-test dirs * docs: clean up comments in protocol::error_recovery (#1985) (#3472) Remove restatement and decorative banner comments that echo the code without adding value. Replace two banner-style remarks with focused notes that explain non-obvious behavior (over-receive handling, why HashMap re-insert is the desired semantic, the conservative default classification, and the resume-vs-retry policy on transient errors). Scope is limited to crates/protocol/src/error_recovery/. No functional changes; tests and public APIs are unchanged. Refs #1985 * docs(checksums): convert restating comments to rustdoc in strong (#2020) (#3469) Audit of `crates/checksums/src/strong/` (excluding `strategy/`, `md4_tests.rs`, `xxh64_tests.rs` per task scope). Production source files are already well-documented from prior cleanup passes: every public type, trait, function, and method carries rustdoc; upstream rsync references (`upstream: checksum.c:get_checksum2()` and similar) are preserved on the seed-handling paths in `md4.rs` and `md5.rs`; no decorative banners, debug checkpoints, or commented-out code remain in scope. The single substantive change in this pass adds a quirk comment to the private `detect()` helper in `openssl_support.rs` that captures non-obvious upstream behaviour: MD4 may be absent on OpenSSL builds without the legacy provider, so the MD4 probe is best-effort and overall detection succeeds as long as MD5 works. This explains the otherwise-mysterious `if let Some(md4) = ...` whose result is deliberately discarded. No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in strategy (#2028) (#3468) Remove decorative banner/divider comments from the compress::strategy module. The module's rustdoc was already comprehensive; this cleanup removes the remaining `// ---- Section ----` test-grouping banners that violate the no-decorative-divider rule from CLAUDE.md. All upstream rsync C source references and load-bearing internal comments are preserved verbatim. No production code, function bodies, signatures, types, or imports were modified. * docs: add v0.6 known limitations and security hardening notes Adds a "Known Limitations / Architectural Trade-offs" section to README.md covering PBUF_RING 5.19+ requirement, io_uring buffer pool ceiling, SSH double-compression interaction, single-thread delta computation, daemon TLS gap, and Windows IOCP wiring status (#1868). Updates SECURITY.md supported-versions table to 0.6.x and adds a "Hardening Notes" section covering recycle_buffer release-mode bounds-check gap, bgid u16 namespace exhaustion, SSH double-compression amplification surface, daemon TLS-in-front guidance (stunnel / SSH tunnel / reverse proxy), and daemon module hardening defaults. Cross-links the two documents so operators planning a deployment have a single source of truth for the documented trade-offs. * docs: comment cleanup in protocol::flist::hardlink Remove restating-the-code comments from the hardlink test suite and fix two malformed upstream references in types.rs that used `// upstream:` inside rustdoc blocks (rendered literally in generated docs). Preserved: all upstream rsync source references, non-obvious WHY explanations (FxHash collision rationale, swapped-value asymmetry, weak-mixing input categorisation), and module-level documentation. * docs(engine): audit --fuzzy basis-file search algorithm (#2051) Document oc-rsync's --fuzzy / -y basis-file search and scoring against upstream rsync 3.4.1 (generator.c::find_fuzzy() and util1.c::fuzzy_distance()). Identifies six divergences (scoring metric, phase A size+mtime short-circuit, FNAMECMP_FUZZY wire emission, suffix heuristics, already-sent filtering, tie-breaker direction) and three matches (-yy plumbing, --whole-file/--inplace /--append interactions). Recommends Levenshtein-based scoring, phase A short-circuit, and ITEM_BASIS_TYPE_FOLLOWS emission.
3 tasks
oferchen
added a commit
that referenced
this pull request
May 1, 2026
1 task
oferchen
added a commit
that referenced
this pull request
May 1, 2026
…#3515) Document the High-severity backslash-leak finding with reproduction steps and the remediation plan referencing task #1905. Refines the parent audit #1842 (windows-path-normalization.md) with a focused finding-level write-up: code paths, reproduction commands (daemon and batch capture), upstream comparison via Cygwin, the path_bytes_to_wire helper plan, regression-test obligations, and the Windows-sender -> Linux-receiver CI matrix entry.
oferchen
added a commit
that referenced
this pull request
May 5, 2026
…3488) * docs(fast_io): design session-level io_uring ring pool (#1409) (#3444) * docs(fast_io): audit io_uring SQPOLL and DEFER_TASKRUN for socket I/O (#1267) (#3445) * docs(ci): audit Windows ACL/xattr CI matrix gaps (#1869) (#3446) * docs(fast_io): audit MADV_WILLNEED prefault for mmap'd basis files (#1662) (#3447) * docs(batch): investigate zstd as batch-compatible compression alternative (#1685) (#3448) * docs(fast_io): document mmap page-fault impact on io_uring SQPOLL (#1661) (#3449) * docs: audit --delete-during ordering vs upstream rsync 3.4.1 (#3453) Captures the audit findings from #1893 in docs/architecture/delete-during.md and lays the groundwork for the documentation work tracked in #1894. Contrasts upstream's per-directory interleaved deletion (generator.c::recv_generator + delete_in_dir) against oc-rsync's batched pre-transfer sweep (crates/transfer/src/receiver/transfer.rs:532), enumerates the resulting differences in phase ordering, determinism, filter evaluation, and error semantics, and lists the follow-up actions: CHANGELOG / man-page note, interop test for concurrent new+deleted entries, .rsync-filter investigation, and a possible --delete-strict-order opt-in. Refs #1893, #1894. * feat: add --jump-host proxy-jump CLI flag (#3454) Expose OpenSSH ProxyJump (-J) as --jump-host. Comma-separated [user@]HOST[:PORT] hops are forwarded to the remote shell as `ssh -J <value>` before the destination operand when the configured remote shell is OpenSSH. Empty values are rejected at every layer. Note: only the long form `--jump-host` is provided. Upstream rsync 3.4.1 binds `-J` to `--omit-link-times` (options.c:647), so reusing the short flag would break wire compatibility. Refs #1881 * docs: audit --iconv inert critical gap (#3455) Document the parse-but-never-applied architectural dead-end where the CLI accepts --iconv, validates and stores IconvSetting on ClientConfig, and forwards the option to the remote peer over SSH, but the local process never receives a FilenameConverter. ServerConfigBuilder::iconv() exists with zero call sites; receiver and generator already pull a converter from connection.iconv but always observe None. Severity: critical. Filenames with non-ASCII bytes silently bypass transcoding on every local-side path (file-list ingest, file-list emit, filter matching, daemon module serving) when --iconv is supplied. Companion to docs/audits/iconv-pipeline.md, narrowing on the single missing IconvSetting -> FilenameConverter bridge that currently makes every gap in that broader pipeline simultaneously unobservable. Tracks tasks #1909, #1910, #1918. Remediation path enumerated against existing #1911-#1919 task chain. * fix: normalise Windows backslash to forward slash in wire-encoded paths (#3456) A Windows oc-rsync sender that builds a `FileEntry` whose path was constructed via `Path::join` / `PathBuf::push` (the normal case for recursive transfers) emitted the native string with `\` separators verbatim on the wire. A POSIX rsync receiver decoding those bytes treated every `\` as part of a single filename, producing one literal filename per source file instead of the expected directory hierarchy. Upstream rsync 3.4.1 writes filename bytes verbatim in `flist.c:send_file_entry()` (lines 534-570). The wire format remains `/`-separated because every upstream build either runs on a POSIX kernel or under Cygwin's POSIX layer, which presents `/`-separated paths to the application before this code is reached. oc-rsync targets native Win32 directly, so the sender must perform the separator normalisation explicitly. Before this fix on Windows: let mut path = PathBuf::from(\"subdir\"); path.push(\"file.txt\"); let entry = FileEntry::new_file(path, 1024, 0o644); entry.name_bytes() == b\"subdir\\\\file.txt\" // wrong After: entry.name_bytes() == b\"subdir/file.txt\" // matches upstream The fix introduces a single-purpose helper `crates/protocol/src/flist/wire_path.rs::path_bytes_to_wire` that mirrors the existing identity-on-Unix / convert-on-Windows pattern of `wire_mode.rs`. The helper is applied at every wire-encode call site: - `FileEntry::name_bytes()` (filename emission via `write_entry`) - `write_symlink_target()` (symlink target emission) - `strip_leading_slashes()` Windows branch trims both `/` and `\` The `name_bytes()` accessor signature changes from `&[u8]` to `Cow<'_, [u8]>` so the helper can borrow on Unix (zero allocation) and own only when conversion is required on Windows. Sort comparators borrow the inner slice via `Deref` and continue to work unchanged. Tests added: - `wire_path` module: 8 unit tests covering forward-slash identity, empty path, dot path, Unix borrow, Unix backslash preservation, Windows translation, mixed separators. - `flist::write` regression tests: assert the wire-encoded filename contains no `\` byte and that a writer-then-reader roundtrip yields `subdir/file.txt` regardless of host platform. - Symlink-target regression test asserting no `\` byte appears in the encoded target bytes. Companion audit: docs/audits/windows-path.md cites upstream flist.c:534-570 and util1.c:955-961. Closes #1905 Refs #1939 * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids (#3450) * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids Closes #1879 and #1880. The previous `host_contains_colon` heuristic in `SshCommand::target_argument` treated any string containing `:` as IPv6 and wrapped it in brackets. That silently accepted malformed input such as `2001:db8:::1` or `garbage:input` and forwarded the unparseable literal to ssh. Replace it with strict `Ipv6Addr::from_str` validation via a new `parse_host_for_ssh` returning a `HostKind` enum (`Hostname` / `Ipv4` / `Ipv6 { addr, zone }`). The parser also recognises RFC 4007 scoped zone identifiers (e.g. `fe80::1%eth0`) and re-emits them inside the bracket form (`[addr%zone]`) per upstream rsync convention. Zone ids are rejected when empty or when they contain whitespace or `]`. Before: `host_contains_colon("2001:db8:::1") == true` -> emits `[2001:db8:::1]`. After: `parse_host_for_ssh("2001:db8:::1")` returns `Err(InvalidIpv6)`, so the input is passed through unchanged and ssh surfaces the resolution failure instead of receiving a malformed bracketed literal. Includes unit tests for: bare IPv4, bare hostname, bracketed IPv6, IPv6 with zone (`fe80::1%eth0` and `[fe80::1%en0]`), malformed multi-`::` IPv6, zone with whitespace (rejected), and direct `parse_host_for_ssh` classification across all `HostKind` and `BuildError` variants. * style: apply cargo fmt to ssh ipv6 validation * feat: add AppleDouble filter and optional xattr merge for macOS (#3457) * feat: add AppleDouble filter for macOS sidecar exclusion (#1907) Add the `--apple-double-skip` option which appends a perishable `._*` exclusion to the filter chain. macOS writes AppleDouble sidecar files on filesystems that cannot represent extended attributes natively (FAT, exFAT, NFS, SMB) to carry FinderInfo, resource forks, and xattrs. Replicating them across machines clutters destinations with stale metadata. The filter half of #1907 ships now. Upstream rsync 3.4.1 has no equivalent flag, so the option uses the descriptive `--apple-double-skip` name. The optional xattr merge half is deferred: the existing `apple-fs` crate does not expose AppleDouble parsing or xattr merging primitives, so a follow-up task will land that piece alongside the required parser. - New `filters::apple_double` module with the canonical `._*` pattern. - `FilterSet::from_rules_with_apple_double` mirrors the `from_rules_with_cvs` convenience constructor. - CLI plumbing through `ParsedArgs`, the command builder, the filter-rules collector, and the workflow runner. - Integration tests under `crates/filters/tests/apple_double.rs` covering top-level/nested matching, perishable semantics, override precedence, and clear-rule interaction. - Frontend tests cover argument parsing and an end-to-end transfer scenario that confirms `._foo` files are dropped while their parents copy through. - Man page and `--help` text updated. Refs #1907 * style: apply cargo fmt to apple-double filter * refactor: use shell-words crate for ssh remote-shell parsing (#3451) * refactor: use shell-words crate for ssh remote-shell parsing Replace the bespoke byte-level state machine in `parse_remote_shell` with `shell_words::split`, the well-supported POSIX shell tokenizer. - Tokenization is delegated to the `shell-words` crate (1.1). - The wrapper retains the three behaviours callers rely on: empty/whitespace input -> `Empty`, interior NUL -> `InteriorNull`, non-UTF-8 input -> `InvalidEncoding`. - The historical `UnterminatedEscape`/`UnterminatedSingleQuote`/ `UnterminatedDoubleQuote` variants collapse into a single `Parse(String)` variant carrying the `shell_words::ParseError` description. No external code matched on the removed variants. - Adds a property test corpus (`crates/rsync_io/tests/parse_remote_shell.rs`) asserting parity with `shell_words::split` over arbitrary RSYNC_RSH-style inputs, plus explicit unit tests for the historical behaviours (quoting, escapes, whitespace, NUL byte, invalid Unicode). Tasks: #1877, #1878. * style: apply cargo fmt to ssh shell-words tokenizer tests * chore: update Cargo.lock for shell-words dependency Cargo.lock was not updated when shell-words = "1.1" was added to crates/rsync_io/Cargo.toml in cc01bbc, causing the MSRV 1.88 workflow to fail with --locked. * fix(ssh): reject trailing backslash escape in remote-shell tokenizer shell_words::split silently accepts a dangling trailing backslash, but upstream rsync (and the previous bespoke tokenizer) reject it. Add a small validator that walks the string tracking single- and double-quote regions so that backslashes inside single quotes are not flagged, and returns true only when the input ends with an unescaped backslash that has no character to escape. This restores the test parser_rejects_trailing_escape that fails on the shell-words crate alone. * test: allow stricter Parse rejection vs shell_words in parity property The bespoke remote-shell tokenizer rejects dangling trailing backslashes, matching upstream rsync's `tokens = parse_arguments(...)`. The new shell-words-based wrapper preserves that strictness via has_trailing_escape, but `shell_words::split` silently absorbs the trailing backslash and returns Ok. The parity property test was therefore failing on inputs ending in a bare `\\` because parse_remote_shell returned Err(Parse) while shell_words returned Ok. The wrapper is intentionally a tightening, not a parity match in both directions. Add an explicit (Err(Parse), Ok(_)) arm so the property test encodes the contract: we may reject inputs shell_words accepts, but never the other way around. * chore: disable coverage workflow on push/pull_request (#3460) Nightly llvm-cov 22.1 SIGSEGVs deterministically inside llvm::coverage::CoverageMapping::getInstantiationGroups, breaking both master and PR runs. Restrict the workflow to manual workflow_dispatch until a known-good nightly is pinned via RUSTUP_TOOLCHAIN. * fix: avoid mmap basis when paired with io_uring writer (#3459) * fix: avoid mmap basis when paired with io_uring writer DeltaApplicator opened its basis file via MapFile::open_adaptive, which selects mmap for files >= 1 MiB on Unix. The applicator has no production caller today, but is the obvious next target for an io_uring-output rewrite. Wiring an IoUringWriter to the applicator without changing the basis path would submit mmap-backed pointers to io_uring SQEs on block-ref copies >= the writer batch threshold, exposing two failure modes: 1. Cold-page faults on the basis file are serviced under the SQE submission thread (worse, the SQPOLL kernel thread when SQPOLL is enabled), turning a "free" zero-copy write into a synchronous fault and stalling other in-flight SQEs on the same poller. 2. Concurrent truncation of the basis file raises SIGBUS while the kernel is dereferencing the page on our behalf - recovery from in-kernel SIGBUS is not signal-safe. Upstream rsync 3.4.1 deliberately avoids mmap(2) for basis files for the same truncation reason (fileio.c:214-217). Our BufferedMap matches that decision. Closes the F1 hazard from docs/audits/mmap-iouring-co-usage.md (#1660) by introducing BasisWriterKind in DeltaApplyConfig: when the writer is io_uring-backed, DeltaApplicator::new opens the basis via the new MapFile::open_adaptive_buffered (forces AdaptiveMapStrategy::Buffered regardless of size). Standard / std-write writers retain the existing adaptive selection. Non-Unix is unaffected since BufferedMap is the only basis strategy there. Tests cover both directions: io_uring writer + 2 MiB basis stays on BufferedMap, standard writer + 2 MiB basis still picks mmap on Unix. docs/design/basis-file-io-policy.md updated with implementation status noting that the io_uring-pairing axis is now wired; remaining hazard columns (--inplace, --append, --copy-devices, sparse, network FS) are future work tracked in the same doc. Refs: #1906 (this task), #1660 (audit). * fix: drop redundant Write import in applicator tests The test module brings in `Write` via `use super::*;` which re-exports the file-level `use std::io::{self, Read, Write};`. The explicit `use std::io::Write as _;` is therefore unused and trips clippy under `-D warnings`. * feat: bridge IconvSetting to FilenameConverter (#3458) Adds IconvSetting::resolve_converter() to map the CLI-side iconv setting onto a transfer-side FilenameConverter, and wires the converter through apply_common_server_flags so every SSH and daemon ServerConfig builder populates ConnectionConfig.iconv when --iconv is in effect. Without this bridge, the CLI parsed --iconv, validated it, forwarded it to the remote peer over SSH, but the local file-list reader and writer hooks always observed None and silently passed raw bytes through. Also rejects --iconv=LOCAL,REMOTE with a hard error when the iconv cargo feature is compiled out, rather than silently no-opping. Closes #1911 Closes #1915 * perf: wire IoUringDiskBatch into disk_commit hot path (#3452) * perf: wire IoUringDiskBatch into disk_commit hot path Task #1901. The disk-commit thread previously created an IoUringDiskBatch via try_create_disk_batch() but bound it to `_disk_batch`, leaving every write to fall through to the buffered path (thread.rs:127). All disk writes therefore bypassed io_uring batching even when the policy was Auto or Enabled on Linux 5.6+. Introduce a Writer enum in disk_commit/writer.rs that dispatches between ReusableBufWriter (existing 256 KB buffered path) and a new IoUring variant that borrows the disk thread's persistent IoUringDiskBatch. The IoUring variant is cfg-gated to all(target_os = "linux", feature = "io_uring") so non-Linux builds never reference it. In disk_commit/thread.rs, drop the underscore from the batch binding and thread `disk_batch.as_mut()` into both process_file and process_whole_file on every iteration of the message loop. The batch is created once per thread and reused across all files via begin_file/commit_file. In disk_commit/process.rs, replace the unconditional ReusableBufWriter construction with a make_writer() helper that registers the file with the batch when (a) the batch is present and (b) sparse mode is disabled. Sparse writes require Seek, which IoUringDiskBatch does not implement, so sparse mode keeps using the buffered writer via buffered_for_sparse(). Semantics preserved: - Temp-file commit ordering: Writer::finish() calls commit_file(do_fsync) which flushes + fsyncs + detaches the file before commit_file() runs the rename. Buffered path drops the writer (closing the file) before rename, identical to prior behaviour. - fsync: do_fsync flag flows through Writer::flush_and_sync (buffered) and Writer::finish (io_uring). The io_uring path makes flush_and_sync a no-op so commit_file performs both atomically in a single SQE. - Error propagation: io::Result is preserved end-to-end with file_path context attached on fsync/commit errors. * fix: gate unused Writer::finish parameters on non-Linux builds The do_fsync and file_path parameters on Writer::finish are only consumed by the io_uring arm. On non-Linux (or with the io_uring feature disabled) the Buffered arm ignores them, which under -D warnings escalates to a build failure. * fix(disk_commit): force buffered writer for append mode The IoUringDiskBatch writes via SQEs with absolute offsets starting at 0, ignoring the file position set by `file.seek(append_offset)` in open_output_file. This caused --append-mode transfers under io_uring to overwrite the existing file prefix with zeros, surfacing as the standalone:append interop test failing with NUL-byte corruption. Gate Writer::IoUring off whenever begin.append_offset > 0 and route those transfers through ReusableBufWriter, which honors the seek via standard Write::write_all. upstream: receiver.c:307-308 - file position is the source of truth in append mode; any writer that ignores it cannot be used. * docs(compress): convert restating comments to rustdoc in lz4 (#2026) (#3462) Remove inline comments inside lz4 codec function bodies and tests that simply echo the next statement. All public-item rustdoc and upstream rsync references (token.c) remain intact. * fix(filters): propagate P (protect) filter modifier in sender-side filter exchange (#3463) oc-rsync was emitting `P pattern` directly on the wire while also attaching the `r` (receiver-side) modifier derived from the rule's `applies_to_receiver` flag. Upstream rsync rejects `Pr` because the `P` prefix already specifies the side and `r` after a side-specifying prefix triggers the "invalid modifier" error in exclude.c:1270-1271. Upstream's `get_rule_prefix()` (exclude.c:1536-1572) only ever emits `+`, `-`, or `:` as the leading character; `P` and `R` exist only as parser sugar that lower to plain include/exclude rules carrying the `FILTRULE_RECEIVER_SIDE` flag. The wire encoding for a P rule is `-r pattern`; the wire encoding for an R rule is `+r pattern`. Normalize Protect/Risk in `build_modern_prefix` so the wire stream matches upstream byte-for-byte. The receiver-side flag is forced when serializing Protect/Risk rule types so that hand-built `FilterRuleWireFormat` values without `receiver_side=true` still serialize correctly. Reproduced and verified the fix against upstream rsync 3.4.1 daemon in the rsync-profile podman container with `--filter='P *.log'` push: protected files now survive `--delete`. Removes `standalone:delete-filter-protect` from KNOWN_FAILURES and DASHBOARD_ENTRIES. upstream: exclude.c:1536-1542 upstream: exclude.c:1569-1572 upstream: exclude.c:1201-1206 * docs(checksums): convert restating comments to rustdoc in rolling (#2018) (#3461) Cleans up the `crates/checksums/src/rolling/` test modules: - Drop decorative banner divider in front of the golden-tests block; replace with a short paragraph that describes intent and references upstream checksum.c. - Remove restating-the-code comments (e.g. "Verify it matches manual computation", "Verify rolling produces correct checksum after shift") whose information is already conveyed by the surrounding code or test name. - Sharpen wire-format comments to call out their connection to upstream rsync's packed (s2 << 16) | s1 layout. Production source files in `rolling/` already use proper rustdoc on every public item, so no `///` conversions are required there. SAFETY comments and SIMD weight-table explanations are preserved verbatim. No runtime behaviour changes. * fix(checksums): apply checksum_seed to MD4 strong checksum (upstream parity) (#3465) Upstream rsync's `get_checksum2()` (checksum.c:358-396) appends the negotiated `checksum_seed` after the data when computing MD4 block strong sums: memcpy(buf1, buf, len); if (checksum_seed) { SIVAL(buf1, len, checksum_seed); len += 4; } Our MD4 path was unconditionally calling `Md4::digest(data)` and ignoring the seed, so block-level strong checksums diverged from upstream whenever the session negotiated a non-zero seed. This silently hurt delta-transfer match quality across rsync<->oc-rsync interop and inflated `hash_hits` / `false_alarms` counters in delta-stats output. Changes: * Add `Md4::digest_with_seed(seed, data)` mirroring upstream's append-seed- after-data semantics. A zero seed is a no-op (matches upstream's `if (checksum_seed)` guard). * Add `SignatureAlgorithm::Md4Seeded { seed }` variant. Introducing a new variant rather than mutating the unit `Md4` keeps the 30+ existing call-sites that match on `Md4` working. * `ChecksumFactory::signature_algorithm()` now selects `Md4Seeded` when MD4 (or the legacy `None`/protocol<27 default) runs with a non-zero seed, and keeps unit `Md4` for the seed=0 case so existing wire goldens stay byte- identical. * Extend exhaustive matches in `signature::algorithm`, the parallel checksum executor, and the delta-transfer bench to cover `Md4Seeded`. Tests: * `md4_seeded_appends_seed_after_data` - verifies the byte layout matches building `(data || seed_le_bytes)` and hashing with plain MD4. * `md4_seeded_zero_seed_matches_unseeded` - guards the upstream zero-seed short-circuit. * `md4_seeded_negative_seed_is_le_two_complement` - confirms i32 wire encoding for negative seeds. * `md4_seeded_signature_matches_upstream_format` - parity at the `SignatureAlgorithm` layer. * Three `ChecksumFactory` tests covering MD4/None x seed=0/non-zero. upstream: checksum.c:358-396 `get_checksum2()` MD4 branch * fix(filters): correct R-before-P ordering in delete-filter-risk interop test (#3464) * fix(filters): correct R-before-P ordering in delete-filter-risk interop test The standalone:delete-filter-risk interop test had the filter rules in the wrong order: --filter='P *.log' --filter='P *.sh' --filter='R *.log'. Upstream rsync evaluates filter rules first-match-wins (exclude.c:1038-1065 check_filter()). For R (risk) to override P (protect) on *.log, it must be listed BEFORE the P rule. With R last, the P rule matched first and the *.log files were never deleted. Verified that upstream rsync 3.4.1 itself produces the same "wrong" outcome with the original P-first ordering, confirming the test expectation - not the filter engine - was incorrect. oc-rsync's filter engine already mirrors upstream semantics; the unit test risk_rule_allows_deletion_before_protection in crates/filters/src/tests.rs explicitly documents the required R-before-P order. Reordered both directions (up->oc and oc->up) to use --filter='R *.log' --filter='P *.log' --filter='P *.sh' and updated the function-level comment to cite exclude.c:1201-1207 (R rflags) and exclude.c:1038-1065 (first-match-wins). Verified end-to-end in the rsync-profile container that both upstream->oc and oc->upstream daemon pushes now produce the expected outcome: - source files transferred - risky.log + subdir/nested.log deleted (R overrides P) - keeper.sh preserved (P with no preceding R) - destonly.txt deleted (no protection) Removed the entry from KNOWN_FAILURES and DASHBOARD_ENTRIES in tools/ci/known_failures.conf. upstream: exclude.c:1201-1207 'R' = FILTRULE_INCLUDE|FILTRULE_RECEIVER_SIDE upstream: exclude.c:1038-1065 check_filter() first-match-wins * fix(filters): suppress redundant r/s modifier on Risk/Protect wire prefix The wire encoder unconditionally emitted `s`/`r` modifiers when the sender_side/receiver_side flags were set on a FilterRuleWireFormat. This produced output like `Rr *.log` for a Risk rule (which oc-rsync internally tags as receiver-side). Upstream rsync's parser rejects that with `invalid modifier 'r' at position 1 in filter rule`, breaking the delete-filter-risk interop scenario in oc->upstream direction with exit 12. Upstream's filter parser (exclude.c:1180-1207) sets prefix_specifies_side when the rule type char is R/P (and S/H), and exclude.c:1269-1278 rejects an explicit `r`/`s` modifier in that case. Match upstream's get_rule_prefix (exclude.c:1525-1587), which never emits the redundant flag for those prefixes. Suppress `s`/`r` emission when the rule type is Risk or Protect. Add unit tests at the prefix builder level and wire-level tests including a Risk roundtrip to lock in the format. * docs(protocol): convert restating comments to rustdoc in varint (#1999) (#3466) Remove decorative banner comments from the varint test module and tighten internal comments in `read_varlong` to delete pure restatement while preserving upstream rsync C source references (`io.c:read_varlong`, `memcpy(u.b, b2+1, min_bytes-1)`). No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in zlib (#2029) (#3467) Remove decorative and restating-the-code comments from the zlib module's tests, while preserving every upstream rsync C source reference and comments that explain non-obvious test design rationale. Non-test files (decoder.rs, encoder.rs, helpers.rs, level.rs, mod.rs) already use proper rustdoc and required no changes. * docs: clean up comments in core::timeout (#2014) (#3471) * docs(signature): convert restating comments to rustdoc (#2034) (#3474) Drops inline comments that merely echo the immediately following code in block_size.rs, pipelined_gen.rs, and async_gen.rs. Reworks the few that remain to focus on the WHY (upstream-derived phase 2 short-circuit, sqrt bound derivation, BATCH_SIZE cross-file consistency, disk I/O thread cap). Comment-only change with no functional impact. * docs(metadata): convert restating comments to rustdoc in chmod (#2022) (#3473) Remove section-banner comments and pure-restatement comments from chmod test modules. Convert a stranded `///` doc-comment in `upstream_compatibility` to a proper `//!` module doc. Preserve all comments that reference upstream rsync semantics or that explain the non-obvious conditional-X / assign-empty-perms / octal-validation rules. No functional code changes. * feat(ci): add upstream rsync testsuite harness (#3470) Adds tools/ci/run_upstream_testsuite.sh, which runs upstream rsync's testsuite/*.test scripts against oc-rsync. Mirrors upstream's runtests.sh contract: exports $RSYNC, $TOOLDIR, $srcdir, $suitedir, $scratchdir, $TLS_ARGS, $POSIXLY_CORRECT, and $ECHO_* per test. Difference vs upstream runtests.sh: $RSYNC points at oc-rsync, not upstream. Upstream sources are still needed for helper tools (tls, getgroups, lsh.sh) and config artefacts (config.h, shconfig); the harness fetches and configures them on first run. Known failures live in tools/ci/upstream_testsuite_known_failures.conf and are grouped by category (lsh.sh-routed remote-shell tests, daemon-mode tests, ACL/xattr, devices, chown, hardlinks INC_RECURSE, atimes/crtimes, etc). Tests in this list are reported XFAIL on failure and UPASS on unexpected success so the list stays self-curating. Wires up risk area #4 from the audit: protocol edge-case compatibility against upstream's own conformance suite. Usage: tools/ci/run_upstream_testsuite.sh # all tests WHICHTESTS=00-hello.test tools/ci/... # one test PRESERVE_SCRATCH=yes tools/ci/... # keep per-test dirs * docs: clean up comments in protocol::error_recovery (#1985) (#3472) Remove restatement and decorative banner comments that echo the code without adding value. Replace two banner-style remarks with focused notes that explain non-obvious behavior (over-receive handling, why HashMap re-insert is the desired semantic, the conservative default classification, and the resume-vs-retry policy on transient errors). Scope is limited to crates/protocol/src/error_recovery/. No functional changes; tests and public APIs are unchanged. Refs #1985 * docs(checksums): convert restating comments to rustdoc in strong (#2020) (#3469) Audit of `crates/checksums/src/strong/` (excluding `strategy/`, `md4_tests.rs`, `xxh64_tests.rs` per task scope). Production source files are already well-documented from prior cleanup passes: every public type, trait, function, and method carries rustdoc; upstream rsync references (`upstream: checksum.c:get_checksum2()` and similar) are preserved on the seed-handling paths in `md4.rs` and `md5.rs`; no decorative banners, debug checkpoints, or commented-out code remain in scope. The single substantive change in this pass adds a quirk comment to the private `detect()` helper in `openssl_support.rs` that captures non-obvious upstream behaviour: MD4 may be absent on OpenSSL builds without the legacy provider, so the MD4 probe is best-effort and overall detection succeeds as long as MD5 works. This explains the otherwise-mysterious `if let Some(md4) = ...` whose result is deliberately discarded. No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in strategy (#2028) (#3468) Remove decorative banner/divider comments from the compress::strategy module. The module's rustdoc was already comprehensive; this cleanup removes the remaining `// ---- Section ----` test-grouping banners that violate the no-decorative-divider rule from <internal-tool-doc>. All upstream rsync C source references and load-bearing internal comments are preserved verbatim. No production code, function bodies, signatures, types, or imports were modified. * docs: add v0.6 known limitations and security hardening notes Adds a "Known Limitations / Architectural Trade-offs" section to README.md covering PBUF_RING 5.19+ requirement, io_uring buffer pool ceiling, SSH double-compression interaction, single-thread delta computation, daemon TLS gap, and Windows IOCP wiring status (#1868). Updates SECURITY.md supported-versions table to 0.6.x and adds a "Hardening Notes" section covering recycle_buffer release-mode bounds-check gap, bgid u16 namespace exhaustion, SSH double-compression amplification surface, daemon TLS-in-front guidance (stunnel / SSH tunnel / reverse proxy), and daemon module hardening defaults. Cross-links the two documents so operators planning a deployment have a single source of truth for the documented trade-offs. * docs: comment cleanup in protocol::flist::hardlink Remove restating-the-code comments from the hardlink test suite and fix two malformed upstream references in types.rs that used `// upstream:` inside rustdoc blocks (rendered literally in generated docs). Preserved: all upstream rsync source references, non-obvious WHY explanations (FxHash collision rationale, swapped-value asymmetry, weak-mixing input categorisation), and module-level documentation. * docs(filters): audit .rsync-filter per-directory inheritance (#2050) Compare oc-rsync's per-directory merge file inheritance against upstream rsync 3.4.1. Documents the push/pop lifecycle, modifier semantics (n, e, p, s, r, !), and parent-dirscan behaviour. Identifies three behavioural bugs in FilterChain (n modifier ignored, !/clear not propagated to scopes, include-only scopes cannot override outer excludes), one intentional divergence (parent_dirscan), and one untested area (nested dir-merge declarations).
oferchen
added a commit
that referenced
this pull request
May 5, 2026
* docs(fast_io): design session-level io_uring ring pool (#1409) (#3444) * docs(fast_io): audit io_uring SQPOLL and DEFER_TASKRUN for socket I/O (#1267) (#3445) * docs(ci): audit Windows ACL/xattr CI matrix gaps (#1869) (#3446) * docs(fast_io): audit MADV_WILLNEED prefault for mmap'd basis files (#1662) (#3447) * docs(batch): investigate zstd as batch-compatible compression alternative (#1685) (#3448) * docs(fast_io): document mmap page-fault impact on io_uring SQPOLL (#1661) (#3449) * docs: audit --delete-during ordering vs upstream rsync 3.4.1 (#3453) Captures the audit findings from #1893 in docs/architecture/delete-during.md and lays the groundwork for the documentation work tracked in #1894. Contrasts upstream's per-directory interleaved deletion (generator.c::recv_generator + delete_in_dir) against oc-rsync's batched pre-transfer sweep (crates/transfer/src/receiver/transfer.rs:532), enumerates the resulting differences in phase ordering, determinism, filter evaluation, and error semantics, and lists the follow-up actions: CHANGELOG / man-page note, interop test for concurrent new+deleted entries, .rsync-filter investigation, and a possible --delete-strict-order opt-in. Refs #1893, #1894. * feat: add --jump-host proxy-jump CLI flag (#3454) Expose OpenSSH ProxyJump (-J) as --jump-host. Comma-separated [user@]HOST[:PORT] hops are forwarded to the remote shell as `ssh -J <value>` before the destination operand when the configured remote shell is OpenSSH. Empty values are rejected at every layer. Note: only the long form `--jump-host` is provided. Upstream rsync 3.4.1 binds `-J` to `--omit-link-times` (options.c:647), so reusing the short flag would break wire compatibility. Refs #1881 * docs: audit --iconv inert critical gap (#3455) Document the parse-but-never-applied architectural dead-end where the CLI accepts --iconv, validates and stores IconvSetting on ClientConfig, and forwards the option to the remote peer over SSH, but the local process never receives a FilenameConverter. ServerConfigBuilder::iconv() exists with zero call sites; receiver and generator already pull a converter from connection.iconv but always observe None. Severity: critical. Filenames with non-ASCII bytes silently bypass transcoding on every local-side path (file-list ingest, file-list emit, filter matching, daemon module serving) when --iconv is supplied. Companion to docs/audits/iconv-pipeline.md, narrowing on the single missing IconvSetting -> FilenameConverter bridge that currently makes every gap in that broader pipeline simultaneously unobservable. Tracks tasks #1909, #1910, #1918. Remediation path enumerated against existing #1911-#1919 task chain. * fix: normalise Windows backslash to forward slash in wire-encoded paths (#3456) A Windows oc-rsync sender that builds a `FileEntry` whose path was constructed via `Path::join` / `PathBuf::push` (the normal case for recursive transfers) emitted the native string with `\` separators verbatim on the wire. A POSIX rsync receiver decoding those bytes treated every `\` as part of a single filename, producing one literal filename per source file instead of the expected directory hierarchy. Upstream rsync 3.4.1 writes filename bytes verbatim in `flist.c:send_file_entry()` (lines 534-570). The wire format remains `/`-separated because every upstream build either runs on a POSIX kernel or under Cygwin's POSIX layer, which presents `/`-separated paths to the application before this code is reached. oc-rsync targets native Win32 directly, so the sender must perform the separator normalisation explicitly. Before this fix on Windows: let mut path = PathBuf::from(\"subdir\"); path.push(\"file.txt\"); let entry = FileEntry::new_file(path, 1024, 0o644); entry.name_bytes() == b\"subdir\\\\file.txt\" // wrong After: entry.name_bytes() == b\"subdir/file.txt\" // matches upstream The fix introduces a single-purpose helper `crates/protocol/src/flist/wire_path.rs::path_bytes_to_wire` that mirrors the existing identity-on-Unix / convert-on-Windows pattern of `wire_mode.rs`. The helper is applied at every wire-encode call site: - `FileEntry::name_bytes()` (filename emission via `write_entry`) - `write_symlink_target()` (symlink target emission) - `strip_leading_slashes()` Windows branch trims both `/` and `\` The `name_bytes()` accessor signature changes from `&[u8]` to `Cow<'_, [u8]>` so the helper can borrow on Unix (zero allocation) and own only when conversion is required on Windows. Sort comparators borrow the inner slice via `Deref` and continue to work unchanged. Tests added: - `wire_path` module: 8 unit tests covering forward-slash identity, empty path, dot path, Unix borrow, Unix backslash preservation, Windows translation, mixed separators. - `flist::write` regression tests: assert the wire-encoded filename contains no `\` byte and that a writer-then-reader roundtrip yields `subdir/file.txt` regardless of host platform. - Symlink-target regression test asserting no `\` byte appears in the encoded target bytes. Companion audit: docs/audits/windows-path.md cites upstream flist.c:534-570 and util1.c:955-961. Closes #1905 Refs #1939 * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids (#3450) * fix: validate IPv6 host with Ipv6Addr and support scoped zone ids Closes #1879 and #1880. The previous `host_contains_colon` heuristic in `SshCommand::target_argument` treated any string containing `:` as IPv6 and wrapped it in brackets. That silently accepted malformed input such as `2001:db8:::1` or `garbage:input` and forwarded the unparseable literal to ssh. Replace it with strict `Ipv6Addr::from_str` validation via a new `parse_host_for_ssh` returning a `HostKind` enum (`Hostname` / `Ipv4` / `Ipv6 { addr, zone }`). The parser also recognises RFC 4007 scoped zone identifiers (e.g. `fe80::1%eth0`) and re-emits them inside the bracket form (`[addr%zone]`) per upstream rsync convention. Zone ids are rejected when empty or when they contain whitespace or `]`. Before: `host_contains_colon("2001:db8:::1") == true` -> emits `[2001:db8:::1]`. After: `parse_host_for_ssh("2001:db8:::1")` returns `Err(InvalidIpv6)`, so the input is passed through unchanged and ssh surfaces the resolution failure instead of receiving a malformed bracketed literal. Includes unit tests for: bare IPv4, bare hostname, bracketed IPv6, IPv6 with zone (`fe80::1%eth0` and `[fe80::1%en0]`), malformed multi-`::` IPv6, zone with whitespace (rejected), and direct `parse_host_for_ssh` classification across all `HostKind` and `BuildError` variants. * style: apply cargo fmt to ssh ipv6 validation * feat: add AppleDouble filter and optional xattr merge for macOS (#3457) * feat: add AppleDouble filter for macOS sidecar exclusion (#1907) Add the `--apple-double-skip` option which appends a perishable `._*` exclusion to the filter chain. macOS writes AppleDouble sidecar files on filesystems that cannot represent extended attributes natively (FAT, exFAT, NFS, SMB) to carry FinderInfo, resource forks, and xattrs. Replicating them across machines clutters destinations with stale metadata. The filter half of #1907 ships now. Upstream rsync 3.4.1 has no equivalent flag, so the option uses the descriptive `--apple-double-skip` name. The optional xattr merge half is deferred: the existing `apple-fs` crate does not expose AppleDouble parsing or xattr merging primitives, so a follow-up task will land that piece alongside the required parser. - New `filters::apple_double` module with the canonical `._*` pattern. - `FilterSet::from_rules_with_apple_double` mirrors the `from_rules_with_cvs` convenience constructor. - CLI plumbing through `ParsedArgs`, the command builder, the filter-rules collector, and the workflow runner. - Integration tests under `crates/filters/tests/apple_double.rs` covering top-level/nested matching, perishable semantics, override precedence, and clear-rule interaction. - Frontend tests cover argument parsing and an end-to-end transfer scenario that confirms `._foo` files are dropped while their parents copy through. - Man page and `--help` text updated. Refs #1907 * style: apply cargo fmt to apple-double filter * refactor: use shell-words crate for ssh remote-shell parsing (#3451) * refactor: use shell-words crate for ssh remote-shell parsing Replace the bespoke byte-level state machine in `parse_remote_shell` with `shell_words::split`, the well-supported POSIX shell tokenizer. - Tokenization is delegated to the `shell-words` crate (1.1). - The wrapper retains the three behaviours callers rely on: empty/whitespace input -> `Empty`, interior NUL -> `InteriorNull`, non-UTF-8 input -> `InvalidEncoding`. - The historical `UnterminatedEscape`/`UnterminatedSingleQuote`/ `UnterminatedDoubleQuote` variants collapse into a single `Parse(String)` variant carrying the `shell_words::ParseError` description. No external code matched on the removed variants. - Adds a property test corpus (`crates/rsync_io/tests/parse_remote_shell.rs`) asserting parity with `shell_words::split` over arbitrary RSYNC_RSH-style inputs, plus explicit unit tests for the historical behaviours (quoting, escapes, whitespace, NUL byte, invalid Unicode). Tasks: #1877, #1878. * style: apply cargo fmt to ssh shell-words tokenizer tests * chore: update Cargo.lock for shell-words dependency Cargo.lock was not updated when shell-words = "1.1" was added to crates/rsync_io/Cargo.toml in cc01bbc, causing the MSRV 1.88 workflow to fail with --locked. * fix(ssh): reject trailing backslash escape in remote-shell tokenizer shell_words::split silently accepts a dangling trailing backslash, but upstream rsync (and the previous bespoke tokenizer) reject it. Add a small validator that walks the string tracking single- and double-quote regions so that backslashes inside single quotes are not flagged, and returns true only when the input ends with an unescaped backslash that has no character to escape. This restores the test parser_rejects_trailing_escape that fails on the shell-words crate alone. * test: allow stricter Parse rejection vs shell_words in parity property The bespoke remote-shell tokenizer rejects dangling trailing backslashes, matching upstream rsync's `tokens = parse_arguments(...)`. The new shell-words-based wrapper preserves that strictness via has_trailing_escape, but `shell_words::split` silently absorbs the trailing backslash and returns Ok. The parity property test was therefore failing on inputs ending in a bare `\\` because parse_remote_shell returned Err(Parse) while shell_words returned Ok. The wrapper is intentionally a tightening, not a parity match in both directions. Add an explicit (Err(Parse), Ok(_)) arm so the property test encodes the contract: we may reject inputs shell_words accepts, but never the other way around. * chore: disable coverage workflow on push/pull_request (#3460) Nightly llvm-cov 22.1 SIGSEGVs deterministically inside llvm::coverage::CoverageMapping::getInstantiationGroups, breaking both master and PR runs. Restrict the workflow to manual workflow_dispatch until a known-good nightly is pinned via RUSTUP_TOOLCHAIN. * fix: avoid mmap basis when paired with io_uring writer (#3459) * fix: avoid mmap basis when paired with io_uring writer DeltaApplicator opened its basis file via MapFile::open_adaptive, which selects mmap for files >= 1 MiB on Unix. The applicator has no production caller today, but is the obvious next target for an io_uring-output rewrite. Wiring an IoUringWriter to the applicator without changing the basis path would submit mmap-backed pointers to io_uring SQEs on block-ref copies >= the writer batch threshold, exposing two failure modes: 1. Cold-page faults on the basis file are serviced under the SQE submission thread (worse, the SQPOLL kernel thread when SQPOLL is enabled), turning a "free" zero-copy write into a synchronous fault and stalling other in-flight SQEs on the same poller. 2. Concurrent truncation of the basis file raises SIGBUS while the kernel is dereferencing the page on our behalf - recovery from in-kernel SIGBUS is not signal-safe. Upstream rsync 3.4.1 deliberately avoids mmap(2) for basis files for the same truncation reason (fileio.c:214-217). Our BufferedMap matches that decision. Closes the F1 hazard from docs/audits/mmap-iouring-co-usage.md (#1660) by introducing BasisWriterKind in DeltaApplyConfig: when the writer is io_uring-backed, DeltaApplicator::new opens the basis via the new MapFile::open_adaptive_buffered (forces AdaptiveMapStrategy::Buffered regardless of size). Standard / std-write writers retain the existing adaptive selection. Non-Unix is unaffected since BufferedMap is the only basis strategy there. Tests cover both directions: io_uring writer + 2 MiB basis stays on BufferedMap, standard writer + 2 MiB basis still picks mmap on Unix. docs/design/basis-file-io-policy.md updated with implementation status noting that the io_uring-pairing axis is now wired; remaining hazard columns (--inplace, --append, --copy-devices, sparse, network FS) are future work tracked in the same doc. Refs: #1906 (this task), #1660 (audit). * fix: drop redundant Write import in applicator tests The test module brings in `Write` via `use super::*;` which re-exports the file-level `use std::io::{self, Read, Write};`. The explicit `use std::io::Write as _;` is therefore unused and trips clippy under `-D warnings`. * feat: bridge IconvSetting to FilenameConverter (#3458) Adds IconvSetting::resolve_converter() to map the CLI-side iconv setting onto a transfer-side FilenameConverter, and wires the converter through apply_common_server_flags so every SSH and daemon ServerConfig builder populates ConnectionConfig.iconv when --iconv is in effect. Without this bridge, the CLI parsed --iconv, validated it, forwarded it to the remote peer over SSH, but the local file-list reader and writer hooks always observed None and silently passed raw bytes through. Also rejects --iconv=LOCAL,REMOTE with a hard error when the iconv cargo feature is compiled out, rather than silently no-opping. Closes #1911 Closes #1915 * perf: wire IoUringDiskBatch into disk_commit hot path (#3452) * perf: wire IoUringDiskBatch into disk_commit hot path Task #1901. The disk-commit thread previously created an IoUringDiskBatch via try_create_disk_batch() but bound it to `_disk_batch`, leaving every write to fall through to the buffered path (thread.rs:127). All disk writes therefore bypassed io_uring batching even when the policy was Auto or Enabled on Linux 5.6+. Introduce a Writer enum in disk_commit/writer.rs that dispatches between ReusableBufWriter (existing 256 KB buffered path) and a new IoUring variant that borrows the disk thread's persistent IoUringDiskBatch. The IoUring variant is cfg-gated to all(target_os = "linux", feature = "io_uring") so non-Linux builds never reference it. In disk_commit/thread.rs, drop the underscore from the batch binding and thread `disk_batch.as_mut()` into both process_file and process_whole_file on every iteration of the message loop. The batch is created once per thread and reused across all files via begin_file/commit_file. In disk_commit/process.rs, replace the unconditional ReusableBufWriter construction with a make_writer() helper that registers the file with the batch when (a) the batch is present and (b) sparse mode is disabled. Sparse writes require Seek, which IoUringDiskBatch does not implement, so sparse mode keeps using the buffered writer via buffered_for_sparse(). Semantics preserved: - Temp-file commit ordering: Writer::finish() calls commit_file(do_fsync) which flushes + fsyncs + detaches the file before commit_file() runs the rename. Buffered path drops the writer (closing the file) before rename, identical to prior behaviour. - fsync: do_fsync flag flows through Writer::flush_and_sync (buffered) and Writer::finish (io_uring). The io_uring path makes flush_and_sync a no-op so commit_file performs both atomically in a single SQE. - Error propagation: io::Result is preserved end-to-end with file_path context attached on fsync/commit errors. * fix: gate unused Writer::finish parameters on non-Linux builds The do_fsync and file_path parameters on Writer::finish are only consumed by the io_uring arm. On non-Linux (or with the io_uring feature disabled) the Buffered arm ignores them, which under -D warnings escalates to a build failure. * fix(disk_commit): force buffered writer for append mode The IoUringDiskBatch writes via SQEs with absolute offsets starting at 0, ignoring the file position set by `file.seek(append_offset)` in open_output_file. This caused --append-mode transfers under io_uring to overwrite the existing file prefix with zeros, surfacing as the standalone:append interop test failing with NUL-byte corruption. Gate Writer::IoUring off whenever begin.append_offset > 0 and route those transfers through ReusableBufWriter, which honors the seek via standard Write::write_all. upstream: receiver.c:307-308 - file position is the source of truth in append mode; any writer that ignores it cannot be used. * docs(compress): convert restating comments to rustdoc in lz4 (#2026) (#3462) Remove inline comments inside lz4 codec function bodies and tests that simply echo the next statement. All public-item rustdoc and upstream rsync references (token.c) remain intact. * fix(filters): propagate P (protect) filter modifier in sender-side filter exchange (#3463) oc-rsync was emitting `P pattern` directly on the wire while also attaching the `r` (receiver-side) modifier derived from the rule's `applies_to_receiver` flag. Upstream rsync rejects `Pr` because the `P` prefix already specifies the side and `r` after a side-specifying prefix triggers the "invalid modifier" error in exclude.c:1270-1271. Upstream's `get_rule_prefix()` (exclude.c:1536-1572) only ever emits `+`, `-`, or `:` as the leading character; `P` and `R` exist only as parser sugar that lower to plain include/exclude rules carrying the `FILTRULE_RECEIVER_SIDE` flag. The wire encoding for a P rule is `-r pattern`; the wire encoding for an R rule is `+r pattern`. Normalize Protect/Risk in `build_modern_prefix` so the wire stream matches upstream byte-for-byte. The receiver-side flag is forced when serializing Protect/Risk rule types so that hand-built `FilterRuleWireFormat` values without `receiver_side=true` still serialize correctly. Reproduced and verified the fix against upstream rsync 3.4.1 daemon in the rsync-profile podman container with `--filter='P *.log'` push: protected files now survive `--delete`. Removes `standalone:delete-filter-protect` from KNOWN_FAILURES and DASHBOARD_ENTRIES. upstream: exclude.c:1536-1542 upstream: exclude.c:1569-1572 upstream: exclude.c:1201-1206 * docs(checksums): convert restating comments to rustdoc in rolling (#2018) (#3461) Cleans up the `crates/checksums/src/rolling/` test modules: - Drop decorative banner divider in front of the golden-tests block; replace with a short paragraph that describes intent and references upstream checksum.c. - Remove restating-the-code comments (e.g. "Verify it matches manual computation", "Verify rolling produces correct checksum after shift") whose information is already conveyed by the surrounding code or test name. - Sharpen wire-format comments to call out their connection to upstream rsync's packed (s2 << 16) | s1 layout. Production source files in `rolling/` already use proper rustdoc on every public item, so no `///` conversions are required there. SAFETY comments and SIMD weight-table explanations are preserved verbatim. No runtime behaviour changes. * fix(checksums): apply checksum_seed to MD4 strong checksum (upstream parity) (#3465) Upstream rsync's `get_checksum2()` (checksum.c:358-396) appends the negotiated `checksum_seed` after the data when computing MD4 block strong sums: memcpy(buf1, buf, len); if (checksum_seed) { SIVAL(buf1, len, checksum_seed); len += 4; } Our MD4 path was unconditionally calling `Md4::digest(data)` and ignoring the seed, so block-level strong checksums diverged from upstream whenever the session negotiated a non-zero seed. This silently hurt delta-transfer match quality across rsync<->oc-rsync interop and inflated `hash_hits` / `false_alarms` counters in delta-stats output. Changes: * Add `Md4::digest_with_seed(seed, data)` mirroring upstream's append-seed- after-data semantics. A zero seed is a no-op (matches upstream's `if (checksum_seed)` guard). * Add `SignatureAlgorithm::Md4Seeded { seed }` variant. Introducing a new variant rather than mutating the unit `Md4` keeps the 30+ existing call-sites that match on `Md4` working. * `ChecksumFactory::signature_algorithm()` now selects `Md4Seeded` when MD4 (or the legacy `None`/protocol<27 default) runs with a non-zero seed, and keeps unit `Md4` for the seed=0 case so existing wire goldens stay byte- identical. * Extend exhaustive matches in `signature::algorithm`, the parallel checksum executor, and the delta-transfer bench to cover `Md4Seeded`. Tests: * `md4_seeded_appends_seed_after_data` - verifies the byte layout matches building `(data || seed_le_bytes)` and hashing with plain MD4. * `md4_seeded_zero_seed_matches_unseeded` - guards the upstream zero-seed short-circuit. * `md4_seeded_negative_seed_is_le_two_complement` - confirms i32 wire encoding for negative seeds. * `md4_seeded_signature_matches_upstream_format` - parity at the `SignatureAlgorithm` layer. * Three `ChecksumFactory` tests covering MD4/None x seed=0/non-zero. upstream: checksum.c:358-396 `get_checksum2()` MD4 branch * fix(filters): correct R-before-P ordering in delete-filter-risk interop test (#3464) * fix(filters): correct R-before-P ordering in delete-filter-risk interop test The standalone:delete-filter-risk interop test had the filter rules in the wrong order: --filter='P *.log' --filter='P *.sh' --filter='R *.log'. Upstream rsync evaluates filter rules first-match-wins (exclude.c:1038-1065 check_filter()). For R (risk) to override P (protect) on *.log, it must be listed BEFORE the P rule. With R last, the P rule matched first and the *.log files were never deleted. Verified that upstream rsync 3.4.1 itself produces the same "wrong" outcome with the original P-first ordering, confirming the test expectation - not the filter engine - was incorrect. oc-rsync's filter engine already mirrors upstream semantics; the unit test risk_rule_allows_deletion_before_protection in crates/filters/src/tests.rs explicitly documents the required R-before-P order. Reordered both directions (up->oc and oc->up) to use --filter='R *.log' --filter='P *.log' --filter='P *.sh' and updated the function-level comment to cite exclude.c:1201-1207 (R rflags) and exclude.c:1038-1065 (first-match-wins). Verified end-to-end in the rsync-profile container that both upstream->oc and oc->upstream daemon pushes now produce the expected outcome: - source files transferred - risky.log + subdir/nested.log deleted (R overrides P) - keeper.sh preserved (P with no preceding R) - destonly.txt deleted (no protection) Removed the entry from KNOWN_FAILURES and DASHBOARD_ENTRIES in tools/ci/known_failures.conf. upstream: exclude.c:1201-1207 'R' = FILTRULE_INCLUDE|FILTRULE_RECEIVER_SIDE upstream: exclude.c:1038-1065 check_filter() first-match-wins * fix(filters): suppress redundant r/s modifier on Risk/Protect wire prefix The wire encoder unconditionally emitted `s`/`r` modifiers when the sender_side/receiver_side flags were set on a FilterRuleWireFormat. This produced output like `Rr *.log` for a Risk rule (which oc-rsync internally tags as receiver-side). Upstream rsync's parser rejects that with `invalid modifier 'r' at position 1 in filter rule`, breaking the delete-filter-risk interop scenario in oc->upstream direction with exit 12. Upstream's filter parser (exclude.c:1180-1207) sets prefix_specifies_side when the rule type char is R/P (and S/H), and exclude.c:1269-1278 rejects an explicit `r`/`s` modifier in that case. Match upstream's get_rule_prefix (exclude.c:1525-1587), which never emits the redundant flag for those prefixes. Suppress `s`/`r` emission when the rule type is Risk or Protect. Add unit tests at the prefix builder level and wire-level tests including a Risk roundtrip to lock in the format. * docs(protocol): convert restating comments to rustdoc in varint (#1999) (#3466) Remove decorative banner comments from the varint test module and tighten internal comments in `read_varlong` to delete pure restatement while preserving upstream rsync C source references (`io.c:read_varlong`, `memcpy(u.b, b2+1, min_bytes-1)`). No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in zlib (#2029) (#3467) Remove decorative and restating-the-code comments from the zlib module's tests, while preserving every upstream rsync C source reference and comments that explain non-obvious test design rationale. Non-test files (decoder.rs, encoder.rs, helpers.rs, level.rs, mod.rs) already use proper rustdoc and required no changes. * docs: clean up comments in core::timeout (#2014) (#3471) * docs(signature): convert restating comments to rustdoc (#2034) (#3474) Drops inline comments that merely echo the immediately following code in block_size.rs, pipelined_gen.rs, and async_gen.rs. Reworks the few that remain to focus on the WHY (upstream-derived phase 2 short-circuit, sqrt bound derivation, BATCH_SIZE cross-file consistency, disk I/O thread cap). Comment-only change with no functional impact. * docs(metadata): convert restating comments to rustdoc in chmod (#2022) (#3473) Remove section-banner comments and pure-restatement comments from chmod test modules. Convert a stranded `///` doc-comment in `upstream_compatibility` to a proper `//!` module doc. Preserve all comments that reference upstream rsync semantics or that explain the non-obvious conditional-X / assign-empty-perms / octal-validation rules. No functional code changes. * feat(ci): add upstream rsync testsuite harness (#3470) Adds tools/ci/run_upstream_testsuite.sh, which runs upstream rsync's testsuite/*.test scripts against oc-rsync. Mirrors upstream's runtests.sh contract: exports $RSYNC, $TOOLDIR, $srcdir, $suitedir, $scratchdir, $TLS_ARGS, $POSIXLY_CORRECT, and $ECHO_* per test. Difference vs upstream runtests.sh: $RSYNC points at oc-rsync, not upstream. Upstream sources are still needed for helper tools (tls, getgroups, lsh.sh) and config artefacts (config.h, shconfig); the harness fetches and configures them on first run. Known failures live in tools/ci/upstream_testsuite_known_failures.conf and are grouped by category (lsh.sh-routed remote-shell tests, daemon-mode tests, ACL/xattr, devices, chown, hardlinks INC_RECURSE, atimes/crtimes, etc). Tests in this list are reported XFAIL on failure and UPASS on unexpected success so the list stays self-curating. Wires up risk area #4 from the audit: protocol edge-case compatibility against upstream's own conformance suite. Usage: tools/ci/run_upstream_testsuite.sh # all tests WHICHTESTS=00-hello.test tools/ci/... # one test PRESERVE_SCRATCH=yes tools/ci/... # keep per-test dirs * docs: clean up comments in protocol::error_recovery (#1985) (#3472) Remove restatement and decorative banner comments that echo the code without adding value. Replace two banner-style remarks with focused notes that explain non-obvious behavior (over-receive handling, why HashMap re-insert is the desired semantic, the conservative default classification, and the resume-vs-retry policy on transient errors). Scope is limited to crates/protocol/src/error_recovery/. No functional changes; tests and public APIs are unchanged. Refs #1985 * docs(checksums): convert restating comments to rustdoc in strong (#2020) (#3469) Audit of `crates/checksums/src/strong/` (excluding `strategy/`, `md4_tests.rs`, `xxh64_tests.rs` per task scope). Production source files are already well-documented from prior cleanup passes: every public type, trait, function, and method carries rustdoc; upstream rsync references (`upstream: checksum.c:get_checksum2()` and similar) are preserved on the seed-handling paths in `md4.rs` and `md5.rs`; no decorative banners, debug checkpoints, or commented-out code remain in scope. The single substantive change in this pass adds a quirk comment to the private `detect()` helper in `openssl_support.rs` that captures non-obvious upstream behaviour: MD4 may be absent on OpenSSL builds without the legacy provider, so the MD4 probe is best-effort and overall detection succeeds as long as MD5 works. This explains the otherwise-mysterious `if let Some(md4) = ...` whose result is deliberately discarded. No behaviour, signature, or import changes. * docs(compress): convert restating comments to rustdoc in strategy (#2028) (#3468) Remove decorative banner/divider comments from the compress::strategy module. The module's rustdoc was already comprehensive; this cleanup removes the remaining `// ---- Section ----` test-grouping banners that violate the no-decorative-divider rule from <internal-tool-doc>. All upstream rsync C source references and load-bearing internal comments are preserved verbatim. No production code, function bodies, signatures, types, or imports were modified. * docs: add v0.6 known limitations and security hardening notes Adds a "Known Limitations / Architectural Trade-offs" section to README.md covering PBUF_RING 5.19+ requirement, io_uring buffer pool ceiling, SSH double-compression interaction, single-thread delta computation, daemon TLS gap, and Windows IOCP wiring status (#1868). Updates SECURITY.md supported-versions table to 0.6.x and adds a "Hardening Notes" section covering recycle_buffer release-mode bounds-check gap, bgid u16 namespace exhaustion, SSH double-compression amplification surface, daemon TLS-in-front guidance (stunnel / SSH tunnel / reverse proxy), and daemon module hardening defaults. Cross-links the two documents so operators planning a deployment have a single source of truth for the documented trade-offs. * docs: comment cleanup in protocol::flist::hardlink Remove restating-the-code comments from the hardlink test suite and fix two malformed upstream references in types.rs that used `// upstream:` inside rustdoc blocks (rendered literally in generated docs). Preserved: all upstream rsync source references, non-obvious WHY explanations (FxHash collision rationale, swapped-value asymmetry, weak-mixing input categorisation), and module-level documentation. * docs(engine): audit --fuzzy basis-file search algorithm (#2051) Document oc-rsync's --fuzzy / -y basis-file search and scoring against upstream rsync 3.4.1 (generator.c::find_fuzzy() and util1.c::fuzzy_distance()). Identifies six divergences (scoring metric, phase A size+mtime short-circuit, FNAMECMP_FUZZY wire emission, suffix heuristics, already-sent filtering, tie-breaker direction) and three matches (-yy plumbing, --whole-file/--inplace /--append interactions). Recommends Levenshtein-based scoring, phase A short-circuit, and ITEM_BASIS_TYPE_FOLLOWS emission.
oferchen
added a commit
that referenced
this pull request
May 5, 2026
oferchen
added a commit
that referenced
this pull request
May 5, 2026
…#3515) Document the High-severity backslash-leak finding with reproduction steps and the remediation plan referencing task #1905. Refines the parent audit #1842 (windows-path-normalization.md) with a focused finding-level write-up: code paths, reproduction commands (daemon and batch capture), upstream comparison via Cygwin, the path_bytes_to_wire helper plan, regression-test obligations, and the Windows-sender -> Linux-receiver CI matrix entry.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_690904f615188323aab84469000abc8d