Skip to content

Ensure fallback binary lookup mirrors Windows PATHEXT#1913

Merged
oferchen merged 1 commit into
masterfrom
implement-missing-components-for-oc-rsync
Nov 3, 2025
Merged

Ensure fallback binary lookup mirrors Windows PATHEXT#1913
oferchen merged 1 commit into
masterfrom
implement-missing-components-for-oc-rsync

Conversation

@oferchen
Copy link
Copy Markdown
Owner

@oferchen oferchen commented Nov 3, 2025

Summary

  • expand fallback binary candidate generation to reuse PATHEXT entries for explicit Windows paths and deduplicate results
  • factor path-extension application into a shared helper so PATH scanning and explicit overrides reuse the same normalization
  • add a Windows-specific regression test covering PATHEXT expansion when the override points at a directory-qualified binary name

Testing

  • cargo test -p rsync-core fallback::binary

https://chatgpt.com/codex/tasks/task_e_69091f3ec794832392f6819450f8f956

@oferchen oferchen merged commit 5550da9 into master Nov 3, 2025
1 check passed
@oferchen oferchen deleted the implement-missing-components-for-oc-rsync branch November 3, 2025 21:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +58 to +66
fn candidates_for_explicit_path(path: &Path) -> Vec<PathBuf> {
let mut results = Vec::new();
let mut seen = HashSet::new();

push_candidate(path.to_path_buf(), &mut seen, &mut results);

for ext in collect_windows_extensions(path.extension()) {
if let Some(candidate) = apply_extension(path, &ext) {
push_candidate(candidate, &mut seen, &mut results);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip PATHEXT variants for explicit paths with extensions

On Windows, candidates_for_explicit_path now always applies collect_windows_extensions even when the caller already supplied an explicit extension. That means an override like C:\tools\rsync.exe will report as available whenever any sibling rsync.<ext> exists (for example rsync.cmd), but the subsequent spawn still uses the original .exe path and fails. This regresses the previous behaviour which only checked the exact path and aligns with how CreateProcess works when the command contains a suffix. Consider skipping PATHEXT expansion when path.extension().is_some() so explicit paths are validated exactly and false positives are avoided.

Useful? React with 👍 / 👎.

oferchen added a commit that referenced this pull request May 1, 2026
…1911)

Closes #1911 by completing the local-copy bridge between the parsed
`IconvSetting` on `ClientConfig` and the engine's `LocalCopyOptions`,
mirroring the SSH/daemon bridge that PR #3458 wired through
`apply_common_server_flags`.

The audit at `docs/audits/iconv-inert.md` (PR #3526) labels #1911 as
"Partial" because the local-copy path bypasses
`apply_common_server_flags` and therefore had no route by which a
resolved `FilenameConverter` could reach the engine.

Changes:

- `engine::local_copy::LocalCopyOptions` and
  `LocalCopyOptionsBuilder` gain an `iconv: Option<FilenameConverter>`
  field with a `with_iconv` setter and `iconv()` accessor on the
  options struct, plus an `iconv()` setter on the builder.
- `core::client::run::LocalCopyOptionsBuilder::build` (the internal
  wrapper around the engine builder) calls
  `config.iconv().resolve_converter()` and propagates the result via
  the new setter.
- Unit tests in `crates/engine/src/local_copy/options/filters.rs`
  cover default-none, attach, and clear.
- Unit tests in `crates/core/src/client/run/mod.rs` cover the
  `IconvSetting -> LocalCopyOptions.iconv` mapping for unspecified,
  disabled, locale-default, explicit, and unsupported-charset variants.

This PR plumbs the converter only; it does not yet apply it on
sender file-list emit (#1912), receiver file-list ingest (#1913),
or filter-rule path matching (#1914). Those are tracked separately.
oferchen added a commit that referenced this pull request May 2, 2026
Annotate the receiver flist ingest path so it explicitly references upstream
`flist.c recv_file_entry() iconv_buf(ic_recv, ...)` and adds regression tests
that exercise the wire-to-local conversion through `FileListReader`.

Wire bytes hit `apply_encoding_conversion` after `read_name` updates the
prefix-compression buffer, mirroring upstream's `lastname` semantics where
the static buffer holds the unconverted wire bytes so subsequent entries
share the prefix as the sender intended.

The new tests cover:
- ISO-8859-1 wire bytes decoded to UTF-8 with `--iconv=UTF-8,ISO-8859-1`
- pass-through when no converter is configured
- identity converter is a no-op

Refs #1913.
oferchen added a commit that referenced this pull request May 2, 2026
Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.
oferchen added a commit that referenced this pull request May 2, 2026
…) (#3546)

Annotate the receiver flist ingest path so it explicitly references upstream
`flist.c recv_file_entry() iconv_buf(ic_recv, ...)` and adds regression tests
that exercise the wire-to-local conversion through `FileListReader`.

Wire bytes hit `apply_encoding_conversion` after `read_name` updates the
prefix-compression buffer, mirroring upstream's `lastname` semantics where
the static buffer holds the unconverted wire bytes so subsequent entries
share the prefix as the sender intended.

The new tests cover:
- ISO-8859-1 wire bytes decoded to UTF-8 with `--iconv=UTF-8,ISO-8859-1`
- pass-through when no converter is configured
- identity converter is a no-op

Refs #1913.
oferchen added a commit that referenced this pull request May 2, 2026
…1916) (#3534)

* test(interop): add --iconv interop test against upstream rsync 3.4.1 (#1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)

* test(interop): mark iconv-local-ssh known failure pending #1911-#1913

Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.
oferchen added a commit that referenced this pull request May 2, 2026
… (#3552)

Locks the byte-level wire encoding of filenames that pass through the
--iconv pipeline (#1911 config, #1912 sender, #1913 receiver), so future
regressions in either the sender or the receiver are caught at the byte
level rather than at the round-trip level.

Coverage:
  - sender (UTF-8 -> ISO-8859-1, KOI8-R, windows-1252) emits exact
    remote-charset bytes; suffix_len header reflects the post-iconv
    length, not the UTF-8 source length
  - receiver decodes ISO-8859-1, KOI8-R, and windows-1252 wire bytes
    into UTF-8 disk names byte-for-byte
  - identity converter (UTF-8 <-> UTF-8) preserves raw UTF-8 bytes
  - ASCII-only names pass through both directions unchanged
  - sender + receiver wire bytes match exactly (iconv writer output
    equals raw remote-charset entry output; iconv reader output equals
    UTF-8-native plain reader output)
  - round-trip preserves "café", "файл", "Ångström.txt" through their
    respective remote charsets
  - directory-prefixed compression operates on post-iconv bytes so
    sender and receiver agree on the iconv-then-compress order
  - unmappable source characters (Greek alpha into ISO-8859-1) and
    invalid wire bytes (malformed UTF-8) surface InvalidData errors,
    mirroring upstream's IOERR_GENERAL path in flist.c

Tests gated on `feature = "iconv"` and Unix where non-UTF-8 path bytes
are preserved by OsStr. Wire-format bytes verified against upstream
rsync 3.4.1's flist.c send_file_entry/recv_file_entry encoding.
oferchen added a commit that referenced this pull request May 2, 2026
Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
oferchen added a commit that referenced this pull request May 2, 2026
…#3554)

* feat(daemon): wire module charset= directive to iconv runtime (#1917)

The daemon parses `charset =` (module_directives.rs:330) and stores it on
ModuleDefinition.charset, but the value was never threaded into the iconv
runtime - so `--iconv` from a client to an oc-rsync daemon was a silent
no-op even when the module config declared a charset.

Wire it through build_server_config: read module.charset(), resolve it to
a protocol::iconv::FilenameConverter, and assign it to
ServerConfig.connection.iconv (the same field the SSH/local client path
already populates at remote/flags.rs:228).

Mirror upstream's setup_iconv() (rsync.c:87-140, clientserver.c:712-716):

- comma-separated `LOCAL,REMOTE` honours the post-comma segment on the
  server (upstream rsync.c:118-120, am_server branch);
- `.` and the empty string resolve to the locale default
  (upstream rsync.c:125-126);
- unknown charsets log via tracing and degrade to None (no transcoding)
  rather than aborting, matching the lenient behaviour
  IconvSetting::resolve_converter already uses on the client side.

Add daemon `iconv` Cargo feature that turns on `core/iconv` and
`protocol/iconv` so encoding_rs is actually compiled in. Without the
feature, FilenameConverter::new only accepts UTF-8 and the runtime
behaviour is identical to before this change.

Add 8 unit tests in client_args.rs covering: missing/empty/whitespace
directives, `.` and `LOCAL,.` identity, comma-segment selection, unknown
charset soft failure, whitespace trimming, and a Latin-1<->UTF-8 byte
round trip.

Drop `standalone:iconv-upstream` from KNOWN_FAILURES and DASHBOARD_ENTRIES
in tools/ci/known_failures.conf so the existing iconv-upstream interop
test (run_interop.sh:3168, daemon push with `charset = ISO-8859-1`) starts
gating regressions.

* fix(daemon): use field access for charset since accessor is cfg(test) only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).

* test(interop): dump dest contents and daemon log on iconv content mismatch

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917

* test(interop): redesign iconv test for ISO-8859-1 dest filenames

The previous test asserted that destination filenames remained UTF-8
after `--iconv=UTF-8,ISO-8859-1` with daemon `charset = ISO-8859-1`.
That premise contradicts upstream rsync's wire model:

  rsync.c:130-140 - the wire is always UTF-8, and the `charset`
  parameter (CLI LOCAL,REMOTE or daemon `charset =`) names the *local
  disk* encoding, not the wire encoding.

Result: the daemon's ic_recv (UTF-8 -> ISO-8859-1) writes single-byte
Latin-1 filenames to disk (caf\xe9.txt), but the test was looking for
UTF-8 multi-byte filenames (caf\xc3\xa9.txt) that never exist there.

Redesign uses parallel-array Value Objects to capture the source-side
UTF-8 bytes and the dest-side ISO-8859-1 bytes for each logical name,
plus single-responsibility helpers:

  _ic_init_fixtures   - populate the fixture mapping
  _ic_write_fixtures  - create source files; signal SKIP on host reject
  _ic_verify_dest     - verify dest has the ISO-8859-1 byte name + body
  _ic_dump_failure    - uniform diagnostic dump (octal-escaped ls + log)

Verification compares the source file content against the expected
ISO-8859-1 dest path, which is the property that proves the daemon's
iconv pipeline ran end-to-end.

* test(interop): single-file iconv fixture isolates #1917 from #1913

Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
oferchen added a commit that referenced this pull request May 2, 2026
#1911) (#3555)

* feat(transfer): wire IconvSetting -> FilenameConverter at config build (#1911)

Per docs/audits/iconv-pipeline.md Finding 4, --iconv was previously a
no-op end to end: the CLI parsed --iconv into IconvSetting and forwarded
the value to the remote peer, but the local process never converted that
setting into a protocol::FilenameConverter, so the file-list reader and
writer ran with iconv: None and copied raw bytes through transfer.

The bridge is now in place across all three transfer modes:

- SSH push/pull (and embedded SSH): apply_common_server_flags writes
  `server_config.connection.iconv = config.iconv().resolve_converter()`
  in crates/core/src/client/remote/flags.rs:228, invoked from
  build_server_config_for_receiver and build_server_config_for_generator
  (ssh_transfer.rs and embedded_ssh_transfer.rs).
- Daemon push/pull: apply_common_server_flags is also called from
  daemon_transfer/orchestration/server_config.rs (both paths). On the
  daemon side, module_access/client_args.rs:283 resolves the module's
  `charset =` directive into the same FilenameConverter slot.
- Local copy: LocalCopyOptionsBuilder::apply_iconv in
  crates/core/src/client/run/mod.rs:399 writes the converter onto
  LocalCopyOptions::iconv, since the local-copy executor in the engine
  crate bypasses the SSH/daemon ServerConfig builder.

The transfer crate consumes the converter at:

- crates/transfer/src/receiver/mod.rs:369 - reader.with_iconv(...)
- crates/transfer/src/generator/mod.rs:570 - writer.with_iconv(...)

These hand the converter to the protocol crate's FileListReader /
FileListWriter, where #1912 and #1913 already apply it via
apply_encoding_conversion on every emitted and ingested filename. With
the producer-side wiring landed, the test_iconv_local_ssh interop
scenario starts gating regressions, so the standalone:iconv-local-ssh
entry is removed from KNOWN_FAILURES and DASHBOARD_ENTRIES.

Upstream references:

- flist.c:1579-1603 send_file_entry filename iconvbufs(ic_send, ...)
- flist.c:738-754 recv_file_entry filename iconvbufs(ic_recv, ...)
- options.c::recv_iconv_settings parses --iconv=LOCAL,REMOTE

* revert(ci): restore standalone:iconv-local-ssh known failure

PR #3458 (cd76ec2) wired IconvSetting -> FilenameConverter for the
local generator/receiver, but two architectural gaps prevent the
SSH iconv scenario from passing end-to-end:

- --iconv is not forwarded to the spawned remote rsync. Upstream
  options.c:2715-2723 forwards the post-comma half of iconv_opt so
  the remote setup_iconv() agrees on charsets.
- FilenameConverter (crates/protocol/src/iconv/converter.rs) treats
  remote_encoding as the literal wire charset. Upstream rsync.c:87-147
  always uses UTF-8 on the wire and treats `charset` as each peer's
  local charset.

Audit findings 1-3 and 5 (docs/audits/iconv-pipeline.md) also remain
open: symlink target transcoding, --files-from line forwarding,
secluded-args argv re-encoding, CF_SYMLINK_ICONV gating.

Restoring the KF entry until the iconv pipeline rework lands.
oferchen added a commit that referenced this pull request May 5, 2026
…1911)

Closes #1911 by completing the local-copy bridge between the parsed
`IconvSetting` on `ClientConfig` and the engine's `LocalCopyOptions`,
mirroring the SSH/daemon bridge that PR #3458 wired through
`apply_common_server_flags`.

The audit at `docs/audits/iconv-inert.md` (PR #3526) labels #1911 as
"Partial" because the local-copy path bypasses
`apply_common_server_flags` and therefore had no route by which a
resolved `FilenameConverter` could reach the engine.

Changes:

- `engine::local_copy::LocalCopyOptions` and
  `LocalCopyOptionsBuilder` gain an `iconv: Option<FilenameConverter>`
  field with a `with_iconv` setter and `iconv()` accessor on the
  options struct, plus an `iconv()` setter on the builder.
- `core::client::run::LocalCopyOptionsBuilder::build` (the internal
  wrapper around the engine builder) calls
  `config.iconv().resolve_converter()` and propagates the result via
  the new setter.
- Unit tests in `crates/engine/src/local_copy/options/filters.rs`
  cover default-none, attach, and clear.
- Unit tests in `crates/core/src/client/run/mod.rs` cover the
  `IconvSetting -> LocalCopyOptions.iconv` mapping for unspecified,
  disabled, locale-default, explicit, and unsupported-charset variants.

This PR plumbs the converter only; it does not yet apply it on
sender file-list emit (#1912), receiver file-list ingest (#1913),
or filter-rule path matching (#1914). Those are tracked separately.
oferchen added a commit that referenced this pull request May 5, 2026
…) (#3546)

Annotate the receiver flist ingest path so it explicitly references upstream
`flist.c recv_file_entry() iconv_buf(ic_recv, ...)` and adds regression tests
that exercise the wire-to-local conversion through `FileListReader`.

Wire bytes hit `apply_encoding_conversion` after `read_name` updates the
prefix-compression buffer, mirroring upstream's `lastname` semantics where
the static buffer holds the unconverted wire bytes so subsequent entries
share the prefix as the sender intended.

The new tests cover:
- ISO-8859-1 wire bytes decoded to UTF-8 with `--iconv=UTF-8,ISO-8859-1`
- pass-through when no converter is configured
- identity converter is a no-op

Refs #1913.
oferchen added a commit that referenced this pull request May 5, 2026
…1916) (#3534)

* test(interop): add --iconv interop test against upstream rsync 3.4.1 (#1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)

* test(interop): mark iconv-local-ssh known failure pending #1911-#1913

Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.
oferchen added a commit that referenced this pull request May 5, 2026
… (#3552)

Locks the byte-level wire encoding of filenames that pass through the
--iconv pipeline (#1911 config, #1912 sender, #1913 receiver), so future
regressions in either the sender or the receiver are caught at the byte
level rather than at the round-trip level.

Coverage:
  - sender (UTF-8 -> ISO-8859-1, KOI8-R, windows-1252) emits exact
    remote-charset bytes; suffix_len header reflects the post-iconv
    length, not the UTF-8 source length
  - receiver decodes ISO-8859-1, KOI8-R, and windows-1252 wire bytes
    into UTF-8 disk names byte-for-byte
  - identity converter (UTF-8 <-> UTF-8) preserves raw UTF-8 bytes
  - ASCII-only names pass through both directions unchanged
  - sender + receiver wire bytes match exactly (iconv writer output
    equals raw remote-charset entry output; iconv reader output equals
    UTF-8-native plain reader output)
  - round-trip preserves "café", "файл", "Ångström.txt" through their
    respective remote charsets
  - directory-prefixed compression operates on post-iconv bytes so
    sender and receiver agree on the iconv-then-compress order
  - unmappable source characters (Greek alpha into ISO-8859-1) and
    invalid wire bytes (malformed UTF-8) surface InvalidData errors,
    mirroring upstream's IOERR_GENERAL path in flist.c

Tests gated on `feature = "iconv"` and Unix where non-UTF-8 path bytes
are preserved by OsStr. Wire-format bytes verified against upstream
rsync 3.4.1's flist.c send_file_entry/recv_file_entry encoding.
oferchen added a commit that referenced this pull request May 5, 2026
…#3554)

* feat(daemon): wire module charset= directive to iconv runtime (#1917)

The daemon parses `charset =` (module_directives.rs:330) and stores it on
ModuleDefinition.charset, but the value was never threaded into the iconv
runtime - so `--iconv` from a client to an oc-rsync daemon was a silent
no-op even when the module config declared a charset.

Wire it through build_server_config: read module.charset(), resolve it to
a protocol::iconv::FilenameConverter, and assign it to
ServerConfig.connection.iconv (the same field the SSH/local client path
already populates at remote/flags.rs:228).

Mirror upstream's setup_iconv() (rsync.c:87-140, clientserver.c:712-716):

- comma-separated `LOCAL,REMOTE` honours the post-comma segment on the
  server (upstream rsync.c:118-120, am_server branch);
- `.` and the empty string resolve to the locale default
  (upstream rsync.c:125-126);
- unknown charsets log via tracing and degrade to None (no transcoding)
  rather than aborting, matching the lenient behaviour
  IconvSetting::resolve_converter already uses on the client side.

Add daemon `iconv` Cargo feature that turns on `core/iconv` and
`protocol/iconv` so encoding_rs is actually compiled in. Without the
feature, FilenameConverter::new only accepts UTF-8 and the runtime
behaviour is identical to before this change.

Add 8 unit tests in client_args.rs covering: missing/empty/whitespace
directives, `.` and `LOCAL,.` identity, comma-segment selection, unknown
charset soft failure, whitespace trimming, and a Latin-1<->UTF-8 byte
round trip.

Drop `standalone:iconv-upstream` from KNOWN_FAILURES and DASHBOARD_ENTRIES
in tools/ci/known_failures.conf so the existing iconv-upstream interop
test (run_interop.sh:3168, daemon push with `charset = ISO-8859-1`) starts
gating regressions.

* fix(daemon): use field access for charset since accessor is cfg(test) only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).

* test(interop): dump dest contents and daemon log on iconv content mismatch

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917

* test(interop): redesign iconv test for ISO-8859-1 dest filenames

The previous test asserted that destination filenames remained UTF-8
after `--iconv=UTF-8,ISO-8859-1` with daemon `charset = ISO-8859-1`.
That premise contradicts upstream rsync's wire model:

  rsync.c:130-140 - the wire is always UTF-8, and the `charset`
  parameter (CLI LOCAL,REMOTE or daemon `charset =`) names the *local
  disk* encoding, not the wire encoding.

Result: the daemon's ic_recv (UTF-8 -> ISO-8859-1) writes single-byte
Latin-1 filenames to disk (caf\xe9.txt), but the test was looking for
UTF-8 multi-byte filenames (caf\xc3\xa9.txt) that never exist there.

Redesign uses parallel-array Value Objects to capture the source-side
UTF-8 bytes and the dest-side ISO-8859-1 bytes for each logical name,
plus single-responsibility helpers:

  _ic_init_fixtures   - populate the fixture mapping
  _ic_write_fixtures  - create source files; signal SKIP on host reject
  _ic_verify_dest     - verify dest has the ISO-8859-1 byte name + body
  _ic_dump_failure    - uniform diagnostic dump (octal-escaped ls + log)

Verification compares the source file content against the expected
ISO-8859-1 dest path, which is the property that proves the daemon's
iconv pipeline ran end-to-end.

* test(interop): single-file iconv fixture isolates #1917 from #1913

Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
oferchen added a commit that referenced this pull request May 5, 2026
#1911) (#3555)

* feat(transfer): wire IconvSetting -> FilenameConverter at config build (#1911)

Per docs/audits/iconv-pipeline.md Finding 4, --iconv was previously a
no-op end to end: the CLI parsed --iconv into IconvSetting and forwarded
the value to the remote peer, but the local process never converted that
setting into a protocol::FilenameConverter, so the file-list reader and
writer ran with iconv: None and copied raw bytes through transfer.

The bridge is now in place across all three transfer modes:

- SSH push/pull (and embedded SSH): apply_common_server_flags writes
  `server_config.connection.iconv = config.iconv().resolve_converter()`
  in crates/core/src/client/remote/flags.rs:228, invoked from
  build_server_config_for_receiver and build_server_config_for_generator
  (ssh_transfer.rs and embedded_ssh_transfer.rs).
- Daemon push/pull: apply_common_server_flags is also called from
  daemon_transfer/orchestration/server_config.rs (both paths). On the
  daemon side, module_access/client_args.rs:283 resolves the module's
  `charset =` directive into the same FilenameConverter slot.
- Local copy: LocalCopyOptionsBuilder::apply_iconv in
  crates/core/src/client/run/mod.rs:399 writes the converter onto
  LocalCopyOptions::iconv, since the local-copy executor in the engine
  crate bypasses the SSH/daemon ServerConfig builder.

The transfer crate consumes the converter at:

- crates/transfer/src/receiver/mod.rs:369 - reader.with_iconv(...)
- crates/transfer/src/generator/mod.rs:570 - writer.with_iconv(...)

These hand the converter to the protocol crate's FileListReader /
FileListWriter, where #1912 and #1913 already apply it via
apply_encoding_conversion on every emitted and ingested filename. With
the producer-side wiring landed, the test_iconv_local_ssh interop
scenario starts gating regressions, so the standalone:iconv-local-ssh
entry is removed from KNOWN_FAILURES and DASHBOARD_ENTRIES.

Upstream references:

- flist.c:1579-1603 send_file_entry filename iconvbufs(ic_send, ...)
- flist.c:738-754 recv_file_entry filename iconvbufs(ic_recv, ...)
- options.c::recv_iconv_settings parses --iconv=LOCAL,REMOTE

* revert(ci): restore standalone:iconv-local-ssh known failure

PR #3458 (cd76ec2) wired IconvSetting -> FilenameConverter for the
local generator/receiver, but two architectural gaps prevent the
SSH iconv scenario from passing end-to-end:

- --iconv is not forwarded to the spawned remote rsync. Upstream
  options.c:2715-2723 forwards the post-comma half of iconv_opt so
  the remote setup_iconv() agrees on charsets.
- FilenameConverter (crates/protocol/src/iconv/converter.rs) treats
  remote_encoding as the literal wire charset. Upstream rsync.c:87-147
  always uses UTF-8 on the wire and treats `charset` as each peer's
  local charset.

Audit findings 1-3 and 5 (docs/audits/iconv-pipeline.md) also remain
open: symlink target transcoding, --files-from line forwarding,
secluded-args argv re-encoding, CF_SYMLINK_ICONV gating.

Restoring the KF entry until the iconv pipeline rework lands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant