Skip to content

Handle files-from relative entries with explicit source directory#1917

Merged
oferchen merged 1 commit into
masterfrom
implement-missing-components-for-oc-rsync
Nov 3, 2025
Merged

Handle files-from relative entries with explicit source directory#1917
oferchen merged 1 commit into
masterfrom
implement-missing-components-for-oc-rsync

Conversation

@oferchen
Copy link
Copy Markdown
Owner

@oferchen oferchen commented Nov 3, 2025

Summary

  • adjust the file-list loader to prefix relative --files-from entries with the explicit source operand before constructing the transfer plan
  • export and invoke the helper in the workflow so module listings and fallback arguments see the resolved paths
  • add regression coverage for local --files-from usage with relative paths

Testing

  • cargo test

https://chatgpt.com/codex/tasks/task_e_69093a6453b88323a1236705a0e1192c

@oferchen oferchen merged commit e38eb5f into master Nov 3, 2025
@oferchen oferchen deleted the implement-missing-components-for-oc-rsync branch November 3, 2025 23:43
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

let mut transfer_operands = Vec::with_capacity(file_list_operands.len() + remainder.len());
transfer_operands.append(&mut file_list_operands);
transfer_operands.extend(remainder);

P1 Badge Exclude base source from transfer operands after resolving files-from entries

After resolve_file_list_entries prefixes relative file-list entries with the explicit source directory, the original source operand is still appended to transfer_operands alongside the resolved file paths. For a local invocation such as rsync --files-from=list src dest, this means the engine copies src itself and each prefixed entry, producing dest/src/... in addition to the intended files. Upstream rsync treats the explicit source as a prefix only, not an extra operand, so this behaviour duplicates data and can trigger large unintended transfers. Consider removing the single local source from remainder once its entries have been expanded before constructing transfer_operands.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

oferchen added a commit that referenced this pull request May 1, 2026
…1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)
oferchen added a commit that referenced this pull request May 1, 2026
…1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)
oferchen added a commit that referenced this pull request May 2, 2026
…1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)
oferchen added a commit that referenced this pull request May 2, 2026
…1916) (#3534)

* test(interop): add --iconv interop test against upstream rsync 3.4.1 (#1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)

* test(interop): mark iconv-local-ssh known failure pending #1911-#1913

Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.
oferchen added a commit that referenced this pull request May 2, 2026
… only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).
oferchen added a commit that referenced this pull request May 2, 2026
…match

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917
oferchen added a commit that referenced this pull request May 2, 2026
The daemon parses `charset =` (module_directives.rs:330) and stores it on
ModuleDefinition.charset, but the value was never threaded into the iconv
runtime - so `--iconv` from a client to an oc-rsync daemon was a silent
no-op even when the module config declared a charset.

Wire it through build_server_config: read module.charset(), resolve it to
a protocol::iconv::FilenameConverter, and assign it to
ServerConfig.connection.iconv (the same field the SSH/local client path
already populates at remote/flags.rs:228).

Mirror upstream's setup_iconv() (rsync.c:87-140, clientserver.c:712-716):

- comma-separated `LOCAL,REMOTE` honours the post-comma segment on the
  server (upstream rsync.c:118-120, am_server branch);
- `.` and the empty string resolve to the locale default
  (upstream rsync.c:125-126);
- unknown charsets log via tracing and degrade to None (no transcoding)
  rather than aborting, matching the lenient behaviour
  IconvSetting::resolve_converter already uses on the client side.

Add daemon `iconv` Cargo feature that turns on `core/iconv` and
`protocol/iconv` so encoding_rs is actually compiled in. Without the
feature, FilenameConverter::new only accepts UTF-8 and the runtime
behaviour is identical to before this change.

Add 8 unit tests in client_args.rs covering: missing/empty/whitespace
directives, `.` and `LOCAL,.` identity, comma-segment selection, unknown
charset soft failure, whitespace trimming, and a Latin-1<->UTF-8 byte
round trip.

Drop `standalone:iconv-upstream` from KNOWN_FAILURES and DASHBOARD_ENTRIES
in tools/ci/known_failures.conf so the existing iconv-upstream interop
test (run_interop.sh:3168, daemon push with `charset = ISO-8859-1`) starts
gating regressions.
oferchen added a commit that referenced this pull request May 2, 2026
… only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).
oferchen added a commit that referenced this pull request May 2, 2026
…match

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917
oferchen added a commit that referenced this pull request May 2, 2026
Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
oferchen added a commit that referenced this pull request May 2, 2026
…#3554)

* feat(daemon): wire module charset= directive to iconv runtime (#1917)

The daemon parses `charset =` (module_directives.rs:330) and stores it on
ModuleDefinition.charset, but the value was never threaded into the iconv
runtime - so `--iconv` from a client to an oc-rsync daemon was a silent
no-op even when the module config declared a charset.

Wire it through build_server_config: read module.charset(), resolve it to
a protocol::iconv::FilenameConverter, and assign it to
ServerConfig.connection.iconv (the same field the SSH/local client path
already populates at remote/flags.rs:228).

Mirror upstream's setup_iconv() (rsync.c:87-140, clientserver.c:712-716):

- comma-separated `LOCAL,REMOTE` honours the post-comma segment on the
  server (upstream rsync.c:118-120, am_server branch);
- `.` and the empty string resolve to the locale default
  (upstream rsync.c:125-126);
- unknown charsets log via tracing and degrade to None (no transcoding)
  rather than aborting, matching the lenient behaviour
  IconvSetting::resolve_converter already uses on the client side.

Add daemon `iconv` Cargo feature that turns on `core/iconv` and
`protocol/iconv` so encoding_rs is actually compiled in. Without the
feature, FilenameConverter::new only accepts UTF-8 and the runtime
behaviour is identical to before this change.

Add 8 unit tests in client_args.rs covering: missing/empty/whitespace
directives, `.` and `LOCAL,.` identity, comma-segment selection, unknown
charset soft failure, whitespace trimming, and a Latin-1<->UTF-8 byte
round trip.

Drop `standalone:iconv-upstream` from KNOWN_FAILURES and DASHBOARD_ENTRIES
in tools/ci/known_failures.conf so the existing iconv-upstream interop
test (run_interop.sh:3168, daemon push with `charset = ISO-8859-1`) starts
gating regressions.

* fix(daemon): use field access for charset since accessor is cfg(test) only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).

* test(interop): dump dest contents and daemon log on iconv content mismatch

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917

* test(interop): redesign iconv test for ISO-8859-1 dest filenames

The previous test asserted that destination filenames remained UTF-8
after `--iconv=UTF-8,ISO-8859-1` with daemon `charset = ISO-8859-1`.
That premise contradicts upstream rsync's wire model:

  rsync.c:130-140 - the wire is always UTF-8, and the `charset`
  parameter (CLI LOCAL,REMOTE or daemon `charset =`) names the *local
  disk* encoding, not the wire encoding.

Result: the daemon's ic_recv (UTF-8 -> ISO-8859-1) writes single-byte
Latin-1 filenames to disk (caf\xe9.txt), but the test was looking for
UTF-8 multi-byte filenames (caf\xc3\xa9.txt) that never exist there.

Redesign uses parallel-array Value Objects to capture the source-side
UTF-8 bytes and the dest-side ISO-8859-1 bytes for each logical name,
plus single-responsibility helpers:

  _ic_init_fixtures   - populate the fixture mapping
  _ic_write_fixtures  - create source files; signal SKIP on host reject
  _ic_verify_dest     - verify dest has the ISO-8859-1 byte name + body
  _ic_dump_failure    - uniform diagnostic dump (octal-escaped ls + log)

Verification compares the source file content against the expected
ISO-8859-1 dest path, which is the property that proves the daemon's
iconv pipeline ran end-to-end.

* test(interop): single-file iconv fixture isolates #1917 from #1913

Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
oferchen added a commit that referenced this pull request May 5, 2026
…1916) (#3534)

* test(interop): add --iconv interop test against upstream rsync 3.4.1 (#1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)

* test(interop): mark iconv-local-ssh known failure pending #1911-#1913

Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.
oferchen added a commit that referenced this pull request May 5, 2026
…#3554)

* feat(daemon): wire module charset= directive to iconv runtime (#1917)

The daemon parses `charset =` (module_directives.rs:330) and stores it on
ModuleDefinition.charset, but the value was never threaded into the iconv
runtime - so `--iconv` from a client to an oc-rsync daemon was a silent
no-op even when the module config declared a charset.

Wire it through build_server_config: read module.charset(), resolve it to
a protocol::iconv::FilenameConverter, and assign it to
ServerConfig.connection.iconv (the same field the SSH/local client path
already populates at remote/flags.rs:228).

Mirror upstream's setup_iconv() (rsync.c:87-140, clientserver.c:712-716):

- comma-separated `LOCAL,REMOTE` honours the post-comma segment on the
  server (upstream rsync.c:118-120, am_server branch);
- `.` and the empty string resolve to the locale default
  (upstream rsync.c:125-126);
- unknown charsets log via tracing and degrade to None (no transcoding)
  rather than aborting, matching the lenient behaviour
  IconvSetting::resolve_converter already uses on the client side.

Add daemon `iconv` Cargo feature that turns on `core/iconv` and
`protocol/iconv` so encoding_rs is actually compiled in. Without the
feature, FilenameConverter::new only accepts UTF-8 and the runtime
behaviour is identical to before this change.

Add 8 unit tests in client_args.rs covering: missing/empty/whitespace
directives, `.` and `LOCAL,.` identity, comma-segment selection, unknown
charset soft failure, whitespace trimming, and a Latin-1<->UTF-8 byte
round trip.

Drop `standalone:iconv-upstream` from KNOWN_FAILURES and DASHBOARD_ENTRIES
in tools/ci/known_failures.conf so the existing iconv-upstream interop
test (run_interop.sh:3168, daemon push with `charset = ISO-8859-1`) starts
gating regressions.

* fix(daemon): use field access for charset since accessor is cfg(test) only (#1917)

The charset() accessor on ModuleDefinition is in a #[cfg(test)] impl block,
so it isn't visible in non-test builds. Use the pub(crate) field directly,
matching the surrounding accesses (dont_compress, temp_dir, path).

* test(interop): dump dest contents and daemon log on iconv content mismatch

The iconv-upstream test was reporting "plain.txt content mismatch after
iconv transfer" without revealing what bytes were actually written, which
files exist at the destination, or what the daemon logged. Add a
hexdump+find+log-tail dump on both the missing-file and cmp-mismatch
branches so the next CI run pinpoints whether plain.txt is empty,
truncated, or carrying a different file's bytes (e.g. NDX misalignment).

Refs #1917

* test(interop): redesign iconv test for ISO-8859-1 dest filenames

The previous test asserted that destination filenames remained UTF-8
after `--iconv=UTF-8,ISO-8859-1` with daemon `charset = ISO-8859-1`.
That premise contradicts upstream rsync's wire model:

  rsync.c:130-140 - the wire is always UTF-8, and the `charset`
  parameter (CLI LOCAL,REMOTE or daemon `charset =`) names the *local
  disk* encoding, not the wire encoding.

Result: the daemon's ic_recv (UTF-8 -> ISO-8859-1) writes single-byte
Latin-1 filenames to disk (caf\xe9.txt), but the test was looking for
UTF-8 multi-byte filenames (caf\xc3\xa9.txt) that never exist there.

Redesign uses parallel-array Value Objects to capture the source-side
UTF-8 bytes and the dest-side ISO-8859-1 bytes for each logical name,
plus single-responsibility helpers:

  _ic_init_fixtures   - populate the fixture mapping
  _ic_write_fixtures  - create source files; signal SKIP on host reject
  _ic_verify_dest     - verify dest has the ISO-8859-1 byte name + body
  _ic_dump_failure    - uniform diagnostic dump (octal-escaped ls + log)

Verification compares the source file content against the expected
ISO-8859-1 dest path, which is the property that proves the daemon's
iconv pipeline ran end-to-end.

* test(interop): single-file iconv fixture isolates #1917 from #1913

Previous multi-file fixture exposed an unrelated bug: the receiver's
flist ingest pipeline misaligns content with names when multiple files
re-sort after iconv conversion (UTF-8 -> Latin-1). That is a separate
bug (#1913 receiver flist ingest) and out of scope for the daemon
charset directive PR.

Replace the parallel-array fixture with a single-file fixture
(cafe.txt, U+00E9) so the test only validates what #1917 fixes:
the daemon's `charset =` directive activating ic_recv on the
receiver to convert wire-UTF-8 filenames to disk-Latin-1 bytes.

The multi-file scenario will be re-enabled once #1913 lands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant