test(protocol): golden byte tests for iconv-converted filenames (#1919) #3552

Merged: oferchen merged 2 commits into master from test/iconv-golden-bytes-1919 on May 2, 2026

@oferchen (Owner) commented May 2, 2026

Summary

  • Locks the byte-level wire encoding of filenames that pass through the --iconv pipeline (config build #1911, sender flist emit #1912, receiver flist ingest #1913) so that future regressions in either direction are caught at the byte level.
  • Verifies UTF-8 -> ISO-8859-1, KOI8-R, and windows-1252 conversion both ways: sender writes exact remote-charset bytes; receiver decodes them back into UTF-8 disk names. Identity (UTF-8 <-> UTF-8), ASCII passthrough, round-trip, suffix-len-after-iconv, and prefix-compression-after-iconv invariants are all asserted at the byte level.
  • Negative cases mirror upstream's IOERR_GENERAL path in flist.c: a Greek alpha source name targeting ISO-8859-1 (sender) and malformed UTF-8 wire bytes (receiver) both surface io::ErrorKind::InvalidData.
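
The Latin-1 golden bytes and the two negative cases can be illustrated with a small standalone sketch. It is not the crate's converter: `utf8_to_latin1`, `latin1_to_utf8`, and `utf8_wire_to_name` are hypothetical helpers written against the standard library only, relying on the fact that ISO-8859-1 code points coincide with U+0000..U+00FF.

```rust
use std::io;

/// Hypothetical helper: encode a UTF-8 name as ISO-8859-1 bytes.
/// Any character above U+00FF has no Latin-1 mapping and is reported
/// as InvalidData, mirroring the sender-side negative case.
fn utf8_to_latin1(name: &str) -> io::Result<Vec<u8>> {
    name.chars()
        .map(|c| {
            u8::try_from(u32::from(c)).map_err(|_| {
                io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("U+{:04X} has no ISO-8859-1 mapping", u32::from(c)),
                )
            })
        })
        .collect()
}

/// Hypothetical helper: decode ISO-8859-1 wire bytes into a UTF-8 name.
/// Every byte maps to the code point with the same value, so this cannot fail.
fn latin1_to_utf8(wire: &[u8]) -> String {
    wire.iter().map(|&b| char::from(b)).collect()
}

/// Hypothetical helper: treat wire bytes as UTF-8 and surface malformed
/// input as InvalidData, mirroring the receiver-side negative case.
fn utf8_wire_to_name(wire: &[u8]) -> io::Result<&str> {
    std::str::from_utf8(wire).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Under these assumptions, `utf8_to_latin1("café")` yields the four bytes `63 61 66 e9`, while a Greek alpha or the wire bytes `[0xc3, 0x28]` produce an `InvalidData` error.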

Coverage

| Direction | Test | Asserts |
| --- | --- | --- |
| Sender | `golden_sender_utf8_to_latin1_cafe` | café -> 63 61 66 e9 |
| Sender | `golden_sender_utf8_to_koi8r_cyrillic` | файл -> c6 c1 ca cc |
| Sender | `golden_sender_utf8_to_windows1252_angstrom` | Ångström.txt -> 12-byte CP1252 form, suffix_len=12 |
| Sender | `golden_sender_ascii_passthrough_under_iconv` | iconv path leaves ASCII bytes unchanged |
| Sender | `golden_sender_identity_converter_preserves_utf8` | identity converter is detected and skips re-encoding |
| Sender | `golden_sender_unmappable_char_fails_conversion` | U+03B1 -> ISO-8859-1 fails with InvalidData |
| Sender | `golden_sender_suffix_len_is_post_iconv_for_shrinking_conversion` | suffix_len = 4 for café over ISO-8859-1 |
| Sender | `golden_sender_suffix_len_is_post_iconv_for_cyrillic` | suffix_len = 4 for файл over KOI8-R |
| Receiver | `golden_receiver_latin1_to_utf8_cafe` | 63 61 66 e9 -> UTF-8 café |
| Receiver | `golden_receiver_koi8r_to_utf8_cyrillic` | c6 c1 ca cc -> UTF-8 файл |
| Receiver | `golden_receiver_windows1252_to_utf8_angstrom` | 12-byte CP1252 -> 14-byte UTF-8 |
| Receiver | `golden_receiver_ascii_passthrough_under_iconv` | ASCII bytes survive iconv unchanged |
| Receiver | `golden_receiver_invalid_remote_bytes_fail` | [0xc3, 0x28] (bad UTF-8) -> InvalidData |
| Round-trip | `golden_round_trip_cafe_via_latin1` | café survives ISO-8859-1 hop |
| Round-trip | `golden_round_trip_cyrillic_via_koi8r` | файл survives KOI8-R hop |
| Round-trip | `golden_round_trip_angstrom_via_windows1252` | Ångström.txt survives windows-1252 hop |
| Round-trip | `golden_round_trip_dir_then_file_compressed_after_iconv` | XMIT_SAME_NAME prefix compression operates on post-iconv bytes |
| Cross-check | `golden_sender_wire_equals_raw_remote_bytes_for_cafe` | iconv writer output == raw-remote-bytes writer output |
| Cross-check | `golden_receiver_post_iconv_equals_utf8_native_decode` | iconv reader output == plain reader output for the equivalent UTF-8 wire |
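
For readers who want to check the KOI8-R golden bytes by hand, here is a minimal table-driven sketch. `KOI8R_LOWER` and `utf8_to_koi8r` are hypothetical names, and the table covers only ASCII plus the lowercase Cyrillic block at 0xC0..0xDF, which is enough to reproduce the `файл` row; it is not how the tests themselves perform the conversion.

```rust
/// Lowercase Cyrillic letters in KOI8-R order, starting at byte 0xC0.
/// Partial table for illustration only (hypothetical name).
const KOI8R_LOWER: [char; 32] = [
    'ю', 'а', 'б', 'ц', 'д', 'е', 'ф', 'г', 'х', 'и', 'й', 'к', 'л', 'м', 'н', 'о',
    'п', 'я', 'р', 'с', 'т', 'у', 'ж', 'в', 'ь', 'ы', 'з', 'ш', 'э', 'щ', 'ч', 'ъ',
];

/// Hypothetical helper: encode a name as KOI8-R, handling only ASCII and
/// lowercase Cyrillic. Returns None for anything outside that subset.
fn utf8_to_koi8r(name: &str) -> Option<Vec<u8>> {
    name.chars()
        .map(|c| {
            if c.is_ascii() {
                // ASCII is identical in KOI8-R and UTF-8.
                Some(c as u8)
            } else {
                KOI8R_LOWER
                    .iter()
                    .position(|&k| k == c)
                    .map(|i| 0xC0 + i as u8)
            }
        })
        .collect()
}
```

With this table, `utf8_to_koi8r("файл")` gives `c6 c1 ca cc`: four wire bytes for a name that occupies eight bytes in UTF-8, which is why suffix_len for that row is 4.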

Tests are gated on feature = "iconv". Receiver-side and round-trip tests are additionally gated on Unix, where OsStr preserves arbitrary byte sequences (required to thread non-UTF-8 wire bytes through the writer for the receiver tests).
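
The Unix gating follows from how `OsStr` behaves there. A short sketch, using the standard library's Unix-only `OsStrExt` trait (the two wrapper functions are hypothetical names for illustration, and the snippet only compiles on Unix targets):

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait

/// Wrap raw wire bytes as a name without validating them as UTF-8.
fn os_name_from_bytes(bytes: &[u8]) -> &OsStr {
    OsStr::from_bytes(bytes)
}

/// Recover exactly the bytes that were stored in the name.
fn os_name_bytes(name: &OsStr) -> &[u8] {
    name.as_bytes()
}
```

The ISO-8859-1 bytes `63 61 66 e9` are not valid UTF-8, so `to_str()` on the resulting `OsStr` returns `None`, yet `as_bytes()` hands back the same four bytes unchanged.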

Wire-format expectations cross-referenced against upstream rsync 3.4.1 flist.c:

  • send_file_entry() lines 1580-1602 - iconvbufs(ic_send, ...) on file->dirname and file->basename before they reach the wire.
  • recv_file_entry() lines 738-753 - iconvbufs(ic_recv, ...) after read_sbuf() into thisname, before clean_fname().
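
The iconv-then-compress ordering can be sketched as follows. This is a simplified illustration, not the real flist entry encoding (which also carries flags and length fields): `split_against_previous` is a hypothetical helper, and the 255-byte cap on the shared prefix is an assumption of this sketch.

```rust
/// Hypothetical helper: split `cur` into (shared-prefix length, suffix bytes)
/// relative to the previously sent name. Both names must already be in the
/// wire charset, i.e. the comparison runs on post-iconv bytes.
/// The prefix length is capped at 255 (assumption of this sketch).
fn split_against_previous<'a>(prev: &[u8], cur: &'a [u8]) -> (usize, &'a [u8]) {
    let same = prev
        .iter()
        .zip(cur)
        .take(255)
        .take_while(|(a, b)| a == b)
        .count();
    (same, &cur[same..])
}
```

For a directory `café` followed by `café/a.txt`, the shared prefix is 4 bytes when both names are in ISO-8859-1 but 5 bytes when both are in UTF-8. If one side measured the prefix before conversion and the other after, they would disagree on where the suffix begins.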

Test plan

  • cargo nextest run -p protocol --all-features -E 'test(iconv) or test(golden)' -> 443 tests run, 443 passed
  • cargo fmt --all -- --check clean
  • cargo clippy -p protocol --all-features --tests --no-deps -- -D warnings clean
  • CI: fmt+clippy
  • CI: nextest (stable) on Linux, Windows, macOS, Linux musl

Locks the byte-level wire encoding of filenames that pass through the
--iconv pipeline (#1911 config, #1912 sender, #1913 receiver), so future
regressions in either the sender or the receiver are caught at the byte
level rather than at the round-trip level.

Coverage:
  - sender (UTF-8 -> ISO-8859-1, KOI8-R, windows-1252) emits exact
    remote-charset bytes; suffix_len header reflects the post-iconv
    length, not the UTF-8 source length
  - receiver decodes ISO-8859-1, KOI8-R, and windows-1252 wire bytes
    into UTF-8 disk names byte-for-byte
  - identity converter (UTF-8 <-> UTF-8) preserves raw UTF-8 bytes
  - ASCII-only names pass through both directions unchanged
  - sender + receiver wire bytes match exactly (iconv writer output
    equals raw remote-charset entry output; iconv reader output equals
    UTF-8-native plain reader output)
  - round-trip preserves "café", "файл", "Ångström.txt" through their
    respective remote charsets
  - directory-prefixed compression operates on post-iconv bytes so
    sender and receiver agree on the iconv-then-compress order
  - unmappable source characters (Greek alpha into ISO-8859-1) and
    invalid wire bytes (malformed UTF-8) surface InvalidData errors,
    mirroring upstream's IOERR_GENERAL path in flist.c

Tests gated on `feature = "iconv"` and Unix where non-UTF-8 path bytes
are preserved by OsStr. Wire-format bytes verified against upstream
rsync 3.4.1's flist.c send_file_entry/recv_file_entry encoding.
@github-actions github-actions Bot added the test label May 2, 2026
@oferchen oferchen merged commit 6a7f2f2 into master May 2, 2026
37 checks passed
@oferchen oferchen deleted the test/iconv-golden-bytes-1919 branch May 2, 2026 13:01
oferchen added a commit that referenced this pull request May 5, 2026
… (#3552)
