Accelerate rolling checksum with SIMD fast paths#1916

Merged
oferchen merged 1 commit into master from optimize-architecture-and-update-agents.md
Nov 3, 2025

@oferchen oferchen commented Nov 3, 2025

Summary

  • document the requirement to keep SIMD-accelerated checksum paths in sync in internal docs
  • add SSE2 (x86_64) and NEON (aarch64) fast paths for RollingChecksum::accumulate_chunk with scalar fallback
  • cover the SIMD dispatch with deterministic parity tests that compare against the scalar implementation
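The parity idea in the bullets above can be sketched as follows. This is a minimal illustration, not the code from this PR: `accumulate_scalar` and `accumulate_chunked` are hypothetical free functions, the signature of `RollingChecksum::accumulate_chunk` is not shown in this conversation, and the chunked variant uses plain Rust rather than SSE2 or NEON intrinsics. It only mirrors the arithmetic a 16-lane kernel would perform (a plain sum and a position-weighted sum per block), assuming the usual rsync-style pair where `s1` is the running byte sum and `s2` the running sum of `s1`.

```rust
/// Scalar reference: `s1` is the running byte sum, `s2` the running sum of `s1`.
/// (Hypothetical helper, not the crate's actual API.)
fn accumulate_scalar(mut s1: u32, mut s2: u32, data: &[u8]) -> (u32, u32) {
    for &b in data {
        s1 = s1.wrapping_add(u32::from(b));
        s2 = s2.wrapping_add(s1);
    }
    (s1, s2)
}

/// Block-wise variant with the arithmetic shape of a 16-lane SIMD kernel.
/// For a block of n bytes b[0..n]:
///   s2 += n * s1 + sum((n - i) * b[i]);  s1 += sum(b[i])
/// (Hypothetical helper; a real fast path would use intrinsics here.)
fn accumulate_chunked(mut s1: u32, mut s2: u32, data: &[u8]) -> (u32, u32) {
    const LANES: usize = 16;
    let mut chunks = data.chunks_exact(LANES);
    for chunk in &mut chunks {
        let mut sum = 0u32; // at most 16 * 255, cannot overflow
        let mut weighted = 0u32; // at most 136 * 255, cannot overflow
        for (i, &b) in chunk.iter().enumerate() {
            sum += u32::from(b);
            weighted += (LANES - i) as u32 * u32::from(b);
        }
        s2 = s2
            .wrapping_add(s1.wrapping_mul(LANES as u32))
            .wrapping_add(weighted);
        s1 = s1.wrapping_add(sum);
    }
    // Bytes that do not fill a whole block go through the scalar fallback.
    accumulate_scalar(s1, s2, chunks.remainder())
}
```

A parity test in this style feeds both functions the same deterministic input at every length around the block boundary and asserts that the `(s1, s2)` pairs are identical, so any drift between the fast path and the fallback shows up immediately.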

Testing

  • cargo test -p rsync-checksums

https://chatgpt.com/codex/tasks/task_e_690928f98a548323b97dfc6d03f6ee57

@oferchen oferchen merged commit 7494768 into master Nov 3, 2025
1 check passed
@oferchen oferchen deleted the optimize-architecture-and-update-agents.md branch November 3, 2025 22:45
oferchen added a commit that referenced this pull request May 1, 2026
…1916) (#3498)

Adds a bidirectional daemon-mode interop scenario that exercises
--iconv=UTF-8,ISO-8859-1 between oc-rsync and upstream rsync 3.4.1.

The test creates a deterministic source tree with UTF-8 filenames whose
code points all fit in Latin-1 (café.txt, über.txt, ångström.txt) plus
an ASCII baseline, then drives two transfers:

  1. upstream client -> oc-rsync daemon (charset = ISO-8859-1)
  2. oc-rsync client -> upstream daemon (charset = ISO-8859-1)

Each transfer is verified by re-reading the destination filenames and
comparing content byte-for-byte against the source.
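
Why the fixture restricts itself to Latin-1 code points can be illustrated with a small sketch. It relies only on the fact that ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto Unicode code points U+0000-U+00FF; the helper names `utf8_to_latin1` and `latin1_to_utf8` are hypothetical and are not taken from oc-rsync or from the test script.

```rust
/// Encode a UTF-8 name as ISO-8859-1, or return None when any code point
/// lies above U+00FF and therefore has no Latin-1 byte.
/// (Hypothetical helper for illustration only.)
fn utf8_to_latin1(name: &str) -> Option<Vec<u8>> {
    name.chars()
        .map(|c| u8::try_from(u32::from(c)).ok())
        .collect()
}

/// Decode ISO-8859-1 bytes back to a UTF-8 string; every byte is valid.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

Names such as café.txt survive the round trip unchanged, while a name containing a code point outside Latin-1 would be rejected by the encoder, which is presumably why the fixture avoids such names.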

Daemon-mode iconv negotiation in oc-rsync is still incomplete: the
`charset =` directive is parsed (crates/daemon/.../module_directives.rs)
but never threaded into the iconv runtime setup. Findings 1-3 of the
audit (symlink target transcoding, --files-from forwarding,
--secluded-args/--protect-args transcoding) also remain open. The
scenario is therefore added to KNOWN_FAILURES and DASHBOARD_ENTRIES so
CI tracks the gap without blocking, and check_known_failures.sh gets a
matching reproducer so the dashboard can rerun it.

References:
  upstream: options.c:recv_iconv_settings, flist.c:1579-1603,
            flist.c:738-754
  oc-rsync: docs/audits/iconv-pipeline.md

Touched files:
- tools/ci/run_interop.sh: new test_iconv_upstream_interop function,
  added "iconv-upstream" to standalone test_names/test_funcs arrays,
  port-injection case branch.
- tools/ci/known_failures.conf: KNOWN_FAILURES + DASHBOARD_ENTRIES entry.
- tools/ci/check_known_failures.sh: dashboard reproducer dispatch case.
oferchen added a commit that referenced this pull request May 1, 2026
…1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)
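
The fake remote-shell idea can be sketched as below. This is an assumption-laden illustration, not the wrapper used in tools/ci/run_interop.sh (which is a shell script not shown here): `local_command` and `run_locally` are hypothetical names, the sketch assumes the wrapper is invoked as `<host> <command> [args...]` with no remote-shell options before the host, and it spawns a child with inherited stdio instead of replacing the process with exec.

```rust
use std::io;
use std::process::Command;

/// Drop the leading host argument and return (program, args) to run locally.
/// `argv` excludes the wrapper's own name. (Hypothetical helper.)
fn local_command(argv: &[String]) -> Option<(&str, &[String])> {
    // The first element is the host rsync hands to the remote shell; ignore it.
    let (_host, rest) = argv.split_first()?;
    let (program, args) = rest.split_first()?;
    Some((program.as_str(), args))
}

/// Run the remaining command on the local machine and return its exit code.
/// stdin/stdout/stderr are inherited, so the protocol stream passes through.
fn run_locally(argv: &[String]) -> io::Result<i32> {
    let (program, args) = local_command(argv).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::InvalidInput,
            "expected: <host> <command> [args...]",
        )
    })?;
    let status = Command::new(program).args(args).status()?;
    // A process killed by a signal has no exit code; report a generic failure.
    Ok(status.code().unwrap_or(1))
}
```

A `main` built on this would pass `std::env::args().skip(1)` to `run_locally` and exit with the returned code, which is enough for both transfer directions to run on one machine.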

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)
oferchen added a commit that referenced this pull request May 1, 2026
…3535)

Maps each entry in the unconditional KNOWN_FAILURES array and each
conditional rule in is_known_failure_from_conf() to a concrete
elimination path: fix in oc-rsync, permanent (upstream bug), or
permanent (protocol-version-locked). Cites upstream rsync 3.4.1 source
(compat.c, exclude.c, token.c) and existing tracking issues (#1916,
#1685, companion iconv/zstd/protocol audits) so beta-readiness
criterion #3 has an explicit work plan.

Two entries are fixable (standalone:iconv-upstream daemon plumbing and
standalone:delta-stats daemon-mode delta engine); six are permanent
(one upstream bug, five protocol-version-locked at proto < 29 or 30).
oferchen added a commit that referenced this pull request May 2, 2026
…1916) (#3534)

* test(interop): add --iconv interop test against upstream rsync 3.4.1 (#1916)

Add test_iconv_local_ssh_interop to tools/ci/run_interop.sh covering
the SSH/local-mode side of --iconv interop with upstream rsync 3.4.1,
the path PR #3458 wired up via IconvSetting -> FilenameConverter. Two
directions are exercised through a fake remote-shell wrapper that
discards the host argument and exec's the rest locally:

  a) oc-rsync sender -> upstream receiver  (--rsh=fake --rsync-path=upstream)
  b) upstream sender -> oc-rsync receiver  (--rsh=fake --rsync-path=oc-rsync)

The fixture is UTF-8 source filenames whose code points all fit in
ISO-8859-1 (cafe, uber, naive, Zurich); --iconv=UTF-8,ISO-8859-1
forces Latin-1 wire encoding while the local charset stays UTF-8.

The companion daemon-mode scenario, "standalone:iconv-upstream", stays
in known_failures.conf until daemon-side `charset =` plumbing lands
(#1911-#1917 per docs/audits/iconv-pipeline.md). Comments updated to
make the SSH/local vs daemon split explicit.

Pre-checks:
  - upstream binary version availability (graceful skip).
  - upstream iconv compile-time support (graceful skip on --disable-iconv).
  - host filesystem accepts UTF-8 names (graceful skip).

References:
  upstream: options.c:recv_iconv_settings, flist.c:738-754, 1579-1603
  oc-rsync: docs/audits/iconv-pipeline.md (Findings 1-7)

* test(interop): mark iconv-local-ssh known failure pending #1911-#1913

Per docs/audits/iconv-pipeline.md Finding 4, the IconvSetting ->
protocol::FilenameConverter bridge does not exist in production code,
so --iconv is a no-op end to end in SSH/local mode and the test hangs
on direction (a) (oc-rsync sender -> upstream receiver).

The wiring lands in #1911 (config build), #1912 (sender flist emit),
and #1913 (receiver flist ingest). Once those merge, remove this
entry so the test starts gating regressions.