Optimize rolling checksum bulk rolls #1933

Merged
oferchen merged 1 commit into master from implement-missing-features-for-oc-rsync on Nov 4, 2025

@oferchen (Owner) commented Nov 4, 2025

Summary

  • tighten roll_many's bulk math by caching the iteration weight and reusing a scalar fallback helper
  • guard the wide-roll fast path against indices that exceed u32 so extremely large windows fall back to the scalar loop
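
The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: the `Rolling` type, its fields, and the helper names are hypothetical, and the checksum is assumed to be the classic rsync-style pair of 16-bit sums. The bulk path folds `k` single-byte rolls into one pass by carrying a decreasing weight (`k, k-1, ..., 1`) through the loop, and it falls back to the scalar loop when the roll count does not fit in `u32`.

```rust
/// Hypothetical rsync-style rolling checksum state (not the crate's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rolling {
    s1: u32,  // sum of bytes in the window, mod 2^16
    s2: u32,  // sum of prefix sums, mod 2^16
    len: u32, // window length
}

impl Rolling {
    fn from_window(window: &[u8]) -> Self {
        let (mut s1, mut s2) = (0u32, 0u32);
        for &b in window {
            s1 = s1.wrapping_add(u32::from(b));
            s2 = s2.wrapping_add(s1);
        }
        let len = u32::try_from(window.len()).expect("window length fits in u32");
        Rolling { s1: s1 & 0xffff, s2: s2 & 0xffff, len }
    }

    /// Slide the window by one byte: drop `out`, append `inc`.
    fn roll(&mut self, out: u8, inc: u8) {
        let (out, inc) = (u32::from(out), u32::from(inc));
        self.s1 = self.s1.wrapping_sub(out).wrapping_add(inc) & 0xffff;
        self.s2 = self.s2.wrapping_sub(self.len.wrapping_mul(out)).wrapping_add(self.s1) & 0xffff;
    }

    /// Scalar fallback helper: one `roll` per byte pair.
    fn roll_many_scalar(&mut self, outgoing: &[u8], incoming: &[u8]) {
        for (&o, &i) in outgoing.iter().zip(incoming) {
            self.roll(o, i);
        }
    }

    /// Bulk roll: same result as the scalar loop, computed in one pass.
    fn roll_many(&mut self, outgoing: &[u8], incoming: &[u8]) {
        assert_eq!(outgoing.len(), incoming.len());
        // Guard: the weights below are u32, so a count that does not fit
        // takes the scalar path instead.
        let Ok(k) = u32::try_from(outgoing.len()) else {
            return self.roll_many_scalar(outgoing, incoming);
        };
        let (mut sum_out, mut sum_in, mut weighted) = (0u32, 0u32, 0u32);
        let mut weight = k; // cached iteration weight, decremented each step
        for (&o, &i) in outgoing.iter().zip(incoming) {
            let (o, i) = (u32::from(o), u32::from(i));
            sum_out = sum_out.wrapping_add(o);
            sum_in = sum_in.wrapping_add(i);
            weighted = weighted.wrapping_add(weight.wrapping_mul(i.wrapping_sub(o)));
            weight -= 1;
        }
        let s1_old = self.s1;
        self.s1 = s1_old.wrapping_add(sum_in).wrapping_sub(sum_out) & 0xffff;
        self.s2 = (self.s2
            .wrapping_sub(self.len.wrapping_mul(sum_out))
            .wrapping_add(k.wrapping_mul(s1_old))
            .wrapping_add(weighted))
            & 0xffff;
    }
}

fn main() {
    let data: Vec<u8> = (0..200u32).map(|i| (i * 31 % 251) as u8).collect();
    let mut bulk = Rolling::from_window(&data[0..64]);
    let mut scalar = bulk;
    bulk.roll_many(&data[0..50], &data[64..114]);
    scalar.roll_many_scalar(&data[0..50], &data[64..114]);
    // Both paths should agree with a checksum computed from scratch.
    assert_eq!(bulk, scalar);
    assert_eq!(bulk, Rolling::from_window(&data[50..114]));
    println!("{bulk:?}");
}
```

All arithmetic wraps modulo 2^32 and is masked to 16 bits at the end, which is why the bulk and scalar paths agree even when an intermediate difference is negative.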

Testing

  • cargo test -p rsync-checksums

https://chatgpt.com/codex/tasks/task_e_6909b05a91288323aec39c2107cb63bc

@oferchen oferchen merged commit ca61922 into master Nov 4, 2025
@oferchen oferchen deleted the implement-missing-features-for-oc-rsync branch November 4, 2025 07:57
oferchen added a commit that referenced this pull request May 5, 2026
Defines the empirical benchmark harness, workloads (100 / 1k / 10k
concurrent clients), metrics, soft-limit triggers, comparison oracle
against upstream rsync 3.4.1, and the decision criteria that gate the
async listener migration tracked under #1935. Frames the active-counter
fix from the parent audit (#1673, PR #3705) as a strict precondition.

Tracking: oc-rsync task #1933.
oferchen added a commit that referenced this pull request May 7, 2026
…#3891)

Slim runnable plan complementing the broader benchmark plan. Specifies
the minimum harness needed to land first measured numbers on Linux
loopback at 100/1K/10K concurrent rsync:// pulls so #1934 RFC and #1935
async-listener work can compare sync vs async paths against quantified
sync-baseline ttfb, completion p99, peak RSS, and thread count.
oferchen added a commit that referenced this pull request May 16, 2026
#1933) (#4182)

* chore(bench): stress harness for thread-per-connection scaling

Adds an integration-test-shaped stress benchmark that drives 100, 1000,
and 10000 concurrent TCP clients against the daemon listener, capturing
wall time, ECONNREFUSED / EMFILE counts, and peak RSS via getrusage. All
three scenarios are marked #[ignore]; the 10k case is unix-only and
self-skips when RLIMIT_NOFILE cannot accommodate the request.

The harness exists to provide evidence for whether an async listener
(tracked separately) would be a meaningful change to the current
std::thread::spawn-per-connection model.
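
The shape of such a harness can be sketched with the standard library alone. This is an illustration under assumptions, not the `connection_scaling_stress` test itself: `run_scenario`, `Outcome`, and the one-byte echo exchange are hypothetical, the server is a stand-in thread-per-connection listener rather than the daemon, and peak RSS is left out because `getrusage` needs a binding outside `std`. Errors other than ECONNREFUSED (EMFILE included) are lumped into one counter here.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-scenario tally.
#[derive(Debug, Default)]
struct Outcome {
    ok: usize,
    refused: usize,
    other_errors: usize,
    wall_time: Duration,
}

fn run_scenario(clients: usize) -> io::Result<Outcome> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    listener.set_nonblocking(true)?; // lets the accept loop notice `stop`
    let stop = Arc::new(AtomicBool::new(false));

    // Stand-in server: one spawned thread per accepted connection.
    let server_stop = Arc::clone(&stop);
    let server = thread::spawn(move || {
        let mut handlers = Vec::new();
        while !server_stop.load(Ordering::Acquire) {
            match listener.accept() {
                Ok((mut stream, _peer)) => handlers.push(thread::spawn(move || {
                    // Some platforms inherit non-blocking mode from the listener.
                    let _ = stream.set_nonblocking(false);
                    let mut byte = [0u8; 1];
                    if stream.read_exact(&mut byte).is_ok() {
                        let _ = stream.write_all(&byte);
                    }
                })),
                Err(e) if e.kind() == ErrorKind::WouldBlock => {
                    thread::sleep(Duration::from_millis(1));
                }
                Err(_) => break,
            }
        }
        for handler in handlers {
            let _ = handler.join();
        }
    });

    // Clients: connect concurrently, send one byte, wait for the echo.
    let started = Instant::now();
    let workers: Vec<_> = (0..clients)
        .map(|_| {
            thread::spawn(move || -> io::Result<()> {
                let mut stream = TcpStream::connect(addr)?;
                stream.set_read_timeout(Some(Duration::from_secs(5)))?;
                stream.write_all(&[0x2a])?;
                let mut echoed = [0u8; 1];
                stream.read_exact(&mut echoed)
            })
        })
        .collect();

    let mut outcome = Outcome::default();
    for worker in workers {
        match worker.join() {
            Ok(Ok(())) => outcome.ok += 1,
            Ok(Err(e)) if e.kind() == ErrorKind::ConnectionRefused => outcome.refused += 1,
            _ => outcome.other_errors += 1,
        }
    }
    outcome.wall_time = started.elapsed();

    stop.store(true, Ordering::Release);
    let _ = server.join();
    Ok(outcome)
}

fn main() -> io::Result<()> {
    let outcome = run_scenario(100)?;
    println!("{outcome:?}");
    Ok(())
}
```

Every client lands in exactly one counter, so the three counts always sum to the requested client count; larger scenarios would additionally need the RLIMIT_NOFILE check the commit message describes.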

* style(daemon): apply rustfmt to connection_scaling_stress

* fix(daemon): convert ru_maxrss via i64::from for cross-arch portability

* fix(daemon): use ru_maxrss directly without conversion

The conversion was redundant on all targets we build on: c_long is i64
on 64-bit Linux/macOS (no conversion needed) and i32 on 32-bit Linux
(the subsequent 'as u64' sign-extends to i64 first, then saturating_mul
operates on u64). Removing the conversion silences both
clippy::unnecessary_cast and clippy::useless_conversion without losing
overflow safety.
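
The cast behaviour described in that last commit message can be checked in isolation. The sketch below is illustrative only: `kib_to_bytes` is a hypothetical helper, and it assumes the reading is in kibibytes (the Linux convention; macOS reports `ru_maxrss` in bytes). Plain `i32` and `i64` values stand in for the two widths of `c_long`.

```rust
/// Hypothetical helper: scale a KiB reading to bytes without overflowing.
fn kib_to_bytes(kib: u64) -> u64 {
    kib.saturating_mul(1024)
}

fn main() {
    // Stand-ins for `c_long` on 32-bit and 64-bit targets.
    let narrow: i32 = 123_456;
    let wide: i64 = 123_456;

    // For non-negative readings, `as u64` gives the same value at either width.
    assert_eq!(narrow as u64, wide as u64);
    assert_eq!(kib_to_bytes(narrow as u64), 126_418_944);

    // A signed-to-unsigned widening cast sign-extends, so a negative input
    // would become a very large u64 rather than an error.
    let negative: i32 = -1;
    assert_eq!(negative as u64, u64::MAX);

    // `saturating_mul` caps at u64::MAX instead of wrapping.
    assert_eq!(kib_to_bytes(u64::MAX), u64::MAX);

    println!("peak RSS: {} bytes", kib_to_bytes(wide as u64));
}
```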
oferchen added a commit that referenced this pull request May 17, 2026
Add a focused evaluation of async runtime options for the daemon
accept loop. Compares tokio against async-std and the existing
thread-per-connection model, and records the decision to adopt
tokio with the rt-multi-thread flavour under the existing async
feature gate.

The doc complements the implementation plan in #1935 and the
benchmark plan in #1933 rather than restating either. It covers
maintenance posture, feature parity, ecosystem alignment, the
case for staying threaded, the case for tokio, migration cost,
trigger conditions, and a five-step adoption sequence.
oferchen added 6 commits that referenced this pull request May 18, 2026 (same commit messages as above)