bench(fast_io): IOCP vs io_uring under matched workloads (#1868) by oferchen · Pull Request #4255 · oferchen/rsync

oferchen · 2026-05-17T20:13:24Z

Summary

Add crates/fast_io/benches/iocp_vs_iouring_matched.rs: a matched-workload bench that drives IocpDiskBatch on Windows and IoUringDiskBatch on Linux through identical payload sizes (4 KiB / 64 KiB / 1 MiB), file counts (1000), per-iteration temp-dir lifecycles, and a deterministic 4 KiB tiled payload pattern.
Every host (Linux, Windows, macOS) runs a portable std_baseline cell that uses File::create + write_all so per-host throughput can be normalised before any cross-host comparison. The two kernels cannot be compared head-to-head; the normalised ratio is the cross-platform signal.
Register iocp_vs_iouring_matched in crates/fast_io/Cargo.toml with harness = false. The bench file compiles on every platform; on non-target OSes the body falls back to the std_baseline-only group, so the bench target still provides the entry point that harness = false requires.
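
For readers without the bench file open, here is a minimal std-only sketch of what a std_baseline cell could look like under the parameters above. It is not the code in iocp_vs_iouring_matched.rs: the function names, the tile contents (byte index modulo 251), the file naming, and the use of std::env::temp_dir are all assumptions made for illustration, and Criterion wiring is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Length of the repeating tile, matching the 4 KiB tile named in the summary.
const TILE_LEN: usize = 4 * 1024;

/// Builds a deterministic payload of `size` bytes by repeating one 4 KiB tile.
/// The tile contents (byte index modulo 251) are an assumption of this sketch.
fn tiled_payload(size: usize) -> Vec<u8> {
    let tile: Vec<u8> = (0..TILE_LEN).map(|i| (i % 251) as u8).collect();
    tile.iter().copied().cycle().take(size).collect()
}

/// Writes `file_count` copies of `payload` into `dir` using File::create + write_all.
/// Returns the total number of bytes written.
fn std_baseline_write(dir: &Path, file_count: usize, payload: &[u8]) -> io::Result<u64> {
    let mut total = 0u64;
    for i in 0..file_count {
        let mut file = File::create(dir.join(format!("file_{i:04}.bin")))?;
        file.write_all(payload)?;
        total += payload.len() as u64;
    }
    Ok(total)
}

/// Runs one iteration in its own directory and removes it afterwards,
/// mimicking a per-iteration temp-dir lifecycle (hypothetical helper).
fn run_iteration(label: &str, file_count: usize, payload: &[u8]) -> io::Result<u64> {
    let dir: PathBuf =
        std::env::temp_dir().join(format!("std_baseline_{label}_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let result = std_baseline_write(&dir, file_count, payload);
    // Clean up even when the writes failed, then surface the write result.
    fs::remove_dir_all(&dir)?;
    result
}
```

Calling run_iteration("64k", 1000, &tiled_payload(64 * 1024)) would correspond to one 64 KiB iteration over 1000 files; dividing the returned byte count by the elapsed time gives the per-host baseline throughput.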

Cells per platform

Linux (all(target_os = "linux", feature = "io_uring")): iouring_default, iouring_concurrent_ops_8 (sq_entries = 8), iouring_sqpoll (sqpoll = true), std_baseline. The io_uring cells are env-gated via OC_RSYNC_BENCH_IOURING_RING=1; the SQPOLL cell additionally requires OC_RSYNC_BENCH_IOURING_SQPOLL=1, matching the existing iouring_sqpoll_vs_regular.rs recipe.
Windows (all(target_os = "windows", feature = "iocp")): iocp_default, iocp_concurrent_ops_8 (concurrent_ops = 8), std_baseline.
Every other host: std_baseline only.
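
The cell matrix above can be sketched as a small selection function. This is an illustration only: the real bench gates on cfg attributes and Cargo features, which this sketch replaces with a runtime OS string so that it stays self-contained. The helper names and the rule that a variable counts as enabled only when it equals "1" are assumptions.

```rust
/// Environment variables named in the PR description.
const RING_ENV: &str = "OC_RSYNC_BENCH_IOURING_RING";
const SQPOLL_ENV: &str = "OC_RSYNC_BENCH_IOURING_SQPOLL";

/// Returns true when the variable is set to exactly "1" (assumed convention).
fn env_enabled(name: &str) -> bool {
    std::env::var(name).map(|v| v == "1").unwrap_or(false)
}

/// Lists the cells a host would run. `os` uses the spelling of
/// std::env::consts::OS ("linux", "windows", "macos", ...).
fn cells_for(os: &str, ring: bool, sqpoll: bool) -> Vec<&'static str> {
    let mut cells = Vec::new();
    match os {
        // io_uring cells run only with the ring opt-in; SQPOLL needs both flags.
        "linux" if ring => {
            cells.push("iouring_default");
            cells.push("iouring_concurrent_ops_8");
            if sqpoll {
                cells.push("iouring_sqpoll");
            }
        }
        "windows" => {
            cells.push("iocp_default");
            cells.push("iocp_concurrent_ops_8");
        }
        _ => {}
    }
    // Every host runs the portable baseline.
    cells.push("std_baseline");
    cells
}

/// Resolves the cell list for the machine the code is running on.
fn cells_for_this_host() -> Vec<&'static str> {
    cells_for(
        std::env::consts::OS,
        env_enabled(RING_ENV),
        env_enabled(SQPOLL_ENV),
    )
}
```

Keeping std_baseline outside the match arms mirrors the rule that the baseline runs everywhere, including Linux hosts that have not opted in to the io_uring cells.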

Normalisation approach

Divide each platform-specific cell's throughput by that host's std_baseline throughput for the same payload size, then compare the resulting ratios across the Linux and Windows runs. The ratio is intended to strip out host-specific storage stack effects, so that the residual delta mostly reflects the kernel-async dispatch style itself. The approach is documented in the bench file's module doc with a worked example.
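
The arithmetic can be sketched in a few lines. The function names are hypothetical, and the numbers in the usage note are placeholders chosen to make the division easy to follow; they are not measurements from this PR.

```rust
/// Throughput of a platform-specific cell divided by the same host's
/// std_baseline throughput for the same payload size.
/// Returns None when the inputs cannot produce a meaningful ratio.
fn normalised_ratio(cell_mib_s: f64, baseline_mib_s: f64) -> Option<f64> {
    if cell_mib_s.is_finite() && baseline_mib_s.is_finite() && baseline_mib_s > 0.0 {
        Some(cell_mib_s / baseline_mib_s)
    } else {
        None
    }
}

/// Cross-host signal: how the Linux cell's speed-up over its own baseline
/// compares with the Windows cell's speed-up over its own baseline.
fn cross_host_signal(linux_ratio: f64, windows_ratio: f64) -> Option<f64> {
    normalised_ratio(linux_ratio, windows_ratio)
}
```

With placeholder inputs, a Linux cell at 300 MiB/s over a 200 MiB/s baseline gives a ratio of 1.5, and a Windows cell at 250 MiB/s over a 200 MiB/s baseline gives 1.25. The cross-host signal is then 1.5 / 1.25 = 1.2, and the raw MiB/s figures from the two hosts are never compared directly.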

Test plan

CI fmt+clippy passes.
CI nextest (stable) passes on Linux / macOS / Windows.
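Linux runner with io_uring opt-in: OC_RSYNC_BENCH_IOURING_RING=1 cargo bench -p fast_io --bench iocp_vs_iouring_matched produces the Linux cells; the iouring_sqpoll cell additionally requires OC_RSYNC_BENCH_IOURING_SQPOLL=1, so all four cells appear only with both variables set.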
Windows runner: cargo bench -p fast_io --bench iocp_vs_iouring_matched produces all three Windows cells.
macOS runner: bench compiles and emits only the std_baseline cells.

Add a matched-workload benchmark that drives both kernel-async write paths (IocpDiskBatch on Windows, IoUringDiskBatch on Linux) against the same payload sizes (4 KiB / 64 KiB / 1 MiB), file count (1000), and per-iteration temp-dir lifecycle. Each host also runs a std_baseline cell so per-host throughput can be normalised before cross-host comparison; the two kernels cannot be compared directly.

oferchen merged commit 41083c3 into master May 17, 2026
14 checks passed

oferchen deleted the bench/iocp-vs-iouring-matched-1868 branch May 19, 2026 19:28
