fix(fast_io): prevent io_uring SEND deadlock under TCP backpressure (#1872) #3551

Merged
oferchen merged 4 commits into master from fix/io-uring-send-backpressure-1872
May 2, 2026
@oferchen oferchen commented May 2, 2026

Summary

  • Fixes #1872. An IORING_OP_SEND on a back-pressured TCP socket would sit in the kernel until the send buffer drained. While it was pending, submit_and_wait did not return, which starved the concurrent RECV side and produced an apparent deadlock during daemon-mode bidirectional transfers.
  • Adds a PollAdd(POLLOUT) readiness gate (with a linked timeout) in front of every submit_send_batch flush. SEND SQEs only enter the kernel once the socket reports writable, so submit_and_wait is bounded by readiness rather than peer drain. This mirrors upstream rsync's select()-based bidirectional I/O loop in io.c:perform_io.
  • Transient EAGAIN/EWOULDBLOCK SEND CQEs and ETIME/ECANCELED poll CQEs re-arm the readiness wait without surfacing fatal errors.
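
The gating and re-arm behavior described in the bullets above can be sketched without a real ring. This is a minimal illustration under stated assumptions, not the fast_io implementation: `GatedRing`, `ScriptedRing`, `flush_gated`, and the `max_rearms` bound are hypothetical names introduced here, results follow the io_uring convention of a negative errno in the CQE result, and the errno values are the Linux ones, hard-coded so the sketch needs no external crates.

```rust
use std::collections::VecDeque;

// Linux errno values (assumed for this sketch).
const EAGAIN: i32 = 11;
const ETIME: i32 = 62;
const ECANCELED: i32 = 125;

/// Hypothetical stand-in for the writer's private ring; not the fast_io API.
trait GatedRing {
    /// Models a PollAdd(POLLOUT) CQE with a linked timeout:
    /// non-negative means writable, negative is -errno.
    fn poll_out(&mut self) -> i32;
    /// Models one SEND CQE: bytes sent, or -errno.
    fn send(&mut self, buf: &[u8]) -> i32;
}

/// Send `buf`, submitting a SEND only after the readiness gate reports writable.
/// Transient results re-arm the gate; `max_rearms` is a bound added for the sketch.
fn flush_gated<R: GatedRing>(
    ring: &mut R,
    mut buf: &[u8],
    max_rearms: usize,
) -> Result<usize, i32> {
    let mut sent = 0usize;
    let mut rearms = 0usize;
    while !buf.is_empty() {
        let ready = ring.poll_out();
        if ready < 0 {
            // A poll timeout or cancellation is transient: wait for readiness again.
            if matches!(-ready, ETIME | ECANCELED) && rearms < max_rearms {
                rearms += 1;
                continue;
            }
            return Err(-ready);
        }
        match ring.send(buf) {
            n if n > 0 => {
                let n = n as usize;
                sent += n;
                buf = &buf[n..];
            }
            // The socket filled up again between the poll and the SEND: re-arm.
            res if -res == EAGAIN && rearms < max_rearms => rearms += 1,
            // Anything else is fatal (a zero-byte completion surfaces as Err(0)).
            res => return Err(-res),
        }
    }
    Ok(sent)
}

/// Scripted fake ring used to exercise the loop; defaults to "always writable".
struct ScriptedRing {
    polls: VecDeque<i32>,
    sends: VecDeque<i32>,
}

impl GatedRing for ScriptedRing {
    fn poll_out(&mut self) -> i32 {
        self.polls.pop_front().unwrap_or(1)
    }
    fn send(&mut self, buf: &[u8]) -> i32 {
        let len = buf.len() as i32;
        self.sends.pop_front().unwrap_or(len).min(len)
    }
}
```

With a script of one poll timeout, one EAGAIN, and a short write, `flush_gated` still delivers the whole buffer. A non-transient errno such as EPIPE is returned to the caller on its first occurrence.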

Why poll-add gating instead of an interleaved peek loop

Each IoUringSocketWriter/IoUringSocketReader already owns a private ring (so the SEND ring never carries RECV SQEs). The cleanest containment for the deadlock is therefore inside the writer's own submit_send_batch: defer the SEND submission until the kernel signals room. This adds zero coupling between the writer and reader rings and keeps the existing API surface intact, whereas an interleaved peek/timeout loop would require sharing CQE handling between two rings that today have no shared state.

Test plan

  • CI: cargo nextest run -p fast_io --all-features -E 'test(io_uring) or test(socket) or test(send)' (Linux must exercise test_socket_send_no_deadlock_under_backpressure_1872; macOS/Windows skip via #[cfg(target_os = "linux")]).
  • CI: full nextest matrix (Linux musl, Windows, macOS) on stable.
  • CI: fmt + clippy + interop workflows.

The new regression test:

  1. Opens a loopback TCP pair and shrinks SO_SNDBUF/SO_RCVBUF to 4 KiB.
  2. Pre-fills the writer's kernel send buffer with a sentinel byte until a non-blocking write returns EAGAIN, then restores blocking mode.
  3. Spawns separate writer and drain threads, each backed by its own io_uring ring (IoUringPolicy::Enabled).
  4. Asserts the writer reports the full 64 KiB payload within a 20 s wall-clock deadline (without the fix this would loop indefinitely).
  5. Skips gracefully when io_uring is unavailable.
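
The shape of these steps can be approximated with the standard library alone. The sketch below is not the test added in this PR. It uses blocking `std::net` writes rather than io_uring, does not shrink SO_SNDBUF/SO_RCVBUF (std exposes no setter for them), and `prefill`, `backpressure_roundtrip`, and the constants are names assumed for illustration.

```rust
use std::io::{Error, ErrorKind, Read, Result, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

const PREFILL_MARKER: u8 = 0xAB;
const PAYLOAD_LEN: usize = 64 * 1024;

/// Fill the kernel send buffer until a non-blocking write would block,
/// then restore blocking mode. Returns the number of bytes queued.
fn prefill(stream: &mut TcpStream) -> Result<usize> {
    stream.set_nonblocking(true)?;
    let chunk = [PREFILL_MARKER; 4096];
    let mut total = 0usize;
    let outcome = loop {
        match stream.write(&chunk) {
            Ok(0) => break Ok(total),
            Ok(n) => total += n,
            Err(e) if e.kind() == ErrorKind::WouldBlock => break Ok(total),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => break Err(e),
        }
    };
    stream.set_nonblocking(false)?;
    outcome
}

/// Saturate a loopback socket, then run a writer and a drain thread concurrently.
/// Returns (prefilled bytes, total bytes drained), or TimedOut if the writer
/// does not finish within `deadline`.
fn backpressure_roundtrip(deadline: Duration) -> Result<(usize, usize)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut writer = TcpStream::connect(listener.local_addr()?)?;
    let (mut reader, _) = listener.accept()?;
    let prefilled = prefill(&mut writer)?;

    let (done_tx, done_rx) = mpsc::channel();
    let writer_thread = thread::spawn(move || {
        let payload = vec![0x01u8; PAYLOAD_LEN];
        let res = writer.write_all(&payload);
        drop(writer); // close the socket so the drain side sees EOF
        let _ = done_tx.send(res);
    });
    let drain_thread = thread::spawn(move || -> Result<usize> {
        let mut buf = [0u8; 8192];
        let mut total = 0usize;
        loop {
            match reader.read(&mut buf) {
                Ok(0) => return Ok(total),
                Ok(n) => total += n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
    });

    // Wall-clock guard: a stuck writer turns into an error instead of a hang.
    match done_rx.recv_timeout(deadline) {
        Ok(res) => res?,
        Err(_) => return Err(Error::new(ErrorKind::TimedOut, "writer missed deadline")),
    }
    writer_thread.join().expect("writer thread panicked");
    let drained = drain_thread.join().expect("drain thread panicked")?;
    Ok((prefilled, drained))
}
```

Calling `backpressure_roundtrip(Duration::from_secs(20))` should report a drained total equal to the prefill plus 64 KiB. A `TimedOut` error plays the role of the regression signal.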

@github-actions (bot) added the bug label May 2, 2026
oferchen added 4 commits May 2, 2026 08:20
…1872)

Without a readiness gate, an IORING_OP_SEND on a back-pressured TCP
socket can sit in the kernel until the send buffer drains. While that
SQE is pending the writer ring's submit_and_wait() does not return,
starving any concurrent RECV completion side and producing the
apparent deadlock reported in #1872.

Mirror upstream rsync's bidirectional select() strategy
(io.c:perform_io) inside submit_send_batch by gating each batch with a
PollAdd(POLLOUT) SQE plus a linked Timeout. SEND SQEs are only
submitted after the socket reports writable, so submit_and_wait is
bounded by readiness rather than peer drain. Transient EAGAIN/ETIME
results re-arm the readiness wait instead of failing.

Adds a Linux-only regression test that prefills a small TCP socket
buffer until EAGAIN, then drives a concurrent SEND and RECV across
their own io_uring rings under a 20s wall-clock guard. The test skips
gracefully when io_uring is not available.
)

EWOULDBLOCK is defined equal to EAGAIN on Linux, so the second match arm is an unreachable pattern, and that lint fails the build under -D warnings. Collapse to a single equality check and rerun rustfmt on the writer thread spawn block.
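
A minimal sketch of the collapsed check, assuming the Linux errno values and a hypothetical helper name (`is_transient_send_errno` is not taken from batching.rs):

```rust
// Linux errno values (assumed); EWOULDBLOCK is an alias of EAGAIN there.
const EAGAIN: i32 = 11;
const EWOULDBLOCK: i32 = EAGAIN;

// Compile-time check of the aliasing this sketch relies on.
const _: () = assert!(EAGAIN == EWOULDBLOCK);

/// Hypothetical helper: true when a SEND errno only means "socket not writable yet".
fn is_transient_send_errno(errno: i32) -> bool {
    // A `match` with separate EAGAIN and EWOULDBLOCK arms would trip the
    // unreachable-pattern lint on the second arm, so compare once.
    errno == EAGAIN
}
```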
…1872)

Same unreachable-pattern fix as the prior commit made in batching.rs, applied
to the prefill loop in tests.rs. EWOULDBLOCK == EAGAIN on Linux, so the
second arm is an unreachable pattern and fails the build under -D warnings.
The previous payload formula `(i % 250) + 1` produced byte 0xAB at
i=170, which collides with PREFILL_MARKER (0xAB). The drain side relies
on this byte never appearing in the payload to recognize the boundary
between the saturating prefill and the io_uring writer output.

Replace the mapping with a range that walks 1..=255 while skipping the
marker byte, so the debug_assert on line 1040 holds for any index.
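
One mapping with the property described above is sketched here. It is not necessarily the formula used in tests.rs, and the function names are assumptions. It cycles through the 254 values of 1..=255 that are not the marker.

```rust
const PREFILL_MARKER: u8 = 0xAB;

/// The previous formula, kept here only to show the collision at i = 170.
fn old_payload_byte(i: usize) -> u8 {
    ((i % 250) + 1) as u8
}

/// Hypothetical replacement: walks 1..=255 and never yields PREFILL_MARKER or 0.
fn payload_byte(i: usize) -> u8 {
    let v = (i % 254) as u8 + 1; // 1..=254
    if v >= PREFILL_MARKER {
        v + 1 // shift 171..=254 up to 172..=255, skipping the marker
    } else {
        v
    }
}
```

Index 170 is the interesting case. The old formula yields 171 (0xAB) there, while this mapping yields 172.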
@oferchen oferchen force-pushed the fix/io-uring-send-backpressure-1872 branch from 4e3eed2 to 58e4329 May 2, 2026 05:20
@oferchen oferchen merged commit 3088f26 into master May 2, 2026
37 checks passed
@oferchen oferchen deleted the fix/io-uring-send-backpressure-1872 branch May 2, 2026 13:01