io_uring: switch server-socket read path to multishot recv #44669
Draft
aburan28 wants to merge 4 commits into
This is a no-behavior-change preparation step for multishot recv. The ``CompletionCb`` callback type now takes a ``uint32_t flags`` argument that carries the raw ``cqe->flags`` value from the kernel. For multishot completions, a follow-up change will inspect:

* ``IORING_CQE_F_BUFFER``: a buffer was selected from a buf-ring; the buffer ID is encoded in the upper bits.
* ``IORING_CQE_F_MORE``: the SQE will produce further completions.

The worker callback ignores ``flags`` for now. Injected completions are defined to always carry ``flags == 0``. All ``forEveryCompletion`` callers (worker, impl tests) are updated. ``IoUringSocket::on*`` virtual methods are intentionally unchanged in this commit; only ``onRead`` will need flags, in the multishot recv change.

Signed-off-by: Adam Buran <a.buran28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>
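A minimal sketch of how a consumer might decode the new `flags` argument. The helpers `selectedBufferId` and `sqeStillArmed` are hypothetical (they are not part of this commit), and the three constants are local copies of the kernel UAPI values (`IORING_CQE_F_BUFFER`, `IORING_CQE_F_MORE`, `IORING_CQE_BUFFER_SHIFT`) so the snippet builds without liburing:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Local copies of the values in the kernel's linux/io_uring.h, redefined
// here so this sketch compiles without liburing.
constexpr uint32_t kCqeFBuffer = 1U << 0; // IORING_CQE_F_BUFFER
constexpr uint32_t kCqeFMore = 1U << 1;   // IORING_CQE_F_MORE
constexpr uint32_t kCqeBufferShift = 16;  // IORING_CQE_BUFFER_SHIFT

// Hypothetical helper: the buffer ID the kernel selected, if any.
std::optional<uint16_t> selectedBufferId(uint32_t flags) {
  if ((flags & kCqeFBuffer) == 0) {
    return std::nullopt; // No buf-ring buffer attached to this completion.
  }
  // The buffer ID lives in the upper 16 bits of cqe->flags.
  return static_cast<uint16_t>(flags >> kCqeBufferShift);
}

// Hypothetical helper: true while the multishot SQE remains armed.
bool sqeStillArmed(uint32_t flags) { return (flags & kCqeFMore) != 0; }
```

Because injected completions carry `flags == 0`, both helpers report "no buffer, not armed" for them, which is consistent with the worker ignoring `flags` in this commit.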
Adds the kernel-managed buffer ring lifecycle and the ``recv`` multishot opcode to ``IoUringImpl``. This is the plumbing layer for switching the io_uring socket read path off the per-read ``readv`` allocation; the worker change comes in a follow-up PR.

New ``IoUring`` virtuals:

* ``setupBufRing(group_id, count, buf_size)``: register a buffer ring with the kernel. The buffers live in a single contiguous allocation owned by ``IoUringImpl``. Validates that ``count`` is a non-zero power of two and rejects double-setup. Returns ``IoUringResult::Failed`` on kernels that lack ``IORING_REGISTER_PBUF_RING`` (< 5.19).
* ``prepareRecvMultishot(fd, group_id, user_data)``: submits a recv with ``IOSQE_BUFFER_SELECT`` so the kernel pulls a buffer from the ring. The same SQE may produce multiple completions, signalled by ``IORING_CQE_F_MORE`` in ``cqe->flags``.
* ``getBufferForBid(group_id, bid)``: look up the storage backing a particular kernel-selected buffer; the consumer reads up to ``cqe->res`` bytes and then recycles it.
* ``recycleBuffer(group_id, bid)``: return a consumed buffer to the ring so the kernel can reuse it.

For now only one buf-ring is supported per ``IoUring`` instance.

Test:

* ``SetupBufRingValidatesInputs``: exercises the rejection paths (bad count, bad buf_size, double-setup).
* ``MultishotRecvDeliversBuffersAndStaysArmed``: end-to-end test with a real socketpair and a real ring. Arms a multishot recv, writes twice, and verifies that both completions deliver buffers, the bid is in range, the data matches, and the SQE stays armed (``F_MORE`` set on the first completion). Skips when the kernel lacks buf-ring support.

Signed-off-by: Adam Buran <a.buran28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>
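The validation and lookup rules above can be modeled roughly as follows. This is a sketch under assumptions, not the `IoUringImpl` code: `BufRingStorage` is a hypothetical name, it never registers anything with the kernel, and treating a zero `buf_size` as the "bad buf_size" case is a guess. It only illustrates the power-of-two check, the double-setup rejection, and how a bid maps into one contiguous allocation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical model of the buf-ring bookkeeping: `count` buffers of
// `buf_size` bytes each in one contiguous allocation. No kernel calls.
class BufRingStorage {
public:
  // count must be a non-zero power of two; buf_size must be non-zero
  // (assumption); a second successful setup is rejected.
  bool setup(uint32_t count, uint32_t buf_size) {
    if (storage_ != nullptr) {
      return false; // Double setup.
    }
    if (count == 0 || (count & (count - 1)) != 0 || buf_size == 0) {
      return false; // Bad count or bad buf_size.
    }
    count_ = count;
    buf_size_ = buf_size;
    storage_ = std::make_unique<uint8_t[]>(static_cast<size_t>(count) * buf_size);
    return true;
  }

  // Buffer `bid` starts bid * buf_size bytes into the allocation.
  uint8_t* bufferForBid(uint16_t bid) const {
    if (storage_ == nullptr || bid >= count_) {
      return nullptr; // Not set up, or bid out of range.
    }
    return storage_.get() + static_cast<size_t>(bid) * buf_size_;
  }

  uint32_t bufSize() const { return buf_size_; }

private:
  std::unique_ptr<uint8_t[]> storage_;
  uint32_t count_{0};
  uint32_t buf_size_{0};
};
```

With a single allocation, bid lookup is one multiply-and-add plus a bounds check.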
When the worker is configured with multishot recv enabled and the kernel/liburing successfully sets up a buf-ring (5.19+), the ``IoUringServerSocket`` read path replaces the per-read ``readv`` SQE + ``uint8_t[]`` allocation with a single ``IORING_OP_RECV`` multishot SQE that pulls buffers from the kernel-managed ring. Each completion delivers one kernel-selected buffer; the ``BufferFragment`` wrapping it recycles the buffer back to the ring on release.

Mechanics:

* New ``Request::RequestType::RecvMultishot`` distinguishes the multishot SQE from a plain ``Read``. The worker's completion dispatch routes both to ``onRead`` but holds onto the ``Request*`` while ``IORING_CQE_F_MORE`` is set (the kernel reuses the same user_data for further completions on the same SQE).
* ``IoUringSocket::onRead`` gains a ``uint32_t flags`` argument carrying the raw ``cqe->flags``. The buffer ID is in the upper bits when ``IORING_CQE_F_BUFFER`` is set; ``F_MORE`` indicates the SQE is still armed.
* ``IoUringServerSocket::onRead`` only clears ``read_req_`` when the SQE has terminated. While armed, the bottom-of-function ``submitReadRequest`` short-circuits because ``read_req_`` is still non-null. When ``F_MORE`` clears, ``read_req_`` is freed and a new multishot SQE is submitted.
* ``IoUringWorkerImpl::makeMultishotBufferFragment`` wraps the kernel buffer with a release callback that calls ``recycleBuffer``, so back-pressure and buffer return are driven by the upper-layer drain.
* On older kernels ``setupBufRing`` returns ``Failed`` and the worker silently falls back to the existing ``readv`` path, so the feature is safe to ship gated behind a config flag.

The worker constructor gains two new defaulted args (``enable_multishot_recv``, ``multishot_recv_buffer_count``) so all existing call sites continue to compile unchanged.

Tests:

* ``MultishotRecvSetupAndSubmit``: buf-ring setup + first submit picks the multishot path and produces a ``RecvMultishot`` request.
* ``MultishotRecvFallbackOnUnsupportedKernel``: when ``setupBufRing`` fails, the worker falls back to ``prepareReadv``.
* ``MultishotRecvDeliversBufferAndStaysArmed``: completion with ``F_BUFFER | F_MORE`` delivers the buffer to the upper layer and does not re-arm the SQE; the buffer is recycled when the upper layer drains.
* ``MultishotRecvReArmOnFMoreClear``: completion with ``F_BUFFER`` but no ``F_MORE`` triggers a fresh ``prepareRecvMultishot`` to re-arm.

The proto / factory wiring to actually expose this option is in a follow-up change.

Signed-off-by: Adam Buran <a.buran28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>
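The `read_req_` bookkeeping described under Mechanics can be sketched as a small state machine. `FakeServerSocket` and `FakeRequest` are hypothetical stand-ins; ownership is simplified to a `std::unique_ptr`, and error, close, and buffer-delivery handling are reduced to counters:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

constexpr uint32_t kCqeFBuffer = 1U << 0; // IORING_CQE_F_BUFFER
constexpr uint32_t kCqeFMore = 1U << 1;   // IORING_CQE_F_MORE

// Hypothetical stand-in for the worker's Request object.
struct FakeRequest {};

// Hypothetical model of the server socket's read-side bookkeeping.
class FakeServerSocket {
public:
  void submitReadRequest() {
    if (read_req_ != nullptr) {
      return; // Multishot SQE still armed: nothing to submit.
    }
    read_req_ = std::make_unique<FakeRequest>();
    ++multishot_submits_; // Stands in for prepareRecvMultishot + submit.
  }

  void onRead(int32_t res, uint32_t flags) {
    if (res > 0 && (flags & kCqeFBuffer) != 0) {
      ++buffers_delivered_; // Stands in for handing a fragment upward.
    }
    if ((flags & kCqeFMore) == 0) {
      read_req_.reset(); // Kernel terminated the SQE.
    }
    submitReadRequest(); // Re-arms only if read_req_ was cleared.
  }

  int multishotSubmits() const { return multishot_submits_; }
  int buffersDelivered() const { return buffers_delivered_; }

private:
  std::unique_ptr<FakeRequest> read_req_;
  int multishot_submits_{0};
  int buffers_delivered_{0};
};
```

For example, a completion with `F_BUFFER | F_MORE` leaves the submit count unchanged, while a following completion with only `F_BUFFER` drops `read_req_` and triggers exactly one new submission.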
Hi @aburan28, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.
Let's mark this as a draft until the PR it depends on is merged.
Signed-off-by: Adam Buran <aburan28@gmail.com>
Commit Message:
io_uring: switch server-socket read path to multishot recv
Switches `IoUringServerSocket`'s read path from a per-read `readv` + heap-allocated `uint8_t[]` to a single multishot `IORING_OP_RECV` SQE that pulls buffers from a kernel-managed buf-ring. Each completion hands the upper layer one kernel-selected buffer wrapped in a `BufferFragmentImpl` whose release callback recycles it back to the ring.
The current `readv` path has two costs on every read: a `make_unique<uint8_t[]>(size)` (and a paired `delete[]`), and a fresh SQE submission. With multishot recv and provided buffers, the kernel keeps the SQE armed across many recv completions and pulls from a pre-registered buffer pool. This removes the per-read allocation from the hot path entirely, and a new SQE is submitted only when the kernel terminates the multishot request.
Mechanics:

* `Request::RequestType::RecvMultishot` distinguishes the multishot SQE so the worker's completion dispatch knows to keep the `Request*` alive while `IORING_CQE_F_MORE` is set (the kernel reuses the same user_data for further completions).
* The `IoUringSocket::onRead` virtual gains a `uint32_t flags` argument carrying the raw `cqe->flags`. The buffer ID is in the upper bits when `IORING_CQE_F_BUFFER` is set; `F_MORE` indicates the SQE is still armed.
* `IoUringServerSocket::onRead` only clears `read_req_` when the SQE has terminated. While armed, the bottom-of-function `submitReadRequest` short-circuits because `read_req_` is still non-null. When `F_MORE` clears, `read_req_` is freed and a new multishot SQE is submitted.
* `IoUringWorkerImpl::makeMultishotBufferFragment` wraps the kernel buffer with a release callback that calls `recycleBuffer`, so buffer return is driven by the upper-layer drain.
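The recycle-on-release contract can be illustrated with a simplified fragment type. `RecyclingFragment` is hypothetical and deliberately does not reproduce Envoy's `BufferFragmentImpl` constructor or releasor signature; the `recycle` callback stands in for a call to `recycleBuffer(group_id, bid)`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for the fragment built by
// makeMultishotBufferFragment. Only recycle-on-release is modeled.
class RecyclingFragment {
public:
  RecyclingFragment(const uint8_t* data, size_t size, uint16_t bid,
                    std::function<void(uint16_t)> recycle)
      : data_(data), size_(size), bid_(bid), recycle_(std::move(recycle)) {}

  // Copying would let two owners recycle the same bid, so forbid it.
  RecyclingFragment(const RecyclingFragment&) = delete;
  RecyclingFragment& operator=(const RecyclingFragment&) = delete;

  // In this sketch destruction also releases, so a buffer is never leaked.
  ~RecyclingFragment() { done(); }

  const uint8_t* data() const { return data_; }
  size_t size() const { return size_; }

  // Called once the upper layer has drained the bytes; recycles exactly once.
  void done() {
    if (recycle_) {
      auto recycle = std::move(recycle_);
      recycle_ = nullptr; // Moved-from state is unspecified; reset explicitly.
      recycle(bid_);
    }
  }

private:
  const uint8_t* data_;
  size_t size_;
  uint16_t bid_;
  std::function<void(uint16_t)> recycle_;
};
```

Recycling happens exactly once, either when the upper layer signals it has drained the bytes or, as a fallback in this sketch, when the fragment is destroyed.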
Depends on:

* `CompletionCb` `flags` arg
* `prepareRecvMultishot` API

Additional Description:
The worker constructor gains two new defaulted args (`enable_multishot_recv = false`, `multishot_recv_buffer_count = 256`) so all existing call sites compile unchanged. When multishot is requested but the kernel/liburing doesn't support it, `setupBufRing` returns `Failed` and the worker silently falls back to `readv`.
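The defaulted-argument and fallback behavior can be sketched as below. `FakeWorker`, `FakeResult`, and `ReadPath` are hypothetical stand-ins for `IoUringWorkerImpl`, `IoUringResult`, and the readv/multishot choice; the `setup_buf_ring` callback abstracts over `setupBufRing`:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for IoUringResult.
enum class FakeResult { Ok, Failed };

enum class ReadPath { Readv, RecvMultishot };

// Hypothetical worker mirroring the described constructor defaults.
class FakeWorker {
public:
  explicit FakeWorker(std::function<FakeResult(uint32_t)> setup_buf_ring,
                      bool enable_multishot_recv = false,
                      uint32_t multishot_recv_buffer_count = 256) {
    // Only attempt buf-ring setup when multishot is requested; any
    // failure silently keeps the existing readv path.
    if (enable_multishot_recv &&
        setup_buf_ring(multishot_recv_buffer_count) == FakeResult::Ok) {
      read_path_ = ReadPath::RecvMultishot;
    }
  }

  ReadPath readPath() const { return read_path_; }

private:
  ReadPath read_path_{ReadPath::Readv};
};
```

With the defaults, `setup_buf_ring` is never called, so existing call sites keep the `readv` path without touching the ring.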
The proto config + factory wiring to actually expose this option lives in
#44670. Until that lands, the readv path is the only one used in production.
AI usage disclosure: Portions of the code and/or PR description were drafted
with the assistance of Claude (Anthropic). I reviewed and understand all
submitted code.
Risk Level: Medium
(Materially different read path for io_uring server sockets, but defaulted
off via worker constructor args and not yet reachable from proto config.
With the default, no behavior changes.)
Testing:

* `MultishotRecvSetupAndSubmit`: buf-ring setup + first submit picks the multishot path and produces a `RecvMultishot` request.
* `MultishotRecvFallbackOnUnsupportedKernel`: when `setupBufRing` fails, the worker falls back to `prepareReadv`.
* `MultishotRecvDeliversBufferAndStaysArmed`: completion with `F_BUFFER | F_MORE` delivers the buffer and does not re-arm.
* `MultishotRecvReArmOnFMoreClear`: completion with `F_BUFFER` but no `F_MORE` triggers a fresh `prepareRecvMultishot`.

Docs Changes: N/A. The new `onRead` `flags` arg is documented inline in `envoy/common/io/io_uring.h`.

Release Notes: N/A (the read path change is unreachable from configuration until #44670 lands; the public-facing release note belongs there.)
Platform Specific Features:
io_uring is Linux-only. Multishot recv requires kernel 6.0+ (buf-ring registration via `IORING_REGISTER_PBUF_RING` itself requires 5.19+); on older kernels the worker falls back to `readv` via the `setupBufRing` failure path. No platform support change beyond existing io_uring build gating.
Runtime guard: N/A in this PR — the new path is gated by a constructor arg
that defaults to off and is unreachable from configuration. The config-level
gate (and any release-note runtime guard) lives in #44670.