io_uring: add buf-ring + multishot recv to IoUringImpl #44668

Open
aburan28 wants to merge 3 commits into envoyproxy:main from aburan28:multishot-recv/02-buf-ring

aburan28 commented Apr 27, 2026

Commit Message:
io_uring: add buf-ring + multishot recv API to IoUringImpl

Plumbing layer for switching the io_uring socket read path off the per-read
readv allocation. Adds the kernel-managed buffer ring
(IORING_REGISTER_PBUF_RING) lifecycle and IORING_OP_RECV multishot to
IoUringImpl. The worker change that consumes this comes in a follow-up PR.

New IoUring virtuals:

  • setupBufRing(group_id, count, buf_size) — registers a buffer ring with
    the kernel. Buffers live in a single contiguous allocation owned by
    IoUringImpl. Validates that count is a non-zero power of two; rejects
    double-setup. Returns Failed on kernels < 5.19 (no
    IORING_REGISTER_PBUF_RING).
  • prepareRecvMultishot(fd, group_id, user_data) — submits a recv with
    IOSQE_BUFFER_SELECT. The same SQE may produce multiple completions,
    signalled by IORING_CQE_F_MORE in cqe->flags.
  • getBufferForBid(group_id, bid) — looks up the storage backing a
    kernel-selected buffer.
  • recycleBuffer(group_id, bid) — returns a consumed buffer to the ring.

Only one buf-ring per IoUring instance for now.
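
The validation and lookup rules listed above can be sketched without touching the kernel. This is a minimal illustration, not the code in this PR: `BufPool`, `isValidCount`, and `bufferFor` are hypothetical names, treating a zero `buf_size` as invalid is an assumption, and the actual `IORING_REGISTER_PBUF_RING` registration is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: the ring size must be a non-zero power of two.
inline bool isValidCount(uint32_t count) {
  return count != 0 && (count & (count - 1)) == 0;
}

// Hypothetical stand-in for the storage side of a buffer ring: one contiguous
// allocation sliced into `count` buffers of `buf_size` bytes each.
class BufPool {
public:
  // Returns std::nullopt when the inputs would be rejected.
  // Assumption: a zero buf_size counts as "bad buf_size".
  static std::optional<BufPool> create(uint32_t count, uint32_t buf_size) {
    if (!isValidCount(count) || buf_size == 0) {
      return std::nullopt;
    }
    return BufPool(count, buf_size);
  }

  // Maps a buffer ID to the start of its slice; nullptr for out-of-range IDs.
  uint8_t* bufferFor(uint16_t bid) {
    if (bid >= count_) {
      return nullptr;
    }
    return storage_.data() + static_cast<size_t>(bid) * buf_size_;
  }

private:
  BufPool(uint32_t count, uint32_t buf_size)
      : count_(count), buf_size_(buf_size),
        storage_(static_cast<size_t>(count) * buf_size) {
    assert(isValidCount(count));
  }

  uint32_t count_;
  uint32_t buf_size_;
  std::vector<uint8_t> storage_;
};
```

Because the storage is contiguous, the lookup is a single multiply and add. The bounds check matters because the buffer ID arrives from the kernel in `cqe->flags`.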

Depends on #44667 (CompletionCb flags-arg refactor — required so
forEveryCompletion can surface cqe->flags to the multishot consumer).

Additional Description:
The full multishot read path touches the worker, the server-socket read state
machine, and the proto config. Landing the IoUringImpl API on its own gives
reviewers a smaller, well-scoped surface to vet against the liburing/kernel
contract.

AI usage disclosure: Portions of the code and/or PR description were drafted
with the assistance of Claude (Anthropic). I reviewed and understand all
submitted code.

Risk Level: Low
(API-only addition. Nothing in this PR calls the new virtuals — the worker-
side caller arrives in #44669. No existing path changes.)

Testing:

  • New unit test SetupBufRingValidatesInputs covers rejection paths (bad
    count, bad buf_size, double-setup).
  • New end-to-end test MultishotRecvDeliversBuffersAndStaysArmed uses a real
    IoUringImpl + socketpair: arms a multishot recv, writes twice, verifies
    both completions deliver buffers, bids are in range, data matches, and
    F_MORE stays set across recycles. Skips when the kernel lacks buf-ring
    support.
  • Existing io_uring unit and integration tests on Linux CI.

Docs Changes: N/A. New methods are documented inline in
envoy/common/io/io_uring.h.

Release Notes: N/A (internal API additions; not yet exposed to extension
authors or operators — exposure happens in #44670).

Platform Specific Features:
io_uring is Linux-only. Buf-ring requires kernel 5.19+; setupBufRing
returns Failed on older kernels and callers are expected to fall back. No
platform support change beyond the existing io_uring build gating.
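
The expected fallback can be sketched as follows. This is illustrative only: `IoUringResult` is declared locally and assumed to mirror Envoy's enum, while `ReadPath` and `chooseReadPath` are hypothetical names that do not appear in this PR.

```cpp
#include <cassert>
#include <functional>

// Assumed to mirror Envoy's IoUringResult; declared locally so the sketch
// stands alone.
enum class IoUringResult { Ok, Busy, Failed };

// Hypothetical read strategies a caller could choose between.
enum class ReadPath { MultishotRecv, Readv };

// `setup` stands in for a call such as setupBufRing(group_id, count, buf_size).
// Anything other than Ok keeps the caller on the existing readv path.
inline ReadPath chooseReadPath(const std::function<IoUringResult()>& setup) {
  return setup() == IoUringResult::Ok ? ReadPath::MultishotRecv : ReadPath::Readv;
}
```

Deciding once at setup time keeps the per-read path free of kernel-version checks.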

Runtime guard: N/A. No call sites in this PR — the new API is dead code
until #44669 lands.

This is a no-behavior-change preparation step for multishot recv. The
``CompletionCb`` callback type now takes a ``uint32_t flags`` argument
that carries the raw ``cqe->flags`` value from the kernel.

For multishot completions a follow-up change will inspect:
* ``IORING_CQE_F_BUFFER`` — a buffer was selected from a buf-ring; the
  buffer ID is encoded in the upper bits.
* ``IORING_CQE_F_MORE`` — the SQE will produce further completions.

The worker callback ignores ``flags`` for now. Injected completions are
defined to always carry ``flags == 0``.

All ``forEveryCompletion`` callers (worker, impl tests) updated.
``IoUringSocket::on*`` virtual methods are intentionally unchanged in
this commit; only ``onRead`` will need flags, in the multishot recv
change.

Signed-off-by: Adam Buran <a.buran28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>
Adds the kernel-managed buffer ring lifecycle and the ``recv`` multishot
opcode to ``IoUringImpl``. This is the plumbing layer for switching the
io_uring socket read path off the per-read ``readv`` allocation; the
worker change comes in a follow-up PR.

New ``IoUring`` virtuals:

* ``setupBufRing(group_id, count, buf_size)`` — register a buffer ring
  with the kernel. The buffers live in a single contiguous allocation
  owned by ``IoUringImpl``. Validates that ``count`` is a non-zero power
  of two and rejects double-setup. Falls back to ``IoUringResult::Failed``
  on kernels that lack ``IORING_REGISTER_PBUF_RING`` (< 5.19).
* ``prepareRecvMultishot(fd, group_id, user_data)`` — submits a recv
  with ``IOSQE_BUFFER_SELECT`` so the kernel pulls a buffer from the
  ring. The same SQE may produce multiple completions, signalled by
  ``IORING_CQE_F_MORE`` in ``cqe->flags``.
* ``getBufferForBid(group_id, bid)`` — look up the storage backing a
  particular kernel-selected buffer; the consumer reads up to ``cqe->res``
  bytes and then recycles.
* ``recycleBuffer(group_id, bid)`` — return a consumed buffer to the
  ring so the kernel can reuse it.

For now only one buf-ring is supported per ``IoUring`` instance.
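
The read, recycle, and re-arm flow implied by these virtuals might look like the sketch below. It is not the worker change from the follow-up PR: `RecvHooks` and `onRecvCompletion` are hypothetical, the callbacks only stand in for the new virtuals (whose real signatures may differ), and the flag constants are local mirrors of the kernel values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Local mirrors of the kernel UAPI constants.
constexpr uint32_t kCqeFBuffer = 1U << 0;  // IORING_CQE_F_BUFFER
constexpr uint32_t kCqeFMore = 1U << 1;    // IORING_CQE_F_MORE
constexpr uint32_t kCqeBufferShift = 16;   // IORING_CQE_BUFFER_SHIFT

// Hypothetical callbacks standing in for the new IoUring virtuals.
struct RecvHooks {
  std::function<const uint8_t*(uint16_t bid)> get_buffer;  // getBufferForBid
  std::function<void(uint16_t bid)> recycle;               // recycleBuffer
  std::function<void()> rearm;                             // prepareRecvMultishot
};

// Handles one multishot recv completion and appends any payload to `out`.
// Returns false when res <= 0 (error or EOF).
inline bool onRecvCompletion(int32_t res, uint32_t flags, const RecvHooks& hooks,
                             std::string& out) {
  const bool ok = res > 0;
  if ((flags & kCqeFBuffer) != 0) {
    const uint16_t bid = static_cast<uint16_t>(flags >> kCqeBufferShift);
    if (ok) {
      const uint8_t* data = hooks.get_buffer(bid);
      if (data != nullptr) {
        // Read at most `res` bytes from the selected buffer.
        out.append(reinterpret_cast<const char*>(data), static_cast<size_t>(res));
      }
    }
    // Hand the buffer back so the kernel can select it again.
    hooks.recycle(bid);
  }
  // Without F_MORE the kernel has retired the SQE; submit a new one.
  if (ok && (flags & kCqeFMore) == 0) {
    hooks.rearm();
  }
  return ok;
}
```

Recycling only after the payload has been copied out avoids the kernel overwriting a buffer that is still being read.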

Test:
* ``SetupBufRingValidatesInputs`` — exercises the rejection paths
  (bad count, bad buf_size, double-setup).
* ``MultishotRecvDeliversBuffersAndStaysArmed`` — end-to-end with a real
  socketpair and a real ring: arm a multishot recv, write twice,
  verify both completions deliver buffers, the bid is in range, the
  data matches, and the SQE stays armed (F_MORE set on the first
  completion). Skips when the kernel lacks buf-ring support.

Signed-off-by: Adam Buran <a.buran28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com>
@aburan28 aburan28 had a problem deploying to external-contributors April 27, 2026 02:02 — with GitHub Actions Error
@repokitteh-read-only

Hi @aburan28, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #44668 was opened by aburan28.

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #44668 was opened by aburan28.

@zuercher
Member

zuercher commented May 5, 2026

Let's mark this as a draft until the PR it depends on is merged.

Signed-off-by: Adam Buran <aburan28@gmail.com>
@aburan28 aburan28 requested a deployment to external-contributors May 11, 2026 00:01 — with GitHub Actions Waiting
@kyessenov
Contributor

Please mark as draft if not ready to merge.
/wait
