Conversation
1 issue found across 9 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/virtio-net.c">
<violation number="1" location="src/virtio-net.c:372">
P2: `goto tx_publish` skips `dev->tx_wait_for_tap = false`, leaving stale back-pressure state if a previous iteration hit EAGAIN. Move the assignment above the label so it applies to all paths that publish USED.</violation>
</file>
Address ten correctness and safety issues across the virtqueue chain walking, descriptor validation, and back-pressure paths.
Chain walking. Both devices walked exactly one (virtio-net) or three (virtio-blk) descriptors per request, so a guest that split a packet or a >4 KiB block I/O across multiple descriptors got partial transfers and the next chain descriptor was reinterpreted as a new request head. Both device emulators now snapshot the entire chain via VRING_DESC_F_NEXT, sized to the VIRTQ_SIZE-bounded stack array, and route data through readv/writev (net) or per-segment diskimg I/O (blk).
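The chain walk described above can be sketched as follows. The descriptor layout and flag value come from the virtio 1.1 packed-ring spec, but `walk_chain`, the 256-entry `VIRTQ_SIZE`, and the exact error convention are assumptions for illustration, not the repository's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTQ_SIZE 256               /* assumed host-side queue depth */
#define VRING_DESC_F_NEXT (1u << 0)

/* Packed-ring descriptor layout, per the virtio 1.1 spec. */
struct vring_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/*
 * Snapshot one descriptor chain starting at `head`.  Packed rings place
 * chained descriptors in consecutive slots, so we follow F_NEXT through
 * the ring, wrapping modulo qsize, and cap at VIRTQ_SIZE so a malformed
 * chain cannot overrun the caller's stack array.  Returns the number of
 * descriptors copied, or -1 on an over-long chain.
 */
static int walk_chain(const struct vring_packed_desc *ring, uint16_t qsize,
                      uint16_t head, struct vring_packed_desc *snap)
{
    int n = 0;
    uint16_t i = head;

    for (;;) {
        if (n >= VIRTQ_SIZE)
            return -1;               /* longer than the ring: malformed */
        snap[n++] = ring[i];
        if (!(ring[i].flags & VRING_DESC_F_NEXT))
            return n;                /* last descriptor of the chain */
        i = (uint16_t)((i + 1) % qsize);
    }
}
```

Once the snapshot exists, the readable and writable segments can be copied into an `iovec` array for a single `readv`/`writev`, rather than one syscall per descriptor.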
Descriptor direction. Neither device validated VRING_DESC_F_WRITE before touching guest memory, so a buggy or malicious guest could hand a "readable-only" status descriptor and have the device write into memory it never offered for writes. Every consumer site now checks direction matches the request type and rejects mismatches with VIRTIO_BLK_S_IOERR or an empty USED publication.
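The direction check reduces to one predicate per descriptor. A minimal sketch, assuming a helper name not taken from the source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_DESC_F_WRITE (1u << 1)  /* driver marks buffer device-writable */

/* The device may write only into descriptors flagged F_WRITE and read
 * only from descriptors without it.  `device_writes` says which way this
 * request needs the buffer to go; a mismatch means the request must be
 * rejected (IOERR or empty USED) before any guest memory is touched. */
static bool desc_direction_ok(uint16_t flags, bool device_writes)
{
    return device_writes == ((flags & VRING_DESC_F_WRITE) != 0);
}
```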
NULL guards. virtq_get_avail can legitimately return NULL on a malformed chain; the prior code dereferenced the result on virtio-blk's data and status reads, which is a host segfault, not a queue stall. Every consumer site now NULL-checks. On a malformed chain the device stalls the queue rather than publishing USED with chain[0].id (the buffer ID lives on the chain's last descriptor in packed virtqueues, so publishing with a head id risks pointing the driver at an unrelated in-flight chain).
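The consumer-side pattern looks roughly like this; the accessor here is a stub standing in for `virtq_get_avail` (its real signature may differ), and the result enum is invented for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct virtq_desc { uint64_t addr; uint32_t len; };

enum handle_result { HANDLED, STALLED };

/* Stub standing in for virtq_get_avail: returns NULL when the available
 * chain is malformed. */
static const struct virtq_desc *virtq_get_avail_stub(int malformed)
{
    static const struct virtq_desc d = { 0x1000, 512 };
    return malformed ? NULL : &d;
}

/* Consumer pattern: on NULL, stall the queue (return without touching
 * guest memory and without publishing USED) instead of dereferencing,
 * since the only id we could publish is the head's, which may alias an
 * unrelated in-flight chain. */
static enum handle_result handle_request(int malformed)
{
    const struct virtq_desc *desc = virtq_get_avail_stub(malformed);
    if (desc == NULL)
        return STALLED;  /* do not publish USED with a guessed buffer id */
    /* ... process desc->addr / desc->len ... */
    return HANDLED;
}
```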
Always-publish-USED. Every error path in virtio-blk/virtio-net used to return without flipping USED on the head, leaking descriptors and eventually drifting the device's view of the ring out of sync with the driver's. Whenever the chain walk itself succeeds, the device now publishes USED: IOERR with len=1 on validation failure, the real status byte plus device-writable byte count on success.
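A packed-ring USED publication can be sketched as below. The flag values follow the virtio 1.1 spec (a used descriptor carries AVAIL and USED bits both equal to the device's wrap counter); the function name and parameters are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define VRING_PACKED_DESC_F_AVAIL (1u << 7)
#define VRING_PACKED_DESC_F_USED  (1u << 15)

struct vring_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/* Publish a used element: write id and len first, then release-store the
 * flags so the driver can never observe the flags flip before the payload
 * fields.  `len` counts device-written bytes only, so a validation failure
 * still publishes with len = 1 (the bare status byte). */
static void publish_used(struct vring_packed_desc *slot, uint16_t id,
                         uint32_t len, int used_wrap_count)
{
    uint16_t flags = used_wrap_count
        ? (uint16_t)(VRING_PACKED_DESC_F_AVAIL | VRING_PACKED_DESC_F_USED)
        : 0;

    slot->id  = id;
    slot->len = len;
    __atomic_store_n(&slot->flags, flags, __ATOMIC_RELEASE);
}
```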
Stack-array bounds. Per-iteration desc_snap[VIRTQ_SIZE] / iov[VIRTQ_SIZE] arrays are sized to the host-advertised queue depth. virtio-pci now clamps guest writes to queue_size at VIRTQ_SIZE and the walkers cap at VIRTQ_SIZE, so a guest cannot make the walk overrun the stack arrays.
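The clamp on the guest's queue-size write is a one-liner; the helper name and the fallback-to-full-depth policy here are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTQ_SIZE 256   /* assumed compile-time queue depth */

/* Clamp a guest-written queue size so later chain walks can never index
 * past the VIRTQ_SIZE-sized stack arrays.  A zero or oversized write
 * falls back to the full advertised depth. */
static uint16_t clamp_queue_size(uint32_t guest_val)
{
    if (guest_val == 0 || guest_val > VIRTQ_SIZE)
        return VIRTQ_SIZE;
    return (uint16_t)guest_val;
}
```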
Atomic event flag stores. virtq_handle_avail reads guest_event->flags with __ATOMIC_ACQUIRE; the worker threads paired this with plain stores, so the compiler was free to tear or reorder against the surrounding completion writes. A new virtq_set_guest_event_flags helper replaces every plain store with __atomic_store_n / __ATOMIC_RELEASE.
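The helper is likely no more than a wrapper around the GCC/Clang atomic builtin; the paired acquire-side getter below is added only to make the sketch testable:

```c
#include <assert.h>
#include <stdint.h>

/* Release store pairs with the acquire load in virtq_handle_avail: the
 * other side can only observe the new flags after all preceding
 * completion writes (used id/len) are visible, and the store itself can
 * neither tear nor be reordered past them by the compiler. */
static void virtq_set_guest_event_flags(uint16_t *flags, uint16_t val)
{
    __atomic_store_n(flags, val, __ATOMIC_RELEASE);
}

/* Illustrative reader side (the real one lives in virtq_handle_avail). */
static uint16_t virtq_get_guest_event_flags(const uint16_t *flags)
{
    return __atomic_load_n(flags, __ATOMIC_ACQUIRE);
}
```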
virtio-blk SG arithmetic. data_size widened from uint16_t to uint32_t so >64 KiB segments don't silently truncate before reaching diskimg I/O. Per-segment writable_total now accumulates in uint64_t with __builtin_add_overflow, capping at UINT32_MAX-1 so the reported used.len plus the trailing status byte fits the packed-ring uint32_t.
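The overflow-checked accumulation can be sketched like this; `add_writable` is an assumed name, but the `__builtin_add_overflow` plus `UINT32_MAX - 1` cap matches the behavior described:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accumulate per-segment device-writable byte counts in 64 bits, trapping
 * wraparound with __builtin_add_overflow and capping at UINT32_MAX - 1 so
 * used.len (total plus the trailing status byte) still fits in the packed
 * ring's uint32_t len field. */
static bool add_writable(uint64_t *total, uint32_t seg_len)
{
    if (__builtin_add_overflow(*total, (uint64_t)seg_len, total))
        return false;                /* arithmetic overflow: reject chain */
    if (*total > UINT32_MAX - 1)
        *total = UINT32_MAX - 1;     /* leave room for the status byte */
    return true;
}
```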
VIRTIO_BLK_F_FLUSH. Without the negotiated FLUSH feature the Linux virtio-blk driver runs in writeback-without-barriers mode, so guest fsync(2) returns success while data sits in the host page cache. The device now advertises VIRTIO_BLK_F_FLUSH, dispatches T_FLUSH to a new diskimg_flush() (fdatasync), and fsyncs the backing file at exit.
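The flush handler is likely a thin wrapper over `fdatasync`; the name `diskimg_flush` comes from the description above, but the signature is assumed:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* fdatasync forces the file's data (and the metadata required to read it
 * back) to stable storage, without the extra timestamp flushes a full
 * fsync performs.  Dispatched for VIRTIO_BLK_T_FLUSH requests. */
static int diskimg_flush(int fd)
{
    return fdatasync(fd);
}
```

With `VIRTIO_BLK_F_FLUSH` negotiated, the guest driver translates its own barrier/fsync requests into `T_FLUSH` commands instead of assuming writes are durable on completion.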
TX back-pressure. The TX worker previously required tapfd POLLOUT in its poll predicate; with a level-triggered ioeventfd POLLIN that meant 100% CPU until the TAP drained. The poll set now drops the TAP predicate by default, drains the ioeventfd on a guest kick, and arms POLLOUT only after a transient writev() EAGAIN. On EAGAIN the chain's next_avail_idx and used_wrap_count are rolled back so the TAP-blocked chain is retried on POLLOUT rather than silently dropped.
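The back-pressure state machine can be sketched as two helpers. The field names follow the description (`next_avail_idx`, `used_wrap_count`), but the struct layout and function names are assumptions:

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-queue TX state; a sketch, not the repository's actual layout. */
struct tx_state {
    uint16_t next_avail_idx;
    bool     used_wrap_count;
    bool     wait_for_tap;    /* set after a transient writev() EAGAIN */
};

/* Poll predicate for the TAP fd: the ioeventfd is always watched for a
 * guest kick, but the TAP is watched for POLLOUT only while
 * back-pressured, so an idle TAP no longer keeps the worker spinning. */
static short tap_events(const struct tx_state *tx)
{
    return tx->wait_for_tap ? POLLOUT : 0;
}

/* On EAGAIN, rewind to the chain head so the same chain is retried once
 * POLLOUT fires, instead of being silently dropped. */
static void rollback_chain(struct tx_state *tx, uint16_t head_idx,
                           bool head_wrap)
{
    tx->next_avail_idx  = head_idx;
    tx->used_wrap_count = head_wrap;
    tx->wait_for_tap    = true;
}
```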
Summary by cubic
Hardened virtio-blk and virtio-net by walking full packed-virtqueue descriptor chains, enforcing descriptor direction, using atomic event-flag stores, and improving TX back-pressure. This prevents partial I/O, ring desyncs, and busy spins, and adds flush support so guest fsync is durable.

Bug Fixes
- Walk full descriptor chains and route data through readv/writev (net) or per-segment disk I/O (blk).
- Validate VRING_DESC_F_WRITE and header sizes; NULL-check chain elements; stall on malformed chains.
- Always publish accurate used.len via virtq_publish_used.
- Clamp queue_size and chain length to VIRTQ_SIZE; bound stack arrays; widen blk data_size to uint32_t and cap 64-bit totals.
- Use atomic stores via virtq_set_guest_event_flags and for USED flips to fix ordering.
- Drain the ioeventfd, arm POLLOUT only after EAGAIN, roll back indices to retry, and stop the CPU spin; RX/TX handle the virtio header correctly.

New Features
- Advertise VIRTIO_BLK_F_FLUSH; handle T_FLUSH via fdatasync and flush the backing file on device shutdown.

Written for commit b461e1a.