get_vring_base should not reset the queue #161

germag · 2023-05-22T15:59:20Z

Summary of the PR

The spec says that on receiving GET_VRING_BASE the backend should stop the vring, but not that it must reset it. A reset is what VHOST_USER_RESET_DEVICE is for; the spec also distinguishes between stopping and disabling a ring.

The spec also doesn't forbid sending VHOST_USER_SET_VRING_ENABLE to enable the vring after GET_VRING_BASE, or sending further GET_VRING_BASE messages, which (with the reset) would always respond with 0. Moreover, QEMU[0] doesn't reset the vring either.

[0] https://github.com/qemu/qemu/blob/792f77f376adef944f9a03e601f6ad90c2f891b2/subprojects/libvhost-user/libvhost-user.c#L1175

Note:
I just added blank lines around self.vrings[index as usize].set_queue_ready(false); because the first time I worked on this (#154) I didn't realize that the ring was already being disabled.
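To make the stop-versus-reset distinction concrete, here is a minimal sketch using a toy vring model. `ToyVring` and its fields are hypothetical names invented for illustration; this is not the `vhost-user-backend` code, only an approximation of the behavior the PR argues for.

```rust
/// Toy model of a vring (hypothetical; not the `vhost-user-backend` types).
#[derive(Debug, Default)]
struct ToyVring {
    /// Whether the ring is running (processing requests).
    ready: bool,
    /// Whether the ring is enabled via SET_VRING_ENABLE.
    enabled: bool,
    /// Next available index the backend would process.
    next_avail: u16,
}

impl ToyVring {
    /// GET_VRING_BASE: stop the ring and report the next available index.
    /// The index itself is left untouched, so repeated calls agree.
    fn get_vring_base(&mut self) -> u16 {
        self.ready = false;
        self.next_avail
    }

    /// SET_VRING_ENABLE: enabling/disabling is independent of stopping.
    fn set_vring_enable(&mut self, enable: bool) {
        self.enabled = enable;
    }

    /// Full reset, as a device reset would do (assumed here to clear everything).
    fn reset(&mut self) {
        *self = ToyVring::default();
    }
}
```

In this model, calling `get_vring_base()` twice on a ring with `next_avail == 7` returns 7 both times and leaves `enabled` unchanged; only `reset()` brings the index back to 0.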

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

- All commits in this PR are signed (with git commit -s), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
- All added/changed functionality has a corresponding unit/integration test.
- All added/changed public-facing functionality has entries in the "Upcoming Release" section of CHANGELOG.md (if no such section exists, please create one).
- Any newly added unsafe code is properly documented.

germag · 2023-05-22T16:04:55Z

In my previous PR (#154) I was wrong about DPDK[1]: in the end it calls vring_invalidate(dev, vq);, which does something similar to Queue::reset(). However, I still think this is not the right place to call reset; I think it incorrectly assumes that GET_VRING_BASE will always be the last message.

[1] https://github.com/DPDK/dpdk/blob/d03446724972d2a1bb645ce7f3e64f5ef0203d61/lib/vhost/vhost_user.c#L2133

stefano-garzarella

> In my previous PR (#154) I was wrong about DPDK[1]: in the end it calls vring_invalidate(dev, vq);, which does something similar to Queue::reset().

Thanks for checking!

> However, I still think this is not the right place to call reset; I think it incorrectly assumes that GET_VRING_BASE will always be the last message.

Yep, I agree. I think multiple calls of GET_VRING_BASE should return the same index.
I think we should support VHOST_USER_RESET_DEVICE. In the meantime, a good place could be VHOST_USER_RESET_OWNER, which itself should be a NOP, but I see that QEMU uses it to reset the device when VHOST_USER_PROTOCOL_F_RESET_DEVICE has not been negotiated.
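A rough sketch of that idea, assuming a toy request enum and backend (`Request`, `ToyBackend`, and the choice to reset on RESET_OWNER only when the reset-device feature is missing are all illustrative assumptions, not the crate's API):

```rust
/// Hypothetical subset of front-end requests, for illustration only.
#[derive(Clone, Copy)]
enum Request {
    GetVringBase,
    ResetOwner,
    ResetDevice,
}

/// Toy backend with a single queue (not the real handler type).
struct ToyBackend {
    /// Whether VHOST_USER_PROTOCOL_F_RESET_DEVICE was negotiated.
    reset_device_negotiated: bool,
    queue_ready: bool,
    next_avail: u16,
}

impl ToyBackend {
    fn reset_queue(&mut self) {
        self.queue_ready = false;
        self.next_avail = 0;
    }

    /// Returns the ring base for GET_VRING_BASE, `None` for other requests.
    fn handle(&mut self, req: Request) -> Option<u16> {
        match req {
            // Stop the ring, but keep its state.
            Request::GetVringBase => {
                self.queue_ready = false;
                Some(self.next_avail)
            }
            // The dedicated reset message.
            Request::ResetDevice => {
                self.reset_queue();
                None
            }
            // Assumption: treat RESET_OWNER as a reset only when the
            // dedicated reset message has not been negotiated.
            Request::ResetOwner => {
                if !self.reset_device_negotiated {
                    self.reset_queue();
                }
                None
            }
        }
    }
}
```

With this split, any number of `GetVringBase` requests return the same index, and the index only goes back to 0 through one of the reset paths.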

stefano-garzarella · 2023-06-26T10:38:06Z

@jiangliu @sboeuf can you take a look?
And possibly test with your code to make sure we aren't breaking anything?

The spec says that on receiving `GET_VRING_BASE` the backend should stop the vring, but not that it must reset it. A reset is what `VHOST_USER_RESET_DEVICE` is for; the spec also distinguishes between stopping and disabling a ring.

The spec also doesn't forbid sending `VHOST_USER_SET_VRING_ENABLE` to enable the vring after `GET_VRING_BASE`, or sending further `GET_VRING_BASE` messages, which (with the reset) would always respond with 0. Moreover, QEMU doesn't reset the vring either.

Signed-off-by: German Maglione <gmaglione@redhat.com>

The new version of vhost-user-backend fetches the used idx from the guest[0], without this fix the guest hangs after a reboot, because the used idx is not reset.

This error was introduced when we fixed GET_VRING_BASE to incorrectly reset the vring[1], which left the vring in an inconsistent state after a stop/cont cycle. Resetting the VQ when receiving GET_VRING_BASE had the unintended side effect of setting the 'used' index to 0 after a driver change (e.g., after a reboot).

[0] rust-vmm/vhost#180 (commit: 8a4ba9d0c5666075aa7f7e2e32ceeaf8d827f9da)
[1] rust-vmm/vhost#161 (commit: 958cdec2b8741af77b7f90214ec9c97040bf5a8a)
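The failure mode described in this commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ToyQueue`, `used_idx_after_reboot`, and the premise that a freshly initialized driver expects the used index to be 1 after its first completed request are hypothetical and do not reproduce the actual fix.

```rust
/// Toy backend-side queue state (hypothetical, not the real `Queue` type).
struct ToyQueue {
    next_used: u16,
}

impl ToyQueue {
    /// Completes one request and returns the value the backend would
    /// publish as the ring's used index (wrapping like a 16-bit counter).
    fn add_used(&mut self) -> u16 {
        self.next_used = self.next_used.wrapping_add(1);
        self.next_used
    }

    fn reset(&mut self) {
        self.next_used = 0;
    }
}

/// Used index a freshly initialized driver expects after its first
/// completed request (assumption of this sketch).
const FRESH_DRIVER_EXPECTS: u16 = 1;

/// Simulates `completed` requests, a guest reboot, then one more request.
/// Returns the used index the backend publishes for that last request.
fn used_idx_after_reboot(completed: u16, reset_on_driver_change: bool) -> u16 {
    let mut q = ToyQueue { next_used: 0 };
    for _ in 0..completed {
        q.add_used();
    }
    // The guest reboots here; the new driver starts from a clean ring.
    if reset_on_driver_change {
        q.reset();
    }
    q.add_used()
}
```

In this model, `used_idx_after_reboot(5, false)` yields 6, which does not match `FRESH_DRIVER_EXPECTS`, so the new driver would never see its request complete; with the reset at the driver change, the result is 1 and both sides agree.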

germag requested review from eryugey, jiangliu, sboeuf, slp and stefano-garzarella as code owners May 22, 2023 15:59

stefano-garzarella approved these changes May 30, 2023

jiangliu force-pushed the vhost-vring_get_base-remove-reset branch from c1aa51e to 4d811df Compare June 26, 2023 18:51

jiangliu approved these changes Jun 26, 2023

jiangliu force-pushed the vhost-vring_get_base-remove-reset branch from 4d811df to bcb23f7 Compare June 28, 2023 14:49

stefano-garzarella merged commit 958cdec into rust-vmm:main Jun 28, 2023

germag mentioned this pull request Jul 3, 2023

vhost-user-backend: Release new version (fix GET_VRING_BASE behavior) #169

Closed

This was referenced Jul 3, 2023

vhost: check better if GET_VRING_BASE message should reset the vring #157

Closed

Xen/mmapv2 #160

Merged
