Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_vring_base should not reset the queue #161

Merged

Conversation

germag
Copy link
Collaborator

@germag germag commented May 22, 2023

Summary of the PR

The spec specifies that on receiving GET_VRING_BASE the backend should stop the vring, but not that it must be reset. This is intended for VHOST_USER_RESET_DEVICE, also in this case the spec makes a difference between stopping and disabling the ring.

The spec also doesn't forbid to send VHOST_USER_SET_VRING_ENABLE to enable the vring after receiving GET_VRING_BASE or sending more GET_VRING_BASE messages, which would always respond 0. Moreover, qemu[0] doesn't reset the vring either.

[0] https://github.com/qemu/qemu/blob/792f77f376adef944f9a03e601f6ad90c2f891b2/subprojects/libvhost-user/libvhost-user.c#L1175

Note:
I just added blank lines around self.vrings[index as usize].set_queue_ready(false); because the first time I worked on this (#154) I didn't realize that the ring was already being disabled.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR are signed (with git commit -s), and the commit
    message has max 60 characters for the summary and max 75 characters for each
    description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@germag
Copy link
Collaborator Author

germag commented May 22, 2023

In my previous PR (#154) I was wrong about DPDK[1], at the end it calls vring_invalidate(dev, vq); that does something similar to Queue::reset(), however, I still think it is not the right place to call reset, I think it is incorrectly assumed that GET_VRING_BASE will always be the last message.

[1] https://github.com/DPDK/dpdk/blob/d03446724972d2a1bb645ce7f3e64f5ef0203d61/lib/vhost/vhost_user.c#L2133

Copy link
Member

@stefano-garzarella stefano-garzarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my previous PR (#154) I was wrong about DPDK[1], at the end it calls vring_invalidate(dev, vq); that does something similar to Queue::reset(),

Thanks for checking!

however, I still think it is not the right place to call reset, I think it is incorrectly assumed that GET_VRING_BASE will always be the last message.

Yep, I agree. I think multiple calls of GET_VRING_BASE should return the same index.
I think we should support VHOST_USER_RESET_DEVICE, in the meantime maybe a well place could be VHOST_USER_RESET_OWNER, which itself should be a NOP, but I see in QEMU that it is used to reset the device when VHOST_USER_PROTOCOL_F_RESET_DEVICE has not been negotiated.

@stefano-garzarella
Copy link
Member

@jiangliu @sboeuf can you take a look?
And eventually test with your code to be sure we aren't breaking anything?

@jiangliu jiangliu force-pushed the vhost-vring_get_base-remove-reset branch from c1aa51e to 4d811df Compare June 26, 2023 18:51
The spec specifies that on receiving `GET_VRING_BASE` the backend should
stop the vring, but not that it must be reset. This is intended for
`VHOST_USER_RESET_DEVICE`, also in this case the spec makes a
difference between stopping and disabling the ring.

The spec also doesn't forbid to send `VHOST_USER_SET_VRING_ENABLE` to
enable the vring after receiving `GET_VRING_BASE` or sending more
`GET_VRING_BASE` messages, which would always respond 0. Moreover, qemu
doesn't reset the vring either.

Signed-off-by: German Maglione <gmaglione@redhat.com>
@jiangliu jiangliu force-pushed the vhost-vring_get_base-remove-reset branch from 4d811df to bcb23f7 Compare June 28, 2023 14:49
@stefano-garzarella stefano-garzarella merged commit 958cdec into rust-vmm:main Jun 28, 2023
23 checks passed
birktj pushed a commit to birktj/virtiofsd that referenced this pull request Jan 5, 2024
The new version of vhost-user-backend fetches the used idx from the
guest[0], without this fix the guest hangs after a reboot, because
the used idx is not reset.

This error was introduced when we fixed GET_VRING_BASE to incorrectly
reset the vring[1], which left the vring in an inconsistent state after
a stop/cont cycle. Resetting the VQ when receiving GET_VRING_BASE had
the unintended side effect of setting the 'used' index to 0 after a
driver change (e.g., after a reboot).

[0] rust-vmm/vhost#180
(commit: 8a4ba9d0c5666075aa7f7e2e32ceeaf8d827f9da)

[1] rust-vmm/vhost#161
(commit: 958cdec2b8741af77b7f90214ec9c97040bf5a8a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants