Permalink
Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign up| Vhost-user Protocol | |
| =================== | |
| Copyright (c) 2014 Virtual Open Systems Sarl. | |
| This work is licensed under the terms of the GNU GPL, version 2 or later. | |
| See the COPYING file in the top-level directory. | |
| =================== | |
| This protocol is aiming to complement the ioctl interface used to control the | |
| vhost implementation in the Linux kernel. It implements the control plane needed | |
| to establish virtqueue sharing with a user space process on the same host. It | |
| uses communication over a Unix domain socket to share file descriptors in the | |
| ancillary data of the message. | |
| The protocol defines 2 sides of the communication, master and slave. Master is | |
| the application that shares its virtqueues, in our case QEMU. Slave is the | |
| consumer of the virtqueues. | |
| In the current implementation QEMU is the Master, and the Slave is intended to | |
| be a software Ethernet switch running in user space, such as Snabbswitch. | |
| Master and slave can be either a client (i.e. connecting) or server (listening) | |
| in the socket communication. | |
| Message Specification | |
| --------------------- | |
| Note that all numbers are in the machine native byte order. A vhost-user message | |
| consists of 3 header fields and a payload: | |
| ------------------------------------ | |
| | request | flags | size | payload | | |
| ------------------------------------ | |
| * Request: 32-bit type of the request | |
| * Flags: 32-bit bit field: | |
| - Lower 2 bits are the version (currently 0x01) | |
| - Bit 2 is the reply flag - needs to be sent on each reply from the slave | |
| - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for | |
| details. | |
| * Size - 32-bit size of the payload | |
| Depending on the request type, payload can be: | |
| * A single 64-bit integer | |
| ------- | |
| | u64 | | |
| ------- | |
| u64: a 64-bit unsigned integer | |
| * A vring state description | |
| --------------- | |
| | index | num | | |
| --------------- | |
| Index: a 32-bit index | |
| Num: a 32-bit number | |
| * A vring address description | |
| -------------------------------------------------------------- | |
| | index | flags | size | descriptor | used | available | log | | |
| -------------------------------------------------------------- | |
| Index: a 32-bit vring index | |
| Flags: a 32-bit vring flags | |
| Descriptor: a 64-bit ring address of the vring descriptor table | |
| Used: a 64-bit ring address of the vring used ring | |
| Available: a 64-bit ring address of the vring available ring | |
| Log: a 64-bit guest address for logging | |
| Note that a ring address is an IOVA if VIRTIO_F_IOMMU_PLATFORM has been | |
| negotiated. Otherwise it is a user address. | |
| * Memory regions description | |
| --------------------------------------------------- | |
| | num regions | padding | region0 | ... | region7 | | |
| --------------------------------------------------- | |
| Num regions: a 32-bit number of regions | |
| Padding: 32-bit | |
| A region is: | |
| ----------------------------------------------------- | |
| | guest address | size | user address | mmap offset | | |
| ----------------------------------------------------- | |
| Guest address: a 64-bit guest address of the region | |
| Size: a 64-bit size | |
| User address: a 64-bit user address | |
| mmap offset: 64-bit offset where region starts in the mapped memory | |
| * Log description | |
| --------------------------- | |
| | log size | log offset | | |
| --------------------------- | |
| log size: size of area used for logging | |
| log offset: offset from start of supplied file descriptor | |
| where logging starts (i.e. where guest address 0 would be logged) | |
| * An IOTLB message | |
| --------------------------------------------------------- | |
| | iova | size | user address | permissions flags | type | | |
| --------------------------------------------------------- | |
| IOVA: a 64-bit I/O virtual address programmed by the guest | |
| Size: a 64-bit size | |
| User address: a 64-bit user address | |
| Permissions: an 8-bit value: | |
| - 0: No access | |
| - 1: Read access | |
| - 2: Write access | |
| - 3: Read/Write access | |
| Type: an 8-bit IOTLB message type: | |
| - 1: IOTLB miss | |
| - 2: IOTLB update | |
| - 3: IOTLB invalidate | |
| - 4: IOTLB access fail | |
| * Virtio device config space | |
| ----------------------------------- | |
| | offset | size | flags | payload | | |
| ----------------------------------- | |
| Offset: a 32-bit offset of virtio device's configuration space | |
| Size: a 32-bit configuration space access size in bytes | |
| Flags: a 32-bit value: | |
| - 0: Vhost master messages used for writeable fields | |
| - 1: Vhost master messages used for live migration | |
| Payload: Size bytes array holding the contents of the virtio | |
| device's configuration space | |
| * Vring area description | |
| ----------------------- | |
| | u64 | size | offset | | |
| ----------------------- | |
| u64: a 64-bit integer contains vring index and flags | |
| Size: a 64-bit size of this area | |
| Offset: a 64-bit offset of this area from the start of the | |
| supplied file descriptor | |
| In QEMU the vhost-user message is implemented with the following struct: | |
| typedef struct VhostUserMsg { | |
| VhostUserRequest request; | |
| uint32_t flags; | |
| uint32_t size; | |
| union { | |
| uint64_t u64; | |
| struct vhost_vring_state state; | |
| struct vhost_vring_addr addr; | |
| VhostUserMemory memory; | |
| VhostUserLog log; | |
| struct vhost_iotlb_msg iotlb; | |
| VhostUserConfig config; | |
| VhostUserVringArea area; | |
| }; | |
| } QEMU_PACKED VhostUserMsg; | |
| Communication | |
| ------------- | |
| The protocol for vhost-user is based on the existing implementation of vhost | |
| for the Linux Kernel. Most messages that can be sent via the Unix domain socket | |
| implementing vhost-user have an equivalent ioctl to the kernel implementation. | |
| The communication consists of master sending message requests and slave sending | |
| message replies. Most of the requests don't require replies. Here is a list of | |
| the ones that do: | |
| * VHOST_USER_GET_FEATURES | |
| * VHOST_USER_GET_PROTOCOL_FEATURES | |
| * VHOST_USER_GET_VRING_BASE | |
| * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) | |
| [ Also see the section on REPLY_ACK protocol extension. ] | |
| There are several messages that the master sends with file descriptors passed | |
| in the ancillary data: | |
| * VHOST_USER_SET_MEM_TABLE | |
| * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) | |
| * VHOST_USER_SET_LOG_FD | |
| * VHOST_USER_SET_VRING_KICK | |
| * VHOST_USER_SET_VRING_CALL | |
| * VHOST_USER_SET_VRING_ERR | |
| * VHOST_USER_SET_SLAVE_REQ_FD | |
| If Master is unable to send the full message or receives a wrong reply it will | |
| close the connection. An optional reconnection mechanism can be implemented. | |
| Any protocol extensions are gated by protocol feature bits, | |
| which allows full backwards compatibility on both master | |
| and slave. | |
| As older slaves don't support negotiating protocol features, | |
| a feature bit was dedicated for this purpose: | |
| #define VHOST_USER_F_PROTOCOL_FEATURES 30 | |
| Starting and stopping rings | |
| ---------------------- | |
| Client must only process each ring when it is started. | |
| Client must only pass data between the ring and the | |
| backend, when the ring is enabled. | |
| If ring is started but disabled, client must process the | |
| ring without talking to the backend. | |
| For example, for a networking device, in the disabled state | |
| client must not supply any new RX packets, but must process | |
| and discard any TX packets. | |
| If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized | |
| in an enabled state. | |
| If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized | |
| in a disabled state. Client must not pass data to/from the backend until ring is enabled by | |
| VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled by | |
| VHOST_USER_SET_VRING_ENABLE with parameter 0. | |
| Each ring is initialized in a stopped state, client must not process it until | |
| ring is started, or after it has been stopped. | |
| Client must start ring upon receiving a kick (that is, detecting that file | |
| descriptor is readable) on the descriptor specified by | |
| VHOST_USER_SET_VRING_KICK, and stop ring upon receiving | |
| VHOST_USER_GET_VRING_BASE. | |
| While processing the rings (whether they are enabled or not), client must | |
| support changing some configuration aspects on the fly. | |
| Multiple queue support | |
| ---------------------- | |
| Multiple queue is treated as a protocol extension, hence the slave has to | |
| implement protocol features first. The multiple queues feature is supported | |
| only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set. | |
| The max number of queue pairs the slave supports can be queried with message | |
| VHOST_USER_GET_QUEUE_NUM. Master should stop when the number of | |
| requested queues is bigger than that. | |
| As all queues share one connection, the master uses a unique index for each | |
| queue in the sent message to identify a specified queue. One queue pair | |
| is enabled initially. More queues are enabled dynamically, by sending | |
| message VHOST_USER_SET_VRING_ENABLE. | |
| Migration | |
| --------- | |
| During live migration, the master may need to track the modifications | |
| the slave makes to the memory mapped regions. The client should mark | |
| the dirty pages in a log. Once it complies to this logging, it may | |
| declare the VHOST_F_LOG_ALL vhost feature. | |
| To start/stop logging of data/used ring writes, server may send messages | |
| VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with | |
| VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively. | |
| All the modifications to memory pointed by vring "descriptor" should | |
| be marked. Modifications to "used" vring should be marked if | |
| VHOST_VRING_F_LOG is part of ring's flags. | |
| Dirty pages are of size: | |
| #define VHOST_LOG_PAGE 0x1000 | |
| The log memory fd is provided in the ancillary data of | |
| VHOST_USER_SET_LOG_BASE message when the slave has | |
| VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature. | |
| The size of the log is supplied as part of VhostUserMsg | |
| which should be large enough to cover all known guest | |
| addresses. Log starts at the supplied offset in the | |
| supplied file descriptor. | |
| The log covers from address 0 to the maximum of guest | |
| regions. In pseudo-code, to mark page at "addr" as dirty: | |
| page = addr / VHOST_LOG_PAGE | |
| log[page / 8] |= 1 << page % 8 | |
| Where addr is the guest physical address. | |
| Use atomic operations, as the log may be concurrently manipulated. | |
| Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG | |
| is set for this ring), log_guest_addr should be used to calculate the log | |
| offset: the write to first byte of the used ring is logged at this offset from | |
| log start. Also note that this value might be outside the legal guest physical | |
| address range (i.e. does not have to be covered by the VhostUserMemory table), | |
| but the bit offset of the last byte of the ring must fall within | |
| the size supplied by VhostUserLog. | |
| VHOST_USER_SET_LOG_FD is an optional message with an eventfd in | |
| ancillary data, it may be used to inform the master that the log has | |
| been modified. | |
| Once the source has finished migration, rings will be stopped by | |
| the source. No further update must be done before rings are | |
| restarted. | |
| In postcopy migration the slave is started before all the memory has been | |
| received from the source host, and care must be taken to avoid accessing pages | |
| that have yet to be received. The slave opens a 'userfault'-fd and registers | |
| the memory with it; this fd is then passed back over to the master. | |
| The master services requests on the userfaultfd for pages that are accessed | |
| and when the page is available it performs WAKE ioctl's on the userfaultfd | |
| to wake the stalled slave. The client indicates support for this via the | |
| VHOST_USER_PROTOCOL_F_PAGEFAULT feature. | |
| Memory access | |
| ------------- | |
| The master sends a list of vhost memory regions to the slave using the | |
| VHOST_USER_SET_MEM_TABLE message. Each region has two base addresses: a guest | |
| address and a user address. | |
| Messages contain guest addresses and/or user addresses to reference locations | |
| within the shared memory. The mapping of these addresses works as follows. | |
| User addresses map to the vhost memory region containing that user address. | |
| When the VIRTIO_F_IOMMU_PLATFORM feature has not been negotiated: | |
| * Guest addresses map to the vhost memory region containing that guest | |
| address. | |
| When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated: | |
| * Guest addresses are also called I/O virtual addresses (IOVAs). They are | |
| translated to user addresses via the IOTLB. | |
| * The vhost memory region guest address is not used. | |
| IOMMU support | |
| ------------- | |
| When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master | |
| sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG | |
| requests to the slave with a struct vhost_iotlb_msg as payload. For update | |
| events, the iotlb payload has to be filled with the update message type (2), | |
| the I/O virtual address, the size, the user virtual address, and the | |
| permissions flags. Addresses and size must be within vhost memory regions set | |
| via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb | |
| payload has to be filled with the invalidation message type (3), the I/O virtual | |
| address and the size. On success, the slave is expected to reply with a zero | |
| payload, non-zero otherwise. | |
| The slave relies on the slave communcation channel (see "Slave communication" | |
| section below) to send IOTLB miss and access failure events, by sending | |
| VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg | |
| as payload. For miss events, the iotlb payload has to be filled with the miss | |
| message type (1), the I/O virtual address and the permissions flags. For access | |
| failure event, the iotlb payload has to be filled with the access failure | |
| message type (4), the I/O virtual address and the permissions flags. | |
| For synchronization purpose, the slave may rely on the reply-ack feature, | |
| so the master may send a reply when operation is completed if the reply-ack | |
| feature is negotiated and slaves requests a reply. For miss events, completed | |
| operation means either master sent an update message containing the IOTLB entry | |
| containing requested address and permission, or master sent nothing if the IOTLB | |
| miss message is invalid (invalid IOVA or permission). | |
| The master isn't expected to take the initiative to send IOTLB update messages, | |
| as the slave sends IOTLB miss messages for the guest virtual memory areas it | |
| needs to access. | |
| Slave communication | |
| ------------------- | |
| An optional communication channel is provided if the slave declares | |
| VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make | |
| requests to the master. | |
| The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data. | |
| A slave may then send VHOST_USER_SLAVE_* messages to the master | |
| using this fd communication channel. | |
| If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is negotiated, | |
| slave can send file descriptors (at most 8 descriptors in each message) | |
| to master via ancillary data using this fd communication channel. | |
| Protocol features | |
| ----------------- | |
| #define VHOST_USER_PROTOCOL_F_MQ 0 | |
| #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 | |
| #define VHOST_USER_PROTOCOL_F_RARP 2 | |
| #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 | |
| #define VHOST_USER_PROTOCOL_F_MTU 4 | |
| #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 | |
| #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN 6 | |
| #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 | |
| #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 | |
| #define VHOST_USER_PROTOCOL_F_CONFIG 9 | |
| #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 | |
| #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 | |
| Master message types | |
| -------------------- | |
| * VHOST_USER_GET_FEATURES | |
| Id: 1 | |
| Equivalent ioctl: VHOST_GET_FEATURES | |
| Master payload: N/A | |
| Slave payload: u64 | |
| Get from the underlying vhost implementation the features bitmask. | |
| Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for | |
| VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. | |
| * VHOST_USER_SET_FEATURES | |
| Id: 2 | |
| Ioctl: VHOST_SET_FEATURES | |
| Master payload: u64 | |
| Enable features in the underlying vhost implementation using a bitmask. | |
| Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for | |
| VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. | |
| * VHOST_USER_GET_PROTOCOL_FEATURES | |
| Id: 15 | |
| Equivalent ioctl: VHOST_GET_FEATURES | |
| Master payload: N/A | |
| Slave payload: u64 | |
| Get the protocol feature bitmask from the underlying vhost implementation. | |
| Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in | |
| VHOST_USER_GET_FEATURES. | |
| Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support | |
| this message even before VHOST_USER_SET_FEATURES was called. | |
| * VHOST_USER_SET_PROTOCOL_FEATURES | |
| Id: 16 | |
| Ioctl: VHOST_SET_FEATURES | |
| Master payload: u64 | |
| Enable protocol features in the underlying vhost implementation. | |
| Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in | |
| VHOST_USER_GET_FEATURES. | |
| Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support | |
| this message even before VHOST_USER_SET_FEATURES was called. | |
| * VHOST_USER_SET_OWNER | |
| Id: 3 | |
| Equivalent ioctl: VHOST_SET_OWNER | |
| Master payload: N/A | |
| Issued when a new connection is established. It sets the current Master | |
| as an owner of the session. This can be used on the Slave as a | |
| "session start" flag. | |
| * VHOST_USER_RESET_OWNER | |
| Id: 4 | |
| Master payload: N/A | |
| This is no longer used. Used to be sent to request disabling | |
| all rings, but some clients interpreted it to also discard | |
| connection state (this interpretation would lead to bugs). | |
| It is recommended that clients either ignore this message, | |
| or use it to disable all rings. | |
| * VHOST_USER_SET_MEM_TABLE | |
| Id: 5 | |
| Equivalent ioctl: VHOST_SET_MEM_TABLE | |
| Master payload: memory regions description | |
| Slave payload: (postcopy only) memory regions description | |
| Sets the memory map regions on the slave so it can translate the vring | |
| addresses. In the ancillary data there is an array of file descriptors | |
| for each memory mapped region. The size and ordering of the fds matches | |
| the number and ordering of memory regions. | |
| When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with | |
| the bases of the memory mapped regions to the master. The slave must | |
| have mmap'd the regions but not yet accessed them and should not yet generate | |
| a userfault event. Note NEED_REPLY_MASK is not set in this case. | |
| QEMU will then reply back to the list of mappings with an empty | |
| VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this | |
| message may the guest start accessing the memory and generating faults. | |
| * VHOST_USER_SET_LOG_BASE | |
| Id: 6 | |
| Equivalent ioctl: VHOST_SET_LOG_BASE | |
| Master payload: u64 | |
| Slave payload: N/A | |
| Sets logging shared memory space. | |
| When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol | |
| feature, the log memory fd is provided in the ancillary data of | |
| VHOST_USER_SET_LOG_BASE message, the size and offset of shared | |
| memory area provided in the message. | |
| * VHOST_USER_SET_LOG_FD | |
| Id: 7 | |
| Equivalent ioctl: VHOST_SET_LOG_FD | |
| Master payload: N/A | |
| Sets the logging file descriptor, which is passed as ancillary data. | |
| * VHOST_USER_SET_VRING_NUM | |
| Id: 8 | |
| Equivalent ioctl: VHOST_SET_VRING_NUM | |
| Master payload: vring state description | |
| Set the size of the queue. | |
| * VHOST_USER_SET_VRING_ADDR | |
| Id: 9 | |
| Equivalent ioctl: VHOST_SET_VRING_ADDR | |
| Master payload: vring address description | |
| Slave payload: N/A | |
| Sets the addresses of the different aspects of the vring. | |
| * VHOST_USER_SET_VRING_BASE | |
| Id: 10 | |
| Equivalent ioctl: VHOST_SET_VRING_BASE | |
| Master payload: vring state description | |
| Sets the base offset in the available vring. | |
| * VHOST_USER_GET_VRING_BASE | |
| Id: 11 | |
| Equivalent ioctl: VHOST_USER_GET_VRING_BASE | |
| Master payload: vring state description | |
| Slave payload: vring state description | |
| Get the available vring base offset. | |
| * VHOST_USER_SET_VRING_KICK | |
| Id: 12 | |
| Equivalent ioctl: VHOST_SET_VRING_KICK | |
| Master payload: u64 | |
| Set the event file descriptor for adding buffers to the vring. It | |
| is passed in the ancillary data. | |
| Bits (0-7) of the payload contain the vring index. Bit 8 is the | |
| invalid FD flag. This flag is set when there is no file descriptor | |
| in the ancillary data. This signals that polling should be used | |
| instead of waiting for a kick. | |
| * VHOST_USER_SET_VRING_CALL | |
| Id: 13 | |
| Equivalent ioctl: VHOST_SET_VRING_CALL | |
| Master payload: u64 | |
| Set the event file descriptor to signal when buffers are used. It | |
| is passed in the ancillary data. | |
| Bits (0-7) of the payload contain the vring index. Bit 8 is the | |
| invalid FD flag. This flag is set when there is no file descriptor | |
| in the ancillary data. This signals that polling will be used | |
| instead of waiting for the call. | |
| * VHOST_USER_SET_VRING_ERR | |
| Id: 14 | |
| Equivalent ioctl: VHOST_SET_VRING_ERR | |
| Master payload: u64 | |
| Set the event file descriptor to signal when error occurs. It | |
| is passed in the ancillary data. | |
| Bits (0-7) of the payload contain the vring index. Bit 8 is the | |
| invalid FD flag. This flag is set when there is no file descriptor | |
| in the ancillary data. | |
| * VHOST_USER_GET_QUEUE_NUM | |
| Id: 17 | |
| Equivalent ioctl: N/A | |
| Master payload: N/A | |
| Slave payload: u64 | |
| Query how many queues the backend supports. This request should be | |
| sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol | |
| features by VHOST_USER_GET_PROTOCOL_FEATURES. | |
| * VHOST_USER_SET_VRING_ENABLE | |
| Id: 18 | |
| Equivalent ioctl: N/A | |
| Master payload: vring state description | |
| Signal slave to enable or disable corresponding vring. | |
| This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES | |
| has been negotiated. | |
| * VHOST_USER_SEND_RARP | |
| Id: 19 | |
| Equivalent ioctl: N/A | |
| Master payload: u64 | |
| Ask vhost user backend to broadcast a fake RARP to notify the migration | |
| is terminated for guest that does not support GUEST_ANNOUNCE. | |
| Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in | |
| VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP | |
| is present in VHOST_USER_GET_PROTOCOL_FEATURES. | |
| The first 6 bytes of the payload contain the mac address of the guest to | |
| allow the vhost user backend to construct and broadcast the fake RARP. | |
| * VHOST_USER_NET_SET_MTU | |
| Id: 20 | |
| Equivalent ioctl: N/A | |
| Master payload: u64 | |
| Set host MTU value exposed to the guest. | |
| This request should be sent only when VIRTIO_NET_F_MTU feature has been | |
| successfully negotiated, VHOST_USER_F_PROTOCOL_FEATURES is present in | |
| VHOST_USER_GET_FEATURES and protocol feature bit | |
| VHOST_USER_PROTOCOL_F_NET_MTU is present in | |
| VHOST_USER_GET_PROTOCOL_FEATURES. | |
| If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond | |
| with zero in case the specified MTU is valid, or non-zero otherwise. | |
| * VHOST_USER_SET_SLAVE_REQ_FD | |
| Id: 21 | |
| Equivalent ioctl: N/A | |
| Master payload: N/A | |
| Set the socket file descriptor for slave initiated requests. It is passed | |
| in the ancillary data. | |
| This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES | |
| has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ | |
| bit is present in VHOST_USER_GET_PROTOCOL_FEATURES. | |
| If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond | |
| with zero for success, non-zero otherwise. | |
| * VHOST_USER_IOTLB_MSG | |
| Id: 22 | |
| Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) | |
| Master payload: struct vhost_iotlb_msg | |
| Slave payload: u64 | |
| Send IOTLB messages with struct vhost_iotlb_msg as payload. | |
| Master sends such requests to update and invalidate entries in the device | |
| IOTLB. The slave has to acknowledge the request with sending zero as u64 | |
| payload for success, non-zero otherwise. | |
| This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature | |
| has been successfully negotiated. | |
| * VHOST_USER_SET_VRING_ENDIAN | |
| Id: 23 | |
| Equivalent ioctl: VHOST_SET_VRING_ENDIAN | |
| Master payload: vring state description | |
| Set the endianness of a VQ for legacy devices. Little-endian is indicated | |
| with state.num set to 0 and big-endian is indicated with state.num set | |
| to 1. Other values are invalid. | |
| This request should be sent only when VHOST_USER_PROTOCOL_F_CROSS_ENDIAN | |
| has been negotiated. | |
| Backends that negotiated this feature should handle both endiannesses | |
| and expect this message once (per VQ) during device configuration | |
| (ie. before the master starts the VQ). | |
| * VHOST_USER_GET_CONFIG | |
| Id: 24 | |
| Equivalent ioctl: N/A | |
| Master payload: virtio device config space | |
| Slave payload: virtio device config space | |
| When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is | |
| submitted by the vhost-user master to fetch the contents of the virtio | |
| device configuration space, vhost-user slave's payload size MUST match | |
| master's request, vhost-user slave uses zero length of payload to | |
| indicate an error to vhost-user master. The vhost-user master may | |
| cache the contents to avoid repeated VHOST_USER_GET_CONFIG calls. | |
| * VHOST_USER_SET_CONFIG | |
| Id: 25 | |
| Equivalent ioctl: N/A | |
| Master payload: virtio device config space | |
| Slave payload: N/A | |
| When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is | |
| submitted by the vhost-user master when the Guest changes the virtio | |
| device configuration space and also can be used for live migration | |
| on the destination host. The vhost-user slave must check the flags | |
| field, and slaves MUST NOT accept SET_CONFIG for read-only | |
| configuration space fields unless the live migration bit is set. | |
| * VHOST_USER_CREATE_CRYPTO_SESSION | |
| Id: 26 | |
| Equivalent ioctl: N/A | |
| Master payload: crypto session description | |
| Slave payload: crypto session description | |
| Create a session for crypto operation. The server side must return the | |
| session id, 0 or positive for success, negative for failure. | |
| This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION | |
| feature has been successfully negotiated. | |
| It's a required feature for crypto devices. | |
| * VHOST_USER_CLOSE_CRYPTO_SESSION | |
| Id: 27 | |
| Equivalent ioctl: N/A | |
| Master payload: u64 | |
| Close a session for crypto operation which was previously | |
| created by VHOST_USER_CREATE_CRYPTO_SESSION. | |
| This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION | |
| feature has been successfully negotiated. | |
| It's a required feature for crypto devices. | |
| * VHOST_USER_POSTCOPY_ADVISE | |
| Id: 28 | |
| Master payload: N/A | |
| Slave payload: userfault fd | |
| When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the | |
| master advises slave that a migration with postcopy enabled is underway, | |
| the slave must open a userfaultfd for later use. | |
| Note that at this stage the migration is still in precopy mode. | |
| * VHOST_USER_POSTCOPY_LISTEN | |
| Id: 29 | |
| Master payload: N/A | |
| Master advises slave that a transition to postcopy mode has happened. | |
| The slave must ensure that shared memory is registered with userfaultfd | |
| to cause faulting of non-present pages. | |
| This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and | |
| thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported. | |
| * VHOST_USER_POSTCOPY_END | |
| Id: 30 | |
| Slave payload: u64 | |
| Master advises that postcopy migration has now completed. The | |
| slave must disable the userfaultfd. The response is an acknowledgement | |
| only. | |
| When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, this message | |
| is sent at the end of the migration, after VHOST_USER_POSTCOPY_LISTEN | |
| was previously sent. | |
| The value returned is an error indication; 0 is success. | |
| Slave message types | |
| ------------------- | |
| * VHOST_USER_SLAVE_IOTLB_MSG | |
| Id: 1 | |
| Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) | |
| Slave payload: struct vhost_iotlb_msg | |
| Master payload: N/A | |
| Send IOTLB messages with struct vhost_iotlb_msg as payload. | |
| Slave sends such requests to notify of an IOTLB miss, or an IOTLB | |
| access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, | |
| and slave set the VHOST_USER_NEED_REPLY flag, master must respond with | |
| zero when operation is successfully completed, or non-zero otherwise. | |
| This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature | |
| has been successfully negotiated. | |
| * VHOST_USER_SLAVE_CONFIG_CHANGE_MSG | |
| Id: 2 | |
| Equivalent ioctl: N/A | |
| Slave payload: N/A | |
| Master payload: N/A | |
| When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, vhost-user slave sends | |
| such messages to notify that the virtio device's configuration space has | |
| changed, for those host devices which can support such feature, host | |
| driver can send VHOST_USER_GET_CONFIG message to slave to get the latest | |
| content. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, and slave set | |
| the VHOST_USER_NEED_REPLY flag, master must respond with zero when | |
| operation is successfully completed, or non-zero otherwise. | |
| * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG | |
| Id: 3 | |
| Equivalent ioctl: N/A | |
| Slave payload: vring area description | |
| Master payload: N/A | |
| Sets host notifier for a specified queue. The queue index is contained | |
| in the u64 field of the vring area description. The host notifier is | |
| described by the file descriptor (typically it's a VFIO device fd) which | |
| is passed as ancillary data and the size (which is mmap size and should | |
| be the same as host page size) and offset (which is mmap offset) carried | |
| in the vring area description. QEMU can mmap the file descriptor based | |
| on the size and offset to get a memory range. Registering a host notifier | |
| means mapping this memory range to the VM as the specified queue's notify | |
| MMIO region. Slave sends this request to tell QEMU to de-register the | |
| existing notifier if any and register the new notifier if the request is | |
| sent with a file descriptor. | |
| This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | |
| protocol feature has been successfully negotiated. | |
| VHOST_USER_PROTOCOL_F_REPLY_ACK: | |
| ------------------------------- | |
| The original vhost-user specification only demands replies for certain | |
| commands. This differs from the vhost protocol implementation where commands | |
| are sent over an ioctl() call and block until the client has completed. | |
| With this protocol extension negotiated, the sender (QEMU) can set the | |
| "need_reply" [Bit 3] flag to any command. This indicates that | |
| the client MUST respond with a Payload VhostUserMsg indicating success or | |
| failure. The payload should be set to zero on success or non-zero on failure, | |
| unless the message already has an explicit reply body. | |
| The response payload gives QEMU a deterministic indication of the result | |
| of the command. Today, QEMU is expected to terminate the main vhost-user | |
| loop upon receiving such errors. In future, qemu could be taught to be more | |
| resilient for selective requests. | |
| For the message types that already solicit a reply from the client, the | |
| presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings | |
| no behavioural change. (See the 'Communication' section for details.) |