ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
ivshmem can be configured with and without interrupt capability
(a.k.a. "doorbell").  The two configurations have largely disjoint
options, which makes for a confusing (and badly checked) user
interface.  Moreover, the device can't tell the guest whether its
doorbell is enabled.

Create two new device models ivshmem-plain and ivshmem-doorbell, and
deprecate the old one.

Changes from ivshmem:

* PCI revision is 1 instead of 0.  The new revision is fully backwards
  compatible for guests.  Guests may elect to require at least
  revision 1 to make sure they're not exposed to the funny "no shared
  memory, yet" state.

* Property "role" replaced by "master".  role=master becomes
  master=on, role=peer becomes master=off.  Default is off instead of
  auto.

* Property "use64" is gone.  The new devices always have 64 bit BARs.
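
A guest that wants to rely on the revision 1 guarantee could gate its probe on the revision byte. The following is a minimal sketch, assuming the driver already holds a copy of the device's PCI configuration header in a byte buffer. The helper name ivshmem_is_rev1_or_later() is hypothetical; the vendor and device IDs are the ones quoted in ivshmem-spec.txt, and the offsets are the standard PCI header offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration header offsets. */
enum {
    PCI_CFG_VENDOR_ID = 0x00,   /* 16 bit, little endian */
    PCI_CFG_DEVICE_ID = 0x02,   /* 16 bit, little endian */
    PCI_CFG_REVISION  = 0x08    /* 8 bit */
};

/* IDs quoted in ivshmem-spec.txt. */
#define IVSHMEM_VENDOR_ID 0x1af4
#define IVSHMEM_DEVICE_ID 0x1110

/* Read a little-endian 16 bit field from a config space copy. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Hypothetical helper: true if cfg describes an ivshmem device of
 * revision 1 or later, i.e. one that never exposes the "no shared
 * memory, yet" state to the guest.
 */
static bool ivshmem_is_rev1_or_later(const uint8_t *cfg)
{
    assert(cfg != NULL);
    return cfg_read16(cfg, PCI_CFG_VENDOR_ID) == IVSHMEM_VENDOR_ID
        && cfg_read16(cfg, PCI_CFG_DEVICE_ID) == IVSHMEM_DEVICE_ID
        && cfg[PCI_CFG_REVISION] >= 1;
}
```

A driver that must also support revision 0 would instead fall back to waiting for IVPosition to become non-negative before touching BAR2.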

Changes from ivshmem to ivshmem-plain:

* The Interrupt Pin register in PCI config space is zero (does not use
  an interrupt pin) instead of one (uses INTA).

* Property "x-memdev" is renamed to "memdev".

* Properties "shm" and "size" are gone.  Use property "memdev"
  instead.

* Property "msi" is gone.  The new device can't have MSI-X capability.
  It can't interrupt anyway.

* Properties "ioeventfd" and "vectors" are gone.  They're meaningless
  without interrupts anyway.

Changes from ivshmem to ivshmem-doorbell:

* Property "msi" is gone.  The new device always has MSI-X capability.

* Property "ioeventfd" defaults to on instead of off.

* Property "size" is gone.  The new device can only map all the shared
  memory received from the server.

Guests can easily find out whether the device is configured for
interrupts by checking for MSI-X capability.
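
The MSI-X check could look roughly like this on the guest side. This is a sketch only, assuming a 256 byte copy of the device's PCI configuration space is available. The function name ivshmem_has_doorbell() is hypothetical; the Status register bit, the Capabilities Pointer offset and the MSI-X capability ID (0x11) are the standard PCI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PCI_CFG_STATUS      = 0x06,  /* Status register, low byte */
    PCI_STATUS_CAP_LIST = 0x10,  /* Status bit 4: capability list present */
    PCI_CFG_CAP_PTR     = 0x34,  /* offset of the first capability */
    PCI_CAP_ID_MSIX     = 0x11,  /* MSI-X capability ID */
    PCI_CFG_SIZE        = 256
};

/*
 * Hypothetical helper: walk the capability list in a copy of config
 * space and report whether an MSI-X capability is present.  For a
 * revision 1 device this distinguishes ivshmem-doorbell (true) from
 * ivshmem-plain (false).
 */
static bool ivshmem_has_doorbell(const uint8_t cfg[PCI_CFG_SIZE])
{
    assert(cfg != NULL);

    if (!(cfg[PCI_CFG_STATUS] & PCI_STATUS_CAP_LIST)) {
        return false;           /* no capability list at all */
    }

    /* The low two bits of capability pointers are reserved. */
    uint8_t pos = cfg[PCI_CFG_CAP_PTR] & 0xfc;

    /* Capabilities live above the 64 byte header; bound the walk so a
     * malformed, cyclic list cannot loop forever. */
    for (int guard = 0; pos >= 0x40 && guard < 48; guard++) {
        if (cfg[pos] == PCI_CAP_ID_MSIX) {
            return true;
        }
        pos = cfg[pos + 1] & 0xfc;   /* next capability, 0 ends the list */
    }
    return false;
}
```

Inside a Linux kernel driver, the existing PCI capability lookup helpers would normally replace this hand-rolled walk.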

Note: some code added in sub-optimal places to make the diff easier to
review.  The next commit will move it to more sensible places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-37-git-send-email-armbru@redhat.com>
Markus Armbruster committed Mar 21, 2016
1 parent 2a845da commit 5400c02
Showing 4 changed files with 304 additions and 136 deletions.
66 changes: 35 additions & 31 deletions docs/specs/ivshmem-spec.txt
@@ -17,9 +17,10 @@ get interrupted by its peers.

  There are two basic configurations:

- - Just shared memory: -device ivshmem,shm=NAME,...
+ - Just shared memory: -device ivshmem-plain,memdev=HMB,...

- This uses shared memory object NAME.
+ This uses host memory backend HMB. It should have option "share"
+ set.

  - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...

@@ -30,24 +31,24 @@ There are two basic configurations:
  Each peer gets assigned a unique ID by the server. IDs must be
  between 0 and 65535.

- Interrupts are message-signaled by default (MSI-X). With msi=off
- the device has no MSI-X capability, and uses legacy INTx instead.
- vectors=N configures the number of vectors to use.
+ Interrupts are message-signaled (MSI-X). vectors=N configures the
+ number of vectors to use.

  For more details on ivshmem device properties, see The QEMU Emulator
  User Documentation (qemu-doc.*).


  == The ivshmem PCI device's guest interface ==

- The device has vendor ID 1af4, device ID 1110, revision 0.
+ The device has vendor ID 1af4, device ID 1110, revision 1. Before
+ QEMU 2.6.0, it had revision 0.

  === PCI BARs ===

  The ivshmem PCI device has two or three BARs:

  - BAR0 holds device registers (256 Byte MMIO)
- - BAR1 holds MSI-X table and PBA (only when using MSI-X)
+ - BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
  - BAR2 maps the shared memory object

  There are two ways to use this device:
@@ -58,29 +59,32 @@ There are two ways to use this device:
  user space (see http://dpdk.org/browse/memnic).

  - If you additionally need the capability for peers to interrupt each
- other, you need BAR0 and, if using MSI-X, BAR1. You will most
- likely want to write a kernel driver to handle interrupts. Requires
- the device to be configured for interrupts, obviously.
+ other, you need BAR0 and BAR1. You will most likely want to write a
+ kernel driver to handle interrupts. Requires the device to be
+ configured for interrupts, obviously.

  Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
  configured for interrupts. It becomes safely accessible only after
- the ivshmem server provided the shared memory. Guest software should
- wait for the IVPosition register (described below) to become
- non-negative before accessing BAR2.
+ the ivshmem server provided the shared memory. These devices have PCI
+ revision 0 rather than 1. Guest software should wait for the
+ IVPosition register (described below) to become non-negative before
+ accessing BAR2.

- The device is not capable to tell guest software whether it is
- configured for interrupts.
+ Revision 0 of the device is not capable to tell guest software whether
+ it is configured for interrupts.

  === PCI device registers ===

  BAR 0 contains the following registers:

  Offset Size Access On reset Function
  0 4 read/write 0 Interrupt Mask
- bit 0: peer interrupt
+ bit 0: peer interrupt (rev 0)
+ reserved (rev 1)
  bit 1..31: reserved
  4 4 read/write 0 Interrupt Status
- bit 0: peer interrupt
+ bit 0: peer interrupt (rev 0)
+ reserved (rev 1)
  bit 1..31: reserved
  8 4 read-only 0 or ID IVPosition
  12 4 write-only N/A Doorbell
@@ -92,18 +96,18 @@ Software should only access the registers as specified in column
  "Access". Reserved bits should be ignored on read, and preserved on
  write.

- Interrupt Status and Mask Register together control the legacy INTx
- interrupt when the device has no MSI-X capability: INTx is asserted
- when the bit-wise AND of Status and Mask is non-zero and the device
- has no MSI-X capability. Interrupt Status Register bit 0 becomes 1
- when an interrupt request from a peer is received. Reading the
- register clears it.
+ In revision 0 of the device, Interrupt Status and Mask Register
+ together control the legacy INTx interrupt when the device has no
+ MSI-X capability: INTx is asserted when the bit-wise AND of Status and
+ Mask is non-zero and the device has no MSI-X capability. Interrupt
+ Status Register bit 0 becomes 1 when an interrupt request from a peer
+ is received. Reading the register clears it.

  IVPosition Register: if the device is not configured for interrupts,
  this is zero. Else, it is the device's ID (between 0 and 65535).

  Before QEMU 2.6.0, the register may read -1 for a short while after
- reset.
+ reset. These devices have PCI revision 0 rather than 1.

  There is no good way for software to find out whether the device is
  configured for interrupts. A positive IVPosition means interrupts,
@@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored. The device is not
  capable to tell guest software what peers are connected, or how many
  interrupt vectors are connected.

- If the peer doesn't use MSI-X, its Interrupt Status register is set to
- 1. This asserts INTx unless masked by the Interrupt Mask register.
- The device is not capable to communicate the interrupt vector to guest
- software then.
+ The peer's interrupt for this vector then becomes pending. There is
+ no way for software to clear the pending bit, and a polling mode of
+ operation is therefore impossible.

- If the peer uses MSI-X, the interrupt for this vector becomes pending.
- There is no way for software to clear the pending bit, and a polling
- mode of operation is therefore impossible with MSI-X.
+ If the peer is a revision 0 device without MSI-X capability, its
+ Interrupt Status register is set to 1. This asserts INTx unless
+ masked by the Interrupt Mask register. The device is not capable to
+ communicate the interrupt vector to guest software then.

  With multiple MSI-X vectors, different vectors can be used to indicate
  different events have occurred. The semantics of interrupt vectors