Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/m…
…st/qemu into staging

virtio,pc,pci: fixes, features, cleanups

CXL volatile memory support
More memslots for vhost-user on x86 and ARM.
vIOMMU support for vhost-vdpa
pcie-to-pci bridge can now be compiled out
MADT revision bumped to 3
Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmRniWoPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpN4MH/RqdvHmujrjvjzXbbN/gq87Njp+kQLKEooIE
# ZkqdNaVUE6vjCH8iU+chjsxt4VSquSjOL9CWWrYefEIeqCFLWsuXSAY0VDAbY67x
# +aes51tTYILVsx7fbb+T5mJKRgVuWW4C5KaGeQ1djSexy42nvplZUJdIJUhZr0t9
# dzzOsD+mezHS7Xu2QOzSfl5QQRuOVVJnjJXkqJG/yRvHrZM5aTolatr/X7jNGedm
# 4oyMsVMaAcQ+dnEQigRJodf/MpFfs9DfNZAH55VwwQWsNT0t0ueD0xigR203jjaE
# mJJJipAqetFax2JjC7QMXWf+LR36BnL/0/xH+x/BWb0FI42wr0I=
# =ajmR
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 19 May 2023 07:36:26 AM PDT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (40 commits)
  hw/i386/pc: No need for rtc_state to be an out-parameter
  hw/i386/pc: Create RTC controllers in south bridges
  hw/cxl: Introduce cxl_device_get_timestamp() utility function
  hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
  hw/pci-bridge: make building pcie-to-pci bridge configurable
  virtio-pci: add handling of PCI ATS and Device-TLB enable/disable
  hw/pci-host/pam: Make init_pam() usage more readable
  hw/i386/pc: Initialize ram_memory variable directly
  hw/i386/pc_{q35,piix}: Minimize usage of get_system_memory()
  hw/i386/pc_{q35,piix}: Reuse MachineClass::desc as SMB product name
  hw/i386/pc_q35: Reuse machine parameter
  hw/pci-host/q35: Inline sysbus_add_io()
  hw/pci-host/i440fx: Inline sysbus_add_io()
  vhost-vdpa: Add support for vIOMMU.
  vhost-vdpa: Add check for full 64-bit in region delete
  vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del()
  vhost: expose function vhost_dev_has_iommu()
  virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
  virtio-net: not enable vq reset feature unconditionally
  vhost-user: Remove acpi-specific memslot limit
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  • Loading branch information
rth7680 committed May 19, 2023
2 parents d009607 + 87af48a commit 64e63dc
Show file tree
Hide file tree
Showing 57 changed files with 899 additions and 340 deletions.
8 changes: 8 additions & 0 deletions docs/about/deprecated.rst
Expand Up @@ -328,6 +328,14 @@ from Intel that was not properly allocated. Since version 5.2, the controller
has used a properly allocated identifier. Deprecate the ``use-intel-id``
machine compatibility parameter.

``-device cxl-type3,memdev=xxxx`` (since 8.0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``cxl-type3`` device initially only used a single memory backend. With
the addition of volatile memory support, it is now necessary to distinguish
between persistent and volatile memory backends. As such, memdev is deprecated
in favor of persistent-memdev.


Block device options
''''''''''''''''''''
Expand Down
63 changes: 46 additions & 17 deletions docs/system/devices/cxl.rst
Expand Up @@ -162,7 +162,7 @@ Example system Topology. x marks the match in each decoder level::
|<------------------SYSTEM PHYSICAL ADDRESS MAP (1)----------------->|
| __________ __________________________________ __________ |
| | | | | | | |
| | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 1 | |
| | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 2 | |
| | HB0 only | | Configured to interleave memory | | HB1 only | |
| | | | memory accesses across HB0/HB1 | | | |
| |__________| |_____x____________________________| |__________| |
Expand Down Expand Up @@ -208,8 +208,8 @@ Notes:
(1) **3 CXL Fixed Memory Windows (CFMW)** corresponding to different
ranges of the system physical address map. Each CFMW has
particular interleave setup across the CXL Host Bridges (HB)
CFMW0 provides uninterleaved access to HB0, CFW2 provides
uninterleaved access to HB1. CFW1 provides interleaved memory access
CFMW0 provides uninterleaved access to HB0, CFMW2 provides
uninterleaved access to HB1. CFMW1 provides interleaved memory access
across HB0 and HB1.

(2) **Two CXL Host Bridges**. Each of these has 2 CXL Root Ports and
Expand Down Expand Up @@ -247,7 +247,7 @@ Example topology involving a switch::
|<------------------SYSTEM PHYSICAL ADDRESS MAP (1)----------------->|
| __________ __________________________________ __________ |
| | | | | | | |
| | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 1 | |
| | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 2 | |
| | HB0 only | | Configured to interleave memory | | HB1 only | |
| | | | memory accesses across HB0/HB1 | | | |
| |____x_____| |__________________________________| |__________| |
Expand Down Expand Up @@ -300,22 +300,43 @@ Example topology involving a switch::

Example command lines
---------------------
A very simple setup with just one directly attached CXL Type 3 device::
A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::

qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \
...
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
-device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

A very simple setup with just one directly attached CXL Type 3 Volatile Memory device::

qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
...
-object memory-backend-ram,id=vmem0,share=on,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

The same volatile setup may optionally include an LSA region::

qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
...
-object memory-backend-ram,id=vmem0,share=on,size=256M \
-object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G

A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way
interleave across 2 CXL host bridges. Each host bridge has 2 CXL Root Ports, with
the CXL Type3 device directly attached (no switches).::

qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \
...
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
Expand All @@ -328,18 +349,18 @@ the CXL Type3 device directly attached (no switches).::
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
-device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
-device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
-device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
-device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
-device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
-device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
-device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
-device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
-device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k

An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::

qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
qemu-system-x86_64 -M q35,cxl=on -m 4G,maxmem=8G,slots=8 -smp 4 \
...
-object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M \
Expand All @@ -354,15 +375,23 @@ An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::
-device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \
-device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \
-device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \
-device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \
-device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k

Deprecations
------------

The Type 3 device [memdev] attribute has been deprecated in favor of the
[persistent-memdev] attributes. [memdev] will default to a persistent memory
device for backward compatibility and is incapable of being used in combination
with [persistent-memdev].

Kernel Configuration Options
----------------------------

Expand Down
1 change: 1 addition & 0 deletions hw/core/machine.c
Expand Up @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = {
{ "e1000e", "migrate-timadj", "off" },
{ "virtio-mem", "x-early-migration", "false" },
{ "migration", "x-preempt-pre-7-2", "true" },
{ TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
};
const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);

Expand Down
60 changes: 25 additions & 35 deletions hw/cxl/cxl-cdat.c
Expand Up @@ -108,75 +108,66 @@ static void ct3_build_cdat(CDATObject *cdat, Error **errp)
static void ct3_load_cdat(CDATObject *cdat, Error **errp)
{
g_autofree CDATEntry *cdat_st = NULL;
g_autofree char *buf = NULL;
uint8_t sum = 0;
int num_ent;
int i = 0, ent = 1, file_size = 0;
int i = 0, ent = 1;
gsize file_size = 0;
CDATSubHeader *hdr;
FILE *fp = NULL;
GError *error = NULL;

/* Read CDAT file and create its cache */
fp = fopen(cdat->filename, "r");
if (!fp) {
error_setg(errp, "CDAT: Unable to open file");
if (!g_file_get_contents(cdat->filename, (gchar **)&buf,
&file_size, &error)) {
error_setg(errp, "CDAT: File read failed: %s", error->message);
g_error_free(error);
return;
}

fseek(fp, 0, SEEK_END);
file_size = ftell(fp);
fseek(fp, 0, SEEK_SET);
cdat->buf = g_malloc0(file_size);

if (fread(cdat->buf, file_size, 1, fp) == 0) {
error_setg(errp, "CDAT: File read failed");
return;
}

fclose(fp);

if (file_size < sizeof(CDATTableHeader)) {
error_setg(errp, "CDAT: File too short");
return;
}
i = sizeof(CDATTableHeader);
num_ent = 1;
while (i < file_size) {
hdr = (CDATSubHeader *)(cdat->buf + i);
hdr = (CDATSubHeader *)(buf + i);
if (i + sizeof(CDATSubHeader) > file_size) {
error_setg(errp, "CDAT: Truncated table");
return;
}
cdat_len_check(hdr, errp);
i += hdr->length;
if (i > file_size) {
error_setg(errp, "CDAT: Truncated table");
return;
}
num_ent++;
}
if (i != file_size) {
error_setg(errp, "CDAT: File length mismatch");
return;
}

cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
if (!cdat_st) {
error_setg(errp, "CDAT: Failed to allocate entry array");
return;
}
cdat_st = g_new0(CDATEntry, num_ent);

/* Set CDAT header, Entry = 0 */
cdat_st[0].base = cdat->buf;
cdat_st[0].base = buf;
cdat_st[0].length = sizeof(CDATTableHeader);
i = 0;

while (i < cdat_st[0].length) {
sum += cdat->buf[i++];
sum += buf[i++];
}

/* Read CDAT structures */
while (i < file_size) {
hdr = (CDATSubHeader *)(cdat->buf + i);
cdat_len_check(hdr, errp);

hdr = (CDATSubHeader *)(buf + i);
cdat_st[ent].base = hdr;
cdat_st[ent].length = hdr->length;

while (cdat->buf + i <
(uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
while (buf + i < (char *)cdat_st[ent].base + cdat_st[ent].length) {
assert(i < file_size);
sum += cdat->buf[i++];
sum += buf[i++];
}

ent++;
Expand All @@ -187,6 +178,7 @@ static void ct3_load_cdat(CDATObject *cdat, Error **errp)
}
cdat->entry_len = num_ent;
cdat->entry = g_steal_pointer(&cdat_st);
cdat->buf = g_steal_pointer(&buf);
}

void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
Expand Down Expand Up @@ -218,7 +210,5 @@ void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
cdat->private);
}
if (cdat->buf) {
free(cdat->buf);
}
g_free(cdat->buf);
}
14 changes: 8 additions & 6 deletions hw/cxl/cxl-component-utils.c
Expand Up @@ -38,23 +38,25 @@ static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
ComponentRegisters *cregs = &cxl_cstate->crb;
uint32_t *cache_mem = cregs->cache_mem_registers;
bool should_commit = false;
bool should_uncommit = false;

switch (offset) {
case A_CXL_HDM_DECODER0_CTRL:
should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
should_uncommit = !should_commit;
break;
default:
break;
}

memory_region_transaction_begin();
stl_le_p((uint8_t *)cache_mem + offset, value);
if (should_commit) {
ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, ERR, 0);
value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
} else if (should_uncommit) {
value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, ERR, 0);
value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
}
memory_region_transaction_commit();
stl_le_p((uint8_t *)cache_mem + offset, value);
}

static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
Expand Down
15 changes: 15 additions & 0 deletions hw/cxl/cxl-device-utils.c
Expand Up @@ -269,3 +269,18 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)

cxl_initialize_mailbox(cxl_dstate);
}

uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
{
uint64_t time, delta;
uint64_t final_time = 0;

if (cxl_dstate->timestamp.set) {
/* Find the delta from the last time the host set the time. */
time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
delta = time - cxl_dstate->timestamp.last_set;
final_time = cxl_dstate->timestamp.host_set + delta;
}

return final_time;
}

0 comments on commit 64e63dc

Please sign in to comment.