244 changes: 244 additions & 0 deletions docs/system/s390x/cpu-topology.rst
@@ -0,0 +1,244 @@
.. _cpu-topology-s390x:

CPU topology on s390x
=====================

Since QEMU 8.2, CPU topology on s390x provides up to 3 levels of
topology containers: drawers, books and sockets. They define a
tree-shaped hierarchy.

The socket container has one or more CPU entries.
Each of these CPU entries consists of a bitmap and three CPU attributes:

- CPU type
- entitlement
- dedication

Each bit set in the bitmap corresponds to a core-id of a vCPU with matching
attributes.
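
Internally, a core-id is split into a 64-bit group ("origin") and a bit
position inside that group's bitmap, mirroring what this series does when
building the topology entries (see ``s390_topology_add_cpu_to_entry()`` in
``target/s390x/kvm/stsi-topology.c``). A minimal sketch of that split
(illustration only; the helper name is made up):

.. code-block:: c

   /* core_id == origin * 64 + bit, where bit 0 is the most significant bit */
   static void core_id_to_bitmap_position(unsigned core_id,
                                          unsigned *origin, unsigned *bit)
   {
       *origin = core_id / 64;
       *bit = core_id % 64;
   }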

This documentation provides general information on S390 CPU topology,
explains how to enable it, and describes the new CPU attributes.
For information on how to modify the S390 CPU topology and how to
monitor polarization changes, see ``docs/devel/s390-cpu-topology.rst``.

Prerequisites
-------------

To use the CPU topology, you need to run with KVM on an s390x host that
uses the Linux kernel v6.0 or newer (which provides the
``KVM_CAP_S390_CPU_TOPOLOGY`` capability that allows QEMU to signal the
CPU topology facility via STFLE bit 11 to the VM).
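
Whether the host kernel provides this capability can be probed directly via
the KVM API; a small stand-alone sketch (not part of QEMU, shown only to
illustrate the prerequisite):

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/ioctl.h>
   #include <linux/kvm.h>

   int main(void)
   {
       int kvm = open("/dev/kvm", O_RDWR);

       if (kvm < 0) {
           perror("open /dev/kvm");
           return 1;
       }
       /* Available since Linux v6.0 on s390x hosts */
       if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_S390_CPU_TOPOLOGY) > 0) {
           printf("CPU topology is supported by this host kernel\n");
       } else {
           printf("CPU topology is NOT supported by this host kernel\n");
       }
       return 0;
   }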

Enabling CPU topology
---------------------

Currently, CPU topology is enabled by default only in the host CPU model.

Enabling CPU topology in a CPU model is done by setting the CPU flag
``ctop`` to ``on`` as in:

.. code-block:: bash

   -cpu gen16b,ctop=on

Having the topology disabled by default allows migration between
old and new QEMU without adding new flags.

Default topology usage
----------------------

The CPU topology can be specified on the QEMU command line
with the ``-smp`` or the ``-device`` command line arguments.

Note also that since QEMU 7.2, threads are no longer supported in the topology
and the ``-smp`` command line argument accepts only ``threads=1``.

If none of the container attributes (drawers, books, sockets) are
specified for the ``-smp`` flag, the number of these containers
is 1.

Thus the following two options will result in the same topology:

.. code-block:: bash

   -smp cpus=5,drawers=1,books=1,sockets=8,cores=4,maxcpus=32

and

.. code-block:: bash

   -smp cpus=5,sockets=8,cores=4,maxcpus=32

When a CPU is defined by the ``-smp`` command argument, its position
inside the topology is calculated by adding the CPUs to the topology
based on the core-id, starting with core-0 at position 0 of socket-0,
book-0, drawer-0, and filling all CPUs of socket-0 before filling socket-1
of book-0, and so on up to the last socket of the last book of the last
drawer.
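
This default placement follows the same arithmetic as the
``s390_std_socket()``, ``s390_std_book()`` and ``s390_std_drawer()`` helpers
added by this series. A minimal stand-alone sketch of that computation
(illustration only, not QEMU code):

.. code-block:: c

   /* Default mapping of a core-id to its containers, given the -smp values. */
   struct smp_counts { unsigned drawers, books, sockets, cores; };

   static unsigned std_socket(unsigned core_id, const struct smp_counts *s)
   {
       return (core_id / s->cores) % s->sockets;
   }

   static unsigned std_book(unsigned core_id, const struct smp_counts *s)
   {
       return (core_id / (s->cores * s->sockets)) % s->books;
   }

   static unsigned std_drawer(unsigned core_id, const struct smp_counts *s)
   {
       return (core_id / (s->cores * s->sockets * s->books)) % s->drawers;
   }

With ``sockets=8,cores=4``, this places core-id 4 in socket 1 and core-id 14
in socket 3, which matches the examples further down.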

When a CPU is defined by the ``-device`` command argument, the
tree topology attributes must all be defined or all not defined.

.. code-block:: bash

   -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=1

or

.. code-block:: bash

   -device gen16b-s390x-cpu,core-id=1,dedicated=true

If none of the tree attributes (``drawer-id``, ``book-id``, ``socket-id``)
are specified for the ``-device`` argument, the topology tree attributes
are set by simply adding the CPU to the topology based on its core-id,
exactly as for CPUs defined with the ``-smp`` command argument.

QEMU will not try to resolve collisions and will report an error if the
CPU topology defined explicitly or implicitly on a ``-device``
argument collides with the definition of a CPU implicitly defined
on the ``-smp`` argument.

When the topology modifier attributes are not defined for the
``-device`` command argument, they take the following default values:

- dedicated: ``false``
- entitlement: ``medium``


Hot plug
++++++++

New CPUs can be plugged using the ``device_add`` HMP command as in:

.. code-block:: bash

   (qemu) device_add gen16b-s390x-cpu,core-id=9

The placement of the CPU is derived from the core-id as described above.

The topology can of course also be fully defined:

.. code-block:: bash

   (qemu) device_add gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=1

Examples
++++++++

In the following machine we define 8 sockets with 4 cores each.

.. code-block:: bash

   $ qemu-system-s390x -m 2G \
     -cpu gen16b,ctop=on \
     -smp cpus=5,sockets=8,cores=4,maxcpus=32 \
     -device gen16b-s390x-cpu,core-id=14

A new CPU can be plugged using the ``device_add`` HMP command as before:

.. code-block:: bash

   (qemu) device_add gen16b-s390x-cpu,core-id=9

The core-id defines the placement of the core in the topology by
starting with core 0 in socket 0 up to maxcpus.

In the example above:

* There are 5 CPUs provided to the guest with the ``-smp`` command line option.
  They will take the core-ids 0,1,2,3,4.
  As we have 4 cores in a socket, 4 CPUs are provided
  to the guest in socket 0, with core-ids 0,1,2,3.
  The last CPU, with core-id 4, will be on socket 1.

* The core with ID 14 provided by the ``-device`` command line option will
  be placed in socket 3, with core-id 14.

* The core with ID 9 provided by the ``device_add`` HMP command will
  be placed in socket 2, with core-id 9.


Polarization, entitlement and dedication
----------------------------------------

Polarization
++++++++++++

The polarization affects how the CPUs of a shared host are utilized/distributed
among guests.
The guest determines the polarization by using the PTF instruction.

Polarization defines two models of CPU provisioning: horizontal
and vertical.

The horizontal polarization is the default model on boot and after
subsystem reset. When horizontal polarization is in effect, all vCPUs should
have about equal resource provisioning.

In the vertical polarization model vCPUs are unequal, but overall more resources
might be available.
The guest can make use of the vCPU entitlement information provided by the host
to optimize kernel thread scheduling.

A subsystem reset puts all vCPUs of the configuration into
horizontal polarization.

Entitlement
+++++++++++

The vertical polarization specifies that the guest's vCPUs can get
different real CPU provisioning:

- a vCPU with vertical high entitlement specifies that this
vCPU gets 100% of the real CPU provisioning.

- a vCPU with vertical medium entitlement specifies that this
vCPU shares the real CPU with other vCPUs.

- a vCPU with vertical low entitlement specifies that this
  vCPU only gets real CPU provisioning when no other vCPU needs it.

In case a vCPU with vertical high entitlement does not use
the real CPU, the unused "slack" can be dispatched to other vCPUs
with medium or low entitlement.

A vCPU can be "dedicated" in which case the vCPU is fully dedicated to a single
real CPU.

The dedicated bit is an indication of affinity of a vCPU for a real CPU
while the entitlement indicates the sharing or exclusivity of use.
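
For reference, the entitlement and the dedication are reported to the guest in
the flags byte of the STSI 15.1.x CPU topology list entry. A minimal sketch of
that encoding, following the logic of ``fill_tle_cpu()`` from this series (the
macro names below are local to the sketch):

.. code-block:: c

   #include <stdbool.h>

   #define TLE_POLARIZATION_MASK 0x03 /* 1 = low, 2 = medium, 3 = high */
   #define TLE_DEDICATED         0x04

   static unsigned char tle_flags(bool vertical, unsigned entitlement,
                                  bool dedicated)
   {
       unsigned char flags = 0;

       if (vertical) {
           flags |= entitlement & TLE_POLARIZATION_MASK;
       }
       if (dedicated) {
           flags |= TLE_DEDICATED;
       }
       return flags; /* 0 in the polarization bits means horizontal */
   }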

Defining the topology on the command line
-----------------------------------------

The topology can be entirely defined using ``-device`` cpu statements,
with the exception of CPU 0, which must be defined with the ``-smp``
argument.

For example, here we set the position of the cores 1,2,3 to
drawer 1, book 1, socket 2 and cores 0,9 and 14 to drawer 0,
book 0, socket 0, without defining entitlement or dedication.
Core 4 will be set on its default position in socket 1
(since we have 4 cores per socket) and we define it as dedicated and
with vertical high entitlement.

.. code-block:: bash

   $ qemu-system-s390x -m 2G \
     -cpu gen16b,ctop=on \
     -smp cpus=1,sockets=8,cores=4,maxcpus=32 \
     \
     -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=1 \
     -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=2 \
     -device gen16b-s390x-cpu,drawer-id=1,book-id=1,socket-id=2,core-id=3 \
     \
     -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=9 \
     -device gen16b-s390x-cpu,drawer-id=0,book-id=0,socket-id=0,core-id=14 \
     \
     -device gen16b-s390x-cpu,core-id=4,dedicated=on,entitlement=high

The entitlement defined for CPU 4 will only be used after the guest
successfully enables vertical polarization by using the PTF instruction.
1 change: 1 addition & 0 deletions docs/system/target-s390x.rst
Expand Up @@ -34,3 +34,4 @@ Architectural features
.. toctree::
s390x/bootdevices
s390x/protvirt
s390x/cpu-topology
6 changes: 6 additions & 0 deletions hw/core/machine-hmp-cmds.c
Expand Up @@ -71,6 +71,12 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
if (c->has_node_id) {
monitor_printf(mon, " node-id: \"%" PRIu64 "\"\n", c->node_id);
}
if (c->has_drawer_id) {
monitor_printf(mon, " drawer-id: \"%" PRIu64 "\"\n", c->drawer_id);
}
if (c->has_book_id) {
monitor_printf(mon, " book-id: \"%" PRIu64 "\"\n", c->book_id);
}
if (c->has_socket_id) {
monitor_printf(mon, " socket-id: \"%" PRIu64 "\"\n", c->socket_id);
}
Expand Down
48 changes: 41 additions & 7 deletions hw/core/machine-smp.c
Expand Up @@ -33,6 +33,14 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
MachineClass *mc = MACHINE_GET_CLASS(ms);
GString *s = g_string_new(NULL);

if (mc->smp_props.drawers_supported) {
g_string_append_printf(s, "drawers (%u) * ", ms->smp.drawers);
}

if (mc->smp_props.books_supported) {
g_string_append_printf(s, "books (%u) * ", ms->smp.books);
}

g_string_append_printf(s, "sockets (%u)", ms->smp.sockets);

if (mc->smp_props.dies_supported) {
Expand Down Expand Up @@ -75,6 +83,8 @@ void machine_parse_smp_config(MachineState *ms,
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
unsigned cpus = config->has_cpus ? config->cpus : 0;
unsigned drawers = config->has_drawers ? config->drawers : 0;
unsigned books = config->has_books ? config->books : 0;
unsigned sockets = config->has_sockets ? config->sockets : 0;
unsigned dies = config->has_dies ? config->dies : 0;
unsigned clusters = config->has_clusters ? config->clusters : 0;
Expand All @@ -87,6 +97,8 @@ void machine_parse_smp_config(MachineState *ms,
* explicit configuration like "cpus=0" is not allowed.
*/
if ((config->has_cpus && config->cpus == 0) ||
(config->has_drawers && config->drawers == 0) ||
(config->has_books && config->books == 0) ||
(config->has_sockets && config->sockets == 0) ||
(config->has_dies && config->dies == 0) ||
(config->has_clusters && config->clusters == 0) ||
Expand All @@ -113,6 +125,19 @@ void machine_parse_smp_config(MachineState *ms,
dies = dies > 0 ? dies : 1;
clusters = clusters > 0 ? clusters : 1;

if (!mc->smp_props.books_supported && books > 1) {
error_setg(errp, "books not supported by this machine's CPU topology");
return;
}
books = books > 0 ? books : 1;

if (!mc->smp_props.drawers_supported && drawers > 1) {
error_setg(errp,
"drawers not supported by this machine's CPU topology");
return;
}
drawers = drawers > 0 ? drawers : 1;

/* compute missing values based on the provided ones */
if (cpus == 0 && maxcpus == 0) {
sockets = sockets > 0 ? sockets : 1;
Expand All @@ -126,33 +151,41 @@ void machine_parse_smp_config(MachineState *ms,
if (sockets == 0) {
cores = cores > 0 ? cores : 1;
threads = threads > 0 ? threads : 1;
sockets = maxcpus / (dies * clusters * cores * threads);
sockets = maxcpus /
(drawers * books * dies * clusters * cores * threads);
} else if (cores == 0) {
threads = threads > 0 ? threads : 1;
cores = maxcpus / (sockets * dies * clusters * threads);
cores = maxcpus /
(drawers * books * sockets * dies * clusters * threads);
}
} else {
/* prefer cores over sockets since 6.2 */
if (cores == 0) {
sockets = sockets > 0 ? sockets : 1;
threads = threads > 0 ? threads : 1;
cores = maxcpus / (sockets * dies * clusters * threads);
cores = maxcpus /
(drawers * books * sockets * dies * clusters * threads);
} else if (sockets == 0) {
threads = threads > 0 ? threads : 1;
sockets = maxcpus / (dies * clusters * cores * threads);
sockets = maxcpus /
(drawers * books * dies * clusters * cores * threads);
}
}

/* try to calculate omitted threads at last */
if (threads == 0) {
threads = maxcpus / (sockets * dies * clusters * cores);
threads = maxcpus /
(drawers * books * sockets * dies * clusters * cores);
}
}

maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * clusters * cores * threads;
maxcpus = maxcpus > 0 ? maxcpus : drawers * books * sockets * dies *
clusters * cores * threads;
cpus = cpus > 0 ? cpus : maxcpus;

ms->smp.cpus = cpus;
ms->smp.drawers = drawers;
ms->smp.books = books;
ms->smp.sockets = sockets;
ms->smp.dies = dies;
ms->smp.clusters = clusters;
Expand All @@ -163,7 +196,8 @@ void machine_parse_smp_config(MachineState *ms,
mc->smp_props.has_clusters = config->has_clusters;

/* sanity-check of the computed topology */
if (sockets * dies * clusters * cores * threads != maxcpus) {
if (drawers * books * sockets * dies * clusters * cores * threads !=
maxcpus) {
g_autofree char *topo_msg = cpu_hierarchy_to_string(ms);
error_setg(errp, "Invalid CPU topology: "
"product of the hierarchy must match maxcpus: "
Expand Down
4 changes: 4 additions & 0 deletions hw/core/machine.c
Expand Up @@ -863,6 +863,8 @@ static void machine_get_smp(Object *obj, Visitor *v, const char *name,
MachineState *ms = MACHINE(obj);
SMPConfiguration *config = &(SMPConfiguration){
.has_cpus = true, .cpus = ms->smp.cpus,
.has_drawers = true, .drawers = ms->smp.drawers,
.has_books = true, .books = ms->smp.books,
.has_sockets = true, .sockets = ms->smp.sockets,
.has_dies = true, .dies = ms->smp.dies,
.has_clusters = true, .clusters = ms->smp.clusters,
Expand Down Expand Up @@ -1137,6 +1139,8 @@ static void machine_initfn(Object *obj)
/* default to mc->default_cpus */
ms->smp.cpus = mc->default_cpus;
ms->smp.max_cpus = mc->default_cpus;
ms->smp.drawers = 1;
ms->smp.books = 1;
ms->smp.sockets = 1;
ms->smp.dies = 1;
ms->smp.clusters = 1;
Expand Down
13 changes: 13 additions & 0 deletions hw/core/qdev-properties-system.c
Expand Up @@ -1139,3 +1139,16 @@ const PropertyInfo qdev_prop_uuid = {
.set = set_uuid,
.set_default_value = set_default_uuid_auto,
};

/* --- s390 cpu entitlement policy --- */

QEMU_BUILD_BUG_ON(sizeof(CpuS390Entitlement) != sizeof(int));

const PropertyInfo qdev_prop_cpus390entitlement = {
.name = "CpuS390Entitlement",
.description = "low/medium (default)/high",
.enum_table = &CpuS390Entitlement_lookup,
.get = qdev_propinfo_get_enum,
.set = qdev_propinfo_set_enum,
.set_default_value = qdev_propinfo_set_default_value_enum,
};
469 changes: 469 additions & 0 deletions hw/s390x/cpu-topology.c

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions hw/s390x/meson.build
Expand Up @@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
's390-skeys-kvm.c',
's390-stattrib-kvm.c',
's390-pci-kvm.c',
'cpu-topology.c',
))
s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
'tod-tcg.c',
Expand Down
29 changes: 27 additions & 2 deletions hw/s390x/s390-virtio-ccw.c
Expand Up @@ -45,6 +45,7 @@
#include "target/s390x/kvm/pv.h"
#include "migration/blocker.h"
#include "qapi/visitor.h"
#include "hw/s390x/cpu-topology.h"

static Error *pv_mig_blocker;

Expand Down Expand Up @@ -123,6 +124,9 @@ static void subsystem_reset(void)
device_cold_reset(dev);
}
}
if (s390_has_topology()) {
s390_topology_reset();
}
}

static int virtio_ccw_hcall_notify(const uint64_t *args)
Expand Down Expand Up @@ -309,10 +313,18 @@ static void s390_cpu_plug(HotplugHandler *hotplug_dev,
{
MachineState *ms = MACHINE(hotplug_dev);
S390CPU *cpu = S390_CPU(dev);
ERRP_GUARD();

g_assert(!ms->possible_cpus->cpus[cpu->env.core_id].cpu);
ms->possible_cpus->cpus[cpu->env.core_id].cpu = OBJECT(dev);

if (s390_has_topology()) {
s390_topology_setup_cpu(ms, cpu, errp);
if (*errp) {
return;
}
}

if (dev->hotplugged) {
raise_irq_cpu_hotplug();
}
Expand Down Expand Up @@ -562,11 +574,20 @@ static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
sizeof(CPUArchId) * max_cpus);
ms->possible_cpus->len = max_cpus;
for (i = 0; i < ms->possible_cpus->len; i++) {
CpuInstanceProperties *props = &ms->possible_cpus->cpus[i].props;

ms->possible_cpus->cpus[i].type = ms->cpu_type;
ms->possible_cpus->cpus[i].vcpus_count = 1;
ms->possible_cpus->cpus[i].arch_id = i;
ms->possible_cpus->cpus[i].props.has_core_id = true;
ms->possible_cpus->cpus[i].props.core_id = i;

props->has_core_id = true;
props->core_id = i;
props->has_socket_id = true;
props->socket_id = s390_std_socket(i, &ms->smp);
props->has_book_id = true;
props->book_id = s390_std_book(i, &ms->smp);
props->has_drawer_id = true;
props->drawer_id = s390_std_drawer(i, &ms->smp);
}

return ms->possible_cpus;
Expand Down Expand Up @@ -744,6 +765,8 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data)
mc->no_sdcard = 1;
mc->max_cpus = S390_MAX_CPUS;
mc->has_hotpluggable_cpus = true;
mc->smp_props.books_supported = true;
mc->smp_props.drawers_supported = true;
assert(!mc->get_hotplug_handler);
mc->get_hotplug_handler = s390_get_hotplug_handler;
mc->cpu_index_to_instance_props = s390_cpu_index_to_props;
Expand Down Expand Up @@ -853,6 +876,8 @@ static void ccw_machine_8_1_class_options(MachineClass *mc)
{
ccw_machine_8_2_class_options(mc);
compat_props_add(mc->compat_props, hw_compat_8_1, hw_compat_8_1_len);
mc->smp_props.drawers_supported = false;
mc->smp_props.books_supported = false;
}
DEFINE_CCW_MACHINE(8_1, "8.1", false);

Expand Down
5 changes: 5 additions & 0 deletions hw/s390x/sclp.c
Expand Up @@ -20,6 +20,7 @@
#include "hw/s390x/event-facility.h"
#include "hw/s390x/s390-pci-bus.h"
#include "hw/s390x/ipl.h"
#include "hw/s390x/cpu-topology.h"

static inline SCLPDevice *get_sclp_device(void)
{
Expand Down Expand Up @@ -123,6 +124,10 @@ static void read_SCP_info(SCLPDevice *sclp, SCCB *sccb)
return;
}

if (s390_has_topology()) {
read_info->stsi_parm = SCLP_READ_SCP_INFO_MNEST;
}

/* CPU information */
prepare_cpu_entries(machine, entries_start, &cpu_count);
read_info->entries_cpu = cpu_to_be16(cpu_count);
Expand Down
10 changes: 9 additions & 1 deletion include/hw/boards.h
Expand Up @@ -135,12 +135,16 @@ typedef struct {
* @clusters_supported - whether clusters are supported by the machine
* @has_clusters - whether clusters are explicitly specified in the user
* provided SMP configuration
* @books_supported - whether books are supported by the machine
* @drawers_supported - whether drawers are supported by the machine
*/
typedef struct {
bool prefer_sockets;
bool dies_supported;
bool clusters_supported;
bool has_clusters;
bool books_supported;
bool drawers_supported;
} SMPCompatProps;

/**
Expand Down Expand Up @@ -323,7 +327,9 @@ typedef struct DeviceMemoryState {
/**
* CpuTopology:
* @cpus: the number of present logical processors on the machine
* @sockets: the number of sockets on the machine
* @drawers: the number of drawers on the machine
* @books: the number of books in one drawer
* @sockets: the number of sockets in one book
* @dies: the number of dies in one socket
* @clusters: the number of clusters in one die
* @cores: the number of cores in one cluster
Expand All @@ -332,6 +338,8 @@ typedef struct DeviceMemoryState {
*/
typedef struct CpuTopology {
unsigned int cpus;
unsigned int drawers;
unsigned int books;
unsigned int sockets;
unsigned int dies;
unsigned int clusters;
Expand Down
4 changes: 4 additions & 0 deletions include/hw/qdev-properties-system.h
Expand Up @@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev;
extern const PropertyInfo qdev_prop_off_auto_pcibar;
extern const PropertyInfo qdev_prop_pcie_link_speed;
extern const PropertyInfo qdev_prop_pcie_link_width;
extern const PropertyInfo qdev_prop_cpus390entitlement;

#define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \
DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
Expand Down Expand Up @@ -73,5 +74,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
#define DEFINE_PROP_UUID_NODEFAULT(_name, _state, _field) \
DEFINE_PROP(_name, _state, _field, qdev_prop_uuid, QemuUUID)

#define DEFINE_PROP_CPUS390ENTITLEMENT(_n, _s, _f, _d) \
DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_cpus390entitlement, \
CpuS390Entitlement)

#endif
83 changes: 83 additions & 0 deletions include/hw/s390x/cpu-topology.h
@@ -0,0 +1,83 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* CPU Topology
*
* Copyright IBM Corp. 2022, 2023
* Author(s): Pierre Morel <pmorel@linux.ibm.com>
*
*/
#ifndef HW_S390X_CPU_TOPOLOGY_H
#define HW_S390X_CPU_TOPOLOGY_H

#ifndef CONFIG_USER_ONLY

#include "qemu/queue.h"
#include "hw/boards.h"
#include "qapi/qapi-types-machine-target.h"

#define S390_TOPOLOGY_CPU_IFL 0x03

typedef struct S390TopologyId {
uint8_t sentinel;
uint8_t drawer;
uint8_t book;
uint8_t socket;
uint8_t type;
uint8_t vertical:1;
uint8_t entitlement:2;
uint8_t dedicated;
uint8_t origin;
} S390TopologyId;

typedef struct S390TopologyEntry {
QTAILQ_ENTRY(S390TopologyEntry) next;
S390TopologyId id;
uint64_t mask;
} S390TopologyEntry;

typedef struct S390Topology {
uint8_t *cores_per_socket;
CpuS390Polarization polarization;
} S390Topology;

typedef QTAILQ_HEAD(, S390TopologyEntry) S390TopologyList;

#ifdef CONFIG_KVM
bool s390_has_topology(void);
void s390_topology_setup_cpu(MachineState *ms, S390CPU *cpu, Error **errp);
void s390_topology_reset(void);
#else
static inline bool s390_has_topology(void)
{
return false;
}
static inline void s390_topology_setup_cpu(MachineState *ms,
S390CPU *cpu,
Error **errp) {}
static inline void s390_topology_reset(void)
{
/* Unreachable, CPU topology not implemented for TCG */
assert(false);
}
#endif

extern S390Topology s390_topology;

static inline int s390_std_socket(int n, CpuTopology *smp)
{
return (n / smp->cores) % smp->sockets;
}

static inline int s390_std_book(int n, CpuTopology *smp)
{
return (n / (smp->cores * smp->sockets)) % smp->books;
}

static inline int s390_std_drawer(int n, CpuTopology *smp)
{
return (n / (smp->cores * smp->sockets * smp->books)) % smp->drawers;
}

#endif /* CONFIG_USER_ONLY */

#endif
6 changes: 6 additions & 0 deletions include/hw/s390x/s390-virtio-ccw.h
Expand Up @@ -30,6 +30,12 @@ struct S390CcwMachineState {
uint8_t loadparm[8];
};

#define S390_PTF_REASON_NONE (0x00 << 8)
#define S390_PTF_REASON_DONE (0x01 << 8)
#define S390_PTF_REASON_BUSY (0x02 << 8)
#define S390_TOPO_FC_MASK 0xffUL
void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra);

struct S390CcwMachineClass {
/*< private >*/
MachineClass parent_class;
Expand Down
4 changes: 3 additions & 1 deletion include/hw/s390x/sclp.h
Expand Up @@ -112,11 +112,13 @@ typedef struct CPUEntry {
} QEMU_PACKED CPUEntry;

#define SCLP_READ_SCP_INFO_FIXED_CPU_OFFSET 128
#define SCLP_READ_SCP_INFO_MNEST 4
typedef struct ReadInfo {
SCCBHeader h;
uint16_t rnmax;
uint8_t rnsize;
uint8_t _reserved1[16 - 11]; /* 11-15 */
uint8_t _reserved1[15 - 11]; /* 11-14 */
uint8_t stsi_parm; /* 15-15 */
uint16_t entries_cpu; /* 16-17 */
uint16_t offset_cpu; /* 18-19 */
uint8_t _reserved2[24 - 20]; /* 20-23 */
Expand Down
21 changes: 21 additions & 0 deletions qapi/machine-common.json
@@ -0,0 +1,21 @@
# -*- Mode: Python -*-
# vim: filetype=python
#
# This work is licensed under the terms of the GNU GPL, version 2 or later.
# See the COPYING file in the top-level directory.

##
# = Machines S390 data types
##

##
# @CpuS390Entitlement:
#
# An enumeration of CPU entitlements that can be assumed by a virtual
# S390 CPU
#
# Since: 8.2
##
{ 'enum': 'CpuS390Entitlement',
'prefix': 'S390_CPU_ENTITLEMENT',
'data': [ 'auto', 'low', 'medium', 'high' ] }
121 changes: 121 additions & 0 deletions qapi/machine-target.json
Expand Up @@ -4,6 +4,8 @@
# This work is licensed under the terms of the GNU GPL, version 2 or later.
# See the COPYING file in the top-level directory.

{ 'include': 'machine-common.json' }

##
# @CpuModelInfo:
#
Expand Down Expand Up @@ -361,3 +363,122 @@
'TARGET_MIPS',
'TARGET_LOONGARCH64',
'TARGET_RISCV' ] } }

##
# @CpuS390Polarization:
#
# An enumeration of CPU polarization that can be assumed by a virtual
# S390 CPU
#
# Since: 8.2
##
{ 'enum': 'CpuS390Polarization',
'prefix': 'S390_CPU_POLARIZATION',
'data': [ 'horizontal', 'vertical' ],
'if': 'TARGET_S390X'
}

##
# @set-cpu-topology:
#
# Modify the topology by moving the CPU inside the topology tree,
# or by changing a modifier attribute of a CPU.
# Absent values will not be modified.
#
# @core-id: the vCPU ID to be moved
#
# @socket-id: destination socket to move the vCPU to
#
# @book-id: destination book to move the vCPU to
#
# @drawer-id: destination drawer to move the vCPU to
#
# @entitlement: entitlement to set
#
# @dedicated: whether the provisioning of real to virtual CPU is dedicated
#
# Features:
#
# @unstable: This command is experimental.
#
# Returns: Nothing on success.
#
# Since: 8.2
##
{ 'command': 'set-cpu-topology',
'data': {
'core-id': 'uint16',
'*socket-id': 'uint16',
'*book-id': 'uint16',
'*drawer-id': 'uint16',
'*entitlement': 'CpuS390Entitlement',
'*dedicated': 'bool'
},
'features': [ 'unstable' ],
'if': { 'all': [ 'TARGET_S390X' , 'CONFIG_KVM' ] }
}

##
# @CPU_POLARIZATION_CHANGE:
#
# Emitted when the guest asks to change the polarization.
#
# The guest can tell the host (via the PTF instruction) whether the
# CPUs should be provisioned using horizontal or vertical polarization.
#
# On horizontal polarization the host is expected to provision all vCPUs
# equally.
#
# On vertical polarization the host can provision each vCPU differently.
# The guest will get information on the details of the provisioning
# the next time it uses the STSI(15) instruction.
#
# @polarization: polarization specified by the guest
#
# Features:
#
# @unstable: This event is experimental.
#
# Since: 8.2
#
# Example:
#
# <- { "event": "CPU_POLARIZATION_CHANGE",
# "data": { "polarization": "horizontal" },
# "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
##
{ 'event': 'CPU_POLARIZATION_CHANGE',
'data': { 'polarization': 'CpuS390Polarization' },
'features': [ 'unstable' ],
'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
}

##
# @CpuPolarizationInfo:
#
# The result of a CPU polarization query.
#
# @polarization: the CPU polarization
#
# Since: 8.2
##
{ 'struct': 'CpuPolarizationInfo',
'data': { 'polarization': 'CpuS390Polarization' },
'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
}

##
# @query-s390x-cpu-polarization:
#
# Features:
#
# @unstable: This command is experimental.
#
# Returns: the machine's CPU polarization
#
# Since: 8.2
##
{ 'command': 'query-s390x-cpu-polarization', 'returns': 'CpuPolarizationInfo',
'features': [ 'unstable' ],
'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
}
85 changes: 63 additions & 22 deletions qapi/machine.json
Expand Up @@ -9,6 +9,7 @@
##

{ 'include': 'common.json' }
{ 'include': 'machine-common.json' }

##
# @SysEmuTarget:
Expand Down Expand Up @@ -56,9 +57,16 @@
#
# @cpu-state: the virtual CPU's state
#
# @dedicated: the virtual CPU's dedication (since 8.2)
#
# @entitlement: the virtual CPU's entitlement (since 8.2)
#
# Since: 2.12
##
{ 'struct': 'CpuInfoS390', 'data': { 'cpu-state': 'CpuS390State' } }
{ 'struct': 'CpuInfoS390',
'data': { 'cpu-state': 'CpuS390State',
'*dedicated': 'bool',
'*entitlement': 'CpuS390Entitlement' } }

##
# @CpuInfoFast:
Expand All @@ -71,8 +79,7 @@
#
# @thread-id: ID of the underlying host thread
#
# @props: properties describing to which node/socket/core/thread
# virtual CPU belongs to, provided if supported by board
# @props: properties associated with a virtual CPU, e.g. the socket id
#
# @target: the QEMU system emulation target, which determines which
# additional fields will be listed (since 3.0)
Expand Down Expand Up @@ -899,29 +906,46 @@
# should be passed by management with device_add command when a CPU is
# being hotplugged.
#
# Which members are optional and which mandatory depends on the
# architecture and board.
#
# For s390x see :ref:`cpu-topology-s390x`.
#
# The ids other than the node-id specify the position of the CPU
# within the CPU topology (as defined by the machine property "smp",
# thus see also type @SMPConfiguration)
#
# @node-id: NUMA node ID the CPU belongs to
#
# @socket-id: socket number within node/board the CPU belongs to
# @drawer-id: drawer number within CPU topology the CPU belongs to
# (since 8.2)
#
# @die-id: die number within socket the CPU belongs to (since 4.1)
# @book-id: book number within parent container the CPU belongs to
# (since 8.2)
#
# @cluster-id: cluster number within die the CPU belongs to (since
# 7.1)
# @socket-id: socket number within parent container the CPU belongs to
#
# @core-id: core number within cluster the CPU belongs to
# @die-id: die number within the parent container the CPU belongs to
# (since 4.1)
#
# @thread-id: thread number within core the CPU belongs to
# @cluster-id: cluster number within the parent container the CPU
# belongs to (since 7.1)
#
# Note: currently there are 6 properties that could be present but
# management should be prepared to pass through other properties
# with device_add command to allow for future interface extension.
# This also requires the filed names to be kept in sync with the
# properties passed to -device/device_add.
# @core-id: core number within the parent container the CPU
# belongs to
#
# @thread-id: thread number within the core the CPU belongs to
#
# Note: management should be prepared to pass through additional
# properties with device_add.
#
# Since: 2.7
##
{ 'struct': 'CpuInstanceProperties',
# Keep these in sync with the properties device_add accepts
'data': { '*node-id': 'int',
'*drawer-id': 'int',
'*book-id': 'int',
'*socket-id': 'int',
'*die-id': 'int',
'*cluster-id': 'int',
Expand Down Expand Up @@ -1478,26 +1502,43 @@
# Schema for CPU topology configuration. A missing value lets QEMU
# figure out a suitable value based on the ones that are provided.
#
# @cpus: number of virtual CPUs in the virtual machine
#
# @sockets: number of sockets in the CPU topology
# The members other than @cpus and @maxcpus define a topology of
# containers.
#
# @dies: number of dies per socket in the CPU topology
# The ordering from highest/coarsest to lowest/finest is:
# @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
#
# @clusters: number of clusters per die in the CPU topology (since
# 7.0)
# Different architectures support different subsets of topology
# containers.
#
# @cores: number of cores per cluster in the CPU topology
# For example, s390x does not have clusters and dies, and the socket
# is the parent container of cores.
#
# @threads: number of threads per core in the CPU topology
# @cpus: number of virtual CPUs in the virtual machine
#
# @maxcpus: maximum number of hotpluggable virtual CPUs in the virtual
# machine
#
# @drawers: number of drawers in the CPU topology (since 8.2)
#
# @books: number of books in the CPU topology (since 8.2)
#
# @sockets: number of sockets per parent container
#
# @dies: number of dies per parent container
#
# @clusters: number of clusters per parent container (since 7.0)
#
# @cores: number of cores per parent container
#
# @threads: number of threads per core
#
# Since: 6.1
##
{ 'struct': 'SMPConfiguration', 'data': {
'*cpus': 'int',
'*drawers': 'int',
'*books': 'int',
'*sockets': 'int',
'*dies': 'int',
'*clusters': 'int',
Expand Down
1 change: 1 addition & 0 deletions qapi/meson.build
Expand Up @@ -36,6 +36,7 @@ qapi_all_modules = [
'error',
'introspect',
'job',
'machine-common',
'machine',
'machine-target',
'migration',
Expand Down
1 change: 1 addition & 0 deletions qapi/qapi-schema.json
Expand Up @@ -66,6 +66,7 @@
{ 'include': 'introspect.json' }
{ 'include': 'qom.json' }
{ 'include': 'qdev.json' }
{ 'include': 'machine-common.json' }
{ 'include': 'machine.json' }
{ 'include': 'machine-target.json' }
{ 'include': 'replay.json' }
Expand Down
7 changes: 5 additions & 2 deletions qemu-options.hx
Expand Up @@ -272,11 +272,14 @@ SRST
ERST

DEF("smp", HAS_ARG, QEMU_OPTION_smp,
"-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
"-smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets]\n"
" [,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
" set the number of initial CPUs to 'n' [default=1]\n"
" maxcpus= maximum number of total CPUs, including\n"
" offline CPUs for hotplug, etc\n"
" sockets= number of sockets on the machine board\n"
" drawers= number of drawers on the machine board\n"
" books= number of books in one drawer\n"
" sockets= number of sockets in one book\n"
" dies= number of dies in one socket\n"
" clusters= number of clusters in one die\n"
" cores= number of cores in one cluster\n"
Expand Down
6 changes: 6 additions & 0 deletions system/vl.c
Expand Up @@ -726,6 +726,12 @@ static QemuOptsList qemu_smp_opts = {
{
.name = "cpus",
.type = QEMU_OPT_NUMBER,
}, {
.name = "drawers",
.type = QEMU_OPT_NUMBER,
}, {
.name = "books",
.type = QEMU_OPT_NUMBER,
}, {
.name = "sockets",
.type = QEMU_OPT_NUMBER,
Expand Down
13 changes: 13 additions & 0 deletions target/s390x/cpu-sysemu.c
Expand Up @@ -307,3 +307,16 @@ void s390_do_cpu_set_diag318(CPUState *cs, run_on_cpu_data arg)
kvm_s390_set_diag318(cs, arg.host_ulong);
}
}

void s390_cpu_topology_set_changed(bool changed)
{
int ret;

if (kvm_enabled()) {
ret = kvm_s390_topology_set_mtcr(changed);
if (ret) {
error_report("Failed to set Modified Topology Change Report: %s",
strerror(-ret));
}
}
}
16 changes: 16 additions & 0 deletions target/s390x/cpu.c
Expand Up @@ -31,12 +31,14 @@
#include "qapi/qapi-types-machine.h"
#include "sysemu/hw_accel.h"
#include "hw/qdev-properties.h"
#include "hw/qdev-properties-system.h"
#include "fpu/softfloat-helpers.h"
#include "disas/capstone.h"
#include "sysemu/tcg.h"
#ifndef CONFIG_USER_ONLY
#include "sysemu/reset.h"
#endif
#include "hw/s390x/cpu-topology.h"

#define CR0_RESET 0xE0UL
#define CR14_RESET 0xC2000000UL;
Expand Down Expand Up @@ -145,6 +147,14 @@ static void s390_query_cpu_fast(CPUState *cpu, CpuInfoFast *value)
S390CPU *s390_cpu = S390_CPU(cpu);

value->u.s390x.cpu_state = s390_cpu->env.cpu_state;
#if !defined(CONFIG_USER_ONLY)
if (s390_has_topology()) {
value->u.s390x.has_dedicated = true;
value->u.s390x.dedicated = s390_cpu->env.dedicated;
value->u.s390x.has_entitlement = true;
value->u.s390x.entitlement = s390_cpu->env.entitlement;
}
#endif
}

/* S390CPUClass::reset() */
Expand Down Expand Up @@ -290,6 +300,12 @@ static const gchar *s390_gdb_arch_name(CPUState *cs)
static Property s390x_cpu_properties[] = {
#if !defined(CONFIG_USER_ONLY)
DEFINE_PROP_UINT32("core-id", S390CPU, env.core_id, 0),
DEFINE_PROP_INT32("socket-id", S390CPU, env.socket_id, -1),
DEFINE_PROP_INT32("book-id", S390CPU, env.book_id, -1),
DEFINE_PROP_INT32("drawer-id", S390CPU, env.drawer_id, -1),
DEFINE_PROP_BOOL("dedicated", S390CPU, env.dedicated, false),
DEFINE_PROP_CPUS390ENTITLEMENT("entitlement", S390CPU, env.entitlement,
S390_CPU_ENTITLEMENT_AUTO),
#endif
DEFINE_PROP_END_OF_LIST()
};
Expand Down
82 changes: 82 additions & 0 deletions target/s390x/cpu.h
Expand Up @@ -30,6 +30,7 @@
#include "exec/cpu-defs.h"
#include "qemu/cpu-float.h"
#include "tcg/tcg_s390x.h"
#include "qapi/qapi-types-machine-common.h"

#define ELF_MACHINE_UNAME "S390X"

Expand Down Expand Up @@ -132,6 +133,11 @@ struct CPUArchState {

#if !defined(CONFIG_USER_ONLY)
uint32_t core_id; /* PoP "CPU address", same as cpu_index */
int32_t socket_id;
int32_t book_id;
int32_t drawer_id;
bool dedicated;
CpuS390Entitlement entitlement; /* Used only for vertical polarization */
uint64_t cpuid;
#endif

Expand Down Expand Up @@ -564,16 +570,92 @@ typedef struct SysIB_322 {
} SysIB_322;
QEMU_BUILD_BUG_ON(sizeof(SysIB_322) != 4096);

/*
* Topology Magnitude fields (MAG) indicate the maximum number of
* topology list entries (TLE) at the corresponding nesting level.
*/
#define S390_TOPOLOGY_MAG 6
#define S390_TOPOLOGY_MAG6 0
#define S390_TOPOLOGY_MAG5 1
#define S390_TOPOLOGY_MAG4 2
#define S390_TOPOLOGY_MAG3 3
#define S390_TOPOLOGY_MAG2 4
#define S390_TOPOLOGY_MAG1 5
/* Configuration topology */
typedef struct SysIB_151x {
uint8_t reserved0[2];
uint16_t length;
uint8_t mag[S390_TOPOLOGY_MAG];
uint8_t reserved1;
uint8_t mnest;
uint32_t reserved2;
char tle[];
} SysIB_151x;
QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);

typedef union SysIB {
SysIB_111 sysib_111;
SysIB_121 sysib_121;
SysIB_122 sysib_122;
SysIB_221 sysib_221;
SysIB_222 sysib_222;
SysIB_322 sysib_322;
SysIB_151x sysib_151x;
} SysIB;
QEMU_BUILD_BUG_ON(sizeof(SysIB) != 4096);

/*
* CPU Topology List provided by STSI with fc=15 provides a list
* of two different Topology List Entries (TLE) types to specify
* the topology hierarchy.
*
* - Container Topology List Entry
* Defines a container to contain other Topology List Entries
* of any type, nested containers or CPU.
* - CPU Topology List Entry
* Specifies the position, type, entitlement and polarization
* of the CPUs contained in the last container TLE.
*
* There can be theoretically up to five levels of containers, QEMU
* uses only three levels, the drawer's, book's and socket's level.
*
* A container with a nesting level (NL) greater than 1 can only
* contain another container of nesting level NL-1.
*
* A container of nesting level 1 (socket) contains as many CPU TLEs
* as needed to describe the position and qualities of all CPUs inside
* the container.
* The qualities of a CPU are polarization, entitlement and type.
*
* The CPU TLE defines the position of the CPUs of identical qualities
* using a 64-bit mask whose first bit has its offset defined by
* the CPU address origin field of the CPU TLE, as in:
* CPU address = origin * 64 + bit position within the mask
*/
/* Container type Topology List Entry */
typedef struct SYSIBContainerListEntry {
uint8_t nl;
uint8_t reserved[6];
uint8_t id;
} SYSIBContainerListEntry;
QEMU_BUILD_BUG_ON(sizeof(SYSIBContainerListEntry) != 8);

/* CPU type Topology List Entry */
typedef struct SysIBCPUListEntry {
uint8_t nl;
uint8_t reserved0[3];
#define SYSIB_TLE_POLARITY_MASK 0x03
#define SYSIB_TLE_DEDICATED 0x04
uint8_t flags;
uint8_t type;
uint16_t origin;
uint64_t mask;
} SysIBCPUListEntry;
QEMU_BUILD_BUG_ON(sizeof(SysIBCPUListEntry) != 16);

void insert_stsi_15_1_x(S390CPU *cpu, int sel2, uint64_t addr, uint8_t ar, uintptr_t ra);
void s390_cpu_topology_set_changed(bool changed);

/* MMU defines */
#define ASCE_ORIGIN (~0xfffULL) /* segment table origin */
#define ASCE_SUBSPACE 0x200 /* subspace group control */
Expand Down
1 change: 1 addition & 0 deletions target/s390x/cpu_models.c
Expand Up @@ -255,6 +255,7 @@ bool s390_has_feat(S390Feat feat)
case S390_FEAT_SIE_CMMA:
case S390_FEAT_SIE_PFMFI:
case S390_FEAT_SIE_IBS:
case S390_FEAT_CONFIGURATION_TOPOLOGY:
return false;
break;
default:
Expand Down
166 changes: 78 additions & 88 deletions target/s390x/kvm/kvm.c
Expand Up @@ -86,6 +86,7 @@

#define PRIV_B9_EQBS 0x9c
#define PRIV_B9_CLP 0xa0
#define PRIV_B9_PTF 0xa2
#define PRIV_B9_PCISTG 0xd0
#define PRIV_B9_PCILG 0xd2
#define PRIV_B9_RPCIT 0xd3
Expand Down Expand Up @@ -138,7 +139,6 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
KVM_CAP_LAST_INFO
};

static int cap_sync_regs;
static int cap_async_pf;
static int cap_mem_op;
static int cap_mem_op_extension;
Expand Down Expand Up @@ -337,21 +337,28 @@ int kvm_arch_get_default_type(MachineState *ms)

int kvm_arch_init(MachineState *ms, KVMState *s)
{
int required_caps[] = {
KVM_CAP_DEVICE_CTRL,
KVM_CAP_SYNC_REGS,
};

for (int i = 0; i < ARRAY_SIZE(required_caps); i++) {
if (!kvm_check_extension(s, required_caps[i])) {
error_report("KVM is missing capability #%d - "
"please use kernel 3.15 or newer", required_caps[i]);
return -1;
}
}

object_class_foreach(ccw_machine_class_foreach, TYPE_S390_CCW_MACHINE,
false, NULL);

if (!kvm_check_extension(kvm_state, KVM_CAP_DEVICE_CTRL)) {
error_report("KVM is missing capability KVM_CAP_DEVICE_CTRL - "
"please use kernel 3.15 or newer");
return -1;
}
if (!kvm_check_extension(s, KVM_CAP_S390_COW)) {
error_report("KVM is missing capability KVM_CAP_S390_COW - "
"unsupported environment");
return -1;
}

cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
cap_mem_op_extension = kvm_check_extension(s, KVM_CAP_S390_MEM_OP_EXTENSION);
Expand All @@ -365,6 +372,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
kvm_vm_enable_cap(s, KVM_CAP_S390_USER_STSI, 0);
kvm_vm_enable_cap(s, KVM_CAP_S390_CPU_TOPOLOGY, 0);
if (ri_allowed()) {
if (kvm_vm_enable_cap(s, KVM_CAP_S390_RI, 0) == 0) {
cap_ri = 1;
Expand Down Expand Up @@ -458,37 +466,28 @@ void kvm_s390_reset_vcpu_normal(S390CPU *cpu)

static int can_sync_regs(CPUState *cs, int regs)
{
return cap_sync_regs && (cs->kvm_run->kvm_valid_regs & regs) == regs;
return (cs->kvm_run->kvm_valid_regs & regs) == regs;
}

#define KVM_SYNC_REQUIRED_REGS (KVM_SYNC_GPRS | KVM_SYNC_ACRS | \
KVM_SYNC_CRS | KVM_SYNC_PREFIX)

int kvm_arch_put_registers(CPUState *cs, int level)
{
S390CPU *cpu = S390_CPU(cs);
CPUS390XState *env = &cpu->env;
struct kvm_sregs sregs;
struct kvm_regs regs;
struct kvm_fpu fpu = {};
int r;
int i;

g_assert(can_sync_regs(cs, KVM_SYNC_REQUIRED_REGS));

/* always save the PSW and the GPRS*/
cs->kvm_run->psw_addr = env->psw.addr;
cs->kvm_run->psw_mask = env->psw.mask;

if (can_sync_regs(cs, KVM_SYNC_GPRS)) {
for (i = 0; i < 16; i++) {
cs->kvm_run->s.regs.gprs[i] = env->regs[i];
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_GPRS;
}
} else {
for (i = 0; i < 16; i++) {
regs.gprs[i] = env->regs[i];
}
r = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
if (r < 0) {
return r;
}
}
memcpy(cs->kvm_run->s.regs.gprs, env->regs, sizeof(cs->kvm_run->s.regs.gprs));
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_GPRS;

if (can_sync_regs(cs, KVM_SYNC_VRS)) {
for (i = 0; i < 32; i++) {
Expand Down Expand Up @@ -521,6 +520,15 @@ int kvm_arch_put_registers(CPUState *cs, int level)
return 0;
}

/*
* Access registers, control registers and the prefix - these are
* always available via kvm_sync_regs in the kernels that we support
*/
memcpy(cs->kvm_run->s.regs.acrs, env->aregs, sizeof(cs->kvm_run->s.regs.acrs));
memcpy(cs->kvm_run->s.regs.crs, env->cregs, sizeof(cs->kvm_run->s.regs.crs));
cs->kvm_run->s.regs.prefix = env->psa;
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_ACRS | KVM_SYNC_CRS | KVM_SYNC_PREFIX;

if (can_sync_regs(cs, KVM_SYNC_ARCH0)) {
cs->kvm_run->s.regs.cputm = env->cputm;
cs->kvm_run->s.regs.ckc = env->ckc;
Expand Down Expand Up @@ -567,25 +575,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
}
}

/* access registers and control registers*/
if (can_sync_regs(cs, KVM_SYNC_ACRS | KVM_SYNC_CRS)) {
for (i = 0; i < 16; i++) {
cs->kvm_run->s.regs.acrs[i] = env->aregs[i];
cs->kvm_run->s.regs.crs[i] = env->cregs[i];
}
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_ACRS;
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_CRS;
} else {
for (i = 0; i < 16; i++) {
sregs.acrs[i] = env->aregs[i];
sregs.crs[i] = env->cregs[i];
}
r = kvm_vcpu_ioctl(cs, KVM_SET_SREGS, &sregs);
if (r < 0) {
return r;
}
}

if (can_sync_regs(cs, KVM_SYNC_GSCB)) {
memcpy(cs->kvm_run->s.regs.gscb, env->gscb, 32);
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_GSCB;
Expand All @@ -607,60 +596,28 @@ int kvm_arch_put_registers(CPUState *cs, int level)
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_DIAG318;
}

/* Finally the prefix */
if (can_sync_regs(cs, KVM_SYNC_PREFIX)) {
cs->kvm_run->s.regs.prefix = env->psa;
cs->kvm_run->kvm_dirty_regs |= KVM_SYNC_PREFIX;
} else {
/* prefix is only supported via sync regs */
}
return 0;
}

int kvm_arch_get_registers(CPUState *cs)
{
S390CPU *cpu = S390_CPU(cs);
CPUS390XState *env = &cpu->env;
struct kvm_sregs sregs;
struct kvm_regs regs;
struct kvm_fpu fpu;
int i, r;

/* get the PSW */
env->psw.addr = cs->kvm_run->psw_addr;
env->psw.mask = cs->kvm_run->psw_mask;

/* the GPRS */
if (can_sync_regs(cs, KVM_SYNC_GPRS)) {
for (i = 0; i < 16; i++) {
env->regs[i] = cs->kvm_run->s.regs.gprs[i];
}
} else {
r = kvm_vcpu_ioctl(cs, KVM_GET_REGS, &regs);
if (r < 0) {
return r;
}
for (i = 0; i < 16; i++) {
env->regs[i] = regs.gprs[i];
}
}
/* the GPRS, ACRS and CRS */
g_assert(can_sync_regs(cs, KVM_SYNC_REQUIRED_REGS));
memcpy(env->regs, cs->kvm_run->s.regs.gprs, sizeof(env->regs));
memcpy(env->aregs, cs->kvm_run->s.regs.acrs, sizeof(env->aregs));
memcpy(env->cregs, cs->kvm_run->s.regs.crs, sizeof(env->cregs));

/* The ACRS and CRS */
if (can_sync_regs(cs, KVM_SYNC_ACRS | KVM_SYNC_CRS)) {
for (i = 0; i < 16; i++) {
env->aregs[i] = cs->kvm_run->s.regs.acrs[i];
env->cregs[i] = cs->kvm_run->s.regs.crs[i];
}
} else {
r = kvm_vcpu_ioctl(cs, KVM_GET_SREGS, &sregs);
if (r < 0) {
return r;
}
for (i = 0; i < 16; i++) {
env->aregs[i] = sregs.acrs[i];
env->cregs[i] = sregs.crs[i];
}
}
/* The prefix */
env->psa = cs->kvm_run->s.regs.prefix;

/* Floating point and vector registers */
if (can_sync_regs(cs, KVM_SYNC_VRS)) {
Expand All @@ -685,11 +642,6 @@ int kvm_arch_get_registers(CPUState *cs)
env->fpc = fpu.fpc;
}

/* The prefix */
if (can_sync_regs(cs, KVM_SYNC_PREFIX)) {
env->psa = cs->kvm_run->s.regs.prefix;
}

if (can_sync_regs(cs, KVM_SYNC_ARCH0)) {
env->cputm = cs->kvm_run->s.regs.cputm;
env->ckc = cs->kvm_run->s.regs.ckc;
Expand Down Expand Up @@ -1457,6 +1409,13 @@ static int kvm_mpcifc_service_call(S390CPU *cpu, struct kvm_run *run)
}
}

static void kvm_handle_ptf(S390CPU *cpu, struct kvm_run *run)
{
uint8_t r1 = (run->s390_sieic.ipb >> 20) & 0x0f;

s390_handle_ptf(cpu, r1, RA_IGNORED);
}

static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
{
int r = 0;
Expand All @@ -1474,6 +1433,9 @@ static int handle_b9(S390CPU *cpu, struct kvm_run *run, uint8_t ipa1)
case PRIV_B9_RPCIT:
r = kvm_rpcit_service_call(cpu, run);
break;
case PRIV_B9_PTF:
kvm_handle_ptf(cpu, run);
break;
case PRIV_B9_EQBS:
/* just inject exception */
r = -1;
Expand Down Expand Up @@ -1911,9 +1873,12 @@ static int handle_stsi(S390CPU *cpu)
if (run->s390_stsi.sel1 != 2 || run->s390_stsi.sel2 != 2) {
return 0;
}
/* Only sysib 3.2.2 needs post-handling for now. */
insert_stsi_3_2_2(cpu, run->s390_stsi.addr, run->s390_stsi.ar);
return 0;
case 15:
insert_stsi_15_1_x(cpu, run->s390_stsi.sel2, run->s390_stsi.addr,
run->s390_stsi.ar, RA_IGNORED);
return 0;
default:
return 0;
}
Expand Down Expand Up @@ -2495,6 +2460,14 @@ void kvm_s390_get_host_cpu_model(S390CPUModel *model, Error **errp)
set_bit(S390_FEAT_UNPACK, model->features);
}

/*
* If we have kernel support for CPU Topology indicate the
* configuration-topology facility.
*/
if (kvm_check_extension(kvm_state, KVM_CAP_S390_CPU_TOPOLOGY)) {
set_bit(S390_FEAT_CONFIGURATION_TOPOLOGY, model->features);
}

/* We emulate a zPCI bus and AEN, therefore we don't need HW support */
set_bit(S390_FEAT_ZPCI, model->features);
set_bit(S390_FEAT_ADAPTER_EVENT_NOTIFICATION, model->features);
Expand Down Expand Up @@ -2661,6 +2634,23 @@ int kvm_s390_get_zpci_op(void)
return cap_zpci_op;
}

int kvm_s390_topology_set_mtcr(uint64_t attr)
{
struct kvm_device_attr attribute = {
.group = KVM_S390_VM_CPU_TOPOLOGY,
.attr = attr,
};

if (!s390_has_feat(S390_FEAT_CONFIGURATION_TOPOLOGY)) {
return 0;
}
if (!kvm_vm_check_attr(kvm_state, KVM_S390_VM_CPU_TOPOLOGY, attr)) {
return -ENOTSUP;
}

return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attribute);
}

void kvm_arch_accel_class_init(ObjectClass *oc)
{
}
1 change: 1 addition & 0 deletions target/s390x/kvm/kvm_s390x.h
Expand Up @@ -47,5 +47,6 @@ void kvm_s390_crypto_reset(void);
void kvm_s390_restart_interrupt(S390CPU *cpu);
void kvm_s390_stop_interrupt(S390CPU *cpu);
void kvm_s390_set_diag318(CPUState *cs, uint64_t diag318_info);
int kvm_s390_topology_set_mtcr(uint64_t attr);

#endif /* KVM_S390X_H */
3 changes: 2 additions & 1 deletion target/s390x/kvm/meson.build
@@ -1,7 +1,8 @@

s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
'pv.c',
'kvm.c'
'kvm.c',
'stsi-topology.c'
), if_false: files(
'stubs.c'
))
Expand Down
334 changes: 334 additions & 0 deletions target/s390x/kvm/stsi-topology.c
@@ -0,0 +1,334 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* QEMU S390x CPU Topology
*
* Copyright IBM Corp. 2022, 2023
* Author(s): Pierre Morel <pmorel@linux.ibm.com>
*
*/
#include "qemu/osdep.h"
#include "cpu.h"
#include "hw/s390x/sclp.h"
#include "hw/s390x/cpu-topology.h"

QEMU_BUILD_BUG_ON(S390_CPU_ENTITLEMENT_LOW != 1);
QEMU_BUILD_BUG_ON(S390_CPU_ENTITLEMENT_MEDIUM != 2);
QEMU_BUILD_BUG_ON(S390_CPU_ENTITLEMENT_HIGH != 3);

/**
* fill_container:
* @p: The address of the container TLE to fill
* @level: The level of nesting for this container
* @id: The container receives a unique ID inside its own container
*
* Returns the next free TLE entry.
*/
static char *fill_container(char *p, int level, int id)
{
SYSIBContainerListEntry *tle = (SYSIBContainerListEntry *)p;

tle->nl = level;
tle->id = id;
return p + sizeof(*tle);
}

/**
* fill_tle_cpu:
* @p: The address of the CPU TLE to fill
* @entry: a pointer to the S390TopologyEntry defining this
* CPU container.
*
* Returns the next free TLE entry.
*/
static char *fill_tle_cpu(char *p, S390TopologyEntry *entry)
{
SysIBCPUListEntry *tle = (SysIBCPUListEntry *)p;
S390TopologyId topology_id = entry->id;

tle->nl = 0;
tle->flags = 0;
if (topology_id.vertical) {
tle->flags |= topology_id.entitlement;
}
if (topology_id.dedicated) {
tle->flags |= SYSIB_TLE_DEDICATED;
}
tle->type = topology_id.type;
tle->origin = cpu_to_be16(topology_id.origin * 64);
tle->mask = cpu_to_be64(entry->mask);
return p + sizeof(*tle);
}

/*
* Macro to check that the size of data after increment
* will not get bigger than the size of the SysIB.
*/
#define SYSIB_GUARD(data, x) do { \
data += x; \
if (data > sizeof(SysIB)) { \
return 0; \
} \
} while (0)

/**
* stsi_topology_fill_sysib:
* @p: A pointer to the position of the first TLE
* @level: The nested level wanted by the guest
*
* Fill the SYSIB with the topology information as described in
* the PoP, nesting containers as appropriate, with the maximum
* nesting limited by @level.
*
* Return value:
* On success: the size of the SysIB_15x after being filled with TLE.
* On error: 0 in the case we would overrun the end of the SysIB.
*/
static int stsi_topology_fill_sysib(S390TopologyList *topology_list,
char *p, int level)
{
S390TopologyEntry *entry;
int last_drawer = -1;
int last_book = -1;
int last_socket = -1;
int drawer_id = 0;
int book_id = 0;
int socket_id = 0;
int n = sizeof(SysIB_151x);

QTAILQ_FOREACH(entry, topology_list, next) {
bool drawer_change = last_drawer != entry->id.drawer;
bool book_change = drawer_change || last_book != entry->id.book;
bool socket_change = book_change || last_socket != entry->id.socket;

if (level > 3 && drawer_change) {
SYSIB_GUARD(n, sizeof(SYSIBContainerListEntry));
p = fill_container(p, 3, drawer_id++);
book_id = 0;
}
if (level > 2 && book_change) {
SYSIB_GUARD(n, sizeof(SYSIBContainerListEntry));
p = fill_container(p, 2, book_id++);
socket_id = 0;
}
if (socket_change) {
SYSIB_GUARD(n, sizeof(SYSIBContainerListEntry));
p = fill_container(p, 1, socket_id++);
}

SYSIB_GUARD(n, sizeof(SysIBCPUListEntry));
p = fill_tle_cpu(p, entry);
last_drawer = entry->id.drawer;
last_book = entry->id.book;
last_socket = entry->id.socket;
}

return n;
}

/**
* setup_stsi:
* @topology_list: ordered list of groups of CPUs with same properties
* @sysib: pointer to a SysIB to be filled with SysIB_151x data
* @level: Nested level specified by the guest
*
* Setup the SYSIB for STSI 15.1, the header as well as the description
* of the topology.
*/
static int setup_stsi(S390TopologyList *topology_list, SysIB_151x *sysib,
int level)
{
sysib->mnest = level;
switch (level) {
case 4:
sysib->mag[S390_TOPOLOGY_MAG4] = current_machine->smp.drawers;
sysib->mag[S390_TOPOLOGY_MAG3] = current_machine->smp.books;
sysib->mag[S390_TOPOLOGY_MAG2] = current_machine->smp.sockets;
sysib->mag[S390_TOPOLOGY_MAG1] = current_machine->smp.cores;
break;
case 3:
sysib->mag[S390_TOPOLOGY_MAG3] = current_machine->smp.drawers *
current_machine->smp.books;
sysib->mag[S390_TOPOLOGY_MAG2] = current_machine->smp.sockets;
sysib->mag[S390_TOPOLOGY_MAG1] = current_machine->smp.cores;
break;
case 2:
sysib->mag[S390_TOPOLOGY_MAG2] = current_machine->smp.drawers *
current_machine->smp.books *
current_machine->smp.sockets;
sysib->mag[S390_TOPOLOGY_MAG1] = current_machine->smp.cores;
break;
}

return stsi_topology_fill_sysib(topology_list, sysib->tle, level);
}

/**
* s390_topology_add_cpu_to_entry:
* @entry: Topology entry to setup
* @cpu: the S390CPU to add
*
* Set the core bit inside the topology mask.
*/
static void s390_topology_add_cpu_to_entry(S390TopologyEntry *entry,
S390CPU *cpu)
{
set_bit(63 - (cpu->env.core_id % 64), &entry->mask);
}

/**
* s390_topology_from_cpu:
* @cpu: S390CPU to calculate the topology id
*
* Initialize the topology id from the CPU environment.
*/
static S390TopologyId s390_topology_from_cpu(S390CPU *cpu)
{
S390TopologyId topology_id = {
.drawer = cpu->env.drawer_id,
.book = cpu->env.book_id,
.socket = cpu->env.socket_id,
.type = S390_TOPOLOGY_CPU_IFL,
.vertical = s390_topology.polarization == S390_CPU_POLARIZATION_VERTICAL,
.entitlement = cpu->env.entitlement,
.dedicated = cpu->env.dedicated,
.origin = cpu->env.core_id / 64,
};

return topology_id;
}

/**
* s390_topology_id_cmp:
* @l: first S390TopologyId
* @r: second S390TopologyId
*
* Compare two topology ids according to the sorting order specified by the PoP.
*
* Returns a negative number if the first id is less than, 0 if it is equal to
* and positive if it is larger than the second id.
*/
static int s390_topology_id_cmp(const S390TopologyId *l,
const S390TopologyId *r)
{
/*
* lexical order, compare less significant values only if more significant
* ones are equal
*/
return l->sentinel - r->sentinel ?:
l->drawer - r->drawer ?:
l->book - r->book ?:
l->socket - r->socket ?:
l->type - r->type ?:
/* logic is inverted for the next three */
r->vertical - l->vertical ?:
r->entitlement - l->entitlement ?:
r->dedicated - l->dedicated ?:
l->origin - r->origin;
}

static bool s390_topology_id_eq(const S390TopologyId *l,
const S390TopologyId *r)
{
return !s390_topology_id_cmp(l, r);
}

static bool s390_topology_id_lt(const S390TopologyId *l,
const S390TopologyId *r)
{
return s390_topology_id_cmp(l, r) < 0;
}

/**
* s390_topology_fill_list_sorted:
* @topology_list: list to fill
*
* Create S390TopologyEntrys as appropriate from all CPUs and fill the
* topology_list with the entries according to the order specified by the PoP.
*/
static void s390_topology_fill_list_sorted(S390TopologyList *topology_list)
{
CPUState *cs;
S390TopologyEntry sentinel = { .id.sentinel = 1 };

QTAILQ_INIT(topology_list);

QTAILQ_INSERT_HEAD(topology_list, &sentinel, next);

CPU_FOREACH(cs) {
S390TopologyId id = s390_topology_from_cpu(S390_CPU(cs));
S390TopologyEntry *entry = NULL, *tmp;

QTAILQ_FOREACH(tmp, topology_list, next) {
if (s390_topology_id_eq(&id, &tmp->id)) {
entry = tmp;
break;
} else if (s390_topology_id_lt(&id, &tmp->id)) {
entry = g_malloc0(sizeof(*entry));
entry->id = id;
QTAILQ_INSERT_BEFORE(tmp, entry, next);
break;
}
}
assert(entry);
s390_topology_add_cpu_to_entry(entry, S390_CPU(cs));
}

QTAILQ_REMOVE(topology_list, &sentinel, next);
}

/**
* s390_topology_empty_list:
*
* Clear all entries in the S390Topology list.
*/
static void s390_topology_empty_list(S390TopologyList *topology_list)
{
S390TopologyEntry *entry = NULL;
S390TopologyEntry *tmp = NULL;

QTAILQ_FOREACH_SAFE(entry, topology_list, next, tmp) {
QTAILQ_REMOVE(topology_list, entry, next);
g_free(entry);
}
}

/**
* insert_stsi_15_1_x:
* @cpu: the CPU doing the call for which we set CC
* @sel2: the selector 2, containing the nested level
* @addr: Guest logical address of the guest SysIB
* @ar: the access register number
* @ra: the return address
*
* Emulate STSI 15.1.x, that is, perform all necessary checks and
* fill the SYSIB.
* In case the topology description is too long to fit into the SYSIB,
* set CC=3 and abort without writing the SYSIB.
*/
void insert_stsi_15_1_x(S390CPU *cpu, int sel2, uint64_t addr, uint8_t ar, uintptr_t ra)
{
S390TopologyList topology_list;
SysIB sysib = {0};
int length;

if (!s390_has_topology() || sel2 < 2 || sel2 > SCLP_READ_SCP_INFO_MNEST) {
setcc(cpu, 3);
return;
}

s390_topology_fill_list_sorted(&topology_list);
length = setup_stsi(&topology_list, &sysib.sysib_151x, sel2);
s390_topology_empty_list(&topology_list);

if (!length) {
setcc(cpu, 3);
return;
}

sysib.sysib_151x.length = cpu_to_be16(length);
if (!s390_cpu_virt_mem_write(cpu, addr, ar, &sysib, length)) {
setcc(cpu, 0);
} else {
s390_cpu_virt_mem_handle_exc(cpu, ra);
}
}
439 changes: 439 additions & 0 deletions tests/avocado/s390_topology.py

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion tests/qtest/migration-test.c
Expand Up @@ -3034,7 +3034,9 @@ int main(int argc, char **argv)

qtest_add_func("/migration/bad_dest", test_baddest);
#ifndef _WIN32
qtest_add_func("/migration/analyze-script", test_analyze_script);
if (!g_str_equal(arch, "s390x")) {
qtest_add_func("/migration/analyze-script", test_analyze_script);
}
#endif
qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
Expand Down