spapr: Implement Open Firmware client interface
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20-byte shim which simply forwards calls to
a hypercall implemented in QEMU. The boot-time firmware component is
SLOF, but a build that is specific to QEMU and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between QEMU and SLOF, there is still
some, and it has become increasingly awkward to handle as we've
implemented new features.

This implements a boot-time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (it stands for "Virtual
Open Firmware"). When enabled, QEMU handles the custom KVMPPC_H_VOF_CLIENT
hcall, which implements the Open Firmware Client Interface (OF CI). This
allows using a smaller, stateless firmware which does not have to manage
the device tree.
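For context on what the hcall receives: spapr_h_vof_client() passes the guest address in the hcall's first argument to vof_client_call(). Under IEEE 1275, that address points to an array of cells holding a pointer to the service name, the number of arguments, the number of return values, the arguments, and then space for the returns. A minimal sketch of decoding such an array from a byte buffer, assuming 32-bit big-endian cells (VOF runs 32-bit BE, per the MSR comment in spapr.c). `OfCiArgs`, `ofci_decode()` and the ten-cell cap are hypothetical and are not taken from vof.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of an OF CI argument array. */
typedef struct {
    uint32_t service;   /* guest address of the NUL-terminated service name */
    uint32_t nargs;     /* number of input cells */
    uint32_t nret;      /* number of return cells */
    uint32_t args[10];  /* inputs followed by returns (hypothetical cap) */
} OfCiArgs;

/* Read one 32-bit big-endian cell. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the argument array the client passes by address.
 * Returns 0 on success, -1 if the buffer is too short or the
 * cell counts exceed the hypothetical capacity.
 */
static int ofci_decode(const uint8_t *buf, size_t len, OfCiArgs *out)
{
    size_t cells;

    if (len < 12) {
        return -1;
    }
    out->service = be32_at(buf);
    out->nargs = be32_at(buf + 4);
    out->nret = be32_at(buf + 8);
    cells = (size_t)out->nargs + out->nret;
    if (cells > sizeof(out->args) / sizeof(out->args[0]) ||
        len < 12 + cells * 4) {
        return -1;
    }
    for (size_t i = 0; i < cells; i++) {
        out->args[i] = be32_at(buf + 12 + i * 4);
    }
    return 0;
}
```

The return cells are decoded too because, in the OF CI convention, the firmware writes its results back into those slots of the same array.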

The new "vof.bin" firmware image is included with its source code under
pc-bios/. It also includes the RTAS blob.

This implements a handful of CI methods, just enough to get -kernel/-initrd
working. In particular, it implements device tree fetching and a simple
memory allocator, "claim" (an OF CI memory allocation method), and updates
"/memory@0/available" to report the available memory to the client.

This implements changing some device tree properties which we know how
to deal with; the rest are ignored. To allow changes, this skips
fdt_pack() when x-vof=on, as an unpacked blob leaves some room for
appending.

In the absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversal work.
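One plausible way to assign phandles is sketched below: walk the nodes and give each node that lacks one a fresh value, numbering upward from the largest phandle already present so that predefined phandles (and references to them) stay valid. The flat `DtNode` array stands in for a real flattened device tree and is hypothetical; this is not claimed to be the exact scheme the commit's vof_build_dt() uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of device tree nodes in traversal order. */
typedef struct {
    const char *path;
    uint32_t phandle;   /* 0 means "no phandle property yet" */
} DtNode;

/*
 * Give every node without a phandle a fresh one, numbering upward from
 * the largest phandle already present so existing references stay valid.
 * Returns the number of phandles assigned.
 */
static int assign_phandles(DtNode *nodes, int count)
{
    uint32_t next = 0;
    int assigned = 0;

    /* First pass: find the highest predefined phandle. */
    for (int i = 0; i < count; i++) {
        if (nodes[i].phandle > next) {
            next = nodes[i].phandle;
        }
    }
    /* Second pass: number the remaining nodes in traversal order. */
    for (int i = 0; i < count; i++) {
        if (nodes[i].phandle == 0) {
            nodes[i].phandle = ++next;
            assigned++;
        }
    }
    return assigned;
}
```

For example, if an interrupt controller node already carries phandle 0x1000, the other nodes receive 0x1001, 0x1002 and so on in traversal order, and the interrupt controller keeps its value.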

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic support for instances, which are managed by a hash map
ihandle -> [phandle].
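The instance bookkeeping can be pictured as a small hash map keyed by ihandle. The sketch below is a hypothetical stand-alone version (fixed-size open addressing, sequential ihandles, no close) and not the commit's code; `inst_open()` and `inst_to_package()` are illustrative names modeled on the OF CI "open" and "instance-to-package" services.

```c
#include <assert.h>
#include <stdint.h>

#define INST_SLOTS 64u   /* hypothetical capacity; must be a power of two */

typedef struct {
    uint32_t ihandle;    /* 0 marks an empty slot */
    uint32_t phandle;    /* node the instance was opened on */
} InstSlot;

typedef struct {
    InstSlot slots[INST_SLOTS];
    uint32_t last_ihandle;   /* ihandles are handed out sequentially */
} InstTable;

/* Multiplicative hash of an ihandle onto a slot index. */
static uint32_t inst_slot(uint32_t ihandle)
{
    return (uint32_t)(ihandle * 2654435761u) & (INST_SLOTS - 1);
}

/* Create an instance of phandle; return its ihandle, or 0 if the table is full. */
static uint32_t inst_open(InstTable *t, uint32_t phandle)
{
    uint32_t ih = ++t->last_ihandle;

    for (uint32_t probe = 0; probe < INST_SLOTS; probe++) {
        InstSlot *s = &t->slots[(inst_slot(ih) + probe) & (INST_SLOTS - 1)];

        if (s->ihandle == 0) {
            s->ihandle = ih;
            s->phandle = phandle;
            return ih;
        }
    }
    return 0;
}

/* Map an ihandle back to the phandle it was opened on (0 if unknown). */
static uint32_t inst_to_package(const InstTable *t, uint32_t ihandle)
{
    if (ihandle == 0) {
        return 0;
    }
    for (uint32_t probe = 0; probe < INST_SLOTS; probe++) {
        const InstSlot *s =
            &t->slots[(inst_slot(ihandle) + probe) & (INST_SLOTS - 1)];

        if (s->ihandle == ihandle) {
            return s->phandle;
        }
        if (s->ihandle == 0) {
            return 0;   /* no deletions, so an empty slot ends the probe chain */
        }
    }
    return 0;
}
```

A real implementation would also need to handle closing instances; with deletions, the early exit on an empty slot would require tombstones or a different table design.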

Before the guest starts, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel
3ea0000.. - initramdisk
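The allocation map above can be reproduced with a small first-fit claim allocator. A minimal sketch, assuming a 0x8000-byte stack and a claimable-memory limit chosen only for illustration. The claim convention (an exact claim when the alignment is 0, an aligned search otherwise, -1 on failure) follows how spapr_vof_reset() calls vof_claim() in this commit; `Claimer`, `claim()`, `available()` and the fixed-size sorted array are hypothetical and not the vof.c implementation. `available()` lists the gaps between claims, which is the kind of information "/memory@0/available" reports.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CLAIMS 16                 /* hypothetical fixed capacity */
#define CLAIM_FAIL ((uint64_t)-1)     /* failure value, like vof_claim()'s -1 */

typedef struct {
    uint64_t start, size;
} Range;

typedef struct {
    Range claimed[MAX_CLAIMS];        /* kept sorted by start address */
    int n;
    uint64_t limit;                   /* top of claimable memory (assumed) */
} Claimer;

/* True if [start, start + size) intersects any claimed range. */
static int overlaps(const Claimer *c, uint64_t start, uint64_t size)
{
    for (int i = 0; i < c->n; i++) {
        const Range *r = &c->claimed[i];

        if (start < r->start + r->size && r->start < start + size) {
            return 1;
        }
    }
    return 0;
}

/* Record a range, keeping the array sorted (insertion step). */
static void record(Claimer *c, uint64_t start, uint64_t size)
{
    int i = c->n++;

    while (i > 0 && c->claimed[i - 1].start > start) {
        c->claimed[i] = c->claimed[i - 1];
        i--;
    }
    c->claimed[i].start = start;
    c->claimed[i].size = size;
}

/*
 * align == 0: claim exactly [virt, virt + size).
 * align != 0: first fit at the lowest free multiple of align.
 */
static uint64_t claim(Claimer *c, uint64_t virt, uint64_t size, uint64_t align)
{
    if (c->n == MAX_CLAIMS || size == 0) {
        return CLAIM_FAIL;
    }
    if (align == 0) {
        if (virt + size > c->limit || overlaps(c, virt, size)) {
            return CLAIM_FAIL;
        }
        record(c, virt, size);
        return virt;
    }
    for (uint64_t a = 0; a + size <= c->limit; a += align) {
        if (!overlaps(c, a, size)) {
            record(c, a, size);
            return a;
        }
    }
    return CLAIM_FAIL;
}

/* The gaps between claimed ranges, i.e. what "available" would list. */
static int available(const Claimer *c, Range *out, int max_out)
{
    uint64_t cursor = 0;
    int k = 0;

    for (int i = 0; i < c->n && k < max_out; i++) {
        if (c->claimed[i].start > cursor) {
            out[k].start = cursor;
            out[k].size = c->claimed[i].start - cursor;
            k++;
        }
        cursor = c->claimed[i].start + c->claimed[i].size;
    }
    if (cursor < c->limit && k < max_out) {
        out[k].start = cursor;
        out[k].size = c->limit - cursor;
        k++;
    }
    return k;
}
```

Claiming the firmware at 0 (0xe60 bytes) and then the stack with 0x8000 alignment skips the occupied first window and returns 0x8000, matching the 8000..10000 entry. With an assumed 16 MiB kernel at 0x400000 and a limit of 0x10000000, `available()` then reports e60..8000, 10000..400000 and 1400000..10000000. Following spapr_vof_reset(), the initial stack pointer is the top of the stack claim minus a 0x20-byte minimum frame: 0x8000 + 0x8000 - 0x20 = 0xffe0.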

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, it can only boot directly into a kernel.
However, this is just enough for the petitboot kernel and initramdisk to
boot from any possible source. Note that this requires a reasonably recent
guest kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is a much faster boot time, which is especially
crucial in fully emulated early CPU bring-up environments. This may also
come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in the hope that the VOF bits may be reused
by other PowerPC boards which do not support pSeries.
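The split works through an interface class: generic VOF code calls back into the board through hooks, which spapr fills in with spapr_vof_client_architecture_support(), spapr_vof_quiesce() and spapr_vof_setprop() in spapr_machine_class_init(). The pattern can be sketched in plain C as a table of function pointers. `Board`, `FwBoardOps`, `fw_setprop()`, `fw_quiesce()` and the demo hooks are hypothetical stand-ins for QEMU's QOM types, and only two of the three hooks are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical board handle; stands in for QEMU's MachineState. */
typedef struct Board Board;

/* Hooks a board provides to the generic firmware code. */
typedef struct {
    void (*quiesce)(Board *b);
    bool (*setprop)(Board *b, const char *path, const char *propname,
                    const void *val, int vallen);
} FwBoardOps;

struct Board {
    const FwBoardOps *ops;
    int quiesced;       /* illustrative state touched by the quiesce hook */
};

/* Generic side (sketch): apply a property change only if the board accepts it. */
static bool fw_setprop(Board *b, const char *path, const char *propname,
                       const void *val, int vallen)
{
    return b->ops->setprop && b->ops->setprop(b, path, propname, val, vallen);
}

/* Generic side (sketch): let the board sync its state before handover. */
static void fw_quiesce(Board *b)
{
    if (b->ops->quiesce) {
        b->ops->quiesce(b);
    }
}

/* Example board: accepts only properties under /chosen. */
static bool demo_setprop(Board *b, const char *path, const char *propname,
                         const void *val, int vallen)
{
    (void)b; (void)propname; (void)val; (void)vallen;
    return strcmp(path, "/chosen") == 0;
}

static void demo_quiesce(Board *b)
{
    b->quiesced = 1;
}

static const FwBoardOps demo_ops = { demo_quiesce, demo_setprop };
```

Keeping the board-specific policy behind the hook table is what would let another PowerPC board reuse the generic code by supplying its own `FwBoardOps`-style table.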

This also leaves room for potential support for booting from QEMU
backends such as blockdev or netdev without using devices/drivers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Adjusted some includes which broke compile in some more obscure
 compilation setups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
aik authored and dgibson committed Jul 9, 2021
1 parent ea41397 commit fc8c745
Showing 22 changed files with 1,801 additions and 13 deletions.
12 changes: 12 additions & 0 deletions MAINTAINERS
@@ -1360,6 +1360,18 @@ F: hw/pci-host/mv64361.c
F: hw/pci-host/mv643xx.h
F: include/hw/pci-host/mv64361.h

Virtual Open Firmware (VOF)
M: Alexey Kardashevskiy <aik@ozlabs.ru>
M: David Gibson <david@gibson.dropbear.id.au>
M: Greg Kurz <groug@kaod.org>
L: qemu-ppc@nongnu.org
S: Maintained
F: hw/ppc/spapr_vof*
F: hw/ppc/vof*
F: include/hw/ppc/vof*
F: pc-bios/vof/*
F: pc-bios/vof*

RISC-V Machines
---------------
OpenTitan
4 changes: 4 additions & 0 deletions hw/ppc/Kconfig
@@ -13,6 +13,7 @@ config PSERIES
select MSI_NONBROKEN
select FDT_PPC
select CHRP_NVRAM
select VOF

config SPAPR_RNG
bool
@@ -144,3 +145,6 @@ config FW_CFG_PPC

config FDT_PPC
bool

config VOF
bool
3 changes: 3 additions & 0 deletions hw/ppc/meson.build
@@ -84,4 +84,7 @@ ppc_ss.add(when: 'CONFIG_VIRTEX', if_true: files('virtex_ml507.c'))
# Pegasos2
ppc_ss.add(when: 'CONFIG_PEGASOS2', if_true: files('pegasos2.c'))

ppc_ss.add(when: 'CONFIG_VOF', if_true: files('vof.c'))
ppc_ss.add(when: ['CONFIG_VOF', 'CONFIG_PSERIES'], if_true: files('spapr_vof.c'))

hw_arch += {'ppc': ppc_ss}
67 changes: 60 additions & 7 deletions hw/ppc/spapr.c
@@ -101,6 +101,7 @@
#define FDT_MAX_ADDR 0x80000000 /* FDT must stay below that */
#define FW_MAX_SIZE 0x400000
#define FW_FILE_NAME "slof.bin"
#define FW_FILE_NAME_VOF "vof.bin"
#define FW_OVERHEAD 0x2800000
#define KERNEL_LOAD_ADDR FW_MAX_SIZE

@@ -1643,22 +1644,37 @@ static void spapr_machine_reset(MachineState *machine)
fdt_addr = MIN(spapr->rma_size, FDT_MAX_ADDR) - FDT_MAX_SIZE;

fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE);
if (spapr->vof) {
target_ulong stack_ptr = 0;

rc = fdt_pack(fdt);
spapr_vof_reset(spapr, fdt, &stack_ptr, &error_fatal);

/* Should only fail if we've built a corrupted tree */
assert(rc == 0);
spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
stack_ptr, spapr->initrd_base,
spapr->initrd_size);
/* VOF is 32bit BE so enforce MSR here */
first_ppc_cpu->env.msr &= ~((1ULL << MSR_SF) | (1ULL << MSR_LE));
/*
* Do not pack the FDT as the client may change properties.
* VOF client does not expect the FDT so we do not load it to the VM.
*/
} else {
rc = fdt_pack(fdt);
/* Should only fail if we've built a corrupted tree */
assert(rc == 0);

/* Load the fdt */
spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
0, fdt_addr, 0);
cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
}
qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));

g_free(spapr->fdt_blob);
spapr->fdt_size = fdt_totalsize(fdt);
spapr->fdt_initial_size = spapr->fdt_size;
spapr->fdt_blob = fdt;

/* Set up the entry state */
spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, 0, fdt_addr, 0);
first_ppc_cpu->env.gpr[5] = 0;

spapr->fwnmi_system_reset_addr = -1;
@@ -2661,7 +2677,8 @@ static void spapr_machine_init(MachineState *machine)
SpaprMachineState *spapr = SPAPR_MACHINE(machine);
SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
MachineClass *mc = MACHINE_GET_CLASS(machine);
const char *bios_name = machine->firmware ?: FW_FILE_NAME;
const char *bios_default = spapr->vof ? FW_FILE_NAME_VOF : FW_FILE_NAME;
const char *bios_name = machine->firmware ?: bios_default;
const char *kernel_filename = machine->kernel_filename;
const char *initrd_filename = machine->initrd_filename;
PCIHostState *phb;
@@ -3018,6 +3035,10 @@ static void spapr_machine_init(MachineState *machine)
}

qemu_cond_init(&spapr->fwnmi_machine_check_interlock_cond);
if (spapr->vof) {
spapr->vof->fw_size = fw_size; /* for claim() on itself */
spapr_register_hypercall(KVMPPC_H_VOF_CLIENT, spapr_h_vof_client);
}
}

#define DEFAULT_KVM_TYPE "auto"
@@ -3208,6 +3229,28 @@ static void spapr_set_resize_hpt(Object *obj, const char *value, Error **errp)
}
}

static bool spapr_get_vof(Object *obj, Error **errp)
{
SpaprMachineState *spapr = SPAPR_MACHINE(obj);

return spapr->vof != NULL;
}

static void spapr_set_vof(Object *obj, bool value, Error **errp)
{
SpaprMachineState *spapr = SPAPR_MACHINE(obj);

if (spapr->vof) {
vof_cleanup(spapr->vof);
g_free(spapr->vof);
spapr->vof = NULL;
}
if (!value) {
return;
}
spapr->vof = g_malloc0(sizeof(*spapr->vof));
}

static char *spapr_get_ic_mode(Object *obj, Error **errp)
{
SpaprMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3333,6 +3376,11 @@ static void spapr_instance_init(Object *obj)
stringify(KERNEL_LOAD_ADDR)
" for -kernel is the default");
spapr->kernel_addr = KERNEL_LOAD_ADDR;

object_property_add_bool(obj, "x-vof", spapr_get_vof, spapr_set_vof);
object_property_set_description(obj, "x-vof",
"Enable Virtual Open Firmware (experimental)");

/* The machine class defines the default interrupt controller mode */
spapr->irq = smc->irq;
object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
@@ -4496,6 +4544,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
VofMachineIfClass *vmc = VOF_MACHINE_CLASS(oc);

mc->desc = "pSeries Logical Partition (PAPR compliant)";
mc->ignore_boot_device_suffixes = true;
@@ -4584,6 +4633,9 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
smc->smp_threads_vsmt = true;
smc->nr_xirqs = SPAPR_NR_XIRQS;
xfc->match_nvt = spapr_match_nvt;
vmc->client_architecture_support = spapr_vof_client_architecture_support;
vmc->quiesce = spapr_vof_quiesce;
vmc->setprop = spapr_vof_setprop;
}

static const TypeInfo spapr_machine_info = {
@@ -4603,6 +4655,7 @@ static const TypeInfo spapr_machine_info = {
{ TYPE_XICS_FABRIC },
{ TYPE_INTERRUPT_STATS_PROVIDER },
{ TYPE_XIVE_FABRIC },
{ TYPE_VOF_MACHINE_IF },
{ }
},
};
25 changes: 22 additions & 3 deletions hw/ppc/spapr_hcall.c
@@ -1080,7 +1080,7 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
SpaprOptionVector *ov1_guest, *ov5_guest;
bool guest_radix;
bool raw_mode_supported = false;
bool guest_xive;
bool guest_xive, reset_fdt = false;
CPUState *cs;
void *fdt;
uint32_t max_compat = spapr->max_compat_pvr;
@@ -1233,8 +1233,8 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
spapr_setup_hpt(spapr);
}

fdt = spapr_build_fdt(spapr, false, fdt_bufsize);

reset_fdt = spapr->vof != NULL;
fdt = spapr_build_fdt(spapr, reset_fdt, fdt_bufsize);
g_free(spapr->fdt_blob);
spapr->fdt_size = fdt_totalsize(fdt);
spapr->fdt_initial_size = spapr->fdt_size;
@@ -1277,6 +1277,25 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
return ret;
}

target_ulong spapr_vof_client_architecture_support(MachineState *ms,
CPUState *cs,
target_ulong ovec_addr)
{
SpaprMachineState *spapr = SPAPR_MACHINE(ms);

target_ulong ret = do_client_architecture_support(POWERPC_CPU(cs), spapr,
ovec_addr, FDT_MAX_SIZE);

/*
* This adds stdout and generates phandles for boottime and CAS FDTs.
* It is alright to update the FDT here as do_client_architecture_support()
* does not pack it.
*/
spapr_vof_client_dt_finalize(spapr, spapr->fdt_blob);

return ret;
}

static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu,
SpaprMachineState *spapr,
target_ulong opcode,
153 changes: 153 additions & 0 deletions hw/ppc/spapr_vof.c
@@ -0,0 +1,153 @@
/*
* SPAPR machine hooks to Virtual Open Firmware,
*
* SPDX-License-Identifier: GPL-2.0-or-later
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "hw/ppc/spapr.h"
#include "hw/ppc/spapr_vio.h"
#include "hw/ppc/fdt.h"
#include "hw/ppc/vof.h"
#include "sysemu/sysemu.h"
#include "qom/qom-qobject.h"
#include "trace.h"

target_ulong spapr_h_vof_client(PowerPCCPU *cpu, SpaprMachineState *spapr,
target_ulong opcode, target_ulong *_args)
{
int ret = vof_client_call(MACHINE(spapr), spapr->vof, spapr->fdt_blob,
ppc64_phys_to_real(_args[0]));

if (ret) {
return H_PARAMETER;
}
return H_SUCCESS;
}

void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt)
{
char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
int chosen;

vof_build_dt(fdt, spapr->vof);

_FDT(chosen = fdt_path_offset(fdt, "/chosen"));
_FDT(fdt_setprop_string(fdt, chosen, "bootargs",
spapr->vof->bootargs ? : ""));

/*
* SLOF-less setup requires an open instance of stdout for early
* kernel printk. By now all phandles are settled so we can open
* the default serial console.
*/
if (stdout_path) {
_FDT(vof_client_open_store(fdt, spapr->vof, "/chosen", "stdout",
stdout_path));
}
}

void spapr_vof_reset(SpaprMachineState *spapr, void *fdt,
target_ulong *stack_ptr, Error **errp)
{
Vof *vof = spapr->vof;

vof_init(vof, spapr->rma_size, errp);

*stack_ptr = vof_claim(vof, 0, VOF_STACK_SIZE, VOF_STACK_SIZE);
if (*stack_ptr == -1) {
error_setg(errp, "Memory allocation for stack failed");
return;
}
/* Stack grows downwards plus reserve space for the minimum stack frame */
*stack_ptr += VOF_STACK_SIZE - 0x20;

if (spapr->kernel_size &&
vof_claim(vof, spapr->kernel_addr, spapr->kernel_size, 0) == -1) {
error_setg(errp, "Memory for kernel is in use");
return;
}

if (spapr->initrd_size &&
vof_claim(vof, spapr->initrd_base, spapr->initrd_size, 0) == -1) {
error_setg(errp, "Memory for initramdisk is in use");
return;
}

spapr_vof_client_dt_finalize(spapr, fdt);

/*
* At this point the expected allocation map is:
*
* 0..c38 - the initial firmware
* 8000..10000 - stack
* 400000.. - kernel
* 3ea0000.. - initramdisk
*
* We skip writing FDT as nothing expects it; OF client interface is
* going to be used for reading the device tree.
*/
}

void spapr_vof_quiesce(MachineState *ms)
{
SpaprMachineState *spapr = SPAPR_MACHINE(ms);

spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
spapr->fdt_initial_size = spapr->fdt_size;
}

bool spapr_vof_setprop(MachineState *ms, const char *path, const char *propname,
void *val, int vallen)
{
SpaprMachineState *spapr = SPAPR_MACHINE(ms);

/*
* We only allow changing properties which we know how to update in QEMU
* OR
* the ones which we know that they need to survive during "quiesce".
*/

if (strcmp(path, "/rtas") == 0) {
if (strcmp(propname, "linux,rtas-base") == 0 ||
strcmp(propname, "linux,rtas-entry") == 0) {
/* These need to survive quiesce so let them store in the FDT */
return true;
}
}

if (strcmp(path, "/chosen") == 0) {
if (strcmp(propname, "bootargs") == 0) {
Vof *vof = spapr->vof;

g_free(vof->bootargs);
vof->bootargs = g_strndup(val, vallen);
return true;
}
if (strcmp(propname, "linux,initrd-start") == 0) {
if (vallen == sizeof(uint32_t)) {
spapr->initrd_base = ldl_be_p(val);
return true;
}
if (vallen == sizeof(uint64_t)) {
spapr->initrd_base = ldq_be_p(val);
return true;
}
return false;
}
if (strcmp(propname, "linux,initrd-end") == 0) {
if (vallen == sizeof(uint32_t)) {
spapr->initrd_size = ldl_be_p(val) - spapr->initrd_base;
return true;
}
if (vallen == sizeof(uint64_t)) {
spapr->initrd_size = ldq_be_p(val) - spapr->initrd_base;
return true;
}
return false;
}
}

return true;
}
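spapr_vof_setprop() accepts linux,initrd-start and linux,initrd-end as either a 32-bit or a 64-bit big-endian value. A portable sketch of that decoding, with `load_be()` standing in for QEMU's internal ldl_be_p()/ldq_be_p() helpers; `decode_addr_prop()` and `initrd_size_from_props()` are hypothetical names, not part of the commit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for ldl_be_p()/ldq_be_p(): read an n-byte BE integer. */
static uint64_t load_be(const uint8_t *p, int n)
{
    uint64_t v = 0;

    for (int i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Decode an address property that is either one 32-bit cell or a
 * 64-bit value. Returns false for any other length, like the
 * "return false" paths in spapr_vof_setprop().
 */
static bool decode_addr_prop(const void *val, int vallen, uint64_t *out)
{
    if (vallen != 4 && vallen != 8) {
        return false;
    }
    *out = load_be(val, vallen);
    return true;
}

/* initrd size derived from linux,initrd-start and linux,initrd-end. */
static bool initrd_size_from_props(const void *start, int start_len,
                                   const void *end, int end_len,
                                   uint64_t *size)
{
    uint64_t s, e;

    if (!decode_addr_prop(start, start_len, &s) ||
        !decode_addr_prop(end, end_len, &e) || e < s) {
        return false;
    }
    *size = e - s;
    return true;
}
```

The commit's code handles the two properties in separate setprop calls and subtracts whatever spapr->initrd_base was stored last, so the computed size is only right if linux,initrd-start is set before linux,initrd-end. The sketch takes both values together (and rejects end < start) to avoid that ordering dependency.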
