2 changes: 2 additions & 0 deletions Cargo.lock
Original file line number Diff line number Diff line change
Expand Up @@ -5178,6 +5178,7 @@ dependencies = [
"chipset_device_resources",
"chipset_device_worker",
"chipset_legacy",
"crc32fast",
"debug_ptr",
"disk_backend",
"fdt",
Expand Down Expand Up @@ -5226,6 +5227,7 @@ dependencies = [
"thiserror 2.0.16",
"tracing",
"uefi_nvram_storage",
"uefi_specs",
"virt",
"virt_hvf",
"virt_kvm",
Expand Down
8 changes: 4 additions & 4 deletions Guide/src/SUMMARY.md
Expand Up @@ -86,11 +86,11 @@
- [Developer Features]()
- [Hardware Debugging (gdbstub)](./reference/dev_feats/gdbstub.md)
- [Kernel Debugging (KDNET)](./reference/dev_feats/kdnet.md)
- [Firmware and Boot Modes](./reference/devices/firmware/overview.md)
- [UEFI: mu_msvm](./reference/devices/firmware/mu_msvm_uefi.md)
- [BIOS: Hyper-V PCAT BIOS](./reference/devices/firmware/pcat_bios.md)
- [Linux Direct](./reference/devices/firmware/linux_direct.md)
- [Devices]()
- [Firmware]()
- [UEFI: mu_msvm](./reference/devices/firmware/mu_msvm_uefi.md)
- [BIOS: Hyper-V PCAT BIOS](./reference/devices/firmware/pcat_bios.md)
- [Linux Direct]()
- [Virtio]()
- [virtio-fs]()
- [virtio-9p]()
Expand Down
100 changes: 100 additions & 0 deletions Guide/src/reference/devices/firmware/linux_direct.md
@@ -0,0 +1,100 @@
# Linux Direct Boot

Linux direct boot allows OpenVMM to load a Linux kernel directly into guest
memory without UEFI or BIOS firmware. The VMM itself acts as the bootloader:
it parses the kernel image, places the initrd, constructs the necessary boot
metadata, sets the initial register state, and starts execution at the kernel
entry point.

This is the fastest path from "run" to a Linux userspace prompt, and is
useful for lightweight testing and development scenarios.

## Architecture Support

| Architecture | Supported | Kernel format | Boot protocol |
|-------------|-----------|---------------|---------------|
| x86_64 | Yes | Uncompressed ELF (`vmlinux`) | Linux boot protocol (zero page) |
| AArch64 | Yes | ARM64 `Image` (flat binary) | ARM64 Image boot (device tree or ACPI) |

Compressed kernels (bzImage, gzip, etc.) are not supported. On x86_64,
pass the uncompressed `vmlinux` ELF. On AArch64, pass the uncompressed
`Image` file (not `Image.gz`).
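
The formats above can be told apart by their well-known magic numbers. The following is a minimal sketch of such a check, not OpenVMM's actual loader code; `KernelFormat` and `classify_kernel` are hypothetical names introduced for illustration.

```rust
/// Kernel image formats distinguished by this sketch (illustrative names,
/// not types from OpenVMM).
#[derive(Debug, PartialEq, Eq)]
pub enum KernelFormat {
    /// Uncompressed ELF, e.g. `vmlinux`.
    Elf,
    /// Uncompressed ARM64 `Image`.
    Arm64Image,
    /// x86 bzImage (compressed, with a real-mode setup header).
    BzImage,
    /// gzip stream, e.g. `Image.gz`.
    Gzip,
    Unknown,
}

/// Classify a kernel image by its magic bytes.
pub fn classify_kernel(image: &[u8]) -> KernelFormat {
    // ELF files start with 0x7f 'E' 'L' 'F'.
    if image.starts_with(&[0x7f, b'E', b'L', b'F']) {
        return KernelFormat::Elf;
    }
    // gzip streams start with 0x1f 0x8b.
    if image.starts_with(&[0x1f, 0x8b]) {
        return KernelFormat::Gzip;
    }
    // The arm64 Image header stores the magic "ARM\x64" at byte offset 56.
    if image.get(56..60) == Some(&b"ARM\x64"[..]) {
        return KernelFormat::Arm64Image;
    }
    // The x86 setup header stores the magic "HdrS" at byte offset 0x202.
    if image.get(0x202..0x206) == Some(&b"HdrS"[..]) {
        return KernelFormat::BzImage;
    }
    KernelFormat::Unknown
}
```

A caller could use a check like this to turn "wrong file passed to `--kernel`" into a specific error message, for example suggesting `vmlinux` when a bzImage is detected.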

## x86_64 Boot Flow

On x86_64, OpenVMM follows the standard Linux boot protocol:

1. The kernel image is loaded at the conventional 1 MB address.
2. An initrd (if provided) is placed after the kernel.
3. A **zero page** is constructed containing the memory map, command line
pointer, and initrd location.
4. ACPI tables (MADT, FADT, DSDT, SRAT, etc.) are built by OpenVMM's ACPI
builder and written at `0xE0000`, where the kernel finds the RSDP via
its standard firmware scan.
5. A GDT and initial page tables are set up.
6. The BSP register state is configured and execution begins.

The DSDT includes whatever x86 chipset devices are configured (serial ports,
IOAPIC, PCI bus, VMBus, virtio-mmio, RTC, etc.).
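
For context on step 4: on legacy-BIOS x86 systems, the ACPI specification has the OS search for the RSDP on 16-byte boundaries (in the BIOS read-only area `0xE0000`–`0xFFFFF`, among other places), matching the `"RSD PTR "` signature and validating the checksum over the first 20 bytes. The sketch below illustrates that guest-side scan. It is not code from OpenVMM or the Linux kernel, and `find_rsdp` is a hypothetical helper.

```rust
/// ACPI RSDP signature (note the trailing space).
const RSDP_SIGNATURE: &[u8; 8] = b"RSD PTR ";

/// Scan `region` for an RSDP on 16-byte boundaries.
///
/// `region` holds the bytes of guest memory starting at physical address
/// `base` (assumed to be 16-byte aligned, e.g. 0xE0000). Returns the
/// physical address of the first valid RSDP, if any.
pub fn find_rsdp(region: &[u8], base: u64) -> Option<u64> {
    let mut offset = 0;
    // The ACPI 1.0 portion of the RSDP is 20 bytes long.
    while offset + 20 <= region.len() {
        let candidate = &region[offset..offset + 20];
        // All 20 bytes, including the checksum byte, must sum to zero.
        let sum = candidate.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
        if candidate.starts_with(RSDP_SIGNATURE) && sum == 0 {
            return Some(base + offset as u64);
        }
        offset += 16;
    }
    None
}
```

Because the tables are written at `0xE0000`, a scan of this kind finds the RSDP at the start of the region without any firmware involvement.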

## AArch64 Boot Flow

On AArch64, OpenVMM supports two modes for presenting hardware descriptions to
the kernel, selected by the `--device-tree` CLI flag:

### ACPI Mode (default)

In ACPI mode, the kernel discovers devices through ACPI tables, just as
it would on a server with UEFI firmware.

Since the ARM64 kernel's ACPI code path requires entering through the EFI stub,
OpenVMM synthesizes a minimal set of EFI structures in guest memory:

1. **EFI System Table** — points to a configuration table with the ACPI RSDP
and an RT Properties entry that advertises no runtime services.
2. **EFI Memory Map** — describes the EFI metadata region, ACPI tables, and
conventional RAM.
3. **ACPI Tables** — FADT (with `HW_REDUCED_ACPI`), MADT (GICv3
redistributors, distributor, optional v2m MSI frame), GTDT (virtual timer),
DSDT (VMBus, serial UARTs), and optionally MCFG/SSDT for PCIe.

A **stub device tree** is then built. Unlike a full device tree, it contains
no hardware nodes — no CPUs, GIC, timer, or devices. Its only purpose is a
`/chosen` node with `linux,uefi-system-table` and `linux,uefi-mmap-*`
properties that point the kernel's EFI stub to the synthesized EFI structures.
From there, the kernel follows its standard ACPI discovery path.
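
The `/chosen` properties can be sketched as follows. The property names and widths follow the Linux kernel's documented UEFI device tree bindings (a 64-bit system table and memory map address, 32-bit size fields). The `EfiHandoff` struct and `chosen_properties` function are hypothetical and do not reflect OpenVMM's FDT builder API.

```rust
/// Addresses and sizes of the synthesized EFI structures (hypothetical
/// struct for illustration).
pub struct EfiHandoff {
    /// Guest physical address of the EFI System Table.
    pub system_table: u64,
    /// Guest physical address of the EFI memory map.
    pub mmap_start: u64,
    /// Total size of the memory map in bytes.
    pub mmap_size: u32,
    /// Size of one memory descriptor in bytes.
    pub mmap_desc_size: u32,
    /// Memory descriptor format version.
    pub mmap_desc_ver: u32,
}

/// Build the `/chosen` properties as (name, value) pairs.
/// Device tree property values are encoded big-endian.
pub fn chosen_properties(h: &EfiHandoff) -> Vec<(&'static str, Vec<u8>)> {
    vec![
        ("linux,uefi-system-table", h.system_table.to_be_bytes().to_vec()),
        ("linux,uefi-mmap-start", h.mmap_start.to_be_bytes().to_vec()),
        ("linux,uefi-mmap-size", h.mmap_size.to_be_bytes().to_vec()),
        ("linux,uefi-mmap-desc-size", h.mmap_desc_size.to_be_bytes().to_vec()),
        ("linux,uefi-mmap-desc-ver", h.mmap_desc_ver.to_be_bytes().to_vec()),
    ]
}
```

Keeping the stub tree down to these properties means the kernel has no device tree hardware description to fall back on, so device discovery goes through ACPI.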

```admonish tip title="When to use ACPI mode"
ACPI mode is the default and is recommended when running with the
Hyper-V hypervisor (`--hv`). Device tree mode also supports VMBus
(with recent kernels and hypervisor versions), but ACPI mode provides
broader compatibility.
```

### Device Tree Mode (`--device-tree`)

In this mode, a full device tree is built describing all hardware
directly — CPUs, interrupt controller, timers, serial ports, VMBus,
PCIe bridges, and memory regions. The kernel discovers everything
from the DT; no EFI structures or ACPI tables are involved.

```admonish note
Device tree mode is not supported on x86_64. Passing `--device-tree` on x86
will result in an error.
```

## CLI Usage

```bash
# x86_64 Linux direct boot
openvmm --kernel path/to/vmlinux --initrd path/to/initrd \
    --cmdline "console=ttyS0"

# AArch64 ACPI mode (default)
openvmm --kernel path/to/Image --initrd path/to/initrd \
    --cmdline "console=ttyAMA0 earlycon"

# AArch64 device tree mode
openvmm --kernel path/to/Image --initrd path/to/initrd \
    --cmdline "console=ttyAMA0 earlycon" --device-tree
```
20 changes: 20 additions & 0 deletions Guide/src/reference/devices/firmware/overview.md
@@ -0,0 +1,20 @@
# Firmware and Boot Modes

OpenVMM supports several ways to boot a guest VM, each with different
firmware requirements and guest OS compatibility:

| Boot mode | Architecture | Firmware | Use case |
|-----------|-------------|----------|----------|
| **UEFI** | x86_64, AArch64 | [mu_msvm](./mu_msvm_uefi.md) | Windows, modern Linux, full UEFI environment |
| **PCAT BIOS** | x86_64 | [Hyper-V PCAT BIOS](./pcat_bios.md) | Legacy OS, Gen1-style boot |
| **Linux Direct** | x86_64, AArch64 | None (VMM is the bootloader) | [Fast Linux boot](./linux_direct.md), development, testing |
| **IGVM** | x86_64, AArch64 | Packaged in IGVM file | OpenHCL paravisor, confidential VMs |

The boot mode is selected by which of the `--kernel`, `--uefi`, `--pcat`, or
`--igvm` flags is passed on the command line (or by the equivalent ttrpc
configuration).

```admonish note
Not all boot modes are available on all architectures — see the table
above for supported combinations.
```
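
The selection rules in the table can be sketched as a small validation function. This is an illustration only: `BootFlags`, `Arch`, `BootMode`, and `select_boot_mode` are hypothetical names, and the handling of zero or multiple boot flags is an assumption rather than documented OpenVMM behavior.

```rust
/// Which boot-related flags were passed (hypothetical representation).
#[derive(Default)]
pub struct BootFlags {
    pub kernel: bool,
    pub uefi: bool,
    pub pcat: bool,
    pub igvm: bool,
    pub device_tree: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    X86_64,
    Aarch64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BootMode {
    LinuxDirect { device_tree: bool },
    Uefi,
    Pcat,
    Igvm,
}

/// Pick a boot mode from the flags, rejecting unsupported combinations.
pub fn select_boot_mode(arch: Arch, flags: &BootFlags) -> Result<BootMode, String> {
    // Assumption: exactly one of the four boot flags must be present.
    let selected = [flags.kernel, flags.uefi, flags.pcat, flags.igvm]
        .iter()
        .filter(|set| **set)
        .count();
    if selected != 1 {
        return Err(format!("expected exactly one boot flag, got {selected}"));
    }
    // Device tree mode applies only to Linux direct boot on AArch64.
    if flags.device_tree && !(flags.kernel && arch == Arch::Aarch64) {
        return Err("--device-tree requires --kernel on AArch64".to_string());
    }
    // PCAT BIOS is x86_64-only per the table above.
    if flags.pcat && arch != Arch::X86_64 {
        return Err("--pcat is only supported on x86_64".to_string());
    }
    Ok(if flags.kernel {
        BootMode::LinuxDirect { device_tree: flags.device_tree }
    } else if flags.uefi {
        BootMode::Uefi
    } else if flags.pcat {
        BootMode::Pcat
    } else {
        BootMode::Igvm
    })
}
```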
49 changes: 30 additions & 19 deletions openhcl/bootloader_fdt_parser/src/lib.rs
Expand Up @@ -18,7 +18,7 @@ use inspect::Inspect;
use loader_defs::shim::MemoryVtlType;
use memory_range::MemoryRange;
use vm_topology::memory::MemoryRangeWithNode;
use vm_topology::processor::aarch64::GicInfo;
use vm_topology::processor::aarch64::Aarch64PlatformConfig;

/// A parsed cpu.
#[derive(Debug, Inspect, Clone, Copy, PartialEq, Eq)]
Expand Down Expand Up @@ -173,10 +173,8 @@ pub struct ParsedBootDtInfo {
#[inspect(iter_by_index)]
pub private_pool_ranges: Vec<MemoryRangeWithNode>,

/// GIC information, on AArch64.
pub gic: Option<GicInfo>,
/// PMU GSIV, on AArch64.
pub pmu_gsiv: Option<u32>,
/// GIC and platform interrupt configuration, on AArch64.
pub gic: Option<Aarch64PlatformConfig>,
}

fn err_to_owned(e: fdt::parser::Error<'_>) -> anyhow::Error {
Expand Down Expand Up @@ -526,17 +524,20 @@ fn parse_memory(node: &Node<'_>) -> anyhow::Result<MemoryRangeWithNode> {
}

/// Parse GIC config
fn parse_gic(node: &Node<'_>) -> anyhow::Result<GicInfo> {
fn parse_gic(node: &Node<'_>) -> anyhow::Result<Aarch64PlatformConfig> {
let reg = property_to_u64_vec(node, "reg")?;

if reg.len() != 4 {
bail!("gic node {} does not have 4 u64s", node.name);
}

Ok(GicInfo {
Ok(Aarch64PlatformConfig {
gic_distributor_base: reg[0],
gic_redistributors_base: reg[2],
gic_v2m: None,
pmu_gsiv: None,
// TODO: parse from the DT timer node instead of hardcoding.
virt_timer_ppi: 20,
})
}

Expand Down Expand Up @@ -662,6 +663,11 @@ impl ParsedBootDtInfo {

vtl2_memory.sort_by_key(|r| r.range.start());

// Merge PMU GSIV into the GIC platform config if both were parsed.
if let (Some(gic), Some(pmu_gsiv)) = (&mut gic, pmu_gsiv) {
gic.pmu_gsiv = Some(pmu_gsiv);
}

Ok(Self {
cpus,
vtl0_mmio,
Expand All @@ -671,7 +677,6 @@ impl ParsedBootDtInfo {
vtl0_alias_map,
accepted_ranges,
gic,
pmu_gsiv,
memory_allocation_mode,
isolation,
vtl2_reserved_range,
Expand Down Expand Up @@ -841,15 +846,20 @@ mod tests {
}

// PMU
if let Some(pmu_gsiv) = info.pmu_gsiv {
assert!((16..32).contains(&pmu_gsiv));
const GIC_PPI: u32 = 1;
const IRQ_TYPE_LEVEL_HIGH: u32 = 4;
root_builder = root_builder
.start_node("pmu")?
.add_str(p_compatible, "arm,armv8-pmuv3")?
.add_u32_array(p_interrupts, &[GIC_PPI, pmu_gsiv - 16, IRQ_TYPE_LEVEL_HIGH])?
.end_node()?;
if let Some(gic) = &info.gic {
if let Some(pmu_gsiv) = gic.pmu_gsiv {
anyhow::ensure!(
(16..32).contains(&pmu_gsiv),
"PMU GSIV {pmu_gsiv} is not a valid PPI (expected 16..32)"
);
const GIC_PPI: u32 = 1;
const IRQ_TYPE_LEVEL_HIGH: u32 = 4;
root_builder = root_builder
.start_node("pmu")?
.add_str(p_compatible, "arm,armv8-pmuv3")?
.add_u32_array(p_interrupts, &[GIC_PPI, pmu_gsiv - 16, IRQ_TYPE_LEVEL_HIGH])?
.end_node()?;
}
}

let mut openhcl_builder = root_builder.start_node("openhcl")?;
Expand Down Expand Up @@ -1054,12 +1064,13 @@ mod tests {
MemoryRange::new(0x30000..0x40000),
],
vtl0_alias_map: Some(1 << 48),
gic: Some(GicInfo {
gic: Some(Aarch64PlatformConfig {
gic_distributor_base: 0x10000,
gic_redistributors_base: 0x20000,
gic_v2m: None,
pmu_gsiv: Some(0x17),
virt_timer_ppi: 20,
}),
pmu_gsiv: Some(0x17),
accepted_ranges: vec![
MemoryRange::new(0x10000..0x20000),
MemoryRange::new(0x1000000..0x1500000),
Expand Down
10 changes: 6 additions & 4 deletions openhcl/openhcl_boot/src/dt.rs
Expand Up @@ -43,7 +43,9 @@ mod aarch64 {
// Architecturally, PPIs occupy INTID's in the [16..32) range. In DeviceTree,
// the type of the interrupt is specified first (PPI) and then the _relative_ INTID:
// for PPI INTID `27` `[GIC_PPI, 27-16, flags]` goes into the DT description.
pub const VMBUS_INTID: u32 = 2; // Note: the hardware INTID will be 16 + 2
/// VMBus PPI offset for the DT `interrupts` property.
/// Canonical INTID is DEFAULT_VMBUS_PPI (18) in openvmm_defs.
pub const VMBUS_PPI_OFFSET: u32 = 2;
pub const TIMER_INTID: u32 = 4; // Note: the hardware INTID will be 16 + 4

/// The Hyper-V default PMU_GSIV value.
Expand All @@ -52,7 +54,7 @@ mod aarch64 {

pub const GIC_PHANDLE: u32 = 1;
pub const GIC_PPI: u32 = 1;
pub const IRQ_TYPE_EDGE_FALLING: u32 = 2;
pub const IRQ_TYPE_EDGE_RISING: u32 = 1;
pub const IRQ_TYPE_LEVEL_LOW: u32 = 8;
pub const IRQ_TYPE_LEVEL_HIGH: u32 = 4;
}
Expand Down Expand Up @@ -146,7 +148,7 @@ fn write_vmbus<'a, T>(
// above specifies.
&[
aarch64::GIC_PPI,
aarch64::VMBUS_INTID,
aarch64::VMBUS_PPI_OFFSET,
interrupt_cell_value.expect("must be set on aarch64"),
],
)?;
Expand Down Expand Up @@ -456,7 +458,7 @@ pub fn write_dt(
p_interrupt_parent,
p_interrupts,
interrupt_cell_value: if cfg!(target_arch = "aarch64") {
Some(aarch64::IRQ_TYPE_EDGE_FALLING)
Some(aarch64::IRQ_TYPE_EDGE_RISING)
} else {
None
},
Expand Down
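
The PPI encoding described in the `mod aarch64` comment (and checked for the PMU GSIV in `bootloader_fdt_parser`) can be summarized in a standalone sketch. `ppi_interrupt_cells` is a hypothetical helper, not a function from this PR. The constants reuse the values shown in the diff.

```rust
/// Interrupt type cell for a PPI in a GIC `interrupts` property.
pub const GIC_PPI: u32 = 1;
/// Trigger flags, matching the constants in `dt.rs`.
pub const IRQ_TYPE_EDGE_RISING: u32 = 1;
pub const IRQ_TYPE_LEVEL_HIGH: u32 = 4;

/// Build the three-cell DT `interrupts` value for a PPI.
///
/// `intid` is the architectural INTID, which must lie in 16..32 for a PPI.
/// The device tree stores the INTID relative to 16.
pub fn ppi_interrupt_cells(intid: u32, flags: u32) -> Result<[u32; 3], String> {
    if !(16..32).contains(&intid) {
        return Err(format!("INTID {intid} is not a valid PPI (expected 16..32)"));
    }
    Ok([GIC_PPI, intid - 16, flags])
}
```

With these definitions, the VMBus INTID 18 with a rising-edge trigger encodes as `[1, 2, 1]`, which matches `VMBUS_PPI_OFFSET = 2`, and the PMU GSIV `0x17` used in the test data encodes with a relative INTID of 7.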
37 changes: 24 additions & 13 deletions openhcl/underhill_core/src/loader/mod.rs
Expand Up @@ -268,12 +268,14 @@ fn load_linux(params: LoadLinuxParams<'_>) -> Result<VpContext, Error> {
mem_layout,
cache_topology: None,
pcie_host_bridges: &vec![],
with_ioapic: true, // underhill always runs with ioapic
with_pic: false,
with_pit: false,
with_psp: platform_config.general.psp_enabled,
pm_base: crate::worker::PM_BASE,
acpi_irq: crate::worker::SYSTEM_IRQ_ACPI,
arch: vmm_core::acpi_builder::AcpiArchConfig::X86 {
with_ioapic: true, // openhcl always runs with ioapic
with_pic: false,
with_pit: false,
with_psp: platform_config.general.psp_enabled,
pm_base: crate::worker::PM_BASE,
acpi_irq: crate::worker::SYSTEM_IRQ_ACPI,
},
};

if mem_layout.mmio().len() < 2 {
Expand Down Expand Up @@ -306,7 +308,7 @@ fn load_linux(params: LoadLinuxParams<'_>) -> Result<VpContext, Error> {

dsdt.add_mmio_module(mem_layout.mmio()[0], mem_layout.mmio()[1]);
// TODO: change this once PCI is running in underhill
dsdt.add_vmbus(false);
dsdt.add_vmbus(false, None);
dsdt.add_rtc();
});
let acpi_len = acpi_tables.tables.len() + 0x1000;
Expand Down Expand Up @@ -462,12 +464,21 @@ pub fn write_uefi_config(
mem_layout,
cache_topology: None,
pcie_host_bridges: &vec![],
with_ioapic: cfg!(guest_arch = "x86_64"), // OpenHCL always runs with ioapic on x64
with_pic: false, // uefi never runs with pic or pit
with_pit: false,
with_psp: platform_config.general.psp_enabled,
pm_base: crate::worker::PM_BASE,
acpi_irq: crate::worker::SYSTEM_IRQ_ACPI,
#[cfg(guest_arch = "x86_64")]
arch: vmm_core::acpi_builder::AcpiArchConfig::X86 {
with_ioapic: true,
with_pic: false,
with_pit: false,
with_psp: platform_config.general.psp_enabled,
pm_base: crate::worker::PM_BASE,
acpi_irq: crate::worker::SYSTEM_IRQ_ACPI,
},
#[cfg(guest_arch = "aarch64")]
arch: vmm_core::acpi_builder::AcpiArchConfig::Aarch64 {
// Not used for MADT/SRAT generation; only matters for FADT.
hypervisor_vendor_identity: 0,
virt_timer_ppi: processor_topology.virt_timer_ppi(),
},
};

// Build the ACPI tables as specified.
Expand Down