
Insufficient Reserved Memory Region Causes MMIO Faults and Allocator Metadata Corruption #221

@agicy


Bug Report

I am running Debian on the sysoul-x3300 platform (based on the RK3588). While stress-testing memory with memtester, I observed a critical stability issue.

When the tested memory size exceeds 4 GiB (the board has 16 GiB in total, 15 GiB available; testing 1 GiB or 2 GiB works fine), hvisor frequently reports an MMIO fault in zone0. This happens even though the faulting address is correctly configured as zone0 memory in board.rs.

Logs

TODO: My logs are coming soon.

Configuration (board.rs)

TODO: My configuration is coming soon.

Root Cause Analysis

The root cause is that the device tree reserves too little memory for the hypervisor. The root-linux kernel is therefore free to allocate, and overwrite, memory that hvisor is still using.

According to src/consts.rs, the memory layout of hvisor consists of:

  1. Static binary code (.text, .data, etc.)
  2. Per-CPU local storage (Stack, etc.)
  3. Frame Allocator Memory Pool

Source Code Reference (src/consts.rs):

```rust
pub use crate::memory::PAGE_SIZE;
use crate::{memory::addr::VirtAddr, platform::BOARD_NCPUS};

/// Size of the hypervisor heap.
pub const HV_HEAP_SIZE: usize = 1024 * 1024; // 1 MiB
pub const HV_MEM_POOL_SIZE: usize = 64 * 1024 * 1024; // 64 MiB

/// Size of the per-CPU data (stack and other CPU-local data).
pub const PER_CPU_SIZE: usize = 512 * 1024; // 512 KiB

/// ... (omitted)

pub fn mem_pool_start() -> VirtAddr {
    core_end() + MAX_CPU_NUM * PER_CPU_SIZE
}

pub fn hv_end() -> VirtAddr {
    mem_pool_start() + HV_MEM_POOL_SIZE
}
```

Memory Layout Calculation (sysoul-x3300, 8 CPUs):

  • Start Address: 0x0050_0000
  • core_end (Binary end): 0x006e_6000
  • mem_pool_start: 0x00ae_6000
    • Calculation: core_end + (512 KiB * 8 CPUs) = 0x006e_6000 + 4 MiB (0x40_0000)
  • hv_end: 0x04ae_6000
    • Calculation: mem_pool_start + 64 MiB (Frame Allocator)
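The arithmetic above can be reproduced with a small standalone sketch. It mirrors `mem_pool_start()` and `hv_end()` from src/consts.rs, but it is an illustration only, not hvisor code: `HV_START`, `CORE_END` and `MAX_CPU_NUM = 8` are hard-coded from the sysoul-x3300 numbers in this issue, whereas in hvisor `core_end()` depends on the size of the actual binary.

```rust
// Sketch: reproduce the sysoul-x3300 layout arithmetic from src/consts.rs.
// Assumption: HV_START and CORE_END are hard-coded from this issue; in hvisor
// core_end() depends on the built binary.

const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

const PER_CPU_SIZE: usize = 512 * KIB; // per-CPU stack and local data
const HV_MEM_POOL_SIZE: usize = 64 * MIB; // frame allocator pool
const MAX_CPU_NUM: usize = 8; // sysoul-x3300 (RK3588)

const HV_START: usize = 0x0050_0000; // start address of hvisor
const CORE_END: usize = 0x006e_6000; // end of the static binary

/// Per-CPU areas follow the binary; the pool follows the per-CPU areas.
fn mem_pool_start() -> usize {
    CORE_END + MAX_CPU_NUM * PER_CPU_SIZE
}

/// The pool is the last region hvisor uses.
fn hv_end() -> usize {
    mem_pool_start() + HV_MEM_POOL_SIZE
}

fn main() {
    println!("core_end       = {:#010x}", CORE_END);
    println!("mem_pool_start = {:#010x}", mem_pool_start());
    println!("hv_end         = {:#010x}", hv_end());
    let total = hv_end() - HV_START;
    println!("total          = {:#x} bytes (~{:.1} MiB)", total, total as f64 / MIB as f64);
}
```

Running it prints mem_pool_start = 0x00ae6000 and hv_end = 0x04ae6000, and a total of about 69.9 MiB, which is where the "approx. 70 MiB" figure comes from.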

The Discrepancy:
hvisor actually occupies 0x0050_0000 to 0x04ae_6000 (approx. 70 MiB in total). However, most existing device tree configurations reserve only 4 MiB for hvisor.
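To quantify the gap, the sketch below measures how much of hvisor's range falls outside a 4 MiB reservation starting at 0x0050_0000. The helper `unreserved_bytes` is hypothetical, and the addresses are copied from the calculation above rather than read from a real build.

```rust
// Sketch: compare a 4 MiB reservation with the range hvisor actually uses.
// Values are taken from the sysoul-x3300 calculation in this issue.

const MIB: usize = 1024 * 1024;

const HV_START: usize = 0x0050_0000;
const HV_END: usize = 0x04ae_6000; // mem_pool_start + 64 MiB
const RESERVED_SIZE: usize = 4 * MIB; // typical existing reservation

/// Bytes of [start, end) that lie beyond the reserved window [start, start + reserved).
fn unreserved_bytes(start: usize, end: usize, reserved: usize) -> usize {
    end.saturating_sub(start + reserved)
}

fn main() {
    let reserved_end = HV_START + RESERVED_SIZE;
    let gap = unreserved_bytes(HV_START, HV_END, RESERVED_SIZE);
    println!("reserved:    {:#010x}..{:#010x}", HV_START, reserved_end);
    println!("required:    {:#010x}..{:#010x}", HV_START, HV_END);
    println!("unprotected: {:#x} bytes (~{:.1} MiB)", gap, gap as f64 / MIB as f64);
}
```

With these numbers 0x041e_6000 bytes (roughly 66 MiB) of hvisor memory, including the whole frame allocator pool, are visible to Linux as ordinary RAM.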

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'arial', 'fontSize': '14px'}}}%%
flowchart LR
    classDef memBlock fill:#e3f2fd,stroke:#1565c0,stroke-width:1px;
    classDef boundaryNode fill:none,stroke:none,color:#555,font-size:12px;
    classDef dangerBlock fill:#ffcdd2,stroke:#b71c1c,stroke-width:2px;

    subgraph Reserved ["✅&nbsp;Reserved&nbsp;Memory&nbsp;(Safe:&nbsp;4&nbsp;MiB)<br/>Range:&nbsp;0x0050_0000&nbsp;~&nbsp;0x0090_0000"]
        direction LR
        StartAddr["0x0050_0000"]:::boundaryNode
        Bin["Static Bin<br/>(~1.9 MiB)<br/>End: 0x006E_6000"]:::memBlock
        C0["CPU 0<br/>512 KiB"]:::memBlock
        C1["CPU 1<br/>512 KiB"]:::memBlock
        C2["CPU 2<br/>512 KiB"]:::memBlock
        C3["CPU 3<br/>512 KiB<br/>End: 0x008E_6000"]:::memBlock

        StartAddr --- Bin --- C0 --- C1 --- C2 --- C3
    end

    subgraph Unreserved ["❌&nbsp;Unreserved&nbsp;Region&nbsp;(Unsafe&nbsp;/&nbsp;MMIO&nbsp;Fault&nbsp;Risk)<br/>Range:&nbsp;0x0090_0000&nbsp;~&nbsp;0x04AE_6000"]
        direction LR
        C4["CPU 4<br/>(Cross Boundary)<br/>Start: 0x008E_6000"]:::dangerBlock
        C5["CPU 5<br/>512 KiB"]:::dangerBlock
        C6["CPU 6<br/>512 KiB"]:::dangerBlock
        C7["CPU 7<br/>512 KiB"]:::dangerBlock
        PoolStartAddr["0x00AE_6000"]:::boundaryNode
        FrameAlloc["Frame Allocator Pool<br/>Size: 64 MiB<br/>(Target of Corruption)"]:::dangerBlock
        EndAddr["0x04AE_6000"]:::boundaryNode

        C4 --- C5 --- C6 --- C7 --- PoolStartAddr --- FrameAlloc --- EndAddr
    end

    C3 --- C4

    style Reserved fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5
    style Unreserved fill:#ffebee,stroke:#c62828,stroke-width:2px,stroke-dasharray: 5 5
```

Failure Mechanism

  1. The 4 MiB reservation (0x0050_0000 ~ 0x0090_0000) covers the static binary and the per-CPU areas of CPUs 0 to 3 (CPU 4's area straddles the boundary), but none of the 64 MiB Frame Allocator pool.
  2. hvisor uses this Frame Allocator pool to hold the BTree structures that track each zone's memory regions.
  3. Because the pool is not reserved, the root-linux kernel treats it as ordinary RAM. When memtester requests large buffers, Linux allocates pages that physically overlap hvisor's Frame Allocator region.
  4. Linux then overwrites the Frame Allocator data, corrupting the BTree metadata used to track zone memory regions.
  5. hvisor consequently loses track of valid memory regions and reports spurious MMIO faults when those addresses are accessed.
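Steps 4 and 5 can be illustrated with a toy model. This is not hvisor's implementation: `RegionTable`, `Access` and the addresses are hypothetical, and Rust's `BTreeMap` stands in for hvisor's real BTree. It only shows how a clobbered region entry turns a valid RAM access into an apparent MMIO access.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a zone's memory-region table:
/// key = region start address, value = region end (exclusive).
struct RegionTable {
    regions: BTreeMap<u64, u64>,
}

#[derive(Debug, PartialEq)]
enum Access {
    Ram,  // address falls inside a tracked RAM region
    Mmio, // address not found in any RAM region, so it is treated as MMIO
}

impl RegionTable {
    fn classify(&self, addr: u64) -> Access {
        // Take the last region that starts at or below `addr`.
        match self.regions.range(..=addr).next_back() {
            Some((_, &end)) if addr < end => Access::Ram,
            _ => Access::Mmio,
        }
    }
}

fn main() {
    let mut table = RegionTable { regions: BTreeMap::new() };
    // Illustrative zone0 RAM region, not the real board.rs values.
    table.regions.insert(0x1_0000_0000, 0x2_0000_0000);
    let addr = 0x1_8000_0000;
    println!("before corruption: {:?}", table.classify(addr));

    // Model Linux overwriting the memory backing this entry: the end is clobbered.
    table.regions.insert(0x1_0000_0000, 0x1_0000_1000);
    println!("after corruption:  {:?}", table.classify(addr));
}
```

The same address is classified as `Ram` before the overwrite and as `Mmio` after it, which matches the symptom of an MMIO fault on an address that board.rs assigns to zone0.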

Why it seemed to work before:

  • Luck: Under lighter loads, Linux happened not to allocate or overwrite the physical pages used by the Frame Allocator.
  • Partial Coverage: The 4 MiB reservation covers the binary and the per-CPU areas of the first CPUs. Because root-linux often uses only a few cores (e.g., 2) during boot or idle, the per-CPU data of the active cores stayed inside the reserved area.
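The partial coverage can be checked CPU by CPU. This sketch (hypothetical `per_cpu_coverage` and `Coverage`) assumes core_end = 0x006e_6000, 512 KiB per CPU, and a reservation ending at 0x0090_0000, as computed in this issue.

```rust
// Sketch: classify each per-CPU area against the 4 MiB reservation,
// using the sysoul-x3300 numbers from this issue.

const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

const CORE_END: usize = 0x006e_6000;
const PER_CPU_SIZE: usize = 512 * KIB;
const RESERVED_END: usize = 0x0050_0000 + 4 * MIB; // 0x0090_0000

#[derive(Debug, PartialEq)]
enum Coverage {
    Reserved,   // whole area lies below RESERVED_END
    Straddles,  // area starts inside and ends outside the reservation
    Unreserved, // whole area is visible to Linux
}

fn per_cpu_coverage(cpu: usize) -> Coverage {
    let start = CORE_END + cpu * PER_CPU_SIZE;
    let end = start + PER_CPU_SIZE;
    if end <= RESERVED_END {
        Coverage::Reserved
    } else if start < RESERVED_END {
        Coverage::Straddles
    } else {
        Coverage::Unreserved
    }
}

fn main() {
    for cpu in 0..8 {
        let start = CORE_END + cpu * PER_CPU_SIZE;
        println!("CPU {cpu}: {:#010x} {:?}", start, per_cpu_coverage(cpu));
    }
}
```

Under these assumptions CPUs 0 to 3 are fully reserved, CPU 4 straddles 0x0090_0000, and CPUs 5 to 7 are unprotected, which is consistent with the diagram above.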

Action Items

To resolve this issue and prevent future occurrences, the following actions are required:

  • Build System Update: Implement a mechanism in the build system that calculates and outputs the exact reserved memory range required (from the entry point to hv_end) during compilation.
  • Configuration Fix: Update all existing board configurations and Device Trees (DTS) to reserve sufficient memory, covering the binary, all per-CPU areas, and the full 64 MiB pool.
  • CI/CD Enhancement: Integrate memtester into the CI system test workflow. The root-linux should run memory stress tests immediately after boot to verify memory integrity before proceeding with other tests. This memory corruption also explains the high failure rate of past CI runs.
  • Documentation: Update the hvisor-book to explicitly document the static and runtime memory layout. Add a guide on how to correctly calculate and configure reserved-memory in the device tree.
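For the build-system and configuration items, one option is to derive the reservation from the layout constants and fail the build if it is too small. The sketch below makes assumptions that are not part of hvisor: a 2 MiB rounding granule, the node name `hvisor@500000`, a `no-map` child of a `reserved-memory` node whose parent uses `#address-cells = <2>` and `#size-cells = <2>`, and the hypothetical helpers `align_up` and `dts_node`.

```rust
// Sketch: compute a reserved-memory size covering [HV_START, HV_END) and
// emit a device-tree child node for it. All names here are hypothetical.

const MIB: usize = 1024 * 1024;
const HV_START: usize = 0x0050_0000; // entry point / start of hvisor
const HV_END: usize = 0x04ae_6000; // hv_end() for sysoul-x3300
const ALIGN: usize = 2 * MIB; // assumed rounding granule

/// Round `x` up to a multiple of `a` (a > 0).
const fn align_up(x: usize, a: usize) -> usize {
    (x + a - 1) / a * a
}

const RESERVED_SIZE: usize = align_up(HV_END - HV_START, ALIGN);

// Compile-time guard: the build fails if the reservation does not reach hv_end.
const _: () = assert!(HV_START + RESERVED_SIZE >= HV_END);

/// Format a reserved-memory child node, assuming 2 address cells and 2 size cells.
fn dts_node(start: u64, size: u64) -> String {
    format!(
        "hvisor@{start:x} {{\n\tno-map;\n\treg = <0x{:x} 0x{:x} 0x{:x} 0x{:x}>;\n}};",
        start >> 32,
        start & 0xffff_ffff,
        size >> 32,
        size & 0xffff_ffff
    )
}

fn main() {
    println!("{}", dts_node(HV_START as u64, RESERVED_SIZE as u64));
}
```

With these values the reservation rounds up to 70 MiB (0x0460_0000), so the emitted `reg` property is `<0x0 0x500000 0x0 0x4600000>`. Tying the check to the same constants that define the layout means a future increase of `HV_MEM_POOL_SIZE` or `MAX_CPU_NUM` would surface at build time rather than as memory corruption at runtime.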

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working), ci (Github CI), documentation (Improvements or additions to documentation), feature (New feature or request), question (Further information is requested)
