feat: i686 page tables, snapshot compaction, and CoW (standalone)#1385
feat: i686 page tables, snapshot compaction, and CoW (standalone)#1385
Conversation
…eatures Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com> Co-authored-by: danbugs <danilochiarlone@gmail.com>
- sandbox/snapshot.rs: decode i686 PTEs with `u32::from_le_bytes`
instead of `from_ne_bytes`. Page-table entries are defined as
little-endian by arch spec; `from_ne_bytes` is incorrect on
big-endian hosts and inconsistent with surrounding helpers that
already use `from_le_bytes`.
- hyperlight_common/vmem.rs: guard the `i686-guest` feature with a
`compile_error!` on targets that are neither `x86` (the guest
itself) nor `x86_64` (the host that runs the guest). Previously
enabling the feature on e.g. aarch64 would silently compile but
downstream crates would hit confusing missing-item errors when
they reached for `vmem::i686_guest`.
- mem/mgr.rs, sandbox/snapshot.rs: persist the per-process
PD-roots count in `Snapshot` and rewrite the scratch bookkeeping
area (`SCRATCH_TOP_PD_ROOTS_{COUNT,ARRAY}_OFFSET`) during
`restore_snapshot`. Scratch is zeroed on restore, so without
this a subsequent `snapshot()` call would read count=0 through
`read_pd_roots_from_scratch` and fail. Root `i`'s compacted
GPA is deterministic (`layout.get_pt_base_gpa() + i * PAGE_SIZE`
— same layout `compact_i686_snapshot` used when building the
rebuilt PDs), so we only need to store the count.
Signed-off-by: danbugs <danilochiarlone@gmail.com>
There was a problem hiding this comment.
Reviewers: Opus, Gemini,ChatGPT
All three reviewers agree this is a well-structured PR that cleanly separates the nanvix-unstable feature into i686-guest and guest-counter, and replaces the TODO-laden real-mode boot with proper 32-bit protected mode + paging. The code quality is generally good — comments are thorough, unsafe blocks are contained, and the architecture is sensible.
However, all three independently flagged critical concerns around the CoW flag logic, potential scratch memory layout overlap, and the complete absence of tests for the i686 code path.
🟢 Good Stuff
- Unified
restore_all_state(): Removing the TODO-ladennanvix-unstablebranch with its "this is probably not correct" comment is a significant improvement. The CR3-based restore is now clean and shared. compile_error!guard for i686-guest on non-x86 — nice defensive programming.- OOB-safe
read_entryimplementations: BothSnapshotandSharedMemoryPageTableBufferreturn 0 (not-present) for OOB addresses rather than panicking. Exactly right for defensively walking guest-controlled page tables. - Clean feature flag split:
i686-guest+guest-counterare genuinely independent concerns. - Thorough doc comments on
Snapshotfields (n_pd_roots,separate_pt_bytes). - Scratch bookkeeping factoring: Extracting
copy_pt_to_scratch()fromupdate_scratch_bookkeeping()is a clean refactor.
🔴 Additional findings (on lines outside the diff)
Snapshot impl of TableReadOps always reads 8-byte entries (snapshot.rs line 131, Flagged by: 2/3)
The impl TableReadOps for Snapshot at line 131 of snapshot.rs unconditionally reads 8 bytes for a PTE. While it appears to only be used in the #[cfg(not(feature = "i686-guest"))] path currently, it's not gated by #[cfg]. If anyone ever calls virt_to_phys with a Snapshot on the i686 path, it will read 8-byte entries from a 4-byte PTE table — reading cross-entry data and producing garbage mappings. Either gate with #[cfg(not(feature = "i686-guest"))] or add a comment + compile_error! guard.
Endianness inconsistency (snapshot.rs line 149, Flagged by: 2/3)
Line 149 of snapshot.rs uses u64::from_ne_bytes() (native endian), while the i686 SharedMemoryPageTableBuffer::read_entry() at line 273 correctly uses u32::from_le_bytes() with a comment noting "Page-table entries are little-endian by arch spec". The x86_64 path happens to work on LE hosts but is technically incorrect by the same reasoning. Should use from_le_bytes consistently.
See inline comments for the remaining findings.
| }; | ||
| let mut flags = (pte & 0xFFF) | extra_flags; | ||
| // Mark writable or already-CoW pages as CoW (read-only + AVL bit). | ||
| if (flags & RW_FLAGS & !PTE_COW) != 0 || (flags & PTE_COW) != 0 { |
There was a problem hiding this comment.
🔴 Critical: Overly broad CoW flag condition (Flagged by:3/3)
RW_FLAGS = PTE_PRESENT | PTE_RW | PTE_ACCESSED = 0x23. The expression flags & RW_FLAGS & !PTE_COW simplifies to just flags & RW_FLAGS because PTE_COW (bit 9) doesn't overlap with RW_FLAGS. So flags & 0x23 != 0 is true for any present page (since PRESENT is bit 0 and every present PTE has it set).
This means every present page — including genuinely read-only .text/.rodata — gets marked CoW. Read-only pages don't need CoW; they can be shared directly.
Suggested fix — check for PAGE_RW specifically:
if (flags & PAGE_RW as u32) != 0 || (flags & PTE_COW) != 0 {| /// The guest writes this before signaling boot-complete so the host can walk | ||
| /// all active PDs during snapshot creation (not just CR3). | ||
| #[cfg(feature = "i686-guest")] | ||
| pub const SCRATCH_TOP_PD_ROOTS_COUNT_OFFSET: u64 = 0x28; |
There was a problem hiding this comment.
🔴 Critical: PD roots bookkeeping may overlap exception stack (Flagged by: 2/3)
SCRATCH_TOP_EXN_STACK_OFFSET = 0x20 means the exception stack top lives here and grows downward. PD_ROOTS_COUNT_OFFSET = 0x28 is only 8 bytes below the exception stack top.
If any i686 guest ever uses exceptions that push state to this stack, the first push would land at offset 0x28 — exactly on top of PD_ROOTS_COUNT. This would corrupt the PD roots count, causing the host to read garbage PD root GPAs.
Suggested mitigations:
- Move PD roots to a dedicated area well below the exception stack (e.g. offset
0x200+) - Or document the invariant explicitly with a compile-time check that the exception stack size is bounded
- Or reserve explicit space between the exception stack and PD roots bookkeeping
| } | ||
|
|
||
| #[cfg(test)] | ||
| #[cfg(not(feature = "i686-guest"))] |
There was a problem hiding this comment.
🔴 Critical: Zero test coverage on i686 code path (Flagged by: 3/3)
All existing tests are #[cfg(not(feature = "i686-guest"))]. This PR adds ~500 lines of complex page table manipulation, CoW map building, and snapshot compaction with zero test coverage. This is security-critical VMM code.
Functions without tests:
build_cow_map()— walks guest-controlled page tablescompact_i686_snapshot()— 100+ lines of compaction logicbuild_initial_i686_page_tables()— initial PT setupi686_pt::Builder— all methodsvirt_to_phys_all()— the i686 PT walkerread_pd_roots_from_scratch()— reads guest-controlled data
At minimum, unit tests should cover: the i686_pt::Builder, the CoW map builder with crafted PTs, compaction with multiple PD roots, and adversarial inputs (e.g. PD entries pointing out of bounds).
| let mut cow_map = std::collections::HashMap::new(); | ||
| let scratch_base = scratch_base_gpa(layout.get_scratch_size()); | ||
| let scratch_end = scratch_base + layout.get_scratch_size() as u64; | ||
| let mem_size = layout.get_memory_size().unwrap_or(0) as u64; |
There was a problem hiding this comment.
🟡 Warning: Silent fallback on error (Flagged by: 2/3)
get_memory_size().unwrap_or(0) silently uses 0 on error, which means the va < mem_size check will reject ALL mappings and the CoW map will be empty. This could cause incorrect snapshot behavior (no CoW pages detected). The error should be propagated, or at minimum logged.
| }; | ||
|
|
||
| // Resolve a VA through a PD to its physical frame. | ||
| let resolve_through_pd = |pd_gpa: u64, va: u64| -> u64 { |
There was a problem hiding this comment.
🟡 Warning: Fallback returns VA instead of PA (Flagged by: 1/3)
Returning va (a virtual address) when the PD walk fails mixes virtual and physical addresses. The caller uses this result to get a physical page table GPA. If the PD doesn't have an entry for a user PT, this returns the un-translated address, which could cause the host to read from an incorrect location when rebuilding user page tables.
Consider returning Option<u64> and skipping the PT rebuild when resolution fails.
| fs: data_seg, | ||
| gs: data_seg, | ||
| tr: tr_seg, | ||
| cr0: 0x80010011, // PE + ET + WP (write-protect for CoW) + PG |
There was a problem hiding this comment.
🟡 Warning: Magic number (Flagged by: 2/3)
The x86_64 path uses named constants (CR4_PAE, CR4_OSFXSR, etc.). The i686 path should define CR0_PE, CR0_ET, CR0_WP, CR0_PG constants similarly for self-documenting code:
const CR0_PE: u64 = 1;
const CR0_ET: u64 = 1 << 4;
const CR0_WP: u64 = 1 << 16;
const CR0_PG: u64 = 1 << 31;
cr0: CR0_PE | CR0_ET | CR0_WP | CR0_PG,| // memories, and so can't share | ||
| unshared_snapshot_mem: { any(feature = "nanvix-unstable", feature = "gdb") }, | ||
| // gdb needs writable snapshot memory for debug access. | ||
| unshared_snapshot_mem: { feature = "gdb" }, |
There was a problem hiding this comment.
🟡 Warning: unshared_snapshot_mem no longer gated on i686 (Flagged by: 1/3)
Old code: unshared_snapshot_mem: { any(feature = "nanvix-unstable", feature = "gdb") }. Now only gated on gdb. The separate_pt_bytes approach may avoid this need, but this should be explicitly validated — if any i686 code path writes to snapshot memory, it will silently fail with a read-only mapping.
cd065b1 to
105974d
Compare
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
Signed-off-by: Tomasz Andrzejak <andreiltd@gmail.com>
327f85b to
8fcb7ea
Compare
Summary
The cleaned-up version of #1381: drops the dependencies on #1379 and #1380 so this PR can land on its own.
Commits
refactor: replace nanvix-unstable with i686-guest and guest-counter featuresfeat: i686 protected-mode boot and unified restore pathfeat: i686 page tables, snapshot compaction, and CoW supportfixup: address PR #1381 review commentsWhat this adds
nanvix-unstablegates split intoi686-guest(PT walker, protected-mode boot, compaction) andguest-counter(scratch counter plumbing). No behavior change for consumers using the old feature flag; Cargo.toml / Justfile / build.rs updated to match.What it does not include (vs #1381)
file_mappingsremoval (Remove HyperlightPEB file mappings #1380).Both can land separately; #1381 happened to stack on top of them.
Review items from #1381 addressed here
u32::from_le_bytes(previouslyfrom_ne_bytes) — PTEs are little-endian by arch spec.i686-guestfeature guarded with acompile_error!on targets that are neitherx86(guest) norx86_64(host).restore_snapshotrewrites the scratch PD-roots bookkeeping (SCRATCH_TOP_PD_ROOTS_{COUNT,ARRAY}_OFFSET) so a subsequentsnapshot()doesn't see a stale zero count.Snapshotpersists the root count; root GPAs are deterministic (layout.get_pt_base_gpa() + i * PAGE_SIZE).Verification
Built with
cargo check -p hyperlight-{common,host,guest} --features kvm,mshv3,executable_heap,i686-guest,guest-counter— clean. End-to-end tested against nanvix@e61306676: Hello 5/5 withNANVIX_REPEAT=4through the restore+call loop.