5 changes: 5 additions & 0 deletions .github/workflows/dep_build_test.yml
@@ -114,6 +114,11 @@ jobs:
# with hw-interrupts feature enabled (+ explicit driver on Linux)
just test ${{ inputs.config }} ${{ runner.os == 'Linux' && (inputs.hypervisor == 'mshv3' && 'mshv3,hw-interrupts' || 'kvm,hw-interrupts') || 'hw-interrupts' }}

- name: Run Rust tests with enable_guest_clock
run: |
# with enable_guest_clock + hw-interrupts (+ explicit driver on Linux)
just test ${{ inputs.config }} ${{ runner.os == 'Linux' && (inputs.hypervisor == 'mshv3' && 'mshv3,hw-interrupts,enable_guest_clock' || 'kvm,hw-interrupts,enable_guest_clock') || 'hw-interrupts,enable_guest_clock' }}

- name: Run Rust Gdb tests
env:
RUST_LOG: debug
3 changes: 3 additions & 0 deletions Justfile
@@ -91,6 +91,9 @@ test-like-ci config=default-target hypervisor="kvm":
@# with hw-interrupts enabled (+ explicit driver on Linux)
{{ if os() == "linux" { if hypervisor == "mshv3" { "just test " + config + " mshv3,hw-interrupts" } else { "just test " + config + " kvm,hw-interrupts" } } else { "just test " + config + " hw-interrupts" } }}

@# with enable_guest_clock (+ explicit driver + hw-interrupts on Linux)
{{ if os() == "linux" { if hypervisor == "mshv3" { "just test " + config + " mshv3,hw-interrupts,enable_guest_clock" } else { "just test " + config + " kvm,hw-interrupts,enable_guest_clock" } } else { "just test " + config + " hw-interrupts,enable_guest_clock" } }}

@# make sure certain cargo features compile
just check

1 change: 1 addition & 0 deletions docs/README.md
@@ -29,6 +29,7 @@ This project is composed internally of several components, depicted in the below
* [How to build a Hyperlight guest binary](./how-to-build-a-hyperlight-guest-binary.md)
* [Security considerations](./security.md)
* [Technical requirements document](./technical-requirements-document.md)
* [Paravirtualized guest clock](./guest-time.md)

## For developers

135 changes: 135 additions & 0 deletions docs/guest-time.md
@@ -0,0 +1,135 @@
# Paravirtualized Guest Clock

Hyperlight's `enable_guest_clock` Cargo feature gives guests a cheap way to ask
"what time is it?" without taking a VM exit. When the host is built with the
feature, every sandbox exposes a paravirtualized clock that the guest can read
using ordinary memory loads.

## What the guest gets

When the feature is enabled the host populates a single 4 KiB "clock page"
inside the sandbox's scratch region. The page carries two pieces of
information:

- **A hypervisor-specific calibration block at offset `0x00`.** Written by
KVM (`kvm_clock`) or Hyper-V / MSHV (Reference TSC). Contains the TSC
frequency, scaling constants, and a sequence lock the guest uses to read it
atomically. The entire clock page is hypervisor-owned; Hyperlight does not
write to it.
- **Hyperlight metadata in the scratch bookkeeping page** (separate from the
clock page): a `u64` [`ClockType`](../src/hyperlight_common/src/time.rs) tag
and `boot_time_ns`, the Unix-epoch origin of the monotonic clock computed
by the host as `wall_now - monotonic_now` (see below). These live at fixed
offsets from the top of scratch (`-0x28` and `-0x30`), NOT in the clock
page, so a future TLFS extension cannot clobber them.

With those two pieces the guest can compute:

- **Monotonic nanoseconds since boot**: read the TSC and apply the scaling
factors from the calibration block, giving you a `CLOCK_MONOTONIC`
equivalent.
- **Wall-clock nanoseconds since the Unix epoch**: add `boot_time_ns` to the
monotonic value above, giving you a `CLOCK_REALTIME` / `gettimeofday`
equivalent. The host computes `boot_time_ns` as
`SystemTime::now() - KVM_GET_CLOCK` (on KVM) or
`SystemTime::now() - TIME_REF_COUNT` (on Hyper-V) after sandbox
initialisation. Hyper-V has no equivalent to KVM's
`MSR_KVM_WALL_CLOCK_NEW`, so we use this uniform host-computed approach
on all backends.
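
The two computations above can be sketched for the KVM case as follows. This is an illustration only, not Hyperlight's reader: the `Calibration` struct and both function names are hypothetical, and the field names are borrowed from Linux's `pvclock_vcpu_time_info`. The Hyper-V Reference TSC page uses a different scaling formula that is not shown here.

```rust
/// Hypothetical subset of the KVM pvclock calibration fields
/// (names follow Linux's `pvclock_vcpu_time_info`).
#[derive(Clone, Copy)]
pub struct Calibration {
    /// TSC value captured when the hypervisor last updated the block.
    pub tsc_timestamp: u64,
    /// Monotonic nanoseconds at `tsc_timestamp`.
    pub system_time_ns: u64,
    /// 32.32 fixed-point multiplier converting shifted TSC ticks to ns.
    pub tsc_to_system_mul: u32,
    /// Power-of-two pre-scale applied to the TSC delta (may be negative).
    pub tsc_shift: i8,
}

/// Convert a raw TSC reading into monotonic nanoseconds.
pub fn monotonic_ns(cal: &Calibration, tsc: u64) -> u64 {
    let delta = tsc.wrapping_sub(cal.tsc_timestamp);
    // Apply the power-of-two shift first: left if non-negative, right otherwise.
    let shifted = if cal.tsc_shift >= 0 {
        delta << (cal.tsc_shift as u32)
    } else {
        delta >> (cal.tsc_shift.unsigned_abs() as u32)
    };
    // Multiply in 128 bits so the 32.32 fixed-point product cannot overflow.
    let scaled = ((shifted as u128 * cal.tsc_to_system_mul as u128) >> 32) as u64;
    cal.system_time_ns.wrapping_add(scaled)
}

/// Wall-clock nanoseconds since the Unix epoch: boot origin plus monotonic time.
/// Returns `None` only if the sum would overflow a `u64`.
pub fn wall_clock_ns(boot_time_ns: u64, mono_ns: u64) -> Option<u64> {
    boot_time_ns.checked_add(mono_ns)
}
```

With a multiplier of `1 << 31` (0.5 ns per tick) and a shift of 0, a TSC delta of 2000 ticks adds 1000 ns to `system_time_ns`.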

> **Note (KVM only):** Wall-clock time returns `None` during
> `hyperlight_main` (guest init). On KVM, `KVM_GET_CLOCK` is unreliable
> until the "master clock" is established at first vCPU entry, so
> `boot_time_ns` is stamped after init completes. Monotonic time works
> fine during init. Wall-clock time becomes available on the first
> dispatch call.

Both reads are lock-free (well, seqlock-protected for the calibration block)
and never leave the guest.
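
As a rough illustration of what "seqlock-protected" means for the reader, the sketch below retries until it sees the same even version number before and after reading the fields. It assumes the KVM pvclock convention that an odd version marks an update in progress. The `CalibrationBlock` type, the `read_snapshot` name, and the bounded retry count are hypothetical, not taken from Hyperlight.

```rust
use core::sync::atomic::{fence, AtomicU32, AtomicU64, Ordering};

/// Hypothetical stand-in for the hypervisor-written calibration block.
pub struct CalibrationBlock {
    /// Sequence counter: odd while the writer is mid-update (KVM convention).
    pub version: AtomicU32,
    pub tsc_timestamp: AtomicU64,
    pub system_time_ns: AtomicU64,
}

/// Try to read a consistent `(tsc_timestamp, system_time_ns)` pair.
/// Gives up after `max_attempts` so the sketch cannot spin forever.
pub fn read_snapshot(block: &CalibrationBlock, max_attempts: usize) -> Option<(u64, u64)> {
    for _ in 0..max_attempts {
        let before = block.version.load(Ordering::Acquire);
        if before & 1 == 1 {
            // Writer is in the middle of an update; try again.
            core::hint::spin_loop();
            continue;
        }
        let tsc_timestamp = block.tsc_timestamp.load(Ordering::Relaxed);
        let system_time_ns = block.system_time_ns.load(Ordering::Relaxed);
        // Keep the field loads ordered before the second version check.
        fence(Ordering::Acquire);
        let after = block.version.load(Ordering::Relaxed);
        if before == after {
            return Some((tsc_timestamp, system_time_ns));
        }
    }
    None
}
```

A real guest reader would normally loop without a bound; the bound here only keeps the example easy to exercise.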

## Using it in a Rust guest

The guest-side API lives in `hyperlight_guest::time` for the low-level
readers and `hyperlight_guest_bin::time` for a `std::time`-flavoured
wrapper:

```rust
// Low-level, no_std readers.
use hyperlight_guest::time;

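if time::is_available() {
    // Monotonic time is readable as soon as the clock is available.
    let mono_ns: u64 = time::monotonic_time_ns().unwrap();
    // On KVM this is `None` during `hyperlight_main`; unwrap only after init.
    let wall_ns: u64 = time::wall_clock_time_ns().unwrap();
}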

// std::time-flavoured wrapper (hyperlight_guest_bin only).
use hyperlight_guest_bin::time::{Instant, SystemTime, UNIX_EPOCH};

let t0 = Instant::now()?;
// ... do work ...
let elapsed = t0.elapsed()?;

let now = SystemTime::now()?;
let unix_ns = now.duration_since(UNIX_EPOCH)?.as_nanos();
```

C guests that use picolibc get paravirt time for free: `hyperlight_guest_bin`
wires `clock_gettime(CLOCK_MONOTONIC|CLOCK_REALTIME)` and `gettimeofday` into
the same reader, so existing C code continues to work unchanged.

## Snapshot / restore semantics

Both `boot_time_ns` and the hypervisor calibration block live inside scratch
memory, which is not included in snapshots. On every
`MultiUseSandbox::restore`, the host re-arms the clock page: it re-installs
the pvclock MSR / Hyper-V register against the fresh vCPU state and stamps a
new `boot_time_ns` captured at the moment of restore. As a result a restored
guest observes wall-clock time reflecting the restore moment, not the
original boot — which is what wall clocks are supposed to do.
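
The stamping step can be pictured with the sketch below, which assumes the host already has the guest's current monotonic reading in nanoseconds (from `KVM_GET_CLOCK` or `TIME_REF_COUNT`, not shown). Both function names are hypothetical and the arithmetic is simply the `wall_now - monotonic_now` formula described earlier.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Host wall-clock time in nanoseconds since the Unix epoch
/// (0 if the system clock is set before the epoch).
pub fn host_wall_now_ns() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// Unix-epoch origin of the guest's monotonic clock.
/// `monotonic_now_ns` is the guest-visible monotonic reading taken
/// back-to-back with `wall_now_ns`.
pub fn boot_time_ns(wall_now_ns: u64, monotonic_now_ns: u64) -> u64 {
    wall_now_ns.saturating_sub(monotonic_now_ns)
}
```

Because the guest reconstructs wall time as `boot_time_ns + monotonic`, recomputing the origin at restore makes the sum match the host's clock at that moment, whatever the monotonic counter reads.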

## Enabling the feature

Turn it on in the host's `Cargo.toml`:

```toml
[dependencies]
hyperlight-host = { version = "...", features = ["enable_guest_clock"] }
```

The feature is x86_64 only; on aarch64 it has no effect. It is off by default
so existing sandboxes don't pay for a facility they don't use. When off, the
clock page is still reserved in the layout (so memory maps are stable) but
left un-mapped against any hypervisor clock source; `hyperlight_guest::time`
readers then report "unavailable" and fall back to whatever the guest wants
to do about it (the picolibc wiring returns a synthetic 1-second-per-call
counter, which is enough to stop `strftime` crashing and not much else).
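
For illustration, a synthetic fallback of the kind described above might look like the following. This is a guess at the shape, not the picolibc wiring itself: `synthetic_seconds` and `seconds_or_fallback` are hypothetical names.

```rust
use core::sync::atomic::{AtomicU64, Ordering};

/// Synthetic clock state: advances by one "second" on every call.
static SYNTHETIC_SECONDS: AtomicU64 = AtomicU64::new(0);

/// Return a strictly increasing fake seconds value.
pub fn synthetic_seconds() -> u64 {
    SYNTHETIC_SECONDS.fetch_add(1, Ordering::Relaxed) + 1
}

/// Use the real reading (in nanoseconds) when the clock is available,
/// otherwise fall back to the synthetic counter.
pub fn seconds_or_fallback(real_ns: Option<u64>) -> u64 {
    match real_ns {
        Some(ns) => ns / 1_000_000_000,
        None => synthetic_seconds(),
    }
}
```

The values carry no relation to real time; they only guarantee that successive calls never return the same or a smaller number.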
Review comment (Contributor), suggested change:

- counter, which is enough to stop `strftime` crashing and not much else).
+ counter).

It is also a good stopgap for many other things that expect `gettimeofday` / `clock_gettime` to work (like StarlingMonkey and quickjs).


## Layout details

The clock page sits 3 pages below the very top of the scratch region:

| Offset from top | Size | Contents |
|-----------------|-------|------------------------------------------------|
| `-0x1000` | 4 KiB | Bookkeeping (size, allocator counter, ...) |
| `-0x2000` | 4 KiB | Reserved for shared-state counter |
| `-0x3000` | 4 KiB | Paravirtualized clock page |

Because the clock page sits near the top of scratch, both the guest's main
stack and its IST1 (exception) stack are configured to start at the base of
the clock page (at `MAX_GVA + 1 - SCRATCH_TOP_CLOCK_PAGE_OFFSET`) and grow
downward, so stack writes, including those from page-fault handlers running
on IST1, cannot clobber the three reserved pages.
The allocator reserves the top three pages unconditionally so the memory map
stays identical whether or not the feature is enabled.
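
The address arithmetic behind the table can be sketched as below. The offsets are the ones this document states. `EXAMPLE_MAX_GVA` is a made-up value chosen only so the example has concrete numbers, and `from_top` is a hypothetical helper. Placing the bookkeeping fields at `top + 1 - offset` is an assumption that mirrors how the clock page base is derived.

```rust
/// Offsets from the top of scratch, as given in this document.
pub const SCRATCH_TOP_CLOCK_TYPE_OFFSET: u64 = 0x28;
pub const SCRATCH_TOP_BOOT_TIME_NS_OFFSET: u64 = 0x30;
pub const SCRATCH_TOP_CLOCK_PAGE_OFFSET: u64 = 0x3000;
pub const PAGE_SIZE: u64 = 0x1000;

/// Hypothetical top-of-address-space value, for illustration only.
pub const EXAMPLE_MAX_GVA: u64 = 0x0000_7fff_ffff_ffff;

/// Address that lies `offset` bytes below one past the top address.
pub const fn from_top(max_addr: u64, offset: u64) -> u64 {
    max_addr + 1 - offset
}

/// Base of the clock page; also where the stacks start growing downward.
pub const CLOCK_PAGE_BASE: u64 = from_top(EXAMPLE_MAX_GVA, SCRATCH_TOP_CLOCK_PAGE_OFFSET);
/// Addresses of the two Hyperlight metadata fields in the bookkeeping page.
pub const CLOCK_TYPE_ADDR: u64 = from_top(EXAMPLE_MAX_GVA, SCRATCH_TOP_CLOCK_TYPE_OFFSET);
pub const BOOT_TIME_NS_ADDR: u64 = from_top(EXAMPLE_MAX_GVA, SCRATCH_TOP_BOOT_TIME_NS_OFFSET);
```

With this example value the clock page base is `0x7fff_ffff_d000`, and both metadata fields fall inside the topmost (bookkeeping) page, two pages above the clock page.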

## Non-goals

- **Sub-microsecond accuracy.** `boot_time_ns` is computed from two
back-to-back host reads (`SystemTime::now()` and `KVM_GET_CLOCK` /
`TIME_REF_COUNT`). On KVM, residual disagreement between `KVM_GET_CLOCK`
and the pvclock page can add up to ~13ms of constant offset (observed on
WSL2; root cause uncertain). On Hyper-V the offset should be negligible.
- **`CLOCK_PROCESS_CPUTIME_ID` and friends.** The clock page exposes only
monotonic and wall-clock time; per-thread / per-process CPU time is out of
scope.
- **Timers or sleeps.** The guest can read the clock but has no way to ask
the hypervisor to wake it up later — that is still done through the
existing guest-function call model.
74 changes: 73 additions & 1 deletion src/hyperlight_common/src/layout.rs
@@ -38,7 +38,31 @@ pub const SCRATCH_TOP_SIZE_OFFSET: u64 = 0x08;
pub const SCRATCH_TOP_ALLOCATOR_OFFSET: u64 = 0x10;
pub const SCRATCH_TOP_SNAPSHOT_PT_GPA_BASE_OFFSET: u64 = 0x18;
pub const SCRATCH_TOP_SNAPSHOT_GENERATION_OFFSET: u64 = 0x20;
pub const SCRATCH_TOP_EXN_STACK_OFFSET: u64 = 0x30;

Review comment (Member):

What happened to `EXN_STACK_OFFSET`? I would assume it was moved to after the clock pages, but it looks like it's gone.

/// Offset from the top of scratch for the `clock_type` field (u64).
///
/// Identifies which paravirtualized clock the host configured
/// ([`crate::time::ClockType`]). Lives in the bookkeeping page at the
/// top of scratch — NOT in the clock page itself — so the hypervisor
/// cannot clobber it if it extends the TLFS-reserved region.
pub const SCRATCH_TOP_CLOCK_TYPE_OFFSET: u64 = 0x28;
Review comment (Member):

I'm just curious: is the top of scratch now the go-to mechanism for sharing configuration between host and guest, replacing the PEB?


/// Offset from the top of scratch for the `boot_time_ns` field (u64).
///
/// The Unix-epoch origin of the monotonic clock, computed by the host
/// as `SystemTime::now() - current_monotonic_ns()` and written in
/// `arm_clock`. The guest recovers wall time as
/// `boot_time_ns + monotonic_time_ns()`.
///
/// Hyper-V has no equivalent to KVM's `MSR_KVM_WALL_CLOCK_NEW`, so
/// we use this uniform host-computed approach on all backends.
pub const SCRATCH_TOP_BOOT_TIME_NS_OFFSET: u64 = 0x30;

// ---- Next free offset in the bookkeeping page: 0x38 ----
// When adding new host→guest shared fields, use the next multiple of
// 8 after the last offset above. All fields in this page are u64,
// little-endian, host-written and guest-read, and are excluded from
// snapshots because they live in scratch memory.

/// Offset from the top of scratch memory for a shared host-guest u64 counter.
///
@@ -49,12 +73,60 @@ pub const SCRATCH_TOP_EXN_STACK_OFFSET: u64 = 0x30;
#[cfg(feature = "guest-counter")]
pub const SCRATCH_TOP_GUEST_COUNTER_OFFSET: u64 = 0x1008;

/// Offset from the top of scratch memory for the start of the paravirtualized
/// clock page.
///
/// The clock page is a single 4 KiB page covering offsets `0x3000` (page
/// base, inclusive) down to `0x2000` (exclusive) from the top, i.e. one page
/// lower than the guest-counter page, to avoid the i686 frame-number issue
/// that forces the counter off the very last page (see
/// [`SCRATCH_TOP_GUEST_COUNTER_OFFSET`]).
///
/// The constant is the distance from one past the top address down to the
/// page base: subtract it from `MAX_GPA`/`MAX_GVA` + 1 to get the page base.
///
/// The page is always reserved regardless of the `enable_guest_clock`
/// feature so that the memory layout (and therefore stack positions)
/// is stable across feature-flag builds. The host only populates it
/// when the feature is enabled; otherwise it stays zero-filled and
/// the guest sees `ClockType::None`.
pub const SCRATCH_TOP_CLOCK_PAGE_OFFSET: u64 = 0x3000;
Review comment (Member):

Could we add static assertions that the address is properly aligned to store `KvmPvclockVcpuTimeInfo` etc.?
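
A sketch of what such compile-time checks might look like, offered only as an illustration of the request above. `PvclockLayout` is a hypothetical stand-in that mirrors Linux's `pvclock_vcpu_time_info`; the PR's actual `KvmPvclockVcpuTimeInfo` definition is not shown in this diff and may differ. The checks also assume that `MAX_GPA + 1` is itself page-aligned.

```rust
pub const SCRATCH_TOP_CLOCK_PAGE_OFFSET: u64 = 0x3000;
pub const CLOCK_PAGE_SIZE: u64 = 0x1000;

/// Hypothetical stand-in for the structure stored at offset 0 of the clock page.
#[repr(C)]
pub struct PvclockLayout {
    pub version: u32,
    pub pad0: u32,
    pub tsc_timestamp: u64,
    pub system_time: u64,
    pub tsc_to_system_mul: u32,
    pub tsc_shift: i8,
    pub flags: u8,
    pub pad: [u8; 2],
}

// The offset is a whole number of pages, so the page base is page-aligned
// whenever one past the top address is page-aligned.
const _: () = assert!(SCRATCH_TOP_CLOCK_PAGE_OFFSET % CLOCK_PAGE_SIZE == 0);

// A page-aligned base satisfies the structure's alignment requirement.
const _: () = assert!(CLOCK_PAGE_SIZE as usize % core::mem::align_of::<PvclockLayout>() == 0);

// The structure fits inside the single reserved page.
const _: () = assert!(core::mem::size_of::<PvclockLayout>() <= CLOCK_PAGE_SIZE as usize);
```

If any of these conditions stopped holding, the crate would fail to compile rather than misplace the structure at run time.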


/// Size of the paravirtualized clock page in bytes (one 4 KiB page).
/// The entire page is owned by the hypervisor (KVM pvclock or Hyper-V
/// Reference TSC). Hyperlight's own metadata (`clock_type`,
/// `boot_time_ns`) lives in the bookkeeping page at offsets
/// `SCRATCH_TOP_CLOCK_TYPE_OFFSET` / `SCRATCH_TOP_BOOT_TIME_NS_OFFSET`,
/// NOT in the clock page, so a future TLFS extension cannot clobber it.
pub const CLOCK_PAGE_SIZE: u64 = 0x1000;

pub fn scratch_base_gpa(size: usize) -> u64 {
(MAX_GPA - size + 1) as u64
}
pub fn scratch_base_gva(size: usize) -> u64 {
(MAX_GVA - size + 1) as u64
}

/// Guest physical address of the base of the paravirtualized clock page.
///
/// The clock page sits at a fixed offset from the top of the guest physical
/// address space, independent of `scratch_size`: it is always
/// `MAX_GPA + 1 - SCRATCH_TOP_CLOCK_PAGE_OFFSET`.
///
/// Only meaningful when the host is built with the `enable_guest_clock`
/// feature; otherwise the page is not populated.
pub const fn clock_page_gpa() -> u64 {
(MAX_GPA as u64) + 1 - SCRATCH_TOP_CLOCK_PAGE_OFFSET
}

/// Guest virtual address of the base of the paravirtualized clock page.
///
/// See [`clock_page_gpa`]. Scratch is mapped identity-style from
/// `scratch_base_gva` to `scratch_base_gpa`, so the clock page sits at the
/// equivalent offset in the guest virtual address space.
pub const fn clock_page_gva() -> u64 {
(MAX_GVA as u64) + 1 - SCRATCH_TOP_CLOCK_PAGE_OFFSET
}

/// Compute the minimum scratch region size needed for a sandbox.
pub use arch::min_scratch_size;
Review comment (Member):

I don't see the arch-specific `min_scratch_size` updates; should they account for the new pages?

4 changes: 4 additions & 0 deletions src/hyperlight_common/src/lib.rs
@@ -48,5 +48,9 @@ pub mod func;
// cbindgen:ignore
pub mod vmem;

/// Paravirtualized clock structures shared between host and guest.
/// cbindgen:ignore
pub mod time;

/// ELF note types for embedding hyperlight version metadata in guest binaries.
pub mod version_note;