
flowey: add --with-mi-secure flag and CI gate for mimalloc secure mode #3161

Merged
will-j-wright merged 10 commits into microsoft:main from will-j-wright:ohcl_mi_secure
Apr 6, 2026

Conversation

@will-j-wright
Contributor

@will-j-wright will-j-wright commented Mar 30, 2026

Add a new CI gate which builds OpenHCL with MI_SECURE enabled and runs some tests on it.

MI_SECURE adds the following:

  • All internal mimalloc pages are surrounded by guard pages and the heap metadata is behind a guard page as well (so a buffer overflow exploit cannot reach into the metadata).
  • All free list pointers are encoded with per-page keys, which both prevents overwrites with a known pointer and helps detect heap corruption.
  • Double frees are detected (and ignored).
  • The free lists are initialized in a random order, and allocation randomly chooses between extension and reuse within a page to mitigate attacks that rely on a predictable allocation order. Similarly, the larger heap blocks allocated by mimalloc from the OS are also address randomized.

https://microsoft.github.io/mimalloc/modes.html

This will hopefully allow us to catch any obvious memory violations earlier.
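
To make the free-list bullet above more concrete, here is a toy sketch of how a keyed encoding can expose a link that was overwritten with a raw pointer. Everything in it is an assumption for illustration: the names `PageKeys`, `encode_link`, `decode_link`, and `is_plausible_link` are hypothetical, and the XOR/rotate/add mix with a fixed rotation of 8 is not claimed to be mimalloc's actual scheme.

```rust
/// Per-page secret keys. In this toy model the caller supplies them;
/// a real allocator would draw them from a random source.
#[derive(Clone, Copy)]
pub struct PageKeys {
    pub xor_key: u64,
    pub add_key: u64,
}

/// Encode a free-list link before storing it inside a freed block.
pub fn encode_link(next: u64, keys: PageKeys) -> u64 {
    (next ^ keys.xor_key).rotate_left(8).wrapping_add(keys.add_key)
}

/// Decode a stored link; this is the exact inverse of `encode_link`.
pub fn decode_link(stored: u64, keys: PageKeys) -> u64 {
    stored.wrapping_sub(keys.add_key).rotate_right(8) ^ keys.xor_key
}

/// A decoded link is plausible only if it is null (end of list) or points
/// at a block boundary inside the owning page.
pub fn is_plausible_link(next: u64, page_start: u64, page_len: u64, block_size: u64) -> bool {
    next == 0
        || (next >= page_start
            && next < page_start + page_len
            && (next - page_start) % block_size == 0)
}
```

A link written through `encode_link` round-trips cleanly, while a raw pointer written by an overflow decodes to a value that almost certainly lies outside the page, so the plausibility check can flag it as corruption.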

@will-j-wright will-j-wright requested review from a team as code owners March 30, 2026 20:58
Copilot AI review requested due to automatic review settings March 30, 2026 20:58
Contributor

Copilot AI left a comment

Pull request overview

Adds a new “mimalloc secure mode” feature to the OpenHCL build and wires it through Flowey so it can be enabled via a local cargo xflowey build-igvm flag and exercised by a dedicated CI gate.

Changes:

  • Introduce a new Cargo feature (mi-secure) for underhill_entry / openvmm_hcl and a corresponding Flowey feature enum (OpenvmmHclFeature::MiSecure).
  • Add an extra_features mechanism to OpenHCL IGVM recipe builds so CI can layer features on top of recipe defaults.
  • Add a new CI gate that builds an x64 OpenHCL IGVM with mi-secure enabled and runs a subset of vmm-tests against it.
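
The "layer features on top of recipe defaults" idea can be sketched roughly as a set union. This is a simplified stand-in, not the Flowey code: only `OpenvmmHclFeature::MiSecure` and its `mi-secure` Cargo feature name come from this PR, while the `Gdb` and `Tpm` variants, their feature strings, and the helper names `effective_features` and `cargo_features_arg` are placeholders chosen for illustration.

```rust
use std::collections::BTreeSet;

/// Stand-in for the Flowey feature enum. `Gdb` and `Tpm` are placeholder
/// variants; only `MiSecure` is taken from the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum OpenvmmHclFeature {
    Gdb,
    Tpm,
    MiSecure,
}

impl OpenvmmHclFeature {
    /// Map the enum variant to the Cargo feature name passed to the build.
    pub fn cargo_feature(self) -> &'static str {
        match self {
            Self::Gdb => "gdb",
            Self::Tpm => "tpm",
            Self::MiSecure => "mi-secure",
        }
    }
}

/// Merge a recipe's default features with caller-supplied extras.
/// Using a set means an extra that is already a default is not duplicated.
pub fn effective_features(
    recipe_defaults: &BTreeSet<OpenvmmHclFeature>,
    extra_features: &BTreeSet<OpenvmmHclFeature>,
) -> BTreeSet<OpenvmmHclFeature> {
    recipe_defaults.union(extra_features).copied().collect()
}

/// Render the merged set as a comma-separated `--features` value.
pub fn cargo_features_arg(features: &BTreeSet<OpenvmmHclFeature>) -> String {
    features
        .iter()
        .map(|f| f.cargo_feature())
        .collect::<Vec<_>>()
        .join(",")
}
```

With this shape, a CI gate can pass a one-element extra set containing `MiSecure` and reuse every existing recipe unchanged, which is the stated reason for avoiding duplicated recipes.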

Reviewed changes

Copilot reviewed 11 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| openhcl/underhill_entry/Cargo.toml | Adds mi-secure feature to enable mimalloc/secure. |
| openhcl/openvmm_hcl/Cargo.toml | Plumbs mi-secure feature through to underhill_entry. |
| flowey/flowey_lib_hvlite/src/build_openvmm_hcl.rs | Adds MiSecure to OpenvmmHclFeature and maps it to the mi-secure Cargo feature. |
| flowey/flowey_lib_hvlite/src/build_openhcl_igvm_from_recipe.rs | Adds extra_features to request and merges into recipe feature set. |
| flowey/flowey_lib_hvlite/src/_jobs/local_build_igvm.rs | Adds with_mi_secure customization to local build-igvm job and inserts MiSecure. |
| flowey/flowey_lib_hvlite/src/_jobs/local_build_and_run_nextest_vmm_tests.rs | Updates recipe build request with new extra_features field. |
| flowey/flowey_lib_hvlite/src/_jobs/build_and_publish_openhcl_igvm_from_recipe.rs | Adds extra_features to build params and forwards it into recipe builds. |
| flowey/flowey_hvlite/src/pipelines/checkin_gates.rs | Adds CI build+test gate for mi-secure OpenHCL on x64. |
| flowey/flowey_hvlite/src/pipelines/build_reproducible.rs | Initializes extra_features to empty for reproducible builds. |
| flowey/flowey_hvlite/src/pipelines/build_igvm.rs | Adds --with-mi-secure CLI flag and plumbs it into local build job. |
| ci-flowey/openvmm-pr.yaml | Regenerated pipeline YAML to include new mi-secure jobs. |
| .github/workflows/openvmm-pr-release.yaml | Regenerated GitHub Actions workflow to include the mi-secure gate. |

Comment thread flowey/flowey_hvlite/src/pipelines/checkin_gates.rs Outdated
Comment thread flowey/flowey_hvlite/src/pipelines/build_igvm.rs
Copilot AI review requested due to automatic review settings March 30, 2026 21:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 30, 2026 21:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 30, 2026 23:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.

@smalis-msft
Contributor

What's the motivation for adding this?

@will-j-wright
Contributor Author

> What's the motivation for adding this?

It's kind of an ASAN "lite" in that it adds guard pages and some other memory security stuff. We were able to repro that weird ARM crash with MI_SECURE enabled. @benhillis thought it would be a good idea to run a test suite with it enabled in CI.

https://microsoft.github.io/mimalloc/modes.html

Copilot AI review requested due to automatic review settings March 31, 2026 17:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.

Comment thread flowey/flowey_hvlite/src/pipelines/checkin_gates.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/_jobs/local_build_igvm.rs
@will-j-wright will-j-wright changed the title from "WIP: flowey: add --with-mi-secure flag and CI gate for mimalloc secure mode" to "flowey: add --with-mi-secure flag and CI gate for mimalloc secure mode" Mar 31, 2026
Contributor

@justus-camp-microsoft justus-camp-microsoft left a comment

LGTM besides the one comment

Comment thread flowey/flowey_lib_hvlite/src/build_openhcl_igvm_from_recipe.rs Outdated
Copilot AI review requested due to automatic review settings April 1, 2026 20:45
smalis-msft previously approved these changes Apr 3, 2026
will-j-wright and others added 10 commits April 3, 2026 19:28

Add a new Customization flag that enables the mimalloc 'secure' feature
when building OpenHCL. This adds extra security hardening (guard pages,
randomized allocation, encrypted free lists) at a small performance cost.

The feature chain is: openvmm_hcl/mi-secure -> underhill_entry/mi-secure
-> mimalloc/secure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an extra_features field to OpenhclIgvmBuildParams and the
build_openhcl_igvm_from_recipe Request, allowing CI pipelines to add
cargo features on top of a recipe's defaults. This enables building
IGVM variants (e.g., with mi-secure) without duplicating recipes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add CI jobs in checkin_gates that:
1. Build X64 OpenHCL with mimalloc secure mode (MiSecure feature)
2. Run basic OpenHCL VMM tests against the mi-secure IGVM

The test job reuses standard Windows x86 artifacts (openvmm, pipette,
guest_test_uefi, etc.) but substitutes mi-secure IGVM files. It filters
to openhcl tests only, excluding servicing, CVM, and very_heavy tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use 'build openhcl (mi-secure) [x64-linux]' for OSS and
'build msft internal OpenHCL (mi-secure) [x64-linux]' for internal,
matching the existing naming patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The trimmed artifact list was missing Ubuntu2504ServerX64Vhd (and
others), causing test failures when openhcl tests needed those images.
Use the same full set of standard x64 test artifacts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move the definition earlier so the mi-secure gate can reference it
via clone instead of duplicating the list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Build all three x64 recipe variants with mi-secure enabled so that
tests needing linux-direct and CVM IGVMs can also run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The prepped_vbs test requires a prep_steps-generated image and
needs_prep_run which the mi-secure test job doesn't provide.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@will-j-wright will-j-wright dismissed stale reviews from smalis-msft and sunilmut via 505c831 April 3, 2026 19:32
@will-j-wright will-j-wright enabled auto-merge (squash) April 3, 2026 19:33
@benhillis benhillis added the "flowey" label (Improvements to the flowey build infrastructure) Apr 6, 2026
@will-j-wright will-j-wright merged commit 67664ff into microsoft:main Apr 6, 2026
86 checks passed
smalis-msft added a commit that referenced this pull request Apr 6, 2026
#3143 renamed our CI pools at the same time as #3161 added a new test on the old pools. Fix this up.
benhillis pushed a commit to benhillis/openvmm that referenced this pull request Apr 8, 2026
The boot_heavy test creates VMs with 16 VPs and 2 NUMA nodes. When
running with MI_SECURE enabled, the guard pages mimalloc places around
its internal pages significantly increase OpenHCL's VTL2 memory overhead,
causing the kernel to OOM with only ~67 MB available in VTL2.

This test has been failing consistently on x64-windows-intel-mi-secure
in every CI run since the mi-secure gate was introduced (PR microsoft#3161),
making the CI perpetually red on main.

The regular boot test (without the heavy VP configuration) still runs
on mi-secure and provides adequate MI_SECURE coverage. The boot_heavy
test continues to run on all other (non-mi-secure) platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
benhillis added a commit that referenced this pull request Apr 8, 2026
## Summary

The `boot_heavy` test (16 VPs, 2 NUMA nodes) has been failing
**consistently on every CI run** since the mi-secure gate was introduced
in PR #3161. This is the root cause of the persistent CI failures on
`main`.

## Root Cause

MI_SECURE mode surrounds mimalloc's internal pages with guard pages,
significantly increasing OpenHCL's VTL2 memory usage. The `boot_heavy`
test's 16-VP configuration pushes VTL2 memory demand beyond the
available ~67 MB, triggering a kernel OOM panic:

```
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
```

The mi-secure filter excluded `very_heavy` but not `boot_heavy`, so this
test was included in the gate.
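
A minimal sketch of why excluding `very_heavy` does not catch `boot_heavy`, assuming tests are selected by substring match on the test name. The function name `runs_in_mi_secure_gate`, the tag lists, and the substring rule are all assumptions for illustration; the real gate uses whatever filter expression `checkin_gates.rs` builds, and the CVM exclusion is left out because its naming is not given here.

```rust
/// Hypothetical name-based model of the mi-secure test filter:
/// keep OpenHCL tests, then drop any test whose name contains an excluded tag.
pub fn runs_in_mi_secure_gate(test_name: &str, excluded_tags: &[&str]) -> bool {
    test_name.contains("openhcl") && !excluded_tags.iter().any(|tag| test_name.contains(tag))
}

/// Exclusions as described for the original gate (CVM omitted, see above).
pub const ORIGINAL_EXCLUSIONS: &[&str] = &["servicing", "very_heavy"];

/// Same list with `boot_heavy` added.
pub const FIXED_EXCLUSIONS: &[&str] = &["servicing", "very_heavy", "boot_heavy"];
```

Under this model, `multiarch::openvmm_openhcl_uefi_x64_ubuntu_2504_server_x64_boot_heavy` passes the original exclusions because its name contains neither `servicing` nor `very_heavy`, and it is only dropped once `boot_heavy` is listed explicitly.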

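## Fix

Exclude `boot_heavy` from the mi-secure test filter. The regular `boot` test still runs on mi-secure, and `boot_heavy` continues to run on all other (non-mi-secure) platforms.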

## Evidence

Failing in all 9+ recent CI runs on `x64-windows-intel-mi-secure`:
- Test:
`multiarch::openvmm_openhcl_uefi_x64_ubuntu_2504_server_x64_boot_heavy`
- Sometimes also:
`openvmm_openhcl_uefi_x64_windows_datacenter_core_2022_x64_boot_heavy`
- Last successful main CI run: `9013a166` (before mi-secure was added)

Co-authored-by: Ben Hillis <benhill@ntdev.microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bitranox
Contributor

bitranox commented Apr 8, 2026

uh, I guess there is a regression, see: #3225

mfrohlich-msft pushed a commit to mfrohlich-msft/openvmm that referenced this pull request Apr 8, 2026
moor-coding pushed a commit to moor-coding/openvmm that referenced this pull request Apr 13, 2026
microsoft#3161)

Add a new CI gate which builds OpenHCL with MI_SECURE enabled and runs
some tests on it.

MI_SECURE adds the following:

- All internal mimalloc pages are surrounded by guard pages and the heap
metadata is behind a guard page as well (so a buffer overflow exploit
cannot reach into the metadata).
- All free list pointers are
[encoded](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396)
with per-page keys, which both prevents overwrites with a known
pointer and helps detect heap corruption.
- Double frees are detected (and ignored).
- The free lists are initialized in a random order, and allocation
randomly chooses between extension and reuse within a page to mitigate
attacks that rely on a predictable allocation order. Similarly,
the larger heap blocks allocated by mimalloc from the OS are also
address randomized.

https://microsoft.github.io/mimalloc/modes.html

This will hopefully allow us to catch any obvious memory violations
earlier.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moor-coding pushed a commit to moor-coding/openvmm that referenced this pull request Apr 13, 2026
moor-coding pushed a commit to moor-coding/openvmm that referenced this pull request Apr 13, 2026
Labels

flowey: Improvements to the flowey build infrastructure

7 participants