flowey: add --with-mi-secure flag and CI gate for mimalloc secure mode #3161
will-j-wright merged 10 commits into microsoft:main
Pull request overview
Adds a new “mimalloc secure mode” feature to the OpenHCL build and wires it through Flowey so it can be enabled via a local `cargo xflowey build-igvm` flag and exercised by a dedicated CI gate.
Changes:
- Introduce a new Cargo feature (`mi-secure`) for `underhill_entry`/`openvmm_hcl` and a corresponding Flowey feature enum (`OpenvmmHclFeature::MiSecure`).
- Add an `extra_features` mechanism to OpenHCL IGVM recipe builds so CI can layer features on top of recipe defaults.
- Add a new CI gate that builds an x64 OpenHCL IGVM with `mi-secure` enabled and runs a subset of vmm-tests against it.
Reviewed changes
Copilot reviewed 11 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openhcl/underhill_entry/Cargo.toml | Adds mi-secure feature to enable mimalloc/secure. |
| openhcl/openvmm_hcl/Cargo.toml | Plumbs mi-secure feature through to underhill_entry. |
| flowey/flowey_lib_hvlite/src/build_openvmm_hcl.rs | Adds MiSecure to OpenvmmHclFeature and maps it to the mi-secure Cargo feature. |
| flowey/flowey_lib_hvlite/src/build_openhcl_igvm_from_recipe.rs | Adds extra_features to request and merges into recipe feature set. |
| flowey/flowey_lib_hvlite/src/_jobs/local_build_igvm.rs | Adds with_mi_secure customization to local build-igvm job and inserts MiSecure. |
| flowey/flowey_lib_hvlite/src/_jobs/local_build_and_run_nextest_vmm_tests.rs | Updates recipe build request with new extra_features field. |
| flowey/flowey_lib_hvlite/src/_jobs/build_and_publish_openhcl_igvm_from_recipe.rs | Adds extra_features to build params and forwards it into recipe builds. |
| flowey/flowey_hvlite/src/pipelines/checkin_gates.rs | Adds CI build+test gate for mi-secure OpenHCL on x64. |
| flowey/flowey_hvlite/src/pipelines/build_reproducible.rs | Initializes extra_features to empty for reproducible builds. |
| flowey/flowey_hvlite/src/pipelines/build_igvm.rs | Adds --with-mi-secure CLI flag and plumbs it into local build job. |
| ci-flowey/openvmm-pr.yaml | Regenerated pipeline YAML to include new mi-secure jobs. |
| .github/workflows/openvmm-pr-release.yaml | Regenerated GitHub Actions workflow to include the mi-secure gate. |
Force-pushed from d432180 to ea07d45.
Force-pushed from ea07d45 to 87aaccd.
Force-pushed from 54dd563 to a066a8d.
What's the motivation for adding this?
It's kind of an ASAN "lite" in that it adds guard pages and some other memory security stuff. We were able to repro that weird ARM crash with MI_SECURE enabled. @benhillis thought it would be a good idea to run a test suite with it enabled in CI.
Force-pushed from 8dc28c9 to 98b4b82.
justus-camp-microsoft left a comment:
LGTM besides the one comment
Add a new Customization flag that enables the mimalloc 'secure' feature when building OpenHCL. This adds extra security hardening (guard pages, randomized allocation, encrypted free lists) at a small performance cost. The feature chain is: openvmm_hcl/mi-secure -> underhill_entry/mi-secure -> mimalloc/secure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
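The feature chain in the commit message above can be sketched as a small mapping from a Flowey-side enum to the Cargo feature name. This is only an illustration: the enum here is a trimmed-down stand-in for `OpenvmmHclFeature`, and the `cargo_feature` and `feature_chain` helpers are hypothetical names, not the actual Flowey API.

```rust
/// Hypothetical, trimmed-down stand-in for Flowey's `OpenvmmHclFeature`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpenvmmHclFeature {
    /// Selects the `mi-secure` Cargo feature (mimalloc secure mode).
    MiSecure,
}

impl OpenvmmHclFeature {
    /// Cargo feature name for this variant (hypothetical helper name).
    fn cargo_feature(self) -> &'static str {
        match self {
            OpenvmmHclFeature::MiSecure => "mi-secure",
        }
    }
}

/// The forwarding chain described in the commit message:
/// each crate's feature enables the next one in the list.
fn feature_chain() -> [&'static str; 3] {
    [
        "openvmm_hcl/mi-secure",
        "underhill_entry/mi-secure",
        "mimalloc/secure",
    ]
}

fn main() {
    println!("--features {}", OpenvmmHclFeature::MiSecure.cargo_feature());
    println!("{}", feature_chain().join(" -> "));
}
```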
Add an extra_features field to OpenhclIgvmBuildParams and the build_openhcl_igvm_from_recipe Request, allowing CI pipelines to add cargo features on top of a recipe's defaults. This enables building IGVM variants (e.g., with mi-secure) without duplicating recipes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
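A minimal sketch of how layering `extra_features` on top of a recipe's defaults might look, assuming features are kept in an ordered set. The `Feature` enum, its `RecipeDefault` variant, and `merge_features` are hypothetical names for illustration; the real request type and merge logic live in `build_openhcl_igvm_from_recipe.rs` and may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical feature enum; `RecipeDefault` stands in for whatever
/// features a recipe already enables.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Feature {
    RecipeDefault,
    MiSecure,
}

/// Returns the recipe defaults plus any extra features requested by CI.
/// A set union means a feature listed in both places appears once.
fn merge_features(
    recipe_defaults: &BTreeSet<Feature>,
    extra_features: &BTreeSet<Feature>,
) -> BTreeSet<Feature> {
    recipe_defaults.union(extra_features).copied().collect()
}

fn main() {
    let defaults = BTreeSet::from([Feature::RecipeDefault]);
    let extra = BTreeSet::from([Feature::MiSecure]);
    let merged = merge_features(&defaults, &extra);
    println!("{merged:?}");
}
```

With an empty `extra_features` set the result equals the recipe defaults, which matches how the reproducible build pipeline is described as initializing the field to empty.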
Add CI jobs in checkin_gates that:
1. Build X64 OpenHCL with mimalloc secure mode (MiSecure feature)
2. Run basic OpenHCL VMM tests against the mi-secure IGVM

The test job reuses standard Windows x86 artifacts (openvmm, pipette, guest_test_uefi, etc.) but substitutes mi-secure IGVM files. It filters to openhcl tests only, excluding servicing, CVM, and very_heavy tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use 'build openhcl (mi-secure) [x64-linux]' for OSS and 'build msft internal OpenHCL (mi-secure) [x64-linux]' for internal, matching the existing naming patterns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The trimmed artifact list was missing Ubuntu2504ServerX64Vhd (and others), causing test failures when openhcl tests needed those images. Use the same full set of standard x64 test artifacts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the definition earlier so the mi-secure gate can reference it via clone instead of duplicating the list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build all three x64 recipe variants with mi-secure enabled so that tests needing linux-direct and CVM IGVMs can also run. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The prepped_vbs test requires a prep_steps-generated image and needs_prep_run which the mi-secure test job doesn't provide. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 0f03961 to 505c831.
The boot_heavy test creates VMs with 16 VPs and 2 NUMA nodes. When running with MI_SECURE enabled, mimalloc's guard pages around every allocation significantly increase OpenHCL's VTL2 memory overhead, causing the kernel to OOM with only ~67 MB available in VTL2. This test has been failing consistently on x64-windows-intel-mi-secure in every CI run since the mi-secure gate was introduced (PR microsoft#3161), making the CI perpetually red on main. The regular boot test (without the heavy VP configuration) still runs on mi-secure and provides adequate MI_SECURE coverage. The boot_heavy test continues to run on all other (non-mi-secure) platforms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Summary

The `boot_heavy` test (16 VPs, 2 NUMA nodes) has been failing **consistently on every CI run** since the mi-secure gate was introduced in PR #3161. This is the root cause of the persistent CI failures on `main`.

## Root Cause

MI_SECURE mode surrounds all mimalloc allocations with guard pages, significantly increasing OpenHCL's VTL2 memory usage. The `boot_heavy` test's 16-VP configuration pushes VTL2 memory demand beyond the available ~67 MB, triggering a kernel OOM panic:

```
Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
```

The mi-secure filter excluded `very_heavy` but not `boot_heavy`, so this test was included in the gate.

## Fix

## Evidence

Failing in all 9+ recent CI runs on `x64-windows-intel-mi-secure`:
- Test: `multiarch::openvmm_openhcl_uefi_x64_ubuntu_2504_server_x64_boot_heavy`
- Sometimes also: `openvmm_openhcl_uefi_x64_windows_datacenter_core_2022_x64_boot_heavy`
- Last successful main CI run: `9013a166` (before mi-secure was added)

Co-authored-by: Ben Hillis <benhill@ntdev.microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
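To illustrate the filtering gap described above, here is a rough sketch of a name-based test filter that keeps OpenHCL tests and drops the excluded categories, with `boot_heavy` added to the exclusion list. It is an assumption-heavy model: the real gate's filter syntax is not shown in this PR, so the substring matching, the `EXCLUDED` list, and the `runs_on_mi_secure` function are all hypothetical.

```rust
/// Hypothetical exclusion list: the categories named in the PR
/// (servicing, CVM, very_heavy) plus the later `boot_heavy` addition.
const EXCLUDED: [&str; 4] = ["servicing", "cvm", "very_heavy", "boot_heavy"];

/// Returns true if a test would run in the mi-secure gate under this
/// simplified substring model (not the real filter syntax).
fn runs_on_mi_secure(test_name: &str) -> bool {
    test_name.contains("openhcl")
        && !EXCLUDED.iter().any(|tag| test_name.contains(tag))
}

fn main() {
    let heavy = "multiarch::openvmm_openhcl_uefi_x64_ubuntu_2504_server_x64_boot_heavy";
    // Assumed name of the regular boot test: the same name without `_heavy`.
    let regular = "multiarch::openvmm_openhcl_uefi_x64_ubuntu_2504_server_x64_boot";
    println!("{heavy}: {}", runs_on_mi_secure(heavy));
    println!("{regular}: {}", runs_on_mi_secure(regular));
}
```

Under this model, excluding only `very_heavy` would still let the `boot_heavy` test through, which is the gap the root-cause section describes.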
uh, I guess there is a regression, see: #3225
microsoft#3161)

Add a new CI gate which builds OpenHCL with MI_SECURE enabled and runs some tests on it.

MI_SECURE adds the following:
- All internal mimalloc pages are surrounded by guard pages and the heap metadata is behind a guard page as well (so a buffer overflow exploit cannot reach into the metadata).
- All free list pointers are [encoded](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396) with per-page keys, which are used both to prevent overwrites with a known pointer and to detect heap corruption.
- Double frees are detected (and ignored).
- The free lists are initialized in a random order and allocation randomly chooses between extension and reuse within a page to mitigate attacks that rely on a predictable allocation order. Similarly, the larger heap blocks allocated by mimalloc from the OS are also address randomized.

https://microsoft.github.io/mimalloc/modes.html

This will hopefully allow us to catch any obvious memory violations earlier.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
microsoft#3143 renamed our CI pools at the same time as microsoft#3161 added a new test on the old pools. Fix this up.
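Add a new CI gate which builds OpenHCL with MI_SECURE enabled and runs some tests on it.
MI_SECURE adds the following:
- All internal mimalloc pages are surrounded by guard pages and the heap metadata is behind a guard page as well (so a buffer overflow exploit cannot reach into the metadata).
- All free list pointers are [encoded](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af43a0793910a9f2d01ac7863/include/mimalloc-internal.h#L396) with per-page keys, which are used both to prevent overwrites with a known pointer and to detect heap corruption.
- Double frees are detected (and ignored).
- The free lists are initialized in a random order and allocation randomly chooses between extension and reuse within a page to mitigate attacks that rely on a predictable allocation order. Similarly, the larger heap blocks allocated by mimalloc from the OS are also address randomized.

https://microsoft.github.io/mimalloc/modes.html
This will hopefully allow us to catch any obvious memory violations earlier.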