mana: save and restore mana devices when keepalive is enabled #2123

justus-camp-microsoft · 2025-10-08T20:36:14Z

This adds the code that utilizes GdmaDevice::restore and VfioDevice::restore in cases where MANA keepalive is enabled. This requires a GuestServicingFlag to be set to enable MANA keepalive, a command line parameter of OPENHCL_MANA_KEEP_ALIVE=1, and for OPENHCL_ENABLE_VTL2_GPA_POOL to be set with enough memory for keepalive to function.

I've also modified the interactive console's service-vtl2 to take arguments for --mana-keepalive and --nvme-keepalive so that keepalive can be manually tested with the console.

Copilot

Pull Request Overview

This PR implements MANA keepalive functionality for OpenHCL, allowing MANA device state and DMA memory to be preserved across servicing operations. This builds on the existing keepalive infrastructure by adding MANA-specific save/restore capabilities alongside the previously implemented NVMe keepalive feature.

Key changes:

Added comprehensive save/restore functionality for MANA devices including driver state and memory preservation
Extended command-line and flag support for MANA keepalive configuration
Added test coverage for MANA keepalive functionality

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

File	Description
vmm_tests/vmm_tests/tests/tests/x86_64/openhcl_linux_direct.rs	Removed duplicate MANA servicing test to avoid conflicts
vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs	Added MANA NIC validation function and new test for keepalive functionality
vm/devices/net/net_mana/src/lib.rs	Updated test calls to include new mana_state parameter
vm/devices/net/mana_driver/src/save_restore.rs	Added ManaSavedState and ManaDeviceSavedState protobuf structures
vm/devices/net/mana_driver/src/mana.rs	Added save method and state restoration logic to ManaDevice
vm/devices/net/mana_driver/src/gdma_driver.rs	Removed dead_code attributes from save/restore methods
vm/devices/get/guest_emulation_device/src/lib.rs	Added mana_keepalive flag support to GED capabilities
vm/devices/get/get_resources/src/lib.rs	Added mana_keepalive field to GuestServicingFlags
vm/devices/get/get_protocol/src/lib.rs	Added enable_mana_keepalive bit to SaveGuestVtl2StateFlags
petri/src/worker.rs	Added mana_keepalive field mapping for servicing flags
petri/src/vm/mod.rs	Added enable_mana_keepalive field to OpenHclServicingFlags
openvmm/openvmm_entry/src/lib.rs	Added CLI support for mana-keepalive parameter
openhcl/underhill_core/src/worker.rs	Integrated MANA state handling into VM lifecycle management
openhcl/underhill_core/src/servicing.rs	Added mana_state field to servicing state structures
openhcl/underhill_core/src/options.rs	Added OPENHCL_MANA_KEEP_ALIVE environment variable support
openhcl/underhill_core/src/lib.rs	Wired mana_keep_alive option through worker configuration
openhcl/underhill_core/src/emuplat/netvsp.rs	Implemented comprehensive MANA device save/restore in VF manager
openhcl/underhill_core/src/dispatch/mod.rs	Added network settings save method and MANA state coordination

Comments suppressed due to low confidence (1)

openhcl/underhill_core/src/emuplat/netvsp.rs:1

Using expect() can cause panic if the memory with matching PFN is not found. Consider returning a proper error instead of panicking in production code.

// Copyright (c) Microsoft Corporation.

Copilot · 2025-10-08T20:37:08Z

vm/devices/net/mana_driver/src/mana.rs

+        if let Some(hwc_task) = self.hwc_task {
+            hwc_task.cancel().await;
+        }
+        let inner = Arc::into_inner(self.inner).unwrap();


Using unwrap() on Arc::into_inner() can panic if there are multiple references to the Arc. Consider using a safer approach with proper error handling or ensuring single ownership before this call.

Suggested change

-        let inner = Arc::into_inner(self.inner).unwrap();
+        let inner = match Arc::into_inner(self.inner) {
+            Some(inner) => inner,
+            None => {
+                tracing::error!("Failed to save MANA device state: multiple references to device exist");
+                return (
+                    Err(anyhow::anyhow!("Failed to save MANA device state: multiple references to device exist")),
+                    // We cannot recover the device, so return a default error value.
+                    // This assumes T: Default, otherwise consider another approach.
+                    // For now, use std::mem::zeroed() as a placeholder (unsafe).
+                    unsafe { std::mem::zeroed() },
+                );
+            }
+        };

To Copilot's point, what is preventing the unwrap from panicking here?

We use the same pattern in the shutdown path - this is definitely a potential footgun but it's something we already do
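
For readers following this thread, here is a minimal sketch of the behavior under discussion. `Device`, `Inner`, `SaveError`, and `into_saved` are hypothetical stand-ins, not the real `ManaDevice` types. It shows that `Arc::into_inner` yields `None` while any other clone is alive, and that the caller can turn that into an ordinary error without `unwrap()` or `std::mem::zeroed()`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the driver's shared inner state.
#[derive(Debug, PartialEq)]
struct Inner {
    queue_count: u32,
}

/// Hypothetical stand-in for the device handle that owns the `Arc`.
struct Device {
    inner: Arc<Inner>,
}

/// Hypothetical error for a save that cannot take sole ownership.
#[derive(Debug, PartialEq)]
enum SaveError {
    StillShared,
}

impl Device {
    /// Consumes the handle and returns the inner state by value.
    /// `Arc::into_inner` returns `None` if another clone still exists,
    /// which is mapped to an error here instead of panicking.
    fn into_saved(self) -> Result<Inner, SaveError> {
        Arc::into_inner(self.inner).ok_or(SaveError::StillShared)
    }
}
```

Note that on the `None` path `Arc::into_inner` has already dropped this handle's reference, so the device cannot be handed back to the caller. `Arc::try_unwrap` returns the `Arc` in its `Err` variant and may fit better if the caller needs to keep the device on failure.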

Copilot · 2025-10-08T20:37:08Z

openhcl/underhill_core/src/emuplat/netvsp.rs

+
+                        if let Some(device) = self.mana_device.take() {
+                            let (saved_state, device) = device.save().await;
+                            std::mem::forget(device);


Using std::mem::forget() prevents the device destructor from running, which could lead to resource leaks. Consider documenting why this is necessary or finding an alternative approach that properly manages the device lifetime.

Suggested change

std::mem::forget(device);
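
As background for this comment, a minimal sketch of what `std::mem::forget` does to a destructor. The `Device` type and its counter are hypothetical and only illustrate the language semantics, not the actual MANA teardown logic.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical device whose destructor would tear down hardware state.
struct Device {
    teardown_count: Rc<Cell<u32>>,
}

impl Drop for Device {
    fn drop(&mut self) {
        // Stand-in for teardown work such as unmapping interrupts.
        self.teardown_count.set(self.teardown_count.get() + 1);
    }
}

/// Returns the teardown count after forgetting one device, and then
/// after dropping a second one normally.
fn teardown_runs_after_forget_and_drop() -> (u32, u32) {
    let count = Rc::new(Cell::new(0));

    // Keepalive-style path: the destructor is skipped on purpose.
    let kept = Device { teardown_count: count.clone() };
    std::mem::forget(kept);
    let after_forget = count.get();

    // Normal path: the destructor runs when the value is dropped.
    let torn_down = Device { teardown_count: count.clone() };
    drop(torn_down);
    let after_drop = count.get();

    (after_forget, after_drop)
}
```

Everything owned by the forgotten value is leaked as well (here, one `Rc` reference), which is the resource-leak concern raised above. If skipping teardown is intentional, a comment at the call site or a `ManuallyDrop` wrapper makes that intent visible.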

mattkur · 2025-10-08T23:16:01Z

vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs

+/// Test an OpenHCL Linux direct VM with a MANA nic assigned to VTL2 (backed by
+/// the MANA emulator), and vmbus relay. Perform servicing and validate that the
+/// nic is still functional.


Have you given thought to how you will validate that servicing used keepalive, rather than falling back to the non-KA path?

In addition, please consider adding cases for going to/from downlevel.

That's a fair question, and I actually ran into a race condition during manual testing where it looked like we successfully went through the save/restore path but actually hadn't.

For a short answer, I'm not sure how to validate that we went through the codepath we think we did without parsing some logs (do we see a gdma restoration message etc.). How is NVMe validating this?

@gurasinghMS can give you some pointers. But in short, we inject faults into the nvme emulator, such that a restore would never succeed if keepalive isn't used. In addition, I'm adding checks that look at the driver state in VTL2, so that we can ascertain if it thinks it will go down the right path (this is more akin to looking at logs, as you say).

For the nvme-keepalive side, we ended up just forking the nvme-emulator into a test-only emulator where we added hooks to fault certain functions. (This was much easier than adding test hooks alongside the existing implementation.)
To test that keepalive was being used, I added tests that check for command activity during restore. In our case, the driver should never issue CREATE_IO_QUEUE commands during restore. Happy to discuss more offline too if you would like.
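
The command-activity check described here could look roughly like the sketch below. It is an assumption-laden illustration: `AdminCommand` and `restore_used_keepalive` are hypothetical names, and the real test-only emulator's hooks are not shown in this thread.

```rust
/// Hypothetical record of admin commands seen by a test emulator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdminCommand {
    Identify,
    GetFeatures,
    CreateIoQueue,
}

/// Returns true if the commands observed during restore are consistent
/// with keepalive: no queue was re-created from scratch.
fn restore_used_keepalive(commands_during_restore: &[AdminCommand]) -> bool {
    !commands_during_restore.contains(&AdminCommand::CreateIoQueue)
}
```

An equivalent MANA check would need a command that is only ever issued on the non-keepalive path. Which command that is remains an open question in this thread.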

mattkur · 2025-10-08T23:19:07Z

vm/devices/net/mana_driver/src/gdma_driver.rs

            mem: SavedMemoryState {
                base_pfn: self.dma_buffer.pfns()[0],
                len: self.dma_buffer.len(),
            },


Presumably you need some check here (or at a higher layer?) to gracefully fail the save if this memory is not from a persistent pool. (You don't have that primitive from the DMA APIs yet, ofc).

The DMA client usage is definitely very optimistic/fail-fast in this PR. I need to take another pass and think more critically about how to handle failures here or if we're going to pass that responsibility onto the DMA APIs somehow. Do you have thoughts here?

I know the DmaClient will error if all marked persisted memory isn't restored, but when saving the memory this way I don't think there's any way to check that we're not saving some non-persisted memory. Maybe DmaClient should have some method that spits out a common MemorySavedState that can be passed into persisted state, so it can double-check that it's persisted memory? Not sure.

You'd check at allocation time. See my suggested change in #2087.
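
One way to read "check at allocation time" is sketched below. This is not the change from #2087, which is not reproduced here. `DmaPool`, `SaveableBuffer`, `AllocError`, and the 4 KiB page size are all assumptions made for illustration.

```rust
/// Assumed page size for this sketch.
const PAGE_SIZE: usize = 4096;

/// Hypothetical error returned when a saveable buffer is requested
/// from a pool that will not survive servicing.
#[derive(Debug, PartialEq)]
enum AllocError {
    NotPersistent,
}

/// Hypothetical pool; `persistent` says whether its memory is preserved.
struct DmaPool {
    persistent: bool,
    next_pfn: u64,
}

/// A buffer that can only be constructed from a persistent pool, so any
/// code that saves it never has to re-check where it came from.
#[derive(Debug, PartialEq)]
struct SaveableBuffer {
    base_pfn: u64,
    len: usize,
}

impl DmaPool {
    /// Fails up front if this pool cannot back save/restore.
    fn allocate_for_save(&mut self, pages: u64) -> Result<SaveableBuffer, AllocError> {
        if !self.persistent {
            return Err(AllocError::NotPersistent);
        }
        let base_pfn = self.next_pfn;
        self.next_pfn += pages;
        Ok(SaveableBuffer { base_pfn, len: pages as usize * PAGE_SIZE })
    }
}
```

With a shape like this, the failure surfaces when the driver is created rather than in the middle of a save.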

mattkur

Nice. Few questions to get started.

mattkur · 2025-10-20T14:22:21Z

openhcl/underhill_core/src/worker.rs

-            persistent_allocations: false,
+            persistent_allocations: save_restore_supported,


This code will need to be more discerning here:

Use the DMA client with persistent allocations for the memory regions you will save/restore

Use a new DMA client (without persistent allocations) for the memory regions you will not save/restore

The private pool is a limited resource, and should only be used for memory that needs to come from it. @chris-oo FYI.

openhcl/underhill_core/src/emuplat/netvsp.rs

Brian-Perkins · 2025-10-10T23:00:43Z

openhcl/underhill_core/src/worker.rs

        Ok(params)
    }
+
+    async fn save(&mut self) -> Vec<ManaSavedState> {


This looks like shutdown_vf_devices. There is probably a better way to do this either by incorporating save into shutdown or at least both calling into a function that does most of this work

I moved the shared code into begin_vf_teardown. Let me know if that's sufficient or if you had something else in mind.

I think you could have what is currently 'begin_vf_teardown' take an async method as a parameter and either shutdown or save so you could also incorporate the common code below. That said, this is better so I would consider this a nit.

openhcl/underhill_core/src/emuplat/netvsp.rs

openhcl/underhill_core/src/worker.rs

github-actions · 2025-10-30T22:21:37Z

At least one Petri test failed.

sunilmut · 2025-10-30T22:25:49Z

openhcl/underhill_core/src/emuplat/netvsp.rs

+
+        match save_state {
+            Ok(None) => {
+                tracing::warn!("No MANA device present when saving state, returning None");


Is this statement always true? This can also happen if the save state RPC failed in device.save(), right?

Yeah there are two error conditions here - the device being gone and hwc_failure == true and a subsequent failure to save. I added an error enum here and cleaned up the error paths.

sunilmut · 2025-10-30T23:30:33Z

openhcl/underhill_core/src/worker.rs

+        let mut i = 0;
+        while i < self.nics.len() {
+            if instance_ids.contains(&self.nics[i].0) {
+                let val = self.nics.remove(i);


Rust's Vec::remove is expensive because every later element has to be shifted. Consider using retain instead, which is optimized for removing elements. So, something like:

    let mut nic_channels = Vec::new();
    self.nics.retain(|nic| {
        if instance_ids.contains(&nic.0) {
            nic_channels.push(nic.clone());
            false // Remove this element
        } else {
            true // Keep this element
        }
    });

I think you reviewed before I moved code out to the helper, but this is the same as the existing code in the shutdown path (originally duplicated, now in a shared helper). I can make this change, but this is the code that's already in use today for the shutdown path.
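
If the helper is revisited later, note that a `retain` closure only receives `&T`, so moving elements out that way requires a `clone()`. A minimal sketch of a single-pass alternative that moves the elements instead, using a hypothetical `(u32, String)` pair in place of the real NIC entry type:

```rust
/// Removes every entry whose id is in `ids` from `nics` and returns the
/// removed entries in their original order. Elements are moved, not cloned.
fn take_matching(nics: &mut Vec<(u32, String)>, ids: &[u32]) -> Vec<(u32, String)> {
    // Take ownership of the whole vector, then split it in one pass.
    let (removed, kept): (Vec<_>, Vec<_>) = std::mem::take(nics)
        .into_iter()
        .partition(|nic| ids.contains(&nic.0));
    *nics = kept;
    removed
}
```

For the handful of NICs a VM typically has, the cost difference from the `remove` loop is unlikely to matter, so this is mainly a readability choice.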

sunilmut · 2025-10-30T23:35:34Z

openhcl/underhill_core/src/worker.rs

+        }
+
+        for instance_id in instance_ids {
+            if !nic_channels.iter().any(|(id, _)| *id == instance_id) {


After the above loop, will the code ever hit this condition?

Same here, this code was already here - I'm not sure what the original motivation was for this or if it occurs, but I didn't want to remove it without knowing.

sunilmut · 2025-10-30T23:41:30Z

vm/devices/net/mana_driver/src/mana.rs

+        if let Some(hwc_task) = self.hwc_task {
+            hwc_task.cancel().await;
+        }
+        let inner = Arc::into_inner(self.inner).unwrap();


To Copilot's point, what is preventing the unwrap from panicking here?

Brian-Perkins · 2025-10-31T17:58:28Z

openhcl/underhill_core/src/worker.rs

        Ok(params)
    }
+
+    async fn save(&mut self) -> Vec<ManaSavedState> {


I think you could have what is currently 'begin_vf_teardown' take an async method as a parameter and either shutdown or save so you could also incorporate the common code below. That said, this is better so I would consider this a nit.

Brian-Perkins · 2025-10-31T18:06:50Z

openhcl/underhill_core/src/worker.rs

        let _span = tracing::info_span!("network_settings", CVM_ALLOWED).entered();
        for nic_config in controllers.mana.into_iter() {
+            let nic_servicing_state = if let Some(ref state) = servicing_state.mana_state {
+                state.iter().find(|s| s.pci_id == nic_config.pci_id)


Using the pci id will probably be fine, but it is just a tiny part of the full instance id, which may be better.

github-actions · 2025-10-31T18:27:38Z

At least one Petri test failed.

sunilmut · 2025-10-31T19:12:26Z

vm/devices/net/mana_driver/src/mana.rs

+            let gdma_memory = memory
+                .iter()
+                .find(|m| m.pfns()[0] == mana_state.gdma.mem.base_pfn)
+                .expect("gdma restored memory not found")


I think attach_pending_buffers and GdmaDriver::restore need to be called under different conditions. attach_pending_buffers should always be called if there is a MANA device saved state, while GdmaDriver::restore should only be called when the MANA keepalive feature is enabled and there is a MANA saved state. This allows downgrading from a saved state that contains MANA keepalive device state to a version that doesn't support MANA keepalive (or where the feature is turned off).
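
The two conditions in this comment can be written out as a small decision table. This is only a sketch: `RestoreAction` and `plan_restore` are hypothetical names, and what the driver does after attaching buffers on the downgrade path is not specified in this thread.

```rust
/// Hypothetical summary of which restore steps should run.
#[derive(Debug, PartialEq)]
enum RestoreAction {
    /// No saved state: neither call applies.
    Nothing,
    /// Saved state present, keepalive off: only attach_pending_buffers.
    AttachOnly,
    /// Saved state present, keepalive on: attach_pending_buffers and
    /// GdmaDriver::restore.
    AttachAndRestore,
}

/// Maps the two inputs from the review comment to the steps to run.
fn plan_restore(has_saved_state: bool, keepalive_enabled: bool) -> RestoreAction {
    match (has_saved_state, keepalive_enabled) {
        (false, _) => RestoreAction::Nothing,
        (true, false) => RestoreAction::AttachOnly,
        (true, true) => RestoreAction::AttachAndRestore,
    }
}
```

The `(true, false)` row is the downgrade case: the saved state is acknowledged so the preserved memory is accounted for, but the device state itself is not restored.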

justus-camp-microsoft and others added 26 commits April 23, 2025 15:41

cherry-pick gdma changes

eb06840

some cleanup

e9e293f

unused import

1d5479d

get rid of crate

93dfba9

some of feedback

eea14e1

remove arm state, re-arm on restore

d6fc130

save on hwc failure, save hwc failure state

fe7e8bd

move some duplicated code to init function

a48267e

Merge branch 'main' into gdma

1950a50

Merge branch 'main' into gdma

06ac5b2

retarget always

87ec2a3

unmap interrupts on drop

980f5b0

PR feedback

054c767

unmap all interrupts at once

1baec11

remove eq_id_msix saving and reconstruct, other minor feedback

038c2da

enable keepalive

e6fcd99

triple fault to fix, only calling save and not restoring

d799d6b

calling save but still destroying everything, triple faults sometimes…

0b4c25b

… but might be storage?

passing test, enabled by default currently

4dde9aa

default off

3177ddc

run format, make RPC return an option in case device disappears, remo…

c403fc5

…ve unnecessary results

some logging

8e30a76

merge main

6646ca2

cleanup from self-review

b3a95cc

don't have mana keepalive on by default in openvmm RPC

60008ad

fix some ordering issues, move some tests around

6b4351e

justus-camp-microsoft requested a review from a team as a code owner October 8, 2025 20:36

Copilot AI review requested due to automatic review settings October 8, 2025 20:36

Copilot AI reviewed Oct 8, 2025

View reviewed changes

Merge branch 'main' into full_enablement

c53567f

mattkur reviewed Oct 8, 2025

View reviewed changes

add a comment, put some duplicated code in a helper method

46ac820

justus-camp-microsoft requested review from Brian-Perkins and sunilmut October 10, 2025 21:53

mattkur reviewed Oct 20, 2025

View reviewed changes

justus-camp-microsoft added 3 commits October 21, 2025 15:09

Merge remote-tracking branch 'upstream/main' into full_enablement

cdad413

add an upgrade test

b5a4c16

Merge branch 'main' into full_enablement

c7925a8

justus-camp-microsoft mentioned this pull request Oct 28, 2025

dma_client: split dma_client into ephemeral and persistent clients #2296

Open

Brian-Perkins reviewed Oct 30, 2025

View reviewed changes

justus-camp-microsoft added 5 commits October 30, 2025 12:52

logging changes

de661a1

bail when keepalive not supported

ba67c7b

split into helper

d9b0b21

Merge branch 'main' into full_enablement

0058602

self-review

3c2648c

sunilmut reviewed Oct 30, 2025

View reviewed changes

justus-camp-microsoft added 2 commits October 30, 2025 17:37

clean up error paths

64d766e

large_enum_variant

1150c18

Brian-Perkins reviewed Oct 31, 2025

View reviewed changes

sunilmut reviewed Oct 31, 2025

View reviewed changes
