Skip to content

Conversation

@mattkur
Copy link
Contributor

@mattkur mattkur commented Nov 12, 2025

This PR enables NVMe Keepalive across servicing operations by default (the per-servicing flag
is the final control of whether the actual servicing uses the feature). This gives slightly
more coverage in our CI.

In addition, ensure that the "many devices" test has enough resources to allocate from private
pool, so that keepalive will be used.

  • nvme_driver: format nvme errors to be more verbose
  • openvmm_entry: cmdline to enable nvme keepalive when running servicing
  • wip: turn on keepalive for the many devices servicing test
  • minor self / copilot pr feedback
  • turn on keepalive by default

Copilot AI review requested due to automatic review settings November 12, 2025 18:29
@mattkur mattkur requested review from a team as code owners November 12, 2025 18:29
@mattkur mattkur requested a review from a team November 12, 2025 18:29
Copilot finished reviewing on behalf of mattkur November 12, 2025 18:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables NVMe Keepalive by default in OpenHCL servicing operations to provide more test coverage in CI. The main changes include updating the default value of enable_nvme_keepalive from false to true in OpenHclServicingFlags, adjusting test resource configurations to support keepalive functionality, and improving NVMe error message formatting.

  • Changed OpenHclServicingFlags::enable_nvme_keepalive to default to true instead of requiring explicit opt-in
  • Enhanced the "many NVMe devices" test with increased memory, VPs, and private pool allocation to support keepalive testing
  • Improved NVMe error formatting to provide more detailed status code type and status code information

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
petri/src/vm/mod.rs Implements custom Default trait for OpenHclServicingFlags with enable_nvme_keepalive: true as the new default value
vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs Updates servicing tests to use the new default flags, simplifying test code by removing explicit enable_nvme_keepalive: true settings
vmm_tests/vmm_tests/tests/tests/x86_64/openhcl_linux_direct.rs Renames test to many_nvme_devices_servicing_heavy and configures it with 8GB memory, 4 VPs, and 64MB private pool to support keepalive functionality
vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs Enhances NVMe error display to show detailed status code types (SCT) and status codes (SC) for better debugging, covering all defined status code types
openhcl/underhill_core/src/dispatch/mod.rs Adds vf_keepalive_flag to the tracing span for better observability during NVMe manager save operations

smalis-msft
smalis-msft previously approved these changes Nov 12, 2025
@github-actions
Copy link

…that decides what is default and supported (or not)
Comment on lines +157 to +158
/// Get the default servicing flags (based on what this backend supports)
fn default_servicing_flags() -> OpenHclServicingFlags;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjones60 does this look like the right way to get some per-backend defaults?

Comment on lines +117 to +121
OpenHclServicingFlags {
enable_nvme_keepalive: true,
override_version_checks: false,
stop_timeout_hint_secs: None,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

(firmware.quirks().hyperv, VmmQuirks::default())
}

fn default_servicing_flags() -> OpenHclServicingFlags {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much work is needed for this support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I remember adding this check, but I don't quite know the delta. I will certainly be digging into this soon-ish.

@mattkur mattkur merged commit c24ff90 into microsoft:main Nov 13, 2025
56 checks passed
@mattkur mattkur deleted the more-ka-test branch November 13, 2025 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants