
nvme_driver: don't lazily restore interrupts#3220

Merged
mattkur merged 2 commits into microsoft:main from mattkur:eager-restore on Apr 8, 2026

Conversation


mattkur (Contributor) commented Apr 7, 2026

Works around a device-side issue: some devices cannot handle the case where interrupts are reprogrammed out of IV order.

Copilot AI review requested due to automatic review settings April 7, 2026 23:51
@mattkur mattkur requested review from a team as code owners April 7, 2026 23:51

Copilot AI left a comment


Pull request overview

This PR updates NVMe servicing restore behavior to avoid “lazy” interrupt restoration by eagerly restoring I/O queues in a deterministic interrupt-vector order, working around device/VPci interrupt allocation ordering issues.

Changes:

  • Change I/O queue restore strategy to (by default) restore all queues sorted by MSI-X interrupt vector.
  • Add a saved-state flag (allow_lazy_restore) to optionally fall back to the prior eager+proto (lazy) restore split.
  • Persist allow_lazy_restore: Some(false) on save to force eager restore for now.

Comment on lines +1008 to +1014
    .filter(|q| {
        q.queue_data.qid == 1
            || !q.queue_data.handler_data.pending_cmds.commands.is_empty()
    })
    .flat_map(|q| -> Result<IoQueue<D>, anyhow::Error> {
        let qid = q.queue_data.qid;
        let cpu = q.cpu;

Copilot AI Apr 7, 2026


Iterator::flat_map with a closure returning Result<_, _> will silently drop Err values (because Result implements IntoIterator). That means queue restore failures (e.g. interrupt mapping / IoQueue::restore) can be ignored and the restore may succeed with missing queues. Use a fallible collection pattern (e.g., map + collect::<Result<Vec<_>, _>>()?, or try_fold) so errors propagate.

mattkur (Author)

Pre-existing pattern — this PR didn't change the collection strategy. Worth fixing in a follow-up.

mattkur (Author)

Filed #3229 to track this and the other two pre-existing error handling items together.

Comment on lines +1112 to +1116
worker.io = sorted_io
    .into_iter()
    .flat_map(|q| -> Result<IoQueue<D>, anyhow::Error> {
        let qid = q.queue_data.qid;
        let cpu = q.cpu;

Copilot AI Apr 7, 2026


Same issue as above: this flat_map over Result will drop restore errors instead of propagating them, potentially leaving the driver partially restored without failing. Switch to a Result-propagating collection (collect::<Result<...>>()? / try_fold) so any queue restore error aborts restore.

mattkur (Author)

Same — pre-existing in the eager path.

Comment on lines +1133 to +1134
.expect("unable to find restored mem block")
.to_owned();

Copilot AI Apr 7, 2026


Restore uses expect("unable to find restored mem block"), which will panic if the saved state doesn't match the restored DMA allocations (corrupt/partial state, version skew, etc.). Prefer returning a structured restore error with context instead of panicking.

Suggested change

-     .expect("unable to find restored mem block")
-     .to_owned();
+     .cloned()
+     .ok_or_else(|| {
+         anyhow::anyhow!(
+             "unable to find restored mem block for {}, qid {}, cpu {}, mem_len {}, base_pfn {}",
+             pci_id,
+             qid,
+             cpu,
+             q.queue_data.mem_len,
+             q.queue_data.base_pfn
+         )
+     })?;

mattkur (Author)

Pre-existing — the original code had the same expect. Agree the suggestion is better, but out of scope for this PR.

issuer: q.queue.issuer().clone(),
cpu: q.cpu,
};
this.io_issuers.per_cpu[q.cpu as usize].set(issuer).unwrap();

Copilot AI Apr 7, 2026


q.cpu from saved state is used as an index (per_cpu[q.cpu as usize]) and .set(...).unwrap() assumes it is both in-range and not already set. With the new eager-restore-all-queues path, any mismatch between saved CPUs and cpu_count (or duplicate CPUs) will panic during restore. Validate q.cpu < cpu_count and handle duplicate/previously-set issuers by returning an error instead of panicking.

Suggested change

- this.io_issuers.per_cpu[q.cpu as usize].set(issuer).unwrap();
+ let issuer_slot = this
+     .io_issuers
+     .per_cpu
+     .get(q.cpu as usize)
+     .ok_or_else(|| {
+         anyhow::anyhow!(
+             "saved queue {} has invalid cpu {} (cpu_count {})",
+             qid,
+             q.cpu,
+             this.io_issuers.per_cpu.len()
+         )
+     })?;
+ issuer_slot.set(issuer).map_err(|_| {
+     anyhow::anyhow!(
+         "duplicate io issuer restore for cpu {} on queue {}",
+         q.cpu,
+         qid
+     )
+ })?;

mattkur (Author)

Pre-existing — identical .unwrap() existed in the original eager restore path. Saved state CPU values always come from our own prior save. Agree defensive validation would be better, but out of scope.

io,
qsize: worker_state.qsize,
max_io_queues: worker_state.max_io_queues,
allow_lazy_restore: Some(false), // For now, we always restore eagerly to work around device bugs.

Copilot AI Apr 7, 2026


allow_lazy_restore is always saved as Some(false), and there is no code path that can persist Some(true). That makes the lazy-restore branch effectively dead and likely to rot. Consider either (a) removing the flag/branch for now, or (b) plumbing a real knob (config/feature/env) and adding coverage so the lazy path stays tested.

Suggested change

- allow_lazy_restore: Some(false), // For now, we always restore eagerly to work around device bugs.
+ // For now, we always restore eagerly to work around device bugs.
+ // Don't persist an explicit `false` value for a mode that has no way
+ // to be enabled by current save paths; `None` preserves the default.
+ allow_lazy_restore: None,

mattkur (Author)

Intentional. The lazy path is preserved so the restore side can opt back into it once the host-side IV ordering bug is fixed. Some(false) is explicit — it signals "we chose eager" rather than "this field wasn't set."

Comment on lines +1098 to +1111
// Eager restore path: restore ALL queues sorted by interrupt
// vector for ordered VPci allocation (MSI-X ordering workaround).
//
// Devnote: Safety of inline new_self_drained(): This loop is fully
// synchronous (no .await). Although IoQueue::restore() spawns
// queue handler tasks, they don't poll until the async runtime
// yields — which happens only after .collect() completes. So all
// new_self_drained() and new_draining() calls finish before any
// handler can fire the drain-complete signal. If this loop is ever
// refactored to be async, the waiters for empty queues must be
// pre-created (as done in the lazy path above).
let mut sorted_io: Vec<_> = saved_state.worker_data.io.iter().collect();
sorted_io.sort_by_key(|q| q.iv);


Copilot AI Apr 7, 2026


The new restore behavior (restoring all IO queues sorted by interrupt vector to enforce VPci CreateInterruptMessage ordering) is not covered by tests in this crate. Add a unit/integration test that exercises servicing save/restore and asserts the interrupt mapping/allocation order or at least that all queues are restored eagerly (no proto queues) under the default setting.

Copilot generated this review using guidance from repository custom instructions.
mattkur (Author)

Agree. Blocked by test infrastructure gaps tracked in #489 and #3219.

#[mesh(4)]
pub max_io_queues: u16,
/// Whether to allow lazy restore of IO queues that had no pending commands at the time of save.
#[mesh(5)]
Contributor


nit: Do we need to save this if it is always false? We can just initialize the variable to false every time

mattkur (Author)

You're right: since a missing bool defaults to false, saving an explicit false is redundant. I just wanted the tristate to be explicit: we know lazy restore is unsafe (Some(false)), we know it's safe (Some(true)), or we don't know (None, assumed unsafe).
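That tristate collapses to a single decode rule, sketched here (hypothetical helper name, not the crate's API): only an explicit `Some(true)` enables lazy restore; both `Some(false)` and a missing value mean eager.

```rust
// Sketch only: Option<bool> as a tristate. Some(false) = explicitly
// eager, Some(true) = lazy known safe, None (e.g. state saved by an
// older version) = unknown, so assume unsafe and restore eagerly.
fn lazy_restore_allowed(flag: Option<bool>) -> bool {
    flag.unwrap_or(false)
}

fn main() {
    assert!(!lazy_restore_allowed(Some(false))); // explicit eager choice
    assert!(lazy_restore_allowed(Some(true)));   // explicit lazy opt-in
    assert!(!lazy_restore_allowed(None));        // unknown: assume unsafe
}
```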

@mattkur mattkur merged commit 958b8a4 into microsoft:main Apr 8, 2026
61 checks passed
@mattkur mattkur deleted the eager-restore branch April 8, 2026 23:19
mattkur added a commit to mattkur/openvmm that referenced this pull request Apr 9, 2026
Works around a device-side issue: some devices cannot handle the case
where interrupts are reprogrammed out of IV order.

(cherry picked from commit 958b8a4)
moor-coding pushed a commit to moor-coding/openvmm that referenced this pull request Apr 13, 2026
Works around a device-side issue: some devices cannot handle the case
where interrupts are reprogrammed out of IV order.


4 participants