Skip to content

[vsock] wait for device readiness#1112

Merged
papertigers merged 7 commits intomasterfrom
spr/papertigers/vsock-wait-for-device-readiness
Apr 13, 2026
Merged

[vsock] wait for device readiness#1112
papertigers merged 7 commits intomasterfrom
spr/papertigers/vsock-wait-for-device-readiness

Conversation

@papertigers
Copy link
Copy Markdown
Contributor

@papertigers papertigers commented Apr 13, 2026

Fixes #1110

Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
@papertigers papertigers marked this pull request as ready for review April 13, 2026 18:01
Copy link
Copy Markdown
Member

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice fix, this is an unfortunately subtle interaction and unfortunately hard to test...

// a new instance spec to the pre-existing propolis-server we will
// accumulate these "dead" threads, howerver we never do this in the
// actual product.
self.start_barrier.wait();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remarking for onlookers: we'd talked about setting this up for a Tokio watch or something where we can discover that the sender went away, or that the sender actually wanted to tear down this loop without ever starting it, but @papertigers wants to refactor some of this now that we've seen how this ends up actually used anyway.

so the structure of the event loop here even might not survive too long, and the interim of accumulating an idle thread when propolis-server is used in ways we don't use it seems fine for now.

Created using jj-spr 0.1.0
Created using jj-spr 0.1.0
@papertigers
Copy link
Copy Markdown
Contributor Author

Note that while testing this with @iximeow and @leftwo we ran into #1115

We were able to confirm this fix by duplicating line 607 at line 610

// Send synchronous start commands to all devices.
for (name, dev) in objects.device_map() {
info!(self.log, "sending start request to {}", name);
let res = dev.start();
if let Err(e) = res {
error!(
self.log, "device start() returned an error";
"device" => %name,
"error" => %e
);
return VmStartOutcome::Failed;
}
}

Before the change in 4a309f3 we crashed, and after 4a309f3 was applied we no longer crashed.

@papertigers papertigers merged commit bc489dd into master Apr 13, 2026
12 checks passed
@papertigers papertigers deleted the spr/papertigers/vsock-wait-for-device-readiness branch April 13, 2026 20:20
papertigers added a commit to oxidecomputer/omicron that referenced this pull request Apr 13, 2026
This bumps propolis to commit bc489dd.

This brings in the following:
- oxidecomputer/propolis#1112 **(the reason for
this PR)**
- oxidecomputer/propolis#1109
- oxidecomputer/propolis#1084
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

thread 'vsock-event-loop' panicked in some cases during vmm termination

2 participants