Fix Cuttlefish multi-instance restart deadlocks, socket hangs, and UWB/bootconfig configuration mismatches by SuperStrongDinosaur · Pull Request #2581 · google/android-cuttlefish

SuperStrongDinosaur · 2026-05-19T08:55:30Z

Description
This change addresses a set of critical deadlocks, resource-leak hangs, and configuration-mapping bugs encountered in multi-instance CVD deployments, specifically during a cvd restart.

With these synchronization issues resolved, a single instance within a multi-instance group can be restarted independently without hanging or crashing other running instances and without deadlocking host-side shared daemons.

Fixes

Prevent FIFO Unlinking Deadlocks (SharedFD::Fifo & DeleteFifos)

Issue: Previously, SharedFD::Fifo always deleted the path before calling mkfifo(). In a multi-instance setup, global host-side daemons are started once and hold open connections to the VM instances. Unlinking these paths on a single-instance restart removes the directory entry while the daemons keep the old inode open, so the restarted instance's crosvm creates new FIFOs that the active netsimd daemon has no knowledge of. This caused crosvm to hang indefinitely on startup, waiting for a reader/writer connection that would never come.

Solution: Modified SharedFD::Fifo to perform a stat() check on the target path first. If the file already exists and is verified to be a FIFO, it is opened directly instead of unlinking and recreating it. Removed bt_fifo_vm, nfc_fifo_vm, and uwb_fifo_vm from the unlinking sequence in ServerLoopImpl::DeleteFifos() to ensure their persistent paths are preserved across instance-specific lifecycles.
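The stat-then-reuse idea can be sketched with plain POSIX calls. This is a minimal illustration, not the actual SharedFD::Fifo code: OpenOrCreateFifo and InodeOf are hypothetical names, and two behaviors are assumptions of the sketch rather than facts from the change: a path that exists but is not a FIFO is unlinked and recreated, and the FIFO is opened O_RDWR so that the open does not block waiting for a peer (Linux behavior).

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper (not the real SharedFD::Fifo). Returns an open fd for a
// FIFO at `path`, or -1 with errno set. An existing FIFO is reused as-is so
// that peers already holding it open stay attached to the same inode.
int OpenOrCreateFifo(const std::string& path, mode_t mode) {
  assert(!path.empty());
  struct stat st;
  const bool exists = stat(path.c_str(), &st) == 0;
  if (!exists && errno != ENOENT) return -1;  // stat failed for another reason

  if (!(exists && S_ISFIFO(st.st_mode))) {
    // Assumption of this sketch: anything that is not a FIFO is replaced.
    if (exists && unlink(path.c_str()) != 0) return -1;
    if (mkfifo(path.c_str(), mode) != 0) return -1;
  }
  // Assumption: O_RDWR, which on Linux opens a FIFO without blocking.
  return open(path.c_str(), O_RDWR | O_CLOEXEC);
}

// Returns the inode number behind `path`, or 0 if stat() fails.
ino_t InodeOf(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 ? st.st_ino : 0;
}
```

Calling OpenOrCreateFifo twice on the same path should leave InodeOf(path) unchanged, which is the property a long-lived daemon on the other end depends on.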

Ensure Proper FIFO Provisioning for Shared UWB Services (Files affected: uwb_connector.cpp)

Issue: The UWB host-connector logic only created the uwb_fifo_vm FIFOs if instance.enable_host_uwb_connector() evaluated to true. If config.enable_host_uwb() was true but the specific instance did not run the connector, the missing FIFOs would crash host-side daemons or break crosvm initialization.

Solution: Decoupled the FIFO generation from the launcher command check. The FIFOs are now always initialized if global UWB is enabled, while the local host-connector service itself is only spawned if instance.enable_host_uwb_connector() is true.
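The decoupling amounts to two independent conditions. The sketch below models only that decision and is not the code in uwb_connector.cpp: UwbPlan and PlanUwb are hypothetical names, the two booleans stand in for config.enable_host_uwb() and instance.enable_host_uwb_connector(), and the ".in"/".out" suffixes on the FIFO paths are an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: which FIFOs to create and whether to launch the
// per-instance host connector.
struct UwbPlan {
  std::vector<std::string> fifos_to_create;
  bool spawn_connector = false;
};

// Hypothetical planner. FIFO creation depends only on the global UWB flag;
// spawning the connector additionally requires the per-instance flag.
UwbPlan PlanUwb(bool host_uwb_enabled, bool instance_connector_enabled,
                const std::string& instance_dir) {
  assert(!instance_dir.empty());
  UwbPlan plan;
  if (!host_uwb_enabled) return plan;  // UWB off: no FIFOs, no connector
  // Assumed naming: one FIFO per direction.
  plan.fifos_to_create = {instance_dir + "/uwb_fifo_vm.in",
                          instance_dir + "/uwb_fifo_vm.out"};
  plan.spawn_connector = instance_connector_enabled;
  return plan;
}
```

The case described in the issue is PlanUwb(true, false, dir): the FIFOs are still provisioned even though no connector is launched for that instance.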

Fix ProcessMonitor Shutdown Socket Hangs (Files affected: process_monitor.cc, process_monitor.h)

Issue: During a shutdown or restart event, ProcessMonitor could hang waiting on socket reads in ReadMonitorSocketLoop because the control channel remained blocked. If the socket connection returned an error or was half-closed, the loop could fail or crash rather than exiting cleanly.

Solution: Retained a handle to the child socket (child_sock_) within the ProcessMonitor class. Added an explicit child_sock_->Shutdown(SHUT_RDWR) call at the end of the MonitorRoutine execution to unblock any pending reads on the socket during shutdown. Hardened ReadMonitorSocketLoop to ignore read errors gracefully if the monitor has already been marked as shutting down (!running.load()).
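The unblocking mechanism can be shown with a raw socketpair(): a reader blocked in read() returns once shutdown(SHUT_RDWR) is called on its end, even though the peer end is still open. This is a minimal sketch, not the ProcessMonitor code: ReadLoop and ShutdownUnblocksReader are hypothetical names, and the loop discards data where a real monitor would dispatch messages.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>

// Hypothetical stand-in for a monitor read loop. Returns true on a clean
// exit (EOF, or a read error seen while shutting down), false otherwise.
bool ReadLoop(int fd, const std::atomic<bool>& running) {
  assert(fd >= 0);
  char buf[64];
  while (true) {
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) continue;          // a real loop would handle the message here
    if (n == 0) return true;      // EOF: peer closed or socket was shut down
    if (errno == EINTR) continue;
    return !running.load();       // errors are tolerated only during shutdown
  }
}

// Starts a reader on one end of a socketpair, then shuts that end down.
// The peer end stays open, so without shutdown() the reader would block.
bool ShutdownUnblocksReader() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
  std::atomic<bool> running{true};
  bool clean = false;
  std::thread reader([&] { clean = ReadLoop(sv[0], running); });
  running.store(false);           // mark shutdown before waking the reader
  shutdown(sv[0], SHUT_RDWR);     // wakes a blocked read(), which returns 0
  reader.join();                  // would hang here if the reader stayed blocked
  close(sv[0]);
  close(sv[1]);
  return clean;
}
```

Clearing the running flag before the shutdown() call means that any error the reader observes afterwards is classified as part of an orderly shutdown rather than a failure.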

Testing and Verification
The stability of these fixes was successfully validated under multi-instance conditions:

Multi-Instance Booting

Booted two parallel VM instances using the locally compiled binaries:
cvd create --config=sdv_core_instance1 --num_instances=2

Both instances initialized, completed guest execution, and reached the VIRTUAL_DEVICE_BOOT_COMPLETED signal cleanly.

Single-Instance Restarting (No Deadlocks)

Executed a single-instance restart command targeting instance 2:
cvd --group_name=cvd_1 --instance_name=2 restart

Instance 2 successfully halted its virtual processes, safely terminated crosvm and its children, preserved the shared Bluetooth/UWB/NFC FIFOs, and booted back to VIRTUAL_DEVICE_BOOT_COMPLETED.

Instance 1 remained completely undisturbed and functional during the entire restart cycle.

Verified using cvd status --print that both instances returned to a healthy Running state.

b/510634395


SuperStrongDinosaur marked this pull request as ready for review May 19, 2026 09:56

SuperStrongDinosaur requested review from Databean and jemoreira May 19, 2026 09:57

SuperStrongDinosaur added bug kokoro:run Run e2e tests. labels May 19, 2026

GoogleCuttlefishTesterBot removed the kokoro:run Run e2e tests. label May 19, 2026

SuperStrongDinosaur changed the title ~~Fix Cuttlefish multi-instance restart deadlocks, socket hangs, and UW…~~ Fix Cuttlefish multi-instance restart deadlocks, socket hangs, and UWB/bootconfig configuration mismatches May 19, 2026

Databean approved these changes May 19, 2026


Comment thread base/cvd/cuttlefish/common/libs/fs/shared_fd.cpp Outdated

jemoreira reviewed May 19, 2026


Comment thread base/cvd/cuttlefish/common/libs/fs/shared_fd.cpp Outdated

SuperStrongDinosaur requested review from Databean and jemoreira May 20, 2026 12:07

SuperStrongDinosaur force-pushed the restartHangFix branch from 2448508 to d246800 Compare May 21, 2026 13:36

Databean approved these changes May 23, 2026


jemoreira approved these changes May 26, 2026


SuperStrongDinosaur added the kokoro:run Run e2e tests. label May 27, 2026

GoogleCuttlefishTesterBot removed the kokoro:run Run e2e tests. label May 27, 2026

SuperStrongDinosaur enabled auto-merge May 27, 2026 09:03

3405691582 added the kokoro:force-run Trigger a presubmit build unconditionally. label May 27, 2026

GoogleCuttlefishTesterBot removed the kokoro:force-run Trigger a presubmit build unconditionally. label May 27, 2026

SuperStrongDinosaur added the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

GoogleCuttlefishTesterBot removed the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

SuperStrongDinosaur added the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

SuperStrongDinosaur added 3 commits May 28, 2026 11:57

Fix Cuttlefish multi-instance restart deadlocks, socket hangs, and UW…

e9c8c3a

…B/bootconfig configuration mismatches

Reading while there is data to read in FIFO

bd14a99

Centralize FIFO reuse and draining logic, migrate IPC connectors

7102151

GoogleCuttlefishTesterBot removed the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

SuperStrongDinosaur force-pushed the restartHangFix branch from d246800 to 7102151 Compare May 28, 2026 09:57

SuperStrongDinosaur added the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

GoogleCuttlefishTesterBot removed the kokoro:force-run Trigger a presubmit build unconditionally. label May 28, 2026

SuperStrongDinosaur added this pull request to the merge queue May 28, 2026

Merged via the queue into google:main with commit b8d9225 May 28, 2026
32 checks passed

SuperStrongDinosaur deleted the restartHangFix branch May 28, 2026 12:32

SuperStrongDinosaur merged 3 commits into google:main from SuperStrongDinosaur:restartHangFix
