
Harden ARM64 AHCI exec reads against signal interruption #405

Merged
ryanbreen merged 1 commit into main from fix/arm64-ahci-ext2-concurrent-exec
Jun 1, 2026

@ryanbreen
Owner

Summary

  • root-causes the relaxed ARM64 init failure to signal-interrupted AHCI completion waits during exec reads
  • adds an explicit uninterruptible Completion wait for already-issued hardware commands and uses it for AHCI slot-0 waits
  • removes the F19 init serialization workaround: bsshd starts before the service wave, and the per-service/bounce yield+nanosleep delays are gone
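The difference between an interruptible and an uninterruptible completion wait can be sketched in user-space Rust. This is only an illustration, not the kernel code from this PR: the `Completion` type here is a hypothetical stand-in built on `std::sync`, the `signal_pending` flag models a queued SIGCHLD, and the method names `wait_interruptible` and `wait_uninterruptible` are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a kernel completion object (not the PR's type).
pub struct Completion {
    done: Mutex<bool>,
    cv: Condvar,
}

#[derive(Debug, PartialEq)]
pub enum WaitError {
    Interrupted,
}

impl Completion {
    pub fn new() -> Self {
        Completion { done: Mutex::new(false), cv: Condvar::new() }
    }

    /// Called by the (simulated) interrupt handler when the command finishes.
    pub fn complete(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Interruptible wait: gives up with `Interrupted` when a signal is
    /// pending, even though the hardware command may still be in flight.
    pub fn wait_interruptible(&self, signal_pending: &AtomicBool) -> Result<(), WaitError> {
        let mut done = self.done.lock().unwrap();
        while !*done {
            if signal_pending.load(Ordering::SeqCst) {
                return Err(WaitError::Interrupted);
            }
            let (guard, _) = self.cv.wait_timeout(done, Duration::from_millis(1)).unwrap();
            done = guard;
        }
        Ok(())
    }

    /// Uninterruptible wait: never looks at pending signals and returns only
    /// after `complete()` has run.
    pub fn wait_uninterruptible(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Runs both waits with a signal already pending and reports
/// (result of the interruptible wait, whether the second command finished).
pub fn demo() -> (Result<(), WaitError>, bool) {
    let signal_pending = AtomicBool::new(true); // models SIGCHLD already queued

    // Interruptible wait on a command that has not completed yet.
    let first = Completion::new();
    let interrupted = first.wait_interruptible(&signal_pending);

    // Uninterruptible wait: a helper thread plays the completion interrupt.
    let second = Arc::new(Completion::new());
    let hw = Arc::clone(&second);
    let irq = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        hw.complete();
    });
    second.wait_uninterruptible();
    irq.join().unwrap();

    let finished = *second.done.lock().unwrap();
    (interrupted, finished)
}
```

Calling `demo()` in this model yields `Err(WaitError::Interrupted)` for the interruptible wait and `true` for the uninterruptible one. The caller of the second wait can safely reuse the slot because the command is known to be finished.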

Evidence

  • Reproduction before the fix, with init serialization removed: exec of /bin/bwm failed with "AHCI: interrupted" immediately after xhci_counters exited and SIGCHLD arrived.
    • turn336-artifacts/relaxed-prefix.serial.log lines 425-428
  • Fixed fresh Parallels ARM64 boot: bsshd, heartbeat, bwm, telnetd, and bounce all start under relaxed ordering.
    • turn336-artifacts/fixed-relaxed.serial.log lines 381-487
  • Live fixed no-build replay reached ~79s uptime with sustained heartbeat and 82 bwm FPS samples in range 114-191.
    • turn336-artifacts/fixed-relaxed-live.serial.log
    • turn336-artifacts/fixed-relaxed-live.screenshot.png shows Bounce on the BWM desktop
  • Clean build check: ./run.sh --parallels --test 70 produced no compile-stage warning/error lines via grep -E "^(warning|error)" turn336-artifacts/fixed-relaxed-run.log.
  • Negative checks: no AHCI: interrupted, AHCI timeout, panic, lockup, SCHED_RESCUE, or VIRTGPU_FAIL in fixed serial logs.

Notes

  • cargo fmt is blocked by pre-existing trailing whitespace in tests/shared_qemu.rs; the touched files were formatted directly with rustfmt.

Fixes breenix-4yu

Root cause under relaxed ARM64 init ordering was a submitted AHCI read returning EINTR when SIGCHLD arrived during exec. The driver then cleared port ownership while slot 0 could still be in flight, making the next exec read race the old command/DMA lifecycle.

Add an explicit uninterruptible completion wait for already-issued hardware commands and use it for AHCI slot-0 waits. Relax the ARM64 init serialization workaround so bsshd starts before the boot service wave and the per-service sleeps are gone.

Co-authored-by: Ryan Breen <ryan@ryanbreen.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
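The lifecycle race described in the commit message can be illustrated with a toy state model. Everything here is hypothetical: the `Port` struct, its fields, and the assumption that issuing a command checks port ownership only are illustrative simplifications, not the driver's actual data structures.

```rust
/// Toy model of one AHCI port with a single command slot (hypothetical).
#[derive(Default)]
pub struct Port {
    owner: Option<u32>,    // task currently holding the port
    slot0_in_flight: bool, // hardware still owns slot 0 and its DMA buffer
}

impl Port {
    /// Issue a command on slot 0. This model assumes only port ownership is
    /// checked. Returns true if the new command overlapped one still in flight.
    pub fn issue(&mut self, task: u32) -> bool {
        assert!(self.owner.is_none(), "port already owned");
        let overlapped = self.slot0_in_flight;
        self.owner = Some(task);
        self.slot0_in_flight = true;
        overlapped
    }

    /// The hardware finished the command in slot 0.
    pub fn hardware_complete(&mut self) {
        self.slot0_in_flight = false;
    }

    /// The waiting task gives up port ownership.
    pub fn release(&mut self) {
        self.owner = None;
    }
}

/// Pre-fix sequence: the wait returns EINTR and ownership is cleared early.
pub fn overlap_when_wait_is_interrupted() -> bool {
    let mut port = Port::default();
    port.issue(1);  // first exec read submitted
    port.release(); // EINTR path: ownership cleared, slot 0 still busy
    port.issue(2)   // next exec read lands on the in-flight slot
}

/// Post-fix sequence: the wait only returns after the hardware completes.
pub fn overlap_when_wait_is_uninterruptible() -> bool {
    let mut port = Port::default();
    port.issue(1);
    port.hardware_complete(); // uninterruptible wait ends here
    port.release();
    port.issue(2)
}
```

In this model the interrupted sequence reports an overlap and the uninterruptible sequence does not, which is the invariant the fix restores: ownership is released only after slot 0 is idle.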
@ryanbreen ryanbreen merged commit dda080d into main Jun 1, 2026
@ryanbreen ryanbreen deleted the fix/arm64-ahci-ext2-concurrent-exec branch June 1, 2026 11:48