-
Notifications
You must be signed in to change notification settings - Fork 63
Fix SMP=4 boot hang regression #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="main.c">
<violation number="1" location="main.c:1217">
`peripheral_update_ctr` never exceeds 64 because it is a countdown reset to 64 each tick, so this `> BOOT_SETTLE_ITERATIONS` check never succeeds and `boot_complete` stays false. That keeps the timer and UART fds permanently active, undoing the post-boot idle optimization this block is supposed to preserve.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
| const uint64_t BOOT_SETTLE_ITERATIONS = 5000; | ||
| bool boot_complete = | ||
| all_harts_started && | ||
| (emu->peripheral_update_ctr > BOOT_SETTLE_ITERATIONS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peripheral_update_ctr never exceeds 64 because it is a countdown reset to 64 each tick, so this > BOOT_SETTLE_ITERATIONS check never succeeds and boot_complete stays false. That keeps the timer and UART fds permanently active, undoing the post-boot idle optimization this block is supposed to preserve.
Prompt for AI agents
Address the following comment on main.c at line 1217:
<comment>`peripheral_update_ctr` never exceeds 64 because it is a countdown reset to 64 each tick, so this `> BOOT_SETTLE_ITERATIONS` check never succeeds and `boot_complete` stays false. That keeps the timer and UART fds permanently active, undoing the post-boot idle optimization this block is supposed to preserve.</comment>
<file context>
@@ -1203,10 +1203,20 @@ static int semu_run(emu_state_t *emu)
+ const uint64_t BOOT_SETTLE_ITERATIONS = 5000;
+ bool boot_complete =
+ all_harts_started &&
+ (emu->peripheral_update_ctr > BOOT_SETTLE_ITERATIONS);
+ bool harts_active = (vm->n_hart == 1) || !boot_complete ||
+ (idle_harts == 0);
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peripheral_update_ctr is a countdown timer (64→0→64) and will never exceed 5000, meaning boot_complete stays false permanently. This breaks the intended idle optimization.
d1eaca7 to
2a8451a
Compare
After the introduction of #110, the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd was not being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly This commit introduces boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.
2a8451a to
523748e
Compare
Problem
#110 introduced a regression where SMP=4 configurations hang during boot after SMP initialization completes.
Symptoms
[0.049151] smp: Brought up 1 node, 4 CPUs(49 lines of output)clocksource: Switched to clocksourceor login promptRoot Cause
The adaptive timer/UART fd registration logic excluded timer fd from poll() monitoring when all harts entered WFI state. During early boot, the kernel temporarily puts all harts in WFI while waiting for timer interrupts. This created a deadlock:
idle_harts == started_harts, setsharts_active = falseFix
Introduce boot completion heuristic using
peripheral_update_ctr:During the first 5000 scheduler iterations after SMP initialization, always keep timer and UART fds active. This ensures harts can receive timer interrupts even when temporarily in WFI during early boot.
Summary by cubic
Fixes the SMP=4 boot hang introduced in PR #110 by keeping timer and UART polling active during early boot to avoid a WFI deadlock. Boots reliably on SMP=1 and SMP=4, with no impact to post-boot idle optimizations.
Written for commit 523748e. Summary will update automatically on new commits.