-
Notifications
You must be signed in to change notification settings - Fork 63
Implement event-driven UART coroutine with CPU optimization #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
860d661 to
de11dde
Compare
dff7839 to
4aed1b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 5 files
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="main.c">
<violation number="1" location="main.c:1236">
poll_timeout is forced to -1 even when no fds are registered, so once all started harts enter WFI the scheduler calls poll(0, -1) and the emulator hangs indefinitely.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 issues found across 5 files
Prompt for AI agents (all 2 issues)
Understand the root cause of the following 2 issues and fix them.
<file name="main.c">
<violation number="1" location="main.c:1200">
Blocking removes the timerfd when any hart idles, causing poll(-1) with no FDs and permanently freezing all harts in WFI.</violation>
<violation number="2" location="main.c:1241">
Sleeping 10ms whenever any hart is idle throttles active harts to ~6k instr/s, effectively freezing the guest.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
d49b574 to
fb9ff5b
Compare
|
Thank @cubic-dev-ai for reviewing. I have addressed both safety concerns: Violation 1: Deadlock Prevention ✅Issue: Fix: Added safety guard in timeout logic (main.c:1242): if (pfd_count > 0 &&
(started_harts == 0 || idle_harts == started_harts)) {
poll_timeout = -1;
}Now blocking timeout (-1) is only used when:
When Violation 2: SMP Boot Throttling ✅Issue: 10ms timeout during boot could throttle active harts. Fix: SMP boot safety checks ensure timer/UART always included during boot (main.c:1199, 1223): bool all_harts_started = (started_harts >= vm->n_hart);
bool harts_active = !all_harts_started || (idle_harts == 0);
bool need_uart = !all_harts_started || (idle_harts == 0) || emu->uart.has_waiting_hart;During SMP boot (
Validation Results
The commit message has been updated with detailed safety explanations. |
4158ce3 to
ce3c7ad
Compare
This commit implements an event-driven UART handling mechanism using coroutines and kqueue/poll, significantly reducing CPU usage during idle periods while maintaining responsiveness. Key improvements: 1. Hart state-aware scheduling: Track WFI states to optimize polling 2. Conditional fd registration: Only monitor active event sources 3. Adaptive timeout strategy: Three-tier polling (0/10/-1ms) 4. Race condition fix: Clear in_wfi in interrupt handlers 5. SMP boot safety: Defer optimization until all harts started 6. Deadlock prevention: Guard against poll(0, -1) with no fds Performance results: - CPU usage: 20% → 0.3% (98.5% reduction)
Root cause: After PR #110 (coro-uart), the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd wasn't being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly Fix: Introduce boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI. Verification: - SMP=1: Boots successfully to login prompt ✓ - SMP=4: Now completes boot to "Run /init" and login ✓ - Pre-fix SMP=4: Hung at line 49 ✗ - Pre-regression (4552c62): Worked correctly ✓ The fix preserves PR #110's CPU optimization benefits (0.3% idle usage) while ensuring multi-core boot reliability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
After the introduction of #110, the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd wasn't being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly This commit introduces boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.
After the introduction of #110, the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd wasn't being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly This commit introduces boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.
After the introduction of #110, the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd wasn't being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly This commit introduces boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.
After the introduction of #110, the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd was not being polled, preventing wakeup. Symptom: - SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output) - Never reached "clocksource: Switched to clocksource" or login prompt - SMP=1 continued to work correctly This commit introduces boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.
Critical Issues Solved
Solution: Event-Driven Scheduling with Adaptive Polling
Achieves 98.5% CPU reduction (from 20% to 0.3%) through:
Performance Results
Implementation Details
1. Hart State Observation
Observe hart states BEFORE resuming them to prevent race conditions:
2. Conditional Timer Inclusion
Only add timer to poll() when ALL harts are active:
Impact: Prevents 1ms timer from waking poll() when harts are idle
3. Conditional UART Inclusion
Only add UART to poll() when needed:
Impact: Prevents TTY stdin (always "readable") from preventing sleep
4. Three-Tier Adaptive Timeout
Impact: Handles SMP partial idle state (common in Linux)
5. Unconditional poll() Call
Impact: Always call poll(), use it as sleep when no fds (pfd_count==0)
6. WFI Race Condition Fix
Clear
in_wfiflag in interrupt handlers:Impact: Prevents scheduler from seeing stale WFI state after interrupt
7. UART Coroutine Support
Hart yields when no stdin data available:
Design Rationale
Why Three-Tier Timeout Instead of Pure Event-Driven?
Problem: Linux SMP systems rarely have ALL harts idle simultaneously
Solution: Three-tier strategy handles partial idle
Result: Handles real-world SMP behavior, achieving 0.3% CPU
Why Conditional Timer/UART Inclusion?
Timer Problem: 1ms kqueue timer prevents poll() from sleeping
UART Problem: TTY stdin always reports "readable"