PLIC interrupts stop working after some time #59
Can you please share the Haiku image/firmware in question to help reproduce it? I haven't seen such a thing on Linux, so my best option right now is blindly reading the PLIC code/docs. |
Where can I upload it? It is about 35 MB. |
https://temp.sh/, or really any similar service you'd like. I can also give sshfs access to some VM. |
Does it run? It should boot to the desktop, but with no working mouse/keyboard (it only shows PS/2 output in the kernel log). |
It runs fine after applying the ATA patch. I have reproduced the issue by actively spamming input during Haiku boot, but if I touch input only after it reaches the desktop, it's fine. Will investigate more. Could it be an issue with Haiku somehow not resetting interrupts when finishing the boot process? |
For me it stops receiving interrupts even after reaching the desktop, if I keep moving the mouse for 2-3 minutes. |
This is how external interrupts are processed: |
Managed to reproduce it. What was interesting, is that initially I stopped doing any input, but For now I'll keep debugging the state of devices |
From my debugging results (see second log), the HART busy flag is sometimes not cleared, causing all PLIC interrupts to stop working. The timer interrupt still seems alive, because the CPU load meter tray icon keeps changing. |
Ahh I see. Thanks. @cerg2010cerg2010, could you also have a look at that, please? You're the one who initially wrote these devices, although I have no problem completely adopting them if needed. That PLIC device needs huge refactoring anyways.
They are unrelated to PLIC, it's the aclint-mtimer who is responsible for that. |
Side note: SMP seems completely broken with this setup, 4 cores crash the Haiku bootloader. With 2 cores, second one spins somewhere indefinitely. |
This is expected because haiku_loader.riscv is not yet SMP-aware. SMP currently works only with haiku_loader.efi. |
Alright, fine. Just please add some kind of protection, like picking a boot hart (using an atomic lottery) and placing all the others in a WFI sleep. This is all that OpenSBI does; the harts are then woken up by IPI after control is given to the next boot stage. |
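For illustration, the boot hart lottery suggested above can be sketched roughly like this. This is a minimal sketch using C11 atomics, not code from haiku_loader or OpenSBI; `boot_lottery`, `claim_boot_hart`, `release_secondaries` and `park_hart` are hypothetical names. On real hardware the losing harts would also need the software interrupt enabled so that the IPI can wake them from WFI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a sketch only, not taken from any real bootloader. */
atomic_flag boot_lottery = ATOMIC_FLAG_INIT;
atomic_bool release_secondaries = false;

/* Returns true for exactly one caller: the hart that wins the lottery. */
bool claim_boot_hart(void)
{
    /* test_and_set returns the previous value, so only the first hart sees false. */
    return !atomic_flag_test_and_set(&boot_lottery);
}

/* Losing harts wait here until the boot hart releases them. */
void park_hart(void)
{
    while (!atomic_load(&release_secondaries)) {
#if defined(__riscv)
        /* Sleep until an interrupt (for example an IPI) arrives, then re-check. */
        __asm__ volatile("wfi");
#endif
    }
}
```

Each hart would call `claim_boot_hart()` at entry: the winner continues with the normal boot path, everyone else calls `park_hart()`. The boot hart later sets `release_secondaries` and sends an IPI before handing control to the next stage.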
It seems like at some point a hart is interrupted, but never bothers reading PLIC claim register. |
Does it actually call interrupt vector (STVEC)? |
It doesn't. Re-enabling pending interrupts in xIE CSR should dispatch them though, I'll see if that's the case. |
I don't see signs of writing into sie register at all, however.
|
It is i8042 PS/2 controller driver that is not used with RVVM. |
It is set early and not touched after that: https://github.com/haiku/haiku/blob/34e92438724cdb062dae1765fc7e765b44f51ac7/src/system/kernel/arch/riscv64/arch_cpu.cpp#L40. |
|
@X547 It could be a bug in your M-mode bootloader.
The CPU enters M-mode due to the M-mode timer interrupt. It writes to the mip CSR to push the timer interrupt down to S-mode (presumably, as that's what OpenSBI also does), but zeroes the pending external interrupt as well. It never fires because of that. Perhaps you should use an atomic CSR operation to set bits, instead of a separate read and write. Otherwise an external interrupt might come in between your read and write operations. This is why atomic CSR operations to set/clear bits exist at all. |
@X547 there are major power outages where I live, and my phone is discharging, so I could be offline soon for god knows how long. I hope the provided information about the culprit is enough so far, good luck. |
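To make the lost update concrete, here is a small host-side model of the two code paths. It is only a sketch under a simplifying assumption: the pending bits live in one plain word that a device can modify between the handler's read and write. The names `hart_model`, `forward_timer_rmw` and `forward_timer_csrs` are made up for this example; the bit positions (STIP = 5, SEIP = 9) follow the RISC-V privileged spec.

```c
#include <stdint.h>

#define MIP_STIP (1ull << 5) /* supervisor timer interrupt pending */
#define MIP_SEIP (1ull << 9) /* supervisor external interrupt pending */

/* Toy model: one word of pending bits shared by the trap handler and devices. */
typedef struct { uint64_t mip; } hart_model;

/* Old path: SetMip(Mip() | bit), with a device interrupt landing in between. */
uint64_t forward_timer_rmw(hart_model *h, int external_irq_midway)
{
    uint64_t snapshot = h->mip;     /* csrr: read the old value */
    if (external_irq_midway)
        h->mip |= MIP_SEIP;         /* device raises SEIP after the read */
    h->mip = snapshot | MIP_STIP;   /* csrw: the stale snapshot wipes SEIP */
    return h->mip;
}

/* New path: a single set-bits operation has no window between read and write. */
uint64_t forward_timer_csrs(hart_model *h, int external_irq_first)
{
    if (external_irq_first)
        h->mip |= MIP_SEIP;         /* device interrupt arrives before the csrs */
    h->mip |= MIP_STIP;             /* models csrs: only the masked bit changes */
    return h->mip;
}
```

Starting from an empty pending word, the first function ends with only STIP set (the external interrupt is lost), while the second keeps both STIP and SEIP.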
Maybe NVMe driver bug or strange RVVM virtual NVMe device behavior. |
So it is basically a race condition when setting/clearing flag by reading, changing and writing back? |
Yeah, that's it. You should use the csrrs/csrrc instructions for setting/clearing bits in the IP register. Otherwise you are subject to an interrupt data race in the middle of the RMW operation on it. Why that didn't happen in Temu/QEMU, no idea: either their input devices don't send as many interrupts as PS/2, or their interrupt granularity is lower. Anyway, that's unpredictable and should be fixed (especially for real HW). |
It could be an NVMe emulation bug, I'll investigate that (I have some admin-command-related trouble in mind). This should probably also be handled better by the Haiku driver: faulty devices should not cause kernel crashes when they are avoidable, and we want some warning, right? |
Filed bug report: https://dev.haiku-os.org/ticket/18093 |
If you confirm this is fixable on the M-mode bootloader side, we should close this issue (notabug). As for the NVMe troubles, I'll either simply provide patches soon or open another issue. |
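A rough sketch of what csrrs/csrrc wrappers could look like, for reference. Unlike the csrs/csrc pseudo-instructions, csrrs/csrrc also return the previous CSR value. Everything here is an assumption for illustration: the helper names `mip_fetch_set`/`mip_fetch_clear`, the `M_MODE_FIRMWARE` build flag, and the `fake_mip` fallback (which only exists so the sketch compiles and runs on a normal host). The inline asm path assumes RV64 and M-mode.

```c
#include <stdatomic.h>
#include <stdint.h>

#if defined(__riscv) && defined(M_MODE_FIRMWARE) /* hypothetical build flag */
/* Atomically set bits in mip and return the previous value. */
static inline uint64_t mip_fetch_set(uint64_t mask)
{
    uint64_t old;
    __asm__ volatile("csrrs %0, mip, %1" : "=r"(old) : "r"(mask));
    return old;
}

/* Atomically clear bits in mip and return the previous value. */
static inline uint64_t mip_fetch_clear(uint64_t mask)
{
    uint64_t old;
    __asm__ volatile("csrrc %0, mip, %1" : "=r"(old) : "r"(mask));
    return old;
}
#else
/* Host fallback: a fake register with the same fetch-and-modify semantics. */
static _Atomic uint64_t fake_mip;

static inline uint64_t mip_fetch_set(uint64_t mask)
{
    return atomic_fetch_or(&fake_mip, mask);
}

static inline uint64_t mip_fetch_clear(uint64_t mask)
{
    return atomic_fetch_and(&fake_mip, ~mask);
}
#endif
```

With wrappers like these, a handler only ever names the bits it owns, so bits raised concurrently by other interrupt sources are left alone.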
Seems fixed with the following patch:
diff --git a/headers/private/system/arch/riscv64/arch_cpu_defs.h b/headers/private/system/arch/riscv64/arch_cpu_defs.h
index 67b8c96307..abf993f7c7 100644
--- a/headers/private/system/arch/riscv64/arch_cpu_defs.h
+++ b/headers/private/system/arch/riscv64/arch_cpu_defs.h
@@ -222,6 +222,10 @@ static B_ALWAYS_INLINE uint64 Mip() {
uint64 x; asm volatile("csrr %0, mip" : "=r" (x)); return x;}
static B_ALWAYS_INLINE void SetMip(uint64 x) {
asm volatile("csrw mip, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void SetBitsMip(uint64 x) {
+ asm volatile("csrs mip, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void ClearBitsMip(uint64 x) {
+ asm volatile("csrc mip, %0" : : "r" (x));}
static B_ALWAYS_INLINE uint64 Sip() {
uint64 x; asm volatile("csrr %0, sip" : "=r" (x)); return x;}
static B_ALWAYS_INLINE void SetSip(uint64 x) {
@@ -236,6 +240,10 @@ static B_ALWAYS_INLINE uint64 Mie() {
uint64 x; asm volatile("csrr %0, mie" : "=r" (x)); return x;}
static B_ALWAYS_INLINE void SetMie(uint64 x) {
asm volatile("csrw mie, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void SetBitsMie(uint64 x) {
+ asm volatile("csrs mie, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void ClearBitsMie(uint64 x) {
+ asm volatile("csrc mie, %0" : : "r" (x));}
// exception delegation
static B_ALWAYS_INLINE uint64 Medeleg() {
diff --git a/src/system/boot/platform/riscv/traps.cpp b/src/system/boot/platform/riscv/traps.cpp
index 649ec29ee3..968e9fec03 100644
--- a/src/system/boot/platform/riscv/traps.cpp
+++ b/src/system/boot/platform/riscv/traps.cpp
@@ -128,12 +128,12 @@ MTrap(iframe* frame)
enable, frame->a2);
*/
// dprintf(" mtime: %" B_PRIu64 "\n", gClintRegs->mTime);
- SetMip(Mip() & ~(1 << sTimerInt));
+ ClearBitsMip(1 << sTimerInt);
if (!enable) {
- SetMie(Mie() & ~(1 << mTimerInt));
+ ClearBitsMie(1 << mTimerInt);
} else {
gClintRegs->mtimecmp[0] = frame->a2;
- SetMie(Mie() | (1 << mTimerInt));
+ SetBitsMie(1 << mTimerInt);
}
frame->a0 = B_OK;
return;
@@ -145,8 +145,8 @@ MTrap(iframe* frame)
break;
}
case causeInterrupt + mTimerInt: {
- SetMie(Mie() & ~(1 << mTimerInt));
- SetMip(Mip() | (1 << sTimerInt));
+ ClearBitsMie(1 << mTimerInt);
+ SetBitsMip(1 << sTimerInt);
return;
}
} |
Yes, I also cannot reproduce the issue with the updated bootloader. Great) Btw, is this bootloader planned to be for Haiku only? It sounds like a cool thing to have for other guests, something like a more advanced firmware than SBI/UBoot (It has UI and multiple storage drivers, I like that. SBI spec could be complicated tho.) |
Yes. In theory it can be improved to load a Linux/FreeBSD kernel, construct kernel args from the menu, etc., but that would be out of scope for the Haiku project and would need a fork. |
Alright, I see. Good luck on that) |
Tested on Haiku host and guest, no SMP guest.
PS/2 mouse received interrupts and data log: