WFI doesn't return when an IPI is issued #132
Comments
Using |
I tested the fix that I proposed on IRC, and I agree that it's not safe to reference mie from another thread. With the change applied, I saw rcu messages after a few hours under load. Currently, we need to raise the interrupt even if it is unmasked, unless the mie CSR is also updated to raise and lower interrupts (atomically), i.e. whenever the mie interrupt enable mask changes. @atishp04 do you have mie.SSIE enabled on the sleeping thread? There could be a case of a missed edge, i.e. cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HARD); is called but cpu_has_work returns to sleep. IPIs must be working in general, so it is puzzling.
@michaeljclark Do you mean the sleeping thread from the kernel's or from QEMU's perspective? The CPU that is about to go offline disables all interrupts except the software interrupt (SSIE is enabled) and calls WFI. An IPI is issued from another CPU during the online operation. I guess QEMU manages every CPU in a separate thread and puts it to sleep for the WFI instruction. If that's the case, mie.SSIE would be enabled on the sleeping thread.
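The wake-up rule being discussed here (a hart parked in WFI with only SSIE enabled should resume once a software interrupt is pending) can be sketched as a small predicate. This is a minimal illustration, not QEMU code: the helper name wfi_should_wake and the self-check function are hypothetical, and only the SSIP and STIP bit positions are taken from the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the RISC-V privileged specification. */
#define MIP_SSIP (1u << 1)  /* supervisor software interrupt (IPI) */
#define MIP_STIP (1u << 5)  /* supervisor timer interrupt */

/*
 * Hypothetical helper: a hart stalled in WFI should resume when at least
 * one interrupt is both pending (mip) and enabled (mie).
 */
bool wfi_should_wake(uint32_t mip, uint32_t mie)
{
    return (mip & mie) != 0;
}

/* Self-check for the offline-CPU case described in this thread. */
void wfi_self_check(void)
{
    uint32_t mie = MIP_SSIP;  /* only software interrupts are enabled */

    /* A pending but masked timer interrupt must not wake the hart. */
    assert(!wfi_should_wake(MIP_STIP, mie));
    /* An IPI must wake it, even with other bits pending. */
    assert(wfi_should_wake(MIP_STIP | MIP_SSIP, mie));
}
```

The predicate itself is trivial; the difficulty in this issue is making sure the sleeping thread is prompted to re-evaluate it when mip changes.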
You could consider trying something like this:
I don't have time to test right now because it takes approximately 24 hours (at least 2 hours) under heavy parallel load to make sure we don't get rcu_sched self-detected stall on CPU. As @sorear mentioned, accessing env->mie from another thread is not safe. I've actually updated the code a little to make updates truly atomic but I have also been very careful to maintain the same logic as before (besides adding tighter atomicity). The prior code made two calls to
The changes to the CSR code were required because SiFive has work that requires atomic CSRs. I'm working on a branch that I will be submitting to qemu-devel for review. The new CSR implementation allows model-specific CSRs which can be truly atomic (the prior mechanism could return CSR values that were different from the ones it updated; this is particularly important for CSRs that set or clear bits).
I have a spare machine and I can leave it running for days.
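To illustrate what "truly atomic" means for a CSR that sets or clears bits, here is a minimal sketch using C11 atomics. It is not the branch mentioned above: the csr_t type and the csr_rmw, csr_set_bits and csr_clear_bits helpers are hypothetical names. The point is only that the value returned is exactly the value that was replaced, even if another thread modifies the register concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a single CSR shared between threads. */
typedef struct {
    _Atomic uint32_t value;
} csr_t;

/*
 * Atomically replace the bits selected by write_mask with the matching
 * bits of new_value and return the value that was actually replaced.
 * The compare-and-swap loop retries if another thread changed the CSR
 * between the read and the write.
 */
uint32_t csr_rmw(csr_t *csr, uint32_t new_value, uint32_t write_mask)
{
    uint32_t old = atomic_load(&csr->value);
    uint32_t next;

    do {
        next = (old & ~write_mask) | (new_value & write_mask);
        /* On failure, old is refreshed with the current CSR value. */
    } while (!atomic_compare_exchange_weak(&csr->value, &old, next));

    return old;
}

/* Set-bits and clear-bits forms built on the same primitive. */
uint32_t csr_set_bits(csr_t *csr, uint32_t bits)
{
    return csr_rmw(csr, bits, bits);
}

uint32_t csr_clear_bits(csr_t *csr, uint32_t bits)
{
    return csr_rmw(csr, 0, bits);
}

/* Single-threaded self-check of the returned "old" values. */
void csr_self_check(void)
{
    csr_t csr;

    atomic_init(&csr.value, 0x2u);
    assert(csr_set_bits(&csr, 0x20u) == 0x2u);
    assert(csr_clear_bits(&csr, 0x2u) == 0x22u);
    assert(atomic_load(&csr.value) == 0x20u);
}
```

A non-atomic version that reads the old value and writes the new one in two separate steps could return an old value that no longer matches what it overwrote, which is the problem described in the comment above.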
I was able to trigger the rcu_sched self-detected stall on CPU during a gcc bootstrap on a 4 CPU Linux SMP instance running the Fedora image. The Fedora image has enough tools to perform a gcc bootstrap. You can download the Fedora image I used a point to point bridge with a private IP and masquerading (SNAT) so that the Fedora image (192.168.0.2) can reach the Internet via a bridge (192.168.0.1) on the host. This is required so that On the x86_64 Linux host (4 vCPU c5.xlarge AWS instance):
Expand the Fedora image disk before using it as the default 4GB is too small for a gcc bootstrap:
In the Fedora RISC-V guest:
I kept the QEMU instance running in screen and checked the console for kernel messages. The bootstrap took 22 hours. When I applied the patch I sent you via IRC, I saw an rcu stall message after 2-4 hours. I think if we take the mie masking out and try
We might have better luck... Let me know... If I get time I can try it at some point. I also measured idle CPU before and after applying patches, etc., and did some other performance testing...
I got an OOM. Trying with a higher memory size (16G) for QEMU.
@michaeljclark: The fix seems to work. The GCC bootstrap completed successfully without any rcu stalls. Let me know if you want me to run any other workload for verification.
@michaeljclark Just wanted to check whether you need more testing or whether the patch can be merged.
This effectively changes riscv_cpu_update_mip from edge to level.
i.e. cpu_interrupt or cpu_reset_interrupt are called regardless of
the current interrupt level.

Fixes WFI doesn't return when a IPI is issued:

- riscvarchive#132

To test:

1) Apply RISC-V Linux CPU hotplug patch:
   - http://lists.infradead.org/pipermail/linux-riscv/2018-May/000603.html
2) Enable CONFIG_CPU_HOTPLUG in linux .config
3) Try to offline and online cpus:

   echo 1 > /sys/devices/system/cpu/cpu2/online
   echo 0 > /sys/devices/system/cpu/cpu2/online
   echo 1 > /sys/devices/system/cpu/cpu2/online

Reported-by: Atish Patra <atishp04@gmail.com>
Cc: Atish Patra <atishp04@gmail.com>
Cc: Alistair Francis <Alistair.Francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
This effectively changes riscv_cpu_update_mip from edge to level.
i.e. cpu_interrupt or cpu_reset_interrupt are called regardless of
the current interrupt level.

Fixes WFI doesn't return when a IPI is issued:

- riscvarchive/riscv-qemu#132

To test:

1) Apply RISC-V Linux CPU hotplug patch:
   - http://lists.infradead.org/pipermail/linux-riscv/2018-May/000603.html
2) Enable CONFIG_CPU_HOTPLUG in linux .config
3) Try to offline and online cpus:

   echo 1 > /sys/devices/system/cpu/cpu2/online
   echo 0 > /sys/devices/system/cpu/cpu2/online
   echo 1 > /sys/devices/system/cpu/cpu2/online

Reported-by: Atish Patra <atishp04@gmail.com>
Cc: Atish Patra <atishp04@gmail.com>
Cc: Alistair Francis <Alistair.Francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
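The difference between the edge and level behaviour named in the commit message can be shown with a toy model. This is a simplified sketch under stated assumptions, not the QEMU implementation: hart_t, raise_line, lower_line, update_mip_edge, update_mip_level and parked_hart_wakes are all hypothetical names, and raise_line stands in for cpu_interrupt prompting the sleeping vCPU thread to re-check whether it has work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIP_SSIP (1u << 1)  /* supervisor software interrupt (IPI) */
#define MIP_STIP (1u << 5)  /* supervisor timer interrupt */

/* Hypothetical stand-in for a vCPU; this is not QEMU's CPUState. */
typedef struct {
    uint32_t mip;      /* pending interrupts */
    uint32_t mie;      /* enabled interrupts */
    bool irq_line;     /* models the hard interrupt request line */
    bool halted;       /* true while the hart is stalled in WFI */
} hart_t;

/* Raising the line makes a halted hart re-check whether it has work. */
static void raise_line(hart_t *h)
{
    h->irq_line = true;
    if (h->halted && (h->mip & h->mie) != 0) {
        h->halted = false;
    }
}

static void lower_line(hart_t *h)
{
    h->irq_line = false;
}

/* Edge style: touch the line only when mip crosses zero <-> nonzero. */
void update_mip_edge(hart_t *h, uint32_t mask, uint32_t value)
{
    uint32_t old = h->mip;

    h->mip = (old & ~mask) | (value & mask);
    if (old == 0 && h->mip != 0) {
        raise_line(h);
    } else if (old != 0 && h->mip == 0) {
        lower_line(h);
    }
}

/* Level style: always drive the line from the new mip value. */
void update_mip_level(hart_t *h, uint32_t mask, uint32_t value)
{
    h->mip = (h->mip & ~mask) | (value & mask);
    if (h->mip != 0) {
        raise_line(h);
    } else {
        lower_line(h);
    }
}

/*
 * Scenario: a parked hart has only SSIE enabled. A masked timer
 * interrupt becomes pending first, then an IPI arrives.
 * Returns true if the hart left WFI.
 */
bool parked_hart_wakes(void (*update)(hart_t *, uint32_t, uint32_t))
{
    hart_t h = { .mip = 0, .mie = MIP_SSIP, .irq_line = false, .halted = true };

    update(&h, MIP_STIP, MIP_STIP);  /* timer pending, but masked */
    update(&h, MIP_SSIP, MIP_SSIP);  /* IPI: mip was already nonzero */
    return !h.halted;
}
```

In this model, parked_hart_wakes(update_mip_edge) returns false because the second update sees a nonzero mip both before and after, so the line is never touched again, while parked_hart_wakes(update_mip_level) returns true. Whether this is the exact sequence that occurred in the reported hang is an assumption; it is only one way an edge-style update can miss a wake-up.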
According to the specs, WFI can be used to put the CPU in a low-power mode. Any enabled interrupt causes WFI to return, and the CPU resumes execution after that. Currently, a Linux kernel running under QEMU doesn't return from WFI when an IPI (software interrupt) is issued.
This happens for the following reason.
riscv_set_local_interrupt() signals the CPU interrupt only when either the old mip value or the new mip value is zero.
It may happen that both are nonzero when the interrupt is issued from Supervisor mode, in which case the CPU is not signalled.
The following temporary fix has been suggested by @michaeljclark which seems to resolve the issue for now. This may or may not be the final fix.
This is required as part of the CPU hotplug operation. During the CPU offline operation, WFI is issued on that CPU to put it in low-power mode. During the online operation, an IPI is issued to that CPU so that it can return from WFI and resume operation.
Here is the series that enables this feature.
http://lists.infradead.org/pipermail/linux-riscv/2018-April/000509.html
or github link:
https://github.com/atishp04/riscv-linux/commits/cpu_hotplug_v2_working
@sorear