[Bug] CPU usage for firecracker grows out of control #3043
Comments
Update: We noticed this only happens on kernel v5.11. Diagnosing more with |
We're not seeing any halt-polling events:
|
I believe this issue is the same as what we're currently facing: cloud-hypervisor/cloud-hypervisor#2277 (comment) We're seeing roughly 1 million exits per second on affected VMs. The fix appears to be to disable |
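For readers who want to check whether a host shows a similar exit rate, here is a minimal sketch that samples a KVM exit counter twice and reports exits per second. It assumes debugfs is mounted and that the kernel exposes an aggregate counter at /sys/kernel/debug/kvm/exits; that path, and all function names here, are assumptions for illustration and are not taken from this thread.

```python
import time
from pathlib import Path

# Assumption: debugfs is mounted and exposes an aggregate KVM exit counter
# at this path. Adjust it for your host; reading it usually requires root.
DEFAULT_EXITS_PATH = Path("/sys/kernel/debug/kvm/exits")


def read_exit_count(path=DEFAULT_EXITS_PATH):
    """Read a single integer counter from a debugfs-style file."""
    return int(Path(path).read_text().strip())


def exits_per_second(before, after, elapsed_s):
    """Turn two counter samples into a rate in exits per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (after - before) / elapsed_s


def sample_exit_rate(path=DEFAULT_EXITS_PATH, interval_s=1.0):
    """Sample the counter twice, interval_s apart, and return the rate."""
    before = read_exit_count(path)
    start = time.monotonic()
    time.sleep(interval_s)
    after = read_exit_count(path)
    return exits_per_second(before, after, time.monotonic() - start)
```

Calling `sample_exit_rate()` on an affected host would be expected to return a value on the order of the figure reported above. Note that this counter is host-wide, so it does not attribute exits to a particular VM.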
I have implemented the same fix in a fork here: jeromegn@d6e73e4 Please let me know if I'm making a grave mistake 😅 |
/close |
Did the commits make it into this repo? I don't see them. |
Oops, you are right, there is no PR for that yet. |
Disabling KVM_FEATURE_ASYNC_PF_INT was a temporary fix, as per the conversation below: Disabling KVM_FEATURE_ASYNC_PF_INT avoids the issue, but the actual bug is in KVM, and the fix is in the patch below on the host side: I see the fix available in https://github.com/torvalds/linux/blob/v5.13/arch/x86/kvm/lapic.c so moving the host to kernel 5.13 or later should resolve this issue. |
Closing this issue. The fix is present in the new Linux kernels. If anyone sees it again, please reopen it. |
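A quick way to check a host against the kernel version mentioned above is to compare the running release with 5.13. This is a rough sketch: the helper names are made up for illustration, and it only looks at the upstream major.minor version, so it cannot tell whether a distribution kernel older than 5.13 carries a backport of the fix.

```python
import platform
import re

# Per the comments above, the fix is present in the upstream v5.13 tree.
FIXED_IN = (5, 13)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.11.16-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release):
    """Return True if the release is at or above the upstream fixed version."""
    return parse_kernel_release(release) >= FIXED_IN


def host_has_upstream_fix():
    """Check the kernel this script is currently running on."""
    return has_upstream_fix(platform.release())
```

For example, `has_upstream_fix("5.11.16")` is false, while `has_upstream_fix("5.13.0")` is true.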
Describe the bug
After some time, some firecracker processes on our hosts grow to use as much CPU as possible (100%). Inside the guests, CPU usage is low (2-3%).
We have collected some diagnostics...
perf kvm on a high-CPU firecracker
perf kvm on a normally operating firecracker
Running strace on a high-CPU firecracker outputs mostly ioctl KVM_RUN.
strace on a high-CPU firecracker
strace on a normally operating firecracker
(these 2 firecrackers were on the same host, running at the same time)
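To put a number on "mostly", the share of KVM_RUN ioctls in a captured strace log can be computed with a few lines of Python. This is a sketch under assumptions: it expects strace's default one-call-per-line text output, optionally prefixed with a PID, and the function name is hypothetical.

```python
import re

# Matches the syscall name at the start of a strace line, allowing an optional
# "[pid 1234] " or "1234 " prefix as produced when following threads.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def kvm_run_share(lines):
    """Return the fraction of traced syscalls that are ioctl(..., KVM_RUN, ...)."""
    total = 0
    kvm_run = 0
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match is None:
            continue  # skip signal lines, resumed calls, and other noise
        total += 1
        if match.group(1) == "ioctl" and "KVM_RUN" in line:
            kvm_run += 1
    return kvm_run / total if total else 0.0
```

Running it over logs from a high-CPU and a normally operating process, for instance with `kvm_run_share(open("trace.log"))` where `trace.log` is a file you captured yourself, gives two ratios that can be compared directly.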
CPU usage inside the guest for a high-cpu firecracker
(green is normal, red is kernel and light blue is steal CPU)
To Reproduce
We do not know how to reproduce this. It happens seemingly randomly.
Expected behaviour
CPU usage of firecracker should be proportional to CPU usage inside the guest.
Was this a known bug with the versions we're using? Would upgrading to a more recent firecracker help?
Environment
v0.23.5 and v0.24.4
5.11.7 and 5.11.16
5.12.2
Additional context
This increases CPU usage and load on a host to very bad levels. VMs running on the host all experience significant steal even if we are not overprovisioning CPU.
Checks
I am not certain this is a firecracker issue, but I don't know what else it could be at this point.