agent gets oom-killed, manifests as virtiofsd dying #1111
Solution suggested by Qian: set the oom_adj score for kata-agent to prevent that from happening.
Under stress, the agent can be OOM-killed, which exits the sandbox. One possible hard-to-diagnose manifestation is a virtiofsd crash.

Fixes: kata-containers#1111
Reported-by: Qian Cai <caiqian@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
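The mechanism behind that suggestion can be sketched from the shell. This is illustrative only: the knob is `/proc/<pid>/oom_score_adj`, where -1000 means "never OOM-kill" and 1000 means "kill me first". Lowering the score (what kata-agent would want) requires CAP_SYS_RESOURCE, so this unprivileged demo raises its own score instead; the value 500 is arbitrary and is not what the eventual fix used.

```shell
# Hedged sketch of the oom_score_adj mechanism the suggestion relies on.
# A process (or a privileged helper) writes an adjustment to procfs;
# raising your own score is always allowed, lowering it needs privilege.
echo 500 > /proc/self/oom_score_adj
# Children inherit the adjustment, so cat reports the value we just set.
cat /proc/self/oom_score_adj
```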
What pod/container spec are you running with? Are you specifying memory for the workload?
If you run an unconstrained memory-hungry workload, I'm not too surprised we see this. We don't handle "best effort" particularly well, and usually deployments will set a default CPU/memory limit when none is specified.
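For illustration, a workload spec with an explicit memory limit avoids the "best effort" case described above. All names and values in this fragment are made up for the example; they do not come from the issue.

```yaml
# Hypothetical pod spec: an explicit memory limit keeps the workload from
# exhausting guest memory and triggering the OOM killer inside the sandbox.
apiVersion: v1
kind: Pod
metadata:
  name: stress-test            # example name, not from the issue
spec:
  runtimeClassName: kata       # assumes a RuntimeClass named "kata" exists
  containers:
  - name: workload
    image: registry.example.com/stress:latest   # placeholder image
    resources:
      limits:
        memory: "256Mi"        # bounds the workload instead of "best effort"
        cpu: "500m"
```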
Reproducer, as requested by @egernst on Slack:
Not specifying
(@egernst Sorry, I initially commented on the PR instead of the issue)
@egernst I have spent a little more time looking at this. First, I observe a behavior that I find a little strange:
But then if I pass the
Even in that case, I see:
(the file is empty) So there are two things that are a bit unexpected to me:
Maybe
@egernst In any case, I asked the original poster to tell me what
Get your issue reviewed faster
To help us understand the problem more quickly, please run the kata-collect-data.sh script, which is installed as part of Kata Containers:
$ sudo kata-collect-data.sh > /tmp/kata.log
I will request this information from the original poster.
Description of problem
The agent can be selected by the Linux OOM killer. When this happens, the symptoms can be very hard to diagnose. In at least one instance, the obvious manifestation was virtiofsd crashing: it crashed because it had lost its connection with qemu, and it lost that connection because qemu had exited as a result of the guest shutting down.
Expected result
Actual result
Further information
After running the instructions here: https://github.com/dgibson/kata-vfio-tools/blob/main/podman.md, you get a crash of virtiofsd that looks like this:

Investigation shows that this is really the guest agent being OOM-killed, which causes qemu to quit: