SystemOOMs not reported for containers #88868
Labels
kind/bug
priority/important-soon
sig/node
What happened:
A SystemOOM event was not reported, even though the memory limit that was hit was the root cgroup's memory limit.
What you expected to happen:
A SystemOOM event should have been reported.
How to reproduce it (as minimally and precisely as possible):
Do not set --system-reserved or --kube-reserved, and set --eviction-hard= to disable memory eviction on the kubelet. Run a pod without memory limits that slowly consumes memory, such as one using k8s.gcr.io/stress:v1 from https://github.com/vishh/stress with the arguments "-mem-alloc-size", "100Mi", "-mem-alloc-sleep", "10s". This should trigger an OOM on the root cgroup, but no event is generated.
Anything else we need to know?:
cAdvisor populates ContainerName and VictimContainerName by matching the regexp:
Task in (.*) killed as a result of limit of (.*)
A SystemOOM should mean VictimContainerName == "/", since we are looking for OOMs where the task was "killed as a result of limit of /". However, the kubelet incorrectly checks ContainerName instead.
cc @kubernetes/sig-node-bugs @derekwaynecarr @sjenning @dchen1107
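A minimal Go sketch of the distinction, assuming (as described above) that the first capture group becomes ContainerName and the second becomes VictimContainerName. The oomInstance type and the parseOOMLine, isSystemOOM and isSystemOOMBuggy helpers are hypothetical stand-ins, not cAdvisor's or the kubelet's actual code, and the sample log line is invented for illustration.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// legacyOOMRegexp uses the kernel log pattern quoted in this issue.
var legacyOOMRegexp = regexp.MustCompile(`Task in (.*) killed as a result of limit of (.*)`)

// oomInstance is a hypothetical, trimmed-down stand-in for cAdvisor's OOM record.
type oomInstance struct {
	ContainerName       string // cgroup the killed task lived in (first capture group)
	VictimContainerName string // cgroup whose limit was hit (second capture group)
}

// parseOOMLine fills in both names from a kernel log line.
// It returns false if the line does not match the pattern.
func parseOOMLine(line string) (oomInstance, bool) {
	m := legacyOOMRegexp.FindStringSubmatch(line)
	if m == nil {
		return oomInstance{}, false
	}
	return oomInstance{
		ContainerName:       path.Join("/", m[1]),
		VictimContainerName: path.Join("/", m[2]),
	}, true
}

// isSystemOOM applies the check proposed in this issue:
// the limit that was hit belongs to the root cgroup "/".
func isSystemOOM(o oomInstance) bool {
	return o.VictimContainerName == "/"
}

// isSystemOOMBuggy reproduces the reported mistake of comparing
// the killed task's own cgroup instead of the limit's cgroup.
func isSystemOOMBuggy(o oomInstance) bool {
	return o.ContainerName == "/"
}

func main() {
	// Hypothetical log line: a pod's task killed because the root cgroup limit was hit.
	line := "Task in /kubepods/besteffort/pod1234/abcd killed as a result of limit of /"
	o, ok := parseOOMLine(line)
	fmt.Println(ok, o.ContainerName, o.VictimContainerName)
	fmt.Println("buggy:", isSystemOOMBuggy(o), "fixed:", isSystemOOM(o))
}
```

Running it prints `true /kubepods/besteffort/pod1234/abcd /` followed by `buggy: false fixed: true`: only the check on VictimContainerName classifies this root-cgroup OOM as a SystemOOM.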