Resource limits not enforced in EKS cluster #8047
Comments
@fvoznika and @konstantin-s-bogom, this is AWS, and this doesn't look like kubernetes/kubernetes#107172. Have we tried this on GKE yet? If it is us, I suspect it is some mismatch of a message in the shim, but I don't know that code well enough yet to say either way.
I think it's likely that kubernetes/kubernetes#107172 is at play here, because AFAIK kubelet versions <1.25 will prefer cAdvisor stats, which report incorrect resource usage for runsc containers. @DobromirM I don't know if EKS supports using a custom kubelet, but if you know of a way, then try one with this patch applied and see if it fixes the issue: kubernetes/kubernetes@9c3a4aa
This issue is not related to cAdvisor. cAdvisor just affects reported metrics, not cgroup limits. In K8s both pods and containers have limits. Users can only set container limits. Pod limits are automatically set to the aggregate of all containers. For example, if one container has a memory limit of 512MB and another has a 128MB limit, the pod limit will be 640MB. gVisor doesn't currently enforce container limits because it doesn't have cgroups support. However, pod limits are enforced. This is done by making the sandbox process join the pod cgroup in the host. In the example above, a container may be allowed to go over its own limit, let's say the 128MB limit container will be allowed to allocate more than 128MB, but it won't be allowed to allocate more than 640MB, which is the pod limit. It is important, however, to ensure that all containers in the pod have a limit set, otherwise the aggregate limit for the pod will be unlimited.
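To make the aggregation concrete, here is a sketch of a pod manifest (my illustration, not from this issue; names and images are made up) where the two per-container limits from the example above add up to a 640Mi pod-level limit:

apiVersion: v1
kind: Pod
metadata:
  name: limits-demo            # hypothetical name
spec:
  runtimeClassName: gvisor     # assumes a RuntimeClass named "gvisor" backed by runsc
  containers:
  - name: big
    image: busybox
    command: ["sleep", "3600"]
    resources:
      limits:
        memory: 512Mi          # per-container limit (not enforced by runsc without cgroup support)
  - name: small
    image: busybox
    command: ["sleep", "3600"]
    resources:
      limits:
        memory: 128Mi          # per-container limit
# Kubernetes sets the pod cgroup limit to the aggregate (512Mi + 128Mi = 640Mi),
# and that pod-level limit is what the sandbox process is subject to on the host.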
I see that in your example you have a single container; in that case the pod limit should be enforced, unless EKS is not setting up the pod cgroup appropriately. You can check it by looking at the cgroup configuration on the node: find the sandbox process, look up its cgroup, and check whether a limit is being set. This is what I get running in GKE:

$ cat /proc/$(pidof runsc-sandbox)/cgroup
$ cat /sys/fs/cgroup/memory/kubepods/burstable/poda6624945-2024-4cd1-94da-8abb842f568f/memory.limit_in_bytes
671088640
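Incidentally, 671088640 bytes is exactly 640MiB, i.e. the aggregate from the 512MB + 128MB example above. The pod UID embedded in that cgroup path can be cross-checked against the API object with standard kubectl (the pod name is a placeholder):

$ kubectl get pod <pod-name> -o jsonpath='{.metadata.uid}'
a6624945-2024-4cd1-94da-8abb842f568f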
When I try to run your deployment, the pod doesn't even start because gVisor fails to boot due to the low memory limit (which is the expected behavior when such a low memory limit is set). Can you describe in more detail what happens when you try to run the deployment? Can you collect and attach debug logs (see instructions here and here)?
When I run the deployment with gVisor as the runtime, the pod starts and runs as if the limits are not there. Running the deployment with the default runtime makes it fail instantly due to the low limit, which, as you said, is the expected behaviour. I've created a new node on the cluster with the following debug configs:

/etc/containerd/runsc.toml:
log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
  debug = "true"
  debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log"

/etc/containerd/config.toml:
After applying only the configs above, I've uploaded the logs here: https://github.com/DobromirM/gvisor-logs
I believe the problem is that the node is using the systemd cgroup driver, given the path below:

The options available in
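The actual path from the node is not shown above, but for comparison with the GKE output earlier in the thread, a node using the systemd cgroup driver reports slice-style paths. The pod UID and container ID below are made up for illustration; the check itself is the same as before:

$ cat /proc/$(pidof runsc-sandbox)/cgroup
4:memory:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0b1c2d3e_f456_7890_ab12_cd34ef567890.slice/cri-containerd-<container-id>.scope
$ cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0b1c2d3e_f456_7890_ab12_cd34ef567890.slice/memory.limit_in_bytes

If that file holds a huge value such as 9223372036854771712 rather than the expected aggregate, the pod cgroup has effectively no memory limit.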
We switched to an EKS optimized Ubuntu Linux AMI to avoid the original problem.
Thanks for the update. |
Description
When a Kubernetes deployment is created with gVisor as the runtime, the resource limits are not enforced. Even when the pods exceed the capacity of the node, they are not terminated; instead they crash the node.
The limits work fine with the default runtime and the pods can be stopped manually.
containerd/config.toml
Steps to reproduce
eksNodeGroup.yaml
eksctl create nodegroup --config-file eksNodeGroup.yaml
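The eksNodeGroup.yaml contents aren't shown above. A minimal eksctl nodegroup config of the kind this command expects looks roughly like the following; the cluster name, region, and instance type are assumptions, and the actual file presumably also provisions runsc and the containerd config on the node:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # hypothetical cluster name
  region: us-west-1         # matches the region in the uname output below
nodeGroups:
  - name: gvisor-nodes
    instanceType: m5.large
    desiredCapacity: 1
    # amiFamily: Ubuntu2004 is the kind of setting later used to switch to the EKS-optimized Ubuntu AMI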
runtimeClass.yaml
kubectl apply -f runtimeClass.yaml
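The runtimeClass.yaml contents were not captured either. The RuntimeClass manifest described in the gVisor Kubernetes docs is short; a typical version, assuming the class is named gvisor and backed by the runsc handler, is:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc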
deployment.yaml
kubectl apply -f deployment.yaml
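Finally, deployment.yaml isn't included above. A sketch matching the symptom described (a single container with a deliberately low memory limit, running under the gvisor RuntimeClass) would be something like this; the image, stress command, and 128Mi figure are my assumptions, not the reporter's actual values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: limits-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: limits-test
  template:
    metadata:
      labels:
        app: limits-test
    spec:
      runtimeClassName: gvisor
      containers:
      - name: stress
        image: polinux/stress               # hypothetical image that allocates memory
        command: ["stress", "--vm", "1", "--vm-bytes", "512M", "--vm-hang", "0"]
        resources:
          limits:
            memory: 128Mi                   # low limit; the pod should be OOM-killed

With the default runtime this pod fails almost immediately; under runsc on the affected EKS nodes it kept running, per the report above.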
runsc version
docker version (if using docker)
No response
uname
Linux ip-172-31-11-114.us-west-1.compute.internal 5.4.209-116.367.amzn2.x86_64 #1 SMP Wed Aug 31 00:09:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
kubectl get nodes: