kubectl shows node memory >100%? #86499
(Taking my best shot here, but if this is wrong please freely adjust it.) /sig instrumentation |
Other than the 101% reporting issue, do you observe any other abnormality? |
Well, yes, the node didn't actually seem to be at 100% memory use; as mentioned, it seemed to have significant headroom. |
how about |
Node memory utilization is the ratio of node Working Set Bytes to node allocatable memory. Allocatable memory is available on the node object; node Working Set Bytes are available from the metrics API (see the commands sketched below). Please provide the results of those commands so we can distinguish whether the problem is in kubectl top or in the metrics pipeline. |
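(The two commands were elided in this scrape; a plausible reconstruction, matching the ones a later reporter runs below, with the node name as a placeholder:)
# Allocatable memory, from the node object:
kubectl describe node <node-name>
# Node Working Set Bytes, from the metrics API:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node-name>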
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closing this issue. |
I have a similar problem:
kubectl describe node secondary-pool-dqiqzzikb5
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/secondary-pool-dqiqzzikb5
I have no idea what's causing it or how to fix it :/ |
@pdabrowski-it-solutions: You can't reopen an issue/PR unless you authored it or you are a collaborator. |
The memory usage result can sometimes overflow past 100%, because the memory usage calculation is based on the node's logical "Allocatable" total memory, which is defined as [Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds], not the actual host total memory. * Fix: kubernetes#86499 * Reference: kubernetes#100222
…l node memory total usage. If "Allocatable" is used as the node's total memory size, then under high memory pressure, or when the pre-reserved memory value is large, "MEMORY%" can exceed 100%. To avoid the confusion, add an option to show the node's real memory usage based on "Capacity". * Reference: kubernetes#86499
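(To see the gap that formula creates in practice, Capacity and Allocatable can be compared on any node; a minimal sketch, assuming GNU grep and a placeholder node name:)
# Capacity is the physical total; Allocatable subtracts system-reserved
# and eviction thresholds, so pod usage can exceed 100% of Allocatable.
kubectl describe node <node-name> | grep -A 6 -E '^(Capacity|Allocatable):'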
/reopen |
@ydcool: Reopened this issue. |
/remove-lifecycle rotten |
Seeing the same issue here too |
Another data point (on AWS EKS): it seems to happen only on smaller node types (I see it with both micro and nano; the others are medium). Usage is displayed correctly with NodeMetrics: |
Why can kubelet node memory utilization exceed 100% (and the same can happen for CPU)? Simple: because it's not utilization of the physical device, but utilization of the resources allocated for pods and system daemons. How is utilization of allocated resources calculated? Simple: it's the sum of resources used by pods on the node divided by all resources allocated. How does the kubelet allocate resources for pods? It takes all resources available on the VM and subtracts the resources it reserves for itself, the kernel, etc. How does the kubelet know how many resources it needs to reserve? The user provides that in flags. There is no error, and there is no Kubernetes magic that gets us over 100%; it's just a question of how you define utilization. The kubelet reserves some resources for the system, and they are not included when calculating node utilization, which means utilization can go above 100% when pods start using the reserved resources (this is what over-committing means in the "kubectl describe node" output). Some math based on your kubectl describe node output: node utilization = node usage / node allocatable = 717664Ki / 559848Ki = 128% |
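(The arithmetic checks out; one way to reproduce the figure from the values quoted above:)
awk 'BEGIN { printf "%.0f%%\n", 717664 / 559848 * 100 }'   # 717664Ki / 559848Ki -> 128%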
Based on my reading of the metrics-server implementation, I don't think "Node Usage" includes only usage from pods. The |
Hmm, I was not aware of that. This is a good point; since utilization was implemented in kubectl by a different SIG (CLI), it's possible that it was not properly reviewed by the other stakeholders (SIG Instrumentation/SIG Node). It would make sense to revisit what should be displayed as node utilization. |
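(A rough way to see how much of a node's usage comes from outside pods is to compare kubectl top at both granularities; a sketch only, assuming a single-node cluster and that every MEMORY value prints in Mi:)
kubectl top node <node-name>
# Sum pod working sets across the cluster (with -A, column 4 is MEMORY):
kubectl top pods -A | awk 'NR > 1 { sub(/Mi$/, "", $4); sum += $4 } END { print sum "Mi used by pods" }'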
What happened:
We ran the following, and got the following:
How/why do we have a node reporting >100% memory usage? (There seems to be plenty of memory on the host as reported by the kernel's MemAvailable statistic: multiple gigabytes.)
What you expected to happen:
Memory usage can't exceed 100%, no?
How to reproduce it (as minimally and precisely as possible): we unfortunately don't know
Anything else we need to know?: no swap on these VMs. We're curious what kernel memory statistic goes into computing the total from Kubernetes; it's my understanding that there are various ways to go over 100%, e.g., by summing RSS over several processes (pages shared between processes would get double-counted).
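(For what it's worth, the statistic the kubelet reports via cAdvisor is the cgroup working set, i.e. cgroup memory usage minus inactive file pages, not a sum of per-process RSS. A hand computation on a cgroup-v1 node, assuming the usual mount paths; paths differ under cgroup v2:)
usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
inactive=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
echo "working_set_bytes=$((usage - inactive))"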
Environment:
- Kubernetes version (use kubectl version):
- OS (e.g: cat /etc/os-release): AKS instance?
- Kernel (e.g. uname -a): (unsure, since this is Azure AKS; we don't have good access to this piece of data…)