
Need simple kubectl command to see cluster resource usage #17512

Closed · goltermann opened this issue Nov 19, 2015 · 98 comments

@goltermann (Contributor) commented Nov 19, 2015

Users are getting tripped up by pods not being able to schedule due to resource deficiencies. It can be hard to know when a pod is pending because it just hasn't started up yet, or because the cluster doesn't have room to schedule it. http://kubernetes.io/v1.1/docs/user-guide/compute-resources.html#monitoring-compute-resource-usage helps, but isn't that discoverable (I tend to try a 'get' on a pod in pending first, and only after waiting a while and seeing it 'stuck' in pending, do I use 'describe' to realize it's a scheduling problem).

This is also complicated by system pods being in a namespace that is hidden. Users forget that those pods exist, and 'count against' cluster resources.

There are several possible fixes offhand; I don't know which would be ideal:

  1. Develop a new pod state other than Pending to represent "tried to schedule and failed for lack of resources".

  2. Have kubectl get po or kubectl get po -o=wide display a column to detail why something is pending (perhaps the container.state that is Waiting in this case, or the most recent event.message).

  3. Create a new kubectl command to more easily describe resources. I'm imagining a "kubectl usage" that gives an overview of total cluster CPU and Mem, per node CPU and Mem and each pod/container's usage. Here we would include all pods, including system ones. This might be useful long term alongside more complex schedulers, or when your cluster has enough resources but no single node does (diagnosing the 'no holes large enough' problem).

@davidopp (Member) commented Nov 20, 2015

Something along the lines of (2) seems reasonable, though the UX folks would know better than me.

(3) seems vaguely related to #15743 but I'm not sure they're close enough to combine.

@chrishiestand (Contributor) commented Sep 15, 2016

In addition to the case above, it would be nice to see what resource utilization we're getting.

kubectl utilization requests might show (maybe kubectl util or kubectl usage are better/shorter):

cores: 4.455/5 cores (89%)
memory: 20.1/30 GiB (67%)
...

In this example, the aggregate container requests are 4.455 cores and 20.1 GiB and there are 5 cores and 30GiB total in the cluster.
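A rough stand-in for the proposed command today can be assembled from `kubectl get pods` and awk. The sketch below assumes CPU requests are expressed in whole cores ("2") or millicores ("250m"), and the helper name is invented for illustration:

```shell
# sum_cpu_requests: reads one CPU request per line ("250m" or "2") on
# stdin and prints the aggregate in cores. Empty lines (containers with
# no request set) are skipped, which understates the true footprint.
sum_cpu_requests() {
  awk '/m$/ { sum += $1 / 1000; next }   # millicores -> cores
       /./  { sum += $1 }                # whole cores
       END  { printf "cores requested: %.3f\n", sum }'
}

# Against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces \
#     -o jsonpath='{range .items[*].spec.containers[*]}{.resources.requests.cpu}{"\n"}{end}' \
#     | sum_cpu_requests
```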

@xmik commented Dec 19, 2016

There is:

$ kubectl top nodes
NAME                    CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
cluster1-k8s-master-1   312m         15%       1362Mi          68%       
cluster1-k8s-node-1     124m         12%       233Mi           11% 

@ozbillwang commented Jan 11, 2017

I use the command below to get a quick view of resource usage. It's the simplest way I've found:

kubectl describe nodes

@tonglil (Contributor) commented Jan 20, 2017

If there was a way to "format" the output of kubectl describe nodes, I wouldn't mind scripting my way to summarizing all nodes' resource requests/limits.

@from-nibly commented Feb 1, 2017

Here is my hack: kubectl describe nodes | grep -A 2 -e "^\\s*CPU Requests"

@jredl-va commented May 25, 2017

@from-nibly thanks, just what I was looking for

@tonglil (Contributor) commented May 25, 2017

Yup, this is mine:

$ cat bin/node-resources.sh 
#!/bin/bash
set -euo pipefail

echo -e "Iterating...\n"

nodes=$(kubectl get node --no-headers -o custom-columns=NAME:.metadata.name)

for node in $nodes; do
  echo "Node: $node"
  kubectl describe node "$node" | sed '1,/Non-terminated Pods/d'
  echo
done

@k8s-github-robot (Contributor) commented May 31, 2017

@goltermann There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>

Note: method (1) will trigger a notification to the team. You can find the team list here.


@alok87 (Contributor) commented Jul 5, 2017

You can use the aliases below to find the percentage CPU utilisation of your nodes:

alias util='kubectl get nodes | grep node | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo   {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''
Note: 4000m is the total CPU in one node
alias cpualloc="util | grep % | awk '{print \$1}' | awk '{ sum += \$1 } END { if (NR > 0) { result=(sum*100)/(NR*4000); printf result/NR \"%\n\" } }'"

$ cpualloc
3.89358%

Note: 1600MB is the total memory in one node
alias memalloc='util | grep % | awk '\''{print $3}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { result=(sum*100)/(NR*1600); printf result/NR "%\n" } }'\'''

$ memalloc
24.6832%

@alok87 (Contributor) commented Jul 21, 2017

@tomfotherby alias util='kubectl get nodes | grep node | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''

@tomfotherby commented Jul 25, 2017

@alok87 - Thanks for your aliases. In my case, this is what worked for me given that we use bash and m3.large instance types (2 vCPU, 7.5G memory).

alias util='kubectl get nodes --no-headers | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''

# Get CPU request total (we use x20 because each m3.large has 2 vcpus (2000m))
alias cpualloc='util | grep % | awk '\''{print $1}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { print sum/(NR*20), "%\n" } }'\'''

# Get mem request total (we use x75 because each m3.large has 7.5G ram)
alias memalloc='util | grep % | awk '\''{print $5}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { print sum/(NR*75), "%\n" } }'\'''
$util
ip-10-56-0-178.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  960m (48%)	2700m (135%)	630Mi (8%)	2034Mi (27%)

ip-10-56-0-22.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  920m (46%)	1400m (70%)	560Mi (7%)	550Mi (7%)

ip-10-56-0-56.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  1160m (57%)	2800m (140%)	972Mi (13%)	3976Mi (53%)

ip-10-56-0-99.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  804m (40%)	794m (39%)	824Mi (11%)	1300Mi (17%)

cpualloc 
48.05 %

$ memalloc 
9.95333 %

@nfirvine commented Aug 30, 2017

kubectl top shows usage, not allocation. Allocation is what causes the insufficient CPU problem. There's a ton of confusion in this issue about the difference.

AFAICT, there's no easy way to get a report of node CPU allocation by pod, since requests are per container in the spec. And even then, it's difficult since .spec.containers[*].resources may or may not have the limits/requests fields (in my experience).
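One way to surface those gaps is to flag pods where any container omits a CPU request, sketched here with jq (this assumes jq is installed, and the function name is made up for illustration):

```shell
# pods_missing_cpu_requests: reads `kubectl get pods -o json` output on
# stdin and prints namespace/name of pods where at least one container
# sets no CPU request (so naive allocation sums would miss them).
pods_missing_cpu_requests() {
  jq -r '.items[]
         | select(any(.spec.containers[]; .resources.requests.cpu == null))
         | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Usage:
#   kubectl get pods --all-namespaces -o json | pods_missing_cpu_requests
```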


@negz (Contributor) commented Feb 21, 2018

Getting in on this shell scripting party. I have an older cluster running the cluster autoscaler with scale-down disabled. I wrote this script to determine roughly how much I can scale down the cluster when it starts to bump up against its AWS route limits:

#!/bin/bash

set -e

KUBECTL="kubectl"
NODES=$($KUBECTL get nodes --no-headers -o custom-columns=NAME:.metadata.name)

function usage() {
	local node_count=0
	local total_percent_cpu=0
	local total_percent_mem=0
	local -r nodes="$@"

	for n in $nodes; do
		local requests=$($KUBECTL describe node $n | grep -A2 -E "^\\s*CPU Requests" | tail -n1)
		local percent_cpu=$(echo $requests | awk -F "[()%]" '{print $2}')
		local percent_mem=$(echo $requests | awk -F "[()%]" '{print $8}')
		echo "$n: ${percent_cpu}% CPU, ${percent_mem}% memory"

		node_count=$((node_count + 1))
		total_percent_cpu=$((total_percent_cpu + percent_cpu))
		total_percent_mem=$((total_percent_mem + percent_mem))
	done

	local -r avg_percent_cpu=$((total_percent_cpu / node_count))
	local -r avg_percent_mem=$((total_percent_mem / node_count))

	echo "Average usage: ${avg_percent_cpu}% CPU, ${avg_percent_mem}% memory."
}

usage $NODES

Produces output like:

ip-REDACTED.us-west-2.compute.internal: 38% CPU, 9% memory
...many redacted lines...
ip-REDACTED.us-west-2.compute.internal: 41% CPU, 8% memory
ip-REDACTED.us-west-2.compute.internal: 61% CPU, 7% memory
Average usage: 45% CPU, 15% memory.

@ylogx commented Feb 21, 2018

There is also a pod option in the top command:

kubectl top pod

@shtouff commented Mar 4, 2018

My way to obtain the allocation, cluster-wide:

$ kubectl get po --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{.resources.requests.cpu}{'\n'}{end}{'\n'}{end}"

It produces something like:

kube-system:heapster-v1.5.0-dc8df7cc9-7fqx6
  heapster:88m
  heapster-nanny:50m
kube-system:kube-dns-6cdf767cb8-cjjdr
  kubedns:100m
  dnsmasq:150m
  sidecar:10m
  prometheus-to-sd:
kube-system:kube-dns-6cdf767cb8-pnx2g
  kubedns:100m
  dnsmasq:150m
  sidecar:10m
  prometheus-to-sd:
kube-system:kube-dns-autoscaler-69c5cbdcdd-wwjtg
  autoscaler:20m
kube-system:kube-proxy-gke-cluster1-default-pool-cd7058d6-3tt9
  kube-proxy:100m
kube-system:kube-proxy-gke-cluster1-preempt-pool-57d7ff41-jplf
  kube-proxy:100m
kube-system:kubernetes-dashboard-7b9c4bf75c-f7zrl
  kubernetes-dashboard:50m
kube-system:l7-default-backend-57856c5f55-68s5g
  default-http-backend:10m
kube-system:metrics-server-v0.2.0-86585d9749-kkrzl
  metrics-server:48m
  metrics-server-nanny:5m
kube-system:tiller-deploy-7794bfb756-8kxh5
  tiller:10m

@kierenj commented Mar 13, 2018

This is weird. I want to know when I'm at or nearing allocation capacity. It seems like a pretty basic function of a cluster. Whether it's a statistic that shows a high percentage or a textual error... how do other people know this? Do they just always use autoscaling on a cloud platform?

@dpetzold commented May 1, 2018

I authored https://github.com/dpetzold/kube-resource-explorer/ to address option 3 above. Here is some sample output:

$ ./resource-explorer -namespace kube-system -reverse -sort MemReq
Namespace    Name                                               CpuReq  CpuReq%  CpuLimit  CpuLimit%  MemReq    MemReq%  MemLimit  MemLimit%
---------    ----                                               ------  -------  --------  ---------  ------    -------  --------  ---------
kube-system  event-exporter-v0.1.7-5c4d9556cf-kf4tf             0       0%       0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-mshh  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-bv59  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-ntfw  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-dns-autoscaler-244676396-xzgs4                20m     2%       0         0%         10Mi      0%       0         0%
kube-system  l7-default-backend-1044750973-kqh98                10m     1%       10m       1%         20Mi      0%       20Mi      0%
kube-system  kubernetes-dashboard-768854d6dc-jh292              100m    10%      100m      10%        100Mi     3%       300Mi     11%
kube-system  kube-dns-323615064-8nxfl                           260m    27%      0         0%         110Mi     4%       170Mi     6%
kube-system  fluentd-gcp-v2.0.9-4qkwk                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  fluentd-gcp-v2.0.9-jmtpw                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  fluentd-gcp-v2.0.9-tw9vk                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  heapster-v1.4.3-74b5bd94bb-fz8hd                   138m    14%      138m      14%        301856Ki  11%      301856Ki  11%

@hmsvigle commented Apr 27, 2020

  • kubectl describe nodes or kubectl top nodes — which one should be used to calculate cluster resource utilization?
  • Also, why is there a difference between these two results? Is there a logical explanation for this yet?

@brianpursley (Member) commented Apr 29, 2020

/kind feature

@prathameshd9 commented Apr 30, 2020

All the comments and hacks with nodes worked well for me. I also need something at a higher level to keep track of things, like the sum of resources per node pool!

arunsah added a commit to arunsah/arunsah.github.io that referenced this issue May 5, 2020

@rajjar123456 commented Jul 17, 2020

Hi,
I want to log the CPU and memory usage for a pod every 5 minutes over a period of time, then use the data to create a graph in Excel. Any ideas? Thanks
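One low-tech approach is a loop around kubectl top. This sketch assumes metrics-server is installed (so kubectl top works); the pod name, file name, and helper name are placeholders:

```shell
# to_csv_row: turns one `kubectl top pod` row ("name 250m 120Mi") into
# a timestamped CSV row suitable for graphing in Excel.
to_csv_row() {
  awk -v ts="$1" '{ print ts "," $2 "," $3 }'
}

# Sample once every 5 minutes, appending to a CSV file:
#   while true; do
#     kubectl top pod my-pod --no-headers \
#       | to_csv_row "$(date -u +%FT%TZ)" >> pod-usage.csv
#     sleep 300
#   done
```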

@yceoshda commented Jul 28, 2020

Hi,
I'm happy to see that Google pointed all of us to this issue :-) (a bit disappointed that it's still open after almost 5 years). Thanks for all the shell snippets and other tools.

@SerheyDolgushev commented Aug 5, 2020

Simple and quick hack:

$ kubectl describe nodes | grep 'Name:\|  cpu\|  memory'

Name:               XXX-2-wke2
  cpu                                               1552m (77%)   2402m (120%)
  memory                                            2185Mi (70%)  3854Mi (123%)
Name:               XXX-2-wkep
  cpu                                               1102m (55%)   1452m (72%)
  memory                                            1601Mi (51%)  2148Mi (69%)
Name:               XXX-2-wkwz
  cpu                                               852m (42%)    1352m (67%)
  memory                                            1125Mi (36%)  3624Mi (116%)

@boniek83 commented Aug 5, 2020

> Simple and quick hack: kubectl describe nodes | grep 'Name:\|  cpu\|  memory'

Device plugins are not there. They should be. Such devices are resources as well.

@aeciopires commented Aug 7, 2020

Hello!

I created this script and am sharing it with you:

https://github.com/Sensedia/open-tools/blob/master/scripts/listK8sHardwareResources.sh

This script compiles some of the ideas shared here. It can be extended, and it can help other people get the metrics more simply.

Thanks for sharing the tips and commands!

@laurybueno commented Aug 9, 2020

For my use case, I ended up writing a simple kubectl plugin that lists CPU/RAM limits/reservations for nodes in a table. It also checks current pod CPU/RAM consumption (like kubectl top pods), but orders the output by CPU in descending order.

It's more of a convenience thing than anything else, but maybe someone else will find it useful too.

https://github.com/laurybueno/kubectl-hoggers

@denissabramovs commented Sep 20, 2020

Whoa, what a huge thread, and still no proper solution from the Kubernetes team to calculate the current overall CPU usage of a whole cluster?

@raul1991 commented Oct 16, 2020

For those looking to run this on minikube, first enable the metrics-server add-on:
minikube addons enable metrics-server
and then run:
kubectl top nodes

@benjick commented Nov 14, 2020

If you're using Krew:

kubectl krew install resource-capacity
kubectl resource-capacity
NODE                                          CPU REQUESTS   CPU LIMITS     MEMORY REQUESTS   MEMORY LIMITS
*                                             16960m (35%)   18600m (39%)   26366Mi (14%)     3100Mi (1%)
ip-10-0-138-176.eu-north-1.compute.internal   2460m (31%)    4200m (53%)    567Mi (1%)        784Mi (2%)
ip-10-0-155-49.eu-north-1.compute.internal    2160m (27%)    2200m (27%)    4303Mi (14%)      414Mi (1%)
ip-10-0-162-84.eu-north-1.compute.internal    3860m (48%)    3900m (49%)    8399Mi (27%)      414Mi (1%)
ip-10-0-200-101.eu-north-1.compute.internal   2160m (27%)    2200m (27%)    4303Mi (14%)      414Mi (1%)
ip-10-0-231-146.eu-north-1.compute.internal   2160m (27%)    2200m (27%)    4303Mi (14%)      414Mi (1%)
ip-10-0-251-167.eu-north-1.compute.internal   4160m (52%)    3900m (49%)    4491Mi (14%)      660Mi (2%)

@fejta-bot commented Feb 12, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@abelal83 commented Feb 12, 2021

5 years and still open. I understand there are loads of tools available to check pod resource usage, but honestly, why not supply a standard one out of the box that's simple to use? Bundling Grafana and Prometheus with all the monitoring you could require would have been a godsend for my team. We wasted months experimenting with different solutions. Please, kube maintainers, give us something out of the box and close this issue!

@tculp commented Feb 12, 2021

/remove-lifecycle stale

@rokcarl commented Mar 22, 2021

Even with all the tools above (I currently use kubectl-view-utilization), none of them can answer: "Can I run 3 replicas of an application pod that requires 1500 mCPU on my app nodes?" I have to do some number-crunching manually.
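That number-crunching can be scripted once you have per-node free allocatable CPU. A naive sketch (the helper name is invented, and real scheduling also weighs memory, taints, and affinity, so treat the answer as an upper bound):

```shell
# count_fits: reads free millicores per node on stdin (one number per
# line) and counts how many replicas of a pod requesting REQ millicores
# would fit, packing each node independently.
count_fits() {
  awk -v req="$1" '{ n += int($1 / req) } END { print n }'
}

# e.g. nodes with 900, 2100, and 1600 free millicores; pod requests 1500m:
#   printf '900\n2100\n1600\n' | count_fits 1500   # -> 2
```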

@manigandham commented Mar 22, 2021

Highly recommend another great tool called K9S: https://github.com/derailed/k9s

It's a separate CLI tool but uses the same config context for access and offers a lot of terminal/UI utility for monitoring and managing your cluster.

@j-zimnowoda commented Mar 31, 2021

kubectl describe nodes | grep "Allocated resources" -A 9

@eddiezane (Member) commented Mar 31, 2021

From the long history of comments here, it seems everyone has different expectations, judging by the many issues and requests reported in this thread. This thread is more of a wiki now.

We'd be happy to see one of these plugins be proposed for upstreaming via a KEP. If someone wants to own this and bias for action with a decision, please open a KEP for discussion.

/close

@k8s-ci-robot (Contributor) commented Mar 31, 2021

@eddiezane: Closing this issue.

In response to this:

From the long history of comments here it seems everyone has different expectations by the many issues and requests reported in this thread. This thread is more of a wiki now.

We'd be happy to see one of these plugins be proposed for upstreaming via a KEP. If someone wants to own this and bias for action with a decision, please open a KEP for discussion.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@champak commented Apr 15, 2021

In case folks are still listening in on this issue... has anyone attempted using the standard resource usage APIs like getrusage() for software running inside containers/pods? For CPU stats it does not seem it would be far off from what the node-level cgroup would report.

Memory stats seem more problematic. It's unclear whether, say, /sys/fs/cgroup/memory/<> from inside a container really reflects memory usage correctly. Being able to monitor resource usage from within an app (and then change behavior in the app, etc.) is a neat capability. It seems unclear when that will be available in k8s, so I'm casting around for workarounds.

@dguyhasnoname commented Jun 18, 2021

Another tool to see resources node-wise and namespace-wise: https://github.com/dguyhasnoname/k8s-day2-ops/tree/master/resource_calcuation/k8s-toppur
