
Need simple kubectl command to see cluster resource usage #17512

Open
goltermann opened this Issue Nov 19, 2015 · 40 comments


goltermann commented Nov 19, 2015

Users are getting tripped up by pods that can't be scheduled due to resource deficiencies. It can be hard to know whether a pod is Pending because it just hasn't started up yet or because the cluster doesn't have room to schedule it. http://kubernetes.io/v1.1/docs/user-guide/compute-resources.html#monitoring-compute-resource-usage helps, but isn't very discoverable (I tend to try a 'get' on a Pending pod first, and only after waiting a while and seeing it 'stuck' in Pending do I use 'describe' and realize it's a scheduling problem).

This is further complicated by system pods living in a hidden namespace. Users forget that those pods exist and that they count against cluster resources.
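For reference, the workaround today is roughly the following (the pod name is a placeholder):

# check the Events section for FailedScheduling / "Insufficient cpu"-style messages
kubectl describe pod <pending-pod-name>

# remember that system pods also count against cluster capacity
kubectl get pods --namespace=kube-system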

There are several possible fixes; offhand I don't know which would be ideal:

  1. Develop a new pod state other than Pending to represent "tried to schedule and failed for lack of resources".

  2. Have kubectl get po or kubectl get po -o=wide display a column to detail why something is pending (perhaps the container.state that is Waiting in this case, or the most recent event.message).

  3. Create a new kubectl command to more easily describe resources. I'm imagining a "kubectl usage" that gives an overview of total cluster CPU and Mem, per node CPU and Mem and each pod/container's usage. Here we would include all pods, including system ones. This might be useful long term alongside more complex schedulers, or when your cluster has enough resources but no single node does (diagnosing the 'no holes large enough' problem).


davidopp commented Nov 20, 2015

Something along the lines of (2) seems reasonable, though the UX folks would know better than me.

(3) seems vaguely related to #15743 but I'm not sure they're close enough to combine.


chrishiestand commented Sep 15, 2016

In addition to the case above, it would be nice to see what resource utilization we're getting.

kubectl utilization requests might show (maybe kubectl util or kubectl usage are better/shorter):

cores: 4.455/5 cores (89%)
memory: 20.1/30 GiB (67%)
...

In this example, the aggregate container requests are 4.455 cores and 20.1 GiB, and there are 5 cores and 30 GiB total in the cluster.


xmik commented Dec 19, 2016

There is:

$ kubectl top nodes
NAME                    CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
cluster1-k8s-master-1   312m         15%       1362Mi          68%       
cluster1-k8s-node-1     124m         12%       233Mi           11% 

ozbillwang commented Jan 11, 2017

I use the command below to get a quick view of resource usage. It is the simplest way I've found.

kubectl describe nodes

tonglil commented Jan 20, 2017

If there were a way to "format" the output of kubectl describe nodes, I wouldn't mind scripting my way to a summary of all nodes' resource requests/limits.


from-nibly commented Feb 1, 2017

Here is my hack: kubectl describe nodes | grep -A 2 -e "^\\s*CPU Requests"


jredl-va commented May 25, 2017

@from-nibly thanks, just what i was looking for


tonglil commented May 25, 2017

Yup, this is mine:

$ cat bin/node-resources.sh 
#!/bin/bash
set -euo pipefail

echo -e "Iterating...\n"

# list node names only
nodes=$(kubectl get node --no-headers -o custom-columns=NAME:.metadata.name)

for node in $nodes; do
  echo "Node: $node"
  # print everything from the "Non-terminated Pods" table onward (per-pod requests/limits)
  kubectl describe node "$node" | sed '1,/Non-terminated Pods/d'
  echo
done

k8s-github-robot commented May 31, 2017

@goltermann There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>

Note: method (1) will trigger a notification to the team. You can find the team list here.


alok87 commented Jul 5, 2017

You can use the commands below to find the percentage CPU utilisation of your nodes.

alias util='kubectl get nodes | grep node | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo   {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''
Note: 4000m is the total CPU in one node
alias cpualloc="util | grep % | awk '{print \$1}' | awk '{ sum += \$1 } END { if (NR > 0) { result=(sum*100)/(NR*4000); printf result/NR \"%\n\" } }'"

$ cpualloc
3.89358%

Note: 1600 MB is the total memory in one node
alias memalloc='util | grep % | awk '\''{print $3}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { result=(sum*100)/(NR*1600); printf result/NR "%\n" } }'\'''

$ memalloc
24.6832%

alok87 commented Jul 21, 2017

@tomfotherby alias util='kubectl get nodes | grep node | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''


tomfotherby commented Jul 25, 2017

@alok87 - Thanks for your aliases. This is what worked for me, given that we use bash and m3.large instance types (2 CPUs, 7.5 GB memory).

alias util='kubectl get nodes --no-headers | awk '\''{print $1}'\'' | xargs -I {} sh -c '\''echo {} ; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo '\'''

# Get CPU request total as a percentage (we divide by NR*20 because each m3.large has 2 vCPUs = 2000m)
alias cpualloc='util | grep % | awk '\''{print $1}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { print sum/(NR*20), "%\n" } }'\'''

# Get memory request total as a percentage (we divide by NR*75 because each m3.large has 7.5 GB of RAM)
alias memalloc='util | grep % | awk '\''{print $5}'\'' | awk '\''{ sum += $1 } END { if (NR > 0) { print sum/(NR*75), "%\n" } }'\'''
$ util
ip-10-56-0-178.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  960m (48%)	2700m (135%)	630Mi (8%)	2034Mi (27%)

ip-10-56-0-22.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  920m (46%)	1400m (70%)	560Mi (7%)	550Mi (7%)

ip-10-56-0-56.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  1160m (57%)	2800m (140%)	972Mi (13%)	3976Mi (53%)

ip-10-56-0-99.ec2.internal
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  804m (40%)	794m (39%)	824Mi (11%)	1300Mi (17%)

$ cpualloc
48.05 %

$ memalloc 
9.95333 %

nfirvine commented Aug 30, 2017

Re #17512 (comment): kubectl top shows usage, not allocation. Allocation is what causes the "insufficient CPU" problem. There's a ton of confusion in this issue about the difference.

AFAICT, there's no easy way to get a report of node CPU allocation by pod, since requests are set per container in the spec. And even then, it's difficult because .spec.containers[*].resources may or may not have the requests/limits fields (in my experience).
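For what it's worth, a rough sketch of such a report is possible by grouping container CPU requests by node with jsonpath and awk. It's only an approximation: containers without a CPU request are skipped, and pods that haven't been scheduled yet have no nodeName and will land under a bogus key.

kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.spec.nodeName}{" "}{.spec.containers[*].resources.requests.cpu}{"\n"}{end}' \
  | awk '{
      total = 0
      for (i = 2; i <= NF; i++) {
        v = $i
        # normalize to millicores: "250m" stays 250, "1" becomes 1000
        if (sub(/m$/, "", v)) total += v; else total += v * 1000
      }
      requested[$1] += total
    }
    END { for (node in requested) printf "%s: %dm CPU requested\n", node, requested[node] }'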


negz commented Feb 21, 2018

Getting in on this shell scripting party. I have an older cluster running the cluster autoscaler with scale-down disabled. I wrote this script to determine roughly how much I can scale the cluster down when it starts to bump up against its AWS route limits:

#!/bin/bash

set -e

KUBECTL="kubectl"
NODES=$($KUBECTL get nodes --no-headers -o custom-columns=NAME:.metadata.name)

function usage() {
	local node_count=0
	local total_percent_cpu=0
	local total_percent_mem=0
	local -r nodes="$@"

	for n in $nodes; do
		local requests=$($KUBECTL describe node $n | grep -A2 -E "^\\s*CPU Requests" | tail -n1)
		local percent_cpu=$(echo $requests | awk -F "[()%]" '{print $2}')
		local percent_mem=$(echo $requests | awk -F "[()%]" '{print $8}')
		echo "$n: ${percent_cpu}% CPU, ${percent_mem}% memory"

		node_count=$((node_count + 1))
		total_percent_cpu=$((total_percent_cpu + percent_cpu))
		total_percent_mem=$((total_percent_mem + percent_mem))
	done

	local -r avg_percent_cpu=$((total_percent_cpu / node_count))
	local -r avg_percent_mem=$((total_percent_mem / node_count))

	echo "Average usage: ${avg_percent_cpu}% CPU, ${avg_percent_mem}% memory."
}

usage $NODES

Produces output like:

ip-REDACTED.us-west-2.compute.internal: 38% CPU, 9% memory
...many redacted lines...
ip-REDACTED.us-west-2.compute.internal: 41% CPU, 8% memory
ip-REDACTED.us-west-2.compute.internal: 61% CPU, 7% memory
Average usage: 45% CPU, 15% memory.

ylogx commented Feb 21, 2018

There is also a pod option in the top command:

kubectl top pod
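If the metrics pipeline is up, per-container usage across all namespaces can be listed too (note this is live usage, not requests or limits):

kubectl top pod --all-namespaces --containers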

shtouff commented Mar 4, 2018

My way to obtain the allocation, cluster-wide:

$ kubectl get po --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{.resources.requests.cpu}{'\n'}{end}{'\n'}{end}"

It produces something like:

kube-system:heapster-v1.5.0-dc8df7cc9-7fqx6
  heapster:88m
  heapster-nanny:50m
kube-system:kube-dns-6cdf767cb8-cjjdr
  kubedns:100m
  dnsmasq:150m
  sidecar:10m
  prometheus-to-sd:
kube-system:kube-dns-6cdf767cb8-pnx2g
  kubedns:100m
  dnsmasq:150m
  sidecar:10m
  prometheus-to-sd:
kube-system:kube-dns-autoscaler-69c5cbdcdd-wwjtg
  autoscaler:20m
kube-system:kube-proxy-gke-cluster1-default-pool-cd7058d6-3tt9
  kube-proxy:100m
kube-system:kube-proxy-gke-cluster1-preempt-pool-57d7ff41-jplf
  kube-proxy:100m
kube-system:kubernetes-dashboard-7b9c4bf75c-f7zrl
  kubernetes-dashboard:50m
kube-system:l7-default-backend-57856c5f55-68s5g
  default-http-backend:10m
kube-system:metrics-server-v0.2.0-86585d9749-kkrzl
  metrics-server:48m
  metrics-server-nanny:5m
kube-system:tiller-deploy-7794bfb756-8kxh5
  tiller:10m

kierenj commented Mar 13, 2018

This is weird. I want to know when I'm at or nearing allocation capacity; it seems like a pretty basic function of a cluster. Whether it's a statistic that shows a high percentage or a textual error... how do other people find this out? Do they just always use autoscaling on a cloud platform?


dpetzold commented May 1, 2018

I authored https://github.com/dpetzold/kube-resource-explorer/ to address option (3) above. Here is some sample output:

$ ./resource-explorer -namespace kube-system -reverse -sort MemReq
Namespace    Name                                               CpuReq  CpuReq%  CpuLimit  CpuLimit%  MemReq    MemReq%  MemLimit  MemLimit%
---------    ----                                               ------  -------  --------  ---------  ------    -------  --------  ---------
kube-system  event-exporter-v0.1.7-5c4d9556cf-kf4tf             0       0%       0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-mshh  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-bv59  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-proxy-gke-project-default-pool-175a4a05-ntfw  100m    10%      0         0%         0         0%       0         0%
kube-system  kube-dns-autoscaler-244676396-xzgs4                20m     2%       0         0%         10Mi      0%       0         0%
kube-system  l7-default-backend-1044750973-kqh98                10m     1%       10m       1%         20Mi      0%       20Mi      0%
kube-system  kubernetes-dashboard-768854d6dc-jh292              100m    10%      100m      10%        100Mi     3%       300Mi     11%
kube-system  kube-dns-323615064-8nxfl                           260m    27%      0         0%         110Mi     4%       170Mi     6%
kube-system  fluentd-gcp-v2.0.9-4qkwk                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  fluentd-gcp-v2.0.9-jmtpw                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  fluentd-gcp-v2.0.9-tw9vk                           100m    10%      0         0%         200Mi     7%       300Mi     11%
kube-system  heapster-v1.4.3-74b5bd94bb-fz8hd                   138m    14%      138m      14%        301856Ki  11%      301856Ki  11%

harryge00 commented May 22, 2018

@shtouff

root@debian9:~# kubectl get po -n chenkunning-84 -o=jsonpath="{range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{.resources.requests.cpu}{'\n'}{end}{'\n'}{end}"
error: error parsing jsonpath {range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{.resources.requests.cpu}{'\n'}{end}{'\n'}{end}, unrecognized character in action: U+0027 '''
root@debian9:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.7-beta.0+$Format:%h$", GitCommit:"bb053ff0cb25a043e828d62394ed626fda2719a1", GitTreeState:"dirty", BuildDate:"2017-08-26T09:34:19Z", GoVersion:"go1.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.7-beta.0+$Format:84c3ae0384658cd40c1d1e637f5faa98cf6a965c$", GitCommit:"3af2004eebf3cbd8d7f24b0ecd23fe4afb889163", GitTreeState:"clean", BuildDate:"2018-04-04T08:40:48Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}


nfirvine commented May 22, 2018

@harryge00: U+0027 is a curly quote, probably a copy-paste problem


harryge00 commented May 25, 2018

@nfirvine Thanks! I have solved the problem by using:


kubectl get pods -n my-ns -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.limits.cpu} {"\n"}{end}' |awk '{sum+=$2 ; print $0} END{print "sum=",sum}'

It works for namespaces whose pods contain only one container each.
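A variant that handles pods with more than one container might look like this (just a sketch; it assumes CPU limits are expressed as whole cores or millicores):

kubectl get pods -n my-ns -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].resources.limits.cpu}{"\n"}{end}' \
  | awk '{
      pod_total = 0
      for (i = 2; i <= NF; i++) {
        v = $i
        # convert whole cores to millicores so "1" and "1000m" add up the same way
        if (sub(/m$/, "", v)) pod_total += v; else pod_total += v * 1000
      }
      printf "%s\t%dm\n", $1, pod_total
      sum += pod_total
    }
    END { printf "sum=\t%dm\n", sum }'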


abushoeb commented Jun 5, 2018

@xmik Hey, I'm using k8s 1.7 and running heapster. When I run $ kubectl top nodes --heapster-namespace=kube-system, it shows "error: metrics not available yet". Any clue how to tackle this error?


xmik commented Jun 5, 2018

@abushoeb:

  1. I didn't think kubectl top supported the --heapster-namespace flag. Edit: this flag is supported, you were right: #44540 (comment).
  2. If you see "error: metrics not available yet", then you should check the heapster deployment. What do its logs say? Is the heapster service OK, and are its endpoints not <none>? Check the latter with a command like: kubectl -n kube-system describe svc/heapster

abushoeb commented Jun 5, 2018

@xmik You were right, heapster wasn't configured properly. Thanks a lot, it's working now. Do you know if there is a way to get real-time GPU usage information? The top command only gives CPU and memory usage.


xmik commented Jun 5, 2018

I don't know that. :(


avgKol commented Jun 21, 2018

@abushoeb I am getting the same error, "error: metrics not available yet". How did you fix it?


abushoeb commented Jun 22, 2018

@avgKol Check your heapster deployment first. In my case, it was not deployed properly. One way to check is to access the metrics via a curl command like curl -L http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/. If it doesn't show metrics, then check the heapster pod and its logs. The heapster metrics can be accessed via a web browser too, like this.


hjacobs commented Jul 18, 2018

If anybody is interested, I created a tool to generate static HTML for Kubernetes resource usage (and costs): https://github.com/hjacobs/kube-resource-report


tonglil commented Jul 18, 2018

@hjacobs I would like to use that tool but I'm not a fan of installing/using Python packages. Mind packaging it up as a Docker image?


hjacobs commented Jul 18, 2018

@tonglil The project is pretty early, but my plan is to have an out-of-the-box Docker image, including a webserver, that you can deploy with a simple kubectl apply -f ..


arun-gupta commented Sep 12, 2018

Here is what worked for me:

kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.status.allocatable.memory}{'\t'}{.status.allocatable.cpu}{'\n'}{end}"

It shows output as:

ip-192-168-101-177.us-west-2.compute.internal	251643680Ki	32
ip-192-168-196-254.us-west-2.compute.internal	251643680Ki	32
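A small extension of the same command, in case it helps someone, also prints capacity next to allocatable (same jsonpath style, just more fields; capacity comes from .status.capacity):

kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.status.capacity.cpu}{'\t'}{.status.allocatable.cpu}{'\t'}{.status.capacity.memory}{'\t'}{.status.allocatable.memory}{'\n'}{end}"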

hjacobs commented Sep 13, 2018

@tonglil a Docker image is now available: https://github.com/hjacobs/kube-resource-report


fejta-bot commented Dec 17, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale


geerlingguy commented Dec 30, 2018

/remove-lifecycle stale

Every month or so, my Googling leads me back to this issue. There are ways of getting the statistics I need with long jq strings, or with Grafana dashboards with a bunch of calculations... but it would be so nice if there were a command like:

# kubectl utilization cluster
cores: 19.255/24 cores (80%)
memory: 16.4/24 GiB (68%)

# kubectl utilization [node name]
cores: 3.125/4 cores (78%)
memory: 2.1/4 GiB (52%)

(similar to what @chrishiestand mentioned way earlier in the thread).

I am often building and destroying a few dozen test clusters per week, and I'd rather not have to build automation or add in some shell aliases to be able to just see "if I put this many servers out there, and toss these apps on them, what is my overall utilization/pressure".

Especially for smaller / more esoteric clusters, I don't want to set up autoscale-to-the-moon (usually for money reasons), but do need to know if I have enough overhead to handle minor pod autoscaling events.


evankanderson commented Jan 1, 2019

One additional request -- I'd like to be able to see summed resource usage by namespace (at a minimum; by Deployment/label would also be useful), so I can focus my resource-trimming efforts by figuring out which namespaces are worth concentrating on.
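Until something like that exists, a rough per-namespace summary of usage (not requests) can be scripted on top of kubectl top; this sketch assumes metrics are available and that kubectl top prints CPU in millicores and memory in Mi, which it normally does:

kubectl top pod --all-namespaces --no-headers \
  | awk '{ cpu[$1] += $3; mem[$1] += $4 }
         END { for (ns in cpu) printf "%s\t%dm\t%dMi\n", ns, cpu[ns], mem[ns] }'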


etopeter commented Jan 13, 2019

I made a small plugin, kubectl-view-utilization, that provides the functionality @geerlingguy described. Installation via the krew plugin manager is available. It is implemented in Bash and needs awk and bc.
With the kubectl plugin framework, this could be kept completely out of the core tools.


weeco commented Mar 3, 2019

I am glad others are also facing this challenge. I created Kube Eagle (a Prometheus exporter), which helped me gain a better overview of cluster resources and ultimately let me make better use of the available hardware:

https://github.com/google-cloud-tools/kube-eagle

(Screenshot: Kubernetes resource monitoring dashboard)
