kubectl uses far more memory than expected #978
|
@Michael-Sinz thanks for reporting! I have a feeling this is related to #866. For triage's sake, could you please test this with v1.19.3 and the v1.20 beta, since you have access to a large cluster? https://storage.googleapis.com/kubernetes-release/release/v1.19.3/bin/linux/amd64/kubectl |
|
A quick test (will do more tomorrow) shows that 1.20.0-beta.1 actually uses slightly more memory (a very small difference), but I need to get access to the large production clusters to do a full verification, as it is after business hours now and things are scaling down from their peak usage. (We see 4x to 10x scale variation from low to high during the day.) I am also still seeing significantly higher memory use when outputting yaml than when outputting json (a multiple of the output size). |
|
I have now run 1.20.0-beta.1 and 1.19.3 against a large cluster. Note that the cluster is rather dynamic, so the runs do not capture exactly the same state. I also could not complete the yaml output, as 1.20.0-beta.1 ran out of memory on my access VM (which has 16GBytes of RAM). In json it does complete, but it still uses over 9x the memory compared to the output size (7.3GBytes of RSS for 781MBytes of json output). However, based on limited testing, it seems that 1.20 is better than 1.19. Just for comparison, here is the kubectl 1.18.6 run against the cluster. It seems to use less memory than 1.19 or 1.20 (but, again, conditions continue to change in the cluster; I would look at output size vs RSS used as the measure at this scale). |
|
This came up in our sig-cli meeting today. This might be addressed by not sorting the data, or by streaming the output as it comes in. We currently toss everything into a single in-memory list before printing. /triage accepted |
|
I do find it interesting that the memory use is so much more than the size of the JSON (and even worse for YAML - it is still unclear how we can use that much more memory just because the output is YAML). While the kind: List may be a part of this, I think the bigger problem is more fundamental. |
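For context, when kubectl gets multiple objects it wraps them in a single v1 List, so the whole result set is materialized as one object before any of it is printed. A rough illustration of the shape (abbreviated, values elided):

```sh
# All returned pods come back wrapped in one List object rather than being streamed:
kubectl get pods --all-namespaces -o json
# {
#   "apiVersion": "v1",
#   "items": [ ...one entry per pod... ],
#   "kind": "List"
# }
```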
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
|
As far as I know, this issue is still live - the amount of memory used to get the complete pod status of the cluster is amazingly large for non-trivial clusters. Far more than expected given the size of the (rather verbose) json formatted output. /remove-lifecycle stale |
|
@Michael-Sinz turns out we have some profiling built in to kubectl. Can you please run the following and upload the dumps? If you want to take a peek at what you're uploading you can run |
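For reference, a sketch of what such a profiling run could look like, using kubectl's --profile/--profile-output flags and go tool pprof to inspect the dump before uploading it (the resource selection and file names here are illustrative assumptions, not the exact commands requested above):

```sh
# Capture a heap profile while listing all pods; discard the normal output.
kubectl get pods --all-namespaces -o yaml --profile=heap --profile-output=heap.pprof > /dev/null

# Peek at the top allocation sites in the dump.
go tool pprof -top heap.pprof
```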
|
I will have to see what I can poke at. I assume no details about the cluster will be in the profile reports, right? I may not have the same size cluster available right now - we dynamically scale rather drastically all the time (up and down) so the overall scale of the cluster changes. (In addition to moving various workloads between different clusters) But I should have some interesting files to post soon. |
From what I can tell, no. Here's the proto it writes out. |
|
Note that our clusters are relatively dynamic, so I cannot assure you that the same state exists across all of the runs. I did two clusters - a smaller one and a larger one. The larger one took a long time to complete and I had to use a different machine to run the test on, as the memory use was too high for my normal dev VM (it was literally OOM-killed by the kernel there). I hope the data provides what you need. A quick look at the allocs shows a major cost for yaml - almost as if it produces the json in memory and then produces the yaml from it. Both show a rather large amount of memory usage. json: yaml: |
|
/cc |
|
whoa! @eddiezane |
|
cc @liggitt |
This is correct - in Kubernetes, yaml is produced by translation from json. Note that Go will typically make the heap twice as big as it needs for your data; you can adjust this with the GOGC environment variable. I tend to agree that #866 is the real underlying problem. |
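As a rough illustration of that knob (the values below are arbitrary examples, not recommendations): GOGC=100, the default, lets the heap grow to roughly 2x the live data before collecting, while a lower value keeps the heap closer to the live data size at the cost of more GC cycles.

```sh
# Default: heap target is roughly 2x the live data.
/usr/bin/time -v kubectl get pods --all-namespaces -o json > pods.json

# More aggressive GC: heap target is roughly 1.25x the live data, with more GC work.
GOGC=25 /usr/bin/time -v kubectl get pods --all-namespaces -o json > pods.json
```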
|
I think #866 is part of the problem, but there must be some serious inefficiencies when even the textual form of the json (which is rather verbose, indented, etc.) is many times smaller than the memory needed to produce it. Something very wrong is going on when that is the case. (Not to mention that the yaml is a translation of the json and not a serialization of the same objects that produced the json - I don't really care, since I can consume the json just as well as the yaml.)
RAM used: 6,952,484K
Size of json output: 757,897K
Where is the extra memory being used? Even if this is 2x the actual live data, we are still at 3,476,242K of memory used. Where did the extra ~2.7GB of RAM over the text form of the json data go? How much additional data or structure would really be needed, given that the json has so much redundant data in it due to the k:v syntax? |
that's not surprising at all. a json map can be represented in two bytes: {} |
First, we are not talking about empty sets. Those are special cases and in both forms are harder to measure. Once you have thousands of objects in there, things should look different. With a few thousand pods, one would have the 'metadata' key (and all of the other keys in the maps) a few thousand times, but they do not need to be duplicated across the maps - and yet, in the actual json output, they are duplicated due to the format. When I wrote parsers and data storage for things like this, the keys were always mapped into unique entries. Only the customer data might need to be unique each time (and even that is false in kubernetes, since there is a lot of duplication there too), so I would look at string tables or unique-string types of solutions. Especially since strings are usually treated as immutable - you replace the whole thing rather than editing in place - this works out amazingly well. We should be able to get much closer to the actual information content (in the information-theory sense) in memory, and be faster and more efficient because of it, since you will not bust the caches so hard (or make the working set so large). |
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
|
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
|
/remove-lifecycle rotten |
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
|
/remove-lifecycle rotten |
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
|
This issue has not been updated in over 1 year, and should be re-triaged. You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
|
Any updates on this issue? I've run into this problem as well. Getting ~600 resources within a namespace uses 400MB of memory; the resulting yaml is 22MB and the json is 31MB. |
|
/help |
|
@pacoxu: This request has been marked as needing help from a contributor.
Guidelines: Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command. In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
We use kubectl in some jobs to check on the cluster. However, we have found that it can be rather costly in memory use in both small and large clusters. Even more surprisingly, getting the results as YAML costs more than twice the memory of getting the results as JSON (even though the YAML output is usually half the size of the JSON output).
What happened:
First, the amount of memory used has gone up over the versions of Kubernetes - a significant increase in 1.18 vs 1.15 (the prior time I looked).
However, there is clearly something wrong in kubectl, since the amount of memory used is far beyond reasonable.
Using the /usr/bin/time command to measure process resource consumption for these commands against one of our clusters, and wc to count the characters in the output, we see that the json output is 479,416,955 characters while kubectl uses 4,143,732K of memory (over 4 gigabytes!), around 9 times the size of the output.
What is worse, doing the same with yaml produces noticeably smaller output - 233,672,375 characters (yaml being a far more compact representation) - and yet consumes 14,683,072K (yes, over 14 gigabytes!), which is almost 70 times the size of the output and over 3 times as much memory as producing the same data in JSON.
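For reference, a minimal sketch of this kind of measurement (the resource selection is an assumption, not necessarily the exact command used above): GNU time's -v report includes the maximum resident set size, and wc -c gives the size of the serialized output in bytes.

```sh
# Peak memory comes from "Maximum resident set size" in time's report (stderr);
# wc -c counts the bytes kubectl writes to stdout.
/usr/bin/time -v kubectl get pods --all-namespaces -o json | wc -c
/usr/bin/time -v kubectl get pods --all-namespaces -o yaml | wc -c
```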
What you expected to happen:
I would expect this output to be produced using at most around 1x the output's size in memory, since the serialized form has a lot of repeated elements (field names) that should not need to be repeated in memory.
Even worse is the drastic jump in memory consumption based purely on output format - up to 14 gigabytes of peak RAM use for yaml output!
For contrast, see these two calls (same thing) to a much, much smaller cluster:
JSON output size: 3,385,403 bytes, with 69,528K of RAM used
YAML output size: 1,608,293 bytes, with 182,472K of RAM used
The YAML run uses nearly 3 times the memory of the JSON run, even though the YAML output is again about half the size.
How to reproduce it (as minimally and precisely as possible):
See above. The cluster above is not tiny; the effect is much more pronounced on larger clusters, but even on small clusters the memory consumption is far larger than the size of the data.
Anything else we need to know?:
We have small periodic jobs that use kubectl to do some work. We like the abstraction it gives us over the underlying APIs for these higher-level jobs, and they have worked across Kubernetes versions, other than the resource requirements.
We had already switched to json output and processing due to kubectl's large memory footprint for yaml, but have again hit issues with the large memory footprint (over 4GB just for the kubectl process).
Environment:
Kubernetes 1.18.6 and kubectl 1.18.6
Running in Azure
OS Ubuntu 18.04.5 LTS (workstation and all cluster VMs)