Kubernetes SD consuming too much/leaking memory #2685
Comments
We are having similar reports in the operator repo and told people to just allocate more resources. But as we have now experienced ourselves, there must be something else going wrong: coreos/prometheus-operator#326
I think the urgency score behaves reasonably, given the circumstances, which are that this Prometheus server is using too much RAM. So urgency goes up to 1 and ingestion stops, all to reduce memory usage and prevent OOMing. The urgency calculation in 1.6 works completely differently from 1.5's: the latter does not incorporate heap size at all, so it will happily OOM without a dent in the urgency score.

What we have to look for is why so much RAM is used even with this modest load. Could be memory-hungry queries, memory-hungry LevelDB, … First step is to look at process_resident_memory_bytes and at max_over_time and min_over_time of go_memstats_alloc_bytes. Then compare this to prometheus_local_storage_open_head_chunks, prometheus_local_storage_chunks_to_persist, and prometheus_local_storage_memory_chunks. If you post both plots here, I'll help interpret them.

It could also be a "Prometheus cannot count that low" issue. The container manifest has only a 1GiB target heap size. Perhaps even with the modest load, there is so much baseline memory usage that 1GiB of heap (and 1.5GiB or even 2GiB total RAM) is not enough.

Having said that, we are running a couple of 1GiB target heap size Prometheis here at SC. They seem to do just fine. The two smallest are configured at 1.08GiB target heap size. They show quite different patterns: in general, even lighter load than in your case. Not sure if that explains it already. The difference might also be in the queries (incl. rule evaluation), relabeling, a larger LevelDB size because of longer/more labels, whatever… As said, the memory plots described above would be helpful to look at.
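(A quick sketch of pulling those numbers via the HTTP API; the server address and the 15m window are placeholders, and pasting the same expressions into the graph UI gives the plots over time.)

```sh
# Placeholder address; point it at the affected Prometheus server.
PROM=http://localhost:9090

# Current values of the metrics mentioned above; graph the same expressions
# in the web UI (/graph) to see them over time.
for q in \
  'process_resident_memory_bytes' \
  'max_over_time(go_memstats_alloc_bytes[15m])' \
  'min_over_time(go_memstats_alloc_bytes[15m])' \
  'prometheus_local_storage_open_head_chunks' \
  'prometheus_local_storage_chunks_to_persist' \
  'prometheus_local_storage_memory_chunks'
do
  curl -sG "$PROM/api/v1/query" --data-urlencode "query=$q"
  echo
done
```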
I found another one that gets a bit closer to your load: target heap size 1.3GiB, peak heap size just below that as expected, peak RSS 1.55GiB, 750k memory chunks, 60k series in memory, 60 targets, 1.4k samples/s. This one is doing just fine, too. All examples are EC2 instances, non-containerized…
First result (a byproduct): the prometheus_local_storage_open_head_chunks metric has a bug and goes negative... I guess that happens if a chunk is closed due to a time-out. Perhaps we decrement it twice or something. Will look into it... see #2687
Preliminary analysis: I have to guess a bit which one is 1.5 and which one is 1.6 (not visible in the legend), and I also have to guess when the OOMs happen, but I'm pretty sure I guessed right.

So: 1.5 just goes up and up in RAM usage until it OOMs. This is correlated with an increase in memory chunks, but RAM increases way more than can be justified by the memory chunks. I.e. you max out at about 125k memory chunks while using 2.0GiB RSS (heap ranging between 1.0GiB and 1.8GiB). That's a 14x overhead in terms of RSS, which is quite a lot. (I saw an overhead like that only with an extremely large index due to an excessive number of labels and length of label names and values, but not necessarily high series cardinality.) This is really weird; something must be going horribly wrong here. First conclusion: this is a problem that exists with 1.5 as well. I don't know what is happening, but the different behavior of 1.6 can be explained by its different strategy for dealing with this problem. See the next paragraph.

So what does 1.6 do? RAM is going up and up and up, so when it gets to the limits, 1.6 evicts all the chunks it can (you see the memory chunk count dropping). This helps a bit in terms of memory usage; you see the heap size drop. But it only makes a little dent. Memory usage still increases after the drop. Since there are no more chunks to evict, 1.6 simply stops ingestion completely. The number of memory chunks eventually drops to 0. However, the heap still grows!!! No ingestion, no chunks in memory, but still something takes more and more heap space. Eventually, even 1.6 OOMs, and everything starts from the beginning.

You have to find out what takes so much heap even when no ingestion is happening anymore and all memory chunks have been evicted. 1.6 makes that easy, as it actually does the above, while in 1.5 you still have the "noise" of ingestion and memory chunks. If you take a heap profile of a 1.6 server at the time it has practically no memory chunks anymore, you should clearly see where the heap is consumed.
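(For reference, a sketch of grabbing such a heap profile; the address is an assumption, and Prometheus serves the standard Go pprof endpoints under /debug/pprof.)

```sh
# Assumed address; adjust to wherever the 1.6 server is reachable.
go tool pprof http://localhost:9090/debug/pprof/heap
# At the interactive (pprof) prompt, `top` lists the biggest heap consumers
# and `svg` renders a call graph (requires graphviz).
```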
FTR
Thanks for the thorough analysis, @beorn7. We'll look into more detailed heap profiles then.
fabxc changed the title from "Prometheus 1.6 runs unstable" to "Kubernetes SD consuming too much/leaking memory" on May 10, 2017
We also ran a 2.0 server and it runs into the same issues. A mem profile there was rather interesting, because there's almost nothing to see aside from k8s SD. So there's obviously a problem with the SD: we are receiving about 15 updates/sec on service endpoints across 8 endpoint configs. That alone doesn't seem to justify the k8s client decoders holding over 800MB of memory, though. So while we are certainly duplicating subscriptions across SD configs, that alone shouldn't really be responsible for what we are seeing here. I renamed the issue as it's obviously a different problem.
This seems to mostly occur when monitoring endpoints of k8s components (scheduler, controller manager, ...), which receive an update basically every second without any actual changes in the endpoint list. What's getting updated (every second or less) is an annotation on those control-plane Endpoints objects. The current theory is that those updates trigger us to rebuild the targets from those endpoints, which should only hit our caches. But maybe the caches hold the raw data, every cache access triggers a decoding, and it's just that decoding is incredibly memory hungry. That would explain why we get endpoint events but the memory usage shows up in pods.
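(One way to observe those updates, as a sketch: the Endpoints object name and namespace are assumptions and depend on the cluster; on many setups only the leader-election annotation changes from one event to the next.)

```sh
# Watch the scheduler's Endpoints object; expect a new event roughly every
# second even though the address list itself does not change.
kubectl -n kube-system get endpoints kube-scheduler -o yaml --watch
```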
f0 commented May 18, 2017
I can confirm: with 1.6.2 the problem is gone.
Great, thanks!
fabxc closed this on May 18, 2017
ichekrygin commented Jun 15, 2017
Hello, I think I see the "memory leak" issue. Configuration:

Please LMK if you want me to open a separate issue, or if you need more info. Thank you!
@ichekrygin can you run
ichekrygin commented Jun 16, 2017
That memory goes to the storage, so it's not k8s SD related. From which version did you upgrade?
ichekrygin commented Jun 16, 2017
I see, so it could be that we are just packing too many time series into it. We upgraded from
@ichekrygin You need to check your flag values. The forward compatibility is just educated guesswork, i.e. Prometheus tries to guess the new settings from your configured values of the old flags.

Further reading: https://prometheus.io/docs/operating/storage/

If you still have questions, please mail the prometheus-users mailing list, as this doesn't seem to have anything to do with the K8s issue this GitHub issue is about.
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
fabxc commented May 8, 2017
We added support for 1.6 to the Prometheus operator and attempted switching to it.
This turned out to be very problematic. At a target heap size of 1.5GB we have a memory limit of 2GB. About 20 (stable) targets are monitored at a load of 2k samples/sec and ~30k memory series.
Prometheus 1.6 consistently grows memory to 2GB and gets OOM killed. The urgency score is very unstable and randomly shoots up to 1, sometimes staying there for hours without any apparent reason (again, the environment is very stable), and as a consequence it stops ingesting any data.
The bottom line in the graph is a 1.5 server for comparison. Not sure whether this relates to GOGC=40 or some other changes that were made. But practically speaking, this version seems completely unusable for some reason. As we haven't gotten significant other reports yet, I wonder whether it's something we are doing wrong, but given the environment, I have no idea what that could possibly be.
What supports that is that 1.5 also gets OOM killed way more often than I'd expect at that load – but it recovers more or less immediately and doesn't block ingestion for hours. All memory issues of <=1.5 aside, even then >2GB seems excessive for that load.
This is the 1.6 pod manifest in case it shows some misconfiguration: https://gist.github.com/brancz/ae4f6e6767999718a09d83d52c882c49
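(For readers without access to the gist, the relevant settings boil down to roughly the sketch below; the values are taken from the description above, not copied from the manifest, and the config path is a placeholder.)

```sh
# Rough sketch only, not the actual manifest: a 1.6 server started with a
# ~1.5GB target heap size, inside a container with a 2GB memory limit.
prometheus \
  -config.file=/etc/prometheus/prometheus.yml \
  -storage.local.target-heap-size=1500000000
```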
@beorn7