OOM: killed process (prometheus), is there a memory leak? #1549
guanglinlv commented Apr 12, 2016
Hi, I have a single Prometheus server that scrapes about 50+ targets. It OOMs after running for several hours, and I'm confused about why.
Thanks.
That Prometheus should only be using ~3GB of RAM, but it looks like it'll top out at ~70GB. Do you happen to have over 20M timeseries? If so you need a bigger box and to increase -storage.local.memory-chunks.
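For illustration only, a hypothetical invocation with the chunk limit raised; the value shown is an assumption, and the usual rule of thumb is roughly three chunks per series held in memory:

```sh
# Hypothetical example: run Prometheus 0.x/1.x with a larger in-memory chunk limit.
# The value 4194304 is illustrative; size it to roughly 3x your active series count.
prometheus -config.file=prometheus.yml -storage.local.memory-chunks=4194304
```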
@brian-brazil thanks for your quick reply. What do you mean by 20M timeseries? May I ask for more description of how they relate? BTW, node and cadvisor are the only exporters running there. Thanks a lot.
What's the value of
37k timeseries, there's something very wrong here. Can you try the latest Prometheus, and get us a pprof memory profile?
OK, is the latest https://github.com/prometheus/prometheus/releases/tag/0.18.0rc1? And how do I get the pprof memory profile? I'm a newbie at profiling. Thanks.
See https://golang.org/pkg/net/http/pprof/; the heap profile is what we want.
brian-brazil added the bug label on Apr 12, 2016
Example:
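A minimal sketch, assuming Prometheus is listening on localhost:9090 and the Go toolchain is available:

```sh
# Fetch the heap profile from Prometheus's built-in pprof endpoint and render it as an SVG call graph.
go tool pprof -svg http://localhost:9090/debug/pprof/heap > heap.svg
```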
Then send the heap.svg.
grobie referenced this issue on Apr 12, 2016: Create tool to make it easy to share debug information #1551 (closed)
@brian-brazil @juliusv sorry for the late reply. I'm having trouble uploading the pprof file (about 40+ kB) because of my company's proxy restrictions. Please let me know of any debug information or points to investigate until I can get around the proxy limitation. Thanks a lot. Memory is up to 16 GB in 1 hour; here is the top20
Hi, I'm now sure what causes the high memory, but I'm not sure whether it's a problem or not. The cause is that the remote OpenTSDB is down: the memory of the Prometheus process increases endlessly until OOM. If I repair OpenTSDB and start it again, the memory of Prometheus is acceptable, up to 1.9 GB in 4 hours. @brian-brazil @juliusv could you confirm this abnormal case, where the remote is down forever? Thanks.
That doesn't sound like it, as there are timeouts and other limits on that code path. Can you try without opentsdb configured to be sure?
@brian-brazil, yes, I'm validating without opentsdb. BTW, the heap.svg is from when memory was up to 16 GB in 1 hour; you can investigate it.
The heap graph indicates a leak in the remote storage code. Nothing is jumping out at me from the code.
@brian-brazil @juliusv, I did the validation; several points cause the OOM. While the remote OpenTSDB is unreachable, the pending samples grow larger and larger until OOM. Is that possible? Any suggestions? Regards.
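As a rough way to check whether pending samples really do pile up, one could watch the metrics Prometheus exposes about itself while the OpenTSDB endpoint is unreachable; localhost:9090 and the grep pattern are assumptions here, not exact metric names:

```sh
# Rough check (metric names are assumed, inspect the output yourself): see whether
# remote-storage related self-metrics keep climbing while OpenTSDB is down.
watch -n 30 "curl -s http://localhost:9090/metrics | grep -i remote_storage"
```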
brian-brazil removed the question label on Apr 25, 2016
Is Prometheus producing any log messages about remote storage?
@brian-brazil, only i/o timeout messages are printed.
Okay, so we're not dropping samples on the floor in the queue manager. Therefore I suspect the problem is in https://github.com/prometheus/prometheus/blob/master/storage/remote/opentsdb/client.go#L74-L131
fabxc added this to the v1.0.0 milestone on Apr 25, 2016
fabxc added the kind/bug label and removed the bug label on Apr 28, 2016
I just got a report from someone who implemented their own remote-storage writer based on the graphite one; they also see memory issues, and CPU issues as well. Did you see CPU issues? This would hint that the issue isn't just with the opentsdb code.
As #issuecomment-208775153 said, the CPU is normal, only up to 16 percent.
brian-brazil referenced this issue on May 11, 2016: it takes too much time to stop prometheus using SIGTERM #1622 (closed)
#1643 should fix this for you; if not, please let us know.
brian-brazil closed this on May 20, 2016
lock bot commented on Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
