prometheus memory leak #4372
Comments
maurorappa commented on Jul 12, 2018 (edited):

A bit generic, eh? Leaving it to the more expert people now :)
@maurorappa thanks! @Fadih yeah, we need more info here. Explaining why you think there is a memory leak would be a good start. The 2.3.2 release #4370 will include a debug command for promtool, so you can use it to attach the debug info.
tonobo commented on Jul 16, 2018:

@maurorappa We're seeing the same behavior, and we've discovered the query that triggers the exception.
The requested memory stats are below:
When you say a memory leak, does that mean the memory usage keeps growing to the point where Prometheus gets OOM-killed? Gathering some snapshots from Grafana would also be useful.
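For context, a minimal sketch (not from the thread) of checking the server's own memory usage over its HTTP API; it assumes Prometheus listens on localhost:9090 and scrapes itself under the job name `prometheus`:

```sh
# Query the server's resident memory via the HTTP API.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=process_resident_memory_bytes{job="prometheus"}'
```

Graphing this metric over a day or two makes it easy to tell steady growth (a possible leak) apart from a plateau (a normal working set).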
The promtool debug command is included in the latest 2.3.2 release.
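As a hedged sketch of how the command is typically invoked (exact subcommands and flags may differ between versions; localhost:9090 is an assumption):

```sh
# Collect pprof profiles and runtime metrics from the server into a
# single archive that can be attached to the issue.
promtool debug all http://localhost:9090/
```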
tonobo commented on Jul 16, 2018 (edited):
@tonobo I think chances are your issue is not related to the original report, so could you please open a new one? Please be as specific as possible with the steps and config needed to replicate it, and attach the file produced by the promtool debug command. Everyone here wants to help, but we can't determine whether the problem is with your config, the local setup, or an actual bug if we don't have enough information.
tonobo commented on Jul 17, 2018:

I'll open a new issue. I'm unable to find the debug option; the promtool from the latest release doesn't support it.
Yes, sorry, because of the moratorium this didn't go into the last release. I just merged it, so you can build promtool from source or use the binary attached here, which I have just built for Linux.
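For anyone following along, a rough sketch of building promtool from source, assuming a working Go toolchain; the commands are illustrative rather than taken from the thread:

```sh
# Clone the repository and build just the promtool binary.
git clone https://github.com/prometheus/prometheus.git
cd prometheus
go build ./cmd/promtool   # produces ./promtool in the current directory
./promtool --help         # verify the debug subcommand is present
```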
tonobo commented on Jul 19, 2018:

Great, thank you! Please see the attached debug output.
Thanks,
simonpasquier added the kind/more-info-needed label on Jul 23, 2018.
@tonobo I suspect that you've got too many time series for Prometheus to handle; your report suggests as much. @Fadih, if you still have the issue, please attach the output of promtool debug.
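A quick, hedged way to check the active series count (again assuming localhost:9090); `prometheus_tsdb_head_series` is the server's own gauge of series currently in the TSDB head:

```sh
# How many series does the head block currently hold?
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=prometheus_tsdb_head_series'
```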
tonobo commented on Jul 23, 2018:

@simonpasquier This sounds legit, but in my opinion it worked noticeably better before upgrading Prometheus.
@tonobo you're hitting the limits of Prometheus, so maybe your server was ingesting slightly fewer time series before, or your query load has increased. In any case, this doesn't look like a memory leak.
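If cardinality is the suspect, one illustrative (and fairly expensive) query ranks metric names by series count; run it sparingly on a loaded server:

```sh
# Top 10 metric names by number of series.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=topk(10, count by (__name__) ({__name__=~".+"}))'
```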
tonobo commented on Jul 24, 2018:

OK, thank you for the clarification.
Hi, I found what caused the memory leak.
@Fadih thanks for the follow-up. I'm closing the issue then.
simonpasquier closed this on Jul 24, 2018.
lock bot commented on Mar 22, 2019:

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.


Fadih (original issue author) commented on Jul 12, 2018 (edited):
I get a memory leak when I run my Prometheus for 5 hours. I noticed that deleting all the files under /wall/* resolves the issue for a while.
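As an aside (not part of the original report), before deleting anything it is worth checking how large the write-ahead log actually is; the storage path below is an assumption, since the report only mentions /wall/*:

```sh
# Inspect the size and contents of the WAL directory (path is assumed).
du -sh /prometheus/data/wal
ls -lh /prometheus/data/wal | head
```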
Environment: production
System: Linux 4.9.81-35.56.amzn1.x86_64 x86_64
Prometheus version: 2.3.0
Alertmanager version: 0.14.0
Prometheus configuration file:
```yaml
global:
  scrape_interval: 30s # By default, scrape targets every 15 seconds.

rule_files:
  - /etc/prometheus/itrs-alerts.yml
  - /etc/prometheus/itms-alerts.yml
  - /etc/prometheus/common-alerts.yml
  - /etc/prometheus/recording-rules.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:

  - job_name: 'ecs-cluster'
    static_configs:
    metrics_path: /ecs-exporter-metrics
    scheme: http

  - job_name: 'prometheus-ec2-instance-devtools'
    ec2_sd_configs:
      - access_key: $AWS_ACCESS_KEY_ID
        secret_key: $AWS_SECRET_ACCESS_KEY
        port: 9100
    relabel_configs:
      - regex: devtools-prometheus-monitoring
        action: keep
    metrics_path: /metrics
    scheme: http

  - job_name: 'ecs-instances-tagger-devtools'
    ec2_sd_configs:
      - access_key: $AWS_ACCESS_KEY_ID
        secret_key: $AWS_SECRET_ACCESS_KEY
        port: 1234
    relabel_configs:
      - regex: devtools-prometheus-monitoring
        action: keep
    metrics_path: /metrics
    scheme: http

  - job_name: 'instrumentedTest-server-node'
    ec2_sd_configs:
      - access_key: $AWS_ACCESS_KEY_ID
        secret_key: $AWS_SECRET_ACCESS_KEY
        port: 9100
    relabel_configs:
      - regex: espresso-server
        action: keep
      - regex: .*
        action: keep
      - target_label: project
      - target_label: name
      - target_label: instance_id
    metrics_path: /metrics
    scheme: http
```
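As a hedged sanity check, a configuration like the one above can be validated with promtool before restarting the server; the file path is an assumption based on the rule_files entries:

```sh
# Validate the configuration and the referenced rule files.
promtool check config /etc/prometheus/prometheus.yml
```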