Prometheus is restarting again and again #5016
I suspect that the Prometheus container gets OOM-killed by the system. Please check whether there's something about this in the Kubernetes logs. Also, what are the memory limits of the pod?
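To compare the pod's memory limit (the `resources.limits.memory` field of the container spec) with observed usage, it helps to convert the Kubernetes quantity string to bytes. A minimal sketch, assuming the limit uses one of the common suffixes; the helper name `parse_memory_quantity` is hypothetical, and it does not handle rarer forms such as exponent notation.

```python
# Binary (power-of-two) and decimal (power-of-ten) suffixes that Kubernetes
# accepts for memory quantities. Only this common subset is handled here.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory_quantity(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '2G' to bytes (hypothetical helper)."""
    q = quantity.strip()
    # Try two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            number = q[: -len(suffix)]
            return int(float(number) * _SUFFIXES[suffix])
    # No suffix: the value is already expressed in bytes.
    return int(float(q))
```

For example, a limit of `512Mi` is 536,870,912 bytes, while `1G` is exactly 1,000,000,000 bytes, so the two suffix families are not interchangeable.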
simonpasquier added the kind/more-info-needed label on Dec 19, 2018
resource limit
@simonpasquier I've looked at the kubelet log and can't see any problem there.
Can you get any information from Kubernetes about whether it killed the pod or the application crashed? Maybe look at the events.
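One way to tell a Kubernetes kill from an application crash is the container's previous termination record, which the Pod status API exposes under `status.containerStatuses[].lastState.terminated`. A minimal sketch, assuming the input is the JSON printed by `kubectl get pod <pod-name> -o json`; the function names `last_terminations` and `explain` are hypothetical.

```python
import json


def last_terminations(pod_json: str) -> list[tuple[str, str, int, int]]:
    """Return (container, reason, exit_code, restart_count) for every container
    that has a recorded previous termination."""
    pod = json.loads(pod_json)
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue  # this container has not been restarted yet
        results.append((
            status["name"],
            terminated.get("reason", ""),
            terminated.get("exitCode", -1),
            status.get("restartCount", 0),
        ))
    return results


def explain(reason: str, exit_code: int) -> str:
    """Rough, non-exhaustive interpretation of a termination record."""
    if reason == "OOMKilled":
        return "killed by the OOM killer (memory limit or node memory exhausted)"
    if exit_code == 137:
        return "received SIGKILL (128 + 9), possibly from outside the application"
    if exit_code != 0:
        return "the application exited with an error; check its logs"
    return "exited normally"
```

If the reason is `OOMKilled`, raising the memory limit or reducing the number of scraped series is the usual next step; otherwise the previous container's output (`kubectl logs <pod-name> --previous`) is likely more informative.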
I had the same issue before: the Prometheus server restarted again and again. I deleted a WAL file and then it was back to normal.
@aixeshunter where can I find the WAL file?
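In Prometheus 2.x the write-ahead log lives in a `wal/` subdirectory of the TSDB data directory set by `--storage.tsdb.path`; in a container, that directory is wherever the pod mounts its storage volume (check the pod spec rather than assuming a path). A read-only sketch to inspect the segments, assuming segment files have all-digit, zero-padded names; the function name `list_wal_segments` is hypothetical. Deleting segments discards recent samples that have not yet been compacted into blocks, so stop Prometheus and take a backup first.

```python
from pathlib import Path


def list_wal_segments(data_dir: str) -> list[tuple[str, int]]:
    """List (segment name, size in bytes) under <data_dir>/wal, oldest first.

    Only files whose names are all digits are treated as segments; checkpoint
    directories and anything else are skipped. Nothing is modified.
    """
    wal_dir = Path(data_dir) / "wal"
    if not wal_dir.is_dir():
        return []
    segments = [
        (p.name, p.stat().st_size)
        for p in wal_dir.iterdir()
        if p.is_file() and p.name.isdigit()
    ]
    # Segment names are zero-padded, so a plain string sort is chronological.
    return sorted(segments)
```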
@simonpasquier, from the logs, I think the Prometheus pod is looking for prometheus.conf to be loaded, but when it can't load the config file it restarts. The pod was still there, but it restarts the
inyee786 changed the title from "prmetheus is restating again and again" to "Prometheus is restating again and again" on Dec 23, 2018
@simonpasquier, the Prometheus container restarted after the log below:
zrbcool commented on Dec 26, 2018
We also have the same issue with version prometheus:v2.6.0.
Zabbix shows a lot of memory being used before the crash (the timezone is +8, China time zone).
zrbcool commented on Dec 26, 2018
Also pasting the dmesg message from when the OOM happened:
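When the kernel OOM killer fires, dmesg usually contains a summary line naming the killed process and its resident memory. A minimal sketch that extracts those lines, assuming the `Killed process <pid> (<name>) ... anon-rss:<n>kB` wording; the exact message differs between kernel versions, so the regular expression may need adjusting. The function name `find_oom_kills` is hypothetical.

```python
import re

# Matches OOM killer summary lines such as
# "Killed process 1234 (prometheus) total-vm:123kB, anon-rss:456kB, ..."
# (assumed format; wording varies across kernel versions).
_OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\).*?anon-rss:(?P<rss>\d+)kB"
)


def find_oom_kills(dmesg_text: str) -> list[dict]:
    """Return one dict per OOM kill found in the given dmesg output."""
    kills = []
    for line in dmesg_text.splitlines():
        match = _OOM_RE.search(line)
        if match:
            kills.append({
                "pid": int(match["pid"]),
                "process": match["comm"],
                "anon_rss_kb": int(match["rss"]),
            })
    return kills
```

A `prometheus` entry with a large `anon_rss_kb` value confirms that the process itself was holding the memory when it was killed.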
@zrbcool how many workloads/applications are you running in the cluster? Did you add a node selector for the Prometheus deployment?
Kubelet log while starting Prometheus:
Kubelet log at the time Prometheus stopped:
@aixeshunter did you create a Docker image of Prometheus without a WAL file?
The memory requirements depend mostly on the number of scraped time series (check the …). @inyee786 you could increase the memory limits of the Prometheus pod. @zrbcool IIUC you're not running Prometheus with cgroup limits, so you'll have to increase the amount of RAM or reduce the number of scrape targets.
@simonpasquier
A rough estimate is that you need at least 8 kB per time series in the head (check the …
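The rule of thumb above turns into simple arithmetic: multiply the number of series in the head by about 8 kB and compare the result with the pod's memory limit. A minimal sketch, assuming "kB" is read as 1024 bytes (pass 8000 instead if you prefer decimal units) and that you already know the head series count; the function names are hypothetical, and the result is a floor, not a sizing guarantee.

```python
# Rule of thumb from this thread: at least ~8 kB per time series in the head.
BYTES_PER_HEAD_SERIES = 8 * 1024  # assumption: kB interpreted as KiB


def min_head_memory_bytes(head_series: int,
                          bytes_per_series: int = BYTES_PER_HEAD_SERIES) -> int:
    """Lower-bound estimate of the memory needed for the head block."""
    return head_series * bytes_per_series


def fits_limit(head_series: int, limit_bytes: int) -> bool:
    """True if the rough lower bound stays within the given memory limit."""
    return min_head_memory_bytes(head_series) <= limit_bytes
```

By this estimate, 1,000,000 head series need at least 8,192,000,000 bytes (about 8.2 GB), so a pod limited to 2 GiB would be expected to be OOM-killed well before reaching that many series.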
inyee786 changed the title from "Prometheus is restating again and again" to "Prometheus is restarting again and again" on Jan 9, 2019
@simonpasquier I also tried version 2.0.0 and it is working.
@inyee786 can you increase the memory limits and see if it helps? Getting the logs from the crashed pod would also be useful.
dcvtruong commented on Mar 11, 2019
Hi @simonpasquier, I'm also getting this error in prometheus-server (v2.6.1 + k8s 1.13). I've increased the RAM but prometheus-server never recovers. Is there a remedy or workaround?
nickychow commented on Mar 17, 2019
I have exactly the same issue. The prometheus-server is running on worker nodes with 16 GB of RAM, without resource limits. The kubelet logs look totally normal. Here are the prometheus-server logs:
@dcvtruong @nickychow your issues don't seem to be related to the original one. The data on disk seems to be corrupted somehow, and you'll have to delete the data directory. Also make sure that you're running the latest stable version of Prometheus, as recent versions include many stability improvements.
vtomasr5 commented on Mar 21, 2019
Hi, we have the same problem. We increased the memory but it didn't solve the problem. We use Consul to autodiscover the services that expose metrics. In our case, we've discovered that the Consul queries used to check which services to scrape take too long and reach the timeout limit.
As you can see, the … Running some … Is there any configuration that we can tune or change to improve service checking with Consul? EDIT: We use Prometheus 2.7.1 and Consul 1.4.3. Thanks.


inyee786 commented on Dec 19, 2018
Proposal
Use case. Why is this important?
Using Prometheus with an OpenEBS volume, it works fine for 1 to 3 hours, but after some time Prometheus starts restarting again and again and the config file can't be loaded.
“Nice to have” is not a good use case. :)
Bug Report
What did you do?
What did you expect to see?
It should not keep restarting.
What did you see instead? Under which circumstances?
Environment
GKE, 8-node cluster
System information:
Linux 4.15.0-1017-gcp x86_64 (output of uname -srm)
Prometheus version:
prom/prometheus:v2.6.0 (output of prometheus --version)
Prometheus configuration file: