Prometheus poller not handling remote storage failures elegantly #4159
Comments
sevagh changed the title from "Prometheus poller not handling remote_write/remote_read adapter outages" to "Prometheus poller not handling remote storage failures elegantly" on May 11, 2018
Currently investigating if it's related to #3941 (
This may also be relevant: #2972
I tested 2.2.1 and saw the same behavior: if the adapter goes bad, Prometheus goes downhill too. The number of shards increases and the remote queue climbs high.
/cc @tomwilkie
vishksaj commented May 29, 2018
+1
Will be fixed by #4187
tomwilkie closed this on May 29, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
sevagh commented May 11, 2018 (edited)
Bug Report
What did you do?
I tried to insert Prometheus metrics into Elasticsearch using a remote storage adapter:
Unfortunately the adapter is buggy, and when it crashes, Prometheus outputs these errors:
At this point, I try to disable the remote storage config and send a HUP to Prometheus to reload the config without the remote_configs. However, it stays stuck in a bad state until SIGKILL. systemctl stop prometheus2, systemctl restart prometheus2: nothing helps at this point except a SIGKILL.
What did you expect to see?
I expected Prometheus to disregard the broken remote storage adapter and continue being responsive to HUPs, TERMs, etc.
What did you see instead? Under which circumstances?
Environment
Linux 4.9.0-4-amd64 x86_64
So obviously I need to ensure that my adapter is working correctly. But every time I test it, I have to SIGKILL Prometheus. Is that expected, desirable, normal?