Marathon service discovery successful for 500 response with valid JSON in response body #4090
I raised a pull request to fix this issue: #4091. @beorn7 @fabxc @xperimental: please take a look and let me know if you want any changes before it is merged to master. Thanks for your time and contributions :)
rohit01 added a commit to rohit01/prometheus that referenced this issue Apr 17, 2018
brian-brazil closed this in 30c3e02 Apr 17, 2018
gouthamve added a commit to gouthamve/prometheus that referenced this issue Aug 1, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators Mar 22, 2019
rohit01 commented Apr 16, 2018
Problem statement
We run multiple Prometheus clusters at large scale with Marathon service discovery. At times, Marathon responds with a 500 error but still returns a valid JSON body. Marathon service discovery does not check the HTTP status code and continues to process the response. This results in a job without targets, and services are not monitored until the next API call.
Use case. Why is this important?
We miss metrics intermittently.
Solution
Treat any non-2xx HTTP response as an error. This preserves the existing targets.
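A minimal sketch of the proposed check, in Go. This is not the code from the pull request; `appList`, `parseResponse`, and `fetchApps` are hypothetical names used only for illustration, and the payload struct is a trimmed-down assumption about the Marathon response. The idea is simply to reject the response on status code before the body is decoded, so a 500 with valid JSON never replaces the current targets.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// appList is a hypothetical, trimmed-down model of the Marathon apps payload.
type appList struct {
	Apps []struct {
		ID string `json:"id"`
	} `json:"apps"`
}

// parseResponse rejects any non-2xx status before looking at the body,
// so a 500 that happens to carry valid JSON is reported as an error.
func parseResponse(status int, body []byte) (*appList, error) {
	if status < 200 || status >= 300 {
		return nil, fmt.Errorf("non-2xx status from Marathon: %d", status)
	}
	var apps appList
	if err := json.Unmarshal(body, &apps); err != nil {
		return nil, fmt.Errorf("decoding Marathon response: %w", err)
	}
	return &apps, nil
}

// fetchApps performs the request and hands status and body to parseResponse.
// On error the caller is expected to keep its previous target list.
func fetchApps(client *http.Client, url string) (*appList, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseResponse(resp.StatusCode, body)
}
```

With a check like this, a caller that only updates its targets when `fetchApps` returns a nil error would keep the last known targets across an intermittent 500.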
Bug Report
What did you do?
It happens intermittently on our production systems. I was able to reproduce it with simulated Marathon responses in staging.
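For anyone who wants to try this locally, a simulated response of this kind can be sketched with Go's `net/http/httptest`. This is not the staging setup described above; `newFailingMarathon` and `probe` are hypothetical helpers, and the `/v2/apps` path and the empty `{"apps":[]}` body are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFailingMarathon starts a stand-in server that answers every request
// with HTTP 500 and a syntactically valid JSON body.
func newFailingMarathon() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusInternalServerError)
		fmt.Fprint(w, `{"apps":[]}`)
	}))
}

// probe requests the (assumed) apps endpoint and returns the status code
// and raw body, which is what a client ignoring the status would parse.
func probe(baseURL string) (int, string, error) {
	resp, err := http.Get(baseURL + "/v2/apps")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(body), nil
}
```

A client that decodes the body without looking at the status code would see an empty app list here, which matches the "job without targets" symptom.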
What did you expect to see?
No target loss. No missing metrics.
What did you see instead? Under which circumstances?
A sudden metric drop for all containers at once, which triggered alerts.
Prometheus version:
Prometheus 2.2.1
Alertmanager version:
version 0.11.0
Prometheus configuration file:
A standard config with marathon_sd_config
Logs:
All ok in logs.