prometheus_sd_configs_failed_total is not actionable, gauge would be better #4890

Open
juliusv opened this Issue Nov 21, 2018 · 5 comments

juliusv commented Nov 21, 2018

The metric prometheus_sd_configs_failed_total, defined as

    failedConfigs = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "prometheus_sd_configs_failed_total",
            Help: "Total number of service discovery configurations that failed to load.",
        },
    )

currently tracks the total number of SD configurations that could not be applied during a configuration (re)load. However, the counter only increments when a reload happens, so its rate depends heavily on the configuration reload rate: a bad SD config might stick around forever without causing any further counter increments. This makes the metric not very useful for alerting. We should introduce a gauge metric that tells us how many SD configurations are currently invalid, i.e. could not be instantiated.
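
A minimal Go sketch of the proposed gauge semantics, not the actual Prometheus implementation: on every reload, the number of SD configs that failed is written with Set, so the value persists between reloads instead of depending on how often reloads happen. The names gaugeSetter, memGauge, build, and reload are hypothetical. memGauge is a stand-in for a real gauge (prometheus.Gauge in client_golang also exposes Set(float64)) so that the example runs without dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// gaugeSetter is the only gauge behaviour this sketch relies on.
type gaugeSetter interface {
	Set(float64)
}

// memGauge is a hypothetical in-memory stand-in for a real gauge.
type memGauge struct{ value float64 }

func (g *memGauge) Set(v float64) { g.value = v }

// build is a hypothetical stand-in for instantiating one SD config.
// Here an empty config string is treated as invalid.
func build(cfg string) error {
	if cfg == "" {
		return errors.New("empty SD config")
	}
	return nil
}

// reload tries to instantiate every SD config and records how many
// failed in this reload. The gauge is overwritten, not incremented,
// so it always reflects the most recently applied configuration.
func reload(cfgs []string, failed gaugeSetter) {
	n := 0
	for _, c := range cfgs {
		if err := build(c); err != nil {
			n++
		}
	}
	failed.Set(float64(n))
}

func main() {
	g := &memGauge{}

	// One of the two configs is broken.
	reload([]string{"dns", ""}, g)
	fmt.Println(g.value)

	// The broken config has been fixed, so the gauge drops back to zero.
	reload([]string{"dns", "consul"}, g)
	fmt.Println(g.value)
}
```

With this shape, a non-zero value means that at least one SD config is currently broken, regardless of when the last reload took place, which is the property an alert needs.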

infoverload commented Nov 21, 2018

I would like to work on this.

geekodour commented Dec 6, 2018

@infoverload, are you working on this?

infoverload commented Dec 6, 2018

@geekodour hey, yes I am!

palash25 commented Feb 17, 2019

@juliusv I don't see any PR referenced on this issue by the original claimant. Would it be okay if I take a crack at this?

juliusv commented Feb 17, 2019

@palash25 Yes, please go ahead. I haven't heard anything from @infoverload about this in a while either.
