
PROPOSAL: Allow scrape_timeout and scrape_interval to be configured via relabeling (i.e. based on kubernetes_sd annotations) #4561

Open
frittentheke opened this Issue Aug 29, 2018 · 4 comments

frittentheke commented Aug 29, 2018

Proposal

Prometheus allows configuring a global scrape_timeout value, but also allows setting an override for each job - https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
This makes it possible to selectively set higher timeouts, e.g. if there is a rather slow target somewhere. Also, running with different scrape_intervals makes sense and is easily configurable.
While it's possible to apply a tailored timeout configuration just for the targets that need special treatment, this changes when using service discovery, e.g. kubernetes_sd, to configure the jobs. There the scrape targets and their specifics are not all read from the Prometheus config file, but learned dynamically. Behind each job there could be thousands of targets to scrape, rendering any per-target "tailoring" of the timeout useless, or rather too broad, since it affects all targets of the job instead of the single target that usually sits behind a statically configured scrape job. With xyz_sd the jobs are now massive multiplexers.
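For illustration, a rough sketch of the existing per-job override (job names, targets and values are made up):

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s        # default for every job

scrape_configs:
  - job_name: 'normal-exporters'
    static_configs:
      - targets: ['exporter-a:9100']

  - job_name: 'slow-exporter'
    scrape_interval: 60s
    scrape_timeout: 60s      # per-job override, applies to all targets of this job
    static_configs:
      - targets: ['exporter-b:9100']
```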

Relabeling already allows overriding a number of values, such as the path and the scheme, and I also know there has been a previous discussion about exposing more scrape_config variables to relabeling in #1176.
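For example, a common kubernetes_sd relabeling sketch that overrides path and scheme per target (the prometheus.io/path and prometheus.io/scheme annotation names are only a widespread convention, not built into Prometheus):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # take the metrics path from the pod annotation, if set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        regex: (.+)
        target_label: __metrics_path__
      # switch to https when the pod asks for it
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        regex: (https?)
        target_label: __scheme__
```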

In that very discussion, @brian-brazil labeled the ability to dynamically configure scrape_timeout and scrape_interval a feature request (#1176 (comment)).

This is the very feature I would now like to propose.

I know it's quite possible to create two or more sets of each job, with boolean labels turning one or the other on for each target based on some annotation, but this really causes the configuration to become messy, especially once you want different scrape_intervals as well as, independently, different scrape_timeouts. Additionally, with service discovery you want the source (in this case the application developers / users of the Kubernetes cluster) to define if and how exactly their application should be scraped. Certainly when it comes to sensitive information such as authentication, this poses a limit on what Prometheus can and should take in via this mechanism.
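To sketch that workaround (the example.com/slow-scrape annotation name is hypothetical): the job is duplicated, each copy gets its own timeout, and drop/keep relabel rules make sure every pod ends up in exactly one of the two jobs.

```yaml
scrape_configs:
  - job_name: 'pods-default-timeout'
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # skip pods that opted into the slow job below
      - source_labels: [__meta_kubernetes_pod_annotation_example_com_slow_scrape]
        regex: 'true'
        action: drop

  - job_name: 'pods-long-timeout'
    scrape_interval: 60s
    scrape_timeout: 60s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # only keep pods annotated with example.com/slow-scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_example_com_slow_scrape]
        regex: 'true'
        action: keep
```

And this has to be multiplied again for every combination of scrape_interval and scrape_timeout one wants to offer, which is exactly what makes the configuration messy.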

But as for this very proposal, I suggest allowing per-target configuration of settings which are already available "one layer up", on the job level.

(Maybe adding lower and upper bounds makes sense, but this could also simply be done with the already existing regex matching on the annotation's value, by simply ignoring too high or too low values.)

brian-brazil commented Aug 29, 2018

Also running with different scrape_intervals makes sense and is easily configurable.

Can you clarify what exactly you're asking for here? Multiple scrape intervals per Prometheus is discouraged for sanity, and is thus not something we're going to make easier.

rather too broad.

What's the problem with the timeout being too high?

Additionally with service discovery you want the source (in this case the application developers / users of the Kubernetes cluster) to define if and how exactly their application should be scraped.

As was discussed in the issue you linked, there are limits to what we're going to accept in Prometheus - at some point needs are sufficiently complex that you need a proper configuration management approach. Particularly if you're doing something innately complicated like allowing users to select their own scrape intervals.

Maybe adding lower and upper bounds makes sense, but this could also simply be done with the already existing regex matching on the annotations value by simply ignoring too high or too low values.

This is getting fairly complicated, you seem to be asking relabelling to gain the ability to do arithmetic.

In that very discussion, @brian-brazil labeled the ability to dynamically configure scrape_timeout and scrape_interval a feature request (#1176 (comment))

That's bug triage; I wouldn't read much into it. Any issue requesting new functionality is a feature request; it doesn't mean it's something we're actually going to implement.

Overall it sounds like your use case is quite complicated, to the point where it's not going to be sane to add all the features you require to Prometheus itself. I'd suggest taking a configuration management approach, and also considering whether you can make your setup simpler.

@frittentheke frittentheke changed the title Allow scrape_timeout to be configured via relabeling (i.e. based on kubernetes_sd annotations) PROPOSAL: Allow scrape_timeout and scrape_interval to be configured via relabeling (i.e. based on kubernetes_sd annotations) Aug 29, 2018

frittentheke commented Sep 7, 2018

Also running with different scrape_intervals makes sense and is easily configurable.

Can you clarify what exactly you're asking for here? Multiple scrape intervals per Prometheus is discouraged for sanity, and is thus not something we're going to make easier

I initially just ran into the issue of needing different timeouts for some very slowly responding exporters.
Raising the scrape interval for just those was only a second idea, but I guess you are right that this makes things A LOT more complex (having different sampling / scraping rates).

rather too broad.

What's the problem with the timeout being too high?

My thought here was that a timeout should help overall system stability and performance. A per-target timeout would also allow recognizing non-responding exporters more quickly, by not applying the huge (but potentially necessary) timeout required for one special exporter to all the others as well.

Maybe adding lower and upper bounds makes sense, but this could also simply be done with the already existing regex matching on the annotations value by simply ignoring too high or too low values.

This is getting fairly complicated, you seem to be asking relabelling to gain the ability to do arithmetic.

No, as I stated regarding the regex filtering, limiting which values of a label should be allowed could be done using the existing regex feature. My thought was less about the implementation and more about the use case of not allowing every (technically) possible value for timeouts (or, if even wanted, scrape_interval).

Overall it sounds like your use case is quite complicated, to the point where it's not going to be sane to add all the features you require to Prometheus itself. I'd suggest taking a configuration management approach, and also considering whether you can make your setup simpler.

Maybe, giving things a second thought, having two scrape configs in Prometheus, each with its own timeout value (and maybe also scrape_interval), and then simply allowing targets to "select" either one via labels is the way to go here.

souvikdas95 commented Feb 12, 2019

Maybe, giving things a second thought, having two scrape configs in Prometheus, each with its own timeout value (and maybe also scrape_interval), and then simply allowing targets to "select" either one via labels is the way to go here.

I'd like to know: what is the default scrape_interval for targets discovered via SD? Is it the global scrape_interval or the one specified inside the SD job? E.g. if kubernetes_sd_config is used for scraping pods with a scrape_interval of 30s and a global one of 15s, which scrape_interval will the discovered target pods inherit?

simonpasquier commented Feb 13, 2019

@souvikdas95 Prometheus uses the scrape_interval from the scrape configuration; if it is undefined there, it uses the scrape_interval value from global.
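For example (values made up), with

```yaml
global:
  scrape_interval: 15s          # fallback for jobs that don't set their own

scrape_configs:
  - job_name: 'kubernetes-pods'
    scrape_interval: 30s        # wins for every target discovered by this job
    kubernetes_sd_configs:
      - role: pod
```

the discovered pods are scraped every 30s, while a job without its own scrape_interval would fall back to the 15s global default.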
