Add a scrape_job label to scrape_samples_* metrics #9285
Comments
I do not see a way to do this without a breaking change: I expect users to already have a `scrape_job` label of their own. It would also not benefit most users, who have a single limit or for whom `job` always equals the scrape job. However, what would you think about something similar to #9247 but for `sample_limit`, as an alternative (behind the same feature flag)?
Yes, it could be a breaking change for some users, but I'm not sure what the promise/policy of label stability for internal metrics is; maybe a note in the changelog is enough? That said, I'm happy with a new metric behind a feature flag. I don't really mind how this is surfaced, and it would solve my problem.
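For illustration, a rough sketch of the kind of alert such a metric would enable, assuming a per-target `scrape_sample_limit` metric (the name used later in this thread) exposed behind the feature flag; the rule name and threshold are made up:

```yaml
groups:
  - name: scrape-limits
    rules:
      - alert: TargetNearSampleLimit
        # Fires when a target uses more than 80% of its configured
        # sample_limit. The filter (scrape_sample_limit > 0) skips
        # targets that have no limit configured.
        expr: |
          scrape_samples_post_metric_relabeling
            / (scrape_sample_limit > 0) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} is close to its sample_limit"
```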
In Prometheus 2.x users can use any label; we reserve only the `__` prefix.
It would be behind the same feature flag; that's why it is called `extra-scrape-metrics`.
Do you want to give it a go and add a `scrape_sample_limit` metric? #9247 is in.
Yes, I'll try to work on this today, or next week if not. Thanks.
From a quick reading of the scrape code I see that if I wanted to have a metric with the extra label, it would have to go through the reporting code (scrape.go line 1615 at 31f4108), which calls `scrapeLoop.addReportSample()` (line 1676 at 31f4108). There are two problems here; this really means that adding any metric that is exposed per target has to go through this fixed reporting path.
It is not needed to add `scrape_job`. Your issue is to have the sample limit per target, and you can add a sample-limit metric with the target's labels.
Well, maybe that's indeed enough; let's start with that. One of the problems is that scrape config allows so much flexibility thanks to relabelling that it's sometimes difficult to figure out which scrape config was involved. But I think you're right: if I get a full set of labels, it will be enough. Previously I exposed scrape limits via a custom exporter that queried Prometheus for its runtime config and exported per-job metrics describing the limit. When I discovered that the job label itself isn't always enough, I did try to export limit metrics with all the labels (as the PR I'll raise would do), but realised that I would basically have to reimplement service discovery to get this right: I can parse the relabel config and figure out which labels are forced that way, but I cannot tell which labels each target will have; hence this ticket.
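To make the relabelling mismatch concrete, a minimal hypothetical scrape config where the `job` label is rewritten from a target annotation, so the `job` on stored series never matches any `job_name` in the configuration:

```yaml
scrape_configs:
  - job_name: kubernetes-pods        # what a scrape_job label would report
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Overwrite the job label from the pod's prometheus.io/job
      # annotation; series from these targets carry a job label that
      # exists in no job_name, so it cannot be mapped back to the
      # config block without re-running service discovery.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_job]
        target_label: job
        regex: (.+)
```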
Raised #9295 |
Proposal
Use case. Why is this important?
Context: cardinality is a big issue for large deployments, and it often requires setting scrape limits via the `sample_limit` parameter to avoid accidentally exporting a huge number of samples per target. This works well, but it's pretty binary: either the target exposes fewer samples than the limit and everything works, or the target exposes more than the limit allows and all samples are rejected, so it's a bit of a last-resort tool.
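A minimal sketch of such a config (names and values are illustrative):

```yaml
scrape_configs:
  - job_name: node
    # Hard cap: a scrape returning more than 5000 samples (counted
    # after metric relabelling) is failed entirely and all of its
    # samples are dropped; under the limit everything is ingested.
    sample_limit: 5000
    static_configs:
      - targets: ["node-exporter:9100"]
```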
One of the problems with using `sample_limit` is that it is not always easy to monitor scrape jobs and alert when they are getting too close to the limit. This is due to the flexibility Prometheus allows in scrape configuration: `job_name` in a scrape config block doesn't have to be the same as the `job` label on the `scrape_samples_post_metric_relabeling` metric describing the per-target result of a scrape.
Some other scrape-related metrics (like `prometheus_target_scrape_pool_targets`) have a `scrape_job` label that is identical to the `job_name` key in the scrape configuration block, which makes it possible to map them 1:1 to the configuration itself.

Request: would it be possible to add a `scrape_job` label to at least `scrape_samples_scraped` and `scrape_samples_post_metric_relabeling`, so that series can be linked to a scrape configuration block? This would help with auditing the TSDB when Prometheus is ingesting too many samples.
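For example, if the proposed label existed, ingestion could be attributed back to config blocks with a recording rule like this (a sketch only; a `scrape_job` label on these metrics is exactly what this issue requests and does not exist today):

```yaml
groups:
  - name: scrape-audit
    rules:
      # Samples ingested per scrape config block, usable only if
      # scrape_samples_post_metric_relabeling carried scrape_job.
      - record: scrape_config:samples_post_relabeling:sum
        expr: sum by (scrape_job) (scrape_samples_post_metric_relabeling)
```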