Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize label~=".*" #807

Closed
mwitkow opened this Issue Jun 15, 2015 · 8 comments

Comments

Projects
None yet
5 participants
@mwitkow
Copy link
Contributor

mwitkow commented Jun 15, 2015

We've built our consoles to always have an instance label template, which defaults to .*. It allows us to take the same dashboard and drill-down to a specific host if needed while maintaining dashboard generalism.

However, apparently specifying label~=",*" makes Prometheus skip the indexes and do a full scan. Is this true? If so, can we optimize the case of a "match all" to be taken out of the selector (doing that in Promdash is hard, so maybe in Prometheus query eval)?

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Jun 15, 2015

@mwitkow-io Yes, using regexes does a full scan+match for all label values for the given label name, and then does a subsequent (indexed) lookup for each label/value pair that matched the regex.

For interpreting label=~".*" to match any time series, we need to figure out the semantic implications. Currently label=~".*" doesn't select all series, it only selects series that have the label label at all (no matter the value). If we keep that behavior, we would still not be able to skip the indexes, because we would still have to look up all label values for label (and then do the subsequent index access). Changing that behavior would be a backwards-incompatible change which needs some consideration.

Related issue about treating empty labels the same as unset labels: #494. Especially if label=".*" is the only matcher in a vector selector (like {label=".*"}), this would actually mean using no indexes at all and selecting all time series in the storage.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Jun 15, 2015

CC @fabxc @brian-brazil for intellectual stimulation :)

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 15, 2015

{label=~".*"} should match all timeseries.

Currently the way we use analyse selectors isn't very efficient. If we change it to be a bit smarter and do the simple lookups first (particularly on __name__) and then work from there, I think the performance issues will abate.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Jun 15, 2015

Yeah, when taking into account that empty labels should be treated the same as unset labels as requested in #494, that would make semantic sense and is worth a breaking change.

It's still not easy to implement in a performant way though. We don't have a way for looking up all time series for a given empty label in the index, and even if we do the simple lookups (like __name__) first, those lookups only give us sets of series fingerprints without label information attached. So we'd already need to fetch all label sets for those fingerprints during the query analysis time and then analyse their actual label values to see if they should be in the final set or not. Maybe possible, but needs some more thinking.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jun 15, 2015

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Jun 30, 2015

@fabxc #810 is merged and tries to do a preselection by equality matchers before applying any non-equals-matchers, as well as enforcing that there must be at least one matcher which doesn't match all time series. So this issue should be solved (at least it's going to be as good as it gets), right?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 15, 2015

Yes, this is as good as it gets... closing.

@beorn7 beorn7 closed this Jul 15, 2015

simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.