
Optimize queries using regex matchers for set lookups #2651

Open
grobie opened this Issue Apr 22, 2017 · 16 comments

grobie commented Apr 22, 2017

Use case

A common use case for regex matchers is to use them to query all series matching a set of label values, e.g. up{instance=~"foo|bar|baz"}. Grafana's template variables feature is a big user of that pattern.

Problem

Both the old and new storage implement fast-path series lookups for simple equality matchers. While the described set regex matchers could benefit from the same principles, they are treated like any other regex lookup, requiring the lookup of all label values for a given label. In large setups, this will result in a serious query latency penalty.

Proposal

Optimize this special regex usage; it should be possible to achieve lookup times similar to those of plain equality matchers. A few options come to mind:

  1. Detect such regex usage in the storage/tsdb itself and add a fast path. This adds complexity to the storage, and other storage implementations would still suffer from the same issue.
  2. Implement a query optimizer which could rewrite such queries. E.g. a single-element set lookup (not uncommon with Grafana templates) like up{instance=~"foo"} simply becomes up{instance="foo"}, and a more complicated example like rate(requests{instance=~"foo|bar"}[1m]) becomes rate(requests{instance="foo"}[1m]) OR rate(requests{instance="bar"}[1m]).
  3. Add an inSet matcher, similar to what MySQL offers with IN (...). I can't immediately come up with a simple, elegant syntax.

fabxc commented Apr 24, 2017

I think that should be part of the storage itself. Depending on how you look at it, PromQL queries are translated to database queries + additional processing. So it makes sense for the DB to optimise its own queries.

See my TODO here, which exactly aims at your suggestion: introspecting regexp matchers and inferring more optimised lookups from there.
Do you have any performance data on whether this is viable to do rather sooner than later?

While I think having it as a first-class concept in PromQL is valuable, it may be a bit late by now and would just complicate things.

grobie commented Apr 24, 2017

I saw your TODO, but wasn't quite sure what you meant by interface upgrading in this context. I can't provide any performance data at this point, as I don't have access to a production Prometheus machine right now (especially not one with a huge 2.0 database). It would be cool to have a dataset available for such purposes.

I was actually not thinking at adding this to PromQL itself, but a separate query optimizer package. At the moment this would have to implement the Querier interface and delegate most stuff down to the actual querier, while rewriting a query before.
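The delegation pattern described here could be sketched roughly as below; the Matcher and Querier shapes are simplified stand-ins for illustration, not the actual Prometheus interfaces, and this only shows the simplest rewrite (downgrading a metacharacter-free "regex" to an equality matcher):

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified stand-ins for the real label matcher and querier types.
type Matcher struct {
	Name, Value string
	IsRegex     bool
}

type Querier interface {
	Select(matchers ...Matcher) []Matcher
}

// optimizingQuerier rewrites matchers, then delegates to the wrapped querier.
type optimizingQuerier struct {
	next Querier
}

func (q optimizingQuerier) Select(ms ...Matcher) []Matcher {
	out := make([]Matcher, 0, len(ms))
	for _, m := range ms {
		// A "regex" containing no metacharacters matches exactly one
		// string, so it can become a plain equality matcher.
		if m.IsRegex && m.Value == regexp.QuoteMeta(m.Value) {
			m.IsRegex = false
		}
		out = append(out, m)
	}
	return q.next.Select(out...)
}

// passthrough stands in for the actual storage querier.
type passthrough struct{}

func (passthrough) Select(ms ...Matcher) []Matcher { return ms }

func main() {
	q := optimizingQuerier{next: passthrough{}}
	got := q.Select(Matcher{Name: "instance", Value: "foo", IsRegex: true})
	fmt.Printf("%+v\n", got[0]) // {Name:instance Value:foo IsRegex:false}
}
```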

If you're fine with adding it to tsdb for now, this looks like an easy start.

I foresee we'll have more needs for query optimization later, e.g. optimizing queries like rate(some_huge_metric{}[1m]) AND on(some_identifier) roles{role="small_subset"} by executing the RHS first and adding the additional some_identifier label matcher to the LHS. So at some point we might want to revisit the interfaces in use and how to make such things easier.

fabxc commented Apr 24, 2017

> I was actually not thinking at adding this to PromQL itself, but a separate query optimizer package. At the moment this would have to implement the Querier interface and delegate most stuff down to the actual querier, while rewriting a query before.

That sounds like a better idea actually.
Initially it could live anywhere, but having an explicit set matcher for tsdb sounds good.

> I foresee we'll have more needs for query optimization later, e.g. optimizing queries like rate(some_huge_metric{}[1m]) AND on(some_identifier) roles{role="small_subset"} by executing the RHS first and adding the additional some_identifier label matcher to the LHS. So at some point we might want to revisit the interfaces in use and how to make such things easier.

Yes, stuff like this will be interesting. In this case, one would probably also add the role matcher to the LHS.

brian-brazil commented Jun 23, 2017

Another idea is detecting regexes that are actually just (potentially escaped) simple strings.

> by executing the RHS first and adding the additional some_identifier label matcher to the LHS.

This doesn't work with remote read, as it's not required to return metrics that match the matchers. My thinking is that, since that's not the usual case, we could have such endpoints marked by the user in the configuration.

fabxc commented Jun 23, 2017

> Another idea is detecting regexes that are actually just (potentially escaped) simple strings.

That IS the original idea of this issue.

> This doesn't work with remote read, as it's not required to return metrics that match the matchers. My thinking is that as that's not the usual case that we could get such endpoints marked by the user in the configuration.

I don't get that. This refers to a query planning step. Remote read is just retrieval of the data for said query.

brian-brazil commented Jun 26, 2017

> I don't get that. This refers to a query planning step. Remote read is just retrieval of the data for said query.

That's true only if the other end is storage. There is no requirement that a remote read request for {foo="bar"} must return time series with that label.

dilyevsky commented Oct 30, 2017

I'm working on a PR for prefix regex optimization. Should I open a separate issue? Also, I noticed that user-provided regexes are wrapped like "^(?:"+m.Value+")$", which seems to imply partial matches aren't working right now. Is this the expected behavior? It seems to contradict the documentation:

> =~: Select labels that regex-match the provided string (or substring).

grobie commented Oct 31, 2017

Yes, please open a new PR with your implementation. If you have any questions, feel free to ask them here or via prometheus-developers@.

All regular expressions in the Prometheus projects are anchored to prevent simple mistakes. Feel free to send a PR against prometheus/prometheus with changes to our docs/ folder (we're in the process of changing our docs to serve them from each repo directly).

rajkumarrvarma commented Nov 13, 2017

Hi experts,
I'm new to the Prometheus tool set. I configured global alert rules for CPU & memory with standard thresholds (>60 & >80). The requirement is that 3-4 particular groups of app servers (appxxx01-appxxx22) need different thresholds for CPU & memory. How can this be achieved? Please help. Thanks.

brian-brazil commented Nov 13, 2017

@rajkumarrvarma Please do not ask support questions on unrelated issues.

SivaKrishna05 commented Apr 17, 2018

@brian-brazil @fabxc Sorry, I wasn't able to frame the question correctly. I want to scrape the deployment/service/ingress meta labels, but I am unable to see them in each scrape. How can I scrape deployment labels? Is there any way of getting those without having kube-state-metrics?

brian-brazil commented Apr 17, 2018

@SivaKrishna05 It is impolite to ask support questions on unrelated issues. Please use the users mailing list.

krasi-georgiev commented Nov 9, 2018

Since the PR referencing this issue in tsdb is merged, I assume the work on the tsdb side is done, so I can remove the local_storage label?

brian-brazil commented Nov 9, 2018

The tsdb issue is unrelated (and in fact can be reverted, it's dead code).

krasi-georgiev commented Nov 9, 2018

just to be clear you are talking about prometheus/tsdb#111 right?

Is there anything that needs to be done on the tsdb side?

brian-brazil commented Nov 9, 2018

Yes.

Personally I think this should be implemented entirely in the TSDB.
