Optimized label regex matcher with literal prefix and/or suffix #7453

pracucci · 2020-06-24T21:37:44Z

We're running the Cortex blocks storage (based on TSDB) for a large customer (30M active series), and we see many slow queries caused by regex label matchers on the head (most queries have a time range < 1h).

In this specific context, label values have very high cardinality, but only a few of them actually match the regex. Profiling these requests, we noticed that most of the query execution time is spent in m.Matches(), called by postingsForMatcher().

We've also noticed these queries frequently use regex patterns of the form literal.* or .*literal, so I experimented with ways to optimise for this use case. In this PR I'm proposing an enhancement that checks the value against the literal prefix and/or suffix found in the regex before running the regex engine.
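
A minimal sketch of the idea, not the code in this PR: before calling the regex engine, reject any value that does not start or end with the literal found at the edge of the pattern. The names prefilteredMatcher, newPrefilteredMatcher and literalAffixes are hypothetical. The sketch assumes the pattern is anchored as ^(?:pattern)$ and only inspects a top-level concatenation whose first or last element is a case-sensitive literal; every other shape falls through to the regex unchanged.

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
	"strings"
)

// prefilteredMatcher (hypothetical name) wraps an anchored regex with cheap
// literal checks that can reject a value before the regex engine runs.
type prefilteredMatcher struct {
	re     *regexp.Regexp
	prefix string // literal every match must start with, "" if unknown
	suffix string // literal every match must end with, "" if unknown
}

// literalAffixes returns the case-sensitive literals that begin and end a
// top-level concatenation such as `foo.*`, `.*bar` or `foo.*bar`.
// Any other shape (alternation, group, (?i) literal) yields empty strings.
func literalAffixes(pattern string) (prefix, suffix string, err error) {
	parsed, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", "", err
	}
	if parsed.Op != syntax.OpConcat || len(parsed.Sub) < 2 {
		return "", "", nil
	}
	first, last := parsed.Sub[0], parsed.Sub[len(parsed.Sub)-1]
	if first.Op == syntax.OpLiteral && first.Flags&syntax.FoldCase == 0 {
		prefix = string(first.Rune)
	}
	if last.Op == syntax.OpLiteral && last.Flags&syntax.FoldCase == 0 {
		suffix = string(last.Rune)
	}
	return prefix, suffix, nil
}

// newPrefilteredMatcher compiles the pattern anchored to the whole input
// (an assumption of this sketch) and records any literal prefix/suffix.
func newPrefilteredMatcher(pattern string) (*prefilteredMatcher, error) {
	prefix, suffix, err := literalAffixes(pattern)
	if err != nil {
		return nil, err
	}
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	return &prefilteredMatcher{re: re, prefix: prefix, suffix: suffix}, nil
}

// Matches runs the string checks first; the regex still has the final say,
// so the pre-filter can only reject values, never accept them.
func (m *prefilteredMatcher) Matches(s string) bool {
	if m.prefix != "" && !strings.HasPrefix(s, m.prefix) {
		return false
	}
	if m.suffix != "" && !strings.HasSuffix(s, m.suffix) {
		return false
	}
	return m.re.MatchString(s)
}

func main() {
	m, err := newPrefilteredMatcher("foo.*")
	if err != nil {
		panic(err)
	}
	for _, v := range []string{"foobar", "barfoo", "foo"} {
		fmt.Printf("%q matches: %v\n", v, m.Matches(v))
	}
}
```

Because the regex is still evaluated for every value that passes the string checks, the pre-filter only pays off when most values are rejected early, which is the high-cardinality, few-matches situation described above.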

I've also added a couple of cases to the benchmark; the results are below. In particular, this is an extract of the results for cases using at least one regex matcher:

benchmark                                                               old ns/op     new ns/op     delta
BenchmarkPostingsForMatchers/Head/i=~".*"-4                             141412994     139611723     -1.27%
BenchmarkPostingsForMatchers/Head/i=~"1.*"-4                            72386565      63834564      -11.81%
BenchmarkPostingsForMatchers/Head/i=~".*1"-4                            162121598     51062469      -68.50%
BenchmarkPostingsForMatchers/Head/i=~".+"-4                             166952730     168795251     +1.10%
BenchmarkPostingsForMatchers/Head/i=~""-4                               109869748     88150872      -19.77%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",j="foo"-4               160325969     138704328     -13.49%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",i!="2",j="foo"-4        141858504     143360620     +1.06%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",j="foo"-4               167463874     167819538     +0.21%
BenchmarkPostingsForMatchers/Head/n="1",i=~"1.+",j="foo"-4              72138068      62558565      -13.28%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!="2",j="foo"-4        170007950     164805613     -3.06%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!~"2.*",j="foo"-4      237978257     228254717     -4.09%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            89782503      89729627      -0.06%
BenchmarkPostingsForMatchers/Block/i=~"1.*"-4                           19885221      18396364      -7.49%
BenchmarkPostingsForMatchers/Block/i=~".*1"-4                           112496484     3701780       -96.71%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            106395256     110066721     +3.45%
BenchmarkPostingsForMatchers/Block/i=~""-4                              37995281      36883687      -2.93%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              87652393      86843844      -0.92%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       86963750      88551160      +1.83%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              105548757     105455238     -0.09%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             19595147      15067670      -23.11%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       105063676     105516827     +0.43%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     123610587     121141472     -2.00%

Full benchmark output

benchmark                                                               old ns/op     new ns/op     delta
BenchmarkPostingsForMatchers/Head/n="1"-4                               800           786           -1.75%
BenchmarkPostingsForMatchers/Head/n="1",j="foo"-4                       1099          1088          -1.00%
BenchmarkPostingsForMatchers/Head/j="foo",n="1"-4                       1080          1062          -1.67%
BenchmarkPostingsForMatchers/Head/n="1",j!="foo"-4                      1286          1282          -0.31%
BenchmarkPostingsForMatchers/Head/i=~".*"-4                             141412994     139611723     -1.27%
BenchmarkPostingsForMatchers/Head/i=~"1.*"-4                            72386565      63834564      -11.81%
BenchmarkPostingsForMatchers/Head/i=~".*1"-4                            162121598     51062469      -68.50%
BenchmarkPostingsForMatchers/Head/i=~".+"-4                             166952730     168795251     +1.10%
BenchmarkPostingsForMatchers/Head/i=~""-4                               109869748     88150872      -19.77%
BenchmarkPostingsForMatchers/Head/i!=""-4                               76367176      75790726      -0.75%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",j="foo"-4               160325969     138704328     -13.49%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",i!="2",j="foo"-4        141858504     143360620     +1.06%
BenchmarkPostingsForMatchers/Head/n="1",i!=""-4                         75490562      74511647      -1.30%
BenchmarkPostingsForMatchers/Head/n="1",i!="",j="foo"-4                 76842927      75789077      -1.37%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",j="foo"-4               167463874     167819538     +0.21%
BenchmarkPostingsForMatchers/Head/n="1",i=~"1.+",j="foo"-4              72138068      62558565      -13.28%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!="2",j="foo"-4        170007950     164805613     -3.06%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!~"2.*",j="foo"-4      237978257     228254717     -4.09%
BenchmarkPostingsForMatchers/Block/n="1"-4                              34132         38276         +12.14%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      624798        634759        +1.59%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      614932        616675        +0.28%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     615871        614401        -0.24%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            89782503      89729627      -0.06%
BenchmarkPostingsForMatchers/Block/i=~"1.*"-4                           19885221      18396364      -7.49%
BenchmarkPostingsForMatchers/Block/i=~".*1"-4                           112496484     3701780       -96.71%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            106395256     110066721     +3.45%
BenchmarkPostingsForMatchers/Block/i=~""-4                              37995281      36883687      -2.93%
BenchmarkPostingsForMatchers/Block/i!=""-4                              22313304      25742571      +15.37%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              87652393      86843844      -0.92%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       86963750      88551160      +1.83%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        22253683      23795152      +6.93%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                22222271      22648445      +1.92%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              105548757     105455238     -0.09%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             19595147      15067670      -23.11%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       105063676     105516827     +0.43%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     123610587     121141472     -2.00%

benchmark                                                               old allocs     new allocs     delta
BenchmarkPostingsForMatchers/Head/n="1"-4                               6              6              +0.00%
BenchmarkPostingsForMatchers/Head/n="1",j="foo"-4                       11             11             +0.00%
BenchmarkPostingsForMatchers/Head/j="foo",n="1"-4                       11             11             +0.00%
BenchmarkPostingsForMatchers/Head/n="1",j!="foo"-4                      13             13             +0.00%
BenchmarkPostingsForMatchers/Head/i=~".*"-4                             10             10             +0.00%
BenchmarkPostingsForMatchers/Head/i=~"1.*"-4                            11140          11140          +0.00%
BenchmarkPostingsForMatchers/Head/i=~".*1"-4                            7              7              +0.00%
BenchmarkPostingsForMatchers/Head/i=~".+"-4                             100039         100039         +0.00%
BenchmarkPostingsForMatchers/Head/i=~""-4                               100043         100043         +0.00%
BenchmarkPostingsForMatchers/Head/i!=""-4                               100039         100039         +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",j="foo"-4               15             15             +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",i!="2",j="foo"-4        21             21             +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i!=""-4                         100044         100044         +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i!="",j="foo"-4                 100048         100048         +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",j="foo"-4               100048         100048         +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~"1.+",j="foo"-4              11149          11149          +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!="2",j="foo"-4        100054         100054         +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!~"2.*",j="foo"-4      111236         111250         +0.01%
BenchmarkPostingsForMatchers/Block/n="1"-4                              6              6              +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      11             11             +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      11             11             +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     13             13             +0.00%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            10             10             +0.00%
BenchmarkPostingsForMatchers/Block/i=~"1.*"-4                           11140          11140          +0.00%
BenchmarkPostingsForMatchers/Block/i=~".*1"-4                           7              7              +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            100039         100039         +0.00%
BenchmarkPostingsForMatchers/Block/i=~""-4                              100043         100043         +0.00%
BenchmarkPostingsForMatchers/Block/i!=""-4                              100039         100039         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              15             15             +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       21             21             +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        100044         100044         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                100048         100048         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              100048         100048         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             11149          11149          +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       100054         100054         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     111236         111250         +0.01%

benchmark                                                               old bytes     new bytes     delta
BenchmarkPostingsForMatchers/Head/n="1"-4                               296           296           +0.00%
BenchmarkPostingsForMatchers/Head/n="1",j="foo"-4                       424           424           +0.00%
BenchmarkPostingsForMatchers/Head/j="foo",n="1"-4                       424           424           +0.00%
BenchmarkPostingsForMatchers/Head/n="1",j!="foo"-4                      488           488           +0.00%
BenchmarkPostingsForMatchers/Head/i=~".*"-4                             1606069       1606057       -0.00%
BenchmarkPostingsForMatchers/Head/i=~"1.*"-4                            3148023       3147994       -0.00%
BenchmarkPostingsForMatchers/Head/i=~".*1"-4                            1611310       1605944       -0.33%
BenchmarkPostingsForMatchers/Head/i=~".+"-4                             17264702      17264702      +0.00%
BenchmarkPostingsForMatchers/Head/i=~""-4                               17264740      17264740      +0.00%
BenchmarkPostingsForMatchers/Head/i!=""-4                               17264616      17264616      +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",j="foo"-4               1606209       1606197       -0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".*",i!="2",j="foo"-4        1606377       1606423       +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i!=""-4                         17264744      17264744      +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i!="",j="foo"-4                 17264872      17264872      +0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",j="foo"-4               17264958      17264945      -0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~"1.+",j="foo"-4              3148257       3148253       -0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!="2",j="foo"-4        17265121      17265107      -0.00%
BenchmarkPostingsForMatchers/Head/n="1",i=~".+",i!~"2.*",j="foo"-4      20416937      20417844      +0.00%
BenchmarkPostingsForMatchers/Block/n="1"-4                              296           296           +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      424           424           +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      424           424           +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     1464          1464          +0.00%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            1606020       1606027       +0.00%
BenchmarkPostingsForMatchers/Block/i=~"1.*"-4                           3147959       3147954       -0.00%
BenchmarkPostingsForMatchers/Block/i=~".*1"-4                           1609684       1605928       -0.23%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            17264636      17264645      +0.00%
BenchmarkPostingsForMatchers/Block/i=~""-4                              17264698      17264697      -0.00%
BenchmarkPostingsForMatchers/Block/i!=""-4                              17264600      17264600      +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              1606159       1606152       -0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       1606344       1606344       +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        17264728      17264728      +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                17264856      17264856      +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              17264901      17264903      +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             3148212       3148208       -0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       17265068      17265077      +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     20416828      20417757      +0.00%

pkg/labels/labels_test.go

codesome

LGTM! This might also help group other PromQL-specific regex optimizations in one place in the future. We can also move the label=~"val1|val2" -> label="val1" or label="val2" optimisation inside this at some point.

pstibrany

LGTM.

> We can also move the label=~"val1|val2" -> label="val1" or label="val2" optimisation inside this at some point.

The optimization in the TSDB layer helps to fetch fewer postings, and cannot be "moved".
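
For context, the label=~"val1|val2" rewrite discussed here can be pictured as splitting a pattern made only of literals and | into a set of exact values, so that the index can be asked for those values directly instead of testing every label value. This is a rough sketch, not the TSDB implementation: setMatches is a hypothetical name and, as a simplifying assumption, any pattern containing a regex metacharacter or a backslash escape is left to the regex engine.

```go
package main

import (
	"fmt"
	"strings"
)

// setMatches (hypothetical helper) returns the exact values a pattern such as
// "val1|val2" can match, or nil when the pattern is not a plain alternation
// of literals and must be handled by the regex engine instead.
func setMatches(pattern string) []string {
	// Assumption of this sketch: reject anything with a metacharacter other
	// than '|', including escapes, rather than trying to interpret it.
	if pattern == "" || strings.ContainsAny(pattern, `.+*?()[]{}^$\`) {
		return nil
	}
	return strings.Split(pattern, "|")
}

func main() {
	fmt.Println(setMatches("val1|val2")) // prints [val1 val2]
	fmt.Println(setMatches("val.*"))     // prints []
}
```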

brian-brazil

Might it make sense to push this upstream, rather than us slowly writing our own regex engine?

pkg/labels/regexp.go

codesome · 2020-06-25T08:22:25Z

> Might it make sense to push this upstream, rather than us slowly writing our own regex engine?

I feel that would slow down the development, and we would only optimize those regexes that are of interest to PromQL usage, so it might make more sense here. Anything which looks very general we could put it upstream and remove it from here.

brian-brazil · 2020-06-25T08:27:30Z

Prefixes and suffixes sound pretty general to me. Also our goal should be to have the best code overall over time, not to do whatever is the most expedient at any given moment.

pracucci · 2020-06-25T08:33:02Z

> Prefixes and suffixes sound pretty general to me. Also our goal should be to have the best code overall over time, not to do whatever is the most expedient at any given moment.

This optimisation fits quite well in Prometheus because all our label regexes are anchored to the beginning and end of the text. That being said, I don't think it's necessarily bad to have a short experimentation path: make improvements here, then try to upstream whatever makes sense for the general use case. Doing such optimisations in Prometheus would also give us a faster feedback loop for the Prometheus use case. Realistically, how many weeks or months would we have to wait before an improvement we upstream to Go is released and Prometheus is built with it?

brian-brazil · 2020-06-25T09:06:57Z

I'm just a bit suspicious about prefixes, as I'm not sure we should be getting such a big speedup compared to a regex's FSM. For suffixes, I wouldn't expect a regex to do any better.

bwplotka

Amazing. LGTM

However, I have to agree with @brian-brazil: the regexp library should do this out of the box. I would suggest opening an issue against the Go regexp library to start that discussion, and merging this PR in the meantime. Any upstream contribution would only be usable after a few months, if at all (: A TODO with a link to that issue would be nice 🤗

pracucci · 2020-06-25T09:12:02Z

> I'm just a bit suspicious about prefixes, as I'm not sure we should be getting such a big speedup compared to a regex's FSM. For suffixes, I wouldn't expect a regex to do any better.

That's a good point. I was as well, especially because the regex engine has an optimization for prefixes, but the micro benchmarking showed some performance improvement doing the prefix matching ourselves while iterating over a large set of label values for which only a small % matches.

dgl · 2020-06-25T20:13:22Z

> That's a good point. I was as well, especially because the regex engine has an optimization for prefixes

I think the optimisation only applies to unanchored patterns. If you use LiteralPrefix as a proxy for the prefix optimisation, you can see that it doesn't work for the patterns Prometheus will generate (https://play.golang.org/p/NPfMJTqwJaj). Definitely worth asking the Go people.
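
A small local way to run the same kind of check (this is not the linked playground snippet, whose contents are not reproduced here): print what Regexp.LiteralPrefix reports for an unanchored pattern and for the same pattern in the anchored form ^(?:pattern)$. The literalPrefix helper is a hypothetical name. The result for the anchored form may depend on the Go version, so no output is asserted for it.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalPrefix (hypothetical helper) reports what the regexp package exposes
// as the literal prefix of a compiled pattern, and whether that prefix is the
// entire pattern.
func literalPrefix(pattern string) (string, bool) {
	return regexp.MustCompile(pattern).LiteralPrefix()
}

func main() {
	// Compare an unanchored pattern with its anchored counterpart.
	for _, p := range []string{"foo.*", "^(?:foo.*)$"} {
		prefix, complete := literalPrefix(p)
		fmt.Printf("%-14q prefix=%q complete=%v\n", p, prefix, complete)
	}
}
```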

codesome · 2020-06-26T07:48:15Z

I think this PR is good to go. Discussions in Go can follow this. I will merge this in ~2h if no concerns.

bwplotka · 2020-06-26T09:42:05Z

👍

codesome approved these changes Jun 25, 2020

codesome mentioned this pull request Jun 25, 2020

Make labels.Matcher an interface #7449

Closed

pstibrany approved these changes Jun 25, 2020

brian-brazil reviewed Jun 25, 2020

bwplotka approved these changes Jun 25, 2020

pracucci added 4 commits June 26, 2020 09:35

Optimized label regex matcher with literal prefix and/or suffix

0e7ca62

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Added license

f0fb32d

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Added more tests cases with newlines

0f3404e

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Restored deleted test

359143d

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci force-pushed the add-filter-to-label-matchers branch from edfb402 to 359143d Compare June 26, 2020 07:37

pracucci mentioned this pull request Jun 26, 2020

regexp: fast path for prefix/suffix literals matching on anchored regexps golang/go#39869

Open

codesome merged commit cef4dd6 into prometheus:master Jun 26, 2020

codesome mentioned this pull request Jun 26, 2020

Upgrade Prometheus to current master cortexproject/cortex#2798

Merged

pracucci mentioned this pull request Jul 2, 2020

Optimise labels regex matchers containing a literal within the pattern #7503

Merged

codesome mentioned this pull request Jul 17, 2020

Cut 2.20.0-rc.0 #7583

Merged

chicknsoup mentioned this pull request Oct 5, 2020

PromQL regexp takes forever #8010

Closed

pracucci deleted the add-filter-to-label-matchers branch October 5, 2021 13:00

roidelapluie mentioned this pull request May 2, 2022

scrape_config can use "prefix set" inclusion/exclusion instead of regex #10661

Closed
