Cache optimized regexp matchers #465

Merged
merged 6 commits into from
Mar 31, 2023

@pracucci pracucci commented Mar 30, 2023

NewFastRegexMatcher() is typically very fast but there are some edge cases where parsing the regex may be very expensive. In this PR I propose to add an LRU cache on top of NewFastRegexMatcher() to avoid reparsing regex matchers that are frequently used. The size of the LRU is hardcoded to 10k entries.
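As a rough illustration of the idea, here is a minimal sketch of an LRU cache placed in front of an expensive constructor. It is not the code from this PR: `matcherCache`, `newMatcherCache` and `get` are hypothetical names, `regexp.Compile` from the standard library stands in for NewFastRegexMatcher(), and the LRU is hand-rolled with `container/list` rather than taken from a library.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
	"sync"
)

// entry is one cached item: the regex source and its compiled form.
type entry struct {
	pattern string
	re      *regexp.Regexp
}

// matcherCache is a small, mutex-guarded LRU keyed by the regex source.
// Hypothetical type, for illustration only.
type matcherCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // pattern -> element in order
	compiles int                      // number of real compilations performed
}

func newMatcherCache(capacity int) *matcherCache {
	return &matcherCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element, capacity),
	}
}

// get returns the cached matcher for pattern, compiling it on a miss.
// Compiling while holding the lock keeps the sketch short; it also means
// one slow compilation blocks every other caller.
func (c *matcherCache) get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}

	// regexp.Compile is a stand-in (assumption) for the expensive constructor.
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err // invalid patterns are not cached
	}
	c.compiles++

	c.items[pattern] = c.order.PushFront(&entry{pattern: pattern, re: re})
	if c.order.Len() > c.capacity {
		// Evict the least recently used entry.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).pattern)
	}
	return re, nil
}

func main() {
	cache := newMatcherCache(2)
	for _, p := range []string{"foo.*", "foo.*", "(foo|bar)", "^$", "foo.*"} {
		if _, err := cache.get(p); err != nil {
			panic(err)
		}
	}
	fmt.Println("compilations:", cache.compiles)
}
```

A repeated pattern is compiled once as long as it stays in the cache. Once it has been evicted, the next request pays the full parsing cost again, which is why the benefit depends on how often the same regex recurs.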

To measure the overhead of the cache, I've created a new benchmark BenchmarkNewFastRegexMatcher_CacheMisses which measures NewFastRegexMatcher() on cache misses. I personally think the overhead is acceptable, considering the huge impact this cache could have on recurring complex regex matchers.

BenchmarkNewFastRegexMatcher

I've modified this benchmark to measure NewFastRegexMatcher() both with and without cache. Obviously the comparison with main shows a huge CPU reduction, but in practice this benefit will exist only if the same regex is used frequently:

name                                                                   old time/op    new time/op    delta
NewFastRegexMatcher/with_cache/foo-12                                    3.59µs ± 9%    0.03µs ± 1%   -99.15%  (p=0.002 n=3+3)
NewFastRegexMatcher/with_cache/^foo-12                                   4.03µs ± 3%    0.03µs ± 3%   -99.20%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(foo|bar)-12                              6.17µs ± 1%    0.03µs ± 1%   -99.48%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/foo.*-12                                  5.02µs ± 5%    0.03µs ± 7%   -99.33%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/.*foo-12                                  4.54µs ± 1%    0.04µs ± 0%   -99.22%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/^.*foo$-12                                5.40µs ± 7%    0.03µs ± 1%   -99.38%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/^.+foo$-12                                5.34µs ± 1%    0.04µs ±12%   -99.26%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/.*-12                                     3.07µs ± 1%    0.03µs ± 1%   -98.87%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/.+-12                                     3.32µs ±12%    0.04µs ± 1%   -98.93%  (p=0.005 n=3+3)
NewFastRegexMatcher/with_cache/foo.+-12                                  5.05µs ± 1%    0.04µs ± 5%   -99.25%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/.+foo-12                                  4.65µs ± 1%    0.04µs ± 3%   -99.20%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/foo_.+-12                                 5.43µs ± 9%    0.04µs ± 4%   -99.29%  (p=0.002 n=3+3)
NewFastRegexMatcher/with_cache/foo_.*-12                                 5.17µs ± 0%    0.04µs ± 2%   -99.26%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/.*foo.*-12                                5.51µs ± 7%    0.04µs ± 1%   -99.35%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/.+foo.+-12                                5.39µs ± 1%    0.04µs ± 7%   -99.29%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(?s:.*)-12                                3.20µs ± 0%    0.04µs ± 2%   -98.87%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(?s:.+)-12                                3.40µs ± 9%    0.04µs ± 1%   -98.96%  (p=0.002 n=3+3)
NewFastRegexMatcher/with_cache/(?s:^.*foo$)-12                           5.42µs ± 1%    0.04µs ± 5%   -99.29%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(?i:foo)-12                               4.07µs ± 0%    0.04µs ±15%   -99.06%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(?i:(foo|bar))-12                         8.00µs ± 4%    0.04µs ±14%   -99.52%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/(?i:(foo1|foo2|bar))-12                   9.77µs ± 0%    0.04µs ± 2%   -99.62%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/^(?i:foo|oo)|(bar)$-12                    10.6µs ± 7%     0.0µs ± 1%   -99.65%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/(?i:(foo1|foo2|aaa|bbb|ccc|ddd|e-12       42.3µs ± 1%     0.0µs ± 5%   -99.90%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/((.*)(bar|b|buzz)(.+)|foo)$-12            13.8µs ± 5%     0.0µs ± 1%   -99.74%  (p=0.001 n=3+3)
NewFastRegexMatcher/with_cache/^$-12                                     3.08µs ± 0%    0.04µs ± 8%   -98.78%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(prometheus|api_prom)_api_v1_.+-12        14.2µs ± 3%     0.0µs ± 0%   -99.74%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/10\.0\.(1|2)\.+-12                        7.16µs ± 0%    0.04µs ± 1%   -99.50%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/10\.0\.(1|2).+-12                         7.71µs ± 1%    0.04µs ± 2%   -99.52%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/((fo(bar))|.+foo)-12                      9.61µs ± 9%    0.04µs ± 1%   -99.63%  (p=0.002 n=3+3)
NewFastRegexMatcher/with_cache/zQPbMkNO|NNSPdvMi|iWuuSoAl|qbvKM-12        213µs ± 2%       0µs ± 2%   -99.97%  (p=0.000 n=3+3)
NewFastRegexMatcher/with_cache/(?i:(zQPbMkNO|NNSPdvMi|iWuuSoAl|-12        269µs ±15%       0µs ± 5%   -99.98%  (p=0.005 n=3+3)
NewFastRegexMatcher/without_cache/foo-12                                 3.33µs ± 1%    3.43µs ± 6%      ~     (p=0.446 n=3+3)
NewFastRegexMatcher/without_cache/^foo-12                                4.01µs ± 1%    4.21µs ± 9%      ~     (p=0.395 n=3+3)
NewFastRegexMatcher/without_cache/(foo|bar)-12                           6.31µs ± 3%    6.32µs ± 1%      ~     (p=0.941 n=3+3)
NewFastRegexMatcher/without_cache/foo.*-12                               4.93µs ± 0%    5.03µs ± 2%      ~     (p=0.203 n=3+3)
NewFastRegexMatcher/without_cache/.*foo-12                               4.89µs ± 9%    4.88µs ± 7%      ~     (p=0.957 n=3+3)
NewFastRegexMatcher/without_cache/^.*foo$-12                             5.39µs ± 3%    5.34µs ± 1%      ~     (p=0.531 n=3+3)
NewFastRegexMatcher/without_cache/^.+foo$-12                             5.37µs ± 1%    5.92µs ±12%      ~     (p=0.254 n=3+3)
NewFastRegexMatcher/without_cache/.*-12                                  3.40µs ±13%    3.28µs ± 8%      ~     (p=0.678 n=3+3)
NewFastRegexMatcher/without_cache/.+-12                                  3.09µs ± 4%    2.99µs ± 0%      ~     (p=0.267 n=3+3)
NewFastRegexMatcher/without_cache/foo.+-12                               5.27µs ± 5%    5.12µs ± 4%      ~     (p=0.450 n=3+3)
NewFastRegexMatcher/without_cache/.+foo-12                               4.73µs ± 1%    4.75µs ± 0%      ~     (p=0.436 n=3+3)
NewFastRegexMatcher/without_cache/foo_.+-12                              5.27µs ± 0%    5.28µs ± 0%      ~     (p=0.296 n=3+3)
NewFastRegexMatcher/without_cache/foo_.*-12                              5.35µs ± 4%    5.47µs ± 7%      ~     (p=0.644 n=3+3)
NewFastRegexMatcher/without_cache/.*foo.*-12                             5.33µs ± 1%    5.40µs ± 1%      ~     (p=0.065 n=3+3)
NewFastRegexMatcher/without_cache/.+foo.+-12                             5.86µs ±16%    5.86µs ± 7%      ~     (p=0.998 n=3+3)
NewFastRegexMatcher/without_cache/(?s:.*)-12                             3.24µs ± 1%    3.29µs ± 1%      ~     (p=0.127 n=3+3)
NewFastRegexMatcher/without_cache/(?s:.+)-12                             3.11µs ± 1%    3.18µs ± 2%      ~     (p=0.178 n=3+3)
NewFastRegexMatcher/without_cache/(?s:^.*foo$)-12                        5.63µs ± 7%    5.72µs ± 7%      ~     (p=0.767 n=3+3)
NewFastRegexMatcher/without_cache/(?i:foo)-12                            4.08µs ± 0%    4.14µs ± 1%      ~     (p=0.085 n=3+3)
NewFastRegexMatcher/without_cache/(?i:(foo|bar))-12                      7.87µs ± 7%    8.17µs ± 7%      ~     (p=0.535 n=3+3)
NewFastRegexMatcher/without_cache/(?i:(foo1|foo2|bar))-12                9.77µs ± 1%    9.92µs ± 2%      ~     (p=0.293 n=3+3)
NewFastRegexMatcher/without_cache/^(?i:foo|oo)|(bar)$-12                 10.2µs ± 0%    10.3µs ± 1%    +1.70%  (p=0.029 n=3+3)
NewFastRegexMatcher/without_cache/(?i:(foo1|foo2|aaa|bbb|ccc|ddd|e-12    45.5µs ± 4%    46.7µs ±17%      ~     (p=0.780 n=3+3)
NewFastRegexMatcher/without_cache/((.*)(bar|b|buzz)(.+)|foo)$-12         13.3µs ± 1%    13.5µs ± 1%    +2.01%  (p=0.038 n=3+3)
NewFastRegexMatcher/without_cache/^$-12                                  3.20µs ± 6%    3.25µs ± 7%      ~     (p=0.749 n=3+3)
NewFastRegexMatcher/without_cache/(prometheus|api_prom)_api_v1_.+-12     14.1µs ± 1%    14.3µs ± 1%      ~     (p=0.402 n=3+3)
NewFastRegexMatcher/without_cache/10\.0\.(1|2)\.+-12                     7.31µs ± 3%    7.24µs ± 1%      ~     (p=0.630 n=3+3)
NewFastRegexMatcher/without_cache/10\.0\.(1|2).+-12                      8.45µs ±17%    8.16µs ± 9%      ~     (p=0.743 n=3+3)
NewFastRegexMatcher/without_cache/((fo(bar))|.+foo)-12                   9.16µs ± 0%    9.32µs ± 0%    +1.71%  (p=0.002 n=3+3)
NewFastRegexMatcher/without_cache/zQPbMkNO|NNSPdvMi|iWuuSoAl|qbvKM-12     215µs ± 4%     214µs ± 0%      ~     (p=0.810 n=3+3)
NewFastRegexMatcher/without_cache/(?i:(zQPbMkNO|NNSPdvMi|iWuuSoAl|-12     254µs ± 2%     283µs ±18%      ~     (p=0.380 n=3+3)

BenchmarkNewFastRegexMatcher_CacheMisses

To measure the overhead of the cache, I've created a new benchmark which measures NewFastRegexMatcher() on cache misses:

name                                                                   old time/op    new time/op    delta
NewFastRegexMatcher_CacheMisses/simple_regexp-12                         6.60µs ± 1%    7.46µs ± 3%   +12.95%  (p=0.016 n=3+3)
NewFastRegexMatcher_CacheMisses/complex_regexp-12                         260µs ± 7%     272µs ± 8%      ~     (p=0.464 n=3+3)

name                                                                   old alloc/op   new alloc/op   delta
NewFastRegexMatcher_CacheMisses/simple_regexp-12                         6.99kB ± 0%    7.03kB ± 0%    +0.65%  (p=0.001 n=3+3)
NewFastRegexMatcher_CacheMisses/complex_regexp-12                         302kB ± 0%     302kB ± 0%    +0.02%  (p=0.000 n=3+3)

name                                                                   old allocs/op  new allocs/op  delta
NewFastRegexMatcher_CacheMisses/simple_regexp-12                           87.0 ± 0%      88.0 ± 0%      ~     (zero variance)
NewFastRegexMatcher_CacheMisses/complex_regexp-12                         1.35k ± 0%     1.35k ± 0%      ~     (zero variance)
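
One way to force a miss on every iteration is to request a pattern that has never been seen before. The sketch below shows that approach under stated assumptions: it is not the benchmark from this PR, `compileCached` and `nextUniquePattern` are hypothetical helpers, `regexp.Compile` stands in for NewFastRegexMatcher(), and the bounded map evicts an arbitrary entry instead of the least recently used one to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"testing"
)

const cacheSize = 1000 // arbitrary bound chosen for this sketch

var (
	cache = make(map[string]*regexp.Regexp, cacheSize)
	seq   int // only ever increases, so patterns never repeat across runs
)

// compileCached is a hypothetical stand-in for a cached matcher constructor.
// When the map is full it drops an arbitrary entry (not true LRU eviction).
func compileCached(pattern string) (*regexp.Regexp, error) {
	if re, ok := cache[pattern]; ok {
		return re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	if len(cache) >= cacheSize {
		for k := range cache {
			delete(cache, k)
			break
		}
	}
	cache[pattern] = re
	return re, nil
}

// nextUniquePattern returns a pattern that has not been requested before.
// The counter is package-level because the benchmark function is invoked
// several times with growing b.N; a loop index would repeat earlier patterns.
func nextUniquePattern() string {
	seq++
	return "foo_" + strconv.Itoa(seq) + "_.*"
}

// benchmarkCacheMisses times lookup + compile + insert on guaranteed misses.
// Building the pattern string is included in the measured time.
func benchmarkCacheMisses(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := compileCached(nextUniquePattern()); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	res := testing.Benchmark(benchmarkCacheMisses)
	fmt.Println(res.String())
}
```

Comparing this number against the same loop calling the uncached constructor directly gives the per-miss overhead of the cache, which is the kind of delta reported in the tables above.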

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci marked this pull request as ready for review March 30, 2023 12:10
@pracucci pracucci requested a review from colega March 30, 2023 12:10
@pstibrany pstibrany left a comment

Looks like a good idea to me.

model/labels/regexp.go (2 outdated review threads, resolved)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci commented

@pstibrany Thanks for your review. I've replaced the LRU with the v2, and updated the benchmark in the PR description (same results from the benchmark).

Signed-off-by: Marco Pracucci <marco@pracucci.com>

@pstibrany pstibrany left a comment

Thank you.

@pracucci pracucci merged commit 05a3a79 into main Mar 31, 2023
@pracucci pracucci deleted the cache-NewFastRegexMatcher branch March 31, 2023 02:05