<regex>: Optimize skip heuristic for searches of patterns with initial dot wildcards #6189

Merged
StephanTLavavej merged 2 commits into microsoft:main from muellerj2:regex-optimize-initial-dot-searches
Mar 31, 2026
@muellerj2
Contributor

@muellerj2 muellerj2 commented Mar 29, 2026

Towards #5468.

This closes one of the remaining gaps in the skip heuristic. It doesn't really make sense to use the dot wildcard itself as the basis for skipping ahead, as it matches essentially everything (except for newlines in ECMAScript or NUL in POSIX grammars). So instead, this uses the NFA node that follows the wildcard to compute the skip.
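As a rough illustration of the idea (not the STL's actual implementation), the sketch below handles only the special case of one leading dot followed by a literal, such as `.bibe`. The helper names `is_line_terminator` and `find_dot_then_literal` are hypothetical, and treating only `'\n'` and `'\r'` as characters the dot rejects is a simplifying assumption. The search skips ahead using the literal and checks the dot only once a candidate has been found.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: characters that '.' is assumed not to match here.
// This is a simplification of the ECMAScript line terminator rules.
constexpr bool is_line_terminator(char c) {
    return c == '\n' || c == '\r';
}

// Hypothetical helper, not part of the STL: returns the start of the first
// match of the pattern "." followed by `literal`, or npos if there is none.
std::size_t find_dot_then_literal(std::string_view text, std::string_view literal) {
    assert(!literal.empty());
    // The dot consumes one character, so the literal cannot start before index 1.
    std::size_t pos = 1;
    // Skip ahead using the literal, which is far more selective than the dot.
    while ((pos = text.find(literal, pos)) != std::string_view::npos) {
        // Only now check the dot against the character before the literal.
        if (!is_line_terminator(text[pos - 1])) {
            return pos - 1; // the match begins at the character the dot consumes
        }
        ++pos; // the dot rejected this candidate; keep searching
    }
    return std::string_view::npos;
}
```

For example, `find_dot_then_literal("lorem bibe", "bibe")` locates the literal first and then reports the position of the preceding space as the start of the match.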

Benchmark

Relevant changes highlighted.

| benchmark | before [ns] | after [ns] | speedup |
|---|---|---|---|
| `bm_lorem_search/"^bibe"/2` | 50.2232 | 51.5625 | 0.97 |
| `bm_lorem_search/"^bibe"/3` | 50.2232 | 50.2232 | 1.00 |
| `bm_lorem_search/"^bibe"/4` | 50 | 50.8161 | 0.98 |
| `bm_lorem_search/"bibe"/2` | 2887.83 | 2845.99 | 1.01 |
| `bm_lorem_search/"bibe"/3` | 5625 | 5580.36 | 1.01 |
| `bm_lorem_search/"bibe"/4` | 11718.8 | 11474.6 | 1.02 |
| `bm_lorem_search/"bibe".collate/2` | 3013.39 | 2887.83 | 1.04 |
| `bm_lorem_search/"bibe".collate/3` | 5580.36 | 5580.36 | 1.00 |
| `bm_lorem_search/"bibe".collate/4` | 10742.2 | 11230.5 | 0.96 |
| `bm_lorem_search/"(bibe)"/2` | 3529.57 | 3529.57 | 1.00 |
| `bm_lorem_search/"(bibe)"/3` | 7114.96 | 6975.45 | 1.02 |
| `bm_lorem_search/"(bibe)"/4` | 13811.3 | 13671.9 | 1.01 |
| `bm_lorem_search/"(bibe)+"/2` | 4603.8 | 4652.62 | 0.99 |
| `bm_lorem_search/"(bibe)+"/3` | 8998.29 | 8998.29 | 1.00 |
| `bm_lorem_search/"(bibe)+"/4` | 17578.3 | 18415.3 | 0.95 |
| `bm_lorem_search/"(?:bibe)+"/2` | 4185.27 | 4010.88 | 1.04 |
| `bm_lorem_search/"(?:bibe)+"/3` | 7847.38 | 7672.99 | 1.02 |
| `bm_lorem_search/"(?:bibe)+"/4` | 15694.7 | 15380.8 | 1.02 |
| `bm_lorem_search/R"(\bbibe)"/2` | 64174.1 | 64174.1 | 1.00 |
| `bm_lorem_search/R"(\bbibe)"/3` | 131138 | 125558 | 1.04 |
| `bm_lorem_search/R"(\bbibe)"/4` | 256696 | 254981 | 1.01 |
| `bm_lorem_search/R"(\Bibe)"/2` | 144385 | 138108 | 1.05 |
| `bm_lorem_search/R"(\Bibe)"/3` | 288771 | 278700 | 1.04 |
| `bm_lorem_search/R"(\Bibe)"/4` | 610352 | 578125 | 1.06 |
| `bm_lorem_search/R"((?=....)bibe)"/2` | 3989.95 | 4687.5 | 0.85 |
| `bm_lorem_search/R"((?=....)bibe)"/3` | 8021.76 | 9068.08 | 0.88 |
| `bm_lorem_search/R"((?=....)bibe)"/4` | 15346 | 18415.3 | 0.83 |
| `bm_lorem_search/R"((?=bibe)....)"/2` | 3759.77 | 4010.88 | 0.94 |
| `bm_lorem_search/R"((?=bibe)....)"/3` | 7324.22 | 7672.99 | 0.95 |
| `bm_lorem_search/R"((?=bibe)....)"/4` | 13950.9 | 14997.2 | 0.93 |
| `bm_lorem_search/R"((?!lorem)bibe)"/2` | 3449.35 | 3452.85 | 1.00 |
| `bm_lorem_search/R"((?!lorem)bibe)"/3` | 6835.94 | 6975.45 | 0.98 |
| `bm_lorem_search/R"((?!lorem)bibe)"/4` | 13253.3 | 12869.6 | 1.03 |
| `bm_lorem_search/"bibe\|soda"/2` | 429688 | 414406 | 1.04 |
| `bm_lorem_search/"bibe\|soda"/3` | 836680 | 837054 | 1.00 |
| `bm_lorem_search/"bibe\|soda"/4` | 1727580 | 1633710 | 1.06 |
| `bm_lorem_search/"(id )?bibe"/2` | 486592 | 464965 | 1.05 |
| `bm_lorem_search/"(id )?bibe"/3` | 1004020 | 962182 | 1.04 |
| `bm_lorem_search/"(id )?bibe"/4` | 1968830 | 1759380 | 1.12 |
| `bm_lorem_search/".bibe"/2` | 190438 | 2887.83 | 65.95 |
| `bm_lorem_search/".bibe"/3` | 374930 | 5781.25 | 64.85 |
| `bm_lorem_search/".bibe"/4` | 784738 | 11160.7 | 70.31 |
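
The speedup column appears to be the ratio of the "before" time to the "after" time, so values above 1 indicate an improvement. A minimal sketch of that arithmetic follows; the function name `speedup` is an assumption for illustration and not part of the benchmark code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of the old running time to the new running time.
// A result above 1.0 means the new code is faster.
double speedup(double before_ns, double after_ns) {
    assert(after_ns > 0.0);
    return before_ns / after_ns;
}
```

With the `".bibe"/2` row, `speedup(190438.0, 2887.83)` is roughly 65.95, which agrees with the table.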

Note that there is some observable slowdown for the regular expressions `(?=....)bibe` and `(?=bibe)....`. This is because the new logic keeps analyzing the regex beyond the first dot wildcard, so more time is spent on the subpattern `....`. This additional analysis turns out not to be helpful for this specific subpattern. But this subpattern is also not realistic for an assertion, because it doesn't actually restrict the set of strings the regex can match. I think the slight slowdown in such cases is worth it given the potentially huge acceleration for more realistic regular expressions.
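
To illustrate why the `....` assertion is not restrictive, the following sketch compares where `(?=....)bibe` and plain `bibe` first match. Any position where `bibe` matches already has four characters available, so the lookahead is satisfied there as well. The helper `first_match_pos` is a hypothetical name introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: position of the first match of `pattern` in `text`
// (default ECMAScript grammar), or -1 if there is no match.
std::ptrdiff_t first_match_pos(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.position(0) : -1;
}
```

Calling `first_match_pos` on the same text with `"(?=....)bibe"` and with `"bibe"` should yield the same position, since the assertion never rules out a candidate that the literal accepts.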

@muellerj2 muellerj2 requested a review from a team as a code owner March 29, 2026 15:01
@github-project-automation github-project-automation Bot moved this to Initial Review in STL Code Reviews Mar 29, 2026
@StephanTLavavej StephanTLavavej added the "performance" (Must go faster) and "regex" (meow is a substring of homeowner) labels Mar 29, 2026
@StephanTLavavej StephanTLavavej self-assigned this Mar 29, 2026
Comment thread benchmarks/src/regex_search.cpp Outdated
@StephanTLavavej StephanTLavavej removed their assignment Mar 30, 2026
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Mar 30, 2026
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed, otherwise no action is required.

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Mar 31, 2026
@StephanTLavavej StephanTLavavej merged commit 8c2f5fe into microsoft:main Mar 31, 2026
49 checks passed
@github-project-automation github-project-automation Bot moved this from Merging to Done in STL Code Reviews Mar 31, 2026
@StephanTLavavej
Member

🐌 ⚡ 🐇
