<regex>: Optimize skip heuristic for searches of patterns with initial dot wildcards #6189
Merged
StephanTLavavej merged 2 commits into microsoft:main on Mar 31, 2026
StephanTLavavej approved these changes on Mar 30, 2026
I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed, otherwise no action is required.
🐌 ⚡ 🐇
Towards #5468.
This closes one of the remaining gaps in the skip heuristic. It doesn't really make sense to use the dot wildcard itself as the basis for skipping ahead, as it matches essentially every character (all except newlines in the ECMAScript grammar or NUL in the POSIX grammars). So instead, this uses the NFA node that follows the wildcard to compute the skip.
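To illustrate why the node after the wildcards is a more useful anchor than the wildcard itself, here is a toy sketch. It is not the STL's implementation: the helper `skip_to_candidate` and its parameters are hypothetical, and it only handles the simplified case of a fixed number of leading dots followed by a single literal character. It searches for the literal and backs up by the number of dots, which yields the earliest position at which a match could possibly start.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; nothing like this is exposed by <regex>.
// Models a pattern made of `dots` leading '.' wildcards followed by the literal `next`.
// Returns the first position >= `from` at which such a match could start,
// or std::string_view::npos if no such position exists.
std::size_t skip_to_candidate(std::string_view text, std::size_t from, std::size_t dots, char next) {
    // Not enough characters left for the wildcards to consume.
    if (from > text.size() || text.size() - from < dots) {
        return std::string_view::npos;
    }

    // The literal must appear at least `dots` characters after the match start,
    // so search for it beginning at from + dots.
    const std::size_t hit = text.find(next, from + dots);
    if (hit == std::string_view::npos) {
        return std::string_view::npos;
    }

    // Back up over the characters the wildcards would consume.
    return hit - dots;
}
```

The returned position is only a candidate: a real matcher would still have to verify that each wildcard matches its character (for example, that none of them is a newline in the ECMAScript grammar) and that the rest of the pattern matches.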
Benchmark
Relevant changes highlighted.
Note that there is some observable slowdown for the regular expressions `(?=....)bibe` and `(?=bibe)....`. This is because the new logic keeps analyzing the regex beyond the first dot wildcard, so more time is spent on the subpattern `....`. This additional analysis turns out not to help for this specific subpattern. But this subpattern is also not realistic for an assertion because it doesn't actually restrict the set of strings the regex can match, and I think this slight slowdown in some cases is worth it given the potentially huge acceleration in more realistic regular expressions.