automata: fix incorrect use of Aho-Corasick's "standard" semantics #1072

Merged
merged 2 commits into master from ag/fix-1070 on Aug 26, 2023

BurntSushi
Member

This fixes a bug in how prefilters were applied for multi-regexes compiled with "all" semantics. It turns out that this corresponds to the regex crate's RegexSet API, but only its is_match routine.

See the comment on the regression test added in this PR for an explanation of what happened. Basically, it came down to incorrectly using Aho-Corasick's "standard" semantics, which doesn't necessarily report leftmost matches. Since the regex crate is really all about leftmost matching, this can lead to skipping over parts of the haystack and thus lead to missing matches.
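
To make the difference concrete, here is a minimal sketch using naive stand-ins rather than the aho-corasick crate itself. The names `Span`, `find_earliest_end` and `find_leftmost` are hypothetical and exist only for this illustration, and the patterns are not the ones from the regression test. `find_earliest_end` approximates "standard" semantics by reporting whichever match finishes first, while `find_leftmost` reports the match with the smallest start offset.

```rust
/// Byte offsets of a match in the haystack (`end` is exclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Naive approximation of "standard" semantics: scan forward and report
/// a match as soon as any pattern is seen to end.
fn find_earliest_end(patterns: &[&str], haystack: &str) -> Option<Span> {
    let hay = haystack.as_bytes();
    for end in 1..=hay.len() {
        for pat in patterns {
            let pat = pat.as_bytes();
            if !pat.is_empty() && hay[..end].ends_with(pat) {
                return Some(Span { start: end - pat.len(), end });
            }
        }
    }
    None
}

/// Naive leftmost search: report the match with the smallest start
/// offset, preferring earlier patterns on ties.
fn find_leftmost(patterns: &[&str], haystack: &str) -> Option<Span> {
    let hay = haystack.as_bytes();
    for start in 0..hay.len() {
        for pat in patterns {
            let pat = pat.as_bytes();
            if !pat.is_empty() && hay[start..].starts_with(pat) {
                return Some(Span { start, end: start + pat.len() });
            }
        }
    }
    None
}

fn main() {
    let patterns = ["abcd", "b"];
    let haystack = "abcd";

    let standard = find_earliest_end(&patterns, haystack).unwrap();
    let leftmost = find_leftmost(&patterns, haystack).unwrap();

    println!("earliest-ending: {}..{}", standard.start, standard.end);
    println!("leftmost:        {}..{}", leftmost.start, leftmost.end);

    // The earliest-ending match starts after the leftmost one, so a caller
    // that treated its start as "nothing can match before here" would skip
    // offset 0.
    assert!(standard.start > leftmost.start);
}
```

With these inputs the earliest-ending search reports `b` at `1..2`, while the leftmost search reports `abcd` at `0..4`. A prefilter that resumes the regex search from the first span's start would never look at offset 0, which is the kind of skipped haystack described above.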

Fixes #1070

The main reason we used mips before was to get test coverage on a big
endian target. Now that mips no longer seems to work[1], I wanted to
add at least one other big endian target. From the tier 2 supported
platforms[2], the only big endian targets I could find were powerpc and
s390x. So we just add both here.

[1]: rust-lang/compiler-team#648
[2]: https://doc.rust-lang.org/nightly/rustc/platform-support.html#tier-2-with-host-tools
@BurntSushi BurntSushi merged commit c788378 into master Aug 26, 2023
16 checks passed
@BurntSushi BurntSushi deleted the ag/fix-1070 branch August 26, 2023 13:48
Development

Successfully merging this pull request may close these issues.

RegexSet and Regex give different results for the same pattern in 1.9