RegexSet and Regex give different results for the same pattern in 1.9 #1070

Closed
jorendorff opened this issue Aug 25, 2023 · 3 comments · Fixed by #1072

@jorendorff

What version of regex are you using?

1.9.3. The issue is present in regex 1.9.0 and later.

Describe the bug at a high level.

RegexSet::new([r"(?m)^ *v [0-9]"]).unwrap().is_match("v 0") incorrectly returns false in version 1.9.0 and later.

It returns true in 1.8.4.

It returns true if I use a Regex instead of a RegexSet.

What are the steps to reproduce the behavior?

fn main() {
    let pattern = r"(?m)^ *v [0-9]";
    let text = "v 0";

    let re = regex::Regex::new(pattern).unwrap();
    println!("re is: {re:?}");
    println!("{}", re.is_match(text)); // true (correct)

    let rs = regex::RegexSet::new([pattern]).unwrap();
    println!("rs is: {rs:?}");
    println!("{}", rs.is_match(text)); // false (incorrect)
}

What is the actual behavior?

re is: Regex("(?m)^ *v [0-9]")
true
rs is: RegexSet(["(?m)^ *v [0-9]"])
false

What is the expected behavior?

The last line should be true.

@BurntSushi
Member

One interesting bit here is that while is_match returns false, matches(..).matched_any() returns true!

fn main() {
    env_logger::init();
    let pattern = r"(?m)^ *v [0-9]";
    let text = "v 0";

    let rs = regex::RegexSet::new([pattern]).unwrap();
    println!("rs is: {rs:?}");
    println!("{}", rs.is_match(text)); // false (incorrect)
    println!("{}", rs.matches(text).matched_any()); // true!
}

BurntSushi added a commit that referenced this issue Aug 26, 2023
This fixes a bug in how prefilters were applied for multi-regexes
compiled with "all" semantics. It turns out that this corresponds to the
regex crate's RegexSet API, but only its `is_match` routine.

See the comment on the regression test added in this PR for an
explanation of what happened. Basically, it came down to incorrectly
using Aho-Corasick's "standard" semantics, which doesn't necessarily
report leftmost matches. Since the regex crate is really all about
leftmost matching, this can lead to skipping over parts of the haystack
and thus lead to missing matches.

Fixes #1070
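
To make the "standard" versus leftmost distinction in the commit message concrete, here is a minimal toy sketch using only the standard library. It is not the aho-corasick crate's API or implementation, and the function names `find_standard` and `find_leftmost` and the sample patterns are hypothetical, chosen only for illustration. Under these assumptions, "standard" means reporting whichever match is seen to end first while scanning, and leftmost means reporting the match with the smallest start offset.

```rust
/// Toy model of "standard" semantics (hypothetical helper, not a library API):
/// scan end offsets left to right and report the first match that completes.
fn find_standard(patterns: &[&str], haystack: &str) -> Option<(usize, usize)> {
    let bytes = haystack.as_bytes();
    for end in 1..=bytes.len() {
        for pat in patterns {
            if pat.is_empty() {
                continue;
            }
            if bytes[..end].ends_with(pat.as_bytes()) {
                return Some((end - pat.len(), end));
            }
        }
    }
    None
}

/// Toy model of leftmost semantics (hypothetical helper, not a library API):
/// report a match with the smallest start offset, ties broken by pattern order.
fn find_leftmost(patterns: &[&str], haystack: &str) -> Option<(usize, usize)> {
    let bytes = haystack.as_bytes();
    for start in 0..bytes.len() {
        for pat in patterns {
            if pat.is_empty() {
                continue;
            }
            if bytes[start..].starts_with(pat.as_bytes()) {
                return Some((start, start + pat.len()));
            }
        }
    }
    None
}

fn main() {
    let patterns = ["b", "abc", "abcd"];
    let haystack = "abcd";

    // "b" finishes first (at offset 2), so the reported match starts at 1.
    println!("standard: {:?}", find_standard(&patterns, haystack)); // Some((1, 2))

    // The leftmost match starts at offset 0.
    println!("leftmost: {:?}", find_leftmost(&patterns, haystack)); // Some((0, 3))
}
```

In this toy model, a search that treated the standard result as its candidate start would resume at offset 1 and never examine offset 0, which is the kind of skipped haystack the commit message describes.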
@BurntSushi
Member

This is fixed in regex 1.9.4 and regex-automata 0.3.7 on crates.io. See #1072 for the details about what caused this. Nice find and thank you for the report!

@jorendorff
Author

obviously, killing it here. just wow. i don't have the words
