Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid prefix search (with ^) goes into dead state #1169

Closed
acarl005 opened this issue Mar 4, 2024 · 3 comments · Fixed by #1170
Closed

Valid prefix search (with ^) goes into dead state #1169

acarl005 opened this issue Mar 4, 2024 · 3 comments · Fixed by #1170
Labels

Comments

@acarl005
Copy link

acarl005 commented Mar 4, 2024

What version of regex are you using?

regex-automata = "0.4.5"

Describe the bug at a high level.

I'm trying to do a prefix search by adding a ^ at the beginning of my pattern. I'm searching right-to-left, but the dfa is failing to find a match and entering the "dead" state.

What are the steps to reproduce the behavior?

use regex_automata::{hybrid::dfa::DFA, nfa::thompson, Input};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let pattern = r"^Qu";
    let dfa = DFA::builder()
        .thompson(thompson::Config::new().reverse(true))
        .build_many(&[pattern])?;

    let mut cache = dfa.create_cache();
    let hay = "Quartz";

    let mut state = dfa.start_state_reverse(&mut cache, &Input::new(hay))?;
    for &b in hay.as_bytes().iter().rev() {
        state = dfa.next_state(&mut cache, state, b)?;
        if state.is_dead() {
            panic!("DEAD");
        }
        if state.is_match() {
            println!("BREAK");
            break;
        }
    }

    state = dfa.next_eoi_state(&mut cache, state)?;
    assert!(state.is_match());
    Ok(())
}

What is the actual behavior?

❯ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/reg`
thread 'main' panicked at src/main.rs:17:13:
DEAD
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

What is the expected behavior?

I expect the program to run through to the end, finding a match and passing the assertion, as the pattern ^Qu should match the string "Quartz".

If I remove the ^ from the pattern, this program behaves as expected.

@acarl005 acarl005 closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
@acarl005 acarl005 changed the title Including a ^ forces the entire line to match Valid prefix search (with ^) goes into dead state Mar 4, 2024
@acarl005 acarl005 reopened this Mar 4, 2024
BurntSushi added a commit that referenced this issue Mar 4, 2024
Previously, when compiling a Thompson NFA, we were omitting an
unanchored prefix when the HIR contained a `^` in its prefix. We did
this because unanchored prefix in that case would never match because of
the requirement imposed by `^`.

The problem with that is it's incorrect when compiling a reverse
automaton. For example, in the case of building a reverse NFA for `^Qu`,
we should sitll include an unanchored prefix because the `^` in that
case has no conflict with it. It would be like if we omitted an
unanchored prefix for `Qu$` in a forward NFA, which is obviously wrong.

The fix here is pretty simple: in the reverse case, check for `$` in the
suffix of the HIR rather than a `^` in the prefix.

Fixes #1169
@BurntSushi BurntSushi added the bug label Mar 4, 2024
BurntSushi added a commit that referenced this issue Mar 4, 2024
Previously, when compiling a Thompson NFA, we were omitting an
unanchored prefix when the HIR contained a `^` in its prefix. We did
this because unanchored prefix in that case would never match because of
the requirement imposed by `^`.

The problem with that is it's incorrect when compiling a reverse
automaton. For example, in the case of building a reverse NFA for `^Qu`,
we should sitll include an unanchored prefix because the `^` in that
case has no conflict with it. It would be like if we omitted an
unanchored prefix for `Qu$` in a forward NFA, which is obviously wrong.

The fix here is pretty simple: in the reverse case, check for `$` in the
suffix of the HIR rather than a `^` in the prefix.

Fixes #1169
@BurntSushi
Copy link
Member

This is fixed in regex-automata 0.4.6 on crates.io.

@BurntSushi
Copy link
Member

Thank you for the nice repro. :)

@acarl005
Copy link
Author

acarl005 commented Mar 4, 2024

Thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants