Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop in captures #375

Closed
adamcrume opened this issue May 31, 2017 · 1 comment
Closed

Infinite loop in captures #375

adamcrume opened this issue May 31, 2017 · 1 comment

Comments

@adamcrume
Copy link
Contributor

Running the code

Regex::new("((?:.*)*?)=").unwrap().captures("a=b");

against regex 0.2.2 results in an infinite loop. The stack trace is

#0  0x00005555555e4a16 in regex::backtrack::Bounded<regex::input::CharInput>::backtrack<regex::input::CharInput> (self=0x7ffff67efbc0, start=...)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/backtrack.rs:189
#1  0x00005555555e47ad in regex::backtrack::Bounded<regex::input::CharInput>::exec_<regex::input::CharInput> (self=0x7ffff67efbc0, at=...)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/backtrack.rs:169
#2  0x00005555555e3a7a in regex::backtrack::Bounded<regex::input::CharInput>::exec<regex::input::CharInput> (prog=0x7ffff6038028, cache=0x7ffff6064000, matches=...,
    slots=..., input=..., start=0) at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/backtrack.rs:112
#3  0x000055555562b87f in regex::exec::ExecNoSync::exec_backtrack (self=0x7ffff67fce20, matches=..., slots=..., text=..., start=0)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:977
#4  0x000055555562b24b in regex::exec::ExecNoSync::exec_nfa (self=0x7ffff67fce20, ty=regex::exec::MatchNfaType::Auto, matches=..., slots=..., quit_after_match=false,
    text=..., start=0) at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:922
#5  0x000055555562af00 in regex::exec::ExecNoSync::captures_nfa_type (self=0x7ffff67fce20, ty=regex::exec::MatchNfaType::Auto, slots=..., text=..., start=0)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:893
#6  0x000055555562adc0 in regex::exec::ExecNoSync::captures_nfa (self=0x7ffff67fce20, slots=..., text=..., start=0)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:882
#7  0x000055555562ad0d in regex::exec::ExecNoSync::captures_nfa_with_match (self=0x7ffff67fce20, slots=..., text=..., match_start=0, match_end=2)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:870
#8  0x000055555562a2fe in regex::exec::{{impl}}::read_captures_at (self=0x7ffff67fce20, locs=0x7ffff67fcfc0, text=..., start=0)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:559
#9  0x000055555563790e in regex::exec::{{impl}}::read_captures_at (self=<optimized out>, locs=<optimized out>, text=..., start=<optimized out>)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/exec.rs:361
#10 regex::re_unicode::Regex::read_captures_at (self=0x7ffff67fd0f0, locs=0x7ffff67fcfc0, text=..., start=0)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/re_unicode.rs:717
#11 0x00005555556373a4 in regex::re_unicode::Regex::captures (self=0x7ffff67fd0f0, text=...)
    at /home/adam/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.2.2/src/re_unicode.rs:331

(Not that the regex itself is useful; it was randomly generated.)

@adamcrume
Copy link
Contributor Author

The program for the regex is

self.prog[0] = Save(InstSave { goto: 1, slot: 0 })
self.prog[1] = Save(InstSave { goto: 2, slot: 2 })
self.prog[2] = Split(InstSplit { goto1: 5, goto2: 3 })
self.prog[3] = Split(InstSplit { goto1: 4, goto2: 2 })
self.prog[4] = Ranges(InstRanges { goto: 3, ranges: [('\u{0}', '\t'), ('\u{b}', '\u{10ffff}')] })
self.prog[5] = Save(InstSave { goto: 6, slot: 3 })
self.prog[6] = Char(InstChar { goto: 7, c: '=' })
self.prog[7] = Save(InstSave { goto: 8, slot: 1 })
self.prog[8] = Match(0)

and the step function keeps repeating

ip = 2, at = InputAt { pos: 3, c: Empty, byte: None, len: 0 }, prog[ip] = Split(InstSplit { goto1: 5, goto2: 3 })
ip = 3, at = InputAt { pos: 3, c: Empty, byte: None, len: 0 }, prog[ip] = Split(InstSplit { goto1: 4, goto2: 2 })

over and over. I'm not an expert with this codebase, but I suspect it may be compiling the regex to the wrong bytecode. Note that a similar regex ((.*)*?)= (where the inner group is a capturing group) compiles to

self.prog[0] = Save(InstSave { goto: 1, slot: 0 })                                                               
self.prog[1] = Save(InstSave { goto: 2, slot: 2 })                                                               
self.prog[2] = Split(InstSplit { goto1: 7, goto2: 3 })                                                           
self.prog[3] = Save(InstSave { goto: 4, slot: 4 })                                                                                                          
self.prog[4] = Split(InstSplit { goto1: 5, goto2: 6 })                                                           
self.prog[5] = Ranges(InstRanges { goto: 4, ranges: [('\u{0}', '\t'), ('\u{b}', '\u{10ffff}')] })                
self.prog[6] = Save(InstSave { goto: 2, slot: 5 })                                                               
self.prog[7] = Save(InstSave { goto: 8, slot: 3 })                                                               
self.prog[8] = Char(InstChar { goto: 9, c: '=' })                                                                
self.prog[9] = Save(InstSave { goto: 10, slot: 1 })                                                              
self.prog[10] = Match(0)

adamcrume added a commit to adamcrume/regex that referenced this issue Jun 19, 2017
…pressions

With certain repeated empty expressions similar to (x*)*?, the backtracker can
go into an infinite loop. This change adds the Progress instruction which
requires the engine to make progress to continue matching a repeated
subexpression.

Fixes rust-lang#375
adamcrume added a commit to adamcrume/regex that referenced this issue Jun 19, 2017
…pressions

With certain repeated empty expressions similar to (x*)*?, the backtracker can
go into an infinite loop. This change adds the Progress instruction which
requires the engine to make progress to continue matching a repeated
subexpression.

Fixes rust-lang#375
adamcrume added a commit to adamcrume/regex that referenced this issue Jun 20, 2017
…step

This prevents us from executing instructions if they have been executed before, rather than returning after the instruction has already been executed.

Fixes rust-lang#375
bors added a commit that referenced this issue Jun 23, 2017
Move has_visited check to the top of the loop in backtrack::Bounded::step

…pressions

With certain repeated empty expressions similar to (x*)*?, the backtracker can
go into an infinite loop. This change adds the Progress instruction which
requires the engine to make progress to continue matching a repeated
subexpression.

Fixes #375

Note that this was inspired by https://swtch.com/~rsc/regexp/regexp2.html#real (mentioned in HACKING.md), which mentions that a progress instruction can be used to prevent backtracking loops.
@bors bors closed this as completed in #384 Jun 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant