Repro
I tried this code:
```rust
use logos::Logos;

#[derive(Debug, Logos)]
enum Token {
    #[regex(r"(a*b)*")]
    Foo,
}

fn main() {
    let input = "a";
    for token in Token::lexer(input) {
        println!("{token:?}");
    }
}
```
The regex `(a*b)*` should not match the string `a`, so this program should print `Err(())`. But Logos incorrectly thinks the regex matches, and the program actually prints `Ok(Foo)`.
Analysis
The problem appears to be in the construction of the NFA. For the program above, Logos currently constructs this NFA, which allows a single `a` to match:
```mermaid
flowchart LR
    s[start]-->2
    2((2))--a-->2((2))
    2((2))--_-->4((4))
    4((4))--b-->2((2))
    4((4))--_-->1((1))
    1((1))-->e[::Foo]
```
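To make the problem concrete, the NFA in the diagram can be simulated by hand. The sketch below is only an illustration and does not use Logos or reflect its internals: the edge table is transcribed from the diagram, `_` edges are treated as epsilon transitions, and the names `EDGES`, `epsilon_closure`, and `nfa_accepts` are hypothetical. It also checks acceptance of the whole input, which is a simplification of how a lexer matches prefixes.

```rust
use std::collections::BTreeSet;

/// One edge of the NFA: (from, label, to). `None` is an epsilon (`_`) edge.
type Edge = (u32, Option<char>, u32);

/// Edges transcribed from the diagram in the issue (assumed reading of it).
const EDGES: &[Edge] = &[
    (2, Some('a'), 2),
    (2, None, 4),
    (4, Some('b'), 2),
    (4, None, 1),
];
const START: u32 = 2;
const ACCEPT: u32 = 1;

/// Expand `states` with every state reachable through epsilon edges.
fn epsilon_closure(mut states: BTreeSet<u32>) -> BTreeSet<u32> {
    let mut stack: Vec<u32> = states.iter().copied().collect();
    while let Some(s) = stack.pop() {
        for &(from, label, to) in EDGES {
            if from == s && label.is_none() && states.insert(to) {
                stack.push(to);
            }
        }
    }
    states
}

/// Returns true if the NFA accepts the entire input.
fn nfa_accepts(input: &str) -> bool {
    let mut current = epsilon_closure(BTreeSet::from([START]));
    for c in input.chars() {
        let mut next = BTreeSet::new();
        for &(from, label, to) in EDGES {
            if label == Some(c) && current.contains(&from) {
                next.insert(to);
            }
        }
        current = epsilon_closure(next);
    }
    current.contains(&ACCEPT)
}

fn main() {
    for input in ["", "b", "ab", "a", "aba"] {
        println!("{input:?} -> {}", nfa_accepts(input));
    }
}
```

In this simulation `"a"` and `"aba"` are accepted, because after the `a` self-loop on state 2 the two epsilon edges lead straight to the accepting state without consuming a `b`.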
A possible solution would be to construct an NFA with separate paths for the first and the remaining iterations of the inner loop, such as the following. This way, every `a` must eventually be followed by a `b` to match.
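As a reference for the expected behavior, here is a minimal hand-written automaton for the language of `(a*b)*`. It is not the NFA proposed above and not Logos code; the `State` enum and the `step` and `matches_ab_star` functions are hypothetical names chosen for this sketch. The point it illustrates is the invariant stated above: once an `a` has been read, the input can only be accepted after a later `b`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    /// Between iterations of the outer loop; the only accepting state.
    Boundary,
    /// Inside `a*`: at least one `a` is still waiting for its `b`.
    PendingB,
}

/// One transition; `None` means the character is not in the alphabet.
fn step(state: State, c: char) -> Option<State> {
    match (state, c) {
        (State::Boundary, 'a') => Some(State::PendingB),
        (State::Boundary, 'b') => Some(State::Boundary),
        (State::PendingB, 'a') => Some(State::PendingB),
        (State::PendingB, 'b') => Some(State::Boundary),
        _ => None,
    }
}

/// Returns true if the entire input is in the language of `(a*b)*`.
fn matches_ab_star(input: &str) -> bool {
    let mut state = State::Boundary;
    for c in input.chars() {
        match step(state, c) {
            Some(next) => state = next,
            None => return false,
        }
    }
    state == State::Boundary
}

fn main() {
    for input in ["", "b", "ab", "aab", "abab", "a", "aba"] {
        println!("{input:?} -> {}", matches_ab_star(input));
    }
}
```

Here `"a"` and `"aba"` are rejected because the automaton ends in `PendingB`, while `""`, `"b"`, `"ab"`, `"aab"`, and `"abab"` are accepted.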
I will submit a PR implementing this approach shortly.