Trailing context consumed if initial expression matches it #165
I am finding that expressions of the form
In the attached example there are two expressions to be matched. In the first, any number of "a" or "b" characters followed by a final "b":
In the second, any number of "c" followed by a final "d":
In the first example, a match leaves YYCURSOR pointing past the final "b", even though that "b" is the trailing context and should not be consumed. In the second example, a match leaves YYCURSOR pointing, correctly, at the final "d".
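To make the expected behaviour concrete, here is a small hand-written C sketch of where the cursor should rest after each match. It is not re2c output and does not use YYCURSOR. It assumes the two rules are roughly `[ab]* / "b"` and `"c"* / "d"`, since the original snippets are not reproduced here, and both function names are hypothetical. Each function returns the offset of the trailing-context character, or -1 if the rule does not match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a rule assumed to be  [ab]* / "b".
 * The trailing "b" can also be matched by [ab]*, so the position of the
 * most recent 'b' has to be remembered while scanning (a tag-like variable).
 * Returns the offset of the final 'b' (where the cursor should rest),
 * or -1 if the input does not start with a match. */
long cursor_after_ab_rule(const char *input)
{
    assert(input != NULL);
    long last_b = -1;
    size_t i = 0;

    /* Consume the longest run of 'a'/'b' characters. */
    while (input[i] == 'a' || input[i] == 'b') {
        if (input[i] == 'b')
            last_b = (long)i;   /* candidate start of the trailing context */
        i++;
    }
    return last_b;
}

/* Hypothetical model of a rule assumed to be  "c"* / "d".
 * Here "d" cannot be matched by "c"*, so the trailing context starts
 * exactly where the run of 'c' characters ends; no extra bookkeeping
 * is needed. Returns the offset of 'd', or -1 if there is no match. */
long cursor_after_cd_rule(const char *input)
{
    assert(input != NULL);
    size_t i = 0;

    while (input[i] == 'c')
        i++;
    return input[i] == 'd' ? (long)i : -1;
}
```

For the input "abab" the first function returns 3, the offset of the final "b", whereas the behaviour reported here corresponds to a cursor at offset 4. For "cccd" the second function returns 3, which matches what is observed.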
Yep, it's a well-known bug, duplicate of #121. The bug itself has been fixed in
In short, the fix required switching from DFA to TDFA (tagged DFA), a relatively recent invention in automata theory (Ville Laurikari, 2000). While working on it I built a slightly different kind of TDFA than Laurikari's, probably a more efficient one. I still have to carefully reconstruct all the proofs, formalize the model, and compare the two kinds of TDFA. This is what is taking me so long.
re2c will support both kinds of TDFA and will be capable of submatch extraction.
I'm mostly done with the implementation, but I can't release re2c until I have fully formalized the underlying theory. Hopefully that will take a couple of months, but it is hard to say.