I am finding that expressions of the form X / Y leave YYCURSOR pointing after Y instead of after X, i.e., the trailing context is consumed, if Y can also be matched by X.
In the attached example there are two expressions to be matched. The first matches any number of "a" or "b" characters followed by a final "b":
[ab]* / "b"
The second matches any number of "c" characters followed by a final "d":
[c]* / "d"
In the first example, matches leave YYCURSOR pointing past the final "b", so the trailing context is consumed. In the second example, matches correctly leave YYCURSOR pointing at the final "d".
Attachment: re2c_testcase.zip
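To make the expected behaviour concrete, here is a small brute-force sketch of what YYCURSOR should be for a rule X / Y matched at the start of the input. It is not re2c code: the function name `trailing_context_split` is hypothetical, Python `re` patterns stand in for the re2c regexps (`"b"` becomes `b`), and it assumes the overall match is the longest one, with ties between splits broken in favour of the longest X.

```python
import re


def trailing_context_split(x, y, s):
    """Reference semantics for a rule `x / y` anchored at the start of s.

    Returns (cursor, match_end): cursor is where YYCURSOR should point
    (end of x, i.e. start of the trailing context); match_end is the end
    of the whole x y match. Returns None if x y matches no prefix of s.

    Assumptions (hypothetical model, not re2c): x and y are Python `re`
    patterns; the longest overall match wins, then the longest x.
    """
    rx, ry = re.compile(x), re.compile(y)
    best = None  # (match_end, cursor): tuple order prefers longer match, then longer x
    for k in range(len(s) + 1):          # candidate end of x
        if rx.fullmatch(s, 0, k) is None:
            continue
        for e in range(k, len(s) + 1):   # candidate end of y
            if ry.fullmatch(s, k, e) is not None:
                if best is None or (e, k) > best:
                    best = (e, k)
    if best is None:
        return None
    match_end, cursor = best
    return cursor, match_end


# Rule [ab]* / "b": the cursor should stop before the final "b".
print(trailing_context_split(r"[ab]*", r"b", "aabb"))  # (3, 4)
# Rule [c]* / "d": the cursor should stop before the "d".
print(trailing_context_split(r"[c]*", r"d", "cccd"))   # (3, 4)
```

In these terms, the reported bug is that for `[ab]* / "b"` YYCURSOR lands at `match_end` (4) rather than at `cursor` (3), while `[c]* / "d"` comes out right because "d" can never be matched by `[c]*`.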
Yep, it's a well-known bug, a duplicate of #121. The bug itself has been fixed in devel for quite a while (since 7db4bab), but it turned out to be just the tip of the iceberg: a much larger problem that needed a more general solution.
In short, the fix required switching from DFA to TDFA (tagged DFA), the latter being a relatively recent invention in automata theory (Ville Laurikari, 2000). While working on it, I built a slightly different kind of TDFA than Laurikari did, probably a more efficient one. I have to carefully reconstruct all the proofs, formalize the model, and compare the two types of TDFA. This is what is taking me so long.
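As a rough illustration of the idea behind tags (not of either TDFA construction discussed here, and not re2c's generated code), the sketch below hand-builds a tiny deterministic matcher for `[ab]* / "b"`. It treats the rule as `[ab]* t "b"` with a tag `t` at the boundary and keeps the tag's value in a register that is updated on transitions, so the end of `[ab]*` is known without relying on the DFA state alone. The function name and the state/register layout are hypothetical.

```python
def match_ab_star_slash_b(s):
    """Hand-built tagged matcher for the rule [ab]* / "b" (illustrative only).

    The rule is treated as [ab]* t "b", where tag t marks the end of the
    [ab]* part. Because the final "b" can also be matched by [ab]*, the
    boundary is tracked in a register updated on transitions.

    Returns (cursor, match_end) for the longest match at the start of s,
    or None if there is no match. Hypothetical sketch, not re2c output.
    """
    NONFINAL, FINAL = 0, 1    # FINAL: the last symbol read was "b"
    state = NONFINAL
    r = None                  # register holding the current value of tag t
    last_accept = None        # (cursor, match_end) of the longest match so far
    for i, c in enumerate(s):
        if c == "b":
            r = i             # register operation on this transition: t <- i
            state = FINAL
        elif c == "a":
            state = NONFINAL
        else:
            break             # no transition on any other symbol
        if state == FINAL:
            last_accept = (r, i + 1)  # remember the tag value at this accept point
    return last_accept


print(match_ab_star_slash_b("aabb"))   # (3, 4)
print(match_ab_star_slash_b("aabba"))  # (3, 4)
```

The point of the register is that YYCURSOR can be restored to the tag value (3) rather than the match end (4), which is the behaviour the original report expects.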
re2c will support both kinds of TDFA and will be capable of submatch extraction.
I'm mostly done with the implementation, but I can't release re2c until I fully formalize the underlying theory. Hopefully that will take a couple of months, but it's hard to say.