Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing context consumed if initial expression matches it #165

Closed
jefftrull opened this issue Nov 30, 2016 · 3 comments

Comments

@jefftrull
Copy link

commented Nov 30, 2016

I am finding that expressions of the form
X / Y
result in YYCURSOR pointing after Y instead of after X, i.e., the trailing context is consumed - if Y can also be matched by X.

In the attached example there are two expressions to be matched. In the first, any number of "a" or "b" characters followed by a final "b":

[ab]* / "b"

In the second, any number of "c" followed by a final "d":

[c]* / "d"

I find that, in the first example, matches result in YYCURSOR pointing past the final "b", which is the trailing context. In the second example, matches result in YYCURSOR pointing, correctly, at the final "d".
re2c_testcase.zip

@jefftrull

This comment has been minimized.

Copy link
Author

commented Nov 30, 2016

Just for fun I cloned the repo and discovered that this issue is fixed in devel branch :) Is there a release schedule?

@jefftrull

This comment has been minimized.

Copy link
Author

commented Nov 30, 2016

Perhaps this is a duplicate of bug #121

@skvadrik

This comment has been minimized.

Copy link
Owner

commented Nov 30, 2016

Yep, it's a well-known bug, duplicate of #121. The bug itself has been fixed in devel for quite a while (since 7db4bab), but it turned out to be just a top of an iceberg -- much larger problem with a more general solution.

In short, the fix required switching from DFA to TDFA, the latter being a relatively new invention in automata theory (year 2000, Ville Laurikari). While working on it I built a slightly different kind of TDFA than Laurikari did -- probably a more efficient one. I have to carefully reconstruct all the proofs, formalize the model and compare the two types of TDFA. This is what takes me so long.

Re2c will support both kinds of TDFA and will be capable of submatch extraction.

I'm mostly done with the implementation, but I can't release re2c until I fully formalize the underlying theory. Hopefully it will take me a couple of months, hard to say.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.