trailing contexts are fundamentally broken #121
Comments
This is because re2c tries to implement variable-length trailing contexts using a single pointer (YYCTXMARKER). Of all re2c tests, only very few are affected (re2c's own lexer and one of the example lexers). The problem can be fixed by adding multiple pointers (one pointer per rule) and setting …
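The one-pointer-per-rule idea can be sketched with a small hand-written matcher. This is not re2c output: the two rules in the comment, the function name lex_per_rule_markers, and the ctx array are all hypothetical, chosen only to show each rule restoring the cursor from its own marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rules, chosen only for illustration:
 *   rule0: "ab" / "bc"   (consume "ab" when followed by "bc")
 *   rule1: "a"  / "bb"   (consume "a"  when followed by "bb")
 * Returns the number of characters consumed (0 if nothing matches)
 * and stores the matched rule index in *rule (-1 if none). */
int lex_per_rule_markers(const char *s, int *rule)
{
    int cur = 0;
    int ctx[2] = { 0, 0 }; /* one context marker per rule (assumed layout) */

    assert(s != NULL && rule != NULL);
    *rule = -1;

    if (s[cur] != 'a') return 0;
    ++cur;
    ctx[1] = cur;          /* rule1's trailing context starts after "a" */

    if (s[cur] != 'b') return 0;
    ++cur;
    ctx[0] = cur;          /* rule0's trailing context starts after "ab" */

    if (s[cur] != 'b') return 0;
    ++cur;                 /* "abb" seen: rule1 can match */

    if (s[cur] == 'c') {   /* "abbc" seen: the longer rule0 wins */
        *rule = 0;
        return ctx[0];
    }
    *rule = 1;
    return ctx[1];         /* rule1's own marker is still intact */
}
```

Under these assumed rules, the input "abbx" is matched by rule1 and consumes 1 character, while "abbc" is matched by rule0 and consumes 2.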
This problem is related to the general problem of submatch extraction in regular expressions, which requires either an NFA or a TDFA (a DFA with tagged transitions). The problem is exposed by the following example:
For which re2c generates this code:
Clearly, this code sets … I can think of two ways to solve this problem: either implement TDFA or report an error in cases where greedy matching is insufficient to extract the submatch correctly.
When there are multiple overlapping trailing contexts, re2c-generated code may set an incorrect YYCTXMARKER value. The following example fails on an input string that matches the pattern "abb" [^c]: rule1 matches and should consume the single character a, but it consumes ab because YYCTXMARKER gets overwritten when trying to match rule0.

Source code (1.re):
Generated code:
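The overwrite itself can be illustrated with a hand-written matcher. This is a sketch under stated assumptions, not re2c output and not the rules from 1.re: the two rules in the comment, the function name lex_single_marker, and the local variable ctxmarker (standing in for YYCTXMARKER) are hypothetical and were picked to reproduce the same symptom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rules, chosen only for illustration:
 *   rule0: "ab" / "bc"   (consume "ab" when followed by "bc")
 *   rule1: "a"  / "bb"   (consume "a"  when followed by "bb")
 * Returns the number of characters consumed (0 if nothing matches)
 * and stores the matched rule index in *rule (-1 if none). */
int lex_single_marker(const char *s, int *rule)
{
    int cur = 0;
    int ctxmarker = 0;     /* single shared marker, like YYCTXMARKER */

    assert(s != NULL && rule != NULL);
    *rule = -1;

    if (s[cur] != 'a') return 0;
    ++cur;
    ctxmarker = cur;       /* rule1's trailing context starts after "a" */

    if (s[cur] != 'b') return 0;
    ++cur;
    ctxmarker = cur;       /* rule0's context start overwrites rule1's */

    if (s[cur] != 'b') return 0;
    ++cur;                 /* "abb" seen: rule1 can match */

    if (s[cur] == 'c') {   /* "abbc" seen: the longer rule0 wins */
        *rule = 0;
        return ctxmarker;  /* 2, correct for rule0 */
    }
    *rule = 1;
    return ctxmarker;      /* 2, but rule1 should consume only 1 */
}
```

With these assumed rules, the input "abbx" is matched by rule1 yet 2 characters ("ab") are consumed instead of 1, because the marker saved for rule1 was overwritten while rule0 was still a candidate.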