trailing contexts are fundamentally broken #121
When there are multiple overlapping trailing contexts, re2c-generated code may set incorrect
Source code (1.re):
This is because re2c tries to implement variable-length trailing contexts using a single pointer (
Of all the re2c tests, only a few are affected (re2c's own lexer and one of the example lexers).
The problem can be fixed by adding multiple pointers (one pointer per rule) and setting
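To make the failure mode concrete, here is a minimal hand-written sketch in C. It is not re2c output and does not use the rules from this issue: the two rules, the `lex` function, the `match_t` type and the `per_rule` flag are all hypothetical, chosen only to show how a shared context pointer gets overwritten when two trailing contexts overlap, and how one pointer per rule avoids that.

```c
#include <stddef.h>

/* Hypothetical rules (illustration only, not the rules from the issue):
 *   rule 0:  "a"  / "b"+ "c"    token is "a",  context starts at offset 1
 *   rule 1:  "ab" / "b"* "d"    token is "ab", context starts at offset 2
 * Both rules share the prefix "ab" followed by any number of 'b's, so a
 * combined automaton walks them in parallel until it sees 'c' or 'd'.
 */
typedef struct {
    int rule;          /* matched rule, or -1 if nothing matched */
    size_t token_len;  /* token length with the trailing context cut off */
} match_t;

/* per_rule == 0: both rules save their context start in one shared slot.
 * per_rule != 0: each rule saves it in its own slot. */
static match_t lex(const char *s, int per_rule)
{
    size_t ctx[2] = {0, 0};
    size_t shared_or_own = per_rule ? 1 : 0;  /* slot used by rule 1 */
    size_t i = 0;
    match_t m = { -1, 0 };

    if (s[i++] != 'a') return m;
    ctx[0] = i;                  /* rule 0: context starts after "a" */

    if (s[i++] != 'b') return m;
    ctx[shared_or_own] = i;      /* rule 1: context starts after "ab";
                                    with a shared slot this overwrites
                                    the position saved for rule 0 */

    while (s[i] == 'b') ++i;     /* the variable-length part of both contexts */

    if (s[i] == 'c') {           /* rule 0 matches */
        m.rule = 0;
        m.token_len = ctx[0];
    } else if (s[i] == 'd') {    /* rule 1 matches */
        m.rule = 1;
        m.token_len = ctx[shared_or_own];
    }
    return m;
}
```

On the input `abbc`, `lex("abbc", 0)` matches rule 0 but reports a token length of 2, because the position saved for rule 1 replaced the one saved for rule 0. `lex("abbc", 1)` reports the correct length of 1. The cost of the per-rule variant in this sketch is one extra saved position per rule with a trailing context.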
This problem is related to the general problem of submatch extraction in regular expressions, which requires either an NFA or a TDFA (a DFA with tagged transitions).
The problem is exposed by the following example:
For which re2c generates this code:
Clearly, this code sets
I can think of two ways to solve this problem: either implement TDFA, or report an error in cases where a greedy match is insufficient to extract the submatch correctly.