flex-style named definitions cause ambiguity in re2c grammar #115
Bison reports 10 shift/reduce conflicts when compiling the re2c parser. It turns out that all of them are caused by one unfortunate production in the grammar:
which stands for flex-style named definitions of the form:
re2c partially supports flex syntax with the '-F' flag. Native re2c named definitions are of the form:
Another notable difference is that re2c allows newlines inside regular expressions in named definitions, while flex doesn't.
re2c syntax allows mixing named definitions with rules. With native re2c named definitions that's fine: their ending semicolon distinguishes them from rules. However, flex-style named definitions have no ending semicolon (a newline acts as a delimiter in flex, but not in re2c), so mixing them with rules introduces a parsing ambiguity. Consider the following example:
One can interpret this fragment in two different ways:
Both ways are valid, so there's a real ambiguity in the grammar, not just some stupid LALR(1) conflict.
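To make the ambiguity concrete, here is a minimal sketch in Python that enumerates every reading of a token stream under a toy grammar. The grammar is my own simplification, not the actual re2c grammar: a regex is just a sequence of names and string literals, a rule is `regex CODE`, a native definition is `NAME = regex ;`, and a flex-style definition is `NAME regex` with no terminator. The token streams and the names `a` and `b` are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); kinds: NAME, STR, CODE, EQ, SEMI


def is_atom(tok: Token) -> bool:
    # Toy model: a regex is a plain sequence of names and string literals.
    return tok[0] in ("NAME", "STR")


def parses(tokens: List[Token]) -> List[list]:
    """Return every way to read `tokens` as a sequence of definitions and rules."""
    if not tokens:
        return [[]]
    out = []
    n = len(tokens)

    # Rule: atom+ CODE
    i = 0
    while i < n and is_atom(tokens[i]):
        i += 1
    if i >= 1 and i < n and tokens[i][0] == "CODE":
        head = ("rule", [t[1] for t in tokens[:i]])
        out += [[head] + rest for rest in parses(tokens[i + 1:])]

    if tokens[0][0] == "NAME":
        # Native definition: NAME EQ atom+ SEMI (the semicolon fixes the end).
        if n > 1 and tokens[1][0] == "EQ":
            j = 2
            while j < n and is_atom(tokens[j]):
                j += 1
            if j > 2 and j < n and tokens[j][0] == "SEMI":
                head = ("def", tokens[0][1], [t[1] for t in tokens[2:j]])
                out += [[head] + rest for rest in parses(tokens[j + 1:])]
        # Flex-style definition: NAME atom+ with no terminator, so every
        # possible body length has to be tried.
        j = 1
        while j < n and is_atom(tokens[j]):
            j += 1
            head = ("def", tokens[0][1], [t[1] for t in tokens[1:j]])
            out += [[head] + rest for rest in parses(tokens[j:])]
    return out


flex_style = [("NAME", "a"), ("STR", '"x"'), ("NAME", "b"), ("CODE", "{}")]
native = [("NAME", "a"), ("EQ", "="), ("STR", '"x"'), ("SEMI", ";"),
          ("NAME", "b"), ("CODE", "{}")]

print(len(parses(flex_style)))  # 2
print(len(parses(native)))      # 1
```

In this toy model the flex-style stream reads either as one rule whose regex is `a "x" b`, or as a definition of `a` followed by the rule `b {}`. The native stream has a single reading because `=` and `;` pin down where the definition starts and ends.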
In flex, there's no parsing problem: it uses newline as a delimiter and doesn't allow mixing named definitions with rules. Named definitions must all come together in a separate section delimited by "%%":
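As a rough illustration of why this layout is unambiguous, the Python sketch below splits a flex-like specification at the first "%%" line and reads one definition per line before it. It is a deliberately naive model, not how flex itself is implemented: it ignores comments, options, `%{ ... %}` blocks, and the user code section, and the helper name `split_flex_spec` and the sample input are hypothetical.

```python
from typing import Dict, List, Tuple


def split_flex_spec(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a flex-like spec into named definitions and rule lines."""
    # Everything before the "%%" line is the definitions section.
    head, _sep, tail = text.partition("\n%%\n")
    defs: Dict[str, str] = {}
    for line in head.splitlines():
        line = line.strip()
        if not line:
            continue
        # The newline ends the definition, so no terminator is needed:
        # first word is the name, the rest of the line is the regex.
        name, regex = line.split(None, 1)
        defs[name] = regex
    rules = [line.strip() for line in tail.splitlines() if line.strip()]
    return defs, rules


spec = "digit [0-9]\nnumber {digit}+\n%%\n{number} { return NUM; }\n"
defs, rules = split_flex_spec(spec)
print(defs)   # {'digit': '[0-9]', 'number': '{digit}+'}
print(rules)  # ['{number} { return NUM; }']
```

Because the section boundary decides what is a definition and the line boundary decides where it ends, the splitter never has to guess whether a name starts a definition or a rule.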
As of now, re2c will fail to parse the example above. However, the parsing problem vanishes in '-c' mode, because with '-c' rules have a different form:
Some re2c users (notably the PHP team) use '-F' together with '-c' and don't face the parsing problem.
So what should we do? I see the following options:
I vote for (1) for the following reasons:
If (1) raises no objections, what should we do with the -F option: remove it, or keep it but mark it deprecated?