empty character class [] matches empty string #59
Comments
I'm thinking of completely forbidding empty character classes. This seems to conform to Perl and POSIX BRE/ERE regular expressions. Original comment by: skvadrik |
Further analysis of the following case:
showed that versions <=0.13.6 and >=0.13.7 behave differently. Up to 0.13.6, re2c consistently treated an empty range as matching the empty string:
Starting from 0.13.7 this behaviour was broken (the faulty commit is, unsurprisingly, the big one that added UTF-8 support). The empty positive range [] and an empty difference (e.g. [a-z] \ [a-z]) still match the empty string, but an empty negative range (e.g. [^\x00-\xFF]) matches nothing (always fails):
Whether we choose to match empty string or match nothing on empty ranges, the behaviour must be consistent (apply to all cases of range construction). Original comment by: skvadrik |
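To illustrate why consistency is the natural requirement, here is a minimal sketch that models 8-bit character classes as plain Python sets. It is not re2c's internal representation; the helper names (`char_range`, `negate`, `difference`, `is_empty`) are made up for this example. The point is only that all three constructions mentioned above denote the same empty set, so a rule keyed on the resulting set rather than on the syntax cannot treat them differently.

```python
# Hypothetical model of 8-bit character classes as sets of code units.
# This is an illustration only, not how re2c stores ranges.
ALPHABET = frozenset(range(0x100))


def char_range(lo, hi):
    """Class [lo-hi], with inclusive integer bounds."""
    return frozenset(range(lo, hi + 1))


def negate(cls):
    """Class [^...]: every code unit of the alphabet that is not in cls."""
    return ALPHABET - cls


def difference(a, b):
    """Class a \\ b: members of a that are not members of b."""
    return a - b


def is_empty(cls):
    """True if the class contains no code units at all."""
    return len(cls) == 0


a_to_z = char_range(ord("a"), ord("z"))

empty_positive = frozenset()                  # []
empty_difference = difference(a_to_z, a_to_z)  # [a-z] \ [a-z]
empty_negative = negate(char_range(0x00, 0xFF))  # [^\x00-\xFF]
```

Since `empty_positive`, `empty_difference` and `empty_negative` compare equal here, whichever meaning is chosen for the empty class would apply to all of them at once.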
I vote for making it an error (or at least invalid according to the re2c grammar). As far as my imagination goes, it's a pointless thing to do. Looking at the POSIX standard for regexes, they're not allowed: "A bracket expression (an expression enclosed in square brackets, "[]" ) is an RE that shall match a single collating element contained in the non-empty set of collating elements represented by the bracket expression." http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html Original comment by: nuffer |
I agree entirely, but you know, re2c users are quite conservative ;) I think I'll add an option "--empty-class <match-empty|match-none|error>" and default to "match-empty" (match empty input, as in <=0.13.6) in 0.15, then switch the default to "error" in 0.16 (if nobody objects to it). Anyway, a warning will be useful. Original comment by: skvadrik |
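A rough sketch of what the three proposed values could mean for a single-class match. The option values come from the comment above; everything else (`EmptyClassPolicy`, `match_class`, raising `ValueError`) is hypothetical. In re2c itself the "error" case would presumably be reported when the lexer is generated rather than at match time, so this is a simplification.

```python
from enum import Enum


class EmptyClassPolicy(Enum):
    """The three values proposed for --empty-class."""
    MATCH_EMPTY = "match-empty"
    MATCH_NONE = "match-none"
    ERROR = "error"


def match_class(cls, text, pos, policy):
    """Try to match one character class at text[pos].

    Returns the number of characters consumed, or None if there is no match.
    """
    if not cls:
        # The class is empty: behaviour depends entirely on the policy.
        if policy is EmptyClassPolicy.ERROR:
            raise ValueError("empty character class")
        if policy is EmptyClassPolicy.MATCH_EMPTY:
            return 0      # succeeds without consuming input
        return None       # MATCH_NONE: always fails
    # Non-empty class: ordinary single-character match.
    if pos < len(text) and text[pos] in cls:
        return 1
    return None
```

With this model, `match_class(frozenset(), "a", 0, EmptyClassPolicy.MATCH_EMPTY)` succeeds with zero characters consumed (the <=0.13.6 behaviour), while the same call with `MATCH_NONE` fails regardless of the input.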
This commit adds the "--empty-class" option. Dan, does it look good to you? Original comment by: skvadrik |
+1 Looks good. Original comment by: nuffer |
Original comment by: skvadrik |
We have one 'yyaccept' initialization per re2c block. Each block consists of one or more DFA (multiple DFA in '-c' mode in case of multiple conditions). Each DFA may or may not use 'yyaccept' (that is, save 'yyaccept' in some states and have a dispatch state based on the saved 'yyaccept' value).
Description of the bug: in '-c' mode, sometimes a DFA would have states that save 'yyaccept', but no dispatch state that uses the saved value. The DFA didn't actually need 'yyaccept' (all the assignments vanished if other conditions that need 'yyaccept' were removed).
The essence of the bug: re2c decided whether to output 'yyaccept'-related code on a per-block basis: for multiple conditions in the same block, the same decision was made (if any condition needed 'yyaccept', all of them would output it).
The fix: 'yyaccept' initialization should be done on a per-block basis, while assignments to 'yyaccept' should be done on a per-DFA basis. Also, 'yyaccept' initialization must be delayed, while assignments to 'yyaccept' must not.
Note: we may consider per-DFA 'yyaccept' initialization (have a local 'yyaccept' variable per DFA). This wouldn't conflict with the '-f' switch (as it might seem) as long as we name all the variables 'yyaccept' and don't generate any 'yyaccept' initializations with '-f'.
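The difference between the old and the fixed decision can be sketched as follows. This is a toy model, not re2c code: a block is represented as a dict mapping a condition name to whether that condition's DFA really needs 'yyaccept', and both function names are invented for the example. Each function returns a pair: whether the block emits the 'yyaccept' initialization, and which DFAs emit 'yyaccept' assignments.

```python
def plan_per_block(needs):
    """Pre-fix behaviour: one decision for the whole block.

    If any condition needs yyaccept, every DFA in the block emits assignments.
    """
    any_needed = any(needs.values())
    return any_needed, {name: any_needed for name in needs}


def plan_fixed(needs):
    """Fixed behaviour: initialization per block, assignments per DFA."""
    emit_init = any(needs.values())   # one initialization if anyone needs it
    emit_assign = dict(needs)         # each DFA decides for itself
    return emit_init, emit_assign


# Hypothetical block with two conditions; only c1 has a dispatch state.
block = {"c1": True, "c2": False}
```

For `block`, `plan_per_block` makes c2 emit assignments that nothing ever reads, whereas `plan_fixed` keeps the single initialization and drops the assignments in c2.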
E.g. the following source:
compiles to the following code:
Reproducible with 0.13.6, 0.14 and HEAD (and it seems that it has always been that way). Of course, it should never match (and should preferably report a warning).
Original comment by: skvadrik