redundant use of YYMARKER #61

Closed
skvadrik opened this issue Jun 5, 2015 · 2 comments

Comments

@skvadrik
Owner

commented Jun 5, 2015

Consider the following re2c source (1.re):

/*!re2c
    <c1> [^]+ "a" {}
    <c1> "b" {}

    <c2> "d" {}
    <c2> "ddd" {}
*/

Build and grep for YYMARKER and the condition labels:

$ re2c -c 1.re | grep -E "YYMARKER|yyc_"
        case yycc1: goto yyc_c1;
        case yycc2: goto yyc_c2;
yyc_c1:
        yych = *(YYMARKER = ++YYCURSOR);
        YYMARKER = ++YYCURSOR;
yyc_c2:
        YYCURSOR = YYMARKER;
        yych = *(YYMARKER = ++YYCURSOR);

Condition c2 needs YYMARKER (it both backs up and restores YYCURSOR). Condition c1 doesn't need YYMARKER (it backs up YYCURSOR but never restores it).
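
To illustrate why c2 genuinely needs the backup, here is a minimal hand-written sketch of a matcher for "d" and "ddd". It is not re2c output: the function name lex_c2 and the local variables cursor and marker are assumptions made for this example and merely mirror the roles of YYCURSOR and YYMARKER. The input is assumed to be a NUL-terminated string, so the terminator serves as a sentinel.

```c
#include <stddef.h>

/* Hand-written sketch of condition c2 ("d" | "ddd"); not re2c output.
 * Returns the length of the longest match at s, or 0 if no rule matches. */
size_t lex_c2(const char *s)
{
    const char *cursor = s;   /* plays the role of YYCURSOR */
    const char *marker;       /* plays the role of YYMARKER */

    if (*cursor != 'd')
        return 0;             /* neither rule matches */
    ++cursor;                 /* rule "d" has matched here */
    marker = cursor;          /* backup: remember the end of "d" */

    if (*cursor != 'd')
        return (size_t)(cursor - s);   /* plain "d", nothing to undo */
    ++cursor;                 /* "dd": no rule ends here */

    if (*cursor != 'd') {
        cursor = marker;      /* restore: fall back to the match of "d" */
        return (size_t)(cursor - s);
    }
    ++cursor;                 /* rule "ddd" has matched */
    return (size_t)(cursor - s);
}
```

On the input "dd" the sketch reads past the end of the shorter match, fails to complete "ddd", and has to return to the saved position. That fallback path is what makes the marker necessary in c2, and it has no counterpart in c1.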

With HEAD the bug emerges even without '-c':

/*!re2c
    [^]+ "a" {}
    "b" {}
*/

Build and grep:

$ re2c 1.re | grep "YYMARKER"
        yych = *(YYMARKER = ++YYCURSOR);
        YYMARKER = ++YYCURSOR;

Note: It may not be obvious that YYMARKER is redundant here. One might think that re2c should have generated code that restores YYCURSOR, i.e. that the real error is the missing restore rather than the redundant backup. That's not true: in this case, although the rules overlap, the longer rule always succeeds, so there's no need to back up the match of the shorter rule. When I say "longer rule always succeeds", I'm well aware that the input string may end unexpectedly. In that case either YYFILL will supply enough characters for the longer rule to succeed, or YYFILL will not return and the shorter rule's match will be discarded anyway.
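
The argument can be checked on a small hand-built automaton for the two rules. The sketch below is an illustration only, not the DFA re2c actually constructs: the state names and the helpers step and can_get_stuck are assumptions introduced here. It relies on the usual fallback model, in which a saved position is restored only when the lexer reaches a state with no transition for the next character.

```c
#include <stdbool.h>

/* Hand-built DFA for the rules  [^]+ "a"  and  "b"  (not re2c output).
 *   S_START  nothing read yet
 *   S_B      read exactly "b"                 (accepts rule "b")
 *   S_OTHER  read >= 1 char, no rule ends     (not accepting)
 *   S_A      read >= 2 chars, last one is 'a' (accepts rule [^]+ "a")
 *   S_FAIL   marks a missing transition
 */
enum state { S_START, S_B, S_OTHER, S_A, S_FAIL };

enum state step(enum state s, unsigned char c)
{
    switch (s) {
    case S_START:
        return c == 'b' ? S_B : S_OTHER;
    case S_B:
    case S_OTHER:
    case S_A:
        return c == 'a' ? S_A : S_OTHER;
    default:
        return S_FAIL;
    }
}

/* True if some live state lacks a transition for some byte, i.e. the
 * lexer could get stuck and would have to fall back to a saved match. */
bool can_get_stuck(void)
{
    for (int s = S_START; s <= S_A; ++s)
        for (int c = 0; c < 256; ++c)
            if (step((enum state)s, (unsigned char)c) == S_FAIL)
                return true;
    return false;
}
```

In this model every live state has a transition for every byte, so can_get_stuck() returns false: there is no point at which the lexer would need to fall back to the position saved after "b".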

Original comment by: skvadrik

@skvadrik

Owner Author

commented Jun 6, 2015

  • status: accepted --> closed-fixed

Original comment by: skvadrik

@skvadrik

Owner Author

commented Jun 6, 2015

Fixed, see this commit.

Note that this bugfix affects one of PHP lexers.

Original comment by: skvadrik

@skvadrik skvadrik self-assigned this Jul 23, 2015

@skvadrik skvadrik closed this Jul 23, 2015

skvadrik added a commit that referenced this issue Nov 21, 2015

Partial fix for bug #61 "empty character class [] matches empty string".
Given the following code:
    /*!re2c
        [] {}
    */

    /*!re2c
        [^\x00-\xFF] {}
    */

    /*!re2c
        [\x00-\xFF]\[\x00-\xFF] {}
    */
re2c versions <=0.13.6 and >=0.13.7 behaved differently.
0.13.6 consistently considered that empty range should match empty string.
Since 0.13.7 empty positive range [] and empty difference (e.g. [a-z]\[a-z])
still match empty string, but empty negative range (e.g. [^\x00-\xFF])
matches nothing (always fails). The faulty commit is
28ee7c9
"Added UTF-8 encoding support and tests for it."

This commit brings back consistent behaviour of 0.13.6: empty range,
however it was constructed, always matches empty string. Whether this
behaviour is sane or not is another question.
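
The three constructions above all denote the same empty set of characters. A minimal sketch of that observation follows, modelling a character class as a 256-entry membership table. The type cclass and the helpers cc_range, cc_negate, cc_diff and cc_empty are hypothetical names chosen for this illustration and say nothing about how re2c represents ranges internally.

```c
#include <stdbool.h>
#include <string.h>

/* A character class as a membership table over all byte values
 * (hypothetical representation, for illustration only). */
typedef struct { bool in[256]; } cclass;

/* [lo-hi]; passing lo > hi yields the empty class, like [] */
cclass cc_range(int lo, int hi)
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (int i = lo; i <= hi; ++i)
        c.in[i] = true;
    return c;
}

/* [^...]: complement with respect to \x00-\xFF */
cclass cc_negate(cclass a)
{
    for (int i = 0; i < 256; ++i)
        a.in[i] = !a.in[i];
    return a;
}

/* a \ b: characters in a that are not in b */
cclass cc_diff(cclass a, cclass b)
{
    for (int i = 0; i < 256; ++i)
        a.in[i] = a.in[i] && !b.in[i];
    return a;
}

bool cc_empty(cclass a)
{
    for (int i = 0; i < 256; ++i)
        if (a.in[i])
            return false;
    return true;
}
```

Under this model cc_range(1, 0), cc_negate(cc_range(0, 255)) and cc_diff(cc_range(0, 255), cc_range(0, 255)) are indistinguishable, which is the reason for treating them consistently.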

skvadrik added a commit that referenced this issue Nov 21, 2015

Added cmd option "--empty-class <match-empty|match-none|error>".
This option controls what re2c does when it encounters an empty character
class (e.g. [], [^\x00-\xFF] or [\x00-\xFF]\[\x00-\xFF]):
    match-empty (default) - match on empty input
    match-none - fail to match on any input
    error - compilation error

This is a final fix for bug #61 "empty character class [] matches empty string".
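
The three settings can be summarised as a small decision function. This is only a model of the behaviour listed above, not code from re2c: the enum names and the function empty_class_outcome are assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical model of the three --empty-class settings. */
enum empty_class_policy { MATCH_EMPTY, MATCH_NONE, POLICY_ERROR };
enum outcome { MATCHED, NO_MATCH, COMPILE_ERROR };

/* What an empty character class does under each policy.
 * On MATCHED, *consumed is set to the number of characters consumed. */
enum outcome empty_class_outcome(enum empty_class_policy p, size_t *consumed)
{
    switch (p) {
    case MATCH_EMPTY:
        *consumed = 0;        /* matches, but consumes no input */
        return MATCHED;
    case MATCH_NONE:
        return NO_MATCH;      /* fails on any input */
    case POLICY_ERROR:
    default:
        return COMPILE_ERROR; /* the lexer is never generated */
    }
}
```

Note that only match-empty and match-none describe run-time behaviour of the generated lexer; error takes effect when re2c compiles the source.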