re2c option -F (flex syntax) broken #229

Closed
genivia-inc opened this issue Nov 4, 2018 · 4 comments

@genivia-inc
commented Nov 4, 2018

re2c 1.1.1 with -F (Flex syntax) reports a syntax error for the regex (a|ab)ba, but the regex (a|ab)(ba) works fine for some reason. The former is valid Flex syntax.

I can work around this issue for now by using the regex ("a"|"ab")"ba", but it would be nice if re2c accepted Flex syntax.

@skvadrik

Owner

commented Nov 5, 2018

This does look like an error to me. But this behaviour goes back at least as far as re2c-0.13.6 (and presumably earlier), and Flex syntax has always been only vaguely defined in re2c. I first need to understand what re2c already supports (otherwise the fix may break someone else's code), and then see how we can fix that without introducing ambiguity in the grammar.

There is another workaround. This causes a syntax error:

/*!re2c
    (a|ab)ba {}
*/

But this doesn't (indentation doesn't matter as long as the code in braces is on a new line):

/*!re2c
    (a|ab)ba
    {}
*/

@genivia-inc

Author

commented Nov 5, 2018

Thanks for the suggestion. Strangely, with the { } code block on the same line, any character after a ), ], * or + in the regex causes this failure; e.g. (a)b, [a]b, a*b and a+b all fail.

@skvadrik

Owner

commented Nov 6, 2018

@genivia-inc It's not the special characters ), ], *, + that cause the error; it's the raw character at the end of the regexp followed by one or more spaces and the opening brace {. This simple example also fails:

/*!re2c
    a {}
*/

The reason it fails is partly historical: partial support for flex syntax was added on top of the already existing re2c-specific syntax, which caused parsing conflicts (some due to limitations in the implementation, some due to genuine ambiguity in the grammar).

I have a fix for the original example, but I want to make sure that it doesn't break existing code and check similar cases.
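
For anyone who wants to scan existing -F sources for rules that could hit this, the pattern described above (a raw literal character, then one or more blanks, then the opening brace) can be approximated with a small check. This is only a rough sketch in C: `hits_issue_229_pattern` is a hypothetical helper, not part of re2c, and it assumes one rule per line with no blanks inside the regexp itself.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of re2c). Returns 1 if a single rule line
 * looks like: raw literal character, one or more blanks, then `{`.
 * Heuristic only: it takes the first `{` that is preceded by a blank to be
 * the start of the action, and it does not understand quoting or escapes. */
static int hits_issue_229_pattern(const char *line)
{
    assert(line != NULL);
    for (const char *brace = strchr(line, '{'); brace != NULL;
         brace = strchr(brace + 1, '{')) {
        const char *p = brace;
        /* Walk back over the blanks directly before this brace. */
        while (p > line && (p[-1] == ' ' || p[-1] == '\t')) --p;
        if (p == brace) continue; /* no blank before it, e.g. a{1,3} */
        if (p == line) return 0;  /* only indentation before the brace */
        /* Assumption: "raw literal" means an unquoted [A-Za-z0-9_] char. */
        return isalnum((unsigned char)p[-1]) || p[-1] == '_';
    }
    return 0; /* no action brace on this line */
}
```

On the examples in this thread, the helper flags `a {}` and `(a|ab)ba {}`, and does not flag `(a|ab)(ba) {}`, `("a"|"ab")"ba" {}`, or a rule whose `{}` sits on its own line. It can misreport lines where a quoted string contains a blank followed by `{`.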

skvadrik added a commit that referenced this issue Nov 14, 2018

Fixed a couple of lexer/parser errors in flex mode (-F option).
This fixes bug #229: re2c option -F (flex syntax) broken,
reported by Robert van Engelen.

A well-formed example that caused a syntax error (flex-style raw literal
followed by one or more spaces and a curly brace):

/*!re2c
    a {}
*/

The faulty behaviour goes back at least as far as re2c-0.13.6 (and presumably
earlier): in flex mode, a raw literal may occur in various contexts
both as a regexp (string literal) and as an identifier (named definition,
condition name). RE2C uses lookahead to infer the context and determine
the appropriate type of lexer token, but it missed some cases.

The fix has two sides. First, it reduces the number of contexts where
the general lexer may encounter a raw literal (by using specialized lexers
for condition lists <x,y,...,z> and condition goto => and :=>). Second,
it fixes the lookahead regexps used for context inference.

Also added a bunch of tests (generated by a script).

@skvadrik

Owner

commented Nov 14, 2018

@genivia-inc This should fix your example (and a couple of other cases in -F mode): 30a0682

@skvadrik skvadrik closed this Jan 16, 2019
