Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zephir: re2c 1.0.2 keeps the initial "/" at the beginning of an "docblock" #197

Closed
sergeyklay opened this issue Nov 4, 2017 · 22 comments
Closed

Comments

@sergeyklay
Copy link
Collaborator

sergeyklay commented Nov 4, 2017

Hi @skvadrik

It seems to me there are some backward incompatible changes in re2c 1.x.
We use the following expression for Zephir Language:

DCOMMENT = ("/**"([^*]+|[*]+[^/*])*[*]*"*/");
DCOMMENT {
	token->opcode = XX_T_COMMENT;
	token->value = estrndup(q, YYCURSOR - q - 1);
	token->len = YYCURSOR - q - 1;
	{
		int k, ch = s->active_char;
		for (k = 0; k < (token->len - 1); k++) {
			if (token->value[k] == '\n') {
				ch = 1;
				s->active_line++;
			} else {
				ch++;
			}
		}
		s->active_char = ch;
	}
	q = YYCURSOR;
	return 0;
}

to find comments of the form:

/**
 * Comment
 */

So token->value for re2c >= 0.13.6 && < 1.0 is:

    **
     * Comment
     *

but for re2c >= 1.0 is:

-    **
+    /**
     * Comment
     *

Did I create the backward incompatible expression to catch "docblocks" or is it the expected behavior?
Thanks


Refs: zephir-lang/php-zephir-parser#31

sergeyklay added a commit to zephir-lang/php-zephir-parser that referenced this issue Nov 4, 2017
Actually this is temporary workaround to be able try latest changes
of re2c >= 1.x.

Related issues:

* skvadrik/re2c#197
* #31
* zephir-lang/zephir#1591
* phalcon/cphalcon#13140
@skvadrik
Copy link
Owner

skvadrik commented Nov 4, 2017

Hi Serghei, I'm unable to reproduce the difference with re2c-1.0.2 and re2c-0.15.3.
I'm running re2c -i on the code:

/*!re2c

"/**" ([^*]+|[*]+[^/*])* [*]* "*/" {}
* {}

*/

The output is identical (up to the date and version info that re2c includes in the generated file).

So please attach the full example and re2c options.

@skvadrik
Copy link
Owner

skvadrik commented Nov 4, 2017

I managed to reproduce the dfference on php-zephir-parser/parser/scanner.re, looking into it.

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Did you test with re2c-0.16? It seems that the change is between 0.15.3 and 0.16 rather than between 0.16 and 1.0.2.

@sergeyklay
Copy link
Collaborator Author

Hmmm.. Nope :-/ I have to try

@sergeyklay
Copy link
Collaborator Author

Just tested 0.16. Works as expected (without initial "/").

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Ok, what re2c options are you using?

@sergeyklay
Copy link
Collaborator Author

--no-generation-date -o scanner.c scanner.re

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

One of the differences between 0.16 and 1.0.2 is in handling yyaccept. This I was able to bisect down to f51403d. I cannot be completely sure that this is the breaking change though, since I don't have a minimal test case for it.

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Here are two files: parser1.cc generated before the (presumably) faulty commit and parser2.cc generated after it. Could you try building zephyr with both files and confirm the difference?

parser1.cc.txt
parser2.cc.txt

@sergeyklay
Copy link
Collaborator Author

Both tests was passed successfully.

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Nov 5, 2017

Just to you know. Such ugly patch:

DCOMMENT = ("/**"([^*]+|[*]+[^/*])*[*]*"*/");
DCOMMENT {
++  if (q[0] == '/') {
++      q++;
++  }
    token->opcode = XX_T_COMMENT;
    token->value = estrndup(q, YYCURSOR - q - 1);
    token->len = YYCURSOR - q - 1;
    {
        int k, ch = s->active_char;
        for (k = 0; k < (token->len - 1); k++) {
            if (token->value[k] == '\n') {
                ch = 1;
                s->active_line++;
            } else {
                ch++;
            }
        }
        s->active_char = ch;
    }
    q = YYCURSOR;
    return 0;
}

solves all our problems.

I used this patched version called as "development" in Zephir tests. As you can see all tests with patched parser version was passed successfully: https://travis-ci.org/phalcon/zephir/builds/297509488. Yes, there are failed tests for non patched version.

Also I used local environment to tests Zephir Parser with:

https://github.com/phalcon/php-zephir-parser/blob/development/install-development

-- ${RE2C_BIN} --no-generation-date -o scanner.c scanner.re
++ cp ../parser1.cc.txt scanner.c

and

-- ${RE2C_BIN} --no-generation-date -o scanner.c scanner.re
++ cp ../parser2.cc.txt scanner.c

All tests of Zephir Parser was passed successfully.

@sergeyklay
Copy link
Collaborator Author

@skvadrik Both files has Generated by re2c 0.16 in the header. So it should be?

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Yes, they were generated with two intermediate re2c versions (between 0.16 and 1.0.2), different in one commit only. Now at least we know that the changes introduced by f51403d do not break anything.

I will try building php-zephir-parser, reproducing the actual failure and bisecting once again.

@sergeyklay
Copy link
Collaborator Author

Feel free to ask me in case of any issue with setting up local environment.

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Eh, it turns out f51403d is the faulty commit indeed. When I sent you the pre-generated files, I forgot to comment out your fix, so naturally you weren't able to spot the difference. :)

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

So after some thinking I've come to the following conclusions.

The real problem seems to be in the use of local variable q. First, you use q to define YYMARKER. Then you use q for your own needs, probably as a lexeme start marker (but sometimes you use start instead of q). Note that re2c makes no guarantees on how YYMARKER is used: depending on the control flow in lexer the value of YYMARKER may be updated, or it may not be updated. In general, the value of YYMARKER is undefined from the user's perspective.

After f51403d re2c does more aggressive dead code elimination. In particular, it probably eliminates some useless writes to YYMARKER, which happens to break your code.

The fix would be to introduce a separate local variable for YYMARKER, and not to rely on internal re2c usage of it. Another local variable should be used to track lexeme start. Something along these lines (but the rest of the code should be updated, since q is no longer in sync with YYMARKER):

--- a/parser/scanner.re
+++ b/parser/scanner.re
@@ -18,9 +18,9 @@
 #define YYCURSOR (s->start)
 #define YYLIMIT (s->end)
-#define YYMARKER q
+#define YYMARKER qm

 int xx_get_token(xx_scanner_state *s, xx_scanner_token *token) {

-       char *q = YYCURSOR, *start = YYCURSOR;
+       char *q = YYCURSOR, *start = YYCURSOR, *qm;
        int status = XX_SCANNER_RETCODE_IMPOSSIBLE;
        int is_constant = 0, j;

Also, you may find it useful to run re2c with -W.

This is an unrelated issue, but in your grammar comments of the form /**/ won't be parsed as COMMENT, they will be parsed as the start of DCOMMENT and lexer will greedily eat all characters until it meets the closing */.

@sergeyklay
Copy link
Collaborator Author

Oh, good catch! I'll try to play around with this ASAP. Currently 00:30 AM here :)

@skvadrik
Copy link
Owner

skvadrik commented Nov 5, 2017

Sure, there's no rush. Now that I know that it's not a bug in re2c internal optimization algorithm, I don't need to fix anything. And you have a workaround for php-zephir-parser.

If you have any further troubles with re2c API, I'd be glad to help. I'm leaving the bug open until everything is sorted out.

@Izopi4a
Copy link

Izopi4a commented Nov 6, 2017

sorry for the offtopic, I just can not restrain myself
Skvadrik is officially the nicest person in GitHub

@skvadrik
Copy link
Owner

skvadrik commented Nov 6, 2017

Eh, things might have been different if it was a bug in re2c internals. :D

@sergeyklay
Copy link
Collaborator Author

Fixed. Thank you very much 👍

@skvadrik
Copy link
Owner

skvadrik commented Nov 8, 2017

Cool!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants