Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexical Feedback #1158

Closed
wants to merge 30 commits into from
Closed

Lexical Feedback #1158

wants to merge 30 commits into from

Conversation

marcioAlmada
Copy link
Contributor

This is related to the "Context Sensitive Lexer" RFC: https://wiki.php.net/rfc/context_sensitive_lexer. This new W.I.P patch is an alternative implementation of #1054.

About this new implementation:

  • Has no regression | forward compatibility risks and is highly predictable
  • Has an very small footprint compared to the previous attempt involving a pure lexical approach
  • Is highly configurable, to make a word semi-reserved you only have to edit this inclusive parser rule.
  • Doesn't require compile time check

Tasks

  • Pass all tests
  • Keep %expect equals to 2
  • Port ext tokenizer
    • 30%
    • 60%
    • 90%
    • 100%
  • Review

@ghost
Copy link

ghost commented Mar 7, 2015

Can one of the admins verify this patch?

| simple_variable { $$ = zend_ast_create(ZEND_AST_VAR, $1); }
;

property_name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not handle property names with the same system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because property names have sigils and therefore no naming limitations, if you handle it the same way as the other member names they will be restricted by the inclusive SEMI_RESERVED rule https://github.com/php/php-src/pull/1158/files#diff-7eff82c2c5b45db512a9dc49fb990bb8R273

I'm sure you don't want that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I forgot that method names have a more limited set of allowed keywords :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I found a way to uncomment https://github.com/php/php-src/pull/1158/files#diff-7eff82c2c5b45db512a9dc49fb990bb8R280 and have these words semi reserved too, but that would require that people aliasing methods that clash with method modifiers names to use the verbose syntax. Ex:

trait TraitA {
   private function private(){}
}

Class A {
    use TraitA {
        private as protected protected;
    }
}

I'm leaving this out of the patch though, as it has to be compatible with the older one.


identifier:
T_STRING { $$ = $1; }
| /* if */ SEMI_RESERVED { REWIND } /* and rematch as */ T_STRING { $$ = $3; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to get the string representation of keyword without "rewind" and "rematch".
If something like the following may work, you won't need to change lexer at all.

identifier:
T_STRING { $$ = $1; }
| SEMI_RESERVED { zval zv; ZVAL_STRINGL(&zv, LANG_SCNG(yy_text), LANG_SCNG(yy_leng)); $$ = zend_ast_create_zval(&zv); }
;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great idea because we don't waste time interacting with the lexer twice anymore. On the other side I was doing the ext tokenizer port based on this lexical feedback and now the lexer is completely unaware of context again.

I'll use this anyway and will try to find another solution on the tokenizer extension side again. Thanks ^^

@marcioAlmada
Copy link
Contributor Author

Just a heads up: this PR is still active and soon we'll have updates.

@marcioAlmada marcioAlmada force-pushed the lex-feedback branch 2 times, most recently from 1580d0d to bfd0279 Compare March 22, 2015 03:18
The implementation has no regression risks, has an even smaller footprint
compared to the previous attempt involving a pure lexical approach, is higly
predictable and higly configurable.

To turn a word semi-reserved you only need to edit the "SEMI_RESERVED" parser rule,
it's an inclusive list of all the words that should be matched as T_STRING on specific contexts.
Example:

```
method_modifiers function returns_ref indentifier '(' parameter_list ')' ...
```

instead of:

```
method_modifiers function returns_ref T_STRING '(' parameter_list ')' ...
```

TODO: port ext tokenizer
The class_name_scalar rule was removed from grammar: "::class" is now just
a reserved case insensitive class const.

The lexer became completely unaware about the semi-reserved words context.
…calar back in grammar

That was clearly not the best way to achieve it, let's reserve T_CLASS for now.
Turning it a reserved case insensitive constant was a bad idea because there
are many static analyzers out there relying on ::class as a language construct,
we can't simply "downgrade" it to a special constant.

It's no big deal, reverting this bit is fine and won't have big impacts
on the final result.

This reverts commit c931644.
@marcioAlmada marcioAlmada force-pushed the lex-feedback branch 2 times, most recently from 9ae97c2 to 127c949 Compare March 28, 2015 02:55
@marcioAlmada
Copy link
Contributor Author

FYI, I'll be waiting for the anonymous classes patch to be merged before continue.

@jrnickell
Copy link

@marcioAlmada will this support methods named empty()? It looks like you have empty included in the tokenizer, but I don't see it listed in the RFC or your tests here.

@marcioAlmada
Copy link
Contributor Author

@jrnickell absolutely. Looks like I forgot empty on the tests. The following works:

php -r "class X { static function empty(){ return true; } } var_dump(X::empty());"
# bool(true)
php -r "class X { function empty(){ return true; } } var_dump((new X)->empty());"
# bool(true)
php -r "class X { const EMPTY = true; } var_dump(X::EMPTY);"
# bool(true)

Good catch, the tests will be updated soon.

@marcioAlmada
Copy link
Contributor Author

@jrnickell 66c17c0

@jrnickell
Copy link

@marcioAlmada awesome! Thanks

@marcioAlmada marcioAlmada force-pushed the lex-feedback branch 2 times, most recently from 9d6fcb1 to a0ef2bd Compare April 4, 2015 10:41
@marcioAlmada marcioAlmada mentioned this pull request Apr 5, 2015
7 tasks
@marcioAlmada
Copy link
Contributor Author

I'm also working on this other experiment #1221 as we still have some time.

@marcioAlmada
Copy link
Contributor Author

Closed in favor of #1221.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants