Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 8.0 | Add support for named function call arguments #3178

Conversation

jrfnl
Copy link
Contributor

@jrfnl jrfnl commented Nov 28, 2020

PHP 8.0 introduces named function call parameters:

array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token T_PARAM_NAME and tokenizing the labels in function calls using named arguments to that new token, as per the proposal in #3159.

I also ensured that the colon after a parameter label is always tokenized as T_COLON.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a case statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: #3159 (comment)

Note: The only code samples I could come up with which would result in "incorrect" tokenization to T_PARAM_NAME are all either parse errors or compile errors. I've elected to let those tokenize as T_PARAM_NAME anyway as:

  1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
  2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes #3159

@jrfnl jrfnl force-pushed the php-8/backfill-named-function-call-parameters-tokenization branch from bd3569c to da5156d Compare November 28, 2020 16:28
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: squizlabs#3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
*/

if ($tokenIsArray === true
&& preg_match('`^[a-zA-Z_\x80-\xff]`', $token[1]) === 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about running a preg_match on so many tokens, and then the following loop to find the next non-empty looking for a colon.

I probably would have tried tackling this problem starting with the colon and looking backwards, although I know that's going to be complex due to the different ways it can be used. I possibly would have also tried in processAdditional, but that might have been even more complex to unravel.

I'm curious to know if you tried any alternative implementations before I go and play around with the code to see how it's working.

Copy link
Contributor Author

@jrfnl jrfnl Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about running a preg_match on so many token

Well, as regexes go, this is as fast you can get: a regex checking for one character on a string of one character, so no chance of backtracing at all, it either matches or it doesn't.
The alternative would be to have a list of the characters we want to exclude and do either a strpos() or isset() or something, but then that would break straight away if a new special character would get meaning in PHP. It would definitely be less stable.

I probably would have tried tackling this problem starting with the colon and looking backward

That was my first choice for token to search for, but then the "final token" for the previous effective token (which is the one we need to change) may already have been set in a previous loop, and what with whitespace and comments allowed everywhere, that seemed like it was going to be pretty complicated as I'd then need to start walking the $finalTokens to potentially change a previously set "$finalToken".

I possibly would have also tried in processAdditional, but that might have been even more complex to unravel.

I consider that as well, but that would have been too late to prevent misidentified ternary "else" tokens and trying to recreate the whole if/else determination in processAdditional() after the fact, seemed like it would cause a huge overhead, aside from it probably not being stable. We've seen enough bugs with misidentified "inline else"'s over the years, especially with the colon continuously being used for more syntaxes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking - I could possibly make the regex condition a "T_STRING or matching the regex" condition, that way the most common token for identifiers would not be run through the regex, which would possibly be slightly faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I thought long and hard before deciding on this way to do it, considered sniffing based on the ( or , before it too. Every time I tried, there was something which would make it less stable or more complex to set the final token, which is why I ended up with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed you'd thought through those options, but just wanted to check before I dug into it. Bypassing the regex for T_STRING tokens is a good idea - I might give it a go. The extensive unit tests make this very easy to play around with, so thank you so much for coming up with all those test cases.

gsherwood added a commit that referenced this pull request Jan 14, 2021
@gsherwood gsherwood merged commit 0859719 into squizlabs:master Jan 14, 2021
PHPCS v3 Development automation moved this from Idea Bank to Ready for Release Jan 14, 2021
@gsherwood
Copy link
Member

Thanks so much for this. I did end up putting in your suggestion for checking for T_STRING before doing the regex as I could see a minor performance improvement overall in my test runs.

@jrfnl jrfnl deleted the php-8/backfill-named-function-call-parameters-tokenization branch January 14, 2021 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
PHPCS v3 Development
Ready for Release
Development

Successfully merging this pull request may close these issues.

PHP 8.0 | Add support for named function call parameters
2 participants