Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 8.0 | Add support for named function call parameters #3159

Closed
jrfnl opened this issue Nov 7, 2020 · 5 comments · Fixed by #3178
Closed

PHP 8.0 | Add support for named function call parameters #3159

jrfnl opened this issue Nov 7, 2020 · 5 comments · Fixed by #3178
Milestone

Comments

@jrfnl
Copy link
Contributor

jrfnl commented Nov 7, 2020

Introduction

PHP 8.0 introduces named function call parameters:

array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);

Ref: https://wiki.php.net/rfc/named_params

Current tokenization

PHP 8.0 natively tokenizes the parameter label as T_STRING. This is the same in PHP 7.4.
When a reserved keyword is used as a parameter label, the tokenizer will tokenize it as the token for the reserved keyword.

When run through the PHPCS tokenizer, the parameter label, in most cases, will currently be incorrectly retokenized as T_GOTO_LABEL.
The reserved keyword tokenization as the reserved keyword token remains the same in most cases, with a few exceptions where the token is retokenized as T_STRING.

The : will currently tokenize to T_COLON or T_INLINE_ELSE in PHPCS, depending on the wider context of the code sample.

Proposal for handling this in PHPCS

I'm currently looking into adding support for named function call parameters to PHPCS.

Based on my initial findings, I'd like to propose the following:

  • Introduce a new PHPCS native token constant T_PARAM_NAME (or T_PARAM_NAME_LABEL or T_PARAM_LABEL) and consistently tokenize the parameter label as such.
  • Ensure that the : following a parameter label is always tokenized as T_COLON.

Reasoning:

  • The new T_PARAM_NAME token will make it a lot easier to prevent confusion in the ternary tokenizer and can prevent the colon from being misidenitified as T_INLINE_ELSE.
  • T_STRING is already used for a lot of things and often enough it is hard to determine what a T_STRING signifies.
    Without the new token, any sniff looking for T_STRING would need to be checked and possibly adjusted to take parameter name labels into account when determining if this is a T_STRING the sniff should handle.
    With the new token, existing sniffs looking for T_STRING would not need any adjustments and would continue to work as they are.
  • For sniffs looking for the T_COLON token, with the new T_PARAM_NAME token, it will be easy to determine the context of the T_COLON.

@gsherwood What's your opinion on this proposal ?

Affected sniffs

I haven't made a full inventory yet of sniffs which will be affected.

At the very least, I imagine any and all sniffs which examine and fix "function call argument spacing" will need to be reviewed and adjusted to allow for parameter name labels.

@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 7, 2020

Little addition: I only just noticed that the name + the colon for a goto label are joined together into the T_GOTO_LABEL token. We could make the choice to do the same for the parameter name labels.

[Edit]

Probably better not to as whitespace and comments are technically allowed between the label and the colon, which would be awkward to handle if they were joined to be one token.

Note: I suspect this is also allowed for goto labels, which would make the current tokenization a bug. Goto is not used very often in PHP, so not surprising that there have never been bug reports about this.
If we'd want to fix the goto label bug by splitting it into two tokens - the label and the colon -, I'd suggest doing so in PHPCS 4.x as that would be a breaking change.

@gsherwood
Copy link
Member

I agree with everything in this proposal. A new T_PARAM_NAME token, with the T_COLON tokenized separately sounds like the best solution.

I'll have a look into that T_GOTO token and throw something in the 4.0 roadmap for it. Thanks for raising that.

@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 9, 2020

@gsherwood Thanks for the feedback. I'll continue work on this along those lines.

@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 28, 2020

Okay, so I've finished the work on this. The PR would conflict with the open PR goto fix PR #3177 for the package.xml file, but I can pull it anyway and rebase either one depending on which gets merged first.

I've written an abundance of unit tests into redundancy to accompany the PR, but haven't been able to come up with a code sample which would break the pattern I've used to detect named parameters in the PHP tokenizer class, which is basically: "open parenthesis or comma, followed by a word, followed by a colon (disregarding whitespace and comments)".

If anyone can think of a code sample with valid PHP code which uses that pattern, but isn't a named parameter, please leave a comment here.

jrfnl added a commit to jrfnl/PHP_CodeSniffer that referenced this issue Nov 28, 2020
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: squizlabs#3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
jrfnl added a commit to jrfnl/PHP_CodeSniffer that referenced this issue Nov 28, 2020
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: squizlabs#3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
jrfnl added a commit to jrfnl/PHP_CodeSniffer that referenced this issue Nov 28, 2020
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: squizlabs#3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
jrfnl added a commit to jrfnl/PHP_CodeSniffer that referenced this issue Nov 28, 2020
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: squizlabs#3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 28, 2020

Affected sniffs

I haven't made a full inventory yet of sniffs which will be affected.

At the very least, I imagine any and all sniffs which examine and fix "function call argument spacing" will need to be reviewed and adjusted to allow for parameter name labels.

I've ran every build-in standard against the test case file which is added in PR #3178 and I have not been able to find any sniff which will need adjusting for named parameters.

All sniffs still seem to work as expected.

There are three sniffs for which adding some unit tests could be considered, though their behaviour is fine as-is and the unit tests would only go to confirm and safeguard that:

  • Generic.Functions.FunctionCallArgumentSpacing
  • PEAR.Functions.FunctionCallSignature
  • PSR2.Functions.FunctionCallSignature

@gsherwood gsherwood added this to the 3.6.0 milestone Dec 11, 2020
gsherwood pushed a commit that referenced this issue Jan 14, 2021
PHP 8.0 introduces named function call parameters:
```php
array_fill(start_index: 0, num: 100, value: 50);

// Using reserved keywords as names is allowed.
array_foobar(array: $array, switch: $switch, class: $class);
```

Ref: https://wiki.php.net/rfc/named_params

This PR adds support to PHPCS for named function call arguments by adding a special custom token `T_PARAM_NAME` and tokenizing the _labels_ in function calls using named arguments to that new token, as per the proposal in 3159.

I also ensured that the colon _after_ a parameter label is always tokenized as `T_COLON`.

Includes some minor efficiency fixes to the code which deals with the colon vs inline else determination as there is no need to run the "is this a return type" or the "is this a `case` statement" checks if it has already been established that the colon is a colon and not an inline else.

Includes a ridiculous amount of unit tests to safeguard the correct tokenization of both the parameter label as well as the colon after it (and potential inline else colons in the same statement).
Please also see my comment about this here: #3159 (comment)

**Note**: The only code samples I could come up with which would result in "incorrect" tokenization to `T_PARAM_NAME` are all either parse errors or compile errors. I've elected to let those tokenize as `T_PARAM_NAME` anyway as:
1. When there is a parse error/compile error, there will be more tokenizer issues anyway, so working around those cases seems redundant.
2. The code will at least tokenize consistently (the same) across PHP versions. (which wasn't the case for parse errors/compile errors with numeric literals or arrow functions, which is why they needed additional safeguards previously).

Fixes 3159
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants