Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 8 | Tokenizer/PHP: add support for match expressions #3226

Merged
merged 6 commits into from
Feb 23, 2021

Conversation

jrfnl
Copy link
Contributor

@jrfnl jrfnl commented Feb 22, 2021

TL;DR

PHP 8.0 introduces a new type of control structure: match expressions.

A match expression is similar to a switch control structure but with safer semantics and the ability to return values.

Ref: https://wiki.php.net/rfc/match_expression_v2

This PR adds support for PHP 8.0 match expressions to PHPCS by backfilling the token for older PHP versions.
It also retokenizes the default keyword when used in match expressions to T_MATCH_DEFAULT and the double arrow separating the "case" from the return expression to T_MATCH_ARROW.

Note: if this PR is accepted as is, then, in contrast to the switch control structure, "cases" (including a potential default case) in match expressions will not be considered scope owners and will not have any scope_* array indexes set.

The match structure itself is considered a scope owner and will have those indexes set.

Includes extensive unit tests for all aspects of the PR.

Fixes #3037

Commit Details

PHP 8.0 | Tokens: replicate the T_MATCH token

PHP 8.0 | Tokens: add T_MATCH to $parenthesisOpeners and $scopeOpener

PHP 8.0 | Tokenizer/PHP: backfill the T_MATCH tokenization

This commit adds initial support for match expressions to PHPCS.

  • In PHP < 8: Retokenizes T_STRING tokens containing the match keyword to T_MATCH when they are in actual fact match expressions.
  • In PHP 8: Retokenizes T_MATCH tokens to T_STRING when the match keyword is used outside the context of a match expression, like in a method declaration or call.
  • Ensures that the match keyword for match expressions will be recognized as a scope owner and that the appropriate scope_* array indexes are set for the curly braces belonging to the match expression, as well as that the match condition is added to the tokens within the match expression.

Note: in contrast to switch control structures, "cases" in a match expression will not be recognized as scopes and no scope_owner, scope_opener or scope_closer array indexes will be set for the individual cases in a match expression.
This also applies to the default case when used in a match expression.

Includes extensive unit tests.
Note: at this point, not all unit tests will pass, this will be fixed in follow-on commits.

PHP 8.0 | Tokenizer/PHP: retokenize default keywords within match expressions as T_MATCH_DEFAULT

The default keyword in match expressions is tokenized as T_DEFAULT by PHP.

However, for switch control structures, default (and case) are treated as scope owners/openers by PHPCS.

If the default keyword tokenization in match expressions was left as-is, the Tokenizer::recurseScopeMap() would search for the wrong scope opener/closer, throwing the scope setting for scope owners in the rest of the file off.
Adding the typical opener/closers for match expression default cases to the PHP::$scopeOpeners array would not prevent this and once the scope setting is out of kilter, a lot of sniffs start failing, including the ScopeIndent sniffs, ScopeCloserBrace sniffs etc.

The only stable solution I could find to prevent the scope setting potentially getting borked and existing sniffs breaking badly, was to retokenize the default keyword when used in a match expression to T_MATCH_DEFAULT.

Includes a separate set of unit tests which specifically tests the tokenization of the default keyword and verifies both the retokenization to T_MATCH_DEFAULT when used in match expressions, as well as the tokenization of the default keyword within switch control structures, including the scope setting for the default case.

PHP 8.0 | Tokenizer/PHP: add support for match expressions in combination with arrow functions

As a match expression can be nested in an arrow function and visa versa, setting the scope openers/closers becomes interesting.

If the arrow function would be allowed to take over, it would break the scope setting for match expression, making the scope_closer and scope_condition indexes next to useless.

So in the interest of making things easier for sniff writers and keeping in mind that scope setting for arrow functions isn't perfect anyway, preference is given to keeping the scope setting for match structures intact.

This means that if a match expression is in the return value of an arrow function, like so:

$fn = fn($x) => match($y) {...};

... the token after the closing curly belonging to the match will be considered the scope closer for the arrow function. In this example, that is the ; (semicolon).

While, if an arrow function is the return value of one of the match expression cases, generally speaking the comma at the end of the case will be the scope closer for the arrow function.
There is one exception to this: the comma after each case is optional for the last case in a match expression.
In that situation, the last non-empty token will be considered the scope closer for the arrow function.

Example:

$match = match($y) {
    default => fn($x) => $y * $x
};

In this example, the $x at the end of the arrow expression would be considered the scope closer for the arrow function.

$match = match($y) {
    default => fn($x) => ($y * $x)
};

And in this example, the parenthesis closer at the end of the arrow expression would be considered the scope closer for the arrow function.

Tokenizer/PHP: retokenize the match double arrow to T_MATCH_ARROW

The double arrow in PHP is used in a number of contexts:

  • Long/short arrays with keys;
  • Long/short lists with keys;
  • In the as part of foreach() statements with keys;
  • For yield statements with keys;
  • For arrow functions as the scope opener;
  • And now for match expressions to separate the cases from the return value (body).

As most of these constructs can be nested in each other - an arrow function in an array value, a match expression in a list key -, every sniff handling any of these constructs has to take a lot of care when searching for the double arrow for the construct they are handling, to prevent matching a double arrow belonging to another type of construct nested in the target construct.

This type of detection and special handling has to be done in each individual sniff which in one way or another has to deal with the T_DOUBLE_ARROW token and can cause quite some processing overhead.

With that in mind, the double arrow as a scope opener for arrow functions has previously already been retokenized to T_FN_ARROW.

Following the same reasoning, I'm proposing to retokenize the double arrow which separates match case expressions from the body expression to T_MATCH_ARROW.

This should make life easier for any sniff dealing with any of the above constructs and will prevent potential false positives being introduced for sniffs currently handling any of these constructs, but not yet updated to allow for match expressions.

Includes a set of dedicated unit tests verifying the tokenization of the double arrow operator in all currently supported contexts, including in combined (nested) contexts.

PHP 8.0 introduces a new type of control structure: match expressions.
> A match expression is similar to a `switch` control structure but with safer semantics and the ability to return values.

Ref: https://wiki.php.net/rfc/match_expression_v2

This commit adds initial support for match expressions to PHPCS.

* In PHP < 8: Retokenizes `T_STRING` tokens containing the `match` keyword to `T_MATCH` when they are in actual fact match expressions.
* In PHP 8: Retokenizes `T_MATCH` tokens to `T_STRING` when the `match` keyword is used outside the context of a match expression, like in a method declaration or call.
* Ensures that the `match` keyword for match expressions will be recognized as a scope owner and that the appropriate `scope_*` array indexes are set for the curly braces belonging to the match expression, as well as that the `match` condition is added to the tokens within the match expression.

Note: in contrast to `switch` control structures, "cases" in a match expression will not be recognized as scopes and no `scope_owner`, `scope_opener` or `scope_closer` array indexes will be set for the individual cases in a match expression.
This also applies to the `default` case when used in a match expression.

Includes extensive unit tests.
Note: at this point, not all unit tests will pass, this will be fixed in follow-on commits.
…xpressions as `T_MATCH_DEFAULT`

The `default` keyword in match expressions is tokenized as `T_DEFAULT` by PHP.

However, for `switch` control structures, `default` (and `case`) are treated as scope owners/openers by PHPCS.

If the `default` keyword tokenization in match expressions was left as-is, the `Tokenizer::recurseScopeMap()` would search for the wrong scope opener/closer, throwing the scope setting for scope owners in the rest of the file off.
Adding the typical opener/closers for match expression `default` cases to the `PHP::$scopeOpeners` array would not prevent this and once the scope setting is out of kilter, a lot of sniffs start failing, including the `ScopeIndent` sniffs, `ScopeCloserBrace` sniffs etc.

The only stable solution I could find to prevent the scope setting potentially getting borked and existing sniffs breaking badly, was to retokenize the `default` keyword when used in a `match` expression to `T_MATCH_DEFAULT`.

Includes a separate set of unit tests which specifically tests the tokenization of the `default` keyword and verifies both the retokenization to `T_MATCH_DEFAULT` when used in `match` expressions, as well as the tokenization of the `default` keyword within `switch` control structures, including the scope setting for the `default` case.
…tion with arrow functions

As a match expression can be nested in an arrow function and visa versa, setting the scope openers/closers becomes _interesting_.

If the arrow function would be allowed to take over, it would break the scope setting for match expression, making the `scope_closer` and `scope_condition` indexes next to useless.

So in the interest of making things easier for sniff writers and keeping in mind that scope setting for arrow functions isn't perfect anyway, preference is given to keeping the scope setting for match structures intact.

This means that if a match expression is in the return value of an arrow function, like so:
```php
$fn = fn($x) => match($y) {...};
```
... the token _after_ the closing curly belonging to the `match` will be considered the scope closer for the arrow function. In this example, that is the `;` (semicolon).

While, if an arrow function is the return value of one of the match expression cases, generally speaking the comma at the end of the case will be the scope closer for the arrow function.
There is one exception to this: the comma after each case is optional for the last case in a match expression.
In that situation, the last non-empty token will be considered the scope closer for the arrow function.

Example:
```php
$match = match($y) {
    default => fn($x) => $y * $x
};
```
In this example, the `$x` at the end of the arrow expression would be considered the scope closer for the arrow function.

```php
$match = match($y) {
    default => fn($x) => ($y * $x)
};
```
And in this example, the parenthesis closer at the end of the arrow expression would be considered the scope closer for the arrow function.
The double arrow in PHP is used in a number of contexts:
* Long/short arrays with keys;
* Long/short lists with keys;
* In the `as` part of `foreach()` statements with keys;
* For `yield` statements with keys;
* For arrow functions as the scope opener;
* And now for `match` expressions to separate the cases from the return value (body).

As most of these constructs can be nested in each other - an arrow function in an array value, a match expression in a list key -, every sniff handling any of these constructs has to take a lot of care when searching for the double arrow for the construct they are handling, to prevent matching a double arrow belonging to another type of construct nested in the target construct.

This type of detection and special handling has to be done in each individual sniff which in one way or another has to deal with the `T_DOUBLE_ARROW` token and can cause quite some processing overhead.

With that in mind, the double arrow as a scope opener for arrow functions has previously already been retokenized to `T_FN_ARROW`.

Following the same reasoning, I'm proposing to retokenize the double arrow which separates `match` case expressions from the body expression to `T_MATCH_ARROW`.

This should make life easier for any sniff dealing with any of the above constructs and will prevent potential false positives being introduced for sniffs currently handling any of these constructs, but not yet updated to allow for match expressions.

Includes a set of dedicated unit tests verifying the tokenization of the double arrow operator in all currently supported contexts, including in combined (nested) contexts.
@gsherwood
Copy link
Member

I have not gone through the code yet, but 100% agree with the new T_MATCH_ARROW and T_MATCH_DEFAULT tokens. Makes perfect sense the way you've described it all.

Also, match expressions mixed with arrow functions 🤯 but all the solutions you laid out sound very logical to me.

Will finish the review soon and get this merged in so you can pull those sniff changes.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 22, 2021

Appreciated and I'm glad to hear you agree with the premises on which I have based the PR.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 22, 2021

Also if anyone would still be in doubt: have a good look through the (often very convoluted) code in the unit test case files. Aside from the few cases where I've explicitly annotated it, all the code in the test case files is valid code in PHP 8+...

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 22, 2021

@gsherwood Oh and before I forget: there are a few cases in the backfillMatchToken test case file - testNoMatchNamespacedFunctionCall and testNoMatchNamespaceOperatorFunctionCall - which will tokenize differently in PHPCS 4.x, so will probably need to be removed when you cherrypick the commits to 4.x.

gsherwood added a commit that referenced this pull request Feb 23, 2021
@gsherwood gsherwood merged commit f073df4 into squizlabs:master Feb 23, 2021
gsherwood added a commit that referenced this pull request Feb 23, 2021
@gsherwood gsherwood added this to Idea Bank in PHPCS v3 Development via automation Feb 23, 2021
@gsherwood gsherwood added this to the 3.6.0 milestone Feb 23, 2021
@gsherwood
Copy link
Member

Thank you! 🎉 Amazing to have support for match expressions in PHPCS.

There are some fun code snippets in those tests :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

PHP 8.0 | Add support for match expressions
2 participants