-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHP 8 | Tokenizer/PHP: add support for match expressions #3226
PHP 8 | Tokenizer/PHP: add support for match expressions #3226
Conversation
PHP 8.0 introduces a new type of control structure: match expressions. > A match expression is similar to a `switch` control structure but with safer semantics and the ability to return values. Ref: https://wiki.php.net/rfc/match_expression_v2 This commit adds initial support for match expressions to PHPCS. * In PHP < 8: Retokenizes `T_STRING` tokens containing the `match` keyword to `T_MATCH` when they are in actual fact match expressions. * In PHP 8: Retokenizes `T_MATCH` tokens to `T_STRING` when the `match` keyword is used outside the context of a match expression, like in a method declaration or call. * Ensures that the `match` keyword for match expressions will be recognized as a scope owner and that the appropriate `scope_*` array indexes are set for the curly braces belonging to the match expression, as well as that the `match` condition is added to the tokens within the match expression. Note: in contrast to `switch` control structures, "cases" in a match expression will not be recognized as scopes and no `scope_owner`, `scope_opener` or `scope_closer` array indexes will be set for the individual cases in a match expression. This also applies to the `default` case when used in a match expression. Includes extensive unit tests. Note: at this point, not all unit tests will pass, this will be fixed in follow-on commits.
…xpressions as `T_MATCH_DEFAULT` The `default` keyword in match expressions is tokenized as `T_DEFAULT` by PHP. However, for `switch` control structures, `default` (and `case`) are treated as scope owners/openers by PHPCS. If the `default` keyword tokenization in match expressions was left as-is, the `Tokenizer::recurseScopeMap()` would search for the wrong scope opener/closer, throwing the scope setting for scope owners in the rest of the file off. Adding the typical opener/closers for match expression `default` cases to the `PHP::$scopeOpeners` array would not prevent this and once the scope setting is out of kilter, a lot of sniffs start failing, including the `ScopeIndent` sniffs, `ScopeCloserBrace` sniffs etc. The only stable solution I could find to prevent the scope setting potentially getting borked and existing sniffs breaking badly, was to retokenize the `default` keyword when used in a `match` expression to `T_MATCH_DEFAULT`. Includes a separate set of unit tests which specifically tests the tokenization of the `default` keyword and verifies both the retokenization to `T_MATCH_DEFAULT` when used in `match` expressions, as well as the tokenization of the `default` keyword within `switch` control structures, including the scope setting for the `default` case.
…tion with arrow functions As a match expression can be nested in an arrow function and visa versa, setting the scope openers/closers becomes _interesting_. If the arrow function would be allowed to take over, it would break the scope setting for match expression, making the `scope_closer` and `scope_condition` indexes next to useless. So in the interest of making things easier for sniff writers and keeping in mind that scope setting for arrow functions isn't perfect anyway, preference is given to keeping the scope setting for match structures intact. This means that if a match expression is in the return value of an arrow function, like so: ```php $fn = fn($x) => match($y) {...}; ``` ... the token _after_ the closing curly belonging to the `match` will be considered the scope closer for the arrow function. In this example, that is the `;` (semicolon). While, if an arrow function is the return value of one of the match expression cases, generally speaking the comma at the end of the case will be the scope closer for the arrow function. There is one exception to this: the comma after each case is optional for the last case in a match expression. In that situation, the last non-empty token will be considered the scope closer for the arrow function. Example: ```php $match = match($y) { default => fn($x) => $y * $x }; ``` In this example, the `$x` at the end of the arrow expression would be considered the scope closer for the arrow function. ```php $match = match($y) { default => fn($x) => ($y * $x) }; ``` And in this example, the parenthesis closer at the end of the arrow expression would be considered the scope closer for the arrow function.
The double arrow in PHP is used in a number of contexts: * Long/short arrays with keys; * Long/short lists with keys; * In the `as` part of `foreach()` statements with keys; * For `yield` statements with keys; * For arrow functions as the scope opener; * And now for `match` expressions to separate the cases from the return value (body). As most of these constructs can be nested in each other - an arrow function in an array value, a match expression in a list key -, every sniff handling any of these constructs has to take a lot of care when searching for the double arrow for the construct they are handling, to prevent matching a double arrow belonging to another type of construct nested in the target construct. This type of detection and special handling has to be done in each individual sniff which in one way or another has to deal with the `T_DOUBLE_ARROW` token and can cause quite some processing overhead. With that in mind, the double arrow as a scope opener for arrow functions has previously already been retokenized to `T_FN_ARROW`. Following the same reasoning, I'm proposing to retokenize the double arrow which separates `match` case expressions from the body expression to `T_MATCH_ARROW`. This should make life easier for any sniff dealing with any of the above constructs and will prevent potential false positives being introduced for sniffs currently handling any of these constructs, but not yet updated to allow for match expressions. Includes a set of dedicated unit tests verifying the tokenization of the double arrow operator in all currently supported contexts, including in combined (nested) contexts.
I have not gone through the code yet, but 100% agree with the new Also, match expressions mixed with arrow functions 🤯 but all the solutions you laid out sound very logical to me. Will finish the review soon and get this merged in so you can pull those sniff changes. |
Appreciated and I'm glad to hear you agree with the premises on which I have based the PR. |
Also if anyone would still be in doubt: have a good look through the (often very convoluted) code in the unit test case files. Aside from the few cases where I've explicitly annotated it, all the code in the test case files is valid code in PHP 8+... |
@gsherwood Oh and before I forget: there are a few cases in the |
Thank you! 🎉 Amazing to have support for match expressions in PHPCS. There are some fun code snippets in those tests :) |
TL;DR
PHP 8.0 introduces a new type of control structure: match expressions.
Ref: https://wiki.php.net/rfc/match_expression_v2
This PR adds support for PHP 8.0 match expressions to PHPCS by backfilling the token for older PHP versions.
It also retokenizes the
default
keyword when used in match expressions toT_MATCH_DEFAULT
and the double arrow separating the "case" from the return expression toT_MATCH_ARROW
.Note: if this PR is accepted as is, then, in contrast to the
switch
control structure, "cases" (including a potentialdefault
case) in match expressions will not be considered scope owners and will not have anyscope_*
array indexes set.The match structure itself is considered a scope owner and will have those indexes set.
Includes extensive unit tests for all aspects of the PR.
Fixes #3037
Commit Details
PHP 8.0 | Tokens: replicate the
T_MATCH
tokenPHP 8.0 | Tokens: add T_MATCH to $parenthesisOpeners and $scopeOpener
PHP 8.0 | Tokenizer/PHP: backfill the
T_MATCH
tokenizationThis commit adds initial support for match expressions to PHPCS.
T_STRING
tokens containing thematch
keyword toT_MATCH
when they are in actual fact match expressions.T_MATCH
tokens toT_STRING
when thematch
keyword is used outside the context of a match expression, like in a method declaration or call.match
keyword for match expressions will be recognized as a scope owner and that the appropriatescope_*
array indexes are set for the curly braces belonging to the match expression, as well as that thematch
condition is added to the tokens within the match expression.Note: in contrast to
switch
control structures, "cases" in a match expression will not be recognized as scopes and noscope_owner
,scope_opener
orscope_closer
array indexes will be set for the individual cases in a match expression.This also applies to the
default
case when used in a match expression.Includes extensive unit tests.
Note: at this point, not all unit tests will pass, this will be fixed in follow-on commits.
PHP 8.0 | Tokenizer/PHP: retokenize
default
keywords within match expressions asT_MATCH_DEFAULT
The
default
keyword in match expressions is tokenized asT_DEFAULT
by PHP.However, for
switch
control structures,default
(andcase
) are treated as scope owners/openers by PHPCS.If the
default
keyword tokenization in match expressions was left as-is, theTokenizer::recurseScopeMap()
would search for the wrong scope opener/closer, throwing the scope setting for scope owners in the rest of the file off.Adding the typical opener/closers for match expression
default
cases to thePHP::$scopeOpeners
array would not prevent this and once the scope setting is out of kilter, a lot of sniffs start failing, including theScopeIndent
sniffs,ScopeCloserBrace
sniffs etc.The only stable solution I could find to prevent the scope setting potentially getting borked and existing sniffs breaking badly, was to retokenize the
default
keyword when used in amatch
expression toT_MATCH_DEFAULT
.Includes a separate set of unit tests which specifically tests the tokenization of the
default
keyword and verifies both the retokenization toT_MATCH_DEFAULT
when used inmatch
expressions, as well as the tokenization of thedefault
keyword withinswitch
control structures, including the scope setting for thedefault
case.PHP 8.0 | Tokenizer/PHP: add support for match expressions in combination with arrow functions
As a match expression can be nested in an arrow function and visa versa, setting the scope openers/closers becomes interesting.
If the arrow function would be allowed to take over, it would break the scope setting for match expression, making the
scope_closer
andscope_condition
indexes next to useless.So in the interest of making things easier for sniff writers and keeping in mind that scope setting for arrow functions isn't perfect anyway, preference is given to keeping the scope setting for match structures intact.
This means that if a match expression is in the return value of an arrow function, like so:
... the token after the closing curly belonging to the
match
will be considered the scope closer for the arrow function. In this example, that is the;
(semicolon).While, if an arrow function is the return value of one of the match expression cases, generally speaking the comma at the end of the case will be the scope closer for the arrow function.
There is one exception to this: the comma after each case is optional for the last case in a match expression.
In that situation, the last non-empty token will be considered the scope closer for the arrow function.
Example:
In this example, the
$x
at the end of the arrow expression would be considered the scope closer for the arrow function.And in this example, the parenthesis closer at the end of the arrow expression would be considered the scope closer for the arrow function.
Tokenizer/PHP: retokenize the match double arrow to T_MATCH_ARROW
The double arrow in PHP is used in a number of contexts:
as
part offoreach()
statements with keys;yield
statements with keys;match
expressions to separate the cases from the return value (body).As most of these constructs can be nested in each other - an arrow function in an array value, a match expression in a list key -, every sniff handling any of these constructs has to take a lot of care when searching for the double arrow for the construct they are handling, to prevent matching a double arrow belonging to another type of construct nested in the target construct.
This type of detection and special handling has to be done in each individual sniff which in one way or another has to deal with the
T_DOUBLE_ARROW
token and can cause quite some processing overhead.With that in mind, the double arrow as a scope opener for arrow functions has previously already been retokenized to
T_FN_ARROW
.Following the same reasoning, I'm proposing to retokenize the double arrow which separates
match
case expressions from the body expression toT_MATCH_ARROW
.This should make life easier for any sniff dealing with any of the above constructs and will prevent potential false positives being introduced for sniffs currently handling any of these constructs, but not yet updated to allow for match expressions.
Includes a set of dedicated unit tests verifying the tokenization of the double arrow operator in all currently supported contexts, including in combined (nested) contexts.