PHP 8.0 | Add support for named function call arguments #3178

Merged
6 changes: 6 additions & 0 deletions package.xml
@@ -157,6 +157,8 @@ http://pear.php.net/dtd/package-2.0.xsd">
<file baseinstalldir="" name="BackfillNumericSeparatorTest.php" role="test" />
<file baseinstalldir="" name="BitwiseOrTest.inc" role="test" />
<file baseinstalldir="" name="BitwiseOrTest.php" role="test" />
<file baseinstalldir="" name="NamedFunctionCallArgumentsTest.inc" role="test" />
<file baseinstalldir="" name="NamedFunctionCallArgumentsTest.php" role="test" />
<file baseinstalldir="" name="NullsafeObjectOperatorTest.inc" role="test" />
<file baseinstalldir="" name="NullsafeObjectOperatorTest.php" role="test" />
<file baseinstalldir="" name="ScopeSettingWithNamespaceOperatorTest.inc" role="test" />
@@ -2045,6 +2047,8 @@ http://pear.php.net/dtd/package-2.0.xsd">
<install as="CodeSniffer/Core/Tokenizer/BackfillNumericSeparatorTest.inc" name="tests/Core/Tokenizer/BackfillNumericSeparatorTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/BitwiseOrTest.php" name="tests/Core/Tokenizer/BitwiseOrTest.php" />
<install as="CodeSniffer/Core/Tokenizer/BitwiseOrTest.inc" name="tests/Core/Tokenizer/BitwiseOrTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/NamedFunctionCallArgumentsTest.php" name="tests/Core/Tokenizer/NamedFunctionCallArgumentsTest.php" />
<install as="CodeSniffer/Core/Tokenizer/NamedFunctionCallArgumentsTest.inc" name="tests/Core/Tokenizer/NamedFunctionCallArgumentsTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/NullsafeObjectOperatorTest.php" name="tests/Core/Tokenizer/NullsafeObjectOperatorTest.php" />
<install as="CodeSniffer/Core/Tokenizer/NullsafeObjectOperatorTest.inc" name="tests/Core/Tokenizer/NullsafeObjectOperatorTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/ScopeSettingWithNamespaceOperatorTest.php" name="tests/Core/Tokenizer/ScopeSettingWithNamespaceOperatorTest.php" />
@@ -2117,6 +2121,8 @@ http://pear.php.net/dtd/package-2.0.xsd">
<install as="CodeSniffer/Core/Tokenizer/BackfillNumericSeparatorTest.inc" name="tests/Core/Tokenizer/BackfillNumericSeparatorTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/BitwiseOrTest.php" name="tests/Core/Tokenizer/BitwiseOrTest.php" />
<install as="CodeSniffer/Core/Tokenizer/BitwiseOrTest.inc" name="tests/Core/Tokenizer/BitwiseOrTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/NamedFunctionCallArgumentsTest.php" name="tests/Core/Tokenizer/NamedFunctionCallArgumentsTest.php" />
<install as="CodeSniffer/Core/Tokenizer/NamedFunctionCallArgumentsTest.inc" name="tests/Core/Tokenizer/NamedFunctionCallArgumentsTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/NullsafeObjectOperatorTest.php" name="tests/Core/Tokenizer/NullsafeObjectOperatorTest.php" />
<install as="CodeSniffer/Core/Tokenizer/NullsafeObjectOperatorTest.inc" name="tests/Core/Tokenizer/NullsafeObjectOperatorTest.inc" />
<install as="CodeSniffer/Core/Tokenizer/ScopeSettingWithNamespaceOperatorTest.php" name="tests/Core/Tokenizer/ScopeSettingWithNamespaceOperatorTest.php" />
168 changes: 123 additions & 45 deletions src/Tokenizers/PHP.php
@@ -893,6 +893,62 @@ protected function tokenize($string)
continue;
}//end if

/*
Tokenize the parameter labels for PHP 8.0 named parameters as a special T_PARAM_NAME
token and ensure that the colon after it is always T_COLON.
*/

if ($tokenIsArray === true
&& preg_match('`^[a-zA-Z_\x80-\xff]`', $token[1]) === 1
Member

I'm a bit worried about running a preg_match on so many tokens, and then the following loop to find the next non-empty looking for a colon.

I probably would have tried tackling this problem starting with the colon and looking backwards, although I know that's going to be complex due to the different ways it can be used. I possibly would have also tried in processAdditional, but that might have been even more complex to unravel.

I'm curious to know if you tried any alternative implementations before I go and play around with the code to see how it's working.

Contributor Author

@jrfnl Dec 11, 2020

> I'm a bit worried about running a preg_match on so many tokens

Well, as regexes go, this is as fast as you can get: a regex checking for one character on a string of one character, so no chance of backtracking at all; it either matches or it doesn't.
The alternative would be to have a list of the characters we want to exclude and do a strpos() or isset() check or similar, but that would break straight away if a new special character were to gain meaning in PHP. It would definitely be less stable.
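
To make the check discussed above easier to try out, here is a small standalone sketch that applies the same anchored character-class pattern to a few token contents. It is not code from the PR: the helper name `startsLikeIdentifier()` and the sample strings are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper, for illustration only: applies the pattern used in the
// diff to decide whether a token's content could start an identifier.
function startsLikeIdentifier(string $content): bool
{
    // Anchored single character class: only the first byte is inspected,
    // so there is nothing for the regex engine to backtrack over.
    return preg_match('`^[a-zA-Z_\x80-\xff]`', $content) === 1;
}

// Sample token contents (assumed examples, not taken from the test suite).
$samples = ['label', '_private', 'array', '$variable', '123', '"string"', ' '];

foreach ($samples as $content) {
    echo var_export($content, true), ' => ',
        var_export(startsLikeIdentifier($content), true), PHP_EOL;
}
```

Note that a keyword such as `array` passes the check, which is relevant because keywords can be used as parameter labels, while variables, numbers, quoted strings and whitespace do not.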

> I probably would have tried tackling this problem starting with the colon and looking backwards

That was my first choice for the token to search for, but by then the final token for the previous effective token (which is the one we need to change) may already have been set in a previous loop iteration. With whitespace and comments allowed everywhere, that looked like it would get pretty complicated, as I'd then need to walk back through $finalTokens to potentially change a previously set final token.

> I possibly would have also tried in processAdditional, but that might have been even more complex to unravel.

I considered that as well, but that would have been too late to prevent misidentified ternary "else" tokens. Trying to recreate the whole if/else determination in processAdditional() after the fact seemed like it would cause a huge overhead, aside from probably not being stable. We've seen enough bugs with misidentified "inline else" tokens over the years, especially with the colon continuously being used for more syntaxes.
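
As an illustration of how many roles the colon can play in one short piece of code, the following sketch mixes a return type, a `case`/`default` opener, named parameter labels and a ternary. It requires PHP 8.0 or later, and every name in it (`describe()`, `$verbose`, the argument values) is an assumption made up for this example rather than something taken from the PR's test file.

```php
<?php
// Requires PHP 8.0+ (named arguments). All names are illustrative assumptions.

// Colon as return type separator.
function describe(int $code, string $fallback = 'unknown'): string
{
    switch ($code) {
        // Colon as case/default opener.
        case 200:
            return 'ok';
        default:
            return $fallback;
    }
}

$verbose = true;

// Named parameter labels ("code:", "fallback:") appear on both sides of the
// ternary's own "else" colon, which is the only colon on this line that
// should end up as T_INLINE_ELSE.
$result = $verbose ? describe(code: 404, fallback: 'not found') : describe(code: 404);

echo $result, PHP_EOL;
```

If the labels were not recognised before the inline-else check runs, the first colon after the `?` would be the one at risk of being mistaken for the ternary "else".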

Contributor Author

Just thinking - I could possibly make the regex condition a "T_STRING or matching the regex" condition, that way the most common token for identifiers would not be run through the regex, which would possibly be slightly faster.
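
A minimal sketch of what such a combined condition could look like, written against PHP's native `token_get_all()` token shape. The function name `couldBeParamLabel()` is hypothetical and this is not the code that was merged.

```php
<?php
// Hypothetical helper sketching the "T_STRING or regex" idea; the function
// name is an assumption, not code from the PR.
function couldBeParamLabel($token): bool
{
    // Single-character tokens from token_get_all() are plain strings.
    if (is_array($token) === false) {
        return false;
    }

    // Fast path: the most common identifier token skips the regex entirely.
    if ($token[0] === T_STRING) {
        return true;
    }

    // Slow path: keywords such as "array" have their own token type but can
    // still be used as parameter labels, so fall back to the regex.
    return preg_match('`^[a-zA-Z_\x80-\xff]`', $token[1]) === 1;
}

foreach (token_get_all('<?php foo(array: 1, label: 2);') as $token) {
    if (couldBeParamLabel($token) === true) {
        echo token_name($token[0]), ' "', $token[1], '"', PHP_EOL;
    }
}
```

The check only narrows down candidates; the function name `foo` passes it as well, so the surrounding tokens still have to be examined before a token can be treated as a label.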

Contributor Author

And yes, I thought long and hard before deciding on this way to do it, considered sniffing based on the ( or , before it too. Every time I tried, there was something which would make it less stable or more complex to set the final token, which is why I ended up with this.

Member

I assumed you'd thought through those options, but just wanted to check before I dug into it. Bypassing the regex for T_STRING tokens is a good idea - I might give it a go. The extensive unit tests make this very easy to play around with, so thank you so much for coming up with all those test cases.

) {
// Get the next non-empty token.
for ($i = ($stackPtr + 1); $i < $numTokens; $i++) {
if (is_array($tokens[$i]) === false
|| isset(Util\Tokens::$emptyTokens[$tokens[$i][0]]) === false
) {
break;
}
}

if (isset($tokens[$i]) === true
&& is_array($tokens[$i]) === false
&& $tokens[$i] === ':'
) {
// Get the previous non-empty token.
for ($j = ($stackPtr - 1); $j > 0; $j--) {
if (is_array($tokens[$j]) === false
|| isset(Util\Tokens::$emptyTokens[$tokens[$j][0]]) === false
) {
break;
}
}

if (is_array($tokens[$j]) === false
&& ($tokens[$j] === '('
|| $tokens[$j] === ',')
) {
$newToken = [];
$newToken['code'] = T_PARAM_NAME;
$newToken['type'] = 'T_PARAM_NAME';
$newToken['content'] = $token[1];
$finalTokens[$newStackPtr] = $newToken;

$newStackPtr++;

// Modify the original token stack so that future checks, like
// determining T_COLON vs T_INLINE_ELSE can handle this correctly.
$tokens[$stackPtr][0] = T_PARAM_NAME;

if (PHP_CODESNIFFER_VERBOSITY > 1) {
$type = Util\Tokens::tokenName($token[0]);
echo "\t\t* token $stackPtr changed from $type to T_PARAM_NAME".PHP_EOL;
}

continue;
}
}//end if
}//end if

/*
Before PHP 7.0, the "yield from" was tokenized as
T_YIELD, T_WHITESPACE and T_STRING. So look for
@@ -1700,76 +1756,98 @@ function return types. We want to keep the parenthesis map clean,
// Convert colons that are actually the ELSE component of an
// inline IF statement.
if (empty($insideInlineIf) === false && $newToken['code'] === T_COLON) {
// Make sure this isn't a return type separator.
$isInlineIf = true;

// Make sure this isn't a named parameter label.
// Get the previous non-empty token.
for ($i = ($stackPtr - 1); $i > 0; $i--) {
if (is_array($tokens[$i]) === false
|| ($tokens[$i][0] !== T_DOC_COMMENT
&& $tokens[$i][0] !== T_COMMENT
&& $tokens[$i][0] !== T_WHITESPACE)
|| isset(Util\Tokens::$emptyTokens[$tokens[$i][0]]) === false
) {
break;
}
}

if ($tokens[$i] === ')') {
$parenCount = 1;
for ($i--; $i > 0; $i--) {
if ($tokens[$i] === '(') {
$parenCount--;
if ($parenCount === 0) {
break;
}
} else if ($tokens[$i] === ')') {
$parenCount++;
}
if ($tokens[$i][0] === T_PARAM_NAME) {
$isInlineIf = false;
if (PHP_CODESNIFFER_VERBOSITY > 1) {
echo "\t\t* token is parameter label, not T_INLINE_ELSE".PHP_EOL;
}
}

// We've found the open parenthesis, so if the previous
// non-empty token is FUNCTION or USE, this is a return type.
// Note that we need to skip T_STRING tokens here as these
// can be function names.
for ($i--; $i > 0; $i--) {
if ($isInlineIf === true) {
// Make sure this isn't a return type separator.
for ($i = ($stackPtr - 1); $i > 0; $i--) {
if (is_array($tokens[$i]) === false
|| ($tokens[$i][0] !== T_DOC_COMMENT
&& $tokens[$i][0] !== T_COMMENT
&& $tokens[$i][0] !== T_WHITESPACE
&& $tokens[$i][0] !== T_STRING)
&& $tokens[$i][0] !== T_WHITESPACE)
) {
break;
}
}

if ($tokens[$i][0] === T_FUNCTION || $tokens[$i][0] === T_FN || $tokens[$i][0] === T_USE) {
$isInlineIf = false;
if (PHP_CODESNIFFER_VERBOSITY > 1) {
echo "\t\t* token is return type, not T_INLINE_ELSE".PHP_EOL;
if ($tokens[$i] === ')') {
$parenCount = 1;
for ($i--; $i > 0; $i--) {
if ($tokens[$i] === '(') {
$parenCount--;
if ($parenCount === 0) {
break;
}
} else if ($tokens[$i] === ')') {
$parenCount++;
}
}
}

// We've found the open parenthesis, so if the previous
// non-empty token is FUNCTION or USE, this is a return type.
// Note that we need to skip T_STRING tokens here as these
// can be function names.
for ($i--; $i > 0; $i--) {
if (is_array($tokens[$i]) === false
|| ($tokens[$i][0] !== T_DOC_COMMENT
&& $tokens[$i][0] !== T_COMMENT
&& $tokens[$i][0] !== T_WHITESPACE
&& $tokens[$i][0] !== T_STRING)
) {
break;
}
}

if ($tokens[$i][0] === T_FUNCTION || $tokens[$i][0] === T_FN || $tokens[$i][0] === T_USE) {
$isInlineIf = false;
if (PHP_CODESNIFFER_VERBOSITY > 1) {
echo "\t\t* token is return type, not T_INLINE_ELSE".PHP_EOL;
}
}
}//end if
}//end if

// Check to see if this is a CASE or DEFAULT opener.
$inlineIfToken = $insideInlineIf[(count($insideInlineIf) - 1)];
for ($i = $stackPtr; $i > $inlineIfToken; $i--) {
if (is_array($tokens[$i]) === true
&& ($tokens[$i][0] === T_CASE
|| $tokens[$i][0] === T_DEFAULT)
) {
$isInlineIf = false;
if (PHP_CODESNIFFER_VERBOSITY > 1) {
echo "\t\t* token is T_CASE or T_DEFAULT opener, not T_INLINE_ELSE".PHP_EOL;
}
if ($isInlineIf === true) {
$inlineIfToken = $insideInlineIf[(count($insideInlineIf) - 1)];
for ($i = $stackPtr; $i > $inlineIfToken; $i--) {
if (is_array($tokens[$i]) === true
&& ($tokens[$i][0] === T_CASE
|| $tokens[$i][0] === T_DEFAULT)
) {
$isInlineIf = false;
if (PHP_CODESNIFFER_VERBOSITY > 1) {
echo "\t\t* token is T_CASE or T_DEFAULT opener, not T_INLINE_ELSE".PHP_EOL;
}

break;
}
break;
}

if (is_array($tokens[$i]) === false
&& ($tokens[$i] === ';'
|| $tokens[$i] === '{')
) {
break;
if (is_array($tokens[$i]) === false
&& ($tokens[$i] === ';'
|| $tokens[$i] === '{')
) {
break;
}
}
}
}//end if

if ($isInlineIf === true) {
array_pop($insideInlineIf);
1 change: 1 addition & 0 deletions src/Util/Tokens.php
@@ -76,6 +76,7 @@
define('T_ZSR_EQUAL', 'PHPCS_T_ZSR_EQUAL');
define('T_FN_ARROW', 'T_FN_ARROW');
define('T_TYPE_UNION', 'T_TYPE_UNION');
define('T_PARAM_NAME', 'T_PARAM_NAME');

// Some PHP 5.5 tokens, replicated for lower versions.
if (defined('T_FINALLY') === false) {
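
The label detection added to src/Tokenizers/PHP.php can be approximated outside PHP_CodeSniffer for experimentation. The sketch below follows the same three steps as the diff (identifier-like token, next non-empty token is a colon, previous non-empty token is `(` or `,`) but runs on PHP's native `token_get_all()` output. The function name `findParamLabels()`, the local `$empty` list standing in for `Util\Tokens::$emptyTokens`, and the sample input are all assumptions.

```php
<?php
// Standalone sketch, assuming PHP's native token_get_all() output rather than
// PHP_CodeSniffer's internal token stack. findParamLabels() is a hypothetical
// name used for illustration only.
function findParamLabels(string $code): array
{
    $tokens    = token_get_all($code);
    $numTokens = count($tokens);
    // Local stand-in for the tokenizer's list of "empty" tokens.
    $empty     = [T_WHITESPACE => true, T_COMMENT => true, T_DOC_COMMENT => true];
    $labels    = [];

    for ($ptr = 0; $ptr < $numTokens; $ptr++) {
        $token = $tokens[$ptr];
        if (is_array($token) === false
            || preg_match('`^[a-zA-Z_\x80-\xff]`', $token[1]) !== 1
        ) {
            continue;
        }

        // Step 1: the next non-empty token must be a colon.
        for ($i = ($ptr + 1); $i < $numTokens; $i++) {
            if (is_array($tokens[$i]) === false || isset($empty[$tokens[$i][0]]) === false) {
                break;
            }
        }

        if (isset($tokens[$i]) === false || $tokens[$i] !== ':') {
            continue;
        }

        // Step 2: the previous non-empty token must be "(" or ",".
        for ($j = ($ptr - 1); $j >= 0; $j--) {
            if (is_array($tokens[$j]) === false || isset($empty[$tokens[$j][0]]) === false) {
                break;
            }
        }

        if ($j >= 0 && ($tokens[$j] === '(' || $tokens[$j] === ',')) {
            $labels[] = $token[1];
        }
    }

    return $labels;
}

print_r(findParamLabels('<?php $x = $cond ? foo(start: 1, /* c */ array : $b) : bar($a, $b);'));
```

For this input the sketch should report `start` and `array` as labels, while the colon that follows the closing parenthesis of `foo(...)` is left alone as the ternary "else".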