Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP_CodeSniffer should use TOKEN_PARSE flag for token_get_all #3020

Closed
ondrejmirtes opened this issue Jul 15, 2020 · 10 comments
Closed

PHP_CodeSniffer should use TOKEN_PARSE flag for token_get_all #3020

ondrejmirtes opened this issue Jul 15, 2020 · 10 comments

Comments

@ondrejmirtes
Copy link

In PHP 8 there's a new reserved keyword match, but it's allowed as a class method name.

Without the TOKEN_PARSE flag, class method with name match will be present as a T_MATCH token in the token list.

With the TOKEN_PARSE flag, it will continue to be present as T_STRING.

Maybe there's already an example like this today that'd make working with the token stream easier.

PHP_CodeSniffer currently doesn't use the flag:

$tokens = @token_get_all($string);

Ref: nikic/PHP-Parser#684 (comment)

@ondrejmirtes
Copy link
Author

Code that'd be touched by this today is for example this:

<?php

class A
{
    const PUBLIC = 1;
}

With TOKEN_PARSE the constant name is T_STRING, without it it's T_PUBLIC.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 15, 2020

@ondrejmirtes PHPCS 3.x has a minimum PHP version of PHP 5.4. That flag was added to PHP in version 7.0, so unfortunately, this is not an option at this time.

PHPCS 4.x is set to have a minimum PHP version of PHP 7.2, so hopefully it will be for 4.x, as yes, it would be great to have it enabled as it would make a number of backfills (partially) redundant or at the very least, these could be simplified.

(though enabling it would also constitute a BC-break as the tokens provided by PHPCS will be different, so this would need to go into a major anyway... quite apart from the PHPCS tokenizer needing lots more unit tests to safeguard against unexpected BC-breaks, or at least to discover them and be able to annotate them in the changelog)

@ondrejmirtes
Copy link
Author

This could work (https://3v4l.org/eNmOv):

if (defined('TOKEN_PARSE')) {
    token_get_all($code, TOKEN_PARSE);
} else {
    token_get_all($code);
}

I'd argue this isn't even a BC break as logic that parses code like this has to handle the normal case with T_STRING anyway, so the worst thing that can happen is that some extra conditions that are aware of this behaviour would no longer be executed...

@jrfnl
Copy link
Contributor

jrfnl commented Jul 15, 2020

It most definitely would be a BC-break as the token stream will be different and people will have written sniffs based on the expected token stream as it currently is.

Just for fun, I've just run some tests with the flag enabled on PHP 7.4.6:

  • PHPCS native "Core" tests: Tests: 268, Assertions: 397, Errors: 59.
  • PHPCS Generic sniff tests: Tests: 78, Assertions: 0, Errors: 22, Skipped: 5.
  • PHPCS PSR1 sniff tests: Tests: 19, Assertions: 0, Errors: 5.
  • PHPCS PSR2 sniff tests: Tests: 12, Assertions: 0, Errors: 5.
  • PHPCS PSR12 sniff tests: Tests: 16, Assertions: 0, Errors: 5.
  • PHPCS Squiz sniff tests: Tests: 98, Assertions: 0, Errors: 52, Skipped: 2.
  • PHPCS Zend sniff tests: Tests: 3, Assertions: 0, Errors: 1, Skipped: 1.

And some external standards:

  • WordPressCS: Tests: 57, Assertions: 0, Errors: 16.
  • PHPCompatibility: Tests: 8001, Assertions: 9197, Errors: 3987, Skipped: 4.Tests: 8001, Assertions: 9197, Errors: 3987, Skipped: 4. (different test setup from the others listed above)

Looking at the errors, it looks like that flag makes the PHP tokenizer parse error intolerant, which is undesirable for PHPCS for two reasons:

  1. It would mean that sniffs would no longer work in a "live coding" situation, such as in an IDE.
  2. It would make it impossible to write PHP cross-version compatible sniffs which look at "newer" PHP constructs, as if the sniffs are run on an "older" PHP version, those "newer" constructs would be seen as parse errors, the file wouldn't even tokenize and the sniffs wouldn't run.
    If nothing else, it would become neigh impossible to unit test sniffs without setting a minimum PHP version per sniff or per test case file.

Note: as I stated above, I ran the tests on PHP 7.4.6. If I'd rerun these on a lower PHP version, expect all the error numbers listed above to go up.

So, having looked at this more closely, even though it would simplify some backfills, I honestly don't think the flag should ever be turned on for PHPCS.

@ondrejmirtes
Copy link
Author

My concern is that I'd say that a lot of sniffs that are currently looking at code like this:

<?php

class A
{
    const FOO = 1;
}

Will currently definitely break for:

<?php

class A
{
    const PUBLIC = 1;
}

Because their authors are not aware they can encouter T_PUBLIC and not T_STRING in there. That's why I'd say that TOKEN_PARSE is a way to go around that. WDYT?

@jrfnl
Copy link
Contributor

jrfnl commented Jul 15, 2020

Re: the T_MATCH token - just like the T_FN token before, you can expect that token to be backfilled for older PHP versions in PHPCS, including setting it to T_STRING in PHP 8.0+ when the token is used in a non-match expression context.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 15, 2020

Because their authors are not aware they can encouter T_PUBLIC and not T_STRING in there. That's why I'd say that TOKEN_PARSE is a way to go around that.

PHPCS already accounts for a lot of those type of situations and changes the token to T_STRING in that case.

The if, for example, in the below code sample is tokenized to T_STRING by PHPCS.

class Foo {
	function if() {}
}

If the same is not done for class constants, that could be added, but that's definitely not a reason to turn on the TOKEN_PARSE flag and break everything else.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 15, 2020

Oh and just for arguments sake, I've just checked the current PHPCS tokenization of your code sample:

class A
{
    const PUBLIC = 1;
}

And PHPCS already tokenizes the PUBLIC as T_STRING, so I really don't see your point.

Ptr :: Ln  :: Col  :: Cond :: Token Type                 :: [len]: Content
--------------------------------------------------------------------------
  0 :: L01 :: C  1 :: CC 0 :: T_OPEN_TAG                 :: [5]: <?php

  1 :: L02 :: C  1 :: CC 0 :: T_WHITESPACE               :: [0]:

  2 :: L03 :: C  1 :: CC 0 :: T_CLASS                    :: [5]: class
  3 :: L03 :: C  6 :: CC 0 :: T_WHITESPACE               :: [1]:
  4 :: L03 :: C  7 :: CC 0 :: T_STRING                   :: [1]: A
  5 :: L03 :: C  8 :: CC 0 :: T_WHITESPACE               :: [0]:

  6 :: L04 :: C  1 :: CC 0 :: T_OPEN_CURLY_BRACKET       :: [1]: {
  7 :: L04 :: C  2 :: CC 1 :: T_WHITESPACE               :: [0]:

  8 :: L05 :: C  1 :: CC 1 :: T_WHITESPACE               :: [4]:
  9 :: L05 :: C  5 :: CC 1 :: T_CONST                    :: [5]: const
 10 :: L05 :: C 10 :: CC 1 :: T_WHITESPACE               :: [1]:
 11 :: L05 :: C 11 :: CC 1 :: T_STRING                   :: [6]: PUBLIC
 12 :: L05 :: C 17 :: CC 1 :: T_WHITESPACE               :: [1]:
 13 :: L05 :: C 18 :: CC 1 :: T_EQUAL                    :: [1]: =
 14 :: L05 :: C 19 :: CC 1 :: T_WHITESPACE               :: [1]:
 15 :: L05 :: C 20 :: CC 1 :: T_LNUMBER                  :: [1]: 1
 16 :: L05 :: C 21 :: CC 1 :: T_SEMICOLON                :: [1]: ;
 17 :: L05 :: C 22 :: CC 1 :: T_WHITESPACE               :: [0]:

 18 :: L06 :: C  1 :: CC 0 :: T_CLOSE_CURLY_BRACKET      :: [1]: }

@ondrejmirtes
Copy link
Author

So PHP_CodeSniffer already kind-of simulates the TOKEN_PARSE behaviour, which I wasn't aware of.

@gsherwood
Copy link
Member

@jrfnl Thanks for all work exploring this and explaining how PHP_CodeSniffer's tokenizer works.

So PHP_CodeSniffer already kind-of simulates the TOKEN_PARSE behaviour, which I wasn't aware of.

Yes, that's correct. PHP_CodeSniffer has been around for quite a while, so it's seen a lot of different PHP versions and has tried to maintain backwards compatibility as much as possible through these sort of backfills. It's a QA tool, so it's important it meets developers at the PHP version they are at, not just where they would like to be.

I don't have any plans to implement the TOKEN_PARSE flag as my top priority is to ensure the token stack remains the same throughout PHP_CodeSniffer versions to enable custom sniffs to be migrated with as little work as possible.

I'm going to close this issue so it is clear no work is planned here, but thanks for raising it as it turned into a very worthwhile discussion. Feel free to continue, as I can always reopen if there is a benefit to changing the behaviour of the tokenizer that I'm not seeing yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants