New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: DisallowSpaceIndent did not recognize nor fix mixed tab/space indentations. #1404

Merged
merged 1 commit into from Apr 5, 2017

Conversation

Projects
None yet
3 participants
@jrfnl
Contributor

jrfnl commented Apr 2, 2017

I noticed two bugs in the DisallowSpaceIndent sniff when the inital indentation whitespace on a line had a mix of tabs and spaces.

  • If the line started with spaces, but had a tab after it and spaces after that, only the spaces at the start would be fixed.
  • If the line started with tabs, but had spaces after that, no error was thrown at all.

This PR fixes that.

Notes:

  • A new Line indent: mixed metric will be recorded for this sniff. This is inline with the same in the DisallowTabIndent sniff.
  • Moved the throwing of the error down to prevent code duplication as the same logic & error can be used for both situations.
  • Includes minor documentation fix.
  • Includes unit tests for the bugs and adjusted fix file for existing unit tests which were previously not fixed due to this bug.
Show outdated Hide outdated ...Sniffer/Standards/Generic/Sniffs/WhiteSpace/DisallowSpaceIndentSniff.php
@@ -94,42 +94,60 @@ public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
continue;
}
// If tabs are being converted to spaces, the original content
// If spaces are being converted to tabs, the original content

This comment has been minimized.

@gsherwood

gsherwood Apr 2, 2017

Member

This comment is referring to PHPCS converting tabs to spaces during tokenizing, so the original comment is correct in this case.

@gsherwood

gsherwood Apr 2, 2017

Member

This comment is referring to PHPCS converting tabs to spaces during tokenizing, so the original comment is correct in this case.

This comment has been minimized.

@jrfnl

jrfnl Apr 4, 2017

Contributor

I've reverted the word order, but adjusted the comment slightly for clarification of what was meant here.

@jrfnl

jrfnl Apr 4, 2017

Contributor

I've reverted the word order, but adjusted the comment slightly for clarification of what was meant here.

@gsherwood

This comment has been minimized.

Show comment
Hide comment
@gsherwood

gsherwood Apr 2, 2017

Member

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop. The scope indent sniff makes the same assumption, which is why it never checks for exact indentation unless you tell it to. And when you do, it almost always reports false positives.

DisallowTabIndents is much easier because a tab is always banned due to it having no use in a space-based indent rule.

Changing the behaviour of this sniff would be a BC break, as would changing the definitions of the metric.

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced. That's the job of the ScopeIndent sniff.

Member

gsherwood commented Apr 2, 2017

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop. The scope indent sniff makes the same assumption, which is why it never checks for exact indentation unless you tell it to. And when you do, it almost always reports false positives.

DisallowTabIndents is much easier because a tab is always banned due to it having no use in a space-based indent rule.

Changing the behaviour of this sniff would be a BC break, as would changing the definitions of the metric.

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced. That's the job of the ScopeIndent sniff.

@jrfnl

This comment has been minimized.

Show comment
Hide comment
@jrfnl

jrfnl Apr 3, 2017

Contributor

Hi @gsherwood, thanks for the feedback.

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop.

Looking at the code, IMHO the existing behaviour of the sniff is inconsistent with the above statement.

When a line starts with a space and is not a file/class level docblock and there are less spaces than the tabWidth, it still replaces the space(s) with a tab. This goes against what you suggest above.

Similarly, when the line starts with a tab, but also has spaces, no replacement is ever done, even when there are sets of 4 (tabWidth) spaces which could be replaced by a tab. (part of the original report above).

Studying the code in even more detail, I find more inconsistencies as the replacement seems to be ordered wrong.

Example (using . for space and -> for tab):

// Original line:
.......->...// code <- 7 spaces, 1 tab, 3 spaces

// Would be replaced by:
->->..->//code <- 2 tabs, 2 spaces, 1 tab

// While I would have expected:
->->->..//code <- 3 tabs, 2 spaces

Changing the behaviour of this sniff would be a BC break

What I would like to suggest is to align the behaviour of the sniff with your statement and with what I understand to be the intention of the sniff: Disallowing spaces for alignment, but allowing precision alignment using spaces.

In practice this would mean:

  • If there are more spaces > tabWidth, spaces would be replaced by tabs.
  • Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I would not consider that a BC break, but a bug fix as the sniff currently does not behave in line with the intention of the sniff.

If you agree with that, the only question is what to do with the current behavioral inconsistency: should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

as would changing the definitions of the metric.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

  • spaces for only spaces and more spaces > tabWidth (i.e. not precision spacing)
  • tabs for only tabs
  • mixed for a mixture of tabs and spaces where the amount of spaces > tabWidth

Or should the tabs metric also be recorded when the whitespace is tabs with only precision spacing ?
If so, continue ignoring file/class comments for the metric ?

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced.

I've made the necessary code changes for this in line with the above. If you could clarify your stance on the points I've described here, I can finish it off and push a replacement commit.

Contributor

jrfnl commented Apr 3, 2017

Hi @gsherwood, thanks for the feedback.

This behaviour is intentional. Some indentation can only be achieved by using spaces after the normal tab indent, so this sniff does not assume that everything is perfectly aligned to a tab stop.

Looking at the code, IMHO the existing behaviour of the sniff is inconsistent with the above statement.

When a line starts with a space and is not a file/class level docblock and there are less spaces than the tabWidth, it still replaces the space(s) with a tab. This goes against what you suggest above.

Similarly, when the line starts with a tab, but also has spaces, no replacement is ever done, even when there are sets of 4 (tabWidth) spaces which could be replaced by a tab. (part of the original report above).

Studying the code in even more detail, I find more inconsistencies as the replacement seems to be ordered wrong.

Example (using . for space and -> for tab):

// Original line:
.......->...// code <- 7 spaces, 1 tab, 3 spaces

// Would be replaced by:
->->..->//code <- 2 tabs, 2 spaces, 1 tab

// While I would have expected:
->->->..//code <- 3 tabs, 2 spaces

Changing the behaviour of this sniff would be a BC break

What I would like to suggest is to align the behaviour of the sniff with your statement and with what I understand to be the intention of the sniff: Disallowing spaces for alignment, but allowing precision alignment using spaces.

In practice this would mean:

  • If there are more spaces > tabWidth, spaces would be replaced by tabs.
  • Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I would not consider that a BC break, but a bug fix as the sniff currently does not behave in line with the intention of the sniff.

If you agree with that, the only question is what to do with the current behavioral inconsistency: should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

as would changing the definitions of the metric.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

  • spaces for only spaces and more spaces > tabWidth (i.e. not precision spacing)
  • tabs for only tabs
  • mixed for a mixture of tabs and spaces where the amount of spaces > tabWidth

Or should the tabs metric also be recorded when the whitespace is tabs with only precision spacing ?
If so, continue ignoring file/class comments for the metric ?

To make it a better sniff, I think it needs to continue to allow the space exception, but probably only if those spaces indent the code to a non tab stop. So if a tab could be used instead, the spaces would throw an error and be replaced. But if putting a tab there would cause the code to begin at a different column, the spaces should not be reported or replaced.

I've made the necessary code changes for this in line with the above. If you could clarify your stance on the points I've described here, I can finish it off and push a replacement commit.

@gsherwood

This comment has been minimized.

Show comment
Hide comment
@gsherwood

gsherwood Apr 4, 2017

Member

There seems to be a mix of misunderstanding and inconsistent testing (I cannot reproduce your results). The sniff currently reports on leading spaces in indentation, and replaces those leading spaces with tabs. In my testing, it is still working like this.

For example, you're 7 spaces, 1 tab, 3 spaces example is fixed like this for me:

E: [Line 8] Tabs must be used to indent lines; spaces are not allowed (Generic.WhiteSpace.DisallowSpaceIndent.SpacesUsed)
	Generic_Sniffs_WhiteSpace_DisallowSpaceIndentSniff (line 128) replaced token 39 (T_WHITESPACE) "·······\t···echo" => "\t···\t···echo"

That is,1 tab, 3 spaces, 1 tab, 3 spaces. It replaced as much of the leading whitespace as it could and then left the rest of the indentation as-is.

But this doesn't really matter because you came to the same conclusion as me, although wrote it up more succinctly:

In practice this would mean:

If there are more spaces > tabWidth, spaces would be replaced by tabs.
Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I complexly agree with this. It is what I was suggesting in my final comment, which would make the sniff more useful.

As for your questions:

should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

I think the rules are:

  • the indent contains only spaces = spaces
    • e.g., ............echo 'test';
  • the indent contains only tabs = tabs
    • e.g., ->->->echo 'test';
  • the indent contains tabs then spaces < tab-width = tabs
    • e.g., ->->->..echo 'test';
  • the indent contains spaces, then a tab somewhere = mixed
    • e.g., ....->....echo 'test';
    • e.g., ..->->->echo 'test';
  • the indent contains tabs, then spaces >= tab-width = mixed
    • e.g., ->->....echo 'test';
    • e.g., ->........echo 'test';
Member

gsherwood commented Apr 4, 2017

There seems to be a mix of misunderstanding and inconsistent testing (I cannot reproduce your results). The sniff currently reports on leading spaces in indentation, and replaces those leading spaces with tabs. In my testing, it is still working like this.

For example, you're 7 spaces, 1 tab, 3 spaces example is fixed like this for me:

E: [Line 8] Tabs must be used to indent lines; spaces are not allowed (Generic.WhiteSpace.DisallowSpaceIndent.SpacesUsed)
	Generic_Sniffs_WhiteSpace_DisallowSpaceIndentSniff (line 128) replaced token 39 (T_WHITESPACE) "·······\t···echo" => "\t···\t···echo"

That is,1 tab, 3 spaces, 1 tab, 3 spaces. It replaced as much of the leading whitespace as it could and then left the rest of the indentation as-is.

But this doesn't really matter because you came to the same conclusion as me, although wrote it up more succinctly:

In practice this would mean:

If there are more spaces > tabWidth, spaces would be replaced by tabs.
Any remaining spaces will be placed at the end of the whitespace, so the "whitespace order" will always be tabs first, spaces at the end.

I complexly agree with this. It is what I was suggesting in my final comment, which would make the sniff more useful.

As for your questions:

should a set of less than 4 spaces in a line beginning with a space still be replaced with a tab ? (when not T_DOC_COMMENT_WHITESPACE)

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

As for the metrics, am I correct that any precision spacing should be ignored, but other than that, the metrics should be recorded ?

I think the rules are:

  • the indent contains only spaces = spaces
    • e.g., ............echo 'test';
  • the indent contains only tabs = tabs
    • e.g., ->->->echo 'test';
  • the indent contains tabs then spaces < tab-width = tabs
    • e.g., ->->->..echo 'test';
  • the indent contains spaces, then a tab somewhere = mixed
    • e.g., ....->....echo 'test';
    • e.g., ..->->->echo 'test';
  • the indent contains tabs, then spaces >= tab-width = mixed
    • e.g., ->->....echo 'test';
    • e.g., ->........echo 'test';
@jrfnl

This comment has been minimized.

Show comment
Hide comment
@jrfnl

jrfnl Apr 4, 2017

Contributor

Ok, verifying my implementation with your comments now.

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

This would change the behaviour for the unit tests on line 18 and 19 where the indentation would no longer be fixed. Is that ok ?

https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc#L18-L19
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc.fixed#L18-L19

Edit: Also line 22 and 24 though the whitespace order for those lines would be changed.

Contributor

jrfnl commented Apr 4, 2017

Ok, verifying my implementation with your comments now.

If we assume the rule is "take the indentation and rewrite it to use tabs" than the answer would be that if the indentation is less than a tab width, it would remain as spaces. If it is greater, as many spaces will be removed as possible. The --tab-width setting would need to be used to figure this out.

This would change the behaviour for the unit tests on line 18 and 19 where the indentation would no longer be fixed. Is that ok ?

https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc#L18-L19
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/Generic/Tests/WhiteSpace/DisallowSpaceIndentUnitTest.inc.fixed#L18-L19

Edit: Also line 22 and 24 though the whitespace order for those lines would be changed.

@jrfnl

This comment has been minimized.

Show comment
Hide comment
@jrfnl

jrfnl Apr 4, 2017

Contributor

One more question: in the existing implementation, file/class docblocks are ignored for the metrics.
According to the below, they should be recorded as tabs (provided they are otherwise indented with tabs).
Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

the indent contains tabs then spaces < tab-width = tabs
e.g., ->->->..echo 'test';

Contributor

jrfnl commented Apr 4, 2017

One more question: in the existing implementation, file/class docblocks are ignored for the metrics.
According to the below, they should be recorded as tabs (provided they are otherwise indented with tabs).
Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

the indent contains tabs then spaces < tab-width = tabs
e.g., ->->->..echo 'test';

@gsherwood

This comment has been minimized.

Show comment
Hide comment
@gsherwood

gsherwood Apr 4, 2017

Member

Yeah, I think it is ok for lines 18,19,22,24 to have different test results now. Especially lines 18 and 19, which were changing the indent level of the code.

Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

Yes, they need to be ignored or else tab-indented code would end up with space-indented metrics for all top-level docblocks, like file and class comments.

Member

gsherwood commented Apr 4, 2017

Yeah, I think it is ok for lines 18,19,22,24 to have different test results now. Especially lines 18 and 19, which were changing the indent level of the code.

Continue ignoring the file/class docblock lines or let them follow the same rules as all the other lines ?

Yes, they need to be ignored or else tab-indented code would end up with space-indented metrics for all top-level docblocks, like file and class comments.

Bug fix: DisallowSpaceIndent did not recognize nor fix mixed tab/spac…
…e indentations.

Details:
* Now also detects & fixes spaces if used after tabs and not used for precision indentation.
* Now also corrects the whitespace order if needed. Order should be tabs first, precision spaces second. For this a new error message and error code has been added.
* Adds the `Line endings: mixed` metric to this sniff in line with how it's used in the `DisallowTabIndent` sniff
* Adds `.fixed` files for the CSS and JS tests.
* Minor documentation clarifications.
@jrfnl

This comment has been minimized.

Show comment
Hide comment
@jrfnl

jrfnl Apr 4, 2017

Contributor

Hi @gsherwood, thanks for those responses.

I've replaced the original commit with a new one which takes everything we've discussed into account.

The logic of the original sniff has essentially been rewritten to allow for this.

Details for the fixed sniff:

  • Now also detects & fixes spaces if used after tabs and not used for precision indentation.
  • Now also corrects the whitespace order if needed. Order should be tabs first, precision spaces second. For this a new error message and error code has been added.
  • Adds the Line endings: mixed metric to this sniff in line with how it's used in the DisallowTabIndent sniff
  • Adds .fixed files for the CSS and JS tests.
  • Minor documentation clarifications.

Please let me know if you need any further changes.

Contributor

jrfnl commented Apr 4, 2017

Hi @gsherwood, thanks for those responses.

I've replaced the original commit with a new one which takes everything we've discussed into account.

The logic of the original sniff has essentially been rewritten to allow for this.

Details for the fixed sniff:

  • Now also detects & fixes spaces if used after tabs and not used for precision indentation.
  • Now also corrects the whitespace order if needed. Order should be tabs first, precision spaces second. For this a new error message and error code has been added.
  • Adds the Line endings: mixed metric to this sniff in line with how it's used in the DisallowTabIndent sniff
  • Adds .fixed files for the CSS and JS tests.
  • Minor documentation clarifications.

Please let me know if you need any further changes.

@gsherwood gsherwood merged commit bf763be into squizlabs:master Apr 5, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

gsherwood added a commit that referenced this pull request Apr 5, 2017

@gsherwood

This comment has been minimized.

Show comment
Hide comment
@gsherwood

gsherwood Apr 5, 2017

Member

Thanks a lot for doing that.

I did end up removing the extra error message as I don't think there is a need to provide a distinction there. I'd prefer to keep the error message simple and consistent with previous versions.

Member

gsherwood commented Apr 5, 2017

Thanks a lot for doing that.

I did end up removing the extra error message as I don't think there is a need to provide a distinction there. I'd prefer to keep the error message simple and consistent with previous versions.

@jrfnl jrfnl deleted the jrfnl:feature/bug-fix-space-indent branch Apr 5, 2017

@jrfnl

This comment has been minimized.

Show comment
Hide comment
@jrfnl

jrfnl Apr 5, 2017

Contributor

You're welcome. Thanks for the merge.

Contributor

jrfnl commented Apr 5, 2017

You're welcome. Thanks for the merge.

}
} else {
if ($numTabs === 0) {
// Precision indentation.

This comment has been minimized.

@Frenzie

Frenzie Aug 24, 2017

This logic looks very suspicious to me (compare #547 (comment)). Consider something like this:

		if (! ($env === 'silent'
		       || ($env === 'production'
		       && ($level >= Minz_Log::NOTICE)))) {

Greatly increasing the tab width doesn't suffice as a workaround because then you'll get nonsensical errors about line length.

@Frenzie

Frenzie Aug 24, 2017

This logic looks very suspicious to me (compare #547 (comment)). Consider something like this:

		if (! ($env === 'silent'
		       || ($env === 'production'
		       && ($level >= Minz_Log::NOTICE)))) {

Greatly increasing the tab width doesn't suffice as a workaround because then you'll get nonsensical errors about line length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment