-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Ignore unknown nodes for Indent rule (fixes #8440) #8504
Conversation
LGTM |
@soda0289, thanks for your PR! By analyzing the history of the files in this pull request, we identified @not-an-aardvark, @vitorbal and @gyandeeps to be potential reviewers. |
Thanks for the PR, it looks like a good start. I'll review this more thoroughly tomorrow, but my first impression is that we definitely need to try to avoid a second traversal, if at all possible. Doing a second traversal (and having two nested loops for each node) will significantly hurt performance. I feel like there might be a simpler way to do this by just doing the unknown-node check inside the |
I tried to avoid calling Another solution was to allow for eslint to have wildcard node listeners. For example a node listener of '*' being called every node. This way we can tell when we are inside of an unknown node and mark tokens correctly using the other node listeners. |
We already have this -- a listener for '*' will be called on every node. |
LGTM |
@not-an-aardvark It may be possible to set the dependency offset, for blocks and classes, by adding logic into those node listeners since it is not possible to know when we are inside an unknown node. This would eliminate the need to find the last dependency of an offset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, I left a few comments.
Could you give a high-level description of how this works? I see the logic you're using, but I'm not sure I understand how it solves the problem of ignoring unknown nodes. I feel like there is something incorrect about this (it seems like it shouldn't be necessary to have special cases for when a node has no dependencies), but I'm having trouble understanding what's going on.
lib/rules/indent.js
Outdated
AssignmentPattern: "AssignmentPattern", | ||
ArrayExpression: "ArrayExpression", | ||
ArrayPattern: "ArrayPattern", | ||
ArrowFunctionExpression: "ArrowFunctionExpression", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to add AwaitExpression
to this list. (It wasn't there initially due to eslint/js#331.)
lib/rules/indent.js
Outdated
@@ -19,6 +19,87 @@ const astUtils = require("../ast-utils"); | |||
// Rule Definition | |||
//------------------------------------------------------------------------------ | |||
|
|||
const KNOWN_NODES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a Set
here instead of an object?
lib/rules/indent.js
Outdated
} | ||
|
||
/** | ||
* Ignore each line of an unknown node if its offset has no dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a node only ignored if its offset has no dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ignore the first token of the line within the unknown node. This way it's desired offset will be set to it's current offset.
If the offset is dependent on another token we have to find this token and make its offset depend on the first token of its line, since that token is already ignored.
When I say the first token of a line has no dependencies I mean its offset is not dependent on another tokens offset. For example a typescript interface: 1 interface Foo {
2 bar: string;
3 } Lines 1-3 all have offset of 0 and do not depend on any other token's offset. Line 2 should be ignored since its offset is 1 but I ignore all lines since it might be difficult to determine exactly what should be ignored. When we have a line thats offset depends on another token. For example, abstract class method statements: 1 abstract class {
2 public foo() {
3 bar++;
4 }
5 } Then Line 3's offset depends on the offset of token I will update the jsdoc comments to clarify this logic. |
LGTM |
Thanks for clarifying, that's a useful explanation. I'll review this again with that in mind sometime in the next few days when I have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the right idea, but I think it fails for unknown nodes that aren't at the top level of a program. I'm also wondering whether it would be possible to simplify the implementation a bit.
lib/rules/indent.js
Outdated
|
||
const dependency = offsets.getFirstDependency(firstTokenOfLine); | ||
|
||
if (!dependency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: If this if
statement has an else
branch anyway, it would be clearer to use if (dependency)
and swap the branches rather than if (!dependency)
.
I'll make a separate PR to enable no-negated-condition
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree its more readable to have the positive case first. Will change
lib/rules/indent.js
Outdated
let currDependency = dependency, | ||
lastDependency = dependency; | ||
|
||
while ((currDependency = offsets.getFirstDependency(currDependency))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to iterate through dependencies here? It seems like this will ignore a token that has no dependencies at the top level, but that might not be desirable. For example, a custom node might be nested:
function foo() {
function bar() {
abstract class X {
public baz() {
qux();
}
}
}
}
It looks like this would end up ignoring the function
token at the start of the program, but I think that would be incorrect behavior.
If I'm understanding the logic correctly, it seems like it would be better to stop this while
loop at the last token which is still within the bounds of the unknown node.
But in that case, is the loop even necessary? It seems like it would be better to ignore the first token of every line within the node, if the token's first dependency when exiting the node is outside the node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not consider this case. You're right we should only ignore tokens when they are within the the unknown node.
The loop might not be needed here. I will have to test it out. Since each line depends on the previous line indentation should work as expected.
lib/rules/indent.js
Outdated
"Program:exit"() { | ||
addParensIndent(sourceCode.ast.tokens); | ||
ignoreUnknownNodes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance reasons, I wonder if it would be better to do this lazily -- in other words, rather than ignoring all tokens after traversal, we could just validate tokens as normal and put an ignored-node check here, ignoring the token if necessary. That way we wouldn't need to worry about iterating over ignored tokens unless a token would actually produce indentation errors.
That said, let's focus on performance as a lower priority after we figure out the other parts of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do it this way. We have to find a way to map line number to unknown nodes and ignore them that way. Not sure how nested known nodes would work, since they would also have an incorrect desired indent until we set the correct dependency. I'll look into this after I fix the above issues.
This enables the no-negated-condition rule on the ESLint codebase, and fixes existing linting errors. Refs: #8504 (comment)
a52726a
to
f5d9217
Compare
LGTM |
@not-an-aardvark I pushed some new changes, sorry about the delay. I tried to move the logic to the end of the I removed the use of an array to store all outer unknown nodes and having a function call on Currently the code ignores all the lines of an unknown node on exit. This way I can tell if the unknown node is within a known node, or has dependencies outside of the unknown node. With that information I was able to fix the case of an unknown node having all or some of its offsets depend on a containing known node. I can still improve the performance of if we keep track of lines that were already ignored. Maybe by storing the line ranges or the line numbers of unknown nodes that have already been ignored. That way if an unknown node is contained in another unknown node its lines won't be ignored twice. |
lib/rules/indent.js
Outdated
const firstTokenOfDependentLine = tokenInfo.getFirstTokenOfLine(lastDependency); | ||
|
||
offsets.matchIndentOf(firstTokenOfDependentLine, lastDependency); | ||
} else if (dependency.loc.start.line < startLine) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should be able to move this check to the outer if statement and avoid the duplicated line of offsets.ignoreToken(firstTokenOfLine)
. I'll double check the logic tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this mostly looks good. I'm just wondering if it's possible to avoid looping through the offset dependency tree.
lib/rules/indent.js
Outdated
* Ignore first token of each line within an unknown node if its | ||
* offset does not depend on another token's offset. If the first token | ||
* of a line's offset depends on the offset of another token, make the | ||
* last dependent offset depend on the first token of its line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize the behavior is a bit hard to explain, but I'm confused by the last sentence of this comment. What does it mean to "make the last dependent offset depend on the first token of its line"? Is this supposed to say "make the last dependent token's offset depend on the first token of it's line"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will clean this up. Maybe If I create a separate function for handling lines with dependencies I can explain the logic in better detail.
lib/rules/indent.js
Outdated
|
||
while ( | ||
(currDependency = offsets.getFirstDependency(currDependency)) && | ||
currDependency.loc.start.line > startLine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still concerned about iterating through dependencies, as this really isn't what the offset dependency tree is designed for. The main intended usage is to just set offsets and let the tree handle how tokens depend on each other. This allows nodes to behave consistently regardless of what types of parents they have.
Is this loop still necessary? I'm not sure I understand how the offsets.matchIndentOf(firstTokenOfDependentLine, lastDependency);
on line 934 helps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only example I can find where this is required is an abstract class method. For example:
1 abstract class Foo {
2 public bar() {
3 const baz = 4;
4 }
5 }
The first token of line 3, const
depends on offset of token {
, found at the end of line 2. The {
token depends on the offset of token (
, and this is the last token offset dependency.
In most cases, function/arrow expressions or if statements, the last token offset dependency will be the first token of the line. In the above case this is not true, the last token offset dependency is in the middle of line 2.
So this means when we get the desired offset for line number 3, by calling getDesiredIndent
, we get the offset of line 2 as if it were not ignored. Line 2 is supposed to have no indention or an offset of 0. This is because we only ignore the first token of every line and not all tokens on the line.
There are three solutions, I can think of, to this problem:
- We add another token dependency, which is the first token of the line. This is what line 934 does by calling
offsets.matchIndentOf()
. It takes the last dependency and makes it depend on the offset of the first token of the dependent line. - We could ignore the last token used when calculating the line's offset. Currently
ignoreToken()
can only ignore the first token of the line, since it calculates the offset based on this assumption. - We set the offset of the last token offset dependency to that of its line's current offset, which would be the same as ignoring it.
All of these solutions require finding the last token offset that the current line depends on.
Sorry for the long winded explanation but it helps me ensure I'm using the correct logic. We could even add another if statement that says if the last token dependency is the first token of the line do not call offsets.matchIndentOf
, since it would be redundant.
lib/rules/indent.js
Outdated
* @returns {void} | ||
*/ | ||
function checkForUnknownNode(node) { | ||
if (KNOWN_NODES.has(node.type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: It might be better to avoid the extra return
statement here:
if (!KNOWN_NODES.has(node.type)) {
ignoreUnknownNode(node);
}
(In this case, I think using negation in an if
is fine because there's no else
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the is more clear. Will clean it up.
LGTM |
@not-an-aardvark Thanks for the reviews. I have added another JSDoc comments and moved the dependent line handling to its own function. I tried to clarify why the dependent token traversal needs to be done but it may not be clear enough. |
lib/rules/indent.js
Outdated
|
||
while ( | ||
(currDependency = offsets.getFirstDependency(currDependency)) && | ||
currDependency.loc.start.line > startLineOfNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Continuing the discussion from #8504 (comment) because that conversation is on an outdated diff)
Thanks for the detailed explanation, that was very useful.
One way to resolve this without iterating through the dependencies here could be to account for this case when the total desired indentation is being computed. I think it will work if you change this line so that if a token with no offset dependency is not the first token of its line, the desired indentation of the first token of its line is used, rather than assuming that the desired indentation is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution you posted works for the code snippet I posted, it's a great idea. There is one other case that this does not account for. When an unknown node is within a known node. There is a situation where a token can have its offset depend on tokens outside of the unknown node. This will cause the getDesiredIndent()
to keep traversing the depndencies and not stop on the ignored line.
For example:
1 function foo() {
2 function bar() {
3 abstract class X {
4 public baz() {
5 qux();
6 }
7 }
8 }
9 }
In this case Line 5 depends on Line 4 tokens {
and (
. The (
token depends on Line 2 tokens {
and function
. None of the tokens are dependent on any tokens in Line 3.
We need a way to break the token dependency chain when it continues outside of an unknown node. One solution is to check if the dependency of a line, inside an unknown node, lies outside the unknown node, then remove the dependency. It is simple to check if dependencies lies outside the unknown node.
The problem is how to determine which token is used a dependency in the line. I can just iterate through all the tokens on the line and set their dependencies to null but I feel like this could be improved. Maybe if I go down a line and traverse the dependencies but that feels like the old solution again.
Any ideas on how to handle this case or is removing the dependency of every token on the line sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe when we encounter an unknown node, we could offset all of its tokens?
"*"(node) {
if (!KNOWN_NODES.has(node.type)) {
offsets.setDesiredOffsets(getTokensAndComments(node), sourceCode.getFirstToken(node), 0);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea and will work in some cases. It does not work when we have token dependencies that are not on the first token of an ignored line.
This change will make all tokens depend on the first token of the first line of the unknown node. For example:
1 abstract class X {
2 public baz() {
3 if (true) {
4 qux();
5 }
6 }
7 }
Line 3 offset depends on Line 2 tokens {
and (
, which depend on Line 1 token abstract
.
Since we only ignore the first token of each line, only the token public
on Line 2 is ignored. This means the offset for Line 2 tokens {
and (
is 0 when it should be 1. The modification we made to getDesiredIndent()
, to use the offset of the first token of the line instead of 0, will only work when there is no further dependencies. I tried to modify getDesiredIndent()
to always get the offset from the first token of the line, even when there is another dependency, but this causes the indent rule to break.
I also tried to modify getDesiredIndent()
to use the offset of the first token when it is ignored. This would have the same effect of removing the dependency from every token of the line when it is ignored. But it caused some tests to fail.
Any other possible solutions? I can push the working solution I have currently. This includes modifying getDesiredIndent()
to use the first token of a line when there is no token dependency and removing the token dependencies from every token of a line only when the dependency falls outside of the unknown node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to push the working solution. It's possible there's a better way, but this one might be fine for now. (We can always improve the solution in the future -- I mainly want to make sure we wouldn't be breaking backwards compatibility by improving the solution later, e.g. if we're ignoring too many tokens here.)
I think that solution is fine. |
I think I found a simpler way to implement /**
* Ignore all tokens within an unknown node whose offset does not depend
* on another token's offset within the unknown node.
* @param {ASTNode} node Unknown Node
* @returns {void}
*/
function ignoreUnknownNode(node) {
const unknownNodeTokens = new Set(getTokensAndComments(node));
unknownNodeTokens.forEach(token => {
if (!unknownNodeTokens.has(offsets.getFirstDependency(token))) {
const firstTokenOfLine = tokenInfo.getFirstTokenOfLine(token);
if (token === firstTokenOfLine) {
offsets.ignoreToken(token);
} else {
offsets.matchIndentOf(firstTokenOfLine, token);
}
}
});
} All the tests in this PR still pass with this replacement. |
@not-an-aardvark That is a better solution. Thanks! It does have to check every token, instead of checking the first token of every line, but avoids the need to remove the dependencies of tokens that lie outside the unknown node and has less branching and complexity. I will test this solution on a few typescript repos I have just to make sure nothing unexpected happens. |
- Remove extra traversal of AST - Create list of outermost unknown nodes - Ignore lines of outermost unknown nodes only
- Convert KNOWN_NODES into Set - Add AwaitExpression to list of KNOWN_NODES - Update comments to clarify logic
- Only handle dependent lines when the first dependency is within the unknown node - Start from the first dependency when calculating the last token dependency - Add an if statement to ensure the first token of a dependent line does not equal the last token dependency - Change if statement in node listener to only branch when the node is unknown - Move finding last dependency check to its own function - Update and add more JSDoc comments
- Update getDesiredIndent function to use the first token of the line's offset when there is dependency - Remove the need to traverse all token depencies inside of an unknown node
- Add tests for variable declarator with unknown node - Use new logic for ignoring nodes
e9e1b64
to
cf502dd
Compare
LGTM |
- fix tests - do not use first token of line to get desired indent of token without dependnecy
LGTM |
LGTM |
@not-an-aardvark New logic works great. Changes made:
The only difference I can find is with the type httpMethod = 'GET'
| 'POST'
| 'PUT'; The old logic would enforce indentation here. This is because it would think the token dependency is part of the unknown node, since its line number is between the line number of the unknown node, and not ignore the tokens. I thought about fixing this but this case would still be ignored: type httpMethod =
'GET'
| 'POST'
| 'PUT'; I decided to just leave it as is and ignore these nodes instead of trying to come up with logic where it would indent properly. This means we don't need the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks!
At some point we should probably add some custom parsers as devDependencies so that we don't have to copy-paste ASTs and make giant diffs whenever we add tests with custom syntax. But that doesn't have to be done in this PR. |
Is there anything left to do for this PR, or is it ready to merge? |
Should be good to go |
What is the purpose of this pull request? (put an "X" next to item)
[ X ] Bug fix (template)
What changes did you make? (Give an overview)
Created a new function that is run after program exit. It traverses the entire AST searching for unknown nodes. If an unknown node exists it will ignore the first token of each line within the node. If the token has a dependency, used to calculate its offset, then we find the last dependency and set it's offset to match that of the first token on its line.
Is there anything you'd like reviewers to focus on?
This is my first attempt at a solution. There might be a cleaner one or more efficient one. I tried to keep it simple and readable. Not sure how we can avoid traversing the entire AST again. Maybe if we had a mapping for token to node that might work better.
Is there any cases I missed of a token having an offset that this change will now ignore? I think by ensuring we do not ignore tokens that have dependencies should cover every case.
Is it safe to match the offset of the last dependency to that of the first token of the line. I assume it should be OK since that token will be ignored, since it should lie within the unknown node.