Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Ignore unknown nodes for Indent rule (fixes #8440) #8504

Merged
merged 12 commits into from
May 13, 2017

Conversation

soda0289
Copy link
Member

@soda0289 soda0289 commented Apr 25, 2017

What is the purpose of this pull request? (put an "X" next to item)
[ X ] Bug fix (template)

What changes did you make? (Give an overview)
Created a new function that is run after program exit. It traverses the entire AST searching for unknown nodes. If an unknown node exists it will ignore the first token of each line within the node. If the token has a dependency, used to calculate its offset, then we find the last dependency and set it's offset to match that of the first token on its line.

Is there anything you'd like reviewers to focus on?
This is my first attempt at a solution. There might be a cleaner one or more efficient one. I tried to keep it simple and readable. Not sure how we can avoid traversing the entire AST again. Maybe if we had a mapping for token to node that might work better.

Is there any cases I missed of a token having an offset that this change will now ignore? I think by ensuring we do not ignore tokens that have dependencies should cover every case.

Is it safe to match the offset of the last dependency to that of the first token of the line. I assume it should be OK since that token will be ignored, since it should lie within the unknown node.

@soda0289 soda0289 added bug ESLint is working incorrectly indent Relates to the `indent` rule rule Relates to ESLint's core rules labels Apr 25, 2017
@eslintbot
Copy link

LGTM

@mention-bot
Copy link

@soda0289, thanks for your PR! By analyzing the history of the files in this pull request, we identified @not-an-aardvark, @vitorbal and @gyandeeps to be potential reviewers.

@not-an-aardvark
Copy link
Member

Thanks for the PR, it looks like a good start. I'll review this more thoroughly tomorrow, but my first impression is that we definitely need to try to avoid a second traversal, if at all possible. Doing a second traversal (and having two nested loops for each node) will significantly hurt performance.

I feel like there might be a simpler way to do this by just doing the unknown-node check inside the Program:exit handler right before a token would normally be reported. I haven't had a chance to try this out yet, though -- maybe it doesn't work as I'd hoped.

@soda0289
Copy link
Member Author

I tried to avoid calling getNodeByRangeIndex for every single token. Every line will fail inside of a unknown node and we need a way to know the outer/parent node of the token. Maybe if we created a mapping of firstTokenOfLine to node it would speed things up. It could be possible to call getNodeByRangeIndex only a few times if we can mark all tokens inside the unknown node as soon as we find an invalid one.

Another solution was to allow for eslint to have wildcard node listeners. For example a node listener of '*' being called every node. This way we can tell when we are inside of an unknown node and mark tokens correctly using the other node listeners.

@not-an-aardvark
Copy link
Member

Another solution was to allow for eslint to have wildcard node listeners. For example a node listener of '*' being called every node. This way we can tell when we are inside of an unknown node and mark tokens correctly using the other node listeners.

We already have this -- a listener for '*' will be called on every node.

@eslintbot
Copy link

LGTM

@soda0289
Copy link
Member Author

@not-an-aardvark
I removed the extra AST traversal and used the '*' selector to create a list of the outermost unknown nodes. I then use this list on program exit to ignore each line within an unknown node or set the last dependency to the first token of its line.

It may be possible to set the dependency offset, for blocks and classes, by adding logic into those node listeners since it is not possible to know when we are inside an unknown node. This would eliminate the need to find the last dependency of an offset.

Copy link
Member

@not-an-aardvark not-an-aardvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, I left a few comments.

Could you give a high-level description of how this works? I see the logic you're using, but I'm not sure I understand how it solves the problem of ignoring unknown nodes. I feel like there is something incorrect about this (it seems like it shouldn't be necessary to have special cases for when a node has no dependencies), but I'm having trouble understanding what's going on.

AssignmentPattern: "AssignmentPattern",
ArrayExpression: "ArrayExpression",
ArrayPattern: "ArrayPattern",
ArrowFunctionExpression: "ArrowFunctionExpression",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to add AwaitExpression to this list. (It wasn't there initially due to eslint/js#331.)

@@ -19,6 +19,87 @@ const astUtils = require("../ast-utils");
// Rule Definition
//------------------------------------------------------------------------------

const KNOWN_NODES = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a Set here instead of an object?

}

/**
* Ignore each line of an unknown node if its offset has no dependencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a node only ignored if its offset has no dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ignore the first token of the line within the unknown node. This way it's desired offset will be set to it's current offset.

If the offset is dependent on another token we have to find this token and make its offset depend on the first token of its line, since that token is already ignored.

@soda0289
Copy link
Member Author

soda0289 commented Apr 27, 2017

When I say the first token of a line has no dependencies I mean its offset is not dependent on another tokens offset.

For example a typescript interface:

1  interface Foo {
2    bar: string;
3  }

Lines 1-3 all have offset of 0 and do not depend on any other token's offset. { offset: 0, from: null }

Line 2 should be ignored since its offset is 1 but I ignore all lines since it might be difficult to determine exactly what should be ignored.

When we have a line thats offset depends on another token. For example, abstract class method statements:

1 abstract class {
2  public foo() {
3    bar++;
4  }
5 }

Then Line 3's offset depends on the offset of token {, which depends on the offset of ). In this case we set the last dependent token to depend on the first token of its line, which would be the public token. Since this token's offset is ignored its offset will be set correctly.

I will update the jsdoc comments to clarify this logic.

@eslintbot
Copy link

LGTM

@not-an-aardvark
Copy link
Member

Thanks for clarifying, that's a useful explanation. I'll review this again with that in mind sometime in the next few days when I have time.

@not-an-aardvark not-an-aardvark self-requested a review April 28, 2017 05:20
Copy link
Member

@not-an-aardvark not-an-aardvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right idea, but I think it fails for unknown nodes that aren't at the top level of a program. I'm also wondering whether it would be possible to simplify the implementation a bit.


const dependency = offsets.getFirstDependency(firstTokenOfLine);

if (!dependency) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: If this if statement has an else branch anyway, it would be clearer to use if (dependency) and swap the branches rather than if (!dependency).

I'll make a separate PR to enable no-negated-condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its more readable to have the positive case first. Will change

let currDependency = dependency,
lastDependency = dependency;

while ((currDependency = offsets.getFirstDependency(currDependency))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to iterate through dependencies here? It seems like this will ignore a token that has no dependencies at the top level, but that might not be desirable. For example, a custom node might be nested:

function foo() {
  function bar() {
    abstract class X {
      public baz() {
        qux();
      }
    }
  }
}

It looks like this would end up ignoring the function token at the start of the program, but I think that would be incorrect behavior.

If I'm understanding the logic correctly, it seems like it would be better to stop this while loop at the last token which is still within the bounds of the unknown node.

But in that case, is the loop even necessary? It seems like it would be better to ignore the first token of every line within the node, if the token's first dependency when exiting the node is outside the node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not consider this case. You're right we should only ignore tokens when they are within the the unknown node.

The loop might not be needed here. I will have to test it out. Since each line depends on the previous line indentation should work as expected.

"Program:exit"() {
addParensIndent(sourceCode.ast.tokens);
ignoreUnknownNodes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons, I wonder if it would be better to do this lazily -- in other words, rather than ignoring all tokens after traversal, we could just validate tokens as normal and put an ignored-node check here, ignoring the token if necessary. That way we wouldn't need to worry about iterating over ignored tokens unless a token would actually produce indentation errors.

That said, let's focus on performance as a lower priority after we figure out the other parts of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it this way. We have to find a way to map line number to unknown nodes and ignore them that way. Not sure how nested known nodes would work, since they would also have an incorrect desired indent until we set the correct dependency. I'll look into this after I fix the above issues.

not-an-aardvark added a commit that referenced this pull request Apr 28, 2017
This enables the no-negated-condition rule on the ESLint codebase, and fixes existing linting errors.

Refs: #8504 (comment)
@soda0289 soda0289 force-pushed the indent-ignore-unknown-nodes branch from a52726a to f5d9217 Compare May 3, 2017 02:39
@eslintbot
Copy link

LGTM

@soda0289
Copy link
Member Author

soda0289 commented May 3, 2017

@not-an-aardvark I pushed some new changes, sorry about the delay.

I tried to move the logic to the end of the Program:exit handler, before we report an error. I ran into issues when a line was marked as valid when it should be invalid. This can occur if the offset dependencies skip over the unknown node. The code also became more complex as I needed to keep track of all unknown nodes, and their containers. I still have a branch that I can try to work with but the current solution is much simpler.

I removed the use of an array to store all outer unknown nodes and having a function call on Program:exit to ignore them.

Currently the code ignores all the lines of an unknown node on exit. This way I can tell if the unknown node is within a known node, or has dependencies outside of the unknown node. With that information I was able to fix the case of an unknown node having all or some of its offsets depend on a containing known node.

I can still improve the performance of if we keep track of lines that were already ignored. Maybe by storing the line ranges or the line numbers of unknown nodes that have already been ignored. That way if an unknown node is contained in another unknown node its lines won't be ignored twice.

@not-an-aardvark not-an-aardvark added the accepted There is consensus among the team that this change meets the criteria for inclusion label May 3, 2017
const firstTokenOfDependentLine = tokenInfo.getFirstTokenOfLine(lastDependency);

offsets.matchIndentOf(firstTokenOfDependentLine, lastDependency);
} else if (dependency.loc.start.line < startLine) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should be able to move this check to the outer if statement and avoid the duplicated line of offsets.ignoreToken(firstTokenOfLine). I'll double check the logic tomorrow.

Copy link
Member

@not-an-aardvark not-an-aardvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this mostly looks good. I'm just wondering if it's possible to avoid looping through the offset dependency tree.

* Ignore first token of each line within an unknown node if its
* offset does not depend on another token's offset. If the first token
* of a line's offset depends on the offset of another token, make the
* last dependent offset depend on the first token of its line.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the behavior is a bit hard to explain, but I'm confused by the last sentence of this comment. What does it mean to "make the last dependent offset depend on the first token of its line"? Is this supposed to say "make the last dependent token's offset depend on the first token of it's line"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will clean this up. Maybe If I create a separate function for handling lines with dependencies I can explain the logic in better detail.


while (
(currDependency = offsets.getFirstDependency(currDependency)) &&
currDependency.loc.start.line > startLine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still concerned about iterating through dependencies, as this really isn't what the offset dependency tree is designed for. The main intended usage is to just set offsets and let the tree handle how tokens depend on each other. This allows nodes to behave consistently regardless of what types of parents they have.

Is this loop still necessary? I'm not sure I understand how the offsets.matchIndentOf(firstTokenOfDependentLine, lastDependency); on line 934 helps here.

Copy link
Member Author

@soda0289 soda0289 May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only example I can find where this is required is an abstract class method. For example:

1 abstract class Foo {
2   public bar() {
3     const baz = 4;
4   }
5 }

The first token of line 3, const depends on offset of token {, found at the end of line 2. The { token depends on the offset of token (, and this is the last token offset dependency.

In most cases, function/arrow expressions or if statements, the last token offset dependency will be the first token of the line. In the above case this is not true, the last token offset dependency is in the middle of line 2.

So this means when we get the desired offset for line number 3, by calling getDesiredIndent, we get the offset of line 2 as if it were not ignored. Line 2 is supposed to have no indention or an offset of 0. This is because we only ignore the first token of every line and not all tokens on the line.

There are three solutions, I can think of, to this problem:

  1. We add another token dependency, which is the first token of the line. This is what line 934 does by calling offsets.matchIndentOf(). It takes the last dependency and makes it depend on the offset of the first token of the dependent line.
  2. We could ignore the last token used when calculating the line's offset. Currently ignoreToken() can only ignore the first token of the line, since it calculates the offset based on this assumption.
  3. We set the offset of the last token offset dependency to that of its line's current offset, which would be the same as ignoring it.

All of these solutions require finding the last token offset that the current line depends on.

Sorry for the long winded explanation but it helps me ensure I'm using the correct logic. We could even add another if statement that says if the last token dependency is the first token of the line do not call offsets.matchIndentOf, since it would be redundant.

* @returns {void}
*/
function checkForUnknownNode(node) {
if (KNOWN_NODES.has(node.type)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: It might be better to avoid the extra return statement here:

if (!KNOWN_NODES.has(node.type)) {
    ignoreUnknownNode(node);
}

(In this case, I think using negation in an if is fine because there's no else.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the is more clear. Will clean it up.

@eslintbot
Copy link

LGTM

@soda0289
Copy link
Member Author

soda0289 commented May 4, 2017

@not-an-aardvark Thanks for the reviews.

I have added another JSDoc comments and moved the dependent line handling to its own function. I tried to clarify why the dependent token traversal needs to be done but it may not be clear enough.


while (
(currDependency = offsets.getFirstDependency(currDependency)) &&
currDependency.loc.start.line > startLineOfNode
Copy link
Member

@not-an-aardvark not-an-aardvark May 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Continuing the discussion from #8504 (comment) because that conversation is on an outdated diff)

Thanks for the detailed explanation, that was very useful.

One way to resolve this without iterating through the dependencies here could be to account for this case when the total desired indentation is being computed. I think it will work if you change this line so that if a token with no offset dependency is not the first token of its line, the desired indentation of the first token of its line is used, rather than assuming that the desired indentation is 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution you posted works for the code snippet I posted, it's a great idea. There is one other case that this does not account for. When an unknown node is within a known node. There is a situation where a token can have its offset depend on tokens outside of the unknown node. This will cause the getDesiredIndent() to keep traversing the depndencies and not stop on the ignored line.

For example:

1 function foo() {
2   function bar() {
3     abstract class X {
4       public baz() {
5         qux();
6       }
7     }
8   }
9 }

In this case Line 5 depends on Line 4 tokens { and (. The ( token depends on Line 2 tokens { and function. None of the tokens are dependent on any tokens in Line 3.

We need a way to break the token dependency chain when it continues outside of an unknown node. One solution is to check if the dependency of a line, inside an unknown node, lies outside the unknown node, then remove the dependency. It is simple to check if dependencies lies outside the unknown node.

The problem is how to determine which token is used a dependency in the line. I can just iterate through all the tokens on the line and set their dependencies to null but I feel like this could be improved. Maybe if I go down a line and traverse the dependencies but that feels like the old solution again.

Any ideas on how to handle this case or is removing the dependency of every token on the line sufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when we encounter an unknown node, we could offset all of its tokens?

"*"(node) {
    if (!KNOWN_NODES.has(node.type)) {
            offsets.setDesiredOffsets(getTokensAndComments(node), sourceCode.getFirstToken(node), 0);
    }
}

Copy link
Member Author

@soda0289 soda0289 May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea and will work in some cases. It does not work when we have token dependencies that are not on the first token of an ignored line.

This change will make all tokens depend on the first token of the first line of the unknown node. For example:

1 abstract class X {
2     public baz() {
3         if (true) {
4             qux();
5         }
6     }
7 }

Line 3 offset depends on Line 2 tokens { and (, which depend on Line 1 token abstract.

Since we only ignore the first token of each line, only the token public on Line 2 is ignored. This means the offset for Line 2 tokens { and ( is 0 when it should be 1. The modification we made to getDesiredIndent(), to use the offset of the first token of the line instead of 0, will only work when there is no further dependencies. I tried to modify getDesiredIndent() to always get the offset from the first token of the line, even when there is another dependency, but this causes the indent rule to break.

I also tried to modify getDesiredIndent() to use the offset of the first token when it is ignored. This would have the same effect of removing the dependency from every token of the line when it is ignored. But it caused some tests to fail.

Any other possible solutions? I can push the working solution I have currently. This includes modifying getDesiredIndent() to use the first token of a line when there is no token dependency and removing the token dependencies from every token of a line only when the dependency falls outside of the unknown node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to push the working solution. It's possible there's a better way, but this one might be fine for now. (We can always improve the solution in the future -- I mainly want to make sure we wouldn't be breaking backwards compatibility by improving the solution later, e.g. if we're ignoring too many tokens here.)

@not-an-aardvark
Copy link
Member

Currently I changed it to just uses the default variable indentation but maybe we can find a way to ignore this kind of node as well.

I think that solution is fine. VariableDeclarator nodes are already a bit of a special case, so let's keep things as they are unless there is another situation that can't be handled by the current logic.

@not-an-aardvark
Copy link
Member

I think I found a simpler way to implement ignoreUnknownNode:

        /**
        * Ignore all tokens within an unknown node whose offset does not depend
        * on another token's offset within the unknown node.
        * @param {ASTNode} node Unknown Node
        * @returns {void}
        */
        function ignoreUnknownNode(node) {
            const unknownNodeTokens = new Set(getTokensAndComments(node));

            unknownNodeTokens.forEach(token => {
                if (!unknownNodeTokens.has(offsets.getFirstDependency(token))) {
                    const firstTokenOfLine = tokenInfo.getFirstTokenOfLine(token);

                    if (token === firstTokenOfLine) {
                        offsets.ignoreToken(token);
                    } else {
                        offsets.matchIndentOf(firstTokenOfLine, token);
                    }
                }
            });
        }

All the tests in this PR still pass with this replacement.

@soda0289
Copy link
Member Author

soda0289 commented May 9, 2017

@not-an-aardvark That is a better solution. Thanks!

It does have to check every token, instead of checking the first token of every line, but avoids the need to remove the dependencies of tokens that lie outside the unknown node and has less branching and complexity.

I will test this solution on a few typescript repos I have just to make sure nothing unexpected happens.

- Remove extra traversal of AST
- Create list of outermost unknown nodes
- Ignore lines of outermost unknown nodes only
- Convert KNOWN_NODES into Set
- Add AwaitExpression to list of KNOWN_NODES
- Update comments to clarify logic
 - Do not store unknown nodes in array
 - Ignore unknown node on estraverse exit of node
 - If offset dependency is outside of unknown node then ignore line
 - Small fixups
- Only handle dependent lines when the first dependency is within
  the unknown node
- Start from the first dependency when calculating the last token
  dependency
- Add an if statement to ensure the first token of a dependent
  line does not equal the last token dependency
- Change if statement in node listener to only branch when the node
  is unknown
- Move finding last dependency check to its own function
- Update and add more JSDoc comments
 - Update getDesiredIndent function to use the first token of the
   line's offset when there is dependency
 - Remove the need to traverse all token depencies inside of an
   unknown node
 - Add tests for variable declarator with unknown node
 - Use new logic for ignoring nodes
@soda0289 soda0289 force-pushed the indent-ignore-unknown-nodes branch from e9e1b64 to cf502dd Compare May 11, 2017 02:57
@eslintbot
Copy link

LGTM

 - fix tests
 - do not use first token of line to get desired indent of token
   without dependnecy
@eslintbot
Copy link

LGTM

@eslintbot
Copy link

LGTM

@soda0289
Copy link
Member Author

@not-an-aardvark New logic works great.

Changes made:

  • Rebased
  • Used new logic
  • Removed change to getDesiredIndent() that used first token of line when token had no dependency
  • Added passing tests for VariableDeclarator node with and without indentation

The only difference I can find is with the VariableDeclarator case:

type httpMethod = 'GET'
  | 'POST'
  | 'PUT';

The old logic would enforce indentation here. This is because it would think the token dependency is part of the unknown node, since its line number is between the line number of the unknown node, and not ignore the tokens. I thought about fixing this but this case would still be ignored:

type httpMethod = 
  'GET'
  | 'POST'
  | 'PUT';

I decided to just leave it as is and ignore these nodes instead of trying to come up with logic where it would indent properly.

This means we don't need the VariableDeclarator fix to allow unknown kind, such as type, and I can remove this too.

Copy link
Member

@not-an-aardvark not-an-aardvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@not-an-aardvark
Copy link
Member

At some point we should probably add some custom parsers as devDependencies so that we don't have to copy-paste ASTs and make giant diffs whenever we add tests with custom syntax. But that doesn't have to be done in this PR.

@not-an-aardvark
Copy link
Member

Is there anything left to do for this PR, or is it ready to merge?

@soda0289
Copy link
Member Author

Should be good to go

@not-an-aardvark not-an-aardvark merged commit c6c639d into master May 13, 2017
@not-an-aardvark not-an-aardvark deleted the indent-ignore-unknown-nodes branch May 13, 2017 18:53
@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Feb 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion bug ESLint is working incorrectly indent Relates to the `indent` rule rule Relates to ESLint's core rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants