Breaking: Calculate leading/trailing comments in core #7516

Merged
merged 7 commits into from Apr 7, 2017

@kaicataldo
Member

commented Nov 1, 2016

Fixes #6724

What is the purpose of this pull request? (put an "X" next to item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofixing to a rule
[ ] Add a CLI option
[X] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)
This PR turns off comment attachment in Espree and moves comment getting logic into sourceCode.getComments(). This is a breaking change.

Is there anything you'd like reviewers to focus on?
Would love suggestions for how else we might be able to handle shebang comments.

As discussed on the corresponding issue, we should discuss whether we want to continue thinking about comments the same way now that we're not attaching at the parser level. This PR mimics Espree's current attachment strategy as closely as it can, though it can't be (nor do I think we want it to be) exactly the same, because Espree's strategy has some unpredictable edge cases and bugs.

The version in this PR should essentially work the same for our users and ecosystem (unless they rely on some of the really weird edge cases mentioned before).

@eslintbot

commented Nov 1, 2016

Thanks for the pull request, @kaicataldo! I took a look to make sure it's ready for merging and found some changes are needed:

  • The commit summary needs to begin with a tag (such as Fix: or Update:). Please check out our guide for how to properly format your commit summary and update it on this pull request.

Can you please update the pull request to address these?

(More information can be found in our pull request guide.)

@mention-bot

commented Nov 1, 2016

@kaicataldo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @nzakas and @mysticatea to be potential reviewers.

@kaicataldo kaicataldo force-pushed the getcomments branch from 5779c43 to 3d684b4 Nov 1, 2016

@kaicataldo kaicataldo force-pushed the getcomments branch 3 times, most recently from 7459bec to 1c5ffa3 Nov 1, 2016

@nzakas
Member

left a comment

I'm not entirely sure what I should be reviewing here. Can you point out some areas where you'd like some comments? And can you explain what the differences are between what Espree does and what you're doing here?

@@ -908,6 +908,8 @@ module.exports = (function() {
}
}

ast.hasShebang = !!shebang;

@nzakas

nzakas Nov 4, 2016

Member

Hmm, this doesn't look like a good idea.

@kaicataldo

kaicataldo Nov 4, 2016

Author Member

Yeah, this was what I was hoping to get some feedback on. The current behavior is to modify the shebang comment to be a normal JS line comment before parsing and then to remove the parsed comment from the top level comments array as well as from the leadingComments of the first node in the Program body after parsing has completed.

Now that we're calculating this on the fly, I need to figure out how getComments() can know which token represents a shebang comment (if one exists) so that it doesn't include it. The challenge here is that once the shebang comment has been modified to be a normal JS line comment, there isn't a reliable way of knowing whether there is a shebang comment at the top of the file or not.

Is there any possibility of adding a property to the shebang comment token that we could check? Any other suggestions/ideas would be most welcome!
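
For readers following along, here is a minimal sketch of the pre-parse rewrite described in this comment. It is an illustration under assumptions, not the code in lib/eslint.js: the `SHEBANG_PATTERN` regex and the `commentOutShebang` helper are hypothetical names.

```javascript
// Assumed pattern: "#!" at the very start of the file, up to the first line break.
const SHEBANG_PATTERN = /^#!([^\r\n]+)/;

// Hypothetical helper: swaps the leading "#!" for "//" so a JS parser accepts the line.
// The replacement has the same length, so token ranges after it are unaffected.
function commentOutShebang(text) {
    let shebang = null;
    const patched = text.replace(SHEBANG_PATTERN, (match, captured) => {
        shebang = captured;
        return `//${captured}`;
    });

    return { text: patched, shebang };
}

const withShebang = commentOutShebang("#!/usr/bin/env node\nvar a;");
const withoutShebang = commentOutShebang("var a;");
```

After the rewrite, the first line parses as an ordinary line comment, which is why nothing in the parsed output distinguishes it from a comment the author wrote by hand.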

@nzakas

nzakas Nov 8, 2016

Member

Hmm, I think I'm missing something. If we're already removing the shebang comment from comments, and getComments() uses the comments array to figure out which comments to return, wouldn't it automatically ignore shebang comments?

@kaicataldo

kaicataldo Nov 8, 2016

Author Member

Good question - it's because the SourceCode instance is created before we remove the shebang comment from the comments array. So it seems like the fix might actually be in lib/eslint.js

@kaicataldo

kaicataldo Nov 8, 2016

Author Member

Here's the line where the SourceCode instance is created: https://github.com/eslint/eslint/blob/master/lib/eslint.js#L817

And here's where the shebang comment is removed:
https://github.com/eslint/eslint/blob/master/lib/eslint.js#L903

@nzakas

nzakas Nov 9, 2016

Member

Still confused. If ast is the same as was passed into SourceCode, shouldn't the change work correctly? Is this just a timing issue?

@kaicataldo

kaicataldo Nov 14, 2016

Author Member

That's right - sorry I'm not explaining this better. The SourceCode instance above takes the parsed ast and generates its own internal tokensAndCommentsStore from it. Since this occurs before the shebang comment is removed, the SourceCode instance's tokensAndCommentsStore still contains the shebang comment.

I think we should be able to do the shebang comment removal before creating the SourceCode instance - will have to figure out how to do it with the few forks of logic that happen there.
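
The timing issue can be sketched as follows. `MiniSourceCode` is a hypothetical stand-in, not the real SourceCode class, and the token shapes and ranges are made up for illustration; the only point is that a list built in the constructor does not see later changes to `ast.comments`.

```javascript
// Hypothetical stand-in for SourceCode; only the snapshot behaviour matters here.
class MiniSourceCode {
    constructor(ast) {
        // concat() builds a new array, so this is a snapshot taken at construction time.
        this.tokensAndComments = ast.tokens
            .concat(ast.comments)
            .sort((a, b) => a.range[0] - b.range[0]);
    }
}

const ast = {
    tokens: [{ type: "Keyword", value: "var", range: [20, 23] }],
    comments: [{ type: "Line", value: "/usr/bin/env node", range: [0, 19] }]
};

const sourceCode = new MiniSourceCode(ast);

// Removing the shebang comment from the AST afterwards...
ast.comments.shift();

// ...does not touch the list the instance already built.
const storeStillHasShebang = sourceCode.tokensAndComments.some(token => token.type === "Line");
```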

@kaicataldo

kaicataldo Nov 17, 2016

Author Member

Had some time to look at this and I think I've found a better solution. One of our rules, lines-around-directive, relies on the shebang comment being in sourceCode's tokensAndCommentsStore, so I need to just fix that rule and we should be good to go!

@kaicataldo

kaicataldo Nov 17, 2016

Author Member

After a bit of digging I'm realizing that including the shebang in the tokens and comments store might be intentional and actually a good thing. Despite not being attached to any nodes or included in the comments array of the AST, transforming it into a standard JS line comment and including it in the store allows us to use sourceCode.getTokenOrCommentBefore() in rules to write rules around the shebang.

Had some thoughts and will write them in a comment below.

@@ -133,6 +143,9 @@ function SourceCode(text, ast) {
this.getTokenOrCommentBefore = tokensAndCommentsStore.getTokenBefore;
this.getTokenOrCommentAfter = tokensAndCommentsStore.getTokenAfter;

this._getTokens = tokensAndCommentsStore.getTokens;
this._getCommentsStore = new WeakMap();

@nzakas

nzakas Nov 4, 2016

Member

Maybe just _commentStore?
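
The diff does not show how the WeakMap is used. One plausible reading, offered here as an assumption only, is that it memoizes the computed comments per node. A minimal sketch of that pattern, with a hypothetical `CommentCache` class and `compute` callback:

```javascript
// Hypothetical per-node cache; the real use of the WeakMap in this PR may differ.
class CommentCache {
    constructor(compute) {
        this._compute = compute;
        // Keys are AST nodes; entries can be garbage-collected along with the node.
        this._commentStore = new WeakMap();
    }

    get(node) {
        if (!this._commentStore.has(node)) {
            this._commentStore.set(node, this._compute(node));
        }
        return this._commentStore.get(node);
    }
}

let computeCalls = 0;
const cache = new CommentCache(() => {
    computeCalls += 1;
    return { leading: [], trailing: [] };
});

const identifierNode = { type: "Identifier", name: "foo" };
const firstLookup = cache.get(identifierNode);
const secondLookup = cache.get(identifierNode);
```

With this shape, repeated lookups for the same node return the same object and the walk over the token list runs only once per node.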

@kaicataldo

Member Author

commented Nov 4, 2016

Sure, sorry that wasn't clear.

Behavior and differences between sourceCode.getComments() and Espree's comment attachment

  • The new behavior for sourceCode.getComments() is to walk the token list outward from the node and stop at the first non-comment token: backward from the node's first token for leading comments, and forward from its last token for trailing comments. This differs from the current behavior, where Espree collects comments as it parses and attaches them when it finishes a node. As a result, Espree can attach comments across tokens (parentheses, operators), which leads to some unexpected behavior (see examples below).
  • It doesn't attach comments that lie outside the range of the node's parent, as those should be attached to that parent node instead (I didn't notice any differences between Espree and this PR for this behavior).

Examples:

foo /*comment*/ || /*comment*/ bar
var foo /*comment*/ = /*comment*/ bar;

Espree: In both examples above, the Identifier foo has 1 trailing comment while bar has 2 leading comments.

sourceCode.getComments(): In both examples above, the Identifier foo has 1 trailing comment and the Identifier bar has 1 leading comment.

function foo(/*asdf*/) {}
function foo(/*asdf*/bar) {}

Espree: In the first example above, the BlockStatement has a leading comment. In the second, the Identifier bar has a leading comment and the BlockStatement doesn't have any.

sourceCode.getComments(): The first example does not return any comments. I wasn't sure what the desired behavior would be here - should it attach the comment as a trailing comment when a function node's params is empty? In the second, the Identifier bar has a leading comment. The BlockStatement does not have leading comments in either case.
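
A minimal sketch of the token walk described above, run against the `foo /*comment*/ || /*comment*/ bar` and `function foo(/*asdf*/) {}` examples. This is a simplified model rather than the code in this PR: the `tok` helper, the flat token arrays, and the hand-computed ranges are assumptions for illustration, and the parent-range check is left out.

```javascript
const isComment = token => token.type === "Block" || token.type === "Line";

// Hypothetical helper for building tokens with [start, end) ranges.
const tok = (type, start, end) => ({ type, range: [start, end] });

// Walks outward from the node's first and last tokens, stopping at any non-comment token.
function getComments(node, tokensAndComments) {
    let first = -1;
    let last = -1;

    // Locate the node's first and last non-comment tokens in the flat list.
    tokensAndComments.forEach((token, index) => {
        const inside = token.range[0] >= node.range[0] && token.range[1] <= node.range[1];

        if (inside && !isComment(token)) {
            if (first === -1) {
                first = index;
            }
            last = index;
        }
    });

    const leading = [];
    const trailing = [];

    if (first === -1) {
        return { leading, trailing };
    }
    for (let i = first - 1; i >= 0 && isComment(tokensAndComments[i]); i--) {
        leading.unshift(tokensAndComments[i]);
    }
    for (let i = last + 1; i < tokensAndComments.length && isComment(tokensAndComments[i]); i++) {
        trailing.push(tokensAndComments[i]);
    }
    return { leading, trailing };
}

// foo /*comment*/ || /*comment*/ bar
const logical = [
    tok("Identifier", 0, 3), tok("Block", 4, 15), tok("Punctuator", 16, 18),
    tok("Block", 19, 30), tok("Identifier", 31, 34)
];
const fooComments = getComments({ range: [0, 3] }, logical);
const barComments = getComments({ range: [31, 34] }, logical);

// function foo(/*asdf*/) {}
const emptyParams = [
    tok("Keyword", 0, 8), tok("Identifier", 9, 12), tok("Punctuator", 12, 13),
    tok("Block", 13, 21), tok("Punctuator", 21, 22),
    tok("Punctuator", 23, 24), tok("Punctuator", 24, 25)
];
const bodyComments = getComments({ range: [23, 25] }, emptyParams);
```

Under this model `foo` gets one trailing comment, `bar` gets one leading comment, and the comment inside the empty parameter list is attached to nothing, because a punctuator sits between it and every neighbouring node.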

Questions/Concerns

  • How should we handle shebang comments (since they shouldn't be included in the results for getComments())?
  • How should the comment in the second set of examples (inside the parens of a function declaration without any parameters) be treated? This is a case where the model of leading/trailing comments doesn't make a lot of sense. I also don't think it makes sense for it to be attached to the function body as a leading comment (the current behavior).
  • How does everyone feel about the slightly changed (and more predictable) behavior described above? So far, it doesn't seem to affect any use cases in the ESLint codebase.

@kaicataldo kaicataldo removed the do not merge label Nov 6, 2016

@kaicataldo kaicataldo changed the title WIP - Breaking: Calculate leading/trailing comments in sourceCode.getComments() Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

@kaicataldo kaicataldo changed the title Breaking: Calculate leading/trailing comments in sourceCode.getComments() Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

@kaicataldo kaicataldo changed the title Calculate leading/trailing comments in sourceCode.getComments() Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

@nzakas

Member

commented Nov 9, 2016

Thanks, that's super helpful. I think the new behavior you've described makes a lot of sense, and have no objections to either dropping the comment without a node or the slightly changed attachment behavior of leading comments.

@platinumazure
Member

left a comment

I might be misunderstanding a few things, but hopefully this review will be of some use.

const code = [
"//#!/usr/bin/env node",
"var a;",
"// foo",

@platinumazure

platinumazure Nov 17, 2016

Member

Does this comment count as trailing for var a; and leading for var b;?

I don't think this is a problem since people who want to iterate over all comments can just iterate over the comment store without worrying about attachment, but I also want to make sure I understand what's going on here.

Maybe some comments around the asserts would help? (E.g., assertCommentCount(1, 1)(node); // commented shebang, foo)

@kaicataldo

kaicataldo Nov 17, 2016

Author Member

That's right. This is the current behavior as defined by Espree's comment attachment. I was hoping we could discuss as a team to see what everyone thought about keeping the behavior as close to Espree as possible or if there were ideas for improvements, since this is already a breaking change. If you have any ideas, I'd love to discuss!

@platinumazure

platinumazure Nov 17, 2016

Member

I'm okay with the behavior-- I just wanted to confirm my understanding.

I'd love to see some comments in the tests themselves, so it's clear to people unfamiliar with comment attachment which comments are leading and trailing to what nodes. 90% of the time it's clear, but for the other 10%, it'd be nice to see documentation via comments.

eslint.on("Identifier", assertCommentCount(0, 0));

eslint.verify(code, config, "", true);
});

@platinumazure

platinumazure Nov 17, 2016

Member

Could you please add a test for a multiple-declarator VariableDeclaration, so we understand how the comment attachment is supposed to work there?

Example test case:

// Leading comment for VariableDeclaration?
var a,  // Trailing comment for VariableDeclarator? And/or leading for the next?
    b,  // Trailing comment for second VariableDeclarator?
    c;  // Trailing comment for VariableDeclaration?
// Trailing comment for VariableDeclaration?

"switch (foo)",
" //comment",
" /*another comment*/",
"}"

@platinumazure

platinumazure Nov 17, 2016

Member

Is this example even syntactically valid? I see a closing brace but not an opening brace.

@kaicataldo

kaicataldo Nov 17, 2016

Author Member

Good catch 👍

@kaicataldo

Member Author

commented Nov 17, 2016

Working on this has led me to some questions about how we want to handle shebangs. Essentially, it seems like we probably want to keep the current behavior: shebangs are not included in the AST's comments array, and they should not be returned as leading comments by sourceCode.getComments(). Please see this comment thread for more context.

The problem as it currently stands is that sourceCode.getComments() doesn't have a way of knowing if a LineComment token was a shebang or not (since it gets transformed into a standard JS LineComment prior to parsing). I think we do want to keep the behavior of transforming the shebang and keeping it in the tokens and comment store, as this allows rules to use sourceCode.getTokenOrCommentBefore() to get the token that represents the shebang.

It seems like we have a few ways forward, and I wanted to see what you all thought:

  • Right after parsing in lib/eslint.js, add a property (shebang: true?) to the LineComment token that represents the shebang and then filter it when calculating sourceCode.getComments()'s return value.
  • When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.
  • Simply remove the shebang token and make rules figure this out for themselves (maybe by checking the first line of the source code). This is my least favorite option, because it feels like something we should be able to provide.

Thoughts? Suggestions? Things I missed?
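
As a rough illustration of the first option, here is a sketch that tags the shebang's comment token and filters on that tag. The `markShebang` and `withoutShebangComment` helpers are hypothetical, the `shebang` property name is the one floated in the list above, and the comment ranges are hand-computed for the sample text.

```javascript
// Hypothetical: runs right after parsing, while the original text is still at hand.
function markShebang(originalText, comments) {
    const first = comments[0];

    if (originalText.startsWith("#!") && first && first.range[0] === 0) {
        first.shebang = true; // property name taken from the discussion, not a real API
    }
    return comments;
}

// Hypothetical filter that getComments() could apply to its results.
function withoutShebangComment(comments) {
    return comments.filter(comment => !comment.shebang);
}

const originalText = "#!/usr/bin/env node\nvar a; // foo";
const parsedComments = markShebang(originalText, [
    { type: "Line", value: "/usr/bin/env node", range: [0, 19] },
    { type: "Line", value: " foo", range: [27, 33] }
]);
const visibleComments = withoutShebangComment(parsedComments);
```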

@platinumazure

Member

commented Nov 17, 2016

I'd vote for option 2:

  • When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

This is similar to how we handle the byte-order mark (BOM).

I would be okay with option 1 (add shebang property to the comment token itself).

I would be opposed to option 3 (remove the comment entirely from SourceCode's store).

@mikesherov

Contributor

commented Nov 17, 2016

This sounds correct to me too:

When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

@kaicataldo kaicataldo force-pushed the getcomments branch 5 times, most recently from 7a64fdb to 970a009 Nov 17, 2016

@kaicataldo

Member Author

commented Nov 24, 2016

Updated - thoughts on this approach? I thought about it some more and am actually uncomfortable with removing it from the token list. Treating the shebang like a Line comment works for most cases - the cases that we don't cover are ones where the rule needs to know whether the line comment token it's checking represents a shebang or not.

This current iteration changes the type of the comment token to Shebang. Doing so will continue to allow rules to iterate over tokens (which seems the most likely way that rules will be checking this), as well as giving rules that use sourceCode.getTokenOrCommentBefore() an easy way to differentiate between regular JS Line comments and Shebang comments (token.type === "Shebang").

This could potentially break some rules that assume that the token.type === "Line" check will include shebang comments while iterating over the token list, but this seems like a pretty narrow use case and I think this gives rule writers greater control.
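
A small sketch of the retagging idea and of the compatibility risk mentioned above. The `retagShebang` helper and the sample comment objects are assumptions for illustration; only the `"Shebang"` and `"Line"` type names come from this thread.

```javascript
// Hypothetical: retags the comment token that stands in for the shebang.
function retagShebang(originalText, comments) {
    const first = comments[0];

    if (originalText.startsWith("#!") && first && first.type === "Line" && first.range[0] === 0) {
        first.type = "Shebang";
    }
    return comments;
}

const retagged = retagShebang("#!/usr/bin/env node\nvar a; // foo", [
    { type: "Line", value: "/usr/bin/env node", range: [0, 19] },
    { type: "Line", value: " foo", range: [27, 33] }
]);

// A rule can now tell the two apart with a plain type check.
const shebangTokens = retagged.filter(token => token.type === "Shebang");

// The risk: code that filters on "Line" no longer sees the shebang.
const lineTokens = retagged.filter(token => token.type === "Line");
```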

Thanks for all the input! If this doesn't seem like a good idea, I'm happy to continue exploring other options.

@kaicataldo kaicataldo force-pushed the getcomments branch from 970a009 to be5df78 Nov 24, 2016

@kaicataldo

Member Author

commented Apr 4, 2017

@not-an-aardvark @btmills Thanks again for the thorough reviews! I have addressed all the comments (either with code changes or comments of my own). Please let me know what you think! Changes were made in the last two commits, so hopefully it's not too hard to re-review.

@btmills

btmills approved these changes Apr 5, 2017

Member

left a comment

@kaicataldo thanks for adding tests for those edge cases. I'm totally on board with #8408 and think that's the right direction to go. There's no perfect way to classify all comments as leading or trailing some particular node, but it looks like this does as good a job as we can hope for. LGTM :shipit:

@kaicataldo kaicataldo force-pushed the getcomments branch 2 times, most recently from 6274d06 to 9cd4e91 Apr 5, 2017

@kaicataldo

Member Author

commented Apr 5, 2017

Also, rebased and ran eslint-canary against this branch - not seeing any new unexpected errors! 🎉

@not-an-aardvark
Member

left a comment

LGTM with a slight nitpick. Thanks!

// Ignores shebangs
"#!/usr/bin/env node",
{ code: "#!/usr/bin/env node", options: ["always"] },
{ code: "#!/usr/bin/env node", options: ["never"] },

@not-an-aardvark

not-an-aardvark Apr 5, 2017

Member

Nitpick: All of the comments in these tests start with /, so the rule wouldn't report an error for them anyway. If the rule is refactored in the future, it might be useful to have tests for:

{ code: "#!foo", options: ["always"] }
{ code: "#!Foo", options: ["never"] }

@kaicataldo

kaicataldo Apr 5, 2017

Author Member

Good call - done!

@not-an-aardvark

Member

commented Apr 6, 2017

Is the CLA bot down? It's still waiting for the status to be reported (and it hasn't left a comment)

@vitorbal

Member

commented Apr 6, 2017

@not-an-aardvark that happened to me a couple of days ago. I had to force push to trigger the bot again.

@kaicataldo kaicataldo force-pushed the getcomments branch from f4aa99f to 58dd348 Apr 7, 2017

@eslintbot

commented Apr 7, 2017

LGTM

@ilyavolodin ilyavolodin merged commit 867dd2e into master Apr 7, 2017

5 checks passed

continuous-integration/appveyor/branch: AppVeyor build succeeded
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed
licence/cla: Contributor License Agreement is signed.

@kaicataldo kaicataldo deleted the getcomments branch Apr 7, 2017

@ilyavolodin ilyavolodin moved this from Ready to Merged in v4.0.0 Apr 7, 2017

@JamesHenry

Member

commented Apr 7, 2017

Yayyyyy! What an epic PR 😄

Thanks so much for your work on this @kaicataldo! And to all the reviewers for their invaluable help.

@mikesherov

Contributor

commented Apr 7, 2017

Congrats!
