Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve empty interpolations #5079

Merged
merged 25 commits into from
Jun 17, 2018

Conversation

helixbass
Copy link
Collaborator

@GeoffreyBooth I ran into the need to preserve empty interpolations (#{}) for Prettier, so figured it was a good excuse to do the refactoring of interpolated strings that you suggested

This is currently based on my preserve-string-literal branch (#5057), see here for now for the diff of just this branch, or once that gets merged I can rebase/retarget to master

@GeoffreyBooth
Copy link
Collaborator

Thanks for taking this on, it brings me unreasonable joy to see the string-related parts of the lexer refactored and moved into the nodes classes.

I’m thinking we should make a new branch on the main repo for you to target these refactoring PRs against. The new branch would itself get merged into master only when it adds a new feature that makes sense to release. I think that new feature would be when CoffeeScript can produce an AST. Prettier support would come from a PR into Prettier, based on this new AST. Having a new ast branch on the main repo lets us keep doing these PRs as bite-size efforts, without this big refactor holding up unrelated bugfixes like the ones that just went out in 2.3.1. What do you think?

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth that sounds like a good idea if you're up for reviewing the AST-related PRs to the extent of feeling comfortable merging them into the ast branch

And ya if you keep pinging me about things ready to get merged into master that could have implications for the AST stuff like you just did for #5006 (even if just likely merge conflicts/things that may need AST support) that should help keep the ast branch from drifting away from master

@GeoffreyBooth GeoffreyBooth changed the base branch from master to ast June 11, 2018 04:47
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please merge the latest ast branch into this.

]

Regex: [
o 'REGEX', -> new RegexLiteral $1
o 'REGEX_START Invocation REGEX_END', -> new RegexWithInterpolations $2.args
o 'REGEX', -> new RegexLiteral "#{$1}", delimiter: $1.delimiter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe add a comment saying that this is to convert a String object back into a primitive string? Maybe every block or so where one of these is happening? It took me awhile to remember that that’s what was going on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that was the advantage of using a named conversion function eg toPrimitiveString($1), it's self-documenting. The "#{$1}" syntax is slick but then if it's unclear what it's doing, I'd prefer a named function as a way of explaining rather than a (potentially awkwardly placed) comment

Updated to use toPrimitiveString() (required explicitly exposing it to grammar rules)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t see toPrimitiveString yet, maybe you haven’t pushed that change? But regardless, I agree that "#{$1}" is slick, and it would be a shame not to use it. I think toPrimitiveString doesn’t explain enough, i.e. it’s not clear why we need to convert anything into a primitive string. A comment like # Convert `String` object back to primitive string now that we’ve retrieved stowaway extra properties. is clearer, though it begs the question how often it should be repeated down the file. Alternatively I guess you could explain it in a comment inside toPrimitiveString.

@@ -243,3 +243,56 @@ exports.nameWhitespaceCharacter = (string) ->
when '\r' then 'carriage return'
when '\t' then 'tab'
else string

# Constructs a string or regex by escaping certain characters.
exports.makeDelimitedLiteral = (body, options = {}) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used in nodes.coffee? If so, should it join the utility functions at the bottom of that file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, moved

@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR'
tokens[method] generate 'JS', '', tokens[j], token

dontShiftForward = (i, tokens) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the purpose of this function? Is it to prevent comments from being shifted outside of an interpolation block? Because I thought the previous code was already doing that, though of course using different tokens like STRING_END since we didn’t have INTERPOLATION_END. I feel like we shouldn’t need a special case for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth yes it's possible that this isn't strictly necessary (as I don't thoroughly understand the way comments are wrangled), but here's what was happening:

Without this, the test comment at end of CSX escape was failing, as this input was failing to include the comment in the output:

<Person>
{"i am a string"
# i am a comment
}
</Person>

rescueStowawayComments() was ending up attaching the comment to the newline string after the closing }, both here and on master it appears. Not sure what exactly was happening on master after that to enable getting it printed inside the } but seems cleaner/more accurate to keep the comment attached to a token that's inside the interpolation

Also, the implementation on master is not robust, as if you just adjust the spacing in the example to eg:

<Person>
{"i am a string"
# i am a comment
}</Person>

then the comment gets printed outside the interpolation. Whereas with the dontShiftForward() check, this case also prints the comment more correctly inside the interpolation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewriting function that keeps comments inside interpolation (}) blocks should be keeping this one from shifting onto the other side of the }. If it doesn’t, that’s a bug in its own right that should be fixed; and it should probably be fixed in rescueStowawayComments, not by adding another rewriter function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I dug a little deeper into the behavior on master to see how it's managing to print the comment correctly in the test case

The comment ends up attached to both the StringLiteral for the newline after the end } and its wrapping Value. I believe this is the "cause" of the breakage on this branch as the StringLiteral is no longer wrapped in a Value so the following doesn't happen:

Inside the expr.traverseChildren() callback in StringWithInterpolations::compileNode(), the wrapping Value hits the else if node.comments branch and then ends up attaching its comments to elements[elements.length - 1], which is the Parens enclosing the interpolation. So that's how the comment manages to sneak back inside the interpolation

So the existing logic seems kind of tenuous since the rewriter shifts the comment outside the interpolation and then is relying on the StringLiteral being wrapped in a Value to trigger the logic that shifts it back inside. And as mentioned if the formatting is changed slightly, the fact that the rewriter shifts the comment outside the interpolation ends up resulting in the comment being printed outside of it as the comment has been rewritten out of the StringWithInterpolations entirely and so isn't visible like in the described test case

it should probably be fixed in rescueStowawayComments, not by adding another rewriter function

That's more or less what this is doing - it's not adding a separate rewriter pass, dontShiftForward() is just applied inside rescueStowawayComments() to prevent the else clause (ie when token[0] not in DISCARDED) from pushing newLine comments forward if they'd get pushed forward beyond an interpolation end boundary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for looking into this, and thanks for improving it. My head hurts every time I look at the rewriter. Someday we need to refactor it out of existence somehow.

@@ -751,5 +773,5 @@ DISCARDED = ['(', ')', '[', ']', '{', '}', '.', '..', '...', ',', '=', '++', '--
'AS', 'AWAIT', 'CALL_START', 'CALL_END', 'DEFAULT', 'ELSE', 'EXTENDS', 'EXPORT',
'FORIN', 'FOROF', 'FORFROM', 'IMPORT', 'INDENT', 'INDEX_SOAK', 'LEADING_WHEN',
'OUTDENT', 'PARAM_END', 'REGEX_START', 'REGEX_END', 'RETURN', 'STRING_END', 'THROW',
'UNARY', 'YIELD'
'UNARY', 'YIELD', 'INTERPOLATION_START', 'INTERPOLATION_END'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetize.

@helixbass helixbass force-pushed the preserve-empty-interpolations branch from 1893b5c to e95901f Compare June 16, 2018 14:44
Copy link
Collaborator Author

@helixbass helixbass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth merged ast (into base branch preserve-string-literal and here) and updated per/replied to your comments

]

Regex: [
o 'REGEX', -> new RegexLiteral $1
o 'REGEX_START Invocation REGEX_END', -> new RegexWithInterpolations $2.args
o 'REGEX', -> new RegexLiteral "#{$1}", delimiter: $1.delimiter
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that was the advantage of using a named conversion function eg toPrimitiveString($1), it's self-documenting. The "#{$1}" syntax is slick but then if it's unclear what it's doing, I'd prefer a named function as a way of explaining rather than a (potentially awkwardly placed) comment

Updated to use toPrimitiveString() (required explicitly exposing it to grammar rules)

@@ -243,3 +243,56 @@ exports.nameWhitespaceCharacter = (string) ->
when '\r' then 'carriage return'
when '\t' then 'tab'
else string

# Constructs a string or regex by escaping certain characters.
exports.makeDelimitedLiteral = (body, options = {}) ->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, moved

@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR'
tokens[method] generate 'JS', '', tokens[j], token

dontShiftForward = (i, tokens) ->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth yes it's possible that this isn't strictly necessary (as I don't thoroughly understand the way comments are wrangled), but here's what was happening:

Without this, the test comment at end of CSX escape was failing, as this input was failing to include the comment in the output:

<Person>
{"i am a string"
# i am a comment
}
</Person>

rescueStowawayComments() was ending up attaching the comment to the newline string after the closing }, both here and on master it appears. Not sure what exactly was happening on master after that to enable getting it printed inside the } but seems cleaner/more accurate to keep the comment attached to a token that's inside the interpolation

Also, the implementation on master is not robust, as if you just adjust the spacing in the example to eg:

<Person>
{"i am a string"
# i am a comment
}</Person>

then the comment gets printed outside the interpolation. Whereas with the dontShiftForward() check, this case also prints the comment more correctly inside the interpolation

# This is an interpolated string, not a CSX tag; and for whatever
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to
# `` `a${/*test*/''}b` `` instead.
placeholderToken = @makeToken 'STRING', "''"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this logic somewhere? (Forgive me if it was just moved.) Seems relevant to the rewriter issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for the case where it's an empty interpolation (with just a comment) so I think it's independent of the rewriter case. I don't remember exactly what allowed me to make this simplifying change, but it's just always making the generated placeholder token a JS token instead of either making it a STRING or a JS token depending on whether it's in a CSX context

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can do without it, sure, that’s great. I just want to make sure we’re not reintroducing a bug. I checked out your branch and tried "#{### comment ###}" and it was output correctly as `${/* comment */""}`;—how is this happening, with this code gone?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I pushed the logic for generating the empty string to the nodes side here in StringWithInterpolations::compileNode()

Copy link
Collaborator Author

@helixbass helixbass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth see replies

# This is an interpolated string, not a CSX tag; and for whatever
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to
# `` `a${/*test*/''}b` `` instead.
placeholderToken = @makeToken 'STRING', "''"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for the case where it's an empty interpolation (with just a comment) so I think it's independent of the rewriter case. I don't remember exactly what allowed me to make this simplifying change, but it's just always making the generated placeholder token a JS token instead of either making it a STRING or a JS token depending on whether it's in a CSX context

@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR'
tokens[method] generate 'JS', '', tokens[j], token

dontShiftForward = (i, tokens) ->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I dug a little deeper into the behavior on master to see how it's managing to print the comment correctly in the test case

The comment ends up attached to both the StringLiteral for the newline after the end } and its wrapping Value. I believe this is the "cause" of the breakage on this branch as the StringLiteral is no longer wrapped in a Value so the following doesn't happen:

Inside the expr.traverseChildren() callback in StringWithInterpolations::compileNode(), the wrapping Value hits the else if node.comments branch and then ends up attaching its comments to elements[elements.length - 1], which is the Parens enclosing the interpolation. So that's how the comment manages to sneak back inside the interpolation

So the existing logic seems kind of tenuous since the rewriter shifts the comment outside the interpolation and then is relying on the StringLiteral being wrapped in a Value to trigger the logic that shifts it back inside. And as mentioned if the formatting is changed slightly, the fact that the rewriter shifts the comment outside the interpolation ends up resulting in the comment being printed outside of it as the comment has been rewritten out of the StringWithInterpolations entirely and so isn't visible like in the described test case

it should probably be fixed in rescueStowawayComments, not by adding another rewriter function

That's more or less what this is doing - it's not adding a separate rewriter pass, dontShiftForward() is just applied inside rescueStowawayComments() to prevent the else clause (ie when token[0] not in DISCARDED) from pushing newLine comments forward if they'd get pushed forward beyond an interpolation end boundary

src/nodes.coffee Outdated

# Functions required by parser.
exports.extend = extend
exports.addDataToNode = addDataToNode
# Convert `String` object back to primitive string now that we've retrieved
# stowaway extra properties
exports.toPrimitiveString = (strObject) -> "#{strObject}"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To continue the "#{$1}"/toPrimitiveString() thread (which got swallowed up), here's the definition of toPrimitiveString()

I added the comment you mentioned here - I agree with how you're thinking about it, it'd be nice to use the "#{$1}" syntax directly but it's hard to justify adding a verbose comment every time we use it (there are two places in the grammar as of this branch, but there are others on other branches where this technique has been used). So I think the best way of making it comprehensible and relatively DRY is to make them all use a shared function with a reasonably descriptive name and whose definition provides a more thorough comment

Or the alternative in my mind is to just use the "#{$1}" syntax directly and let people scratch their heads a bit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a performance test of `${strObj}` vs strObj.toString(). In my browser, toString() is 25 times faster. So we should definitely use toString() instead of the interpolation/casting syntax, it seems.

That doesn’t settle whether we should keep toPrimitiveString($1) and call toString inside toPrimitiveString, or just call $1.toString() directly; the latter has the “where to put the comment” issue again. But toString is rather descriptive on its own, so maybe we wouldn’t need a comment? (We should at least put a comment before the first time we call toString, even if we don’t repeat the comment every time.) Or we could just use toString inside the body of toPrimitiveString and be done with it. I don’t have a preference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString() is 25 times faster

That's interesting

toString is rather descriptive on its own

I agree, updated to use $1.toString() directly with a descriptive comment before the first block where it's used in the grammar

Copy link
Collaborator Author

@helixbass helixbass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth updated/replied

# This is an interpolated string, not a CSX tag; and for whatever
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to
# `` `a${/*test*/''}b` `` instead.
placeholderToken = @makeToken 'STRING', "''"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I pushed the logic for generating the empty string to the nodes side here in StringWithInterpolations::compileNode()

src/nodes.coffee Outdated

# Functions required by parser.
exports.extend = extend
exports.addDataToNode = addDataToNode
# Convert `String` object back to primitive string now that we've retrieved
# stowaway extra properties
exports.toPrimitiveString = (strObject) -> "#{strObject}"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString() is 25 times faster

That's interesting

toString is rather descriptive on its own

I agree, updated to use $1.toString() directly with a descriptive comment before the first block where it's used in the grammar

@GeoffreyBooth
Copy link
Collaborator

Alright, that’s all I’ve got. Thanks for all the effort!

@GeoffreyBooth GeoffreyBooth merged commit 3e089ca into jashkenas:ast Jun 17, 2018
@helixbass helixbass deleted the preserve-empty-interpolations branch May 11, 2019 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants