-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve empty interpolations #5079
Preserve empty interpolations #5079
Conversation
Thanks for taking this on, it brings me unreasonable joy to see the string-related parts of the lexer refactored and moved into the nodes classes. I’m thinking we should make a new branch on the main repo for you to target these refactoring PRs against. The new branch would itself get merged into |
@GeoffreyBooth that sounds like a good idea if you're up for reviewing the AST-related PRs to the extent of feeling comfortable merging them into the And ya if you keep pinging me about things ready to get merged into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please merge the latest ast
branch into this.
src/grammar.coffee
Outdated
] | ||
|
||
Regex: [ | ||
o 'REGEX', -> new RegexLiteral $1 | ||
o 'REGEX_START Invocation REGEX_END', -> new RegexWithInterpolations $2.args | ||
o 'REGEX', -> new RegexLiteral "#{$1}", delimiter: $1.delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe add a comment saying that this is to convert a String
object back into a primitive string? Maybe every block or so where one of these is happening? It took me awhile to remember that that’s what was going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that was the advantage of using a named conversion function eg toPrimitiveString($1)
, it's self-documenting. The "#{$1}"
syntax is slick but then if it's unclear what it's doing, I'd prefer a named function as a way of explaining rather than a (potentially awkwardly placed) comment
Updated to use toPrimitiveString()
(required explicitly exposing it to grammar rules)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t see toPrimitiveString
yet, maybe you haven’t pushed that change? But regardless, I agree that "#{$1}"
is slick, and it would be a shame not to use it. I think toPrimitiveString
doesn’t explain enough, i.e. it’s not clear why we need to convert anything into a primitive string. A comment like # Convert `String` object back to primitive string now that we’ve retrieved stowaway extra properties.
is clearer, though it begs the question how often it should be repeated down the file. Alternatively I guess you could explain it in a comment inside toPrimitiveString
.
src/helpers.coffee
Outdated
@@ -243,3 +243,56 @@ exports.nameWhitespaceCharacter = (string) -> | |||
when '\r' then 'carriage return' | |||
when '\t' then 'tab' | |||
else string | |||
|
|||
# Constructs a string or regex by escaping certain characters. | |||
exports.makeDelimitedLiteral = (body, options = {}) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only used in nodes.coffee
? If so, should it join the utility functions at the bottom of that file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, moved
@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter | |||
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR' | |||
tokens[method] generate 'JS', '', tokens[j], token | |||
|
|||
dontShiftForward = (i, tokens) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the purpose of this function? Is it to prevent comments from being shifted outside of an interpolation block? Because I thought the previous code was already doing that, though of course using different tokens like STRING_END
since we didn’t have INTERPOLATION_END
. I feel like we shouldn’t need a special case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth yes it's possible that this isn't strictly necessary (as I don't thoroughly understand the way comments are wrangled), but here's what was happening:
Without this, the test comment at end of CSX escape
was failing, as this input was failing to include the comment in the output:
<Person>
{"i am a string"
# i am a comment
}
</Person>
rescueStowawayComments()
was ending up attaching the comment to the newline string after the closing }
, both here and on master
it appears. Not sure what exactly was happening on master
after that to enable getting it printed inside the }
but seems cleaner/more accurate to keep the comment attached to a token that's inside the interpolation
Also, the implementation on master
is not robust, as if you just adjust the spacing in the example to eg:
<Person>
{"i am a string"
# i am a comment
}</Person>
then the comment gets printed outside the interpolation. Whereas with the dontShiftForward()
check, this case also prints the comment more correctly inside the interpolation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewriting function that keeps comments inside interpolation (}
) blocks should be keeping this one from shifting onto the other side of the }
. If it doesn’t, that’s a bug in its own right that should be fixed; and it should probably be fixed in rescueStowawayComments
, not by adding another rewriter function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I dug a little deeper into the behavior on master
to see how it's managing to print the comment correctly in the test case
The comment ends up attached to both the StringLiteral
for the newline after the end }
and its wrapping Value
. I believe this is the "cause" of the breakage on this branch as the StringLiteral
is no longer wrapped in a Value
so the following doesn't happen:
Inside the expr.traverseChildren()
callback in StringWithInterpolations::compileNode()
, the wrapping Value
hits the else if node.comments
branch and then ends up attaching its comments to elements[elements.length - 1]
, which is the Parens
enclosing the interpolation. So that's how the comment manages to sneak back inside the interpolation
So the existing logic seems kind of tenuous since the rewriter shifts the comment outside the interpolation and then is relying on the StringLiteral
being wrapped in a Value
to trigger the logic that shifts it back inside. And as mentioned if the formatting is changed slightly, the fact that the rewriter shifts the comment outside the interpolation ends up resulting in the comment being printed outside of it as the comment has been rewritten out of the StringWithInterpolations
entirely and so isn't visible like in the described test case
it should probably be fixed in rescueStowawayComments, not by adding another rewriter function
That's more or less what this is doing - it's not adding a separate rewriter pass, dontShiftForward()
is just applied inside rescueStowawayComments()
to prevent the else
clause (ie when token[0] not in DISCARDED
) from pushing newLine
comments forward if they'd get pushed forward beyond an interpolation end boundary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for looking into this, and thanks for improving it. My head hurts every time I look at the rewriter. Someday we need to refactor it out of existence somehow.
src/rewriter.coffee
Outdated
@@ -751,5 +773,5 @@ DISCARDED = ['(', ')', '[', ']', '{', '}', '.', '..', '...', ',', '=', '++', '-- | |||
'AS', 'AWAIT', 'CALL_START', 'CALL_END', 'DEFAULT', 'ELSE', 'EXTENDS', 'EXPORT', | |||
'FORIN', 'FOROF', 'FORFROM', 'IMPORT', 'INDENT', 'INDEX_SOAK', 'LEADING_WHEN', | |||
'OUTDENT', 'PARAM_END', 'REGEX_START', 'REGEX_END', 'RETURN', 'STRING_END', 'THROW', | |||
'UNARY', 'YIELD' | |||
'UNARY', 'YIELD', 'INTERPOLATION_START', 'INTERPOLATION_END' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alphabetize.
1893b5c
to
e95901f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth merged ast
(into base branch preserve-string-literal
and here) and updated per/replied to your comments
src/grammar.coffee
Outdated
] | ||
|
||
Regex: [ | ||
o 'REGEX', -> new RegexLiteral $1 | ||
o 'REGEX_START Invocation REGEX_END', -> new RegexWithInterpolations $2.args | ||
o 'REGEX', -> new RegexLiteral "#{$1}", delimiter: $1.delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that was the advantage of using a named conversion function eg toPrimitiveString($1)
, it's self-documenting. The "#{$1}"
syntax is slick but then if it's unclear what it's doing, I'd prefer a named function as a way of explaining rather than a (potentially awkwardly placed) comment
Updated to use toPrimitiveString()
(required explicitly exposing it to grammar rules)
src/helpers.coffee
Outdated
@@ -243,3 +243,56 @@ exports.nameWhitespaceCharacter = (string) -> | |||
when '\r' then 'carriage return' | |||
when '\t' then 'tab' | |||
else string | |||
|
|||
# Constructs a string or regex by escaping certain characters. | |||
exports.makeDelimitedLiteral = (body, options = {}) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, moved
@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter | |||
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR' | |||
tokens[method] generate 'JS', '', tokens[j], token | |||
|
|||
dontShiftForward = (i, tokens) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth yes it's possible that this isn't strictly necessary (as I don't thoroughly understand the way comments are wrangled), but here's what was happening:
Without this, the test comment at end of CSX escape
was failing, as this input was failing to include the comment in the output:
<Person>
{"i am a string"
# i am a comment
}
</Person>
rescueStowawayComments()
was ending up attaching the comment to the newline string after the closing }
, both here and on master
it appears. Not sure what exactly was happening on master
after that to enable getting it printed inside the }
but seems cleaner/more accurate to keep the comment attached to a token that's inside the interpolation
Also, the implementation on master
is not robust, as if you just adjust the spacing in the example to eg:
<Person>
{"i am a string"
# i am a comment
}</Person>
then the comment gets printed outside the interpolation. Whereas with the dontShiftForward()
check, this case also prints the comment more correctly inside the interpolation
# This is an interpolated string, not a CSX tag; and for whatever | ||
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to | ||
# `` `a${/*test*/''}b` `` instead. | ||
placeholderToken = @makeToken 'STRING', "''" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this logic somewhere? (Forgive me if it was just moved.) Seems relevant to the rewriter issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for the case where it's an empty interpolation (with just a comment) so I think it's independent of the rewriter case. I don't remember exactly what allowed me to make this simplifying change, but it's just always making the generated placeholder token a JS
token instead of either making it a STRING
or a JS
token depending on whether it's in a CSX context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can do without it, sure, that’s great. I just want to make sure we’re not reintroducing a bug. I checked out your branch and tried "#{### comment ###}"
and it was output correctly as `${/* comment */""}`;
—how is this happening, with this code gone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I pushed the logic for generating the empty string to the nodes side here in StringWithInterpolations::compileNode()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth see replies
# This is an interpolated string, not a CSX tag; and for whatever | ||
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to | ||
# `` `a${/*test*/''}b` `` instead. | ||
placeholderToken = @makeToken 'STRING', "''" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for the case where it's an empty interpolation (with just a comment) so I think it's independent of the rewriter case. I don't remember exactly what allowed me to make this simplifying change, but it's just always making the generated placeholder token a JS
token instead of either making it a STRING
or a JS
token depending on whether it's in a CSX context
@@ -426,6 +427,13 @@ exports.Rewriter = class Rewriter | |||
tokens[method] generate 'TERMINATOR', '\n', tokens[j] unless tokens[j][0] is 'TERMINATOR' | |||
tokens[method] generate 'JS', '', tokens[j], token | |||
|
|||
dontShiftForward = (i, tokens) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I dug a little deeper into the behavior on master
to see how it's managing to print the comment correctly in the test case
The comment ends up attached to both the StringLiteral
for the newline after the end }
and its wrapping Value
. I believe this is the "cause" of the breakage on this branch as the StringLiteral
is no longer wrapped in a Value
so the following doesn't happen:
Inside the expr.traverseChildren()
callback in StringWithInterpolations::compileNode()
, the wrapping Value
hits the else if node.comments
branch and then ends up attaching its comments to elements[elements.length - 1]
, which is the Parens
enclosing the interpolation. So that's how the comment manages to sneak back inside the interpolation
So the existing logic seems kind of tenuous since the rewriter shifts the comment outside the interpolation and then is relying on the StringLiteral
being wrapped in a Value
to trigger the logic that shifts it back inside. And as mentioned if the formatting is changed slightly, the fact that the rewriter shifts the comment outside the interpolation ends up resulting in the comment being printed outside of it as the comment has been rewritten out of the StringWithInterpolations
entirely and so isn't visible like in the described test case
it should probably be fixed in rescueStowawayComments, not by adding another rewriter function
That's more or less what this is doing - it's not adding a separate rewriter pass, dontShiftForward()
is just applied inside rescueStowawayComments()
to prevent the else
clause (ie when token[0] not in DISCARDED
) from pushing newLine
comments forward if they'd get pushed forward beyond an interpolation end boundary
src/nodes.coffee
Outdated
|
||
# Functions required by parser. | ||
exports.extend = extend | ||
exports.addDataToNode = addDataToNode | ||
# Convert `String` object back to primitive string now that we've retrieved | ||
# stowaway extra properties | ||
exports.toPrimitiveString = (strObject) -> "#{strObject}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To continue the "#{$1}"
/toPrimitiveString()
thread (which got swallowed up), here's the definition of toPrimitiveString()
I added the comment you mentioned here - I agree with how you're thinking about it, it'd be nice to use the "#{$1}"
syntax directly but it's hard to justify adding a verbose comment every time we use it (there are two places in the grammar as of this branch, but there are others on other branches where this technique has been used). So I think the best way of making it comprehensible and relatively DRY is to make them all use a shared function with a reasonably descriptive name and whose definition provides a more thorough comment
Or the alternative in my mind is to just use the "#{$1}"
syntax directly and let people scratch their heads a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a performance test of `${strObj}`
vs strObj.toString()
. In my browser, toString()
is 25 times faster. So we should definitely use toString()
instead of the interpolation/casting syntax, it seems.
That doesn’t settle whether we should keep toPrimitiveString($1)
and call toString
inside toPrimitiveString
, or just call $1.toString()
directly; the latter has the “where to put the comment” issue again. But toString
is rather descriptive on its own, so maybe we wouldn’t need a comment? (We should at least put a comment before the first time we call toString
, even if we don’t repeat the comment every time.) Or we could just use toString
inside the body of toPrimitiveString
and be done with it. I don’t have a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toString()
is 25 times faster
That's interesting
toString
is rather descriptive on its own
I agree, updated to use $1.toString()
directly with a descriptive comment before the first block where it's used in the grammar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GeoffreyBooth updated/replied
# This is an interpolated string, not a CSX tag; and for whatever | ||
# reason `` `a${/*test*/}b` `` is invalid JS. So compile to | ||
# `` `a${/*test*/''}b` `` instead. | ||
placeholderToken = @makeToken 'STRING', "''" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I pushed the logic for generating the empty string to the nodes side here in StringWithInterpolations::compileNode()
src/nodes.coffee
Outdated
|
||
# Functions required by parser. | ||
exports.extend = extend | ||
exports.addDataToNode = addDataToNode | ||
# Convert `String` object back to primitive string now that we've retrieved | ||
# stowaway extra properties | ||
exports.toPrimitiveString = (strObject) -> "#{strObject}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toString()
is 25 times faster
That's interesting
toString
is rather descriptive on its own
I agree, updated to use $1.toString()
directly with a descriptive comment before the first block where it's used in the grammar
Alright, that’s all I’ve got. Thanks for all the effort! |
@GeoffreyBooth I ran into the need to preserve empty interpolations (
#{}
) for Prettier, so figured it was a good excuse to do the refactoring of interpolated strings that you suggestedThis is currently based on my
preserve-string-literal
branch (#5057), see here for now for the diff of just this branch, or once that gets merged I can rebase/retarget tomaster