New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some fixes for string interpolation #2461
Conversation
Positions of static parts are now set explicitly during parsing rather than filled in wholesale during subsequent atPos after parsing. I also had to change the offsets that scanner uses for initial static parts of string interpolations so that they no longer point to the opening double quote, but rather to the actual beginning of the part.
@paulp I'll follow up with an investigation of how |
var start = 0 | ||
def charAtIs(idx: Int, ch: Char) = idx < strLen && str(idx) == ch | ||
def isPercent(idx: Int) = charAtIs(idx, '%') | ||
def isConversion(idx: Int) = isPercent(idx) && !charAtIs(idx + 1, 'n') && !charAtIs(idx + 1, '%') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of little methods to avoid duplication? Look! The macro creature... it's evolving! Soon it will be more powerful than we can possibly imagine!
See comments in code for the exhaustive list of the cases handled. Also note that treatment of non-formatting percents has been changed. Embedding literal %'s now requires escaping. Moreover, this commit also features exact error positions for string interpolation, something that has been impossible until the fix for SI-7271, also included in this patch.
At all its callsites except for a single one, the results of literal() were wrapped in atPos(in.offset), so the simplification was pretty much warranted. Thanks @paulp
var start = 0 | ||
def charAtIndexIs(idx: Int, ch: Char) = idx < strLen && str(idx) == ch | ||
def isPercent(idx: Int) = charAtIndexIs(idx, '%') | ||
def isConversion(idx: Int) = isPercent(idx) && !charAtIndexIs(idx + 1, 'n') && !charAtIndexIs(idx + 1, '%') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I die a little inside every time I see a new pile of index-oriented imperative string parsing code on its way into trunk. But I guess I'll survive this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you do this differently? Could you please elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a whole literature on string parsing, yet code like 'if charAtIndexIs(idx + 1, 'n')' could easily serve as pseudocode for the 8086 assembly version. If you're ever looking at an index counter, if correctness ever depends on manual checks for boundary conditions, if you're analyzing the string from the standpoint of individual characters rather than meaningful tokens, then you're partying like it's 1975.
The question becomes one of performance. We are not parsing so many format strings that performance is important, so I would implement this at a very high level, with parser combinators probably. But if I thought that was too slow, I'd still isolate the assembly code from the logic. The semantics being implemented - the fact that "%n" is a meaningful token within the string - are freely being mixed with the ultra-low-level detail of what's to be found at some register offset. Such code will always have bugs; more importantly, such code is utterly brutal to modify, especially for anyone who didn't write it.
Some fixes for string interpolation
Long time no see!
Review @paulp @retronym. Also see #2282