Escaped whitespace and slashes in Heregexes #3214
* Resolves jashkenas#3059: Don't remove escaped whitespace.
* Fixes jashkenas#2238: Prevent escaping slashes that are already escaped.
* Fix detection of end of heregex with escaped slashes.
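The second item (escaping only slashes that are not already escaped) can be illustrated with a minimal TypeScript sketch. This is not the PR's code: `escapeSlashes` and its pattern are hypothetical, chosen only to show how matching whole escape sequences first keeps an already escaped slash from being escaped again.

```typescript
// Hypothetical helper, for illustration only (not taken from the PR).
// Adds a backslash in front of every "/" that is not already escaped.
function escapeSlashes(body: string): string {
  // First alternative: any escape sequence (backslash + one character), kept as-is.
  // Second alternative: a bare slash, which gets a backslash prepended.
  return body.replace(
    /(\\[\s\S])|\//g,
    (_match: string, escaped?: string): string =>
      escaped !== undefined ? escaped : "\\/"
  );
}
```

With this sketch, `a/b` becomes `a\/b`, `a\/b` is left unchanged, and in `a\\/b` the two backslashes are consumed as one escape sequence, so the following slash is still escaped.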
Fabulous. Keep up the good work!
HEREGEX_OMIT = /\s+(?:#.*)?/g
HEREGEX_OMIT = ///
((?:\\\\)+) # consume (and preserve) an even number of backslashes
This is not sufficient. In the case of three backslashes, this will just consume the latter two. Use ^([^\\]|\\.)* instead.
The RE consumes the string left to right, evaluating the first alternative first. In the case of three backslashes the first two are consumed by the first alternative, leaving the third backslash for another sub-rule. (There are also three backslashes in the test.)
@marchaefner: I see, nothing follows the match. I didn't realise there were two alternations in the regex. If something had followed, the regex would backtrack in order to make the match succeed. Okay, LGTM.
@jashkenas: It'd be nice if we left a little time for more than one reviewer to look at PRs, especially ones that can be as tricky as this.
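The left-to-right behaviour discussed in this thread can be checked with a small TypeScript sketch. Only the first alternative (the even run of backslashes) appears in the diff above; the other two alternatives, and the names `OMIT_SKETCH` and `stripSketch`, are assumptions made for illustration and are not the PR's actual `HEREGEX_OMIT`.

```typescript
// Hypothetical pattern, for illustration only. Alternatives, tried in order:
//   1. ((?:\\\\)+)  an even run of backslashes (captured, so it is preserved)
//   2. \\(\s)       a backslash escaping one whitespace character (whitespace captured)
//   3. \s+(?:#.*)?  unescaped whitespace plus an optional comment (dropped)
const OMIT_SKETCH: RegExp = /((?:\\\\)+)|\\(\s)|\s+(?:#.*)?/g;

function stripSketch(body: string): string {
  // Keep whatever groups 1 and 2 captured; everything else that matched is removed.
  return body.replace(
    OMIT_SKETCH,
    (_match: string, evenRun?: string, escapedWs?: string): string =>
      (evenRun || "") + (escapedWs || "")
  );
}

// Three backslashes followed by a space: the first two are taken by alternative 1,
// the third pairs with the space in alternative 2, so the space survives.
const threeBackslashes: string = stripSketch("\\\\\\ x"); // "\\\\ x"

// Two backslashes followed by a space: the space is unescaped and gets removed.
const twoBackslashes: string = stripSketch("\\\\ x"); // "\\\\x"
```

Because each match starts where the previous one ended, an odd run of backslashes is always split into pairs from the left, leaving the last backslash free to act as an escape.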
Isn't this counter-intuitive? I'd expect
to mean
Agreed.
It may seem somewhat illogical, but my reasoning was that an operator (
One could also expect the above example to mean
Besides it's always possible (and much better style) to just write a literal
Nevertheless, it needs fixing. But I think it should compile to
Another option is for an end-of-line
In what context does the other meaning come into play? Seems to me allowing trailing
It seems that if, in the context of regular expressions,
Different line endings break that, though, so you either have to go for what's in the file (which would be horrible), a 'standard' (probably
Edit: given there's precedent in heredocs, using
Yes -- in terms of CoffeeScript, I think @satyr's reasoning here is convincing. Let's allow it to escape whitespace and newlines, as in heredocs.
Plus, that's how CSR behaves. On purpose.
But it clashes with the other simple explanation: "Long lines can be split with
IMHO heredocs do it wrong and should be fixed.
Escaped whitespaces and slashes in Heregexes
It's a bit ugly but it implements #3059 and fixes #2238.
(Google and MSDN say this should work with all relevant JS engines.)
This PR also fixes the HEREGEX regex, so this is a thing now:
Notes
Linebreaks are excluded from the escapable characters to avoid any confusion. (A backslash at the end of a line usually denotes a continuation on the next line but would mean the exact opposite here.)
Whitespace escaping introduces a potential pitfall:
It would be possible to treat this as a comment but that would be somewhat inconsistent and break other constructs (e.g. /// ^[\ #].* ///).
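To make the pitfall concrete, here is a hedged TypeScript sketch. It assumes the comment rule only fires directly after unescaped whitespace; the pattern and the names `PITFALL_OMIT` and `stripForPitfall` are hypothetical and not the PR's actual implementation.

```typescript
// Hypothetical pattern, for illustration only: preserve even runs of backslashes,
// preserve escaped whitespace, drop unescaped whitespace and any comment after it.
const PITFALL_OMIT: RegExp = /((?:\\\\)+)|\\(\s)|\s+(?:#.*)?/g;

function stripForPitfall(body: string): string {
  return body.replace(
    PITFALL_OMIT,
    (_match: string, evenRun?: string, escapedWs?: string): string =>
      (evenRun || "") + (escapedWs || "")
  );
}

// Body of `/// ^[\ #].* ///`: the escaped space is kept and the "#" stays
// part of the character class instead of starting a comment.
const charClass: string = stripForPitfall(" ^[\\ #].* "); // "^[ #].*"

// The same body without the backslash: the space is unescaped, so "#].* "
// is swallowed as a comment.
const swallowed: string = stripForPitfall(" ^[ #].* "); // "^["

// The pitfall: an intended comment after an escaped space ends up in the regex.
const leaked: string = stripForPitfall("a\\ # note"); // "a #note"
```

Under that assumption, a `#` that directly follows an escaped space is always literal, which is what keeps the character-class example working but also lets an intended trailing comment leak into the pattern.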