raw/unescaped string literals? #246

Closed
benley opened this issue Oct 17, 2016 · 10 comments

Comments

@benley
Contributor

benley commented Oct 17, 2016

We sometimes need to write regexes in jsonnet documents, and the double escaping can get a bit awkward, e.g.

"^\\nx\\.y\\.z\\.com\\n$"

A raw string literal syntax would be very nice, perhaps like python's:

r"^\nx\.y\.z\.com\n$"

Desired json output for the above, of course: "^\\nx\\.y\\.z\\.com\\n$"
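For comparison only (this is C++, not Jsonnet): C++11 raw string literals address the same double-escaping problem. The sketch below spells the regex from this comment both ways; the function names are made up for illustration.

```cpp
#include <string>

// Conventional literal: every backslash meant for the regex must be doubled.
inline std::string escaped_form() { return "^\\nx\\.y\\.z\\.com\\n$"; }

// C++11 raw literal: the text between R"( and )" is taken exactly as written.
inline std::string raw_form() { return R"(^\nx\.y\.z\.com\n$)"; }
```

Both functions return the same 18 characters; only the source spelling differs, which is the convenience being requested for Jsonnet here.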

@sparkprime
Member

Python's syntax is a bit weird though: (Copy/pasted from https://docs.python.org/2/reference/lexical_analysis.html):

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string. For example, the string literal ur"\u0062\n" consists of three Unicode characters: ‘LATIN SMALL LETTER B’, ‘REVERSE SOLIDUS’, and ‘LATIN SMALL LETTER N’. Backslashes can be escaped with a preceding backslash; however, both remain in the string. As a result, \uXXXX escape sequences are only recognized when there are an odd number of backslashes.

I doubt we want that, so we probably don't want to use the r"" syntax as it'd be confusing to use the same syntax but have different meaning. I'm not sure what we should do about \u and escaping of " itself. The simplest is to not allow any escaping at all, so if you want to put a " in a raw string, you have to concatenate it in using a non-raw string. Of course you can do r'foo' as well, but this doesn't help if you need to put both a ' and a " in the same string.

@benley
Contributor Author

benley commented Oct 17, 2016

Yow, I clearly didn't think that one all the way through. As for mixed quotes within a string, |||'"||| would work, right?

@sparkprime
Member

sparkprime commented Oct 18, 2016

||| always has a terminating \n (assuming unix line endings in your Jsonnet file) which is probably not what you want.

How about we do what C# does and have @"foo" and @'foo'.

https://msdn.microsoft.com/en-us/library/aa691090(v=vs.71).aspx

In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals.

@sparkprime
Member

sparkprime commented Oct 18, 2016

The quote escape sequence means that you can say @'Simon''s Cat', for example.
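A minimal sketch of that rule, assuming the body of the literal (the text between the opening and closing quote) has already been extracted. `decode_verbatim` is a hypothetical name, not something taken from the Jsonnet source.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode the body of a verbatim literal.
// The only escape is the doubled quote character, which yields one quote.
// Backslashes and everything else are copied through unchanged.
std::string decode_verbatim(const std::string &body, char quote)
{
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        out += body[i];
        if (body[i] == quote) {
            // A lone quote would have terminated the literal, so it cannot
            // legitimately appear inside the body.
            if (i + 1 >= body.size() || body[i + 1] != quote)
                throw std::invalid_argument("unpaired quote in verbatim string body");
            ++i;  // skip the second quote of the pair
        }
    }
    return out;
}
```

With this, the body `Simon''s Cat` decodes to `Simon's Cat`, and a body containing `\n` keeps the backslash and the `n` as two separate characters.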

@benley
Contributor Author

benley commented Oct 18, 2016

That is pretty much exactly what I was looking for! Sounds great.

@sparkprime
Member

Would you like to implement it?

@benley
Contributor Author

benley commented Oct 18, 2016

I'm rather unskilled at C++ (it's been like 12 years...), but I'm happy to attempt it.

@sparkprime
Member

sparkprime commented Oct 18, 2016

It shouldn't be too hard; there are just a bunch of small things to do. There's lexer.h/cpp, where you'd want to add 2 more token types for it. That then gets converted during parsing into a LiteralString AST node, which also has an enum for the kind of quotes that you'd have to extend. The actual handling of escapes is in desugarer.cpp; see these lines:

            if (ast->tokenKind != LiteralString::BLOCK) {
                ast->value = jsonnet_string_unescape(ast->location, ast->value);
            }

After the desugarer the strings are truly raw sequences of unicode codepoints, ready for actual execution. You'd want to add something there to interpret '' and "" as ' and " respectively.
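To make the lexer side of this concrete, here is a rough sketch of scanning a verbatim literal. It is not the real lexer.cpp code: `VerbatimToken` and `lex_verbatim` are hypothetical names, and the real lexer tracks locations and token kinds that are omitted here. The point is only that a doubled quote continues the literal, a single quote ends it, and backslashes get no special treatment. Doubled quotes are left in the body for a later pass to collapse.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: the undecoded body and the index just past
// the closing quote.
struct VerbatimToken {
    std::string body;
    std::size_t end;
};

// Scan a verbatim literal that starts at src[pos] == '@'.
VerbatimToken lex_verbatim(const std::string &src, std::size_t pos)
{
    if (pos + 1 >= src.size() || src[pos] != '@' ||
        (src[pos + 1] != '"' && src[pos + 1] != '\''))
        throw std::invalid_argument("not a verbatim string literal");

    const char quote = src[pos + 1];
    std::size_t i = pos + 2;
    std::string body;
    while (i < src.size()) {
        if (src[i] == quote) {
            if (i + 1 < src.size() && src[i + 1] == quote) {
                // Doubled quote: still inside the literal. Keep both
                // characters so a later pass can collapse them.
                body += quote;
                body += quote;
                i += 2;
                continue;
            }
            return {body, i + 1};  // single quote: end of literal
        }
        body += src[i++];  // backslashes are ordinary characters here
    }
    throw std::runtime_error("unterminated verbatim string literal");
}
```

Because a backslash never escapes the delimiter in this scheme, a literal such as `@"a\"` is complete and its body ends in a backslash, which is exactly the case Python's r"" syntax cannot express.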

You'd also need to add support in formatter.cpp for printing it back out in the original quoting style. You probably also want to adjust EnforceStringStyle in formatter.cpp to ignore the raw strings (like that filter currently ignores |||). For bonus points, we could have the formatter convert strings to raw form if they only have \ escapes :)
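A hedged sketch of the two formatter pieces mentioned above, under the assumption that "only have \ escapes" means the only escape sequence used is an escaped backslash. Both function names are hypothetical and nothing here is taken from formatter.cpp.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: print a decoded string value back out as a verbatim
// literal, doubling any occurrence of the delimiter.
std::string encode_verbatim(const std::string &value, char quote)
{
    std::string out = "@";
    out += quote;
    for (char c : value) {
        out += c;
        if (c == quote) out += quote;  // the one escape verbatim strings have
    }
    out += quote;
    return out;
}

// Hypothetical check for the "bonus points" conversion: returns true if every
// backslash in the still-escaped body of an ordinary literal starts a pair of
// two backslashes, i.e. no \n, \t, \u or escaped quotes are present.
bool only_backslash_escapes(const std::string &escaped_body)
{
    for (std::size_t i = 0; i < escaped_body.size(); ++i) {
        if (escaped_body[i] != '\\') continue;
        if (i + 1 >= escaped_body.size() || escaped_body[i + 1] != '\\') return false;
        ++i;  // skip the second backslash of the pair
    }
    return true;
}
```

A formatter pass could use the check to decide whether a string is a candidate, then unescape it and re-emit it with `encode_verbatim`. Whether that rewrite should be on by default is a separate style question.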

@sparkprime
Member

Presumably we want to allow @"foo" and @'foo' in imports and field definitions, too.

benley added a commit to benley/jsonnet that referenced this issue Nov 17, 2016
benley added a commit to benley/jsonnet that referenced this issue Nov 17, 2016
@benley
Contributor Author

benley commented Nov 17, 2016

Cool, this turned out to be easier than I imagined. I think I have it all implemented now except for converting to verbatim form if a string only has \ escapes. Still need to add tests and update the docs, then I'll open a pull request.

benley added a commit to benley/jsonnet that referenced this issue Nov 25, 2016
benley added a commit to benley/jsonnet that referenced this issue Nov 25, 2016
benley added a commit to benley/jsonnet that referenced this issue Nov 25, 2016
benley added a commit to benley/jsonnet that referenced this issue Nov 25, 2016
benley added a commit to benley/jsonnet that referenced this issue Nov 26, 2016