RFC: Syntax for raw string literals

A raw string literal is a string literal that does not interpret any embedded sequence, meaning no backslash-escapes. A lot of languages (certainly most that I've used) support some syntax for raw string literals. They're useful for embedding any string that wants to have a bunch of backslashes in it (typically because the function the string is passed to wants to interpret them itself), such as regular expressions. Unfortunately, Rust does not have a raw string literal syntax.
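
As a small illustration of the problem, here is a sketch of what a regular-expression pattern looks like when written as an ordinary Rust string literal. The constant name `PATTERN` is made up for this example, and no regex library is involved; the point is only how many backslashes the source needs compared to the text the string actually holds.

```rust
// The text we want to hand to a regex engine is:  \d+\.\d+
// Without raw string literals, every backslash has to be doubled in the source.
const PATTERN: &str = "\\d+\\.\\d+";

fn main() {
    // The source literal spells 11 characters between the quotes,
    // but the resulting string holds only the 8 characters of \d+\.\d+
    println!("{} ({} chars)", PATTERN, PATTERN.chars().count());
}
```

With a raw string literal the source text and the resulting string would be identical, which is what makes patterns like this easier to read and to paste.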

There's been [a discussion](https://mail.mozilla.org/pipermail/rust-dev/2013-September/005635.html) on the mailing list for the past few days about this. I will try to put a quick summary here.

There are two questions at stake. The first is, should Rust have a raw string literal syntax? The second is, if so, what particular syntax should be used? I think the answer to the first is definitely Yes. It's useful enough, and has enough overwhelming precedent in other languages, that we should add it. The question of concrete syntax is the harder one.

The syntaxes that have been proposed so far, along with their Pros and Cons:
1. C++11 syntax, e.g. `R"delim(raw text)delim"`.
   
   Pros:
   - Reasonably straightforward
   - Can embed any character sequence
   
   Cons:
   - Syntax is slightly complicated (editorial note: I think any syntax that's flexible enough to contain any character is going to be considered slightly complicated).
2. Python syntax, e.g. `r"foo"`
   
   Pros:
   - Simple syntax
   
   Cons:
   - Can't embed any character sequence.
   - Python's implementation has really wacky handling of backslash escapes in conjunction with the quote character. Even reproducing that behavior does not allow for embedding any sequence, as `r"foo\""` evaluates to the string `foo\"` (with the literal backslash).
3. D syntax, e.g. `r"raw text"`, `` `raw text` ``, or `q"(raw text)"`/`q"delim\nraw text\ndelim"`
   
   Pros:
   - Can embed any character sequence (with the third variant)
   
   Cons:
   - The first two forms aren't flexible enough, and the third form is a bit confusing. The delimiter behaves differently depending on whether it's a "nesting" delimiter (one of `([<{`), another token, or an identifier.
4. C#/SQL/something else, using a simple raw string syntax such as `r"text"` where doubling up the quote inserts a single quote, as in `r"foo""bar"`
   
   Pros:
   - Simple syntax
   
   Cons:
   - Does not reproduce verbatim every character found in the source sequence, which makes it slightly harder/more confusing to read, and more annoying to do things like pasting a raw string into your source file (e.g. raw HTML).
5. Perl quote-like operators, e.g. `q{text}`. Unfortunately, most viable delimiters will result in an ambiguous parse.
6. Ruby quote-like operators, e.g. `%q{text}`. Unfortunately, this also is ambiguous (with the % token).
7. Lua syntax, e.g. `[=[text]=]`
   
   Pros:
   - Simple syntax
   - Can embed any character sequence
   
   Cons:
   - Syntax looks decidedly non-string-like
   - Custom delimiters are limited to sequences of =
   - Alex Crichton opined that seeing `println!([[Hello, {}!]], "world")` in an introduction to Rust would be awfully confusing (see previous point about being non-string-like).
8. Go syntax, e.g. `` `raw text` ``. This is one of the variants of D strings as well
   
   Pros:
   - Simple syntax
   
   Cons:
   - Cannot embed any character sequence (notably, cannot embed backtick)
   - It's difficult or impossible to embed backticks in a markdown code sequence, which will make it awkward to use raw strings in markdown editors. May also be confusing with the usage of `foo` in doc comments.
9. A new syntax using ASCII Control characters STX and ETX
   
   Pros:
   - I don't think there are any
   
   Cons:
   - Can't type the characters on a standard keyboard
   - Text editors probably won't render the characters correctly either
   - Can't technically embed any character sequence, because ETX cannot be embedded, but in fairness it can embed any _printable_ sequence.
10. A syntax proposed over IRC is ``delim"raw text"delim``.
    
    Pros:
    - Can embed any character
    
    Cons:
    - Unusual syntax with no precedent in other languages. Functionally identical to C++11 syntax.
    - Hard to type in Markdown editors
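
To make the trade-off between options 1 and 4 concrete, here is a minimal sketch of how a lexer might scan each form. Both function names (`scan_delimited_raw`, `scan_doubled_quote_raw`) are hypothetical and are not part of any existing compiler. The delimited scanner ignores whatever restrictions C++11 places on the delimiter itself, and the test inputs are written as ordinary escaped literals.

```rust
/// Scan a C++11-style literal such as R"delim(raw text)delim" and return
/// the raw body. The body is a slice of the source: nothing is rewritten.
fn scan_delimited_raw(src: &str) -> Option<&str> {
    let rest = src.strip_prefix("R\"")?;
    // The user-chosen delimiter is everything up to the first '('.
    let open = rest.find('(')?;
    let delim = &rest[..open];
    let body_and_tail = &rest[open + 1..];
    // The literal ends at the first occurrence of `)delim"`.
    let terminator = format!("){}\"", delim);
    let close = body_and_tail.find(terminator.as_str())?;
    Some(&body_and_tail[..close])
}

/// Scan a literal in the doubled-quote style, such as r"foo""bar".
/// A doubled quote stands for one quote, so the result has to be a new
/// String: it is no longer a verbatim copy of the source text.
fn scan_doubled_quote_raw(src: &str) -> Option<String> {
    let rest = src.strip_prefix("r\"")?;
    let mut out = String::new();
    let mut chars = rest.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            if chars.peek() == Some(&'"') {
                chars.next(); // consume the second quote of the pair
                out.push('"');
            } else {
                return Some(out); // a lone quote closes the literal
            }
        } else {
            out.push(c);
        }
    }
    None // ran out of input before the closing quote
}

fn main() {
    // Source text:  R"x(say ")" here)x"
    println!("{:?}", scan_delimited_raw("R\"x(say \")\" here)x\""));
    // Source text:  r"foo""bar"
    println!("{:?}", scan_doubled_quote_raw("r\"foo\"\"bar\""));
}
```

The first function can return a borrowed slice because the body appears in the source exactly as written, even when it contains `)"`. The second has to build a new string, which is the same property listed as a con for option 4.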

Some form of Heredoc syntax was also suggested, but heredocs are really primarily concerned with embedding multiline input, not raw input. They also have issues around dealing with indentation and the first/last newline.

During this discussion, only two Rust team members (that I'm aware of) chimed in. Alex Crichton raised issues with the Lua syntax, and threw out the suggestion of Go's syntax, though only as something to consider rather than a recommendation. Felix Klock expressed a preference for C++11 syntax, and more generally stated that he wants a syntax with user-delimited sequences. There was also at least one community member in favor of C++11 syntax.

My own preference at this point is for C++11 syntax as well, or at the very least something similar that shares all of its properties. That said, there seems to be no value in inventing a new syntax when there's precedent in C++11.


RFC: Syntax for raw string literals #9411

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RFC: Syntax for raw string literals #9411

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions