# RFC: Raw Identifiers #2151

Merged
merged 6 commits into from Feb 27, 2018

## Conversation

Member

### cuviper commented Sep 14, 2017 • edited by Centril

 Add a raw identifier format r#ident, so crates written in future language epochs/versions can still use an older API that overlaps with new keywords. (rendered)
 RFC: Raw Identifiers 
Add a raw identifier format r#ident, so crates written in future
language epochs/versions can still use an older API that overlaps with
new keywords.
 3f9a0f5 
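As a minimal sketch of what the proposed `r#ident` form looks like in use: the module `old_api` and its function are hypothetical stand-ins for an older crate whose API name collides with a keyword, and `match` is chosen only because it is a keyword in every edition.

```rust
// Hypothetical older API whose function name collides with a keyword.
mod old_api {
    // Written as a raw identifier, `match` can name a function.
    pub fn r#match(needle: &str, haystack: &str) -> bool {
        haystack.contains(needle)
    }
}

fn main() {
    // Callers spell the name the same way, with the `r#` prefix.
    let found = old_api::r#match("raw", "raw identifiers");
    println!("{}", found);
}
```

The prefix is needed at both the definition and the call site, since the bare word would be lexed as the keyword.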
Contributor

### est31 commented Sep 14, 2017

 Generally I'm in support of the RFC. However, I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general. There might also be a need for raw keywords in the other direction, e.g. so the older epoch can still use the new catch functionality somehow. I think this particular case is already served well enough by do catch { ... }, if we choose to stabilize it that way. In fact, in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.
Contributor

### SimonSapin commented Sep 14, 2017

 To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.
Member

### cuviper commented Sep 14, 2017

@est31

> However I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

I prefer generality myself. I could see having a lint for "unnecessarily raw identifier", but I see no reason to forbid this.

@SimonSapin

> To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

Correct. Some of the discussed alternatives could allow extended characters, but that's not what I'm proposing. If some people do want extended characters, then we might want to choose a syntax that would allow that, even if we don't extend it initially.
Member

### cuviper commented Sep 14, 2017

@est31

> There might also be a need for raw keywords in the other direction, [...] In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

I dismissed the br# alternative as being unnecessary, but maybe it would work for this? i.e. r#ident and br#keyword

Member

### scottmcm commented Sep 14, 2017

I like not extending the identifier alphabet here.

> the feature should only be available through a whitelist, where it's actually useful

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions. I want to be able to update my code to avoid a new-epoch keyword while still being able to compile it with the current stable that doesn't know about that keyword yet.
Contributor

### est31 commented Sep 14, 2017

> I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions.

Epochs work differently. Any future compiler version will support the epoch of your code; that's what the epochs RFC guarantees. So if you say that your codebase uses the old epoch, you can freely use the identifier, and you are compatible with all future compilers. This will even be enforced in macros (macros will get epoch hygiene)!

If you say that your codebase uses the new epoch, your crate can obviously only be compiled by compiler versions that support that epoch; this has nothing to do with the whitelist. But if you opt in to the new epoch, the whitelisted keywords will be available to you.

The only thing that a whitelist will make harder is wanting to be able to "support" multiple epochs, but this isn't really a legitimate real-world case IMO because your code will always be in exactly one epoch, as you must explicitly specify it (except for the 2015 epoch, which is the default).

There is one use case where badly deployed whitelists would be an issue: when you are migrating code from one epoch to another, and you are not doing it by invoking rustfix (despite rustfix being required to work with almost all code), it would show up as an error. This use case can very easily be fixed though, simply by extending the whitelist in the old epoch as well.
Member

### scottmcm commented Sep 14, 2017

 I agree it's rare, but I don't think it deserves to be blocking. I'd be tempted to use r#catch in a Stack Overflow answer even in the 2015 epoch, for example. And someone targeting the preview epoch on nightly would want to be able to use r#throw before the keyword was added to the whitelist, if an RFC is accepted. I do agree that an "unnecessary raw identifier" warning or clippy lint makes sense.

### egilburg commented Sep 14, 2017 • edited

 Backslashes could connote escaping identifiers, like \ident, perhaps surrounded like \ident, {ident}, etc. However, the infix RFC #1579 currently seems to be leaning towards \op syntax already. It doesn't seem like that RFC has a lot of traction. Backslashes are intuitive as "escape" characters. I feel just \ident is also more ergonomic than \ident\. Seeing a letter prefix like r# seems to imply something more like literal casting, e.g. s"foo" as a hypothetical shorthand for "foo".to_string().
Member

### cuviper commented Sep 14, 2017

@egilburg

> Seeing a letter prefix like r# seems to imply more like literal casting.

It's meant to seem more like raw strings, e.g. r#foo is equivalent to foo, just like r"foo" and r#"foo"# are equivalent to "foo". And such raw strings already exist, unlike your hypothetical, but I do take the point that this wasn't intuitive to you.
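The equivalence described above can be sketched as follows; the function name `raw_and_plain_agree` is made up for illustration.

```rust
// Hypothetical helper: checks that raw and plain spellings agree.
fn raw_and_plain_agree() -> bool {
    let foo = 41;
    // `r#foo` names the same binding as `foo`.
    let same_binding = r#foo == foo;
    // Raw strings without escapes equal the ordinary string literal.
    let same_string = r"foo" == "foo" && r#"foo"# == "foo";
    same_binding && same_string
}

fn main() {
    println!("{}", raw_and_plain_agree());
}
```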
Contributor

### petrochenkov commented Sep 14, 2017

 This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way. catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.
Contributor

### petrochenkov commented Sep 14, 2017 • edited

 There is also a minor technical issue with raw identifiers - some logic in the compiler relies on keywords being unusable as item names. For example, it would be pretty unfortunate if you could create a type named Self, self or super. Maybe there are other cases, but I can't recall them right away.
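A small sketch of the distinction raised here, assuming the behavior of the raw identifiers that eventually shipped in stable Rust (which postdates this comment): ordinary keywords can name items, while the path keywords mentioned above are rejected. The type and function names are made up for illustration.

```rust
// Ordinary keywords can name items and fields when written raw.
#[allow(non_camel_case_types)]
struct r#struct {
    r#type: u32,
}

// Hypothetical accessor reading the keyword-named field.
fn kind(s: &r#struct) -> u32 {
    s.r#type
}

fn main() {
    let s = r#struct { r#type: 7 };
    println!("{}", kind(&s));

    // Assumption based on released compilers: the following do not compile,
    // because `Self`, `self`, `super` and `crate` cannot be raw identifiers.
    // struct r#Self;
    // fn r#super() {}
}
```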
Contributor

### est31 commented Sep 14, 2017

 @petrochenkov's argument that standard library additions mean a similar amount of breakage has convinced me that this feature is not required. I think it's better to simply change the identifiers to not use keywords again, maybe forcing an API bump.

### burdges commented Sep 14, 2017 • edited

 You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;, so long as the new keywords do not appear in use statements.
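A rough sketch of the renaming idea, with one loud assumption: the comment proposes writing the bare keyword inside the use statement, whereas this sketch spells it with the RFC's `r#` form so that it compiles. The module `old_crate` stands in for an external crate.

```rust
// Stand-in for an older crate that exports an item named `dyn`.
mod old_crate {
    pub fn r#dyn() -> &'static str {
        "old_crate::dyn"
    }
}

// Rename once at the import; the rest of the file uses a plain name.
use old_crate::r#dyn as old_crate_dyn;

fn main() {
    println!("{}", old_crate_dyn());
}
```

This only covers free items that can be imported by path.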
Member

### cuviper commented Sep 14, 2017

@petrochenkov

> This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way. catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

AFAICS, catch is still explicitly mentioned as a motivator in the epochs RFC, along with the general desire for new keywords. If you think that there are reasonable rules for adding keywords without breaking epoch interoperability, then shouldn't that be spelled out in that RFC? (I confess I stopped reading that discussion a while ago though.)

@burdges

> You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;

That's ok for free items, but you can't import associated items like methods this way. Maybe that can still use a UFCS form -- in the baseball example, you'd write Player::catch(&mut player, ball). I don't think there's any such workaround for struct fields though.

If new keywords are always considered identifiers in the context of paths (foo::catch) or fields/methods (foo.catch), then perhaps use-renaming can take care of the rest. I'm not sure.
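To make the method and field cases concrete, here is a hedged sketch in the spirit of the baseball example. The `Ball` and `Player` definitions, the `caught` counter and the keyword-named field `type` are all assumptions made up for illustration; `r#catch` is written raw only to show the syntax, since it names the same method as `catch`.

```rust
pub struct Ball;

pub struct Player {
    // A field cannot be renamed with `use`, so a keyword-named field
    // has to be spelled as a raw identifier at every access.
    pub r#type: &'static str,
    pub caught: u32,
}

impl Player {
    // Methods are associated items and cannot be imported by path either.
    pub fn r#catch(&mut self, _ball: Ball) {
        self.caught += 1;
    }
}

// Hypothetical driver exercising both call forms.
fn catches_after_two_balls() -> u32 {
    let mut player = Player { r#type: "catcher", caught: 0 };
    player.r#catch(Ball);               // method-call form
    Player::r#catch(&mut player, Ball); // UFCS form
    player.caught
}

fn main() {
    let player = Player { r#type: "catcher", caught: 0 };
    println!("{} {}", player.r#type, catches_after_two_balls());
}
```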

### burdges commented Sep 14, 2017

 We're only worried about catch, dyn, and default right now, yes? And default must stay contextual anyways. We cannot add keywords forever regardless, not without driving away users. I think perhaps the best solution might be prefixing each usage by an attribute #[epoch(...)], so typically #[epoch(...)] use old_crate::dyn as old_crate_dyn; I doubt struct fields would be too problematic in practice, but methods could maybe be renamed with local inherent impls for traits: impl T { fn old_catch(...) { #[epoch(...)] ::catch(...) } }  I suppose use syntax could maybe rename struct fields and methods if push really came to shove, but the attribute can handle them directly if that ever happens.
Contributor

### withoutboats commented Sep 14, 2017

 I could see limiting this to only reserved words, but limiting to only those reserved words which were introduced in an epoch seems unnecessary & potentially confusing for users who encounter this feature and don't know when each keyword was introduced. In general, we have taken a very free hand with the syntax and use lints, social conventions and rustfmt to keep everyone on the same page, and I don't see a reason to do things differently here. This seems like a straightforward solution to a basic problem to me.
Member

### kennytm commented Sep 15, 2017 • edited

 One more alternative: C# allows bare Unicode escapes as part of identifier. (Very ugly, not recommending it, but still an alternative.) class Class1 { static void M() { cl\u0061ss.st\u0061tic(true); } } (This "feature" is probably inspired by Java, but you can't define a keyword-identifier like this in Java.)

### cuviper added some commits Sep 15, 2017

 mention br#keyword possibility 
 afcb41e 
 add a couple more references to other languages 
 b602bf0 
Member

### eddyb commented Sep 16, 2017

 Not necessarily an alternative, but Dart uses #ident (but also e.g. #+, to refer to operator+).
 note Dart's #ident 
 289c6f8 
Member

### cuviper commented Sep 17, 2017

 OK, I noted Dart, but it looks like #ident would break macros-1.0 too.
Contributor

### est31 commented Sep 17, 2017

 @cuviper Not just that; I think people also wonder whether to use them in macros 2.0 for escaping hygiene.
Member

### eddyb commented Sep 18, 2017

 @cuviper Hmm, so these are the official docs - but they don't mention # used with operators. Anyway, I know # wouldn't work for rust, but as I mentioned on the forums, r#+::r#+ could be a strange and interesting replacement for Add::add (not entirely serious suggestion).
 Use a better Dart link 
 935feba 
Contributor

### est31 commented Sep 19, 2017

 This proposal reminds me of C/C++ trigraphs which are on their way out with C++17. I'm sure like trigraphs this feature will be used more by people who want to write confusing code than for its actually intended purpose... Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate. Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?
Member

### cuviper commented Sep 19, 2017

> This proposal reminds me of C/C++ trigraphs which are on their way out with C++17.

Come on, r#ident is not anywhere near as obfuscating as trigraphs!

> Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

The RFC explicitly recommends using alternatives like krate when possible.

> Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

I don't see how rustfix is relevant. The point is to have compatibility using older APIs that may not get updated, for whatever reason. Maybe said crate just doesn't want to make a breaking change to avoid the new keyword, maybe the maintainer is on holiday, etc. This is just a means towards keeping Rust's overall compatibility goals.
Member

### scottmcm commented Sep 19, 2017

> This proposal reminds me of C/C++ trigraphs

This doesn't remind me of trigraphs in the slightest. Those are there for character sets without symbols used by the language, or for people who cannot type them. I agree we don't have that need.

Instead it reminds me of @class in C#, since different .Net languages can have different sets of keywords. Sure, people are discouraged from using certain things, but if you need to use them, you need to use them. And sometimes it leads to nice libraries, like how in Razor one can set HTML attributes with syntax like new { style="max-width: 66ex", @class = "textcontent" }

It could tell them to use klass, but it's just as easy to tell them to use @class, and using klass for class there would prevent people from being able to set a klass attribute if they so wanted.
Member

### cuviper commented Feb 7, 2018

 The internals discussion raised the idea of using such arbitrary strings for test names.

### burdges commented Feb 8, 2018

 Also if there is any route to permitting macro application in declaration position then macro syntax like ident!("name") works and maybe even use somecrate::ident!("name") as local_name; could work.
Contributor

### nikomatsakis commented Feb 8, 2018

@joshtriplett

> As a technical issue, r#foo would require clear tokenization rules for where foo stops. r#foo# wouldn't.

Yes. Presumably it's the standard Rust identifier rules.

> Do we care about identifiers with weird symbols in them, or spaces in them? Or only symbols that clash with keywords?

That's the question, isn't it? As I said, I lean personally towards r#foo# ("raw identifiers", analogous with "raw strings"), which makes them both more flexible and more annoying to use. =) However, I also suspect that r#foo will be "good enough". It certainly fills the critical use case (bridging epochs).

Contributor

### SimonSapin commented Feb 10, 2018

 The RFC as proposed does not change which characters can be used in an identifier. It only allows having identifiers that would otherwise be keywords. I’m not for or against this proposal. I would be opposed to raw identifiers allowing arbitrary characters. CSS does this, and it’s just nonsense. For example you can have a CSS custom property whose name is literally the ASCII space. Serde already has #[serde(rename = "foo")] for names that are not valid Rust identifiers. For FFI we have #[link_name = "foo"].

### rfcbot commented Feb 14, 2018

 🔔 This is now entering its final comment period, as per the review above. 🔔
Member

### cuviper commented Feb 15, 2018

 I didn't get around to adding an alternative about just renaming/aliasing. Is that still wanted? (Personally, I feel that approach is too limited to really address the issue.)

### burdges commented Feb 15, 2018

 Is the macro like syntax ident!("name") unworkable for parser or hygiene reasons?
Member

### cuviper commented Feb 15, 2018

 @burdges AFAIK macros don't work in ident positions, which is why concat_idents! can't actually do much.
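To illustrate the limitation: since a macro call cannot stand in for just a name, a macro has to expand to the whole item instead. In this sketch the macro `define_greeter` is hypothetical; it takes the name as an `ident` fragment, and a raw identifier can be passed there.

```rust
// Hypothetical macro: it must produce the entire function item, because
// something like `fn ident!("name")() {}` is not accepted by the parser.
macro_rules! define_greeter {
    ($name:ident) => {
        fn $name() -> &'static str {
            "hello"
        }
    };
}

// A raw identifier is accepted wherever an `ident` fragment is expected.
define_greeter!(r#match);

fn main() {
    println!("{}", r#match());
}
```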
Contributor

### bstrie commented Feb 21, 2018

 Re: syntax, just go with \foo. The symmetry with escaping is obvious and avoids the unjustified construction of r#foo (which, additionally, nobody here seems to be fond of even the slightest bit). The only objection given in the text is that a different RFC--one that will almost certainly never be accepted--had considered possibly using that syntax. This RFC is profoundly more important than that one (though FFI ought to be the primary motivation, rather than epoch breakage). Re: extending this feature to putting arbitrary Unicode in identifiers: don't. That's a subject for its own RFC and its own bikeshed. Be maximally conservative here.
Contributor

### est31 commented Feb 21, 2018

 \foo is not ugly enough. This needs to be maximally ugly and minimally useful in order to be a strong enough deterrent. Just go with r#foo as well as a lint that checks for usage of the feature outside of a whitelist of recently introduced or to be introduced keywords.
Contributor

### bstrie commented Feb 21, 2018

 @est31 Though there do exist features that ought to be made syntactically ugly in order to discourage their use, this isn't one of them. There is nothing dangerous whatsoever about this feature, and nothing useful about it that risks overuse (or any use at all) except in unfortunate circumstances that will require users to use it. Any argument that people will deliberately use this to obfuscate their code is even more damning of r#foo, because that is more obfuscatory than \foo. Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility. And let's not give people any more reason than necessary to look at a random unfamiliar piece of Rust code and wonder, "what the hell could this possibly mean?".
Contributor

### Ixrec commented Feb 21, 2018

 Maybe this is just me, but if I had no idea Rust had a raw identifiers feature, and I saw r#foo or \foo in the wild, I'd probably be able to guess what r# was doing while \ I wouldn't even dare to guess. The "symmetry with escaping" is not obvious for me, because in every other language I know the very notion of "escaping" is unique to string/regex literals. When I see \foo my first thought is of Haskell's lambda syntax (and I've hardly ever used Haskell). On the other hand, any "r and a sigil" syntax immediately reminds me of things like C++'s raw string literals, which is "raw" in the same sense that raw identifiers are raw. I don't object to the notion that r# is "ugly", but I do object to the notion that it's (in any non-purely-subjective sense) more obfuscating or more of a penalty than \ is. For me it's quite the opposite.
Contributor

### Centril commented Feb 24, 2018

 Linking #1579 with respect to using a \dot b for infix method syntax vs. using it for raw identifiers.

### rfcbot commented Feb 24, 2018

 The final comment period is now complete.

### twmb commented Feb 25, 2018

 If there isn't, can there be a page for why words are reserved, and why these reserved words cannot be contextual?

### Centril referenced this pull request Feb 27, 2018

Closed

#### Tracking issue for RFC 2151, Raw Identifiers #48589

 RFC 2151 
 d049d6c 

Contributor

### Centril commented Feb 27, 2018

 Huzzah! The RFC is merged! Tracking issue: rust-lang/rust#48589

### ssokolow commented Feb 28, 2018 • edited

 I don't know how I missed both the initial announcement of this and the FCP call, but I just have to share a perspective on the intuitiveness of \ident vs. r#ident that I didn't see from anyone else. As someone whose experience is more or less exclusively in imperative languages, whenever I see \ being used outside a literal DOS/Windows path, I have a strong expectation that it is a fixed-width token, consisting of the slash and the character which follows... so when I see \ident, I can't help but expect it to produce error: unknown character escape: i. By contrast, r#ident gives me the impression that # is being used as some kind of namespacing operator similar to ::... which is essentially correct. You're conceptually unifying reserved words and identifiers and then restricting to the identifier side to override the default precedence... the only difference is that, instead of precedence being within the resolution of "an identifier node in the AST" level, you're operating more at the level of translating tokens into AST nodes.
