RFC: Raw Identifiers #2151

Merged
merged 6 commits into from Feb 27, 2018

@cuviper
Member

cuviper commented Sep 14, 2017

Add a raw identifier format r#ident, so crates written in future
language epochs/versions can still use an older API that overlaps with
new keywords.

(rendered)
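
For illustration, here is a minimal sketch of what the proposed syntax would allow, assuming a compiler that implements `r#ident` as described. The module `old_api`, the struct `Request`, and the function names are hypothetical stand-ins for an older crate whose names collide with keywords.

```rust
// Hypothetical older crate; the names are chosen only to collide with keywords.
mod old_api {
    pub struct Request {
        // `type` is a keyword, so the field is spelled in raw form.
        pub r#type: &'static str,
    }

    // `try` is reserved in newer editions; the raw form is accepted either way.
    pub fn r#try(req: &Request) -> bool {
        req.r#type == "ping"
    }
}

// Caller side: every use of the colliding names goes through `r#`.
fn call_old_api() -> bool {
    let req = old_api::Request { r#type: "ping" };
    old_api::r#try(&req)
}
```

The raw prefix is needed at both the definition and the use site here only because these names are keywords for the compiler reading this snippet. A crate written before a keyword was reserved would define the name plainly, and only newer callers would need `r#`.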

@est31

est31 Sep 14, 2017

Contributor

Generally I'm in support of the RFC. However I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

There might also be a need for raw keywords in the other direction, e.g. so the
older epoch can still use the new catch functionality somehow. I think this
particular case is already served well enough by do catch { ... }, if we
choose to stabilize it that way.

In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

@SimonSapin

SimonSapin Sep 14, 2017

Contributor

To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

@cuviper

cuviper Sep 14, 2017

Member

@est31

> However I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

I prefer generality myself. I could see having a lint for "unnecessarily raw identifier", but I see no reason to forbid this.

@SimonSapin

> To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

Correct. Some of the discussed alternatives could allow extended characters, but that's not what I'm proposing. If some people do want extended characters, then we might want to choose a syntax that would allow that, even if we don't extend it initially.

@cuviper

cuviper Sep 14, 2017

Member

@est31

> There might also be a need for raw keywords in the other direction, [...]

> In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

I dismissed the br# alternative as being unnecessary, but maybe it would work for this?
i.e. r#ident and br#keyword

@scottmcm scottmcm added the T-lang label Sep 14, 2017

@scottmcm

scottmcm Sep 14, 2017

Member

I like not extending the identifier alphabet here.

> the feature should only be available through a whitelist, where it's actually useful

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions. I want to be able to update my code to avoid a new-epoch keyword while still being able to compile it with the current stable that doesn't know about that keyword yet.

@est31

est31 Sep 14, 2017

Contributor

> I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions.

Epochs work differently. Any future compiler version will support the epoch of your code; that's what the epochs RFC guarantees. So if you say that your codebase uses the old epoch, you can freely use the identifier, and you are compatible with all future compilers. This will even be enforced in macros (macros will get epoch hygiene)! If you say that your codebase uses the new epoch, your crate can obviously only be compiled by compiler versions that support that epoch; this has nothing to do with the whitelist. But if you opt in to the new epoch, the whitelisted keywords will be available to you.

The only thing that a whitelist will make harder is wanting to be able to "support" multiple epochs, but this isn't really a legitimate real-world case IMO because your code will always be in exactly one epoch, as you must explicitly specify it (except for the 2015 epoch, which is the default).

There is one use case where badly deployed whitelists would be an issue: when you are migrating code from one epoch to another, and you are not doing it by invoking rustfix (despite rustfix being required to work with almost all code), it would show up as an error. This use case can very easily be fixed though, simply by extending the whitelist in the old epoch as well.

@scottmcm

scottmcm Sep 14, 2017

Member

I agree it's rare, but I don't think it deserves to be blocking. I'd be tempted to use r#catch in a Stack Overflow answer even in the 2015 epoch, for example. And someone targeting the preview epoch on nightly would want to be able to use r#throw before the keyword was added to the whitelist, if an RFC is accepted.

I do agree that an "unnecessary raw identifier" warning or clippy lint makes sense.

@egilburg

egilburg Sep 14, 2017

Backslashes could connote escaping identifiers, like \ident, perhaps surrounded like \ident\, {ident}, etc. However, the infix RFC #1579 currently seems to be leaning towards \op syntax already.

It doesn't seem like that RFC has a lot of traction. Backslashes are intuitive as "escape" characters. I feel just \ident is also more ergonomic than \ident\.

Seeing a letter prefix like r# seems to imply more like literal casting. E.g. s"foo" as a hypothetical shorthand for "foo".to_string()

@cuviper

cuviper Sep 14, 2017

Member

@egilburg

> Seeing a letter prefix like r# seems to imply more like literal casting.

It's meant to seem more like raw strings, e.g. r#foo is equivalent to foo, just like r"foo" and r#"foo"# are equivalent to "foo". And such raw strings already exist, unlike your hypothetical, but I do take the point that this wasn't intuitive to you.
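
The analogy can be checked with a short sketch, assuming a compiler that implements the proposed `r#ident` form. The function name `raw_forms_agree` is made up for the example.

```rust
fn raw_forms_agree() -> bool {
    // Raw strings: all three spellings denote the same string.
    let strings_agree = r"foo" == "foo" && r#"foo"# == "foo";

    // Raw identifiers: `r#foo` declares the same binding that `foo` reads.
    let r#foo = 41;
    let idents_agree = foo + 1 == 42;

    strings_agree && idents_agree
}
```

In both cases the `r` prefix changes how the token is written, not what it refers to.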

@petrochenkov

petrochenkov Sep 14, 2017

Contributor

This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

@petrochenkov

petrochenkov Sep 14, 2017

Contributor

There is also a minor technical issue with raw identifiers - some logic in the compiler relies on keywords being unusable as item names.
For example, it would be pretty unfortunate if you could create a type named Self, self or super. Maybe there are other cases, but I can't recall them right away.

@est31

est31 Sep 14, 2017

Contributor

@petrochenkov's argument that standard library additions mean a similar amount of breakage has convinced me that this feature is not required. I think it's better to simply change the identifiers so they no longer use keywords, maybe forcing an API bump.

@burdges

burdges Sep 14, 2017

You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;, so long as the new keywords do not appear in use statements.
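
A rough sketch of the renaming idea follows. It assumes the raw form proposed in this RFC is used at the one point where the old name has to be spelled, which differs from the suggestion of treating the word as a plain identifier inside use statements. `old_crate` here is a local module standing in for a hypothetical older crate.

```rust
// Stand-in for an older crate that exports an item named `dyn` (hypothetical).
mod old_crate {
    pub fn r#dyn() -> &'static str {
        "from old_crate"
    }
}

// The import is the only place where the old name is spelled out.
use self::old_crate::r#dyn as old_crate_dyn;

fn call_renamed() -> &'static str {
    // Every later use goes through the keyword-free alias.
    old_crate_dyn()
}
```

This covers free items such as functions and types. It does not by itself rename methods or struct fields.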

@cuviper

cuviper Sep 14, 2017

Member

@petrochenkov

> This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

> catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

AFAICS, catch is still explicitly mentioned as a motivator in the epochs RFC, along with the general desire for new keywords. If you think that there are reasonable rules for adding keywords without breaking epoch interoperability, then shouldn't that be spelled out in that RFC? (I confess I stopped reading that discussion a while ago though.)

@burdges

> You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;

That's ok for free items, but you can't import associated items like methods this way. Maybe that can still use a UFCS form -- in the baseball example, you'd write Player::catch(&mut player, ball). I don't think there's any such workaround for struct fields though.

If new keywords are always considered identifiers in the context of paths (foo::catch) or fields/methods (foo.catch), then perhaps use-renaming can take care of the rest. I'm not sure.
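
To make the UFCS workaround concrete, here is a small sketch of the baseball example. The `Ball` type and the `caught` field are invented for illustration, and the snippet assumes a compiler for which `catch` is not a reserved word.

```rust
struct Ball;

struct Player {
    caught: u32,
}

impl Player {
    fn catch(&mut self, _ball: Ball) {
        self.caught += 1;
    }
}

fn catch_twice() -> u32 {
    let mut player = Player { caught: 0 };
    player.catch(Ball); // method-call form
    Player::catch(&mut player, Ball); // path (UFCS) form of the same call
    player.caught // field access has no path-based spelling
}
```

The second call shows the name appearing only in path position, which is where a parser could plausibly keep treating it as an identifier. The last line shows why fields are the harder case: there is no alternative syntax to fall back on.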

@burdges

burdges Sep 14, 2017

We're only worried about catch, dyn, and default right now, yes? And default must stay contextual anyways. We cannot add keywords forever regardless, not without driving away users.

I think perhaps the best solution might be prefixing each usage by an attribute #[epoch(...)], so typically #[epoch(...)] use old_crate::dyn as old_crate_dyn;

I doubt struct fields would be too problematic in practice, but methods could maybe be renamed with local inherent impls for traits:

impl<T: Player> T {
    fn old_catch(...) {  #[epoch(...)] <T as Player>::catch(...)  }
}

I suppose use syntax could maybe rename struct fields and methods if push really came to shove, but the attribute can handle them directly if that ever happens.

@withoutboats

withoutboats Sep 14, 2017

Contributor

I could see limiting this to only reserved words, but limiting to only those reserved words which were introduced in an epoch seems unnecessary & potentially confusing for users who encounter this feature and don't know when each keyword was introduced. In general, we have taken a very free hand with the syntax and use lints, social conventions and rustfmt to keep everyone on the same page, and I don't see a reason to do things differently here.

This seems like a straightforward solution to a basic problem to me.

@kennytm

kennytm Sep 15, 2017

Member

One more alternative: C# allows bare Unicode escapes as part of an identifier. (Very ugly, not recommending it, but still an alternative.)

class Class1
{
    static void M() {
        cl\u0061ss.st\u0061tic(true);
    }
}

(This "feature" is probably inspired by Java, but you can't define a keyword-identifier like this in Java.)

@eddyb

eddyb Sep 16, 2017

Member

Not necessarily an alternative, but Dart uses #ident (but also e.g. #+, to refer to operator+).

@cuviper

cuviper Sep 17, 2017

Member

OK, I noted Dart, but it looks like #ident would break macros-1.0 too.

@est31

est31 Sep 17, 2017

Contributor

@cuviper Not just that; I think people are also wondering whether to use them in macros 2.0 for escaping hygiene.

@eddyb

eddyb Sep 18, 2017

Member

@cuviper Hmm, so these are the official docs - but they don't mention # used with operators.
Anyway, I know # wouldn't work for rust, but as I mentioned on the forums, r#+::r#+ could be a strange and interesting replacement for Add::add (not entirely serious suggestion).

@est31

est31 Sep 19, 2017

Contributor

This proposal reminds me of C/C++ trigraphs which are on their way out with C++17. I'm sure like trigraphs this feature will be used more by people who want to write confusing code than for its actually intended purpose... Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

@cuviper

cuviper Sep 19, 2017

Member

> This proposal reminds me of C/C++ trigraphs which are on their way out with C++17.

Come on, r#ident is not anywhere near as obfuscating as trigraphs!

> Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

The RFC explicitly recommends using alternatives like krate when possible.

> Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

I don't see how rustfix is relevant. The point is to have compatibility using older APIs that may not get updated, for whatever reason. Maybe said crate just doesn't want to make a breaking change to avoid the new keyword, maybe the maintainer is on holiday, etc.

This is just a means towards keeping Rust's overall compatibility goals.

@scottmcm

scottmcm Sep 19, 2017

Member

> This proposal reminds me of C/C++ trigraphs

This doesn't remind me of trigraphs in the slightest. Those are there for character sets without symbols used by the language, or for people who cannot type them. I agree we don't have that need.

Instead it reminds me of @class in C#, since different .Net languages can have different sets of keywords. Sure, people are discouraged from using certain things, but if you need to use them, you need to use them. And sometimes it leads to nice libraries, like how in Razor one can set HTML attributes with syntax like

new { style="max-width: 66ex", @class = "textcontent" }

It could tell them to use klass, but it's just as easy to tell them to use @class, and using klass for class there would prevent people from being able to set a klass attribute if they so wanted.

@cuviper

cuviper Feb 7, 2018

Member

The internals discussion raised the idea of using such arbitrary strings for test names.

@burdges

burdges Feb 8, 2018

Also, if there is any route to permitting macro application in declaration position, then macro syntax like ident!("name") works, and maybe even use somecrate::ident!("name") as local_name; could work.

@nikomatsakis

nikomatsakis Feb 8, 2018

Contributor

@joshtriplett

> As a technical issue, r#foo would require clear tokenization rules for where foo stops. r#foo# wouldn't.

Yes. Presumably it's the standard Rust identifier rules.

> Do we care about identifiers with weird symbols in them, or spaces in them? Or only symbols that clash with keywords?

That's the question, isn't it? As I said, I lean personally towards r#foo# ("raw identifiers", analogous with "raw strings"), which makes them both more flexible and more annoying to use. =)

However, I also suspect that r#foo will be "good enough". It certainly fills the critical use case (bridging epochs).
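
As a small sketch of the "standard identifier rules" reading of r#foo, assuming a compiler that implements the prefix-only form, the raw identifier below ends exactly where an ordinary identifier would. The function name `token_boundaries` is invented for the example.

```rust
fn token_boundaries() -> i32 {
    let r#loop = 20;
    // No whitespace is needed: each `r#loop` ends at `+`, just as `x+x+2` would lex.
    r#loop+r#loop+2
}
```

Under this reading no closing delimiter is required, which is also why the form cannot carry spaces or other characters outside the usual identifier alphabet.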

@aturon aturon removed the I-nominated label Feb 8, 2018

@SimonSapin

SimonSapin Feb 10, 2018

Contributor

The RFC as proposed does not change which characters can be used in an identifier. It only allows having identifiers that would otherwise be keywords. I’m not for or against this proposal.

I would be opposed to raw identifiers allowing arbitrary characters. CSS does this, and it’s just nonsense. For example you can have a CSS custom property whose name is literally the ASCII space.

Serde already has #[serde(rename = "foo")] for names that are not valid Rust identifiers. For FFI we have #[link_name = "foo"].

@rfcbot

rfcbot Feb 14, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@cuviper

cuviper Feb 15, 2018

Member

I didn't get around to adding an alternative about just renaming/aliasing. Is that still wanted?
(Personally, I feel that approach is too limited to really address the issue.)

@burdges

burdges Feb 15, 2018

Is the macro like syntax ident!("name") unworkable for parser or hygiene reasons?

@cuviper

cuviper Feb 15, 2018

Member

@burdges AFAIK macros don't work in ident positions, which is why concat_idents! can't actually do much.
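
A small sketch of that limitation, as far as I understand it: a macro can receive an identifier and splice it into an item, but a macro invocation cannot itself stand where an identifier is expected. The `define_getter!` macro below is a hypothetical example, not part of any library.

```rust
// A macro can take an identifier as input and use it as an item name...
macro_rules! define_getter {
    ($name:ident, $value:expr) => {
        fn $name() -> i32 {
            $value
        }
    };
}

define_getter!(answer, 42);
// ...and a raw identifier is accepted wherever an `ident` fragment is expected.
define_getter!(r#loop, 7);

// The reverse does not parse: an invocation cannot appear in identifier position.
// fn some_macro!("name")() -> i32 { 0 }   // syntax error

fn main() {
    println!("{} {}", answer(), r#loop());
}
```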

@bstrie

bstrie Feb 21, 2018

Contributor

Re: syntax, just go with \foo. The symmetry with escaping is obvious and avoids the unjustified construction of r#foo (which, additionally, nobody here seems to be fond of even the slightest bit). The only objection given in the text is that a different RFC--one that will almost certainly never be accepted--had considered possibly using that syntax. This RFC is profoundly more important than that one (though FFI ought to be the primary motivation, rather than epoch breakage).

Re: extending this feature to putting arbitrary Unicode in identifiers: don't. That's a subject for its own RFC and its own bikeshed. Be maximally conservative here.

@est31

est31 Feb 21, 2018

Contributor

\foo is not ugly enough. This needs to be maximally ugly and minimally useful in order to be a strong enough deterrent. Just go with r#foo as well as a lint that checks for usage of the feature outside of a whitelist of recently introduced or to be introduced keywords.

@bstrie

bstrie Feb 21, 2018

Contributor

@est31 Though there do exist features that ought to be made syntactically ugly in order to discourage their use, this isn't one of them. There is nothing dangerous whatsoever about this feature, and nothing useful about it that risks overuse (or any use at all) except in unfortunate circumstances that will require users to use it. Any argument that people will deliberately use this to obfuscate their code is even more damning of r#foo, because that is more obfuscatory than \foo. Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility. And let's not give people any more reason than necessary to look at a random unfamiliar piece of Rust code and wonder, "what the hell could this possibly mean?".

@Ixrec

Ixrec Feb 21, 2018

Contributor

Maybe this is just me, but if I had no idea Rust had a raw identifiers feature, and I saw r#foo or \foo in the wild, I'd probably be able to guess what r# was doing while \ I wouldn't even dare to guess. The "symmetry with escaping" is not obvious for me, because in every other language I know the very notion of "escaping" is unique to string/regex literals. When I see \foo my first thought is of Haskell's lambda syntax (and I've hardly ever used Haskell). On the other hand, any "r and a sigil" syntax immediately reminds me of things like C++'s raw string literals, which is "raw" in the same sense that raw identifiers are raw.

I don't object to the notion that r# is "ugly", but I do object to the notion that it's (in any non-purely-subjective sense) more obfuscating or more of a penalty than \ is. For me it's quite the opposite.

@petrochenkov

petrochenkov Feb 21, 2018

Contributor

If this feature has to happen, I'd rather use r#ident# that is fully symmetrical with raw strings and potentially extensible (to $ in identifiers, for example, or something else).
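
For comparison, a short sketch of the asymmetry under discussion: Rust raw strings are delimited on both sides, while the raw identifier form in the RFC is prefix-only. The function and variable names are made up for illustration.

```rust
fn raw_text() -> &'static str {
    // Raw string: opened by r#" and closed by "#, so quotes and backslashes
    // inside it need no escaping.
    r#"say "hi" \n"#
}

fn byte_len() -> usize {
    // Raw identifier as proposed: a prefix only, with no closing `#`.
    let r#fn = raw_text().len();
    r#fn
}

fn main() {
    println!("{} ({} bytes)", raw_text(), byte_len());
}
```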

@est31

est31 Feb 21, 2018

Contributor

> Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility.

Have you even read the epochs RFC? Code under the old epoch will always compile. If you switch epochs, this can already be seen by some as a semver-breaking change as most likely you are switching the minimum supported rustc version (it disrupts anyone stuck on an old compiler, so it is a breaking change!). So do it properly and just replace all the idents you have with proper new names for them.

@aturon

aturon Feb 21, 2018

Member

@est31

> Have you even read the epochs RFC?

Please tone it down.

While @bstrie is incorrect about the detail you mention, the rest of his post I think presents a fine argument for the proposed syntax (or something close to it).

@burdges

burdges Feb 21, 2018

I think \foo is problematic because \ has no similar meaning in other languages. Instead \ gets used for lambda expressions, set difference, left actions, shorter escapes, etc. Rust might want it for infix symbols or whatever. There is a vastly weaker but similar argument against using #, but afaik no such argument against using $ since Rust is not an ML style language.

I still personally like both use whatever::"ident" as my_ident; and ident!("name"), but those only work for importing strange symbols, not exporting them, which maybe people wish to do. We could have both ident!("ident") for rarely used imports and def_ident!(my_ident,"ident"); for both frequently used imports and exports though.

Aside from r$ident$ or r#ident#, one could even imagine $my_ident where previously const my_ident : &'static str = "ident";

Also, we'll need to use these identifiers anyways, right? Assuming so, a use wherever::"ident" as my_ident; makes good sense. Is it too strange to use self::"ident" as my_ident; for exporting?
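
The string-path forms above are proposals and do not exist in Rust. As a point of comparison, here is a sketch of how a renaming import could look with the `r#` form from the RFC: the raw name appears once at the `use`, and a plain alias is used afterwards. The `legacy` module and the `find` alias are hypothetical stand-ins for an older crate's API.

```rust
// Hypothetical older API, modelled as a module, that exports a name which
// is a keyword for the importing code.
mod legacy {
    pub fn r#match(needle: &str, haystack: &str) -> bool {
        haystack.contains(needle)
    }
}

// Mention the raw name once at the import, then use a plain alias.
use legacy::r#match as find;

fn main() {
    println!("{}", find("raw", "raw identifiers"));
}
```

The same raw name also works on the exporting side (the `pub fn` above), which is the case the import-only proposals do not cover.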

@eddyb

eddyb Feb 23, 2018

Member

IMO we should be using \ for escaping tokens in macros, at the very least.
E.g. \$$name:ident matching $foo and $($x:ident)\++ using + as a separator.
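
For context, a sketch of the situation without such an escape, as far as I know: in `$($x:ident)++` the first `+` is read as the repetition operator, so `+` cannot serve as a separator. A common workaround is to move the `+` inside the repetition. The `sum!` macro is a hypothetical example, not the escaping syntax proposed above.

```rust
// Workaround: match the first identifier, then zero or more `+ ident` groups.
macro_rules! sum {
    ($first:ident $(+ $rest:ident)*) => {
        $first $(+ $rest)*
    };
}

fn total(a: i32, b: i32, c: i32) -> i32 {
    // Expands to `a + b + c`.
    sum!(a + b + c)
}

fn main() {
    println!("{}", total(1, 2, 3));
}
```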

@Centril

Centril Feb 24, 2018

Contributor

Linking #1579 with respect to using a \dot b for infix method syntax vs. using it for raw identifiers.

@rfcbot

rfcbot Feb 24, 2018

The final comment period is now complete.

@twmb

twmb Feb 25, 2018

If there isn't one already, can there be a page explaining why words are reserved, and why these reserved words cannot be contextual?
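
For readers unfamiliar with the term, a brief sketch of what a contextual keyword looks like in Rust today: `union` acts as a keyword only where a union declaration can start, and is an ordinary identifier elsewhere. The type and function names are made up for illustration, and this does not answer why a particular reserved word could not be handled the same way.

```rust
// Here `union` introduces a declaration, so it behaves as a keyword.
union IntOrFloat {
    i: u32,
    f: f32,
}

// Elsewhere `union` is a plain identifier; no `r#` prefix is needed.
fn union(a: u32, b: u32) -> u32 {
    a | b
}

fn bits_of(x: f32) -> u32 {
    let v = IntOrFloat { f: x };
    // Reading a union field is unsafe: the compiler does not track which field was written.
    unsafe { v.i }
}

fn main() {
    println!("{:#x} {:#b}", bits_of(1.0), union(0b01, 0b10));
}
```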

@Centril Centril referenced this pull request Feb 27, 2018

Closed

Tracking issue for RFC 2151, Raw Identifiers #48589

3 of 7 tasks complete

@Centril Centril merged commit 0574612 into rust-lang:master Feb 27, 2018

@Centril

Centril Feb 27, 2018

Contributor

Huzzah! The RFC is merged!

Tracking issue: rust-lang/rust#48589

@ssokolow

ssokolow Feb 28, 2018

I don't know how I missed both the initial announcement of this and the FCP call, but I just have to share a perspective on the intuitiveness of \ident vs. r#ident that I didn't see from anyone else.

As someone whose experience is more or less exclusively in imperative languages, whenever I see \ being used outside a literal DOS/Windows path, I have a strong expectation that it is a fixed-width token, consisting of the backslash and the character which follows... so when I see \ident, I can't help but expect it to produce error: unknown character escape: i.

By contrast, r#ident gives me the impression that # is being used as some kind of namespacing operator similar to ::... which is essentially correct. You're conceptually unifying reserved words and identifiers and then restricting to the identifier side to override the default precedence... the only difference is that, instead of precedence being resolved at the level of "an identifier node in the AST", you're operating more at the level of translating tokens into AST nodes.
