Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 2151, Raw Identifiers #48589

Closed
Centril opened this Issue Feb 27, 2018 · 63 comments

Comments

@Centril
Copy link
Contributor

Centril commented Feb 27, 2018

This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).

Steps:

Unresolved questions:

  • Do macros need any special care with such identifier tokens?
    Probably not.
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
    Depends on the edition?
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Feb 28, 2018

@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 6, 2018

I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

The relevant code is probably this, you'll want to make the ('r', Some('#'), _) case allow for the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.

We could add an is_raw boolean to token::Ident as well.

You'll also need to feature gate this but we can do that later.

lmk if you have questions

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Mar 7, 2018

We could add an is_raw boolean to token::Ident as well.

This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.

Ideally we should limit the effect of r# to lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different NameSymbols.

EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Mar 7, 2018

Some clarifications are needed:

  • How r# affects context-dependent identifiers (aka weak keywords) like default.
    Do they lose their special context-dependent properties and turn into "normal identifiers"?
// `union` is a normal ident, this is not an error
union U {
    ....
}

// `union` is a raw ident, is this an error?
r#union U {
    ...
}
  • How does r# affect keywords that are "semantically special" and not "syntactically special"?
    I'm talking about path segment keywords specifically.
    For example, Self in Self::A::B is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named Self (or self/super/etc) in a special way.
#[derive(Default)]
struct S;

impl S {
    fn f() -> S {
        r#Self::default() // Is this an error?
    }  
}
@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

oh, I didn't realize we reuse Ident from the lexer.

I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.

I think it's ok for r#Self to work; but don't mind either way

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Mar 7, 2018

Also, lifetime identifiers weren't covered by the RFC - r#'ident or 'r#ident.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.

Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Mar 7, 2018

Yeah, it's mostly a consistency question rather than a practical issue.

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 7, 2018

From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?

I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in a epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.

My main questions, right now, would be:

  • Does this actually sound like the best approach?
  • Reading the code, it looks like most feature gating is done on the AST after parsing, and not during parsing. Since nothing in the AST would reflect the raw identifiers being there at all with this approach, the feature check would have to be in parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
  • A minor code style point: right now, token::Ident is declared as Ident(ast::Ident). To add an is_raw field would mean having a mystery unnamed bool field in a tuple struct, or making it use named fields, in which case, matching on token::Idents becomes nastier. One idea that did come to mind is adding an RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?

I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

Yes, we should not be affecting Symbol.

Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing

a mystery unnamed bool field in a tuple struct

I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

I think it's best if we don't allow this to work for lifetime parameters, actually. We restrict the number of places where raw identifiers are allowed at a first pass, and if we need this for a later epoch, we add it then

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 7, 2018

But yeah, your plan sounds good otherwise

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 7, 2018

.... right, that makes sense. The lexer obviously doesn't know what feature flags there are, because it's busy lexing them. :D

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Mar 8, 2018

@petrochenkov

In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:

How r# affects context-dependent identifiers (aka weak keywords) like default.

They are not special anymore. r#union Foo would be an error.

How does r# affect keywords that are "semantically special" and not "syntactically special"?

Is this just about self and Self? Does this apply to super too? We currently talk about self and Self as if they were keywords. I am therefore inclined to think that r#self would be "just another name" and not have the special properties that self ordinarily has.

So e.g. use r#self::foo would be an absolute path.

cc @rust-lang/lang -- do others agree?

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Mar 8, 2018

@nikomatsakis That's exactly what I'd have expected.

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

(I wish it didn't, but it might be too late?)

@cuviper

This comment has been minimized.

Copy link
Member

cuviper commented Mar 8, 2018

I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:

impl Foo {
    fn foo(self, r#self: Bar) {
        println!("I have distinct {} and {} at the same time?", self, r#self);
    }
}

Maybe that's in fact OK, but if so it's at least a corner case to test...

(In general, foo and r#foo are supposed to refer to the same thing.)

@cuviper

This comment has been minimized.

Copy link
Member

cuviper commented Mar 8, 2018

@scottmcm I'd expect your foo!(union) to work, but not foo!(r#union). That is, macros will have to preserve the metadata whether a particular :ident is raw or not.

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 9, 2018

I've found some other unexpected places where raw identifiers might show up while implementing this. Should these be allowed?

  • #[r#struct]
  • macro_rules! foo { ($r#struct:expr) => { r#expr } }

There's some weirdness with the built-in procedural macros taking raw identifiers too:

  • concat_args!(r#abc, r#def)
  • format!("{struct}", r#struct = 0);

Also, libproc_macro seems to need to be able to deal with raw identifiers somehow too. I've been creating artificial Symbols with content like r#struct, but this doesn't seem like a very good solution since these Symbols wouldn't be used anywhere else. Any advice here?

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 9, 2018

Procedural macros see them after lexing, so that will Just Work.

I don't think libproc_macro needs to know anything? Again, this is all after lexing.

I personally think it's fine to allow all those. Simplifies things.

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 9, 2018

libproc_macro has an unstable enum that represent a token: https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html

This needs to know about the new field in token::Ident to properly serialize/deserialize it. I don't know how much unstable stuff depends on it.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 9, 2018

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 9, 2018

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

Actually, checking on the playground, it looks like this isn't even unique to contextual keywords: https://play.rust-lang.org/?gist=98bba154f78cd9aba5838bf82ac2fbb4&version=stable

@Lymia

This comment has been minimized.

Copy link
Contributor

Lymia commented Mar 10, 2018

Hrm, another strange case that came up while writing tests:

Given this macro definition, which branch should test_macro!(r#a) match:

macro_rules! test_macro {
    (a) => { ... };
    (r#a) => { ... };
}
@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 10, 2018

@nikomatsakis ^ ?

Could even make this change based on the epoch. idk.

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Mar 10, 2018

macro matching is kinda-sorta-breakable already

@Manishearth

This comment has been minimized.

Copy link
Member

Manishearth commented Jul 18, 2018

Oh, I see what you mean.

I meant that #foo carried meaning in quote that does not in regular macro-defining syntax, but that wasn't your point.

Looking at the implementation it seems to be fine.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Jul 24, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Jul 31, 2018

You can't detect $foo properly in a macro_rules macro, but if you have a proc macro, it should work (at least now, maybe it has had issues in the past).

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Aug 3, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Aug 8, 2018

I've nominated this mostly so that someone (lang team, maybe) can find or select someone to write up docs and stabilize this feature.

@cramertj cramertj self-assigned this Aug 9, 2018

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Aug 9, 2018

This is in need of a stabilization PR! There's a stabilization guide on the forge. This feature is already documented in the edition guide, so much of that documentation can probably be reused in the stabilization PRs. Please post here if you plan to take this issue! (I'll circulate it around and see if anyone wants to take it as a good first or third PR)

@alexreg

This comment has been minimized.

Copy link
Contributor

alexreg commented Aug 9, 2018

I'll have a go. This looks pretty straightforward.

@alexreg

This comment has been minimized.

Copy link
Contributor

alexreg commented Aug 9, 2018

When it comes to new documentation, is there anything that should be updated besides the Reference? I'm not sure it belongs in the Book.

kennytm added a commit to kennytm/rust that referenced this issue Aug 10, 2018

Rollup merge of rust-lang#53236 - alexreg:stabilise-raw-idents, r=cra…
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril

@Centril Centril removed the I-nominated label Aug 11, 2018

kennytm added a commit to kennytm/rust that referenced this issue Aug 11, 2018

Rollup merge of rust-lang#53236 - alexreg:stabilise-raw-idents, r=cra…
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril

kennytm added a commit to kennytm/rust that referenced this issue Aug 11, 2018

Rollup merge of rust-lang#53236 - alexreg:stabilise-raw-idents, r=cra…
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril

kennytm added a commit to kennytm/rust that referenced this issue Aug 11, 2018

Rollup merge of rust-lang#53236 - alexreg:stabilise-raw-idents, r=cra…
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril

kennytm added a commit to kennytm/rust that referenced this issue Aug 13, 2018

Rollup merge of rust-lang#53236 - alexreg:stabilise-raw-idents, r=cra…
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril

bors added a commit that referenced this issue Aug 21, 2018

Auto merge of #53236 - alexreg:stabilise-raw-idents, r=cramertj
Stabilise raw_identifiers feature

* [Reference PR](rust-lang-nursery/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes #48589.

r? @cramertj
CC @cuviper @Centril

@bors bors closed this in #53236 Aug 21, 2018

@alexreg

This comment has been minimized.

Copy link
Contributor

alexreg commented Aug 21, 2018

Merged! I think some boxes can be ticked off now, @Centril. :-)

@alexreg

This comment has been minimized.

Copy link
Contributor

alexreg commented Aug 21, 2018

As for the unresolved questions:

  • Do macros need any special care with such identifier tokens?
    No, I can't see how this would affect macros. Perhaps @petrochenkov could second this, however.
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
    Yes, I think so, although possibly we should not do this if we're using an old edition and the keyword was only introduced in a later one.
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
    I'm not sure of this. What do you mean by pub use old_epoch::*, @Centril?
@cuviper

This comment has been minimized.

Copy link
Member

cuviper commented Aug 21, 2018

Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*

For instance, if the old_epoch crate had a fn catch() which it could declare without bothering with raw identifiers, and a new_epoch crate used and re-exported it. The new crate would have had to use raw identifiers if it had declared that function itself. Should this affect the way rustdoc presents it?

@alexreg

This comment has been minimized.

Copy link
Contributor

alexreg commented Aug 21, 2018

@cuviper Right, makes sense. I think the rules should be the same as what I proposed for diagnostics, in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.