Tracking Issue for RFC 3101: Reserved Prefixes #84978

nikomatsakis · 2021-05-06T10:00:09Z

This is a tracking issue for the RFC "Reserved Prefixes" (rust-lang/rfcs#3101).
The feature gate for the issue is #![feature(reserved_prefixes)].

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Implement the RFC (implementation details are tracked in Implement RFC: Reserved prefixes in the 2021 edition #84599)
Adjust documentation (see instructions on rustc-dev-guide)
Stabilization FCP (see instructions on rustc-dev-guide) -- Stabilize reserved prefixes #88140

Unresolved Questions

How to manage the API proc_macro::TokenStream::from_str, which does not take any edition information? (raised here) --> Discussed in Reserved Prefixes RFC: How to handle TokenStream::from_str? #84979

Implementation history

The text was updated successfully, but these errors were encountered:

lrh2000 · 2021-05-12T14:20:05Z

@rustbot claim

I'm interested at implementing this. But there are some questions.

The RFC says foo# bar is treated as three tokens. But currently r# bar failed with found invalid character.

The error is weird and unnecessary (in my opinion).
It may lead to bad consistency. For example, we may reserve the prefixxr# and accept xr# bar as three tokens (that's what the RFC specified). Once we decide to use xr# represents something like a raw string, xr# bar may not compile anymore (due to the error found invalid character).

Possible solutions:

Remove the error found invalid character to improve consistency. I don't think it breaks anything.
Treat everything starting with foo# as an error (suggested by @m-ou-se). It is easy to implement but it does not conform to the requirements of the RFC.

The RFC does not describe some corner cases clearly. I give my expected results for them below. (It's according to the RFC, and obviously we get different results if we deny everything starting from foo#.)

foo## bar
4 tokens foo, #, #, bar (NOTE: However,r## bar cannot compile now.)
foo#! bar
4 tokens foo, #, !, bar (NOTE: However, r#! bar cannot compile now.)
foo##! bar
5 tokens foo, #, #, !, bar (NOTE: However, r##! bar cannot compile now.)
foo##" bar
error: reserved prefixes
foo##bar
error: reserved prefixes
r#foo#bar
error: reserved prefixes
r"foo"#bar
3 tokens "foo", #, bar

(See also the discussion at zulip https://rust-lang.zulipchat.com/#narrow/stream/268952-edition-2021/topic/reserved.20prefixes/near/238470099.)

steffahn · 2021-05-19T14:14:55Z

Some relevant examples regarding string (and similarly char) literals – the first two copied from the linked zulip discussion (archive) – include

match"foo" { _ => () }

if"foo".eq("foo") {}

while"foo".ne("foo") {}

if let"foo" = "foo" {}

loop {
    break"foo"
};

for _ in"foo".chars() {}

return"foo"

(playground)

which raises the question whether we want to exclude all keywords or maybe certain keywords from the disallowed prefixes.

petrochenkov · 2021-05-19T18:31:37Z

whether we want to exclude all keywords or maybe certain keywords from the disallowed prefixes

Note that lexer isn't currently aware of the keyword list (*), and making it aware is a step in the wrong direction, IMO.
I generally support the @matklad's idea of working with literal prefixes using jointness (#84979 (comment)), this is the direction in which multi-character punctuation and "compound" float literals are supposed to move in accordance with #71322 (comment).

(*) After all, every left-hand side of a macro is able to introduce its own syntax entirely independent from Rust syntax, with its own keyword list, and lexer works with macro inputs too.

nikomatsakis · 2021-05-21T15:33:19Z

@petrochenkov I was wondering about "jointness". The idea here would be that we produce the same tokens, but in the Rust parser we check for (e.g.) k#foo and interpret it differently than k# foo? (By inspecting the "is there adjoining whitespace" info?)

petrochenkov · 2021-05-21T18:15:26Z

Yeah, it should be applicable to prefix#ident too in some form (with less urgency), but I was talking about prefixLITERAL specifically (as an answer to #84978 (comment)) because it has larger compatibility issues.

lrh2000 · 2021-05-22T15:38:58Z

@petrochenkov So do you mean that we should emit errors for reserved prefixes in the parser instead of the lexer?

I think it does not solve the problem about TokenStream::from_str. Instead, it only defers the problem. We may not alter the lexing rules now, but in the future we have to change the lexing rules if we want to introduce another prefix that is also used to denote raw strings (or something like them). Am I missing anything?

But I think it sounds good because it can easily solve almost all problems pointed out by @steffahn . For example, we can distinguish lifetimes from string literals and we can also distinguish keywords from identifiers. And we can have better diagnostics messages like

error: unknown prefix on string literal: bar
 --> file.rs:x:y
  |
1 | foo!(bar"qux");
  |      ^^^ help: try using whitespace here: `bar "qux"`
  |
  = note: prefixed string literals are reserved for future use

But I've found currently parsing the macro arguments and expanding macros is done in rustc_expand, so it means that we will emit errors for reserved prefixes both from rustc_parse and rustc_expand, does it sound good?

petrochenkov · 2021-05-23T17:23:02Z

@lrh2000

So do you mean that we should emit errors for reserved prefixes in the parser instead of the lexer?

Yes, at least for breakage-prone things like match"foo".
For anything involving raw strings with #s the errors can reported in the lexer just to save time and defer the final decision, I'm pretty sure that code like match#"foo"# is not used in practice.
What I'd really like to see is a crater result for the most conservative reservation, that would help to make decisions.

I think it does not solve the problem about TokenStream::from_str

Could you remind what is the problem with TokenStream::from_str exactly?

but in the future we have to change the lexing rules if we want to introduce another prefix that is also used to denote raw strings (or something like them). Am I missing anything?

String escaping is done during parsing (proc macros can do custom escaping in particular), so lexer doesn't need to discern between raw and non-raw strings unless #s are involved, but #s should be covered by the first item.

But I've found currently parsing the macro arguments and expanding macros is done in rustc_expand, so it means that we will emit errors for reserved prefixes both from rustc_parse and rustc_expand, does it sound good?

rustc_expand should always delegate to rustc_parse for parsing (maybe unless compatibility shims for tt matchers are involved?).
Sorry for the brief answers, I can't spend much time on this right now.

steffahn · 2021-05-23T17:41:11Z

lexer doesn't need to discern between raw and non-raw strings unless #s are involved

It does need to discern. Both "\"" and r"\" are valid (raw) string literals, whereas "\" is an un terminated string literal and r"\"" is just r"\" followed by an unterminated string literal.

nikomatsakis · 2021-05-24T15:01:29Z

We can certainly just accept that any future form of raw string will have to use #.

lrh2000 · 2021-05-24T16:01:17Z

@petrochenkov Thanks for your reply first.

Could you remind what is the problem with TokenStream::from_str exactly?

We may get different results (TokenStream) for the same string (from_str) in different versions of Rust, but the version information of the string is not known during the evaluation of TokenStream::from_str.

I'm not sure why you ask this. Maybe I misunderstood the problem?

String escaping is done during parsing (proc macros can do custom escaping in particular), so lexer doesn't need to discern between raw and non-raw strings unless #s are involved, but #s should be covered by the first item.

Now the lexer reports the start position and the end position for every token, so it has to do some (although simplified) escaping, see

rust/compiler/rustc_lexer/src/lib.rs

Lines 653 to 671 in bf24e6b

    
           /// Eats double-quoted string and returns true 
        
           /// if string is terminated. 
        
           fn double_quoted_string(&mut self) -> bool { 
        
               debug_assert!(self.prev() == '"'); 
        
               while let Some(c) = self.bump() { 
        
                   match c { 
        
                       '"' => { 
        
                           return true; 
        
                       } 
        
                       '\\' if self.first() == '\\' || self.first() == '"' => { 
        
                           // Bump again to skip escaped character. 
        
                           self.bump(); 
        
                       } 
        
                       _ => (), 
        
                   } 
        
               } 
        
               // End of file reached. 
        
               false 
        
           }

rustc_expand should always delegate to rustc_parse for parsing (maybe unless compatibility shims for tt matchers are involved?).

If we check reserved prefixes in parser, I think the best place is in parse_ident. But when parsing macro arguments, things are different. I don't believe it will go through the parse_ident path, especially when it just expects a token. I think it reflects that checking reserved prefixed in parser is less elegant than checking in lexer (at least in some certain aspects), although it also brings many benefits.

I will conduct more experiments on this, but now I'm curious about your opinions (if any).

m-ou-se · 2021-05-31T11:53:13Z

Note that the 'jointness' solution won't really work. That's exactly what we wanted to avoid by reserving prefixes in the lexer, to make sure we don't have to decide right now on the lexing rules for every possible prefix. As I wrote in #84979 (comment):

We specifically wanted to make these things a lexer error, so we don't have to decide on the lexing rules for that literal yet. E.g. is fb"\xFF" valid or not? How about fb"❉"? Or fr"\"? Or f"hello {123 + "hey".len()}"?

Also, this doesn't seem to be true:

String escaping is done during parsing (proc macros can do custom escaping in particular)

Proc macros cannot accept string literals with escape sequences or characters that don't match the prefix. These errors are given without parsing:

#[proc_macro]
pub fn hey(_: TokenStream) -> TokenStream {
    TokenStream::new()
}

hey!("\xFF");
hey!(b"å");

error: out of range hex escape
 --> examples/hey.rs:3:7
  |
3 | hey!("\xFF");
  |       ^^^^ must be a character in the range [\x00-\x7f]

error: non-ASCII character in byte constant
 --> examples/hey.rs:4:8
  |
4 | hey!(b"å");
  |        ^
  |        |
  |        byte constant must be ASCII
  |        help: use a \xHH escape for a non-ASCII byte: `\xE5`

m-ou-se · 2021-05-31T11:56:04Z

As for match"hey" etc., I think it's perfectly fine if that gives an error and a suggestion for match "hey" with a space.

nikomatsakis · 2021-06-01T21:53:40Z

I wrote up a summary of what I believe to be the issues and concerns raised thus far:

https://hackmd.io/YLe7viGLTu2PfE5sQO4v0w

I think that, on balance, the right path is to continue with a hard lex error (also for keywords). I don't think the lexer should know about the keyword listing, but it would have to know about the edition. Is there any critical problem with this, @petrochenkov or @matklad? The downsides of using jointness information (and some unknowns) are covered in the hackmd.

m-ou-se · 2021-08-02T15:25:23Z

This seems to have already been completed. Apart from edition 2021 not being stable yet, this is stable.

nikomatsakis · 2021-08-18T15:19:05Z

Re-opening as we have not made a stabilization decision here. Has there been a Rust reference PR? I drafted a stabilization report in this hackmd.

nikomatsakis · 2021-08-18T15:19:49Z

@TriageBot release-assignment

I don't believe @lrh2000 is still pushing this forward.

nikomatsakis · 2021-08-18T15:19:56Z

@rustbot release-assignment

nikomatsakis · 2021-08-18T15:22:33Z

Stabilization proposed in #88140

nikomatsakis added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels May 6, 2021

nikomatsakis mentioned this issue May 6, 2021

RFC: Reserved prefixes in the 2021 edition rust-lang/rfcs#3101

Merged

nikomatsakis added the F-reserved_prefixes `#![feature(reserved_prefixes)]` label May 6, 2021

rustbot assigned lrh2000 May 12, 2021

This comment has been minimized.

Sign in to view

nikomatsakis mentioned this issue May 13, 2021

Implement RFC: Reserved prefixes in the 2021 edition #84599

Closed

lrh2000 mentioned this issue May 22, 2021

Reserve prefixed identifiers and literals (RFC 3101) #85359

Merged

m-ou-se added this to Feature Complete Blockers in 2021 Edition Blockers May 25, 2021

m-ou-se mentioned this issue May 29, 2021

Tracking Issue for RFC 3085: Edition 2021 #85811

Closed

2 tasks

nikomatsakis mentioned this issue Jun 1, 2021

prepare PR that will "dry run" the crater migration #85895

Closed

3 tasks

nikomatsakis moved this from Feature Complete Blockers to Stable Blockers in 2021 Edition Blockers Jun 24, 2021

m-ou-se closed this as completed Aug 2, 2021

2021 Edition Blockers automation moved this from Stable Blockers to Completed items Aug 2, 2021

nikomatsakis reopened this Aug 18, 2021

nikomatsakis mentioned this issue Aug 18, 2021

Stabilize reserved prefixes #88140

Closed

m-ou-se mentioned this issue Sep 3, 2021

🎉 Rust 2021 celebration and thanks #88623

Closed

joshtriplett closed this as completed Jul 16, 2022

dtolnay mentioned this issue Mar 4, 2023

proc-macros should not automatically error on token groups that are "locally" syntactically correct #108385

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking Issue for RFC 3101: Reserved Prefixes #84978

Tracking Issue for RFC 3101: Reserved Prefixes #84978

nikomatsakis commented May 6, 2021 •

edited

Loading

lrh2000 commented May 12, 2021

This comment has been minimized.

steffahn commented May 19, 2021

petrochenkov commented May 19, 2021

nikomatsakis commented May 21, 2021 •

edited

Loading

petrochenkov commented May 21, 2021

lrh2000 commented May 22, 2021 •

edited

Loading

petrochenkov commented May 23, 2021 •

edited

Loading

steffahn commented May 23, 2021

nikomatsakis commented May 24, 2021

lrh2000 commented May 24, 2021 •

edited

Loading

m-ou-se commented May 31, 2021 •

edited

Loading

m-ou-se commented May 31, 2021

nikomatsakis commented Jun 1, 2021

m-ou-se commented Aug 2, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021 •

edited

Loading

Tracking Issue for RFC 3101: Reserved Prefixes #84978

Tracking Issue for RFC 3101: Reserved Prefixes #84978

Comments

nikomatsakis commented May 6, 2021 • edited Loading

About tracking issues

Steps

Unresolved Questions

Implementation history

lrh2000 commented May 12, 2021

This comment has been minimized.

steffahn commented May 19, 2021

petrochenkov commented May 19, 2021

nikomatsakis commented May 21, 2021 • edited Loading

petrochenkov commented May 21, 2021

lrh2000 commented May 22, 2021 • edited Loading

petrochenkov commented May 23, 2021 • edited Loading

steffahn commented May 23, 2021

nikomatsakis commented May 24, 2021

lrh2000 commented May 24, 2021 • edited Loading

m-ou-se commented May 31, 2021 • edited Loading

m-ou-se commented May 31, 2021

nikomatsakis commented Jun 1, 2021

m-ou-se commented Aug 2, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021

nikomatsakis commented Aug 18, 2021 • edited Loading

nikomatsakis commented May 6, 2021 •

edited

Loading

nikomatsakis commented May 21, 2021 •

edited

Loading

lrh2000 commented May 22, 2021 •

edited

Loading

petrochenkov commented May 23, 2021 •

edited

Loading

lrh2000 commented May 24, 2021 •

edited

Loading

m-ou-se commented May 31, 2021 •

edited

Loading

nikomatsakis commented Aug 18, 2021 •

edited

Loading