Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc-macros should not automatically error on token groups that are "locally" syntactically correct #108385

Open
AaronKutch opened this issue Feb 23, 2023 · 6 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-feature-request Category: A feature request, i.e: not implemented / a PR. needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@AaronKutch
Copy link
Contributor

My limited understanding is that TokenStreams inputs and outputs for proc-macros have to abide to some minimal level of syntax (e.x. can't have mismatched delimiters outside of string literals), but everything else should be fair game for the proc-macro to handle (please edit the title if there is a better term for what I am describing). However, I found a case where the proc-macro system will forcibly error just because Rust doesn't support it, and it doesn't involve any special delimiters.
I am adding hexadecimal, binary, etc floating point support to my awint crate and encountered this problem:

proc_macro_ex!(0x123.456_i256f128); // error: hexadecimal float literal is not supported

this is frustrating because it works if I chose some other closely related convention like 0j123.456_i256f128.

I am titling this issue more generally because I want to make sure there are not other cases like it.

@AaronKutch AaronKutch changed the title proc-macros should not automatically error on token groups that are "locally" syntactically correct even if Rust doesn't support them proc-macros should not automatically error on token groups that are "locally" syntactically correct Feb 23, 2023
@Nilstrieb Nilstrieb added the A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) label Feb 23, 2023
@Nilstrieb
Copy link
Member

We've had this discussion before about literals that are out of range of the maximal literal, and IIRC the consensus was that these should fail to lex into tokens and therefore not work. I can't remember the exact discussion though.

@AaronKutch
Copy link
Contributor Author

So Rust does literal verification at lex time? I could imagine it doing steps like separating "-123.0f32" into groups like "- 123 . 0 f32" but it should detect and fail on overflow at another stage. If it lexes into some canonical form early on, what it should do for cases that don't match what Rust supports, is it should continue with the tokens as raw, and then normal Rust can fail on that case later and proc-macros can do what they want.

@AaronKutch
Copy link
Contributor Author

It has been a while since I looked at TokenTree and proc_macro::Literal. I see that you wouldn't want token groups to parse as Ident or Literal differently across versions in case we made hexadecimal floats work in plain Rust or something like that, so token groups starting with '0'-'9' would always be some kind of Literal. Is it possible for there to be some "Unknown" type in the implementation details of Literal? When Literal::from_str encounters an unknown kind of literal it just stores the chars as they are. The docs say it can currently panic and "We reserve the right to change these errors into LexErrors later.". Then Literal::to_string can simply spit out what was put in, and I don't see that we break anything proc-macro user facing. Rust's internals that are intertwined more closely with the system just have to be aware of one more "Unknown" type that they parse again more rigorously to see what canonical integer it is most likely trying to represent.

@AaronKutch
Copy link
Contributor Author

AaronKutch commented Feb 24, 2023

Additionally, it looks like overflow checks are in fact happening after the lexing stage, because

let _ = ex!(-999u8);
let _ = ex!(0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff);

doesn't produce any errors, so it looks to me like there are definitely unnecessary restrictions in place

@Nilstrieb Nilstrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 24, 2023
@dtolnay dtolnay added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 4, 2023
@dtolnay
Copy link
Member

dtolnay commented Mar 4, 2023

The discussion above about overflow does not seem germane. 0x123.456_i256f128 in macro input is not disallowed because of overflow, it is disallowed because that is a syntax that Rust has reserved for its own future use. Just like #84978 makes jkl"" not valid in macro tokens.

If 0x123.456_i256f128 were to lex as multiple macro tokens (like your example of 0j123.456_i256f128, which is 3 tokens) then it would be a breaking change for Rust to introduce hexadecimal float literal using that syntax. If it were to lex as a single token, it would be a breaking change for Rust to introduce hexadecimal float literals with syntax which diverges from that one. So until a hexadecimal float literal syntax for Rust makes it through RFC, or the lang team commits via RFC to not having such a syntax, it needs to be illegal in macro input.

@dtolnay dtolnay added C-feature-request Category: A feature request, i.e: not implemented / a PR. needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. labels Mar 4, 2023
@AaronKutch
Copy link
Contributor Author

The thing about "" and # prefixes makes total sense to me because it involves delimiters, but it is unfortunate that other things have a problem because of how tokens are decided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-feature-request Category: A feature request, i.e: not implemented / a PR. needs-rfc This change is large or controversial enough that it should have an RFC accepted before doing it. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants