
Short Macro Invocation Syntax: m!123 and m!"abc" #3267

Closed
wants to merge 3 commits into from


@m-ou-se (Member) commented May 18, 2022

@m-ou-se m-ou-se added T-lang Relevant to the language team, which will review and decide on the RFC. A-macros Macro related proposals and issues labels May 18, 2022
@rylev (Member) commented May 18, 2022

This would be useful for the windows ecosystem (perhaps inside the windows crate) to declare wide string literals. Even though there are plenty of workarounds for wide string literals, having such a short syntax would make their use more common.

@m-ou-se (Member, Author) commented May 18, 2022

Given that the wide-literals crate already provides w!(""), the w!"" syntax would start working right away for its users, without any change to the crate.
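For readers wondering how a `w!` macro can work at all, here is a minimal sketch of a declarative wide-string macro that encodes a string literal into a NUL-terminated `&'static [u16]` at compile time. It is not the wide-literals crate's implementation; the helper names `utf16_len_with_nul` and `encode_utf16_with_nul` are hypothetical, and the sketch only covers the parenthesized `w!("...")` form that works today.

```rust
/// Number of UTF-16 code units needed for `s`, plus one for the trailing NUL.
/// (Hypothetical helper; relies on `s` being valid UTF-8, as every `&str` is.)
const fn utf16_len_with_nul(s: &str) -> usize {
    let bytes = s.as_bytes();
    let (mut i, mut n) = (0, 0);
    while i < bytes.len() {
        let b = bytes[i];
        if b < 0x80 {
            i += 1; // 1-byte sequence: one UTF-16 unit
            n += 1;
        } else if b < 0xE0 {
            i += 2; // 2-byte sequence: one unit
            n += 1;
        } else if b < 0xF0 {
            i += 3; // 3-byte sequence: one unit
            n += 1;
        } else {
            i += 4; // 4-byte sequence: a surrogate pair
            n += 2;
        }
    }
    n + 1
}

/// Encodes `s` as UTF-16 followed by a NUL; `N` must equal `utf16_len_with_nul(s)`.
const fn encode_utf16_with_nul<const N: usize>(s: &str) -> [u16; N] {
    let bytes = s.as_bytes();
    let mut out = [0u16; N]; // the last slot stays 0 as the terminator
    let (mut i, mut j) = (0, 0);
    while i < bytes.len() {
        // Decode one UTF-8 scalar value into `cp` and advance `i`.
        let b = bytes[i] as u32;
        let cp = if b < 0x80 {
            i += 1;
            b
        } else if b < 0xE0 {
            i += 2;
            ((b & 0x1F) << 6) | (bytes[i - 1] as u32 & 0x3F)
        } else if b < 0xF0 {
            i += 3;
            ((b & 0x0F) << 12) | ((bytes[i - 2] as u32 & 0x3F) << 6) | (bytes[i - 1] as u32 & 0x3F)
        } else {
            i += 4;
            ((b & 0x07) << 18)
                | ((bytes[i - 3] as u32 & 0x3F) << 12)
                | ((bytes[i - 2] as u32 & 0x3F) << 6)
                | (bytes[i - 1] as u32 & 0x3F)
        };
        // Emit it as a single unit or as a surrogate pair.
        if cp < 0x1_0000 {
            out[j] = cp as u16;
            j += 1;
        } else {
            let c = cp - 0x1_0000;
            out[j] = (0xD800 + (c >> 10)) as u16;
            out[j + 1] = (0xDC00 + (c & 0x3FF)) as u16;
            j += 2;
        }
    }
    out
}

/// Hypothetical `w!`: turns a string literal into a NUL-terminated `&'static [u16]`.
macro_rules! w {
    ($s:literal) => {{
        const S: &str = $s;
        const N: usize = utf16_len_with_nul(S);
        static W: [u16; N] = encode_utf16_with_nul::<N>(S);
        let wide: &'static [u16] = &W;
        wide
    }};
}
```

Because the macro only ever matches a single literal, the same definition would presumably also accept `w!"hi"` under the RFC without any change.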


```
MacroInvocation :
   SimplePath ! Literal
```
@eddyb (Member) commented May 18, 2022

Is this "Literal" unambiguous enough to distinguish between these two situations?

  • "literal" token, aka proc_macro::Literal (string/character/numeric literals)
  • "literal" (e.g. expression) grammar, aka rustc_ast::Literal aka $lit:literal macro inputs
    • besides what literal tokens support, this also includes false and true
    • also it does more validation (suffixes, presumably string escapes, forcing integers into u128, etc.)

I would assume the former (esp. given the mention of m!identifier later in the RFC, where arguably m!false/m!true would fit), but $lit:literal being the latter muddles the waters a bit sadly.

This is confusing enough that the reference was wrongly mentioning false/true on stable under "tokens" (since fixed by rust-lang/reference#1189).

(Thanks to @solson for bringing up the potential ambiguity wrt bool literals - I would've naively assumed it was a non-concern at first)
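To make the distinction concrete, here is a small sketch (the macro name `kind!` is made up) showing that the `$l:literal` fragment accepts inputs that are not a single literal token: `true` is an identifier token and `-5` is two tokens, yet both match.

```rust
/// Reports which `macro_rules!` arm accepts a given input.
macro_rules! kind {
    // `literal` matches `-? LiteralExpression`, which includes `true`/`false`.
    ($l:literal) => { "literal fragment" };
    ($i:ident) => { "identifier" };
    ($($t:tt)*) => { "something else" };
}
```

For example, `kind!(true)` and `kind!(-5)` both hit the first arm, while `kind!(())` falls through to the last one.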

@m-ou-se (Member, Author) commented May 18, 2022

The first one.

$lit:literal doesn't accept very large integers, which is important for use cases like bignum!123. (Playground.)

Allowing json!true and json!false might be reasonable, but then we also need to have a discussion about json!null, js!NaN, py!True, and so on, which would all need the identifier or path grammar. So I'd like to leave that discussion for a future RFC.

Reply (Member):

I agree. I think we could leave false/true to the identifier case; at most, I'd add a note to the RFC that Literal in this case refers to something different from "literal expressions"/$lit:literal syntax.

We can probably link to other parts of the reference, but I'm not sure which parts are the most relevant (I got confused just now trying to follow it, though a lot of that was from looking at the pre-rust-lang/reference#1189 version).

@joshtriplett (Member) commented May 18, 2022

$lit:literal doesn't accept very large integers, which is important for use cases like bignum!123

I wonder if we could fix that?

Not a blocker for this RFC, but we could allow arbitrary-length integer literals to get fed into macros, and only check if they fit when the resulting token stream from the macro gets parsed.

@m-ou-se (Member, Author) replied:

$x:tt accepts arbitrary length integers just fine. (See the playground link above.) It's just that non-tt things like $x:literal no longer represent the raw tokens but instead a portion of already processed/parsed AST.
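A small sketch of the point: a `tt` matcher hands over an integer token of any length untouched, so a `bignum!`-style macro could parse the digits itself. The name `raw_digits!` is made up for illustration; it only returns the token's source text and does no parsing. The example input is 2^128, one more than `u128::MAX`.

```rust
/// Returns the source text of a single token tree, unchanged.
/// `tt` does not range-check integer literals, unlike `$x:literal`.
macro_rules! raw_digits {
    ($t:tt) => {
        stringify!($t)
    };
}
```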

@Lokathor (Contributor) commented:

I still don't entirely see the value in cutting off the parentheses. The syntax of doing a wide string literal is already extremely short.

There are several crates for wide string literals, and apart from the wide-literals crate they all use multi-character macro names: words that a person can understand more easily when reading the source code. This even includes the const_utf16 crate that you wrote, rylev, which picks the name encode!, a whopping 5 characters more than the minimum. This suggests to me that being maximally terse isn't actually a great virtue in practice. If it really were so good to be maximally terse then all the crates would already use single letter macro names like w!, instead of full word names like encode! or utf16! or wstr!.

Let's have a realistic example:

```rust
use const_utf16::encode;
const MESSAGE: &[u16] = encode!("Hello, world!"); // today
const MESSAGE: &[u16] = encode!"Hello, world!";   // as proposed by this RFC
```

Is it really the ( and ) that are making the first line of code burdensome while the second line of code is free and clear?

@zesterer commented May 18, 2022

I'm worried about whether users might find precedence confusing. Right now, a macro invocation is a syntactic atom in its own right, being entirely self-delimiting. This is not the case for this proposal, and it opens up the field for confusion.

Is `foo! x as i32` to be parsed as `foo!(x as i32)` or `foo!(x) as i32`?

Is `foo! x?` to be parsed as `foo!(x?)` or `foo!(x)?`?

Is `foo! -5` to be parsed as `foo!(-5)` or `foo!(-) 5`?

This last one is the most confusing of all, because negation can be either an operator or part of a standalone integer literal, and the behaviour changes depending on which interpretation applies here.

I am unconvinced that the convenience of removing 2 characters is worth the potential for syntactic ambiguity (we can of course come up with well-defined rules to resolve this ambiguity, such as only binding to syntactic atoms, but human brains do not work this way).

@m-ou-se (Member, Author) commented May 18, 2022

It's not about minimizing the amount of bytes of source code, or even the amount of time spent typing (though typing () takes me significantly longer than typing regular words), it's about the visual complexity when reading code.

Something like w!"asdf" looks like a single 'unit' to me, while encode!("asdf") has one level of 'nesting'. When I see w!"asdf" I see 'a wide string literal', but when I see w!("asdf") I see 'a string literal, passed to w!()'.

The difference is of course small, but it can add up:

File::open(w!("abc") + name + w!(".txt"));

This feels like a somewhat complicated expression, nesting two 'calls' inside the outer call.

Without the () for the macro invocations, this becomes:

File::open(w!"abc" + name + w!".txt");

To me, this is easier to read, as I visually process it as thing(thing + thing + thing) instead of thing(thing(thing) + thing + thing(thing)).

It all feels similar to why I prefer x < y + 1 over x < (y + 1). I already parse it correctly without the (); it just adds noise.

@m-ou-se (Member, Author) commented May 18, 2022

@zesterer That's mostly a formatting issue. You could ask the same question about `m! (a + 1) * 3`. Rustfmt helps by formatting that as `m!(a + 1) * 3`. Similarly, it should format `m! x as i32` as `m!x as i32`.

Is foo! -5 to be parsed as foo!(-5) or foo!(-) 5?

We wouldn't accept `foo!-`: `-` is not a literal.
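For comparison with today's behaviour: a delimited invocation is self-contained, so operators written after it apply to the macro's result, and an `$e:expr` argument is kept as one parsed expression. The `double!` macro below is a hypothetical example, not something from the RFC.

```rust
/// Doubles an expression. `$e:expr` is captured as one parsed expression,
/// so `double!(1 + 2)` means `(1 + 2) * 2`, not `1 + 2 * 2`.
macro_rules! double {
    ($e:expr) => {
        $e * 2
    };
}

/// The trailing `* 3` applies to the whole invocation: `((1 + 2) * 2) * 3`.
fn example() -> i32 {
    let x = double!(1 + 2) * 3;
    x
}
```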

@zesterer commented May 18, 2022

If the intention is to reduce visual complexity, then perhaps it is better to follow the precedent set by explicit numeric literal type annotations (such as 5u8) and allow this in postfix position? "hello, world"!w seems more natural to me than w!"hello, world" given the prior knowledge users have of numeric literal annotations.

@m-ou-se (Member, Author) commented May 18, 2022

entirely self-delimiting

I suppose one could argue that m!"asdf" is also 'self-delimiting', with "" rather than () as the delimiters.

Note that I'm only proposing this shorthand for literals, and nothing else. So there are no precedence rules about where the argument stops, or anything like that.

@m-ou-se (Member, Author) commented May 18, 2022

If the intention is to reduce visual complexity, then perhaps it is better to follow the precedent set by explicit numeric literal type annotations (such as 5u8) and allow this in postfix position? "hello, world"!w seems more natural to me than w!"hello, world" given the prior knowledge users have of numeric literal annotations.

We already have b"asdf", with the modifier at the start. Regardless, I think it's best if we keep the macro invocation in order, to make this change as minimal as possible.

@m-ou-se (Member, Author) commented May 18, 2022

If it really were so good to be maximally terse then all the crates would already use single letter macro names like w!, instead of full word names like encode! or utf16! or wstr!.

Conversely, if w!("..") was good enough, we wouldn't be getting any requests for w".." or c".." or z".." and so on. Quite a few people seem excited about those, so it seems like they aren't satisfied with wstr!("..") or w!("..").

@ChrisDenton (Member) commented May 18, 2022

Should we allow m!r"..."? (I think yes.)

I think this would definitely be useful for the wide str case if the goal is to reduce visual noise. For example:

wide!r"\\.\pipe\local\pipe name" is much nicer than
wide!"\\\\.\\pipe\\local\\pipe name"
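The two spellings above denote exactly the same string; only the escaping differs. A quick check (the constant names are arbitrary):

```rust
// Raw string: backslashes are taken literally.
const RAW: &str = r"\\.\pipe\local\pipe name";
// Ordinary string: every backslash must be escaped.
const ESCAPED: &str = "\\\\.\\pipe\\local\\pipe name";
```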

@tavianator commented:

Would this result in println!"Hello world"; working? I find that kind of odd.

@m-ou-se (Member, Author) commented May 18, 2022

Yup. But this also already works:

```rust
println! {"hey {:?}", 1};

panic![".."];

let _ = vec! (1, 2);

thread_local! [
   ..
];
```

So I don't think that's a problem in practice. Using the conventional one of the three (or four) ways to invoke a macro is already part of Rust code style/formatting, and some are even handled by rustfmt. I don't think I've ever encountered a wild println![] or similar.

@Lokathor (Contributor) commented:

Conversely, if w!("..") was good enough, we wouldn't be getting any requests for w".." or c".." or z".." and so on. Quite a few people seem excited about those, so it seems like they aren't satisfied with wstr!("..") or w!("..").

I would suggest that what people want is something built into the default experience (at the language level, or in core) without having to pull in some crate to do it.

@conradludgate commented:

I would support this, purely because for a while I've wanted some 'string literal macro' system: a way to make things like b"str" in user code. My main gripe is that creating owned strings when initialising structs gets polluted by lots of .into() or .to_owned() or .to_string() or String::from calls. These distract from the text that I care about. A simple s!"I am an owned string" would definitely be an improvement in my book.
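A macro like the one described can already be written with `macro_rules!`; what the RFC would change is only the invocation syntax. A minimal sketch, where `s!`, `User`, and `ferris` are hypothetical names:

```rust
/// Hypothetical `s!`: turns a string literal into an owned `String`.
/// Today it needs delimiters; under the RFC, `s!"Ferris"` would also work.
macro_rules! s {
    ($lit:literal) => {
        ::std::string::String::from($lit)
    };
}

struct User {
    name: String,
    role: String,
}

fn ferris() -> User {
    User {
        name: s!("Ferris"),
        role: s!("mascot"),
    }
}
```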

@scottmcm (Member) commented:

Would this result in println!"Hello world"; working?

That's an interesting point. This could well be far more than numerics, because with captured identifiers, would this be a de-facto transition to, say, format!"{a} - {b} = {c}"?

@m-ou-se (Member, Author) commented May 18, 2022

[..] with captured identifiers, would this be a de-facto transition to, say, format!"{a} - {b} = {c}"?

The RFC mentions this as an example: f!"{a} {b}" (with use std::format as f;).
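The alias itself already works with the delimited form. A sketch assuming Rust 1.58 or later for inline captured identifiers; the `describe` function is just for illustration:

```rust
// Import the standard `format!` macro under a shorter name.
use std::format as f;

/// Hypothetical helper: `{a}` and `{b}` capture the local variables directly.
fn describe(a: i32, b: &str) -> String {
    f!("{a} {b}")
}
```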

@jhpratt (Member) commented May 18, 2022

I haven't created an RFC for this yet, but I have a WIP implementation of custom literals on my rust-lang/rust fork. Given the lack of documentation there, I'll briefly explain here. Essentially, a new trait is introduced:

```rust
pub trait FromIntegerLiteral: Sized {
    type Input: sealed::Integer;
    fn from_integer_literal(i: Self::Input) -> Self;
}
```

This trait is a lang item. If the compiler expects a certain type, the known type does not match, and the expected type implements FromIntegerLiteral, the compiler will coerce the integer literal into <T as FromIntegerLiteral>::from_integer_literal(lit), where lit is guaranteed to be the type expected as input.

All implementations of FromIntegerLiteral are required to be impl const; this is enforced by the compiler. This is necessary so that the values can be used in any location and to ensure that there are no side effects. As const eval grows more powerful, so will the ability for custom literals. My goal is to always const eval the value. Invalid inputs must panic, which is functionally equivalent to emitting a compiler error.
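To make the desugaring concrete, here is a stable-Rust approximation of what the compiler would insert when an integer literal appears where a `Percent` is expected. The trait is a local stand-in, not the WIP lang item: it drops the sealed `Input` bound and the `impl const` requirement, and `Percent` is a hypothetical type.

```rust
/// Stand-in for the WIP lang-item trait (no sealing, no `impl const`).
pub trait FromIntegerLiteral: Sized {
    type Input;
    fn from_integer_literal(i: Self::Input) -> Self;
}

/// Hypothetical example type: a percentage in 0..=100.
#[derive(Debug, PartialEq)]
pub struct Percent(u8);

impl FromIntegerLiteral for Percent {
    type Input = u8;
    fn from_integer_literal(i: u8) -> Self {
        // Invalid inputs panic; under the proposal, guaranteed const eval
        // would turn this panic into a compile-time error.
        assert!(i <= 100, "percentage out of range");
        Percent(i)
    }
}

/// Roughly what `let p: Percent = 42;` would be coerced into.
pub fn forty_two_percent() -> Percent {
    <Percent as FromIntegerLiteral>::from_integer_literal(42)
}
```

In this stable sketch the out-of-range check only fires at run time, which is exactly the gap the `impl const` requirement is meant to close.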

As currently written, the trait is limited to accepting integers as input. However, I did intend to expand it to any literal, which would notably include strings; it would obviously be renamed in that case. I believe an expanded trait that permits any literal would be quite powerful and would serve much the same purpose as this proposal. One notable exclusion would be f-strings, but I believe there was general support for f"{foo}" having compiler support in the future, which is why that syntax was reserved in the 2021 edition.

Personally, I view custom literals as more ergonomic, more transparent, and (with guaranteed const eval) more reliable. f-strings would be great to have, but I don't think this is the way to go about it. It may seem surprising that coercions work in this manner, but I assure you that the implementation linked is already mostly functional. The only thing missing is always const eval'ing the input. Custom literals combined with type ascription would be nearly identical to custom suffixes: you could write 5:cm to get a value that is 5cm (preferred formatting aside).

Edit: After some discussion on Zulip, the existing implementation will not work, but it is still possible to have custom literals in this user-facing manner.

@ChayimFriedman2 commented:

Should we allow m!b"abc" and m!b'x'? (I think yes.)

A counterargument: b"" is, to some extent, a custom literal too, so it seems a little strange to allow m!b"" but disallow m!b!"".

[future-possibilities]: #future-possibilities

In the future, we could consider extending this syntax in a backwards compatible way by allowing
slightly more kinds of arguments to be used without brackets, such as `m!-123` or `m!identifier`, or even `m!|| { .. }` or `m!struct X {}`.
@joshtriplett (Member) commented May 18, 2022

m!identifier seems unambiguous and easy to add, though also less well-motivated.

m!-123 could, for now, at least have a rustfix-applicable suggestion telling the user to use -m!123 instead.

@m-ou-se (Member, Author) commented May 18, 2022

m!identifier would possibly make it hard or impossible to allow m!p::a::t::h or m!thing.member, so I figured that might be good to leave for a later discussion. (Not saying that we should allow either of that. Just saying that allowing m!identifier might block other future possibilities.)

Reply (Contributor):

Honestly, I would put this exact explanation in the RFC text itself, since I had the exact same question and you pretty thoroughly convinced me why we shouldn't right now in a single sentence.

Another reply:

telling the user to use -m!123 instead.

More likely m!(-123), since it isn't necessarily guaranteed that the macro is commutative with -
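A quick illustration of why `m!(-123)` is the safer suggestion: with a hypothetical squaring macro, negating the argument and negating the result give different answers, so rewriting `m!-123` as `-m!123` could silently change meaning.

```rust
/// Hypothetical macro that squares its argument (evaluates it twice).
macro_rules! sq {
    ($e:expr) => {
        $e * $e
    };
}
```

Here `sq!(-3)` is `(-3) * (-3)`, while `-sq!(3)` is `-(3 * 3)`.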

Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@PatchMixolydic (Contributor) commented:

As a bystander and declarative macro fanatic, I'm a bit hesitant about this. It seems a bit surprising that this new form would only support a literal as its body. I could easily imagine a newcomer (or myself) writing something like this and being confused when it doesn't work:

```rust
macro_rules! macroroni {
    ($x:ident) => { /* TODO */ };
}

fn main() {
    macroroni!foo; //~ ERROR can only be used with macros that take a literal
}
```

This could also crop up during a refactor:

```rust
let default_animation_wide = w!"normal";
let y = self.sprite.set_animation("normal"); // hmm, I should pull `"normal"` into a const
```

Moving "normal" into a constant without changing default_animation_wide to use parentheses (or brackets) would cause an error, which might confuse the programmer.

```rust
const DEFAULT_ANIMATION: &str = "normal";

let default_animation_wide = w!DEFAULT_ANIMATION;
//~^ ERROR can only be used with macros that take a literal
let y = self.sprite.set_animation(DEFAULT_ANIMATION);
```

Even if this syntax is opened up to most other metavariable types, it's backwards-incompatible to open it to tts (unless delimited groups are ignored when using this syntax, which might also catch people by surprise):

```rust
macro_rules! macroroni {
    ($x:tt) => { panic!("{}", $x) };
    () => {};
}

fn main() {
    // this currently does nothing, but may panic if `tt` is accepted
    macroroni!();

    // If `tt` is accepted but delimited groups are rejected,
    // you might start with this...
    macroroni!1;
    // ... then realize you need an addition...
    macroroni!(1 + 1); //~ ERROR no rules expected the token `+`
    // ... and run into a stumbling block.
}
```

This syntax might also cause readability issues, especially if it's expanded to cover idents:

```rust
// I am a Linux user. What does `w!` mean?
let le_mot = w!"foo";
// Ugh, it's 12am... I need sleep...
// ? There's no variable named `shelllle_mot` in scope, is there?
let result = shell!le_mot;
// Oh! This is a macro call
let result = shell!(le_mot);
```

On the other hand, the m![] syntax provides some amount of precedent for accepting this form, as it is essentially a special case of m!literal for array literals (which aren't actually matched by literal metavariables).

@ChristopherRabotin commented:

I strongly oppose this proposal.

I've been programming in Rust for just under five years, so I'm probably not as experienced as most folks here. I will say, however, that Rust was by far the hardest language for me to become proficient in. When I first picked it up, I had a hard time understanding the syntax by reading it (lifetimes and the turbofish were extremely confusing). Yet at first I could approximate macro invocations as function calls: somehow different, but the difference didn't matter for the time being. In fact, as a newcomer, if I had seen let le_mot = w!"foo"; I think I would have had absolutely no idea what it does, and my brain would have parsed w! as a special token instead of understanding the bang as a macro invocation and the w as the name of that macro.

Today, I regularly try to convince folks who work on critical systems to use Rust: I worry that removing the delimiter tokens around macro invocations will make code significantly harder to understand for newcomers.

I'll also add that, many years ago, I had to learn VBScript (I forget which version). One of the most confusing things was that invoking a function with and without parentheses behaved differently (IIRC, one form allowed reading the return value and the other didn't).

@ssokolow commented May 27, 2022

It's weird to me that those are syntax errors; usually Rust doesn't care about whitespace unless it is to delimit tokens:

I haven't thought much about 12u8, but I see the b" in b"..." being akin to 'a... and ' a is also invalid syntax, unlike &str vs. & str.

@Iron-E commented May 27, 2022

I guess r#""# and r#use don't allow spaces either, yeah.

@LemmingAvalanche commented:

I like it.

I hope that this form is whitespace-sensitive and that w! "wide" (space between macro invocation and argument) is not allowed.

bors added a commit to rust-lang-ci/rust that referenced this pull request May 30, 2022
… r=joshtriplett

improve format impl for literals

The basic idea of this change can be seen here https://godbolt.org/z/MT37cWoe1.

Updates the format impl to have a fast path for string literals and the default path for regular format args.

This change will allow `format!("string literal")` to be used interchangeably with `"string literal".to_owned()`.

This would be relevant in the case of `f!"string literal"` being legal (rust-lang/rfcs#3267) in which case it would be the easiest way to create owned strings from literals, while also being just as efficient as any other impl
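The fast path described in this commit message can be approximated in user code with `fmt::Arguments::as_str`, which returns `Some` when there is nothing to substitute at run time. This is a sketch of the idea, not the actual standard-library change, whose details may differ; `format_fast` is a hypothetical name.

```rust
use std::fmt;

/// If the arguments are a plain literal, copy it directly; otherwise
/// fall back to the general formatting machinery.
fn format_fast(args: fmt::Arguments<'_>) -> String {
    match args.as_str() {
        Some(s) => s.to_owned(),   // fast path: no formatting needed
        None => fmt::format(args), // general path
    }
}
```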
@pickfire (Contributor) commented:

Is it possible to combine multiple short macro invocations?

x!y!"hello"

@conradludgate commented:

No. y!"hello" is not a literal but an expression, so it isn't supported under this proposal.

@cdmistman commented May 31, 2022

I agree with what several others have said about restricting this functionality to const-evaluable items only; when I read a literal in Rust, I expect that literal to resolve at compile time, not run time (i.e., 0xDEADBEEFu64 does not result in any runtime conversions). I'd love to see a solution that incorporates this, prevents abuse/misuse, and still keeps the verbosity that makes it apparent that this is a custom literal transformation, rather than the type transformation in 0xDEADBEEFu64. Perhaps something like this?

```rust
// a custom implementation of the format! macro
macro_rules! f {
  // explicitly opt-in to the use of this macro as a bracket-less matcher that
  // only accepts literals.
  // also, `macro_rules!` can error if a bracket-less matcher corresponds to a
  // transcriber that isn't explicitly a `const` block.
  // I'm unsure if this requires stabilizing const blocks.
  $lit:literal => const { format_args!($lit) };

  // The rest of the macro definition can be used as normal
  ($lit:literal) => { std::format!($lit) };
  // etc
}
```

@thecaralice commented Jul 5, 2022

How would m!() be parsed, as m!() (macro invocation without arguments) or as m!(()) (macro invocation with the only argument being a unit (am I right that () is a literal))?

@tranzystorekk commented Jul 6, 2022

How would m!() be parsed, as m!() (macro invocation without arguments) or as m!(()) (macro invocation with the only argument being a unit (am I right that () is a literal))?

My strong guess is that this syntax is only for macro invocations without any delimiters, so foo!(), foo![], and foo!{} would all be out of scope.

@conradludgate commented:

am I right that () is a literal?

It is not a literal as far as the Rust lexer is concerned. All of our literals are explained in this doc: https://doc.rust-lang.org/reference/tokens.html#literals

It basically includes numbers, strings, chars and all variations of those like tuple indexes, strings with custom suffixes, numbers with suffixes, byte strings, raw strings

workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
@joshtriplett (Member) commented:

We discussed this in a @rust-lang/lang meeting a long while ago, and discussed it again today. There was lukewarm sentiment towards having this as a general-purpose shorthand. In general, it felt like:

  • The set of things we'd want to add this for are things that often want to be language features.
  • We have language features planned for some things like this (e.g. prefixes on strings)
  • The additional complexity and mental parsing overhead here (of having a non-delimited macro, and special whitespace and token-related rules, and questions about how far things might extend...) doesn't seem like something we want to add as a fully general-purpose mechanism.

With that in mind:

@rfcbot close

@rfcbot (Collaborator) commented Oct 18, 2022

Team member @joshtriplett has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Oct 18, 2022
@nikomatsakis (Contributor) commented:

@rfcbot reviewed

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. label Oct 25, 2022
@rfcbot (Collaborator) commented Oct 25, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Oct 25, 2022
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. labels Nov 4, 2022
@rfcbot (Collaborator) commented Nov 4, 2022

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This is now closed.

@rfcbot rfcbot added closed This FCP has been closed (as opposed to postponed) and removed disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Nov 4, 2022
@rfcbot rfcbot closed this Nov 4, 2022
Labels
A-macros Macro related proposals and issues closed This FCP has been closed (as opposed to postponed) finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce