What is a trailing expression in a block exactly? #61733

petrochenkov · 2019-06-11T09:48:40Z

Is it determined syntactically or semantically?
Before or after macro expansion?

Answering these questions is necessary to specify expansion of macros (stable fn-like ones or unstable attribute ones) in expression and statement positions.
The current implementation is sometimes inconsistent.

Below I'll be dumping some code examples expanded using different expansion models, in the hope of coming up with some rules that are both self-consistent and backward compatible.

cc #33953

petrochenkov · 2019-06-11T09:52:05Z

Macros used in the examples:

macro empty() {}
macro expr() { 0 }
macro stmt() { 0 ; }
macro stmt_expr() { 0 ; 1 }

and equivalent attribute macros producing the same tokens.

Definitions:

"Token-based expansion" - macro invocation tokens are replaced by tokens produced by the macro, without knowing anything about AST-based context of that macro invocation (whether it's a macro in item position, or in expression position, etc).

Having an expansion model like this is necessary if we want to perform eager expansion.
Suppose that we somehow identified the macro invocation stmt!() in the next token stream:

0 . id 2 stmt!() ; bar +

To expand it, we replace it with its produced tokens without having any idea about the context:

0 . id 2 stmt!() ; bar +

=>

0 . id 2 0 ; ; bar +
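Here is a toy sketch of the token-based model. It assumes a token stream is just a list of strings and treats the whole invocation `stmt!()` as one token (real token trees are richer than this). `expand_tokens` is a hypothetical helper. It splices the produced tokens in place of the invocation without inspecting any neighbouring token, and that is what makes the expansion context-free.

```rust
/// Hypothetical helper for a toy token-based model: every occurrence of the
/// `invocation` token is replaced by the `produced` tokens. No surrounding
/// token is inspected, so the result does not depend on any AST context.
fn expand_tokens(stream: &[&str], invocation: &str, produced: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(stream.len() + produced.len());
    for &tok in stream {
        if tok == invocation {
            // Splice the macro output in place of the invocation.
            out.extend(produced.iter().map(|t| t.to_string()));
        } else {
            out.push(tok.to_string());
        }
    }
    out
}
```

Suppose the stream is `0 . id 2 stmt!() ; bar +` and `stmt!()` produces `0 ;`. The helper then yields `0 . id 2 0 ; ; bar +`, which is the result shown above.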

"AST-based expansion" - macro invocation knows its AST node kind (whether it's a macro expression, or a macro item, or something else) and its produced tokens (+ perhaps some other neighbouring tokens) are immediately (re-)parsed as one or multiple AST nodes of that kind, then the new nodes replace the old node.
That's how current expansion in the compiler works, more or less.

stmt!(); // Statement kind

=> 

0 ; // Produced tokens, parsed as a statement `0;`

=>

0; // The parsed statement replaces the previous statement `stmt!();`
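The next block is a minimal sketch of AST-based expansion for a macro in statement position. It makes two simplifying assumptions. First, every expression is a single token. Second, `Stmt` and `expand_stmt_macro` are hypothetical and are not rustc's types. The produced tokens are reparsed as a list of statements together with the invocation's own `;`, and that list replaces the macro statement. A lone `;` produces no node, which matches the `[Stmt(0;)]` results in the following comments.

```rust
/// Toy statement node; the `String` holds a single-token expression.
#[derive(Debug, PartialEq)]
enum Stmt {
    Semi(String), // `expr;`
    Expr(String), // `expr` without `;` (only possible at the end)
}

/// Hypothetical AST-based expansion of a macro in statement position:
/// reparse `produced` plus the invocation's own `;` (if any) as statements.
fn expand_stmt_macro(produced: &[&str], has_semi: bool) -> Result<Vec<Stmt>, String> {
    let mut tokens = produced.to_vec();
    if has_semi {
        tokens.push(";");
    }
    let mut stmts = Vec::new();
    let mut pending: Option<&str> = None;
    for tok in tokens {
        match (tok, pending.take()) {
            (";", Some(e)) => stmts.push(Stmt::Semi(e.to_string())),
            (";", None) => {} // a lone `;` produces no node in this model
            (t, None) => pending = Some(t),
            (t, Some(e)) => return Err(format!("expected `;` between `{e}` and `{t}`")),
        }
    }
    if let Some(e) = pending {
        stmts.push(Stmt::Expr(e.to_string()));
    }
    Ok(stmts)
}
```

With this sketch, `stmt!();` (producing `0 ;`) becomes `[Semi("0")]`, `empty!();` becomes an empty list, and a semicolon-less `stmt_expr!()` becomes `[Semi("0"), Expr("1")]`.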

petrochenkov · 2019-06-11T09:53:00Z

fn main() {
    { 2 }

    empty!();
}

Token-based expansion:

fn main() {
    { 2 } // <- trailing expression if only "output(empty!()) ;" is re-parsed after the macro expansion
           // (assuming empty statements are ignored when determining trailing-ness)
           // <- not a trailing expression if some larger context is re-parsed after the macro expansion

    ; // An "empty statement" if only "output(empty!()) ;" is re-parsed after the macro expansion
      // A part of the "{ 2 };" statement if some larger context is re-parsed after the macro expansion
}

AST-based expansion:

fn main() {
    { 2 } // <- trailing expression

    // Stmt( empty!(); ) -> [/* empty stmt list */]
    /* Nothing */
}

Compiler's behavior:
{ 2 } is a trailing expression.

petrochenkov · 2019-06-11T12:34:30Z

fn main() {
    { 2 }

    expr!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; // A non-empty statement, not a trailing expression
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( expr!(); ) -> [Stmt(0;)]
    0; // A non-empty statement, not a trailing expression

    // ^^^ Assuming "output(expr!()) ;" is re-parsed and not only "output(expr!())".
    // In the second case ";" would be a separate empty statement, and "0" would be a trailing expression 
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is not a trailing expression.

petrochenkov · 2019-06-11T12:40:32Z

fn main() {
    { 2 }

    stmt!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; ; // A non-empty statement not a trailing expression, followed by empty statement
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( stmt!(); ) -> [Stmt(0;)]
    0; // A non-empty statement, not a trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is not a trailing expression.

petrochenkov · 2019-06-11T12:49:12Z

fn main() {
    { 2 }

    stmt_expr!();
}

Token-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    0 ; 1 ; // A non-empty statement not a trailing expression, followed by a non-empty statement not a trailing expression
}

AST-based expansion:

fn main() {
    { 2 } // <- not a trailing expression

    // Stmt( stmt_expr!(); ) -> [Stmt(0;), Stmt(1;)]
    0; 1; // A non-empty statement not a trailing expression, followed by a non-empty statement not a trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 1 is not a trailing expression.

petrochenkov · 2019-06-11T13:02:24Z

fn main() {
    { 2 }

    empty!()
}

Token-based expansion:

fn main() {
    { 2 } // <- trailing expression

    /* Nothing */
}

AST-based expansion 1 (empty!() has statement kind):

fn main() {
    { 2 } // <- trailing expression

    // Stmt( empty!() ) -> [/* empty stmt list */]
    /* Nothing */
}

AST-based expansion 2 (empty!() has expression kind):

fn main() {
    { 2 } // N/A

    // Expr( empty!() ) -> ParseError (empty token stream is not an expression)
    /* Nothing */
}

Compiler's behavior:
Parse error: error: macro expansion ends with an incomplete expression: expected expression.

petrochenkov · 2019-06-13T14:09:59Z

fn main() {
    { 2 }

    expr!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 // trailing expression
}

AST-based expansion 1 (expr!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( expr!() ) -> [Stmt(0)]
    0 // trailing expression
}

AST-based expansion 2 (expr!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( expr!() ) -> Expr(0)
    0 // trailing expression
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is a trailing expression.
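The same trailing behavior can be checked with stable `macro_rules!`, since the thread itself uses the unstable `macro` syntax. In this sketch `{ () }` stands in for `{ 2 }`. The substitution is needed because a non-trailing block-like expression statement must have type `()`, and `{ 2 }` would be rejected in that position. The function name `trailing_macro` is hypothetical.

```rust
// Produces the single token `0`, like `macro expr() { 0 }` in this thread.
macro_rules! expr {
    () => { 0 };
}

fn trailing_macro() -> i32 {
    { () } // block-like statement: not trailing, so it must have type `()`
    expr!() // trailing: its expansion `0` is the value of the body
}
```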

petrochenkov · 2019-06-13T14:17:36Z

fn main() {
    { 2 }

    stmt!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 ; // not a trailing expression
}

AST-based expansion 1 (stmt!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( stmt!() ) -> [Stmt(0;)]
    0; // not a trailing expression
}

AST-based expansion 2 (stmt!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( stmt!() ) -> ParseError (`0;` is not an expression)
    /* nothing */
}

Compiler's behavior:
{ 2 } is not a trailing expression, 0 is a trailing expression.
This doesn't match any model and is clearly a bug (#33953).

petrochenkov · 2019-06-13T14:21:21Z

fn main() {
    { 2 }

    stmt_expr!()
}

Token-based expansion:

fn main() {
    { 2 } // not a trailing expression

    0 ; 1 // trailing expression
}

AST-based expansion 1 (stmt_expr!() has statement kind):

fn main() {
    { 2 } // not a trailing expression

    // Stmt( stmt_expr!() ) -> [Stmt(0;), Stmt(1)]
    0; 1 // trailing expression
}

AST-based expansion 2 (stmt_expr!() has expression kind):

fn main() {
    { 2 } // not a trailing expression

    // Expr( stmt_expr!() ) -> ParseError (`0; 1` is not an expression)
    /* nothing */
}

Compiler's behavior:
Parse error: error: macro expansion ignores token `1` and any following

petrochenkov · 2019-06-13T14:36:31Z

For attribute macros, the token-based and AST-based interpretations are the same as for fn-like macros, so I'm just going to list the actual compiler's behaviors.

`cfg` is handled by compiler logic separate from other attribute macros, so it's tested separately here.

fn main() {
    { () } // trailing expression

    #[cfg(FALSE)]
    0; // not trailing expression
}

fn main() {
    { () } // trailing expression

    #[cfg(FALSE)]
    0 // not trailing expression
}

fn main() {
    { () } // not trailing expression

    #[cfg(TRUE)]
    0; // not trailing expression
}

fn main() {
    { () } // not trailing expression

    #[cfg(TRUE)]
    0 // trailing expression
}

petrochenkov · 2019-07-08T23:48:04Z

Continuing with the actual compiler's behaviors, now for the other (non-`cfg`) attribute macros:

fn main() {
    { () } // trailing expression

    #[empty]
    0; // not trailing expression
}
fn main() {
    { () } // trailing expression

    #[empty]
    0 // not trailing expression
}
fn main() {
    { () } // not trailing expression

    #[expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
        // so it's equivalent to `expr!( 0 ; )`
}
fn main() {
    { () } // not trailing expression

    #[expr]
    0 // trailing expression
}
fn main() -> u8 {
    { () } // not trailing expression

    #[stmt]
    0; // not trailing expression
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt]
    0 // error: macro expansion ignores token `;` and any following
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt_expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
        // so it's equivalent to `stmt_expr!( 0 ; )`
}
fn main() -> () {
    { () } // not trailing expression

    #[stmt_expr]
    0 // error: macro expansion ignores token `;` and any following
}

petrochenkov · 2019-07-09T12:42:41Z

Proposed resolution:

Use the token-based model, but without reparsing the whole world after expanding a single macro.
Introduce the notion of "reparse context" of a macro invocation including the invocation itself and some of its neighbouring tokens.
Only tokens from the reparse context are reparsed after the macro invocation produces tokens.

Non-eager expansion

When doing a regular macro expansion we have a partially built AST with some nodes in it being unexpanded macros.
So we take the macro node, expand the macro, and replace the node with the expansion results somehow.

In the token-based model macro expansion produces a token stream that needs to be converted into AST somehow.
In theory, the token parsing depends on context and we could reparse the whole crate after a single macro is expanded.

macro foo() { 2 + 3 }
fn main() { 1 * foo!() }

=> expand and reparse the whole crate =>

macro foo() { 2 + 3 }
fn main() { 1 * 2 + 3 } // Oh, wait

For multiple reasons, starting with "operator priority hygiene" (as in the example above) and ending with performance, we want to reparse as few tokens as possible.
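The hazard can be made visible with stable `macro_rules!`. This sketch uses `2 *` instead of `1 *`, because `1 * 2 + 3` and `1 * (2 + 3)` happen to be equal. rustc parses a macro call in expression position as a single expression node, so the multiplication covers the whole expansion. Reparsing the spliced tokens together with their context would bind `*` more tightly instead. The function names are hypothetical.

```rust
// Mirrors `macro foo() { 2 + 3 }` from the example.
macro_rules! foo {
    () => { 2 + 3 };
}

/// The expansion is kept as one expression node: `2 * (2 + 3)`.
fn expanded() -> i32 {
    2 * foo!()
}

/// What splicing the tokens and reparsing the whole expression would mean.
fn naive_reparse() -> i32 {
    2 * 2 + 3
}
```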

Proposal 1: reparse context includes only tokens from the macro invocation's AST node.

// Example: macro invocation in a statement node

// <reparse_context>
#[inert_attrs] // optional
// <invocation>
foo!()
// </invocation>
; // optional
// </reparse_context>

So, we are going to reparse the tokens produced by the macro together with their "closest environment".
(I won't talk about the inert attributes here; they don't currently work according to this model, and perhaps we can make them do so, but that's a separate question.)

The reparse context tokens are reparsed as multiple statements and the original statement node is replaced with them.

Proposal 2: treat the trailing semicolon-less macro invocation as a statement rather than an expression like it's sometimes treated now.

So its produced tokens could be parsed as multiple statements as well.

Proposal 3: introduce an empty statement `;`.

We need it in the token model.
In particular, the statement `empty!();` turns into a single empty statement after expansion.
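Proposals 1 to 3 can be combined in a rough sketch. The assumptions are these: expressions are single tokens, inert attributes are ignored, and `reparse_context` is a hypothetical helper rather than a compiler API. The reparse context here is the produced tokens plus the invocation's optional `;`. It is reparsed as statements regardless of whether that `;` is present, and a lone `;` becomes `Stmt::Empty` instead of being dropped.

```rust
#[derive(Debug, PartialEq)]
enum Stmt {
    Semi(String), // `expr;`
    Expr(String), // `expr` without `;`: a trailing-expression candidate
    Empty,        // a lone `;` (Proposal 3)
}

/// Hypothetical: reparse the reparse context (produced tokens plus the
/// invocation's optional trailing `;`) as a list of statements.
fn reparse_context(produced: &[&str], trailing_semi: bool) -> Result<Vec<Stmt>, String> {
    let mut tokens = produced.to_vec();
    if trailing_semi {
        tokens.push(";");
    }
    let mut stmts = Vec::new();
    let mut pending: Option<&str> = None;
    for tok in tokens {
        match (tok, pending.take()) {
            (";", Some(e)) => stmts.push(Stmt::Semi(e.to_string())),
            (";", None) => stmts.push(Stmt::Empty),
            (t, None) => pending = Some(t),
            (t, Some(e)) => return Err(format!("expected `;` between `{e}` and `{t}`")),
        }
    }
    if let Some(e) = pending {
        stmts.push(Stmt::Expr(e.to_string()));
    }
    Ok(stmts)
}
```

Under this sketch, `empty!();` becomes `[Empty]`. A semicolon-less `stmt!()` becomes `[Semi("0")]`, so `0` is no longer trailing. A semicolon-less `stmt_expr!()` becomes `[Semi("0"), Expr("1")]`.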

Eager expansion

0 . id 2 foo!() ; bar +

Reparse context either consists of the invocation only, or there is a programmatic way to mark some neighbouring tokens as belonging to it.
I don't want to speculate about the details, but the general idea is that the concept of a reparse context can be extended to eager expansion as well.

Examples: What changes and what continues working

See the next comments

petrochenkov · 2019-07-09T17:44:56Z

What stops working:

fn main() {
    // `0` inside `stmt!()` is no longer a trailing expression
    // Fixes https://github.com/rust-lang/rust/issues/33953
    stmt!()
}

petrochenkov · 2019-07-09T17:51:07Z

What starts working:

fn main() {
    // fn main() {}
    empty!()
}
fn main() {
    // fn main() { 0; 1 }, trailing
    stmt_expr!()
}
fn main() {
    // fn main() { 0; }, non-trailing
    #[stmt]
    0
}
fn main() {
    // fn main() { 0; 1 }, trailing
    #[stmt_expr]
    0
}

The reason is that semicolon-less macro statements can now expand into multiple statements.

petrochenkov · 2019-07-09T18:00:01Z

Interesting cases that work now and keep working:

fn foo() -> u8 {
    { 0 } // <- trailing expression

    empty!();
}

In the "reparse everything" model this would expand into fn foo() -> u8 { { 0 } ; } and { 0 } would stop being a trailing expression.
However, we do not reparse everything, only the current node (#61733 (comment))!
So we get two statements after the expansion - [Stmt({ 0 }), Stmt(;)].

We can introduce the rule that all empty statements are thrown away when determining the trailing expression to make { 0 } trailing.
We can introduce the same rule for item statements as well

fn foo() -> u8 {
    { 0 } // <- trailing?

    fn bar() {}
}

, but that's not strictly necessary for backward compatibility.
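The candidate rules can be compared with a toy `trailing_expr` function. It is hypothetical and is not rustc's implementation. It scans a block's statements from the end, optionally skipping empty statements and item statements. The first statement that remains is the trailing expression only if it is a semicolon-less expression. When `skip_empty` is `true`, `[{ 0 }, ;]` keeps `{ 0 }` as its trailing expression, as proposed above. When it is `false`, the block has no trailing expression, as in the alternative discussed later in the thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stmt {
    Expr(&'static str), // expression without `;`
    Semi(&'static str), // `expr;`
    Item(&'static str), // item statement such as `fn bar() {}`
    Empty,              // lone `;`
}

/// Hypothetical rule: walk backwards, optionally skipping empty statements
/// and items; the first remaining statement counts only if it is `Expr`.
fn trailing_expr(stmts: &[Stmt], skip_empty: bool, skip_items: bool) -> Option<&'static str> {
    let last = stmts.iter().rev().find(|s| match s {
        Stmt::Empty => !skip_empty,
        Stmt::Item(_) => !skip_items,
        _ => true,
    })?;
    match last {
        Stmt::Expr(e) => Some(*e),
        _ => None,
    }
}
```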

Aaron1011 · 2020-10-25T18:17:23Z

We can introduce the rule that all empty statements are thrown away when determining the trailing expression to make { 0 } trailing.
We can introduce the same rule for item statements as well

@petrochenkov: Assuming that it doesn't cause much (or any) breakage, I think it would be better not to ignore an empty statement when determining the trailing expression. That is:

fn foo() -> u8 {
    { 0 }

    empty!(); //~ ERROR: mismatched types
}

would fail to typecheck, since the trailing statement would be the empty statement ;, which has type ().

There are a couple of reasons I think we should prefer this:

Consistency with other trailing statements. All of these currently fail to compile:

fn trailing_item() -> bool {
    { true }
    fn inner() {}
}

fn trailing_stmt() -> bool {
    { true }
    let a = 1;
}

fn trailing_semi() -> bool {
    { true }
    ;
}

Of course, these are parsed somewhat differently (the newline in fn trailing_semi() doesn't matter, and we just have the single statement { true };). However, the visual appearance of the code matches up with the behavior - there's nothing between the trailing expression and the closing brace.

Reducing the dependence on the implementation details of a proc macro

If the user writes:

fn foo() {
    { 0 }

    empty!(); //~ ERROR: mismatched types
}

Then someone reading the code will get the impression that this function returns a value of type (). However, the value returned by the function actually depends on what empty! expands to: if it expands to nothing, the function returns { 0 }; otherwise, it returns (). If the function has a generic return type, both versions might actually compile. A macro might legitimately expand to nothing (e.g. based on features or platform #[cfg]s), so this could come up in real code.
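A small illustration of why a generic return type hides the difference: an opaque type such as `impl Debug` accepts both an integer and `()`. So a body that ends in the block `{ 0 }` and a body that ends in a statement both typecheck, but they return different values. The function names are hypothetical, and `let _ = { 0 };` stands in for "the block is followed by a statement". This sketch does not reproduce the macro case itself.

```rust
use std::fmt::Debug;

/// `{ 0 }` is the trailing expression, so the function returns `0`.
fn ends_with_block() -> impl Debug {
    { 0 }
}

/// The block is followed by a statement, so there is no trailing
/// expression and the function returns `()`.
fn ends_with_statement() -> impl Debug {
    let _ = { 0 };
}
```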

Of course, doing this is a breaking change, since this currently compiles:

macro_rules! empty {
    () => { }
}

fn foo() -> bool {
    { true }
    empty!();
}

@petrochenkov: Assuming you don't object to the idea, I'll do a Crater run to get an idea of how much breakage this might cause.

With a bit of effort, I think I could come up with a future-incompatibility lint that would fire on any functions whose behavior would be changed by this (closures are a different story). During lowering, we would mark 'former weird trailing expressions' like { true }. If typecheck succeeds, we try to unify the return type with the type of the 'former weird trailing expression' (if it exists). If unification succeeds (e.g. the function returns an impl Trait), we emit a warning.
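The decision logic of the proposed lint can be modeled in a rough sketch. The model is deliberately crude. `Ty` is a made-up type lattice, and `Opaque` stands in for an `impl Trait` return type that is assumed to unify with any concrete type. `unifies` and `should_warn` are hypothetical names, not rustc functions.

```rust
/// Toy type lattice; `Opaque` models an `impl Trait` return type and is
/// assumed to unify with every concrete type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Unit,
    Bool,
    Int,
    Opaque,
}

fn unifies(expected: Ty, actual: Ty) -> bool {
    expected == Ty::Opaque || expected == actual
}

/// Hypothetical check for one function: warn only if the body typechecked
/// under the new rule AND the former trailing expression's type would also
/// have been accepted as the return type.
fn should_warn(typeck_ok: bool, ret: Ty, former_trailing: Option<Ty>) -> bool {
    typeck_ok && former_trailing.map_or(false, |ty| unifies(ret, ty))
}
```

In this model, a function returning `impl Trait` with a former trailing `{ true }` triggers the warning. A function declared `-> ()` does not.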

petrochenkov · 2020-10-25T18:24:38Z

@Aaron1011
I do not object; we can try making it an error if its use in practice is rare enough.

Aaron1011 · 2020-10-27T15:29:03Z

One consequence of the proposed resolution in #61733 (comment):

As described in #61733 (comment), we will properly handle trailing semicolons in macro_rules! macros (fixing #33953). That is, the following code will compile:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true)
}

This code will continue to compile (note the semicolon after a!(true)). It will expand to true;;, which is a statement true; followed by an empty statement ;:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true);
}

However, what should happen to this code is less clear:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    let _val = a!(true);
}

This will expand to let _val = true;;, which compiles if written literally by the user. However, making this work would require reparsing the entire let statement (in the current implementation, we could just keep the existing AST node and append an empty ; statement). This is because a!(true) occurs in expression position, where semicolons are not allowed. That is, the code will be parsed as [let _val = [true;]];, not [let _val = true];;

I think rejecting this code is most consistent with the idea of a 'reparse context'. If we switch to token-based expansion (i.e. not constructing intermediate AST nodes), then allowing this would require reparsing arbitrarily many preceding tokens. For example, we could have let [StructPat { .. }, StructPat { field: OtherStruct { .. }}] = mac!(); and reparsing all of this would go against the idea of a local 'reparse context'.

However, this may be somewhat surprising to users: under a purely token-based (i.e. reparse-the-world) approach, let _val = a!(true); and a!(true); look very similar, since both expand to a token sequence with two trailing semicolons.
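The rejection can be sketched by reparsing in an expression-kind context. Here the reparse context of `a!(true)` in `let _val = a!(true);` is only the expression node, so the produced tokens must form exactly one expression. `reparse_as_expr` is hypothetical and assumes every expression is a single token.

```rust
/// Hypothetical: reparse `produced` as a single expression node.
/// Assumption: an expression is exactly one token other than `;`.
fn reparse_as_expr(produced: &[&str]) -> Result<String, String> {
    match produced {
        [e] if *e != ";" => Ok(e.to_string()),
        _ => Err(format!("`{}` is not an expression", produced.join(" "))),
    }
}
```

The output `true ;` of `a!(true)` is rejected in this position. The same tokens are accepted in statement position, where the reparse context also includes the invocation's trailing `;`.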

Treat trailing semicolon as a statement in macro call

See rust-lang#61733 (comment)

We now preserve the trailing semicolon in a macro invocation, even if the macro expands to nothing. As a result, the following code no longer compiles:

```rust
macro_rules! empty {
    () => { }
}

fn foo() -> bool { //~ ERROR mismatched
    { true } //~ ERROR mismatched
    empty!();
}
```

Previously, `{ true }` would be considered the trailing expression, even though there's a semicolon in `empty!();`. This makes macro expansion more token-based.

petrochenkov added A-frontend Area: frontend (errors, parsing and HIR) A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 11, 2019

petrochenkov mentioned this issue Jun 11, 2019

Do not discern between statements with and without semicolon after lowering to HIR #61753

Closed

jonas-schievink added the A-parser Area: The parsing of Rust source code to an AST. label Jun 13, 2019

This was referenced Jun 28, 2019

Proper handling of ; between statements. rust-lang/wg-grammar#55

Open

Should MacroCall remove visibility? rust-lang/wg-grammar#52

Open

petrochenkov self-assigned this Jul 8, 2019

petrochenkov mentioned this issue Aug 2, 2019

How to treat inert attributes on macro invocations? #63221

Open

petrochenkov mentioned this issue Sep 8, 2019

Warn if include macro fails to include entire file #64284

Merged

petrochenkov mentioned this issue Jan 31, 2020

Tracking issue for procedural macros and "hygiene 2.0" #54727

Open

This was referenced Feb 29, 2020

encode ; stmt without expr as StmtKind::Empty #69506

Merged

Is a semicolon part of statement or not? Is a semicolon really never part of Statement nonterminal expansion containing ExpressionWithBlock? rust-lang/reference#773

Closed

petrochenkov mentioned this issue Mar 26, 2020

Macro Invocations Behave Questionable Regarding Semicolons #70432

Open

petrochenkov mentioned this issue May 13, 2020

Tracking issue for stmt_expr_attributes: Add attributes to expressions, etc. #15701

Open

Aaron1011 mentioned this issue Oct 25, 2020

Treat trailing semicolon as a statement in macro call #78376

Merged

Aaron1011 mentioned this issue Oct 25, 2020

Properly handle attributes on statements #78296

Merged

Aaron1011 mentioned this issue Nov 2, 2020

[DO NOT MERGE] Consistent handling of semicolons in macro expansions #78685

Closed

Aaron1011 mentioned this issue Nov 19, 2020

Stop ignoring trailing semicolons in a macro body when a macro is invoked in expression position rust-lang/lang-team#70

Closed

YizhePKU mentioned this issue Mar 30, 2021

Invalid expression discarding when using semicolons Rust-GCC/gccrs#317

Closed

ehuss mentioned this issue Jun 22, 2022

Flaw/inconsistency between block expression description and rustc. rust-lang/reference#762

Open
