Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

None-delimited groups do not work as advertised #124974

Closed
ijackson opened this issue May 10, 2024 · 3 comments
Closed

None-delimited groups do not work as advertised #124974

ijackson opened this issue May 10, 2024 · 3 comments
Labels
C-bug Category: This is a bug.

Comments

@ijackson
Copy link
Contributor

The documentation for proc_macro::Delimiter::.None says:

∅ ... ∅ An invisible delimiter, that may, for example, appear around tokens coming from a “macro variable” $var. It is important to preserve operator priorities in cases like $var * 3 where $var is 1 + 2. Invisible delimiters might not survive roundtrip of a token stream through a string.

This implies that a None-delimited Group is parsed in an expression, the grouping overrides expression precedence. But this is not the case. It only appears to work because usually what looks like a None-delimited group in proc macro debug output, is really something else.

I conjecture that this has seemed to work for many people much of the time because of various optimisations in the proc_macro bridge, and because macro_rules! expansions don't actually produce these None-delimited groups internally.

CC @dtolnay (as the author of syn, whose users seem like they might be affected).

See also #124817, which is sort of the same bug, but with types. There, the outcome is a compile error rather than miscompilation.

To reproduce

proc macro

use proc_macro::{Group, TokenStream, TokenTree};

#[proc_macro]
pub fn dbg_dump(input: TokenStream) -> TokenStream {
    dbg!(&input);
    dbg!(input.to_string());
    input
}

/// Reconstructs, identically, all the `Group`s in a `TokenStream`
///
/// (We don't bother adjusting spans.)
#[proc_macro]
pub fn reconstruct_groups(input: TokenStream) -> TokenStream {
    fn recurse(input: TokenStream) -> TokenStream {
        let mut output = TokenStream::new();
        for tt in input {
            let tt = match tt {
                TokenTree::Group(g) => {
                    let delim = g.delimiter();
                    dbg!(&delim);
                    let stream = recurse(g.stream());
                    TokenTree::Group(Group::new(delim, stream))
                },
                other => other,
            };
            output.extend([tt]);
        }
        output
    }

    dbg!(&input);
    let output = recurse(input);
    dbg!(&output);
    dbg!(&output.to_string());
    output
}

main.rs

#![allow(dead_code)]
use foo_macros::*;

macro_rules! one_plus { { $v:expr } => {
    2 * $v
} }
macro_rules! one_plus_dbg { { $v:expr } => {
    dbg_dump!(2 * $v)
} }
macro_rules! one_plus_reconstruct { { $v:expr } => {
    reconstruct_groups!(2 * $v)
} }

fn main() {
    println!("2 * (3 + 4) should be 14");
    println!("{} without proc_macro", one_plus!(3 + 4));
    println!("{} dbg_dump", one_plus_dbg!(3 + 4));
    println!("{} reconstruct", one_plus_reconstruct!(3 + 4));
}

Expected output

2 * (3 + 4) should be 14
14 without proc_macro
14 dbg_dump
14 reconstruct

(plus debug dumps to stderr)

Actual output

2 * (3 + 4) should be 14
14 without proc_macro
14 dbg_dump
10 reconstruct

(plus debug dumps to stderr)

Additional inforamation, observations

Consequences

Some proc_macros could be generating expressions that are interpreted wrongly by the compiler.

It doesn't seem easy to rule out the possibility that this could result in very seriously buggy executables. However, many simpler cases seem to work as expected, so hopefully the problem is not very widespread.

One way the issue could be triggered is by composition of (a) a normal macro_rules macro with (b) a proc_macro which performs certain token stream reconstructions; these might even be in completely separate crates and only combined in a naive downstream crate.

It might be possible to use a specially instrumented compiler to detect programs (at least on crates.io) which are misinterpreted.

Semantic difference is not discernable even in Debug output

See the full output, below. The three Debug representations are identical, apart from spans:

  1. The apparent TokenStream resulting from just the macro_rules! expansion
  2. The pre-reconstructed input to reconstruct_groups
  3. The reconstructed output from reconstruct_groups

But version 3 is parsed differently, when evaluated as an expression. It ignores the group and consequently gives the wrong answer.

Rust version

rustc 1.79.0-beta.3 (f5d04caa7 2024-05-03)
binary: rustc
commit-hash: f5d04caa74a1dfa5ffc4082c2c8f621f25336bbc
commit-date: 2024-05-03
host: x86_64-unknown-linux-gnu
release: 1.79.0-beta.3
LLVM version: 18.1.4

I have reproduced with Rust 1.45, too (which was when proc macros in expression position were stabilised).

Full output

-*- mode: compilation; default-directory: "~/Rustup/Arti/experiments/" -*-
Compilation started at Fri May 10 14:37:53

nailing-cargo run
nailing-cargo: out-of-tree, git, building in: `/home/ian/Rustup/Arti/Build/experiments'
nailing-cargo: using really to run as user `rustcargo'
nailing-cargo: *WARNING* cwd is not in Cargo.nail thbough it has Cargo.toml!
nailing-cargo: nailed (0 manifests, 0 packages)
nailing-cargo: invoking: cargo run --locked --offline
   Compiling foo-macros v0.1.0 (/volatile/rustcargo/Rustup/Arti/experiments/macros)
   Compiling foo v0.1.0 (/volatile/rustcargo/Rustup/Arti/experiments)
[macros/macros.rs:5:5] &input = TokenStream [
    Literal {
        kind: Integer,
        symbol: "2",
        suffix: None,
        span: #11 bytes(156..157),
    },
    Punct {
        ch: '*',
        spacing: Alone,
        span: #11 bytes(158..159),
    },
    Group {
        delimiter: None,
        stream: TokenStream [
            Literal {
                kind: Integer,
                symbol: "3",
                suffix: None,
                span: #0 bytes(411..412),
            },
            Punct {
                ch: '+',
                spacing: Alone,
                span: #0 bytes(413..414),
            },
            Literal {
                kind: Integer,
                symbol: "4",
                suffix: None,
                span: #0 bytes(415..416),
            },
        ],
        span: #11 bytes(160..162),
    },
]
[macros/macros.rs:6:5] input.to_string() = "2 * 3 + 4"
[macros/macros.rs:32:5] &input = TokenStream [
    Literal {
        kind: Integer,
        symbol: "2",
        suffix: None,
        span: #17 bytes(245..246),
    },
    Punct {
        ch: '*',
        spacing: Alone,
        span: #17 bytes(247..248),
    },
    Group {
        delimiter: None,
        stream: TokenStream [
            Literal {
                kind: Integer,
                symbol: "3",
                suffix: None,
                span: #0 bytes(473..474),
            },
            Punct {
                ch: '+',
                spacing: Alone,
                span: #0 bytes(475..476),
            },
            Literal {
                kind: Integer,
                symbol: "4",
                suffix: None,
                span: #0 bytes(477..478),
            },
        ],
        span: #17 bytes(249..251),
    },
]
[macros/macros.rs:21:21] &delim = None
[macros/macros.rs:34:5] &output = TokenStream [
    Literal {
        kind: Integer,
        symbol: "2",
        suffix: None,
        span: #17 bytes(245..246),
    },
    Punct {
        ch: '*',
        spacing: Alone,
        span: #17 bytes(247..248),
    },
    Group {
        delimiter: None,
        stream: TokenStream [
            Literal {
                kind: Integer,
                symbol: "3",
                suffix: None,
                span: #0 bytes(473..474),
            },
            Punct {
                ch: '+',
                spacing: Alone,
                span: #0 bytes(475..476),
            },
            Literal {
                kind: Integer,
                symbol: "4",
                suffix: None,
                span: #0 bytes(477..478),
            },
        ],
        span: #19 bytes(225..252),
    },
]
[macros/macros.rs:35:5] &output.to_string() = "2 * 3 + 4"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/foo`
2 * (3 + 4) should be 14
14 without proc_macro
14 dbg_dump
10 reconstruct
nailing-cargo: unnailed.  status 0.

Compilation finished at Fri May 10 14:37:54
@ijackson ijackson added the C-bug Category: This is a bug. label May 10, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 10, 2024
@ijackson
Copy link
Contributor Author

Complete code available:

git clone https://www.chiark.greenend.org.uk/ucgi/~ian/githttp/rust-experiments.git -b rust-ticket-124974

git clone https://www.chiark.greenend.org.uk/ucgi/~ian/githttp/rust-experiments.git -b rust-ticket-124974-for-1.45

@dtolnay
Copy link
Member

dtolnay commented May 10, 2024

This is a duplicate of #67062.

@ijackson
Copy link
Contributor Author

This is a duplicate of #67062.

So it is, thanks.

@dtolnay dtolnay closed this as completed May 10, 2024
@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug.
Projects
None yet
Development

No branches or pull requests

4 participants