Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yarte_lexer 0.0.1 compilation regression #96305

Open
nnethercote opened this issue Apr 22, 2022 · 7 comments
Open

yarte_lexer 0.0.1 compilation regression #96305

nnethercote opened this issue Apr 22, 2022 · 7 comments
Labels
C-bug Category: This is a bug. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another.

Comments

@nnethercote
Copy link
Contributor

nnethercote commented Apr 22, 2022

@lqd pointed out a failure from a recent crater run.

Bisection identifies the 7th commit of #95159 as the problem: "Eliminate TokenTreeOrTokenTreeSlice". Interestingly, #95797 reverted many of the changes from that 7th commit, but the problem persists.

UPDATE: See this comment for the explanation.

I have a reduced test case. It doesn't fit in a single file, instead requiring two crates.

Two-crate test case

Crate y, Cargo.toml:

[package]
name = "y"
version = "0.1.0"
edition = "2018"
[dependencies]

Crate y, src/lib.rs:

#![feature(trace_macros)]
trace_macros!(true);

macro_rules! ascii_builder {
    ($($n:literal)+) => {
        #[macro_export]
        macro_rules! ascii {
            $(($n) => { $n };)+
            ($t:tt) => { compile_error!("no match") };
        }
    };
}

ascii_builder!('a' 'b' 'c');

Crate z (which sits next to crate y in the filesystem), Cargo.toml:

[package]
name = "z2"
version = "0.1.0"
edition = "2021"
[dependencies]
y = { path = "../y" }

Crate z, src/main.rs:

#![feature(trace_macros)]
trace_macros!(true);

fn main() {
    let _x = y::ascii!('a');
}

Note: The original code defines ascii and ascii_builder within the yarte_lexer crate and then uses ascii within a unit test. For my reduced test case I used two separate crates because it made it easier to use tools like cargo expand, and to avoid any issues specific to unit tests.

Output

If I compile with cargo check using a version prior to #95159, compilation succeeds with this output:

    Checking y v0.1.0 (/home/njn/dev/y)
note: trace_macro
  --> /home/njn/dev/y/src/lib.rs:14:1
   |
14 | ascii_builder!('a' 'b' 'c');
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expanding `ascii_builder! { 'a' 'b' 'c' }`
   = note: to `#[macro_export] macro_rules! ascii
           {
               ('a') => { 'a' } ; ('b') => { 'b' } ; ('c') => { 'c' } ; ($t : tt) =>
               { compile_error! ("no match") } ;
           }`

    Checking z2 v0.1.0 (/home/njn/dev/z3)
note: trace_macro
 --> src/main.rs:5:14
  |
5 |     let _x = y::ascii!('a');
  |              ^^^^^^^^^^^^^^
  |
  = note: expanding `ascii! { 'a' }`
  = note: to `'a'`

    Finished dev [unoptimized + debuginfo] target(s) in 0.21s

If I do the same with a version after #95159, compilation fails with this output:

    Checking y v0.1.0 (/home/njn/dev/y)
note: trace_macro
  --> /home/njn/dev/y/src/lib.rs:14:1
   |
14 | ascii_builder!('a' 'b' 'c');
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expanding `ascii_builder! { 'a' 'b' 'c' }`
   = note: to `#[macro_export] macro_rules! ascii
           {
               ('a') => { 'a' } ; ('b') => { 'b' } ; ('c') => { 'c' } ; ($t : tt) =>
               { compile_error! ("no match") } ;
           }`

    Checking z2 v0.1.0 (/home/njn/dev/z2)
error: no match
 --> src/main.rs:5:14
  |
5 |     let _x = y::ascii!('a');
  |              ^^^^^^^^^^^^^^
  |
  = note: this error originates in the macro `y::ascii` (in Nightly builds, run with -Z macro-backtrace for more info)

note: trace_macro
 --> src/main.rs:5:14
  |
5 |     let _x = y::ascii!('a');
  |              ^^^^^^^^^^^^^^
  |
  = note: expanding `ascii! { 'a' }`
  = note: to `compile_error! ("no match")`

error: could not compile `z2` due to previous error

The obvious difference is the compile error. Interestingly, the expansion of ascii_builder in crate y is the same in both cases. So it's not at all clear why the failure occurs. There's a rule matching 'a' in both cases.

Single file attempt

I tried and failed to reproduce this in a single file. Here's what I had:

macro_rules! ascii_builder {
    ($($n:literal)+) => {
        #[macro_export]
        macro_rules! ascii {
            $(($n) => { $n };)+
            ($t:tt) => { compile_error!("no match") };
        }
    };
}

ascii_builder!('a' 'b' 'c');

fn main() {
    let _x = ascii!('a');
}

I just get the same compile error from above, no matter which version of the compiler I use:

error: no match
  --> y.rs:9:26
   |
9  |             ($t:tt) => { compile_error!("no match") };
   |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
...
17 |     let _x = ascii!('a');
   |              ----------- in this macro invocation
   |
   = note: this error originates in the macro `ascii` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

I honestly don't know why this single-file version fails but the two-crate version works, and whether that is valid or a bug. It feels like it should compile.

Metadata difference

The only other clue I have is that the metadata has changed slightly. The older (pre-regression) compiler produces this:

-rw-rw-r-- 1 njn njn 3282 Apr 22 13:39 liby-d5c1be4b71acf3c4.rmeta

The newer (post-regression) compiler produces this:

-rw-rw-r-- 1 njn njn 3278 Apr 22 13:39 liby-d5c1be4b71acf3c4.rmeta

I don't know of an easy way to compare metadata files, so I don't know what accounts for the slight size difference.

Conclusion

This is a weird one, involving a macro that generates another macro that is then used in another crate. This explains why the change in behaviour took a while to be noticed.

I have looked through the commit responsible, but I can't see any changes related to all of the above.

@petrochenkov, any ideas?

@nnethercote nnethercote added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Apr 22, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Apr 22, 2022
@nnethercote
Copy link
Contributor Author

Oh, I can probably guess what caused the metadata change: mbe::Delimited's tts field was renamed as all_tts. (It was subsequently renamed back to tts in #95797). And that type is marked with derive(Encodable, Decodable).

@nnethercote
Copy link
Contributor Author

Aha! I have worked it out. The 7th commit in #95159 unintentionally changed the behaviour of NoDelim-delimited sequences. I guess these don't come up very often, but a macro that generates a macro is such a case? There is an easy fix, which I will post shortly. Though I don't know how to write a test case for it, which is unfortunate.

(For anyone wondering how I worked this out: I took the 7th commit from #95159 and combined it with the diff from #95797 using combinediff. The result was pretty small, and the removal of TokenTree::{get_tt,len} was a big chunk of it. So then I stared at them until I realized that the NoDelim case was suspicious.)

I'm still unsure why the single-file test case above doesn't work, and whether it should.

nnethercote added a commit to nnethercote/rust that referenced this issue Apr 22, 2022
PR rust-lang#95159 unintentionally changed the behaviour of declarative macro
matching for `NoDelim` delimited sequences.
- Before rust-lang#95159, delimiters were implicit in `mbe::Delimited`. When
  doing macro matching, normal delimiters were generated out of thin air
  as necessary, but `NoDelim` delimiters were not. This was done within
  `TokenTree::{get_tt,len}`.
- After rust-lang#95159, the delimiters were explicit. There was an unintentional
  change whereby `NoDelim` delimiters were represented explicitly just
  like normal delimeters.
- rust-lang#95555 introduced a new matcher representation (`MatcherLoc`) and the
  `NoDelim` delimiters were made explicit within it, just like
  `mbe::Delimited`.
- rust-lang#95797 then changed `mbe::Delimited` back to having implicit
  delimiters, but because matching is now being done on `MatcherLoc`,
  the behavioural change persisted.

The fix is simple: remove the explicit `NoDelim` delimiters in the
`MatcherLoc` representation. This gives back the original behaviour.

As for why this took so long to be found: it seems that `NoDelim`
sequences are unusual. It took a macro that generates another macro
(from the `yarte_lexer` crate, found via a crater run) to uncover this.

Fixes rust-lang#96305.
@nnethercote nnethercote changed the title yarte_lexer 0.0.1 compilation regression due to #95159 yarte_lexer 0.0.1 compilation regression Apr 22, 2022
@apiraino
Copy link
Contributor

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 22, 2022
@petrochenkov
Copy link
Contributor

@nnethercote

I'm still unsure why the single-file test case above doesn't work, and whether it should.

Likely because in the single-file case literal fragments are kept as AST pieces in Nonterminal tokens, but in the cross-crate case they are converted into tokens streams including NoDelim delimiters so they can be written to crate metadata.

The cross-crate case is "normative" because it's pure token stream.
The fact that it works locally is kind of a variation of #67062.

@nnethercote
Copy link
Contributor Author

@petrochenkov said the following in #96307:

Given that the macro in #96305 correctly reports an error if used locally, I'd say that

It would be better to report this case early, at macro definition site, when parsing the LHS of the macro definition.

is the right course of action here.

See Nonterminal/NoDelim when parsing a macro LHS => report an error (or a deny-by-default deprecation lint if there are too many regressions).

@petrochenkov
Copy link
Contributor

After some thought, perhaps we should indeed land #96307 to keep the old behavior at call site, until the definition site diagnostics (#96307 (comment)) are actually implemented and turned into errors.

@petrochenkov
Copy link
Contributor

until the definition site diagnostics (#96307 (comment)) are actually implemented and turned into errors.

This needs an open issue though, so it doesn't get lost.

@JohnTitor JohnTitor added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants