Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MBE: more checks at definition time #61053

Closed
4 tasks done
mark-i-m opened this issue May 22, 2019 · 38 comments
Closed
4 tasks done

MBE: more checks at definition time #61053

mark-i-m opened this issue May 22, 2019 · 38 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-cleanup Category: PRs that clean code up or issues documenting cleanup. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@mark-i-m
Copy link
Member

mark-i-m commented May 22, 2019

There are a few checks that are currently only done at macro invocation time (or are not done at all) that really would make more sense to do a macro definition time.

This is a tracking issue to implement these checks.

@mark-i-m
Copy link
Member Author

mark-i-m commented May 22, 2019

Could someone tag this with cleanup and macros and help-wanted please?

EDIT: and tracking-issue

@mark-i-m
Copy link
Member Author

cc @petrochenkov

@jonas-schievink jonas-schievink added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. E-help-wanted Call for participation: Help is requested to fix this issue. labels May 22, 2019
@jonas-schievink
Copy link
Contributor

@mark-i-m you can apply labels yourself by using @rustbot modify labels

@mark-i-m
Copy link
Member Author

Oh, that's good to know. Thanks :)

@rustbot modify labels: +C-cleanup

@rustbot rustbot added the C-cleanup Category: PRs that clean code up or issues documenting cleanup. label May 22, 2019
@Centril Centril added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels May 23, 2019
@mark-i-m
Copy link
Member Author

mark-i-m commented May 25, 2019

Just some a quick note for anyone who wants to implement this: the checks should probably be inserted here, where there are already some other checks:

for rhs in &rhses {
valid &= check_rhs(sess, rhs);
}
// don't abort iteration early, so that errors for multiple lhses can be reported
for lhs in &lhses {
valid &= check_lhs_no_empty_seq(sess, &[lhs.clone()]);
valid &= check_lhs_duplicate_matcher_bindings(
sess,
&[lhs.clone()],
&mut FxHashMap::default(),
def.id
);
}

You can look at the existing checks to see what they are like.

Also, anything implemented will probably need a crater run. I think breakage here is acceptable, since any code that does the things in the OP is almost certainly wrong.

EDIT: also, we would probably need to follow a procedure like #57742 and its PRs.

@mark-i-m
Copy link
Member Author

@rustbot modify labels: +E-medium +E-mentor

@rustbot rustbot added E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels May 25, 2019
@ia0
Copy link
Contributor

ia0 commented May 27, 2019

I am interested into implementing those checks. I would need a bit of mentoring though. Who can I ask questions and through which medium? For example, I implemented the check that variables in rhs are bound in lhs and it already prevents building stage 1 (with ./x.py build --stage 1) because quick-error has errors. All variables on this line https://github.com/tailhook/quick-error/blob/aadd84f9481dddb5f6c04743d1be0672a0589dc8/src/lib.rs#L468 are not bound. How can I locally patch this crate?

@mark-i-m
Copy link
Member Author

@ia0 Thanks! You can ask me on this thread :) But I am out of town this week, so I might not be super responsive.

Hmm... I am not super familiar with quick-error. I believe you can change the Cargo.toml for the faulty crate to use a path dependency. Does this help: https://internals.rust-lang.org/t/how-to-specify-a-private-repository-for-one-library-like-libc/9779 ?

@ia0
Copy link
Contributor

ia0 commented May 27, 2019

Thanks, I was indeed looking for [patch.crates-io]. I think all 3 checks are implemented in https://github.com/ia0/rust/tree/issues_61053. What remains to be done are:

  • Fix all macro errors in rustc and dependencies (may take some time, there are a lot).
  • Clean up code (current code was mostly to get the check in place to get an idea of impact).
  • Improve error messages (very basic for now).
  • Add tests (none yet).
  • Run crater.
    It should take some time, but I'll keep you posted on progress.

@ia0
Copy link
Contributor

ia0 commented May 29, 2019

I think most of the work is done and is ready for review. Should I create a pull request?

Here is a description of the commits available in https://github.com/ia0/rust/commits/issues_61053:

  1. Fix meta-variable binding errors in macros
    This is an independent change (could have its own pull request) fixing macro errors in the compiler itself.
  2. Run rustfmt
    This is an independent change (could have its own pull request) running rustfmt on 2 files that I'll modify in later commits.
  3. Remember the span of the Kleene operator in macros
    This change only depends on commit 2 and saves the span of the Kleene operator to give precise error messages in commit 5.
  4. Patch extern crates
    This is an independent change similar to commit 1 but for external crates. I actually don't know how we should handle this issue. One idea is to first have the checks as a warning instead of an error (which would also avoid having to run crater). Once the problematic external crates are bumped to a version fixing the issues, we can set it back to an error.
  5. Implement checks for meta-variables in macros
    This change depends on all previous commits and implements the 3 checks of this issue. I didn't run and fix all tests yet (only the ui test suite). Also note that there might be issues with Tracking issue: declarative macros 2.0 #39412 that I did not investigate yet.

I will be offline for the next 4 days (leaving Thursday noon, back Sunday night).

@ia0
Copy link
Contributor

ia0 commented May 29, 2019

Actually, there's an interesting problem:

macro_rules! outer {
    ($x:expr; $fragment:ident) => {
        macro_rules! inner { ($y:$fragment) => { $x + $y } }
    }
}

$y is free in the right-hand side of the outer macro, but it is correctly bound and used in the inner macro. So I would need to add support for macros defined in macros. But I'm not sure it will be easy to fix.

@petrochenkov
Copy link
Contributor

Ah, that's a great example.
Perhaps nested macros is the reason why this wasn't done in the first place.

@ia0
Copy link
Contributor

ia0 commented Jun 20, 2019

Ok, I created #62008. Should I migrate the errors to an allow-by-default before running crater? Or maybe I can migrate to a deny-by-default lint, run crater, and then switch the lint to allow-by-default.

@mark-i-m
Copy link
Member Author

@ia0 Thanks! I will take a look soon. Let I think we should make it an allow-by-default lint of the crater run. The goal will be to make sure we aren't breaking any existing known crates. If we want we could also try a run with the deny-by-default lint just to see how prevalent the errors are and what the false-positive rate is.

@ia0
Copy link
Contributor

ia0 commented Jun 29, 2019

@petrochenkov raised a good point in #62008 that we don't need to specify the repetition in RHS. It is entirely determined by the LHS.

macro_rules! current { ($($x:tt)+) => { $($x)+ }; }
macro_rules! proposed { ($($x:tt)+) => { $($x) }; }

Although less noisy, I see 2 possible issues:

  • This is a breaking change. How do we know if current!(1 2 3) is 1 2 3 + (new compiler) or 1 2 3 (old compiler)? There is no issue for proposed!(1 2 3) because it would be rejected by an old compiler.
  • It may be useful as documentation to see how a token tree repeats in the RHS. For example let $x $(: $t)? = $v (current behavior) and let $x $(: $t) = $v (proposed behavior).

bors added a commit that referenced this issue Jun 29, 2019
Add meta-variable checks in macro definitions

This is an implementation of #61053. It is not sound (some errors are not reported) and not complete (reports may not be actual errors). This is due to the possibility to define macros in macros in indirect ways. See module documentation of `macro_check` for more details.

What remains to be done:
- [x] Migrate from an error to an allow-by-default lint.
- [x] Add more comments in particular for the handling of nested macros.
- [x] Add more tests if needed.
- [x] Try to avoid cloning too much (one idea is to use lists on the stack).
- [ ] Run crater with deny-by-default lint (measure rate of false positives).
    - [ ] Remove extra commit for deny-by-default lint
- [x] Create a PR to remove the old `question_mark_macro_sep` lint #62160
@mark-i-m
Copy link
Member Author

It might be a good thing to do for decl_macros

@petrochenkov
Copy link
Contributor

@ia0

This is a breaking change.

I would expect */+/? to still be accepted by the new compiler, just become optional.

@ia0
Copy link
Contributor

ia0 commented Jun 29, 2019

I would expect */+/? to still be accepted by the new compiler, just become optional.

So you mean that current!(1 2 3) should produces 1 2 3 as with the old compiler. This looks brittle to me because if the user is using the new style throughout their crate and they want to output a *, +, or ? after a repeated sequence, they would have to use the old style (i.e. explicitly provide the Kleene operator) to avoid ambiguity: Use $($x)++ to produce 1 2 3 + because $($x)+ would produce 1 2 3.

I think it is preferable to have a breaking change (new compiler does not expect a Kleene operator after a sequence in the RHS) than to have the Kleene operator be optional in the RHS.

@mark-i-m
Copy link
Member Author

@petrochenkov That would be ambiguous unless we denied the use of ?/*/+ in matchers or required the first such character to be a Kleene op, as @ia0 pointed out.

bors added a commit that referenced this issue Jun 30, 2019
Add meta-variable checks in macro definitions

This is an implementation of #61053. It is not sound (some errors are not reported) and not complete (reports may not be actual errors). This is due to the possibility to define macros in macros in indirect ways. See module documentation of `macro_check` for more details.

What remains to be done:
- [x] Migrate from an error to an allow-by-default lint.
- [x] Add more comments in particular for the handling of nested macros.
- [x] Add more tests if needed.
- [x] Try to avoid cloning too much (one idea is to use lists on the stack).
- [ ] Run crater with deny-by-default lint (measure rate of false positives).
    - [ ] Remove extra commit for deny-by-default lint
- [x] Create a PR to remove the old `question_mark_macro_sep` lint #62160
bors added a commit that referenced this issue Jul 2, 2019
Add meta-variable checks in macro definitions

This is an implementation of #61053. It is not sound (some errors are not reported) and not complete (reports may not be actual errors). This is due to the possibility to define macros in macros in indirect ways. See module documentation of `macro_check` for more details.

What remains to be done:
- [x] Migrate from an error to an allow-by-default lint.
- [x] Add more comments in particular for the handling of nested macros.
- [x] Add more tests if needed.
- [x] Try to avoid cloning too much (one idea is to use lists on the stack).
- [ ] Run crater with deny-by-default lint (measure rate of false positives).
    - [ ] Remove extra commit for deny-by-default lint
- [x] Create a PR to remove the old `question_mark_macro_sep` lint #62160
@ia0
Copy link
Contributor

ia0 commented Jul 7, 2019

Status

Implementation

#62008 implements the meta_variable_misuse lint with the following checks at macro definition:

  • Meta-variables must not be bound twice in nested macro definitions (this is already an error at top-level)
  • Meta-variables must not be free
  • Meta-variables must repeat at least as many times as their binder
  • Meta-variables must repeat with the same Kleene operators as their binder

The main issue with #62008 is that it may have false negatives and false positives. This is due to the possibility to define macros in macros in ways that depend on the actual macro instantiation. The current lint tries to detect nested macro definitions by looking for one of the following patterns:

  • macro_rules! name { body }
  • macro_rules! $name { body }
  • macro name { body }
  • macro $name { body }
  • macro name(lhs) { rhs } (body is (lhs) => { rhs })
  • macro $name(lhs) { rhs } (body is (lhs) => { rhs })

In case a nested macro definition is detected, its body is analyzed as long as it matches a sequence of lhs => rhs separated and optionally terminated by ; for legacy macros and separated by , for decl_macro macros. As soon as the body does not match this pattern, the lint falls back and checks the remaining unchecked parts as if outside the nested macro definition.

Here are a few problematic examples classified by root cause.

Behavior depends on actual macro instantiation

We need a check at macro instantiation to be sound and complete.

macro_rules! foo {
    ($n:tt) => {
        // The lint does not recognize the possible nested macro definition.
        $n! bar {
            // $x is free unless $n is `macro_rules`
            ($x:tt) => { $x };
        }
    };
}
macro_rules! foo {
    ($d:tt) => {
        macro_rules! bar {
            // The lint does not recognize the possible meta-variable occurrence:
            // `$d z` is a free meta-variable if $d is `$`.
            ($y:tt) => { $d z };
        }
    };
}

Behavior depends on other macro definitions

We need some kind of abstract macro instantiation to unfold macro calls.

macro_rules! foo {
    () => {
        // The lint thinks bar is a nested macro definition but it's not.
        stringify!(macro_rules! bar { () => { $x }; })
    };
}
macro_rules! foo {
    () => {
        // The lint does not look at the definition of `def`, thus does not recognize
        // the nested macro definition, and thus reports $x as free.
        def!({ ($x:tt) => { $x }; });
    };
}
macro_rules! def { ($body:tt) => { macro_rules! bar $body }; }

Arbitrary limitations

The way we try to guess nested macro definitions and macro bodies can be improved.

macro_rules! foo {
    ($($k:tt $v:tt)*) => {
        macro_rules! bar {
            // The lint does not recognize the repetition as a body, thus falls back
            // to checking as if outside a nested macro definition, and thus reports
            // $f as free.
            $(($f:tt $k) => { $f($v) };)*
        }
    };
}

Discussion

Omit Kleene operator for repetition occurrences

As noticed by @petrochenkov, the Kleene operator on sequence occurrences is redundant. It is entirely defined by its binder. As such, we could consider a language change to omit the Kleene operator in those cases:

macro_rules! current { ($($x:tt)+) => { $($x)+ }; }
macro_rules! proposed { ($($x:tt)+) => { $($x) }; }

As noted by @mark-i-m, this may be easier to apply for decl_macro.

bors added a commit that referenced this issue Jul 20, 2019
Add meta-variable checks in macro definitions

This is an implementation of #61053. It is not sound (some errors are not reported) and not complete (reports may not be actual errors). This is due to the possibility to define macros in macros in indirect ways. See module documentation of `macro_check` for more details.

What remains to be done:
- [x] Migrate from an error to an allow-by-default lint.
- [x] Add more comments in particular for the handling of nested macros.
- [x] Add more tests if needed.
- [x] Try to avoid cloning too much (one idea is to use lists on the stack).
- [ ] Run crater with deny-by-default lint (measure rate of false positives).
    - [ ] Remove extra commit for deny-by-default lint
- [x] Create a PR to remove the old `question_mark_macro_sep` lint #62160
@ia0
Copy link
Contributor

ia0 commented Jul 20, 2019

To measure the ratio of false positives of #62008 we ran a crater with deny-by-default: https://crater-reports.s3.amazonaws.com/pr-62008/downloads.html. In this post, we go over regressed crates to see if the regression is expected (actual errors in the crate) or if it is due to false positives. I went over all regressed crates that have a non-transitive error which is not a "different Kleene operator" (since those are most probably an actual error and it removes 355 crates). From those 53 crates, 6 have false positives:

  • reg/cluColor/0.1.5 false positives
  • reg/dyplugin/0.2.0 false positives
  • reg/easy_ffi/0.1.0 false positives
  • reg/pmutil/0.3.0 false positives
  • reg/seed/0.3.7 false positives
  • reg/webrtc-sctp/0.0.0 false positives

The 47 remaining crates only have expected errors:

  • gh/Webd01/rust-macros expected
  • gh/aylei/leetcode-rust expected
  • gh/hopv/hoice expected
  • gh/innerand/eggspire expected
  • gh/rantingpirate/spinnrd expected
  • gh/rasviitanen/svgmacro expected
  • gh/adamAndMath/AlgebraicManipulatorRust expected
  • gh/kyleheadley/type-theory-by-trait expected
  • gh/westerhack/quest expected
  • reg/abomonation/0.7.2 expected
  • reg/acacia_net/0.1.0 expected
  • reg/adapton/0.3.30 expected
  • reg/astro/2.0.0 expected
  • reg/binjs_io/0.2.1 expected
  • reg/bvh_anim/0.4.0 expected
  • reg/crack/0.1.0 expected
  • reg/crossbeam-channel/0.3.8 expected
  • reg/dockerfile-rs/0.3.0 expected
  • reg/error-chain/0.12.1 expected
  • reg/fbxcel-dom/0.0.3 expected
  • reg/func_swap/0.1.0 expected
  • reg/fungi-lang/0.1.63 expected
  • reg/galvanic-assert/0.8.7 expected
  • reg/handlebars/2.0.0-beta.2 expected
  • reg/indexed_vec/1.1.1 expected
  • reg/ld_preload/0.1.2 expected
  • reg/liftoff/0.1.1 expected
  • reg/maths-traits/0.1.3 expected
  • reg/nom-operator/0.0.2 expected
  • reg/probor/0.3.1 expected
  • reg/proptest/0.9.4 expected
  • reg/protobuf_codec/0.2.7 expected
  • reg/quick-error/1.2.2 expected
  • reg/quick-error2/2.0.1 expected
  • reg/relm/0.16.0 expected
  • reg/relm-state/0.16.0 expected
  • reg/render-tree/0.1.1 expected
  • reg/rml_rtmp/0.3.0 expected
  • reg/svgmacro/0.2.2 expected
  • reg/tensor-macros/0.2.0 expected
  • reg/tfdeploy/0.0.10 expected
  • reg/typeparam/0.0.1 expected
  • reg/ubend/0.2.1 expected
  • reg/upt/0.1.3 expected
  • reg/v9/0.1.2 expected
  • reg/wood/0.4.2 expected
  • reg/xmlhelper/0.1.0 expected

False positives

We mostly have 3 types of false positives:

  • Sequences of rules in nested macro definitions. (This shouldn't be hard to implement and would fix most issues.)
  • A meta-variable is expected to be a dollar. (This is hard to implement. To help we could consider extending the macro system with $$ that gets expanded to $ such that users don't need to take a $dollar meta-variable as argument.)
  • Macro definition using parentheses instead of braces. (Easy to implement.)

cluColor-0.1.5

On this line, we don't see that $s is a binder, because we don't support sequences for rules of nested macro definitions.

dyplugin-0.2.0

On this line, we don't see that $func is a binder, because we don't support sequences for rules of nested macro definitions.

easy_ffi-0.1.0

On this line, we think $attr is free because we don't realize that the $dol meta-variable may be a dollar which would bind attr on line 117.

pmutil-0.3.0

On this line, we don't see that $tokens is a binder, because we don't support sequences for rules of nested macro definitions.

seed-0.3.7

On this line, we think $d is free because we don't realize that the argument to with_dollar_sign! is the body of a macro definition.
On this line, we don't see that $value is a binder, because we don't support sequences for rules of nested macro definitions.

webrtc-sctp-0.0.0

On this line, we don't see that $i is a binder, because the macro body uses parentheses instead of braces. If this is a valid way to define macro, we should update this line.

@ia0
Copy link
Contributor

ia0 commented Jul 20, 2019

How should we handle stringify!($x) where $x is free?

macro_rules! bar { () => { stringify!($x) }; }

This is not an error apparently (but the lint would report it). Note that this is specific to stringify! apparently. If I try to simulate this with another macro, I get an error.

macro_rules! foo { ($($x:tt)*) => {}; }
macro_rules! bar { () => { foo!($x) }; }

I'm currently marking those "errors" as expected when finding false positives, because I don't think this is really an issue of the lint, because the crates that have this issue seem to not rely on this deliberately.

@mark-i-m
Copy link
Member Author

@ia0 Thanks! Sorry for the long delay. Things have been busy...

stringify! and a few other macros are built-in macros -- they are special-cased in the compiler and are expanded in a special way. One thing we could do is just whitelist any problematic built-ins.

@mark-i-m
Copy link
Member Author

Also, thanks for all the work you have put into this!

@ia0
Copy link
Contributor

ia0 commented Aug 29, 2019

No problem :-)

And by the way, I presented this lint at the Rust meetup in Zürich today (to try to get some people using it), and a few interesting questions came up:

  1. Given that this lint inherently has false positives (and false negatives), why is it a lint in the compiler instead of clippy?

  2. What is the future for this lint? Is it planned to stay a lint?

  3. How does it work with proc-macro? I'll look into this. I think it works fine, I just want to check at which location the errors are reported.

Also, I would like to know the following:

  1. Should I implement support for sequences in body of nested macro? I don't think this is too complicated to implement and it would fix some false positives.

  2. Are there any other implementation follow-ups I should do in the coming months? Or should we just wait until we get some user feedback in the next year?

Thanks a lot!

@ia0
Copy link
Contributor

ia0 commented Aug 29, 2019

To respond to (3), I was wrong. It doesn't run when proc-macro outputs a macro definition.

# Cargo.toml
[package]
name = "proc_macro_experiments"
version = "0.1.0"
edition = "2018"

[lib]
proc-macro = true
// src/lib.rs
#![deny(meta_variable_misuse)]

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro]
pub fn make_foo(_: TokenStream) -> TokenStream {
    format!("macro_rules! foo {{ () => {{ $x }}; }}")
        .parse()
        .unwrap()
}
// examples/main.rs
#![feature(proc_macro_hygiene)]
#![deny(meta_variable_misuse)]

proc_macro_experiments::make_foo!();

fn main() {
    // foo!();
}

cargo build --examples=main only generates an error if the call to foo!() is not commented.

@mark-i-m
Copy link
Member Author

mark-i-m commented Feb 5, 2020

Sorry! It's been quite a while!

Given that this lint inherently has false positives (and false negatives), why is it a lint in the compiler instead of clippy?

At the time I proposed the lint, I didn't realize it would have false positives. Also, I don't know enough about clippy to know whether it would be able to implement these.

What is the future for this lint? Is it planned to stay a lint?

Personally, I think they should be enabled by default (at least some of them can be). However, there is a bit more discussion on #68836 ...

At the very least, I think this work has been really useful in giving us an understanding of some of the downsides of the current MBEs implementation and behavior.

@ia0
Copy link
Contributor

ia0 commented Feb 8, 2020

Sorry! It's been quite a while!

No problem :-) I wasn't waiting active follow-up. I guess this is a subject that may take quite some time to settle and depends on the future of MBE and proc-macro.

However, there is a bit more discussion on #68836 ...

I'll follow that. Thanks for the link!

At the very least, I think this work has been really useful in giving us an understanding of some of the downsides of the current MBEs implementation and behavior.

Yes, it was also very interesting to me to dig a bit into a small part of the Rust compiler :-)

@Mark-Simulacrum
Copy link
Member

Visiting this in T-lang backlog bonanza, we think the majority of the checks here are currently implemented, and it makes sense to file new issues for concrete additions (perhaps like #95943).

Closing this issue for now since it seems like this is no longer actively tracking something.

@Fishrock123
Copy link
Contributor

This issue is still referenced to by these docs: https://doc.rust-lang.org/rustc/lints/listing/allowed-by-default.html#meta-variable-misuse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-cleanup Category: PRs that clean code up or issues documenting cleanup. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants