MBE (macro_rules!) pattern-matching is unnecessarily O(n) even in simple cases. #68836

eddyb · 2020-02-05T00:01:02Z

There are some crates which do something like this (e.g. string_cache_codegen):

macro_rules! foo {
    ("a") => (A);
    ("b") => (B);
    ("c") => (C);
    // ... etc. (maybe hundreds more)
}

(@nnethercote came across this when profiling html5ever's compilation)

Also, prefixing patterns with @foo to create "internal rules" is common.

But the current implementation (AFAICT) has to check each of the patterns in turn.

Instead, we could build something like a decision tree ahead of time, from the constant (i.e. $-less) prefix of all arms, using maps for operators/idents/literals, to make the pattern-matching of the prefix amortize to being linear in the number of input tokens.

We could keep it simple by limiting the constant prefix to leaf tokens (no (...), [...] or {...}).

For the example above, we'd pre-compute something like this:

DecisionTree::Match {
    op: {},
    ident: {},
    literal: {
        "a": DecisionTree::Done([0]),
        "b": DecisionTree::Done([1]),
        "c": DecisionTree::Done([2]),
        // ...
    },
}

Whereas for something using @foo rules it could be:

DecisionTree::Match {
    op: {
        '@': DecisionTree::Match {
            op: {},
            ident: {
                "foo": DecisionTree::Done([0, 1]),
                "bar": DecisionTree::Done([2, 3, 4]),
                "baz": DecisionTree::Done([5]),
            },
            literal: {},
        }
    },
    ident: {},
    literal: {},
}

(where DecisionTree::Done indicates the arms to continue with)

cc @rust-lang/compiler (this might need a design meeting?)

The text was updated successfully, but these errors were encountered:

nnethercote · 2020-02-05T00:02:55Z

In the html5ever case, the macro has over 1200 patterns.

Mark-Simulacrum · 2020-02-05T00:08:01Z

I imagine the "proper" way to do this today is a proc macro, right? Where you can appropriately use a HashMap, for example, for such a macro?

I'm not sure we should dedicate time to fixing this if proc macros do indeed make things much better here (and are 1-1 compatible, hygiene wise, in the relative short term with MBE hygiene stabilization).

petrochenkov · 2020-02-05T06:01:29Z

cc @mark-i-m

mark-i-m · 2020-02-05T14:46:41Z

I'm personally not in favor of complicating the mbe implementation more. It's already quite hard to understand and reason about, and it has a lot of weird corner cases (i.e. allowing surprising behavior) despite our best efforts. If more time is spent on the mbe implementation, I would say it should be to redo the algorithm from scratch and replace it in the next edition or as after a crater run.

mark-i-m · 2020-02-05T15:43:09Z

To be more concrete, I think we should focus on decl_macros, making them less expressive as needed to simplify the implementation (more advanced use cases should use proc-macros). Basically, I think we should be able to fully implement the checks from #61053 without any hacks or heuristics; the checks should be sound, complete, and as early as possible.

eddyb · 2020-02-05T19:54:00Z

I think that we can get a lot of mileage out of having a "compilation step" that both checks the definition and transforms it into something more amenable to simple and fast pattern-matching.

I agree that we should probably not bolt on such optimizations to the existing implementation, and instead try to include them as part of a simpler implementation.

mark-i-m · 2020-02-05T20:19:19Z

@eddyb I agree with you in principle, but I guess I can't see how it can be done.

The hard part is that the current implementation (and thus the stable interface) allows all sorts of crazy things that makes it impossible to check almost anything very conclusively, as we discovered in #61053. For example, it is impossible to determine whether a macro_rules defines a new macro_rules without seeing the use site. This in turn means that we cannot have a complete check for whether all metavariables are well-defined at the definition site (since they may be defined at the call-site by a macro generated there).

In my opinion, we need to first RFC major restrictions on MBEs to make these checks possible to implement at all. In particular:

macros should not generate other macros (if you want to do that, use a proc-macro). This allows us to ensure that $var in the body of a macro arm necessarily is a reference to a metavar binding from the macro arm.
all the lints from MBE: more checks at definition time #61053 become hard errors.
the body of a macro arm must be valid rust syntax (i.e. you can't emit weird token streams for consumption by another macro).

There are a few other improvements that were identified in #61053 too, which would be good to implement...

Also, cc @ia0 who was involved in implementing those lints...

eddyb · 2020-02-06T05:49:53Z

the body of a macro arm must be valid rust syntax (i.e. you can't emit weird token streams for consumption by another macro).

How can you possibly enforce this when macros can take and interpolate arbitrary token streams?!

I thought you were worried about things like ambiguities or compatibility hazards in the patterns of a macro, not anything to do with what the macro expands to.

With what I've seen in the wild, I don't see how what you're describing is realistically possible, even with an edition (and an edition would still require supporting the older behavior).

mark-i-m · 2020-02-06T15:34:11Z

Admittedly, that one seems like a bit of a stretch, but It seems plausible if we say that you can only use a tt metavariable inside a call to a macro. The output of the macro has to be valid rust at some point anyway. Perhaps I'm forgetting something, though?

I thought you were worried about things like ambiguities or compatibility hazards in the patterns of a macro, not anything to do with what the macro expands to.

Most of my concerns are about the patterns, but the fact that a macro can expand to arbitrary things restricts what we can assume about it when checking usage of metavariables.

eddyb · 2020-02-06T15:57:58Z

Perhaps I'm forgetting something, though?

Yeah, $($foo)* to interpolate arbitrary Rust syntax.

checking usage of metavariables

Why do we now care about usage? This is the first time I hear about this, compared to, say, compatibility hazards in terms of an earlier arm succeeding in newer versions.

mark-i-m · 2020-02-06T20:16:26Z

Yeah, $($foo)* to interpolate arbitrary Rust syntax.

@eddyb I should maybe clarify that when I say "valid rust", I mean syntactically, not semantically (e.g. there may still be type errors etc). Perhaps you can elaborate what you mean? How can this produce arbitrary rust syntax if $foo not a tt (or else is only used in a macro invocation)? We know what kind of token tree it is otherwise, so it is seems it should not be hard to check that the interpolated fragment at least syntactically fits there.

Why do we now care about usage?

I feel that current macro error messages are... not great. And I think this is largely due to the fact that it is very hard to check the body of a macro at definition time. Rather, we have to wait for it to be instantiated so we can see what the whole thing expands too.

eddyb · 2020-02-06T21:36:56Z

I did mean $foo to be a tt, yes. For me that makes the whole thing fall apart.

I feel that current macro error messages are... not great. And I think this is largely due to the fact that it is very hard to check the body of a macro at definition time. Rather, we have to wait for it to be instantiated so we can see what the whole thing expands too.

This is getting pretty offtopic but I strongly disagree.
In general it's impossible to check the definition site, so I would focus on macro-aware tooling, including diagnostics, operating after the fact. Something direly missing right now is step-by-step expansion visualization with best-effort pretty-printing.

Recent research into resugaring might also be relevant.

nnethercote · 2022-09-28T05:46:56Z

I think this can be closed. I did a lot of work earlier in the year speeding up declarative macro expansion and this pattern (many rules and no metavariables) barely came up at all, because it's so rare. Other than html5ever I've never seen it used in real code.

Noratrieb · 2022-10-23T12:54:36Z

Before #102895, rustc used to use this pattern as well for the query system, but now it doesn't anymore.

nnethercote mentioned this issue Feb 5, 2020

Hasten macro parsing #68848

Closed

mark-i-m mentioned this issue Feb 5, 2020

MBE: more checks at definition time #61053

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MBE (macro_rules!) pattern-matching is unnecessarily O(n) even in simple cases. #68836

MBE (macro_rules!) pattern-matching is unnecessarily O(n) even in simple cases. #68836

eddyb commented Feb 5, 2020

nnethercote commented Feb 5, 2020

Mark-Simulacrum commented Feb 5, 2020

petrochenkov commented Feb 5, 2020

mark-i-m commented Feb 5, 2020

mark-i-m commented Feb 5, 2020 •

edited

Loading

eddyb commented Feb 5, 2020

mark-i-m commented Feb 5, 2020

eddyb commented Feb 6, 2020

mark-i-m commented Feb 6, 2020

eddyb commented Feb 6, 2020

mark-i-m commented Feb 6, 2020

eddyb commented Feb 6, 2020

nnethercote commented Sep 28, 2022

Noratrieb commented Oct 23, 2022

MBE (macro_rules!) pattern-matching is unnecessarily O(n) even in simple cases. #68836

MBE (macro_rules!) pattern-matching is unnecessarily O(n) even in simple cases. #68836

Comments

eddyb commented Feb 5, 2020

nnethercote commented Feb 5, 2020

Mark-Simulacrum commented Feb 5, 2020

petrochenkov commented Feb 5, 2020

mark-i-m commented Feb 5, 2020

mark-i-m commented Feb 5, 2020 • edited Loading

eddyb commented Feb 5, 2020

mark-i-m commented Feb 5, 2020

eddyb commented Feb 6, 2020

mark-i-m commented Feb 6, 2020

eddyb commented Feb 6, 2020

mark-i-m commented Feb 6, 2020

eddyb commented Feb 6, 2020

nnethercote commented Sep 28, 2022

Noratrieb commented Oct 23, 2022

mark-i-m commented Feb 5, 2020 •

edited

Loading