
Check future-proofing of macro_rules! using FIRST sets. #1746

Closed · @LeoTestard wants to merge 1 commit

Conversation

@LeoTestard

No description provided.

@durka commented Sep 11, 2016

Rendered.

> * `NOW(m)` is the set of inputs that are now accepted by a matcher m
> * `MAYBE(m)` is defined by: `forall sentence s, matcher m: s ∈ MAYBE(m) <=> s ∉ NOW(m) ⋀ s ∉ NEVER(m)`
>
> Of course, the problem of deciding wether some input sequence may match some matcher in the future (that is, if, for a given matcher m, wether it belongs to NOW(m), MAYBE(m) or NEVER(m)) is virtually impossible. Instead, we use the concept of *FIRST sets* as an approximation.
Inline review comment on the excerpt above: whether
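To make the quoted definitions concrete, here is a runnable illustration (my example, not from the RFC text) of an input that sits in MAYBE for one matcher:

```rust
// For the matcher m1 = `$e:expr`, the input `x : i32` is not an
// expression today, so it is not in NOW(m1); a grammar extension such
// as type ascription could make it one, so it sits in MAYBE(m1)
// rather than NEVER(m1). Arm selection below depends on exactly that.
macro_rules! fragile {
    ($e:expr) => { "first arm: a plain expression" };
    ($i:ident : $t:ty) => { "second arm: ident-colon-type" };
}

fn main() {
    // Today the second arm fires. If `x : i32` ever moved from
    // MAYBE(m1) to NOW(m1), the first arm would fire instead,
    // silently changing the macro's behavior; that is the hazard the
    // FIRST-set check is meant to flag ahead of time.
    println!("{}", fragile!(x: i32));
}
```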

@durka commented Sep 11, 2016

Thanks for writing all this up and working through the very tricky reasoning!

My concerns are:

  • A 33% false positive rate is much too high. I hope that we can find some ways to reduce that, either by making the algorithm cleverer, adjusting the FIRST/FOLLOW sets, or adding capabilities to macros.
    • Would it help if we added one-or-none or specific-number matchers, like x? and x{3} in regular expressions? That can help to nail down the number of tokens something will parse. (See the sketch after this list.)
  • In the case of type ascription, the question mark operator, etc., we've just accepted macro breakage, because this algorithm wasn't in place and so non-future-proof macros were previously accepted. To go along with the language's goal of stability, if we do something like this, I would like there to be a policy decision stipulating that if a syntax change would un-future-proof macros (i.e., add something to a FIRST set that wasn't there before), then we won't do it.
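A sketch of the one-or-none idea from the second bullet (hypothetical syntax at the time of this comment; Rust did later gain a `?` repetition operator, so the following compiles today):

```rust
// One-or-none: the `, $y:expr` group may appear zero or one times.
macro_rules! pair_or_single {
    ($x:expr $(, $y:expr)?) => { ($x $(, $y)?) };
}

fn main() {
    assert_eq!(pair_or_single!(1), 1);          // expands to (1)
    assert_eq!(pair_or_single!(1, 2), (1, 2));  // expands to (1, 2)
}
```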

Another inline review comment, on this RFC text:

> …errors will probably be hard to understand and to fix, we will first land it as a warning. We could then periodically use Crater to gather breakage statistics to see how fast people are adapting and decide when to turn it into an error (if we ever do, the other option being to wait for `macro_rules!` to be replaced by something else).
>
> An opt-out attribute, `#[unsafe_macro]`, will also be added to ignore the future-proofing analysis on a specific macro.
Inline review comment: `unsafe` sounds wrong for this. How about `#[ambiguous_macro]`?
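For illustration, here is how the proposed opt-out might read at a use site (the attribute never landed, so it is shown commented out; the macro itself is my example):

```rust
// The opt-out proposed in the quoted RFC text (its name still under
// discussion above); commented out so this sketch compiles today.
// #[unsafe_macro]
macro_rules! relies_on_current_grammar {
    // A matcher whose author wants it exempted from the
    // future-proofing analysis.
    ($e:expr, then $($rest:tt)*) => { ($e, stringify!($($rest)*)) };
}

fn main() {
    let (v, s) = relies_on_current_grammar!(1 + 1, then more tokens);
    assert_eq!(v, 2);
    assert_eq!(s, "more tokens");
}
```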

@LeoTestard (author) commented Sep 11, 2016

> A 33% false positive rate is much too high. I hope that we can find some ways to reduce that, either by making the algorithm cleverer, adjusting the FIRST/FOLLOW sets, or adding capabilities to macros.

Of course it is. But the only way to make the algorithm cleverer in a significant way is the thing with FOLLOW sets I describe in the unresolved questions. This requires careful thinking and possibly tweaking the way the macro parser works. If feel that we can't do this right now, but @nikomatsakis is the person to ask about this. What sounds like a good plan to me is landing this as an opt-in lint and maybe adding this trick later. This way there won't be any unnecessary breakage.

The FIRST sets cannot be adjusted much, and doing so would have very little impact on the number of regressions (and none on the proportion of false positives).

> Would it help if we added one-or-none or specific-number matchers? Like x? and x{3} in regular expressions. That can help to nail down the number of tokens something will parse.

No, it wouldn't. The problem is not that the number of tokens is too large; it's that we can't know it for every possible input. Plus, I feel like those would be pretty niche cases.

> To go along with the language's goal of stability, if we do something like this I would like there to be a policy decision stipulating that if a syntax change un-future-proofs macros (i.e. adds something to a FIRST set that wasn't there before) then we won't do it.

Of course, that's the goal of this RFC. We're not supposed to add anything to the FIRST sets after they are accepted. :)

@nrc added the T-lang label (Relevant to the language team, which will review and decide on the RFC) on Sep 11, 2016
@nikomatsakis

@LeoTestard

> This requires careful thinking and possibly tweaking the way the macro parser works. If feel that we can't do this right now, but @nikomatsakis is the person to ask about this.

What did you mean by "If feel"? "I feel"? :) Presuming you meant "I feel", can you say a bit more about why you feel that way (not saying I disagree).

@strega-nil

ping @LeoTestard @nikomatsakis

Status?

@LeoTestard (author)

> What did you mean by "If feel"? "I feel"? :) Presuming you meant "I feel", can you say a bit more about why you feel that way (not saying I disagree).

I'm not sure, that's why I said "feel". :D In fact, it was mostly based on the fact that I did not understand 100% how the macro parser works, and that it was the end of my internship and I did not have time to think about all the corner cases that might happen (in particular, I was thinking about sequence repetitions; for example, there might be problems related to rust-lang/rust#33840). I'm sorry about that. Maybe I can now try to find time in my spare time to think about it in more detail, or at least try to sum up the current state of things so that someone else can continue.

@aturon commented Feb 1, 2017

@LeoTestard @pnkfelix What's the status of this RFC?

@nikomatsakis

So one thing that seems worth mentioning here -- I have been talking to @jseyfried and @nrc in the context of macros 2.0. In that setting, I definitely want to change the meaning of $t:expr to just mean "scoop up all tokens until an expr separator is found" -- it would just let you copy those tokens somewhere else (anywhere else, really). If you happen to paste them into a Rust expression context, they will be parsed as an expression at that point. The macro parser will never run the parser itself, in other words.

I would like to do the same for existing macro_rules, but I'm not sure of the impact.
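A minimal sketch of that "scoop up tokens" idea (my illustration, not the actual macro-parser code), using the stable FOLLOW set for expr (`=>`, `,`, `;`) and the fact that delimited groups are single token trees:

```rust
// Capture an expr fragment by taking whole token trees until a
// FOLLOW(expr) token appears at the top level. No expression grammar
// is consulted, so later grammar extensions cannot change the capture.
#[derive(Clone, Debug, PartialEq)]
enum TokenTree {
    Token(String),              // a single token, e.g. "a", "+", ";"
    Delimited(Vec<TokenTree>),  // (...), [...] or {...}: opaque here
}

fn scoop_expr(input: &[TokenTree]) -> (&[TokenTree], &[TokenTree]) {
    let is_separator = |tt: &TokenTree| {
        matches!(tt, TokenTree::Token(t) if t == "=>" || t == "," || t == ";")
    };
    let end = input.iter().position(is_separator).unwrap_or(input.len());
    input.split_at(end) // (captured fragment, separator and the rest)
}

fn main() {
    use TokenTree::*;
    // a + (b, c) ; d  : because the comma sits inside a delimited
    // group, the capture stops at the top-level `;`.
    let input = vec![
        Token("a".into()),
        Token("+".into()),
        Delimited(vec![Token("b".into()), Token(",".into()), Token("c".into())]),
        Token(";".into()),
        Token("d".into()),
    ];
    let (expr, rest) = scoop_expr(&input);
    assert_eq!(expr.len(), 3);
    assert_eq!(rest.first(), Some(&Token(";".into())));
}
```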

@durka commented Feb 1, 2017 via email

@jseyfried

@durka

Yeah, we're planning on fixing this (i.e. allowing reparsing) in macros 2.0, at least in certain contexts.

More specifically, we don't want to treat `$e` exactly like its underlying tokens, since we would like e.g. `macro m($e:expr) { 2 * $e } m!(1 + 1);` to be 4 (not 3), but we want to move in that direction.

cf. rust-lang/rust#26361
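The macro_rules analogue of the 4-not-3 point is runnable today (my example):

```rust
// Because $e is captured as one expression fragment, substitution
// preserves its grouping: the expansion behaves like 2 * (1 + 1).
macro_rules! double {
    ($e:expr) => { 2 * $e };
}

fn main() {
    assert_eq!(double!(1 + 1), 4); // raw token pasting would give 3
}
```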

@nikomatsakis

> I would like to do the same for existing macro_rules, but I'm not sure of the impact.

We'll be able to run Crater and find out soon, once I land some more proc-macro groundwork :)

@eddyb commented Feb 1, 2017

Do we need to specify something like `:expr` if we require separators anyway?
Can't we have something closer to regular expressions?

@nikomatsakis

@eddyb

> Can't we have something closer to regular expressions?

I have wondered the same thing. I still think it's useful to have expr and ty (ty in particular would enable counting <...> as a "pseudo-token tree", which cannot be expressed by a regular expression) as shorthands for well-known terminators and so forth, but it does seem like they could be sugar for a more general matcher syntax. It might, however, be OK not to define that syntax initially. =)
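A sketch of what counting `<...>` while scooping a ty fragment could look like (my illustration; the real rules would also need the full FOLLOW(ty) set and the `>>`-splitting wrinkle raised below):

```rust
// Treat `<` and `>` as a pseudo open/close pair while capturing a ty
// fragment token-by-token, stopping on a separator (just `,` here)
// only at angle-depth zero.
fn scoop_ty<'a>(tokens: &[&'a str]) -> (Vec<&'a str>, usize) {
    let mut depth = 0usize;
    for (i, &t) in tokens.iter().enumerate() {
        match t {
            "<" => depth += 1,
            ">" => depth = depth.saturating_sub(1),
            "," if depth == 0 => return (tokens[..i].to_vec(), i),
            _ => {}
        }
    }
    (tokens.to_vec(), tokens.len())
}

fn main() {
    // HashMap<K, V>, rest : the comma inside the angle brackets is at
    // depth 1, so the capture runs through the closing `>`.
    let toks = ["HashMap", "<", "K", ",", "V", ">", ",", "rest"];
    let (ty, stop) = scoop_ty(&toks);
    assert_eq!(ty, ["HashMap", "<", "K", ",", "V", ">"]);
    assert_eq!(toks[stop], ",");
}
```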

@eddyb commented Feb 12, 2017

@nikomatsakis What about allowing macros to further match on an expression's tokens? Right now you can't "peer into" a `$x:expr` because it becomes a single token that contains an Expr node, and in the future we would have a TokenStream, but it would also be "packaged up" to avoid misparsing it.

Should we always wrap such matches in (...), so that you can explicitly write ($x:expr + $y:expr) to match on an addition expression, for example, and if you don't "open" the parens it's the same as today?

FWIW I'm fine with having :ty handle <...> and other ones handling types which may appear in them inside ::<...> and in special locations… wait a second:

  • we've mentioned cover grammars in the past for parsing Ty | Expr
  • but only for the full parsing, not the reduced (shallow) parse rules
  • a < b > c doesn't parse in Rust; it requires wrapping either comparison in parens (see the sketch after this list)
  • types never contain unwrapped expressions, e.g. [T; N] and maybe Array<T, {N+1}> in the future
  • therefore, < can always be parsed in "balanced mode", and running out of tokens first can be ignored
  • not sure of all the ways <<, >> and types can interact, though (e.g. we allow << to mean < <)
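Concretely, for the chained-comparison bullet above (my example):

```rust
// `a < b > c` is rejected outright ("comparison operators cannot be
// chained"), which is what makes treating `<` as an opening bracket
// viable. Parenthesizing either comparison restores a parse:
fn demo(a: i32, b: i32, c: bool) -> bool {
    // a < b > c   // error: comparison operators cannot be chained
    (a < b) > c    // fine: bool implements PartialOrd
}

fn main() {
    assert!(!demo(3, 1, true)); // (3 < 1) is false, and false > true is false
}
```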

@eddyb commented Feb 12, 2017

In an expression context, types can only show up after ::, : or as, right?
Still, that doesn't help a << b, c >> d which you may want to parse with $x, $y. Oh. Uhhh.

Even a < b, c > d can show up in a lot of places and we don't ban that.
I suppose this is why that hack @bstrie or @Kimundi and I came up with long ago didn't pan out.
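The `a << b, c >> d` problem in macro form (my example): the expr parser reads the `<<` and `>>` as shifts, while a "balanced `<...>`" rule would want to pair them as brackets, and both readings are locally plausible.

```rust
macro_rules! two_exprs {
    ($x:expr, $y:expr) => { ($x, $y) };
}

fn main() {
    let (a, b, c, d) = (1u32, 1u32, 8u32, 1u32);
    // $x captures `a << b` and $y captures `c >> d`: two shift
    // expressions, not one generics-like grouping of the brackets.
    assert_eq!(two_exprs!(a << b, c >> d), (2, 4));
}
```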

@nikomatsakis

@eddyb

> Should we always wrap such matches in (...), so that you can explicitly write ($x:expr + $y:expr) to match on an addition expression, for example, and if you don't "open" the parens it's the same as today?

I don't understand what you're proposing here. Can you elaborate with an example? On first glance, it makes me quite nervous. I'm not sure how we would decide, given some stream of tokens like a + b + c, what $x and $y match here, at least not without parsing? (And I don't want to parse when doing macro matching, which should avoid all questions of whether changing the grammar will change how macros work.)

@eddyb commented Feb 13, 2017

Oh, never mind: fragments are too greedy for $x:expr + ..., and (...) doesn't work for things like items, so I can't use a more structured example. Probably a bad idea anyway.

@pnkfelix commented Mar 6, 2017

@aturon @nikomatsakis My take on the status of this RFC (all my opinion, and I'm open to being convinced otherwise):

I don't think we're going to be able to reasonably change the semantics of existing macro_rules to reject the macros that violate the rules here.

If we deploy the checks defined here as an opt-in lint, it won't pay for the implementation and support effort (because I expect relatively few developers to actually opt into using the lint).

What I would primarily aim to do is ensure that the design of macros 2.0 does not fall into the same pitfalls that we hit in macro_rules.

From the comments above, it seems like @jseyfried and @nrc are planning to ensure they are future-proof by using the approach described by @nikomatsakis, which, if I understand correctly, effectively decouples the meaning of the expr fragment from the meaning of expr in the Rust grammar itself, and thus makes macros inherently future-proof (at least with respect to changes to the expr non-terminal in the language).

Presumably we should ensure that all fragment specifiers are similarly decoupled from the grammar of the language?

@nikomatsakis

@pnkfelix

> Presumably we should ensure that all fragment specifiers are similarly decoupled from the grammar of the language?

I think so, with the exception that $t:ty ought to at least count < and > and consider them to be "open-close" delimiters of a kind (so it's not as simple as "read token-trees until a follow-set entry is found" in that case).

In any case I believe I agree with your overall conclusions.

@eddyb commented Mar 6, 2017

@nikomatsakis I think types in expressions have a similar problem, sadly.

@nikomatsakis

@rfcbot fcp close

Given @pnkfelix's comment, my feeling is that we ought to close this RFC, and instead just try to avoid these mistakes in macros 2.0. We have already adopted the position (e.g. when we adopted the : type ascription operator) that we will allow ourselves to modify our expression grammar -- even though this can cause breakage in macros. In other words, it is the macro author's responsibility to avoid complications that might arise, by suitably bracketing their expressions and so forth. So far this hasn't really caused much difficulty, in part because we're not modifying our grammar all the time -- and we can certainly do warning periods and the like.
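As a gloss on "suitably bracketing their expressions" (my example, not from the thread): defensive parentheses at fragment boundaries are the traditional style, even though expr fragments already preserve their grouping on expansion.

```rust
macro_rules! scaled {
    // Author-side bracketing: parenthesize fragments in the expansion
    // so no operator, present or future, can associate across the
    // fragment boundary. (Belt-and-braces, given fragment grouping.)
    ($e:expr, $k:expr) => { ($e) * ($k) };
}

fn main() {
    assert_eq!(scaled!(2, 3 + 1), 8); // 2 * (3 + 1), not 2 * 3 + 1
}
```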

@rfcbot commented Mar 6, 2017

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nikomatsakis

@eddyb hmm, that's an interesting point. I hadn't considered that. =(

I'm not sure if it affects the idea that we ought to close this RFC, but it's certainly a thorny complication.

@rfcbot commented Apr 19, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot added the final-comment-period label (Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised) on Apr 19, 2017
@rfcbot commented Apr 29, 2017

The final comment period is now complete.

@aturon commented May 1, 2017

The FCP has elapsed with no further commentary. I'm going to go ahead and close. Thanks @LeoTestard for your work here; hopefully we'll be in a better position with macros 2.0.
