
Check future-proofing of macro_rules! using FIRST sets. #1746

Closed · @LeoTestard wants to merge 1 commit

Conversation

@LeoTestard

No description provided.

@durka commented Sep 11, 2016

Rendered.

> * `NOW(m)` is the set of inputs that are now accepted by a matcher m
> * `MAYBE(m)` is defined by: `forall sentence s, matcher m: s ∈ MAYBE(m) <=> s ∉ NOW(m) ⋀ s ∉ NEVER(m)`
>
> Of course, the problem of deciding wether some input sequence may match some matcher in the future (that is, if, for a given matcher m, wether it belongs to NOW(m), MAYBE(m) or NEVER(m)) is virtually impossible. Instead, we use the concept of *FIRST sets* as an approximation.
Inline review comment on the excerpt above: whether
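To make the quoted definitions concrete, here is a runnable illustration (my example, not from the RFC text) of an input that sits in MAYBE for one matcher:

```rust
// For the matcher m1 = `$e:expr`, the input `x : i32` is not an
// expression today, so it is not in NOW(m1); a grammar extension such
// as type ascription could make it one, so it sits in MAYBE(m1)
// rather than NEVER(m1). Arm selection below depends on exactly that.
macro_rules! fragile {
    ($e:expr) => { "first arm: a plain expression" };
    ($i:ident : $t:ty) => { "second arm: ident-colon-type" };
}

fn main() {
    // Today the second arm fires. If `x : i32` ever moved from
    // MAYBE(m1) to NOW(m1), the first arm would fire instead,
    // silently changing the macro's behavior; that is the hazard the
    // FIRST-set check is meant to flag ahead of time.
    println!("{}", fragile!(x: i32));
}
```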

@durka commented Sep 11, 2016

Thanks for writing all this up and working through the very tricky reasoning!

My concerns are:

  • A 33% false positive rate is much too high. I hope that we can find some ways to reduce that, either by making the algorithm cleverer, adjusting the FIRST/FOLLOW sets, or adding capabilities to macros.
    • Would it help if we added one-or-none or specific-number matchers, like x? and x{3} in regular expressions? That can help to nail down the number of tokens something will parse. (See the sketch after this list.)
  • In the case of type ascription, the question mark operator, etc., we've just accepted macro breakage, because this algorithm wasn't in place and so non-future-proof macros were previously accepted. To go along with the language's goal of stability, if we do something like this, I would like there to be a policy decision stipulating that if a syntax change would un-future-proof macros (i.e., add something to a FIRST set that wasn't there before), then we won't do it.
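A sketch of the one-or-none idea from the second bullet (hypothetical syntax at the time of this comment; Rust did later gain a `?` repetition operator, so the following compiles today):

```rust
// One-or-none: the `, $y:expr` group may appear zero or one times.
macro_rules! pair_or_single {
    ($x:expr $(, $y:expr)?) => { ($x $(, $y)?) };
}

fn main() {
    assert_eq!(pair_or_single!(1), 1);          // expands to (1)
    assert_eq!(pair_or_single!(1, 2), (1, 2));  // expands to (1, 2)
}
```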

Another inline review comment, on this RFC text:

> …errors will probably be hard to understand and to fix, we will first land it as a warning. We could then periodically use Crater to gather breakage statistics to see how fast people are adapting and decide when to turn it into an error (if we ever do, the other option being to wait for `macro_rules!` to be replaced by something else).
>
> An opt-out attribute, `#[unsafe_macro]`, will also be added to ignore the future-proofing analysis on a specific macro.
Inline review comment: `unsafe` sounds wrong for this. How about `#[ambiguous_macro]`?
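For illustration, here is how the proposed opt-out might read at a use site (the attribute never landed, so it is shown commented out; the macro itself is my example):

```rust
// The opt-out proposed in the quoted RFC text (its name still under
// discussion above); commented out so this sketch compiles today.
// #[unsafe_macro]
macro_rules! relies_on_current_grammar {
    // A matcher whose author wants it exempted from the
    // future-proofing analysis.
    ($e:expr, then $($rest:tt)*) => { ($e, stringify!($($rest)*)) };
}

fn main() {
    let (v, s) = relies_on_current_grammar!(1 + 1, then more tokens);
    assert_eq!(v, 2);
    assert_eq!(s, "more tokens");
}
```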

@LeoTestard (author) commented Sep 11, 2016

> A 33% false positive rate is much too high. I hope that we can find some ways to reduce that, either by making the algorithm cleverer, adjusting the FIRST/FOLLOW sets, or adding capabilities to macros.

Of course it is. But the only way to make the algorithm cleverer in a significant way is the thing with FOLLOW sets I describe in the unresolved questions. This requires careful thinking and possibly tweaking the way the macro parser works. If feel that we can't do this right now, but @nikomatsakis is the person to ask about this. What sounds like a good plan to me is landing this as an opt-in lint and maybe adding this trick later. This way there won't be any unnecessary breakage.

The FIRST sets cannot be adjusted much, and doing so would have very little impact on the number of regressions (and none on the proportion of false positives).

> Would it help if we added one-or-none or specific-number matchers? Like x? and x{3} in regular expressions. That can help to nail down the number of tokens something will parse.

No, it wouldn't. The problem is not that the number of tokens is too large; it's that we can't know it for every possible input. Plus, I feel like those would be pretty niche cases.

> To go along with the language's goal of stability, if we do something like this I would like there to be a policy decision stipulating that if a syntax change un-future-proofs macros (i.e. adds something to a FIRST set that wasn't there before) then we won't do it.

Of course, that's the goal of this RFC. We're not supposed to add anything to the FIRST sets after they are accepted. :)

@nrc added the T-lang label (Relevant to the language team, which will review and decide on the RFC) on Sep 11, 2016
@nikomatsakis

@LeoTestard

> This requires careful thinking and possibly tweaking the way the macro parser works. If feel that we can't do this right now, but @nikomatsakis is the person to ask about this.

What did you mean by "If feel"? "I feel"? :) Presuming you meant "I feel", can you say a bit more about why you feel that way (not saying I disagree).

@strega-nil

ping @LeoTestard @nikomatsakis

Status?

@LeoTestard (author)

> What did you mean by "If feel"? "I feel"? :) Presuming you meant "I feel", can you say a bit more about why you feel that way (not saying I disagree).

I'm not sure, that's why I said "feel". :D In fact, it was mostly based on the fact that I did not understand 100% how the macro parser works, and that it was the end of my internship and I did not have time to think about all the corner cases that might happen (in particular, I was thinking about sequence repetitions; for example, there might be problems related to rust-lang/rust#33840). I'm sorry about that. Maybe I can now try to find time in my spare time to think about it in more detail, or at least try to sum up the current state of things so that someone else can continue.

@aturon commented Feb 1, 2017

@LeoTestard @pnkfelix What's the status of this RFC?

@nikomatsakis

So one thing that seems worth mentioning here -- I have been talking to @jseyfried and @nrc in the context of macros 2.0. In that setting, I definitely want to change the meaning of $t:expr to just mean "scoop up all tokens until an expr separator is found" -- it would just let you copy those tokens somewhere else (anywhere else, really). If you happen to paste them into a Rust expression context, they will be parsed as an expression at that point. The macro parser will never run the parser itself, in other words.

I would like to do the same for existing macro_rules, but I'm not sure of the impact.
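A minimal sketch of that "scoop up tokens" idea (my illustration, not the actual macro-parser code), using the stable FOLLOW set for expr (`=>`, `,`, `;`) and the fact that delimited groups are single token trees:

```rust
// Capture an expr fragment by taking whole token trees until a
// FOLLOW(expr) token appears at the top level. No expression grammar
// is consulted, so later grammar extensions cannot change the capture.
#[derive(Clone, Debug, PartialEq)]
enum TokenTree {
    Token(String),              // a single token, e.g. "a", "+", ";"
    Delimited(Vec<TokenTree>),  // (...), [...] or {...}: opaque here
}

fn scoop_expr(input: &[TokenTree]) -> (&[TokenTree], &[TokenTree]) {
    let is_separator = |tt: &TokenTree| {
        matches!(tt, TokenTree::Token(t) if t == "=>" || t == "," || t == ";")
    };
    let end = input.iter().position(is_separator).unwrap_or(input.len());
    input.split_at(end) // (captured fragment, separator and the rest)
}

fn main() {
    use TokenTree::*;
    // a + (b, c) ; d  : because the comma sits inside a delimited
    // group, the capture stops at the top-level `;`.
    let input = vec![
        Token("a".into()),
        Token("+".into()),
        Delimited(vec![Token("b".into()), Token(",".into()), Token("c".into())]),
        Token(";".into()),
        Token("d".into()),
    ];
    let (expr, rest) = scoop_expr(&input);
    assert_eq!(expr.len(), 3);
    assert_eq!(rest.first(), Some(&Token(";".into())));
}
```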

@durka commented Feb 1, 2017 via email

@jseyfried

@durka

Yeah, we're planning on fixing this (i.e. allowing reparsing) in macros 2.0, at least in certain contexts.

More specifically, we don't want to treat `$e` exactly like its underlying tokens, since we would like e.g. `macro m($e:expr) { 2 * $e } m!(1 + 1);` to be 4 (not 3), but we want to move in that direction.

cf. rust-lang/rust#26361
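The macro_rules analogue of the 4-not-3 point is runnable today (my example):

```rust
// Because $e is captured as one expression fragment, substitution
// preserves its grouping: the expansion behaves like 2 * (1 + 1).
macro_rules! double {
    ($e:expr) => { 2 * $e };
}

fn main() {
    assert_eq!(double!(1 + 1), 4); // raw token pasting would give 3
}
```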

@nikomatsakis

> I would like to do the same for existing macro_rules, but I'm not sure of the impact.

We'll be able to run Crater and find out soon, once I land some more proc-macro groundwork :)

@eddyb commented Feb 1, 2017

Do we need to specify something like `:expr` if we require separators anyway?
Can't we have something closer to regular expressions?

@nikomatsakis

@eddyb

> Can't we have something closer to regular expressions?

I have wondered the same thing. I still think it's useful to have expr and ty (ty in particular would enable counting <...> as a "pseudo-token tree", which cannot be expressed by a regular expression) as shorthands for well-known terminators and so forth, but it does seem like they could be sugar for a more general matcher syntax. It might, however, be OK not to define that syntax initially. =)
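A sketch of what counting `<...>` while scooping a ty fragment could look like (my illustration; the real rules would also need the full FOLLOW(ty) set and the `>>`-splitting wrinkle raised below):

```rust
// Treat `<` and `>` as a pseudo open/close pair while capturing a ty
// fragment token-by-token, stopping on a separator (just `,` here)
// only at angle-depth zero.
fn scoop_ty<'a>(tokens: &[&'a str]) -> (Vec<&'a str>, usize) {
    let mut depth = 0usize;
    for (i, &t) in tokens.iter().enumerate() {
        match t {
            "<" => depth += 1,
            ">" => depth = depth.saturating_sub(1),
            "," if depth == 0 => return (tokens[..i].to_vec(), i),
            _ => {}
        }
    }
    (tokens.to_vec(), tokens.len())
}

fn main() {
    // HashMap<K, V>, rest : the comma inside the angle brackets is at
    // depth 1, so the capture runs through the closing `>`.
    let toks = ["HashMap", "<", "K", ",", "V", ">", ",", "rest"];
    let (ty, stop) = scoop_ty(&toks);
    assert_eq!(ty, ["HashMap", "<", "K", ",", "V", ">"]);
    assert_eq!(toks[stop], ",");
}
```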

@eddyb commented Feb 12, 2017

@nikomatsakis What about allowing macros to further match on an expression's tokens? Right now you can't "peer into" a `$x:expr` because it becomes a single token that contains an Expr node, and in the future we would have a TokenStream, but it would also be "packaged up" to avoid misparsing it.

Should we always wrap such matches in (...), so that you can explicitly write ($x:expr + $y:expr) to match on an addition expression, for example, and if you don't "open" the parens it's the same as today?

FWIW I'm fine with having :ty handle <...> and other ones handling types which may appear in them inside ::<...> and in special locations… wait a second:

  • we've mentioned cover grammars in the past for parsing Ty | Expr
  • but only for the full parsing, not the reduced (shallow) parse rules
  • a < b > c doesn't parse in Rust; it requires wrapping either comparison in parens (see the sketch after this list)
  • types never contain unwrapped expressions, e.g. [T; N] and maybe Array<T, {N+1}> in the future
  • therefore, < can always be parsed in "balanced mode", and running out of tokens first can be ignored
  • not sure of all the ways <<, >> and types can interact, though (e.g. we allow << to mean < <)
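Concretely, for the chained-comparison bullet above (my example):

```rust
// `a < b > c` is rejected outright ("comparison operators cannot be
// chained"), which is what makes treating `<` as an opening bracket
// viable. Parenthesizing either comparison restores a parse:
fn demo(a: i32, b: i32, c: bool) -> bool {
    // a < b > c   // error: comparison operators cannot be chained
    (a < b) > c    // fine: bool implements PartialOrd
}

fn main() {
    assert!(!demo(3, 1, true)); // (3 < 1) is false, and false > true is false
}
```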

@eddyb commented Feb 12, 2017

In an expression context, types can only show up after ::, : or as, right?
Still, that doesn't help a << b, c >> d which you may want to parse with $x, $y. Oh. Uhhh.

Even a < b, c > d can show up in a lot of places and we don't ban that.
I suppose this is why that hack @bstrie or @Kimundi and I came up with long ago didn't pan out.
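The `a << b, c >> d` problem in macro form (my example): the expr parser reads the `<<` and `>>` as shifts, while a "balanced `<...>`" rule would want to pair them as brackets, and both readings are locally plausible.

```rust
macro_rules! two_exprs {
    ($x:expr, $y:expr) => { ($x, $y) };
}

fn main() {
    let (a, b, c, d) = (1u32, 1u32, 8u32, 1u32);
    // $x captures `a << b` and $y captures `c >> d`: two shift
    // expressions, not one generics-like grouping of the brackets.
    assert_eq!(two_exprs!(a << b, c >> d), (2, 4));
}
```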

@nikomatsakis

@eddyb

> Should we always wrap such matches in (...), so that you can explicitly write ($x:expr + $y:expr) to match on an addition expression, for example, and if you don't "open" the parens it's the same as today?

I don't understand what you're proposing here. Can you elaborate with an example? On first glance, it makes me quite nervous. I'm not sure how we would decide, given some stream of tokens like a + b + c, what $x and $y match here, at least not without parsing? (And I don't want to parse when doing macro matching, which should avoid all questions of whether changing the grammar will change how macros work.)

@eddyb commented Feb 13, 2017

Oh, never mind: fragments are too greedy for $x:expr + ..., and (...) doesn't work for things like items, so I can't use a more structured example. Probably a bad idea anyway.

@pnkfelix commented Mar 6, 2017

@aturon @nikomatsakis My take on the status of this RFC (all my opinion, and I'm open to being convinced otherwise):

I don't think we're going to be able to reasonably change the semantics of existing macro_rules to reject the macros that violate the rules here.

If we deploy the checks defined here as an opt-in lint, it won't pay for the implementation and support effort (because I expect relatively few developers to actually opt into using the lint).

What I would primarily aim to do is ensure that the design of macros 2.0 does not fall into the same pitfalls that we hit in macro_rules.

From the comments above, it seems like @jseyfried and @nrc are planning to ensure they are future-proof by using the approach described by @nikomatsakis, which, if I understand correctly, effectively decouples the meaning of the expr fragment from the meaning of expr in the Rust grammar itself, and thus makes macros inherently future-proof (at least with respect to changes to the expr non-terminal in the language).

Presumably we should ensure that all fragment specifiers are similarly decoupled from the grammar of the language?

@nikomatsakis

@pnkfelix

> Presumably we should ensure that all fragment specifiers are similarly decoupled from the grammar of the language?

I think so, with the exception that $t:ty ought to at least count < and > and consider them to be "open-close" delimiters of a kind (so it's not as simple as "read token-trees until a follow-set entry is found" in that case).

In any case I believe I agree with your overall conclusions.

@eddyb commented Mar 6, 2017

@nikomatsakis I think types in expressions have a similar problem, sadly.

@nikomatsakis

@rfcbot fcp close

Given @pnkfelix's comment, my feeling is that we ought to close this RFC, and instead just try to avoid these mistakes in macros 2.0. We have already adopted the position (e.g. when we adopted the : type ascription operator) that we will allow ourselves to modify our expression grammar -- even though this can cause breakage in macros. In other words, it is the macro author's responsibility to avoid complications that might arise, by suitably bracketing their expressions and so forth. So far this hasn't really caused much difficulty, in part because we're not modifying our grammar all the time -- and we can certainly do warning periods and the like.
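As a gloss on "suitably bracketing their expressions" (my example, not from the thread): defensive parentheses at fragment boundaries are the traditional style, even though expr fragments already preserve their grouping on expansion.

```rust
macro_rules! scaled {
    // Author-side bracketing: parenthesize fragments in the expansion
    // so no operator, present or future, can associate across the
    // fragment boundary. (Belt-and-braces, given fragment grouping.)
    ($e:expr, $k:expr) => { ($e) * ($k) };
}

fn main() {
    assert_eq!(scaled!(2, 3 + 1), 8); // 2 * (3 + 1), not 2 * 3 + 1
}
```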

@rfcbot commented Mar 6, 2017

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nikomatsakis

@eddyb hmm, that's an interesting point. I hadn't considered that. =(

I'm not sure if it affects the idea that we ought to close this RFC, but it's certainly a thorny complication.

@rfcbot commented Apr 19, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot added the final-comment-period label (Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised) on Apr 19, 2017
@rfcbot commented Apr 29, 2017

The final comment period is now complete.

@aturon commented May 1, 2017

The FCP has elapsed with no further commentary. I'm going to go ahead and close. Thanks @LeoTestard for your work here; hopefully we'll be in a better position with macros 2.0.
