Amend macro future proofing RFC #733

Closed
wants to merge 4 commits into
from

Projects

None yet
@vadimcn
Contributor
vadimcn commented Jan 24, 2015

Proposed changes:

  • Add Semicolon to FOLLOW(ty).
  • Modify algorithm to allow repetitions after NTs.

(rendered)

@vadimcn vadimcn referenced this pull request in rust-lang/rust Jan 24, 2015
Closed

Follow token checking is too restrictive. #21545

@cmr
cmr commented on 2356418 Jan 28, 2015

+1, obvious oversight

@nikomatsakis nikomatsakis was assigned by nrc Jan 29, 2015
@nikomatsakis
Contributor

It seems like the RFC is incomplete: FIRST needs to be spelled out.

@vadimcn
Contributor
vadimcn commented Jan 30, 2015

@cmr, I would like to clarify something:
The RFC does not mention handling of TtDelimited token trees. Are they supposed to be implicitly flattened into the sequence of constituent tokens? If so, shouldn't the last line of this section say "(Note that open and close delimiters are valid following any NT.)" ?

@vadimcn
Contributor
vadimcn commented Jan 30, 2015

Okay, updated my PR.

@nikomatsakis
Contributor

@vadimcn thanks. I'll try to read it and give it some thought. In the meantime, though, @pnkfelix is going to shepherd this PR.

@cmr
Contributor
cmr commented Jan 30, 2015

@vadimcn Yes, your understanding is correct.

@mzabaluev
Contributor

Could this help solve #773?

@pnkfelix pnkfelix and 1 other commented on an outdated diff Feb 16, 2015
text/0550-macro-future-proofing.md
-3. Else, `T` is a complex NT.
- 1. If `T` has the form `$(...)+` or `$(...)*`, run the algorithm on the
- contents with `F` set to the token following `T`. If it accepts,
- continue, else, reject.
- 2. If `T` has the form `$(...)U+` or `$(...)U*` for some token `U`, run
- the algorithm on the contents with `F` set to `U`. If it accepts,
- check that the last token in the sequence can be followed by `F`. If
- so, accept. Otherwise, reject.
+`CHECK(M, F):` *input*: a sequence of tokens `M` and a set of tokens `F`; *output*: whether M is valid.
+ 1. If `M` is empty, accept.
+ 2. Set `t = HEAD(M)`
+ 3. If `t` is not an NT, skip to 7.
+ 4. Find `S`, the set of possible successors of `t`:
+ 1. If `TAIL(M)` is empty, set `S = F`,
+ 2. Else, `S = LEADERS(TAIL(M))`.
+ 5. If `t` is a simple NT, check that `S` is a subset of `FOLLOW(t)`.
@pnkfelix
pnkfelix Feb 16, 2015 Member

what is the distinction between simple and complex non-terminal? I do not see that in the "Terminology" section.

Update: Sorry, I see the definition for "simple", up above.

@vadimcn
vadimcn Feb 16, 2015 Contributor

I just used the original @cmr's definitions of 'simple' and complex'. I'll add a clarification as you asked above.

@pnkfelix pnkfelix and 2 others commented on an outdated diff Feb 16, 2015
text/0550-macro-future-proofing.md
+ 1. If `M` is empty, accept.
+ 2. Set `t = HEAD(M)`
+ 3. If `t` is not an NT, skip to 7.
+ 4. Find `S`, the set of possible successors of `t`:
+ 1. If `TAIL(M)` is empty, set `S = F`,
+ 2. Else, `S = LEADERS(TAIL(M))`.
+ 5. If `t` is a simple NT, check that `S` is a subset of `FOLLOW(t)`.
+ If so, skip to 7, else, reject.
+ 6. Else, `t` is a complex NT.
+ 1. If `t` has the form `$(Q)+` or `$(Q)*`, run `CHECK(Q, S)`.
+ If it accepts, skip to 7, else, reject.
+ 2. If `t` has the form `$(Q)u+` or `$(Q)u*` for some token `u`,
+ run `CHECK(Q, S + {u})` If it accepts, skip to 7, else, reject.
+ 7. Set `M = TAIL(M)`, goto 1.
+
+`LEADERS(S):` Returns the set of all possible tokens that may begin input sequence matched by `S`.
@pnkfelix
pnkfelix Feb 16, 2015 Member

Is this the definition for LEADERS ?

@nikomatsakis has pointed out to me that that it seems like there is detail missing ... for example what is LEADERS($foo:expr) ?


Update: In particular, it sounds like expr is a simple NT? But shouldn't LEADERS($foo:expr) be a non-trivial set of tokens, and not just ... what, { $foo : expr } ?

I'll try to take some time either today or tomorrow and find a definition of FIRST as given in the Dragon book or elsewhere to attempt to help fix this.

@nikomatsakis
nikomatsakis Feb 16, 2015 Contributor

(I'd also prefer to call this set FIRST unless we think it is a distinct entity?)

@vadimcn
vadimcn Feb 16, 2015 Contributor

Is this the definition for LEADERS ?

You mean the line above this comment? No, that's just a one-line summary. The detailed definition follows.

Update: In particular, it sounds like expr is a simple NT? But shouldn't LEADERS($foo:expr) be a non-trivial set of tokens, and not just ... what, { $foo : expr } ?

If I am not mistaken,$foo:expr gets parsed as a single token, MatchNt(Ident("foo"), Ident("expr"), ..), so LEADERS($foo:expr) would just return that token.
Since there are no NT's in any of the FOLLOW sets, the distinction between MatchNT and other tokens is irrelevant, IMO. Although... we might consider making exception for block, because they always start with a {.

(I'd also prefer to call this set FIRST unless we think it is a distinct entity?)

I prefer 'LEADERS', because 'FIRST' has no plural form, and I wanted to emphasize that it's a set, not a single token. But, sure, I can change that to FIRST.

@pnkfelix pnkfelix commented on an outdated diff Feb 16, 2015
text/0550-macro-future-proofing.md
@@ -74,31 +74,36 @@ The algorithm for recognizing valid matchers `M` follows. Note that a matcher
is merely a token tree. A "simple NT" is an NT without repetitions. That is,
`$foo:ty` is a simple NT but `$($foo:ty)+` is not. `FOLLOW(NT)` is the set of
allowed tokens for the given NT's fragment specifier, and is defined below.
@pnkfelix
pnkfelix Feb 16, 2015 Member

Add a sentence "A 'complex NT' is an NT that is not simple." (This way someone who does a search for "complex" in your document will find that definition.

@cmr
Contributor
cmr commented Feb 20, 2015

Looks like some rebasings have disappeared some of the comments I previously made. Will re-review, but @pnkfelix's and @nikomatsakis's comments are on-point.

@kmcallister
Contributor

Did not read yet, but +1 for allowing repetition after a metavariable. The inability to do

$x:tt $($xs:tt)*

is a serious limitation with fancy recursive macros. And it should be completely un-problematic in the case of tt.

@vadimcn vadimcn Defined 'Complex NT'.
Renamed `LEADERS` to `FIRST`
8e5b953
@vadimcn
Contributor
vadimcn commented Feb 26, 2015
@aturon
Contributor
aturon commented Mar 5, 2015

ping @pnkfelix

@kennytm
kennytm commented Mar 5, 2015

While this is still open, could we also add | and if to FOLLOW(pat)? (cc rust-lang/rust#20843)

@pnkfelix
Member

Okay, so I finally got a chance to go through this in some detail.

(I had not really read this RFC carefully enough in the past, so I got caught up in relatively minor issues in its presentation that are not an issue with this PR in particular; hopefully I will have a chance in the near future, while this is fresh in my mind, to try to put up a PR with some revisions to the text to help clarify it.)

  • I think it would be good to add some examples of the CHECK algorithm's behavior on various inputs. I know the original RFC did not have this, but I think it should have, and there's no reason we cannot correct that now. I will help with this (i.e. I'll put up some text somewhere for @vadimcn to incorporate.)
  • I think there is a bug in this additions being made here, in a corner-case for the FIRST function. The standard definition for FIRST returns a set made up of terminals plus (potentially) a special "epsilon" marker signifying that the input sequence may rewrite to the empty string. That can arise for us here (despite effort in this definition to try to avoid it), and needs to be handled. I'm about to go have dinner so I cannot put in a formal example, but the kind of scenario I am worried about is something like $($($(i:expr)),* $($(j:expr)),*) something. I will write up more later.
@Stebalien
Contributor

While this is still open, could we also add | and if to FOLLOW(pat)? (cc rust-lang/rust#20843)

Also in for parsing for loop syntax (for $p:pat in iter).

@nrc nrc added the T-lang label May 28, 2015
@huonw
Member
huonw commented Jul 31, 2015

cc #1209

@pnkfelix
Member

@vadimcn have you given any thought to the feedback I wrote above?

(I know I promised to write up more later, so I cannot claim that I am not without blame w.r.t. the radio silence here.)

Anyway, let me know if you think you'll be able to address the issues above. If you won't have the time, I can try to take on this RFC myself (i.e. try to figure out what are the edits that I want, and put my my own PR with them...)

@vadimcn
Contributor
vadimcn commented Aug 1, 2015

@pnkfelix: sorry this slipped through the cracks of my attention. I'll take another look tonight.

@vadimcn
Contributor
vadimcn commented Aug 3, 2015

@pnkfelix: indeed, NT matching an empty string seems to be a problem. I've updated the PR to address this.

@nikomatsakis
Contributor

What's the status here?

@pnkfelix
Member
pnkfelix commented Jan 7, 2016

So I managed to spend ages working on the more invasive macro future-proofing amendment #1384 but I failed to incorporate the suggestions from this RFC in that effort. Sorry about that, I don't know how I overlooked it.

Update 1: Oh wait, no, I did put Semi into FOLLOW(ty) in #1384; I just forgot that I had done so. I'm going to check on how the amendment's handling of repetitions compares to the suggestion here.

@pnkfelix
Member

Okay, #1384 handles repetitions in the manner that I think this PR was asking for:

macro_rules! foo { ($a:expr $( , $b:expr ),*) => {} }

compiles fine on Nightly today (but is rejected by Stable right now, of course).

So, given that we now handle sequence repetitions, and that Semi is in FOLLOW(ty), I think #1384 has subsumed this proposed amendment.

Thank you for your patience, @vadimcn

@pnkfelix pnkfelix closed this Jan 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment