RFC: flexible syntax for macro invocations #2387

Closed
paulstansifer opened this issue May 15, 2012 · 13 comments
Labels
A-grammar Area: The grammar of Rust A-syntaxext Area: Syntax extensions C-enhancement Category: An issue proposing an enhancement or a PR with one.
Comments

@paulstansifer
Contributor

Currently, macro invocations piggyback off an existing syntactic form, the array literal. We'd like more flexibility.

Macro invocation syntax

The proposed invocation syntax will extend the grammar roughly as follows. (The exact syntax for identifying an invocation will be decided later; however, invocations will need to be distinguishable from function invocations at parse time.)

Expr ::= ... | Identifier "{" Balanced* "}"
Balanced ::= "(" Balanced* ")" | "[" Balanced* "]" | "{" Balanced* "}" | AnyOtherToken
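
To make the Balanced production concrete, here is a minimal sketch in present-day Rust of a recursive parser that groups a pre-lexed token list into balanced trees. The names `Balanced`, `closer`, and `parse_balanced` are hypothetical and chosen for illustration; they are not taken from rustc.

```rust
/// One `Balanced` item: a delimited group or any other single token.
/// (Hypothetical type for illustration, not a rustc data structure.)
#[derive(Debug, Clone, PartialEq)]
enum Balanced {
    /// Opening delimiter and the items between it and its matching closer.
    Group(char, Vec<Balanced>),
    /// Any token that is not a delimiter.
    Token(String),
}

/// Returns the closing delimiter that matches an opening one, if `tok` opens a group.
fn closer(tok: &str) -> Option<&'static str> {
    match tok {
        "(" => Some(")"),
        "[" => Some("]"),
        "{" => Some("}"),
        _ => None,
    }
}

/// Parses `Balanced*` from pre-lexed tokens. Stops after consuming `until`,
/// or at end of input when `until` is `None`.
fn parse_balanced<'a, I>(tokens: &mut I, until: Option<&str>) -> Result<Vec<Balanced>, String>
where
    I: Iterator<Item = &'a str>,
{
    let mut out = Vec::new();
    while let Some(tok) = tokens.next() {
        if Some(tok) == until {
            return Ok(out);
        }
        if let Some(close) = closer(tok) {
            // Recurse to collect everything up to the matching closer.
            let inner = parse_balanced(&mut *tokens, Some(close))?;
            out.push(Balanced::Group(tok.chars().next().unwrap(), inner));
        } else if matches!(tok, ")" | "]" | "}") {
            return Err(format!("unexpected closing delimiter `{}`", tok));
        } else {
            out.push(Balanced::Token(tok.to_string()));
        }
    }
    match until {
        None => Ok(out),
        Some(close) => Err(format!("missing closing delimiter `{}`", close)),
    }
}
```

Calling `parse_balanced(&mut tokens, None)` on the token list of a macro body yields the tree that a macro's own grammar would later re-parse.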

Parsing macro invocations

The tricky part is having macros consume Balanceds in a useful way. An example invocation to an example macro:

my_let {
    x := 4*7;
    y := str::len("(-:") + 18;
    x + (y*x)
}

Here's how we'd like to define my_let (rep() is like Macro by Example's ...):

pat_macro {
    my_let { /*BNF-like notation here*/ 
         rep(var=Identifier ":=" val=Expr ";") body=Expr
    }
    => /* transcribe this, with interpolation of `var`, `body`, and `val`*/
    { |rep(var)| body } (rep(val))
    /* like ((lambda (var ...) body) val ...) */
}
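
For comparison, a similar `my_let` can be sketched with the `macro_rules!` system that Rust later shipped. This is an illustration under assumptions, not the `pat_macro` design itself: it peels off one binding at a time instead of using a `rep()`-style repetition, it expands to nested `let` blocks rather than the closure application shown above (so later bindings can see earlier ones), and it uses `"(-:".len()` in place of `str::len`. The `demo` function is hypothetical.

```rust
macro_rules! my_let {
    // One `var := val;` binding followed by more input: bind it, then recurse.
    ($var:ident := $val:expr ; $($rest:tt)+) => {{
        let $var = $val;
        my_let!($($rest)+)
    }};
    // No bindings left: what remains is the body expression.
    ($body:expr) => { $body };
}

/// Hypothetical usage mirroring the invocation in the issue.
fn demo() -> usize {
    let result = my_let! {
        x := 4 * 7;
        y := "(-:".len() + 18;
        x + (y * x)
    };
    result
}
```

The second rule is only reached when the first one fails on the literal `:=`, which is how the macro tells a binding apart from the final body.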

Proposed implementation

I believe that this can be implemented in a minimally-invasive way. pat_macro will be a syntax extension which takes a BNF-like notation for the invocation parser on the inside of the <macro_name>{}, and a Balanced on the right side of the =>. (The only reason not to parse it as an Expression is that rustc has no data structure for incomplete ASTs.) (It would be friendly to also parse it as an expression, using dummy values for interpolated syntax, to check that it will parse correctly.)

At macro expansion time, the Balanced will need to be parsed (well, re-parsed) according to the grammar of the macro. We can do this by building a lexer that takes a Balanced instead of a string as input. The parser will interpret the macro's BNF-like pattern, delegating to the Rust parser for things like Identifier and Expr.

There are two ways for the shim lexer to deal with interpolated syntax. The bad one is to pretty-print the interpolated ASTs and re-lex them before sending them to the parser again. The better one is to use special tokens to hand the parser pre-parsed ASTs for it to return immediately.
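
The "special token" approach can be illustrated with a toy grammar. In this sketch the token type has an extra variant that carries an already-parsed AST, and the parser returns that AST as soon as it sees the token instead of re-lexing anything. The types `Ast` and `Tok` and the functions `parse_expr` and `parse_operand` are hypothetical; the real Rust parser is far larger.

```rust
/// Toy AST: integer literals and addition only.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Num(i64),
    Add(Box<Ast>, Box<Ast>),
}

/// Toy token type with one special variant for interpolated syntax.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    /// Carries a pre-parsed AST that the parser hands back unchanged.
    Interpolated(Ast),
}

/// Parses one operand: a number, or a pre-parsed AST returned immediately.
fn parse_operand(tok: Option<&Tok>) -> Result<Ast, String> {
    match tok {
        Some(Tok::Num(n)) => Ok(Ast::Num(*n)),
        Some(Tok::Interpolated(ast)) => Ok(ast.clone()),
        other => Err(format!("expected operand, found {:?}", other)),
    }
}

/// Parses `operand ("+" operand)*`, left-associatively.
fn parse_expr(toks: &[Tok]) -> Result<Ast, String> {
    let mut it = toks.iter();
    let mut lhs = parse_operand(it.next())?;
    while let Some(tok) = it.next() {
        if *tok != Tok::Plus {
            return Err(format!("expected `+`, found {:?}", tok));
        }
        let rhs = parse_operand(it.next())?;
        lhs = Ast::Add(Box::new(lhs), Box::new(rhs));
    }
    Ok(lhs)
}
```

Because the interpolated AST is never pretty-printed, it cannot be re-parsed with a different grouping than the one it originally had.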

Possible extensions

Syntax for lexer-skipping syntax extensions

If we remove # from ordinary macro invocation syntax, we can use it to provide quotation for un-lexed syntax. Delimiters would work in a Perl-like fashion:

#regex(\w+\s*) //parens inside must match
#regex|\\w+\\s*| //backslashes escape delimiter
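
A rough sketch of how such un-lexed bodies might be extracted, in present-day Rust. It assumes two rules that the examples above only hint at: in the paren form, nested parentheses must balance and nothing is escaped; in the pipe form, a backslash makes the next character literal, so `\\` yields one backslash and `\|` yields a pipe. The function name `raw_body` is hypothetical.

```rust
/// Given the text immediately after the macro name, returns the raw body
/// and the remaining input, or `None` if the body is not terminated.
fn raw_body(input: &str) -> Option<(String, &str)> {
    let mut chars = input.char_indices();
    let (_, open) = chars.next()?;
    match open {
        '(' => {
            // Paren form: track nesting depth, no escape processing.
            let mut depth = 1usize;
            for (i, c) in chars {
                match c {
                    '(' => depth += 1,
                    ')' => {
                        depth -= 1;
                        if depth == 0 {
                            return Some((input[1..i].to_string(), &input[i + 1..]));
                        }
                    }
                    _ => {}
                }
            }
            None
        }
        '|' => {
            // Pipe form: backslash makes the following character literal.
            let mut body = String::new();
            while let Some((i, c)) = chars.next() {
                match c {
                    '\\' => {
                        let (_, escaped) = chars.next()?;
                        body.push(escaped);
                    }
                    '|' => return Some((body, &input[i + 1..])),
                    _ => body.push(c),
                }
            }
            None
        }
        _ => None,
    }
}
```

Under these assumed rules, both example invocations above hand the same text, `\w+\s*`, to the regex extension.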

String-examining/lexer-skipping macros

pat_macro {
    fmt { format=StringContents("%" spec=Letter | percent="%%" | literal=NegativeCharClass("%")) "," rep(arg=Expr, ",") }
    => /* ??? */
}
fmt{"Look at this number: %u", 18u}

Making macros look inside strings should be fairly simple, but most practical applications will probably require lots more power from the macro system.

Invocations at non-expression position

What if we want macros to generate non-expressions (especially items, types)? It seems like we need a separate invocation form for every nonterminal we want to extend. Fortunately, expressions cover a lot of the interesting territory.

@ghost ghost assigned paulstansifer May 15, 2012
@paulstansifer
Contributor Author

Minor note: I'm hoping to be able to write rep(var) instead of rep(var, ",") using a token::THE_CORRECT_SEPARATOR which parse_seq would understand. However, this is potentially ambiguous, so we might not be able to.

@nikomatsakis
Contributor

I would really like to be able to write macros in non-expression positions (particularly items). Is this as simple as renaming pat_macro to something like define_expr_macro and then having def_item_macro as well, which just parses the output token stream as an item?

@nikomatsakis
Contributor

At least, I think I would like that.

@paulstansifer
Contributor Author

That would work. But the invocation syntax might have to be different in order to be unambiguous.

@nikomatsakis
Contributor

One note on the Identifier { ... } syntax. Macro text cannot begin with || (or even |) as it will be interpreted as a sugared closure syntax like:

spawn {|| ... }

Also, I just realized that we have discussed a class literal syntax of C { f: ..., g: ... }. But this is not implemented and still under discussion.

@pcwalton
Contributor

Note that modifying the sugared closure syntax to |x| { ... } (with map(): |x| { ... } for the Ruby block notation) fixes the issue of macros beginning with |.

This is dangerously close to coupling proposals, but I thought I'd get it out there in case we decide that we need macro bodies that can begin with |.

@paulstansifer
Contributor Author

Fortunately, the interesting work on this is independent of the specific invocation syntax. That said, I'd hate to special-case-forbid some syntaxes.

@graydon
Contributor

graydon commented Jun 5, 2012

A few comments:

  • I like the move of changing from expression-arguments to balanced-token-list arguments. That is a good call. Also neatly solves the quoting problem for different grammar classes. 100% in favour of that.
  • I dislike removing # from invocation position. I think that makes invocations too non-obvious.
  • Item, type and pattern-position macros are a requirement in any new work. Can't just be exprs.
  • I suspect it's advantageous to make the Balanced* productions be Balanced [',' Balanced]*, i.e. to make the grammar pre-cluster sequences of un-parenthesized balanced clusters using commas, the way the C preprocessor does. This requires users to parenthesize only when they expect to have an un-parenthesized comma in a cluster, and makes it more likely that users can pass un-parenthesized expressions as single macro args (eg. the "second argument" to #fmt("%d", 1 + 2)), and just not notice that they're treated as balanced-lexeme-lists rather than exprs until expansion is complete.
  • I still think #foo(...) is a reasonable looks-function-call-like outermost syntax for the most common balanced-token-list form, with #foo{...} for the rarer character-level form. I don't feel inclined to tinker with those. They work already.
  • I assume part of the motive on that change was to satisfy the example at the top, where a macro-invocation looks like a block. I think this is unlikely to work; or if it's going to work you should be explicit about that goal, eg. say that #foo{...} is a block-like invocation, requiring no trailing semi, and #foo(...) is a call-like invocation, requiring a trailing semi (and then bump the custom-lexing case over to #foo[...] or #foo"..." or something). That's a plausible path to go down. Just keep in mind that the parser has to decide somewhat early where we have a block and where we need a semi.
  • If you do the above, possibly the pre-clustering with commas only makes sense for #macro(...), i.e. those in call-like invocation position. Block-like macros would make more sense to pre-cluster using ... the semi rules :)
  • The main UI experience we have so far is that #macro is hard to remember how to use, so people skip using it. It's modeled on syntax-case, and I think that while that is appealing to schemers, it is more power than users need in a single macro-definition, most of the time. I think it might be reasonable to assume one macro-defining call defines one pattern => result pair, and require the user to write multiple macro-defining calls (or maybe a #macro_extend call) in order to add more cases to a macro. That is, I'd like to see if you can aim for a UI space less complex than syntax-case, and only slightly more powerful than the C preprocessor's #define (eg. with variadic macros, possibly some tools to help with hygiene).
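
The comma pre-clustering suggested in the list above can be sketched as a small utility over balanced token trees. This is only an illustration in present-day Rust: the `Balanced` type and the `cluster_by_commas` function are hypothetical, and the sketch assumes the argument list has already been grouped so that commas inside parentheses, brackets, or braces sit inside a `Group`.

```rust
/// Hypothetical balanced token tree: a delimited group or a single token.
#[derive(Debug, Clone, PartialEq)]
enum Balanced {
    Group(char, Vec<Balanced>),
    Token(String),
}

/// Splits a macro's argument list into clusters at top-level commas.
/// Commas nested inside a `Group` are left alone, as in the C preprocessor.
fn cluster_by_commas(args: &[Balanced]) -> Vec<Vec<Balanced>> {
    if args.is_empty() {
        return Vec::new();
    }
    args.split(|item| matches!(item, Balanced::Token(t) if t == ","))
        .map(|cluster| cluster.to_vec())
        .collect()
}
```

With this, the arguments of #fmt("%d", 1 + 2) fall into two clusters, and the second one (`1`, `+`, `2`) can be handed to the expression parser without the user adding parentheses.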

@paulstansifer
Contributor Author

  • I'm neutral on the issue of #. It'd make the parsing situation simpler, but it's nice for macros to look "first-class"
  • Other invocation positions are simple, except that there needs to be unambiguous syntax for them.
  • I'd rather not specify anything about the argument grammar by default. If the user wants comma-separated arguments, they can specify it. The parser can consume multiple Balanceds, so extra parentheses wouldn't be needed unless the grammar was ambiguous (at least locally).

I'm going to put off thinking about notational issues about delimiters for the moment.

  • Perhaps we could have syntax_pat and syntax_pats to make the typical, single-definition case easier?

@graydon
Contributor

graydon commented Jun 7, 2012

  • The issue of # is not about making parsing simpler, it's making macros not look first class. Intentionally. Because all bets are off when it comes to interpreting their contents. #fmt doesn't even construct a string literal at runtime. #log is going to lazily-evaluate its arguments. #rx is going to compile a regular expression matcher. These are quite non-obvious and not something that will jump out at the reader if they just glance over the expressions making up the arguments. The # is a clue to the reader not to expect normal evaluation rules inside. Making things look like normal-evaluation-rules is, in my mind, an anti-feature. I realize the team has differing opinions about this, but my preference has been pretty consistent from the start.
  • Concerning commas, I suppose it's sufficient for the macro-defining extension to interpret the commas, or for there to be a cluster-by-commas utility in the extension-writing toolkit; I just want it to be likely that in normal modes of use, users don't have to parenthesize expression arguments to function-like macros, and people implementing extensions don't have to bend over backwards to get back to "list of expressions", when expressions were in fact what got passed.
  • Concerning #syntax_pat and #syntax_pats .. I'm curious if #macro (and perhaps #macros or #multi_macro or something) feels inappropriate. I assume this work is going to replace the current #macro extension anyway. Why rename it?

@paulstansifer
Contributor Author

Implemented. The extensions are still relevant, however.

@nejucomo

Hi, I'm new to rust and a bit contextually challenged. Are the macros under discussion here deprecated in favor of the foo!(a, b) style? I just learned that style tonight from this: https://gist.github.com/3421238

In either case, I have two wishlist items:

hygienic macros - I assume everyone is on board with this?

scoping definitions like other definitions - You can define a macro that is exported/imported across module boundaries and also you can define macros locally to functions or blocks.

Is this the right venue for these kinds of wishlist items?

@bstrie
Contributor

bstrie commented Aug 22, 2012

@nejucomo, here's a talk by @paulstansifer that discusses implementing hygienic macros (at the 15:50 mark). As for your other questions, you can generally direct wishlist-style suggestions to the rust-dev mailing list, and file issues for them once you've gotten some positive feedback.

@paulstansifer paulstansifer removed their assignment Jun 16, 2014
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2022
update ui_test readme

I forgot to do that when changing the ignore/only syntax.