Procedural macros #1566

Merged
merged 3 commits on Dec 13, 2016
@nrc
Contributor
nrc commented Apr 1, 2016

This RFC proposes an evolution of Rust's procedural macro system (aka syntax
extensions, aka compiler plugins). It specifies the syntax for defining
procedural macros, gives a high-level view of their implementation in the
compiler, and outlines how they interact with the compilation process.

At the highest level, macros are defined by implementing functions marked with
a `#[macro]` attribute. Macros operate on a list of tokens provided by the
compiler and return a list of tokens that the macro use is replaced by. We
provide low-level facilities for operating on these tokens. Higher-level
facilities (e.g., for parsing tokens to an AST) should exist as library crates.
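
For concreteness, a minimal sketch of a definition under this proposal, assuming the `TokenStream` and `MacroContext` types described later in the RFC (the identity macro simply returns its input unchanged):

```
// Hypothetical definition using the proposed interface; `TokenStream` and
// `MacroContext` are the types sketched in the detailed design below.
#[macro]
pub fn identity(input: TokenStream, _cx: &mut MacroContext) -> TokenStream {
    // `input` holds the tokens between the delimiters of `identity!(...)`;
    // the returned tokens replace the macro use at the call site.
    input
}
```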

@nrc nrc Procedural macros 9c42f45
@oli-obk oli-obk commented on an outdated diff Apr 1, 2016
text/0000-proc-macros.md
+the `#[macro]` and `#[macro_attribute]` attributes may only appear within a
+`#[cfg(macro)]` crate. This has the effect of partitioning crates into macro-
+defining and non-macro defining crates. Macros may not be used in the crate in
+which they are defined, although they may be called as regular functions. In the
+future, I hope we can relax these restrictions so that macro and non-macro code
+can live in the same crate.
+
+Importing macros for use means using `extern crate` to make the crate available
+and then using `use` imports or paths to name macros, just like other items.
+Again, see [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) for more
+details.
+
+When a `#[cfg(macro)]` crate is `extern crate`ed, its items (even public ones)
+are not available to the importing crate; only macros declared in that crate.
+The crate is dynamically linked with the compiler at compile-time, rather
+than with the importing crate at runtime.
@oli-obk
oli-obk Apr 1, 2016 Contributor

This should come hand in hand with a warning/lint about adding public items in a #[cfg(macro)] crate
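
For illustration, a hedged sketch of the two-crate split described in the quoted text, with hypothetical crate and item names:

```
// Crate `my_macros`, compiled as a macro-defining crate.
#![cfg(macro)]

#[macro]
pub fn my_macro(input: TokenStream, _cx: &mut MacroContext) -> TokenStream {
    // May call ordinary functions in this crate, but cannot use
    // `my_macro!` itself here.
    input
}

// A public non-macro item like this is what the suggested lint would flag:
// importing crates can never see it, since only macros cross the boundary.
pub fn helper() {}
```

On the consuming side, naming the macro works like any other item (per RFC 1561): `extern crate my_macros;` followed by `use my_macros::my_macro;`.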

@oli-obk oli-obk commented on the diff Apr 1, 2016
text/0000-proc-macros.md
+by a lang-item. I'm not sure how beneficial this would be, since a change to the
+signature would require changing much of the procedural macro system. I propose
+leaving them hard-wired, unless there is a good use case for the more flexible
+approach.
+
+
+### Specifying delimiters
+
+Under this RFC, a function-like macro use may use either parentheses, braces, or
+square brackets. The choice of delimiter does not affect the semantics of the
+macro (the rules requiring braces or a semi-colon for macro uses in item position
+still apply).
+
+Which delimiter was used should be available to the macro implementation via the
+`MacroContext`. I believe this is maximally flexible - the macro implementation
+can throw an error if it doesn't like the delimiters used.
@oli-obk
oli-obk Apr 1, 2016 Contributor

I prefer this over hiding it from the macro implementor. I personally think that things like vec!{1, 2, 3} and vec!(5, 6, 7) should be forbidden or at least linted against (for backwards compat).
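
As a sketch of what delimiter inspection could look like (the `MacroContext` API is not specified in this RFC, so `used_delimiter` and `error` are assumed method names):

```
// Hypothetical: rejects `strict_vec!{...}` and `strict_vec!(...)`,
// accepting only `strict_vec![...]`.
#[macro]
pub fn strict_vec(input: TokenStream, cx: &mut MacroContext) -> TokenStream {
    if cx.used_delimiter() != Delimiter::Bracket {
        cx.error("strict_vec! must be invoked with square brackets");
    }
    input
}
```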

@jimmycuadra

Rendered

(I am SOOOOO excited about this. :D)

@steveklabnik steveklabnik commented on an outdated diff Apr 1, 2016
text/0000-proc-macros.md
+
+Rust macros are hygienic by default. Hygiene is a large and complex subject, but
+to summarise: effectively, naming takes place in the context of the macro
+definition, not the expanded macro.
+
+Procedural macros often want to bend the rules around macro hygiene, for example
+to make items or variables more widely nameable than they would be by default.
+Procedural macros will be able to take part in the application of the hygiene
+algorithm via libmacro. Again, full details must wait for the libmacro RFC and a
+sketch is available in this [blog post](http://ncameron.org/blog/libmacro/).
+
+
+## Tokens
+
+Procedural macros will primarily operate on tokens. There are two main benefits
+to this principal: flexibility and future proofing. By operating on tokens, code
@steveklabnik
steveklabnik Apr 1, 2016 Contributor

nit: principle
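
A small example of the default hygiene summarised in the quoted diff above, using today's stable `macro_rules!` (this is current Rust behaviour, not the new API):

```
macro_rules! using_a {
    ($e:expr) => {{
        let a = 42;
        a + $e // the macro's own `a` resolves here...
    }};
}

fn main() {
    // ...but the call site cannot name it:
    // let x = using_a!(a); // error: cannot find value `a` in this scope
    let y = using_a!(1); // expands to { let a = 42; a + 1 }
    println!("{}", y); // 43
}
```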

@steveklabnik
Contributor

I am also psyched to see movement on this 😄

@nrc nrc was assigned by aturon Apr 1, 2016
@seanmonstar seanmonstar and 2 others commented on an outdated diff Apr 1, 2016
text/0000-proc-macros.md
+
+There are two kinds of procedural macro: function-like and macro-like. These two
+kinds exist today, and other than naming (see
+[RFC 1561](https://github.com/rust-lang/rfcs/pull/1561)) the syntax for using
+these macros remains unchanged. If the macro is called `foo`, then a function-
+like macro is used with syntax `foo!(...)`, and an attribute-like macro with
+`#[foo(...)] ...`. Macros may be used in the same places as `macro_rules` macros
+and this remains unchanged.
+
+To define a procedural macro, the programmer must write a function with a
+specific signature and attribute. Where `foo` is the name of a function-like
+macro:
+
+```
+#[macro]
+pub fn foo(TokenStream, &mut MacroContext) -> TokenStream;
@seanmonstar
seanmonstar Apr 1, 2016 Contributor

Alternative: use the macro keyword (it was reserved in 1.0). It would make it feel more part of the language, in my opinion.

pub macro foo(TokenStream, &mut MacroContext) -> TokenStream;
@pczarn
pczarn Apr 4, 2016

What happens to conflicting names of function/attribute-like macros?

#[macro]
pub fn foo(TokenStream, &mut MacroContext) -> TokenStream;
#[macro_attribute]
pub fn foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream;

One possibility is allowing the above, and another is providing access to all data through MacroContext:

#[macro]
#[macro_attribute]
pub fn foo(context: &mut MacroContext) -> TokenStream {
    let tokens = context.token_stream();
    if let Attribute(params) = context.macro_kind() {
        // ...
    } else {
        // ...
    }
}
@nrc
nrc Apr 5, 2016 Contributor

Using macro is appealing, but I'm not sure what we would do for attribute-like macros or macros with an extra ident. Should we depend just on the signature? That feels a bit fragile to me, but might work.

@sfackler sfackler and 1 other commented on an outdated diff Apr 1, 2016
text/0000-proc-macros.md
+
+Procedural macros are currently unstable and are awkward to define. We would
+like to remedy this by implementing a new, simpler system for procedural macros,
+and for this new system to be on the usual path to stabilisation.
+
+One major problem with the current system is that since it is based on ASTs, if
+we change the Rust language (even in a backwards compatible way) we can easily
+break procedural macros. Therefore, offering the usual backwards compatibility
+guarantees to procedural macros would inhibit our ability to evolve the
+language. By switching to a token-based (rather than AST-based) system, we hope
+to avoid this problem.
+
+# Detailed design
+[design]: #detailed-design
+
+There are two kinds of procedural macro: function-like and macro-like. These two
@sfackler
sfackler Apr 1, 2016 Member

"macro-like and attribute-like"?

@pczarn
pczarn Apr 4, 2016

also, "of procedural macros"

@sfackler sfackler commented on the diff Apr 1, 2016
text/0000-proc-macros.md
+### Linking model
+
+Currently, procedural macros are dynamically linked with the compiler. This
+prevents the compiler being statically linked, which is sometimes desirable. An
+alternative architecture would have procedural macros compiled as independent
+programs and have them communicate with the compiler via IPC.
+
+This would have the advantage of allowing static linking for the compiler and
+would prevent procedural macros from crashing the main compiler process.
+However, designing a good IPC interface is complicated because there is a lot of
+data that might be exchanged between the compiler and the macro.
+
+I think we could first design the syntax, interfaces, etc. and later evolve into
+a process-separated model (if desired). However, if this is considered an
+essential feature of macro reform, then we might want to consider the interfaces
+more thoroughly with this in mind.
@sfackler
sfackler Apr 1, 2016 Member

I would very much like to move towards this kind of setup, but I think that the interface proposed here should work just fine in that world so we shouldn't necessarily block on figuring this out right now.

@eddyb
eddyb Apr 2, 2016 Member

We might have to do clever things like use shared memory alongside message-based IPC to get the most out of such a model, but I believe it would be worth it.

Dynamic linking for plugin systems works in practice but just making it safe to unload said plugins is an entire area of design Rust hasn't even touched yet (although lifetimes would play a very important role).

I would rather have an IPC solution that lets me:

  • download a single binary and have a working cross-compiler without even system dependencies (static linking against musl on linux)
  • safely unload plugins after expansion without introducing complexity into the language
  • use a forking zygote to share most of the plugins across compilations (via cargo or RLS)
    • EDIT: This technique can also be used to (relatively) cheaply erase any per-thread/process state in between expansions of the same macro, if we want to completely deny that (it would make order dependence much harder, I think) - since we control the compilation of the macro crate, we could just ban statics altogether, but that doesn't account for them messing with pthreads on their own
  • implement my own libmacro in an alternative Rust compiler
  • more exciting: implement my own libmacro for a Rust tool which is not a full compiler
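
To make the process-separated model concrete, here is a rough sketch of what the messages between the compiler and a macro process could look like. None of this is specified by the RFC; the message set, names, and serialization are all assumptions:

```
// Purely illustrative request/response protocol for an IPC-based model.
enum Request {
    // Expand a function-like macro: the tokens between the delimiters,
    // in some serialized form.
    Expand {
        macro_name: String,
        input: Vec<u8>, // a serialized TokenStream
    },
    Shutdown,
}

enum Response {
    Expanded { output: Vec<u8> }, // a serialized TokenStream
    Error { message: String },
}
```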
@eddyb eddyb and 2 others commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+```
+// We might optimise this representation
+pub struct TokenStream(Vec<TokenTree>);
+
+// A borrowed TokenStream
+pub struct TokenSlice<'a>(&'a [TokenTree]);
+
+// A token or token tree.
+pub struct TokenTree {
+    pub kind: TokenKind,
+    pub span: Span,
+    pub hygiene: HygieneObject,
+}
+
+pub enum TokenKind {
+    Sequence(Delimiter, Vec<TokenTree>),
@eddyb
eddyb Apr 2, 2016 Member

Shouldn't this be Sequence(Delimiter, TokenStream)?

@pczarn
pczarn Apr 4, 2016

What's the point of having delimited sequences? If users are expected to use external libraries to parse AST, surely they can use these libraries to find and match delimiters.

@nrc
nrc Apr 5, 2016 Contributor

We have to tokenise into sequences to parse patterns for macros, so I think the compiler must do it. It is then convenient for macro authors to have that info. Since we can't be more flexible by not doing it, I don't think there is any advantage in keeping that info private. Furthermore, it seems like a win for the macro author to see what the compiler sees, so it is clear why it is being passed the data it is.

Finally, using un-delimited sequences is useful for macros to affect precedence without introducing scopes.

@eddyb
eddyb Apr 5, 2016 Member

IMO the biggest win is making capturing a sequence O(1), assuming a representation which can reuse TokenStreams without copying all the tokens.
Such an example would be a macro taking a large function body but not inspecting it by itself.
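
For a feel of the proposed representation, a sketch of recursively walking it, using the quoted definitions as written (with `Sequence(Delimiter, Vec<TokenTree>)`; under the suggestion above, the inner type would be a `TokenStream` instead):

```
// Counts leaf tokens, descending into (possibly undelimited) sequences.
fn count_tokens(trees: &[TokenTree]) -> usize {
    trees
        .iter()
        .map(|tt| match tt.kind {
            TokenKind::Sequence(_, ref inner) => count_tokens(inner),
            _ => 1,
        })
        .sum()
}
```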

@eddyb eddyb and 2 others commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+pub enum TokenKind {
+    Sequence(Delimiter, Vec<TokenTree>),
+
+    // The content of the comment can be found from the span.
+    Comment(CommentKind),
+    // The Span is the span of the string itself, without delimiters.
+    String(Span, StringKind),
+
+    // These tokens are treated specially since they are used for macro
+    // expansion or delimiting items.
+    Exclamation, // `!`
+    Dollar, // `$`
+    // Not actually sure if we need this or if semicolons can be treated like
+    // other punctuation.
+    Semicolon, // `;`
+    Eof,
@eddyb
eddyb Apr 2, 2016 Member

Do we really need EOF, instead of relying on the TokenStream ending?

@nrc
nrc Apr 5, 2016 Contributor

Good question. I'm not sure. Given that a TokenStream could be internal, it might be useful to know if a stream is ending because of a close delimiter or an EOF. On the other hand, I can't think of a use case for that and it might be better for macro authors not to know about file boundaries.

@Ericson2314
Ericson2314 Apr 5, 2016 Contributor

it might be better for macro authors not to know about file boundaries.

Definitely!

@eddyb eddyb commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+    // ( )
+    Parenthesis,
+    // [ ]
+    Bracket,
+}
+
+pub enum CommentKind {
+    Regular,
+    InnerDoc,
+    OuterDoc,
+}
+
+pub enum StringKind {
+    Regular,
+    // usize is for the count of `#`s.
+    Raw(usize),
@eddyb
eddyb Apr 2, 2016 Member

Maybe use struct variants? Also, it seems string / byte-string is orthogonal to "rawness".

@eddyb eddyb and 1 other commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+    // The Span is the span of the string itself, without delimiters.
+    String(Span, StringKind),
+
+    // These tokens are treated specially since they are used for macro
+    // expansion or delimiting items.
+    Exclamation, // `!`
+    Dollar, // `$`
+    // Not actually sure if we need this or if semicolons can be treated like
+    // other punctuation.
+    Semicolon, // `;`
+    Eof,
+
+    // Word is defined by Unicode Standard Annex 31 -
+    // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
+    Word(InternedString),
+    Punctuation(char),
@eddyb
eddyb Apr 2, 2016 Member

What about <<, >>, ->, <-, and all the compound assign operators?
For example, the following compiles:

macro_rules! just_one_token { ($x:tt) => {} }
just_one_token!(->);
just_one_token!(<<);
just_one_token!(+=);
just_one_token!(>>=);
@nrc
nrc Apr 5, 2016 Contributor

Hmm, I guess it is better to take multiple chars and maintain backwards compatibility here. I had wanted to avoid macros having to split tokens (e.g., << into < < in generics). Combining them where necessary feels less annoying. I'll add some discussion to the RFC.

@eddyb
eddyb Apr 5, 2016 Member

Could be useful to have both a plain iterator and a Parser-like API for consuming punctuation tokens (which would split << into < < for the procedural macro) but that requires the lookahead buffer.
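
A sketch of the splitting behaviour described here, assuming compound operators are carried as multi-character punctuation and consumed through a cursor-style API (all names are hypothetical):

```
// Yields one character of a compound operator at a time, so a parser-like
// consumer can treat `<<` as `<` `<` when closing two levels of generics.
struct PunctCursor<'a> {
    rest: &'a str, // e.g. "<<" or ">>="
}

impl<'a> PunctCursor<'a> {
    fn eat_char(&mut self) -> Option<char> {
        let mut chars = self.rest.chars();
        let c = chars.next()?;
        self.rest = chars.as_str();
        Some(c)
    }
}
```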

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+
+pub enum TokenKind {
+    Sequence(Delimiter, Vec<TokenTree>),
+
+    // The content of the comment can be found from the span.
+    Comment(CommentKind),
+    // The Span is the span of the string itself, without delimiters.
+    String(Span, StringKind),
+
+    // These tokens are treated specially since they are used for macro
+    // expansion or delimiting items.
+    Exclamation, // `!`
+    Dollar, // `$`
+    // Not actually sure if we need this or if semicolons can be treated like
+    // other punctuation.
+    Semicolon, // `;`
@eddyb
eddyb Apr 2, 2016 Member

Can't we use Punctuation for all 3 of these?

@nrc
nrc Apr 5, 2016 Contributor

I'm not really sure, there does seem to be some advantage to making these special. However, I think it is probably a convenience rather than essential. I think I would start implementing and maybe change this later.

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+}
+
+pub enum Delimiter {
+    None,
+    // { }
+    Brace,
+    // ( )
+    Parenthesis,
+    // [ ]
+    Bracket,
+}
+
+pub enum CommentKind {
+    Regular,
+    InnerDoc,
+    OuterDoc
@eddyb
eddyb Apr 2, 2016 Member

Don't we expand doc comments into attributes nowadays? I recall being able to use them with macros taking attributes.

@pczarn
pczarn Apr 4, 2016

Yes, doc comments become doc attributes in macro input.

@nrc
nrc Apr 5, 2016 Contributor

Yeah. It is gross and hacky and I have no idea why we even have doc comment attributes. I would like to kill them in the surface syntax, but that seems a bit unlikely given the discussion on that RFC :-(

I would prefer to have explicit doc comments. Is there a reason to treat them like attributes (in macros I mean, I realise that the compiler/rustdoc wants them to be attributes)?

@eddyb
eddyb Apr 5, 2016 Member

You can use them in macro_rules macros with just generic attribute pass-through.

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+function:
+
+```
+#[macro_with_ident]
+pub fn foo(&Token, TokenStream, &mut MacroContext) -> TokenStream;
+```
+
+where the first argument is the extra identifier.
+
+
+### Linking model
+
+Currently, procedural macros are dynamically linked with the compiler. This
+prevents the compiler being statically linked, which is sometimes desirable. An
+alternative architecture would have procedural macros compiled as independent
+programs and have them communicate with the compiler via IPC.
@eddyb
eddyb Apr 2, 2016 Member

I think we can start by running each plugin's registrar and all expansion in a thread dedicated to that plugin.
This will avoid accidental dependence on thread-local state (which we have some of, most notably the string interner).

If we have a &mut context, we should be able to temporarily "transfer ownership" to the expanding thread, without the context even implementing Sync, just Send.

@DemiMarie
DemiMarie Jun 7, 2016

Another advantage of the IPC solution (which OCaml has chosen for its own syntax extensions) is that crashes in the plugin don't crash the compiler. Furthermore, should it become desirable, the plugin could be run in a process with reduced OS-level privileges.

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+would prevent procedural macros from crashing the main compiler process.
+However, designing a good IPC interface is complicated because there is a lot of
+data that might be exchanged between the compiler and the macro.
+
+I think we could first design the syntax, interfaces, etc. and later evolve into
+a process-separated model (if desired). However, if this is considered an
+essential feature of macro reform, then we might want to consider the interfaces
+more thoroughly with this in mind.
+
+
+### Interactions with constant evaluation
+
+Both procedural macros and constant evaluation are mechanisms for running Rust
+code at compile time. Currently, and under the proposed design, they are
+considered completely separate features. There might be some benefit in letting
+them interact.
@eddyb
eddyb Apr 2, 2016 Member

The main problem with mixing macros and constants is that if you somehow run impure procedural macro code for each monomorphization of a generic type, you generally cannot enforce determinism so safe code can break coherence and possibly even type safety (not that unsafe should let you do such things either).

OTOH, I would welcome a TokenStream -> Result<TokenStream, ...> method on MacroContext for compiling just a constant and outputting its evaluated form (if available - ADT "trees" with primitive literals as leaves should work), but I don't have a specific usecase in mind right now.

@eddyb eddyb and 1 other commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+pub struct TokenSlice<'a>(&'a [TokenTree]);
+
+// A token or token tree.
+pub struct TokenTree {
+    pub kind: TokenKind,
+    pub span: Span,
+    pub hygiene: HygieneObject,
+}
+
+pub enum TokenKind {
+    Sequence(Delimiter, Vec<TokenTree>),
+
+    // The content of the comment can be found from the span.
+    Comment(CommentKind),
+    // The Span is the span of the string itself, without delimiters.
+    String(Span, StringKind),
@eddyb
eddyb Apr 2, 2016 Member

What if the plugin wants to create a string literal which is unrelated to anything found in the source?

@eddyb
eddyb Apr 2, 2016 Member

Also, seems to be missing numeric and character literals.

@nrc
nrc Apr 5, 2016 Contributor

Hmm, yeah, I guess we need to be able to do that. Shame, would be nice to avoid allocating a String in the common case.

Can we get away with treating numeric literals as Words?

@eddyb
eddyb Apr 5, 2016 Member

I would be against allocating a String myself. Having a common Symbol type for both string literals and "words" (not unlike the current implementation?) seems optimal.

I agree with treating numeric literals as something else (and making, e.g. i64::parse easy to use - except that doesn't handle extra _, does it?), as long as you have a solution for literal suffixes which fits the current semantics.

@eddyb eddyb and 1 other commented on an outdated diff Apr 2, 2016
text/0000-proc-macros.md
+    Comment(CommentKind),
+    // The Span is the span of the string itself, without delimiters.
+    String(Span, StringKind),
+
+    // These tokens are treated specially since they are used for macro
+    // expansion or delimiting items.
+    Exclamation, // `!`
+    Dollar, // `$`
+    // Not actually sure if we need this or if semicolons can be treated like
+    // other punctuation.
+    Semicolon, // `;`
+    Eof,
+
+    // Word is defined by Unicode Standard Annex 31 -
+    // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
+    Word(InternedString),
@eddyb
eddyb Apr 2, 2016 Member

Exposing InternedString makes me uneasy - the fact that we intern these is an implementation detail, although it does affect whether the API for accessing the contents involves MacroContext.

We could get away with either an interned index with the interner in TLS, or RC+SSO or some other strange combination, if we don't want to involve the MacroContext (which is the real decision to be made).

Still, this could probably use a less specific name, such as Symbol or even Word.

@nrc
nrc Apr 5, 2016 Contributor

I assumed we would use the MacroContext to access the string. The MacroContext should be ubiquitous, so I don't think that is a problem. Might be best to change the name from InternedString to Symbol or something. We could then use it where there are string literals too without prejudicing the implementation.

@eddyb
eddyb Apr 5, 2016 Member

Sounds good to me, as long as it doesn't get in your way.

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+tokens currently in the compiler.
+
+In code:
+
+```
+// We might optimise this representation
+pub struct TokenStream(Vec<TokenTree>);
+
+// A borrowed TokenStream
+pub struct TokenSlice<'a>(&'a [TokenTree]);
+
+// A token or token tree.
+pub struct TokenTree {
+    pub kind: TokenKind,
+    pub span: Span,
+    pub hygiene: HygieneObject,
@eddyb
eddyb Apr 2, 2016 Member

Might want to consider combining the span and the hygiene information - couldn't the expansion traces be deduced from hygiene scopes?

@Zoxc
Zoxc Apr 2, 2016

How will we create TokenTrees?

@eddyb
eddyb Apr 2, 2016 Member

I'd start with better quasi-quoting and see what else we need from there.
One worry I have about Sequence is that it might be expensive to convert into the optimized representation.
Given that the average sequence has 8 tokens, it would be a waste to keep an actual tree in memory, when a lot of cases could fit in the same space the pointers take right now.

@nrc
nrc Apr 5, 2016 Contributor

Hmm, it might be a good idea to combine Spans and HygieneObject here. They have different roles, but are kind of two sides of the same coin. Spans are informative and transparent. HygieneObjects are meant to be normative and opaque. I can imagine that macro authors might want to change them independently, but the common case definitely would be to operate on both at the same time.

@nrc
nrc Apr 5, 2016 Contributor

I'll cover creating token trees in an upcoming RFC

@eddyb
eddyb Apr 6, 2016 Member

How does Origin sound as a name for the combined span & hygiene info?
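
Under that suggestion, the combined field might look like the following (a sketch only; the RFC as quoted keeps the two fields separate):

```
// Hypothetical `Origin`, bundling the transparent, informative span with
// the opaque, normative hygiene data.
pub struct Origin {
    pub span: Span,
    pub hygiene: HygieneObject,
}

pub struct TokenTree {
    pub kind: TokenKind,
    pub origin: Origin,
}
```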

@eddyb eddyb commented on the diff Apr 2, 2016
text/0000-proc-macros.md
+interior node is a token stream. I.e., a token stream which can contain nested
+token streams. A token tree can be delimited, e.g., `a (b c);` will give
+`TT(None, ['a', TT(Some('()'), ['b', 'c']), ';'])`. An undelimited token tree
+is useful for grouping tokens due to expansion, without representation in the
+source code. That could be used for unsafety hygiene, or to affect precedence
+and parsing without affecting scoping. They also replace the interpolated AST
+tokens currently in the compiler.
+
+In code:
+
+```
+// We might optimise this representation
+pub struct TokenStream(Vec<TokenTree>);
+
+// A borrowed TokenStream
+pub struct TokenSlice<'a>(&'a [TokenTree]);
@eddyb
eddyb Apr 2, 2016 Member

These two types might be better served by Cow<'a, [TokenTree]> - or if an optimized representation such as "reference-counted ropes of slices" is chosen, no lifetime might be needed at all.

That said, having a lifetime in both the stream type and the context type expands the design space significantly, and the following is probably really close to optimal:

pub struct TokenStream<'a>(Vec<&'a [CompactToken]>);
pub struct MacroContext<'a> {
    tokens: &'a TypedArena<Vec<CompactToken>>,
    ...
}

We must be careful to only provide APIs which can be implemented efficiently regardless of representation, such as forward/reverse iterators (but not random access iterators).

@Zoxc
Zoxc commented Apr 2, 2016

I am missing Space, Tab and Newline from the tokens. Without them, it's hard to tell the difference between + = and +=. They would also allow you to embed languages which are newline sensitive, like assembly.

@thepowersgang
Contributor

They're not needed - `+=` should be a distinct token from `+ =`

@eddyb
Member
eddyb commented Apr 2, 2016

@Zoxc spaces are intentionally omitted, they complicate everything. We skip those tokens in the lexer (Reader::real_token) in the current parser.

@eddyb
Member
eddyb commented Apr 3, 2016

@nrc I've been talking to @mrmonday about his pnet crate, which gives you the tools to define arbitrary packets - network is the main target, but I suspect IPC or serialization would also be valid usecases; I hear dropbox is using it for something.

From what I understand, it would be real helpful if given an arbitrary type path (e.g. the type of a field), a procedural macro could query the MacroContext and get some information about the type definition (a struct or enum, AFAICT).

My reading of #1560 suggests this should be possible, and if nothing else, having a macro next to the type definition would also work, i.e. given field: path::to::Type, pnet could query path::to::Type__pnet_info (previously generated by pnet) or at least defer to it for decisions.

@Zoxc
Zoxc commented Apr 4, 2016

Are you supposed to be able to generate Spans dynamically for tokens you generate? For example include! should make a new Span for the tokens in the external file. That means there is at least hope of inferring whitespace/newlines from Spans. It would make the fields of Comment and String seem redundant, since those could easily be inferred from the Span.

I wonder if it would be a good idea to include an Unknown token, which could contain an AST or newer tokens not supported by the current version of libmacro. Inline assembly seems to be a good candidate to put inside such a token, since asm! cannot currently return tokens.

@eddyb
Member
eddyb commented Apr 4, 2016

@Zoxc Well, one solution to make matches over tokens, from stable crates, non-exhaustive is to have a dummy variant which is unstable - InlineAsm is another good example of an unstable variant.
Not doing anything like this would force libmacro into a fixed set of tokens which cannot be extended.
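
The dummy-variant trick mentioned here, as it was commonly written before `#[non_exhaustive]` existed (the variant name is illustrative; `Symbol` is a stand-in for the RFC's interned-string type):

```
pub struct Symbol; // stand-in for the RFC's interned-string type

pub enum TokenKind {
    Word(Symbol),
    Punctuation(char),
    // Hidden, unstable variant: stable crates cannot name it, so every
    // `match` on TokenKind needs a wildcard arm, keeping the enum extensible.
    #[doc(hidden)]
    __Nonexhaustive,
}
```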

@pczarn pczarn commented on the diff Apr 4, 2016
text/0000-proc-macros.md
+
+
+### Interactions with constant evaluation
+
+Both procedural macros and constant evaluation are mechanisms for running Rust
+code at compile time. Currently, and under the proposed design, they are
+considered completely separate features. There might be some benefit in letting
+them interact.
+
+
+### Inline procedural macros
+
+It would be nice to allow procedural macros to be defined in the crate in which
+they are used, as well as in separate crates (mentioned above). This complicates
+things since it breaks the invariant that a crate is designed to be used at
+either compile-time or runtime. I leave it for the future.
@pczarn
pczarn Apr 4, 2016

N.B. This would allow an implementation of macro_rules! in terms of procedural macro generation, assuming that procedural macros could be defined in macro expansions.

@eddyb
eddyb Apr 4, 2016 Member

That wouldn't really be allowed, I wouldn't think. It's hard enough to make it work before expansion, during expansion it would be a nightmare.
Unless we had "macro modules" which were split off as an entire crate, disregarding anything else in the current crate (i.e. they would be the root of their own crate). But that might get a bit confusing.

@Ericson2314
Ericson2314 Apr 5, 2016 Contributor

On a different note, this is weird wrt cross compiling. It would be annoying to not be able to cross-compile a crate because the macros defined in it use some (imported) items that can't be built on the host platform.

@lifthrasiir lifthrasiir and 1 other commented on an outdated diff Apr 5, 2016
text/0000-proc-macros.md
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes an evolution of Rust's procedural macro system (aka syntax
+extensions, aka compiler plugins). It specifies the syntax for defining
+procedural macros, gives a high-level view of their implementation in the
+compiler, and outlines how they interact with the compilation process.
+
+This RFC specifies the architecture of the procedural macro system. It relies on
+[RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) which specifies the
+naming and modularisation of macros. It leaves many of the details for further
+RFCs, in particular the details of the APIs available to macro authors
+(tentatively called `libmacro`). See this [blog post](http://ncameron.org/blog/libmacro/)
@lifthrasiir
lifthrasiir Apr 5, 2016 Contributor

A half-nit-picking and half-question: macro is a reserved keyword, so wouldn't extern crate macro; and subsequent uses of macro::... be a syntax error? We would need to give the macro keyword special status or rename the crate.

@nrc
nrc Apr 5, 2016 Contributor

Yeah, that is true. We could decide to make macro not a keyword (is that backward compatible?). Or we could make it a contextual keyword. Or we could use a different name. I'll leave it for the libmacro rfc...

@nrc
Contributor
nrc commented Apr 5, 2016

@Zoxc whitespace is elided from the tokens. Since whitespace is the only thing elided, it should always be possible to recreate it from the spans.

@eddyb
Member
eddyb commented Apr 5, 2016

@Zoxc @nrc Whitespace and comments - I just realized the example tokens in this RFC include comments, which might be an annoyance: you'd want to ignore them if you want to ignore whitespace - unless they are doc comments, but in that case see the discussion above about doc comments and attributes.

@Zoxc
Zoxc commented Apr 5, 2016

I would like a simpler API which deals with spans, expansions and hygiene objects. This can be used to build the higher-level API proposed in this RFC. Here is the API I would like:

// A contiguous string containing tokens with source location information.
pub struct Span(..);

// This represents a macro expansion. It is passed to the #[macro] function.
pub struct Expansion {
    // `Span` is the source of this macro, including the `macro!()` syntax
    pub span: Span,
    // A map of positions in the span onto macro expansions. macro!(a!() b!()) would have 2 entries here.
    pub macros: HashMap<usize, Expansion>,
    // A map of positions in the span onto hygiene objects.
    pub hygiene: HashMap<usize, HygieneObject>,
}

impl Span {
    pub fn as_str<'s>(&'s self, context: &Context) -> &'s str { .. }
    pub fn new(context: &Context, str: &str) -> Span { .. }
}

pub struct Context(..);

The macros and hygiene fields should not be exposed directly and should instead be accessed through methods somehow, to allow for more efficient representations. There should also be a way to edit Spans which preserves location information.

This requires all macros to return things that can be represented by spans, expansions and hygiene objects. So asm! will have to move to an intrinsic for this to work.

The benefits of this API:

  • It's very light on allocations. It can reuse input source code so it can be very space efficient. Using identity macros (without macros or hygiene objects inside) would require no allocations.
  • It perfectly represents whitespace, newlines, comments and all future tokens, unlike the proposed API.
  • It still gives macros which only want to deal with valid Rust tokens (no whitespace, comments or newlines) a way to do that, using some API on top which deals with Tokens.
@eddyb
Member
eddyb commented Apr 5, 2016

@Zoxc First impression: not bad. Definitely doesn't interfere directly with anything I can think of (it's a bit like the lower-level internal codemap API).
However, an efficient token representation would have to be part of libmacro (as opposed to being implemented in an entirely stable crate) to be able to avoid repeated lexing between the compiler and procedural macros.

@plietar
plietar commented Apr 6, 2016

Great to see this moving forward. Two comments :

  • custom deriving isn't mentioned. This is quite a big use case for compiler plugins, e.g. for serde. Passing in the struct/enum definition into the macro as tokens would feel clunky. It would probably be better to pass the list of fields/variants, but this may limit compatibility.

  • some macros embed Rust syntax inside their own custom syntax (thinking of json_macros or my own protobuf-macros which parse rust expressions only to include them as is in the generated ast).

    If they do their own parsing, then when a new expression syntax comes along (say the new ? suffix operator for example) and they aren't updated to take it into account, then the new syntax may not be used inside the macro. However the macro doesn't care about the syntax used, it just wants an expression.

    The compiler could provide some parsing as a service for plugins, either generating an opaque expression AST that plugins may include within the tokens (feels clumsy), or simply validate the token tree on behalf of the plugin, which outputs the original tokens.

    Alternatively, the plugins can just include the token tree in the output without making sure it is an actual expression. This however breaks the "It is the procedural macro's responsibility to ensure that the tokens parse without error" rule stated by the RFC, although it is really the user's fault.

    In any case, I think the RFC should mention this in some way.

@nrc
Contributor
nrc commented Apr 8, 2016

@eddyb - I can imagine use cases where a macro might want the comments (although rare). Certainly I've written tools where having the comments would have been useful. I hope that we can include the comments, but present an API which effectively filters them out. Strawman: TokenStream could have an iter method which ignores comments and iter_all (or iter_with_comments or something) which iterates over all tokens including comments.

I hope that the most popular mode for operation will be passing the token stream straight to a parsing library so that library can then ignore the comments unless necessary and the macro author never sees them.
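
The strawman above, sketched against the RFC's `TokenStream(Vec<TokenTree>)`; the method names are from the comment, everything else is an assumption:

```
impl TokenStream {
    /// Iterate over every token, comments included.
    pub fn iter_all(&self) -> std::slice::Iter<'_, TokenTree> {
        self.0.iter()
    }

    /// Iterate over tokens, skipping comments - the expected default
    /// for most macro authors.
    pub fn iter(&self) -> impl Iterator<Item = &TokenTree> + '_ {
        self.0
            .iter()
            .filter(|tt| !matches!(tt.kind, TokenKind::Comment(_)))
    }
}
```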

@nrc
Contributor
nrc commented Apr 8, 2016

@plietar I had not considered custom deriving until recently. I plan to have a separate RFC for that.

The goal with a token-based API is that we can add new syntax to Rust without changing the API for macros. E.g., although ? for try adds new syntactic forms, it does not add new tokens. Parsing tokens into AST should be provided by external (possibly 3rd party) crates. They would be free to choose from different future-proofing strategies.

There is no guarantee that the input to the macro parses, so the macro must ensure that any transformation (including the identity) will give valid output, even on invalid input. In the general case, I would expect the macro to produce errors if it can't produce valid output. In any case, the user will get an error on invalid input - either from the macro or from the compiler.

@xeno-by xeno-by commented on the diff Apr 8, 2016
text/0000-proc-macros.md
+#[macro_attribute]
+pub fn foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream;
+```
+
+The first argument is a list of the tokens between the delimiters in the macro
+use. Examples:
+
+* `#[foo]` => `None`
+* `#[foo()]` => `Some([])`
+* `#[foo(a, b, c)]` => `Some([Ident(a), Comma, Ident(b), Comma, Ident(c)])`
+
+The second argument is the tokens for the AST node the attribute is placed on.
+Note that in order to compute the tokens to pass here, the compiler must be able
+to parse the code the attribute is applied to. However, the AST for the node
+passed to the macro is discarded, it is not passed to the macro nor used by the
+compiler (in practice, this might not be 100% true due to optimisations). If
@xeno-by xeno-by commented on the diff Apr 8, 2016
text/0000-proc-macros.md
+
+* `#[foo]` => `None`
+* `#[foo()]` => `Some([])`
+* `#[foo(a, b, c)]` => `Some([Ident(a), Comma, Ident(b), Comma, Ident(c)])`
+
+The second argument is the tokens for the AST node the attribute is placed on.
+Note that in order to compute the tokens to pass here, the compiler must be able
+to parse the code the attribute is applied to. However, the AST for the node
+passed to the macro is discarded, it is not passed to the macro nor used by the
+compiler (in practice, this might not be 100% true due to optimisations). If
+the macro wants an AST, it must parse the tokens itself.
+
+The attribute and the AST node it is applied to are both replaced by the
+returned tokens. In most cases, the tokens returned by a procedural macro will
+be parsed by the compiler. It is the procedural macro's responsibility to ensure
+that the tokens parse without error. In some cases, the tokens will be consumed
@xeno-by
xeno-by Apr 8, 2016

What happens if the tokens representing the macro expansion don't parse without error? Does the user of a macro just get a (possibly cryptic) error message, or you envision something more advanced?

@nrc
nrc Apr 21, 2016 Contributor

Just the possibly cryptic error message. I'm not aware that we can do better than this. We can probably implicate the macro to some extent, but I don't see anything beyond that which is possible.

@sgrif sgrif commented on the diff Apr 8, 2016
text/0000-proc-macros.md
+To define a procedural macro, the programmer must write a function with a
+specific signature and attribute. Where `foo` is the name of a function-like
+macro:
+
+```
+#[macro]
+pub fn foo(TokenStream, &mut MacroContext) -> TokenStream;
+```
+
+The first argument is the tokens between the delimiters in the macro use.
+For example in `foo!(a, b, c)`, the first argument would be `[Ident(a), Comma,
+Ident(b), Comma, Ident(c)]`.
+
+The value returned replaces the macro use.
+
+Attribute-like:
@sgrif
sgrif Apr 8, 2016 Contributor

This section makes no mention of custom derive. Is that meant to be covered by this RFC?

@nrc
nrc Apr 21, 2016 Contributor

No, another RFC.
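
For completeness, a sketch of an attribute-like definition under the signatures quoted above (the behaviour follows the RFC text; the body is illustrative):

```
#[macro_attribute]
pub fn foo(args: Option<TokenStream>, item: TokenStream,
           _cx: &mut MacroContext) -> TokenStream {
    // `args` is None for `#[foo]`, Some([]) for `#[foo()]`, and so on;
    // `item` is the tokens of the AST node the attribute is placed on.
    // Returning `item` unchanged makes this a no-op attribute.
    item
}
```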

@joshtriplett
Member

In theory, once the compiler had a complete implementation of this, could macro_rules! become a library function built on top of procedural macros, rather than a built-in language mechanism?

@comex
comex commented Apr 9, 2016

AFAICT, not easily, because macro names are statically determined by the implementing fn names and can't be dynamically registered. That's too bad, because there are valid use cases for doing so.

@withoutboats

(in the unresolved questions / macros with an extra identifier section)

My feeling is that this macro form is not used enough to justify its existence. From a design perspective, it encourages uses of macros for language extension, rather than syntactic abstraction. I feel that such macros are at higher risk of making programs incomprehensible and of fragmenting the ecosystem.

Could you expand on what the difference is between a 'language extension' and a 'syntactic abstraction'? To me these seem like two terms for the same concept. I also don't understand the mechanism by which they could fragment the ecosystem, though I agree that any new syntactic form can be used to make the language harder to read for someone (and easier for someone else).

The way I would interpret this comment is that you would prefer for metalinguistic abstraction to show a certain aesthetic restraint. That's a fine position, but I don't think Rust should be opinionated about this.

That said, these comments are in the unresolved questions section of the RFC, so perhaps the discussion should steer away from this.

Also, the blog series that preceded this RFC discussed changes to the syntactic macros system, but I don't see that discussed in this RFC; am I correct that a separate RFC will be penned regarding that?

In theory, once the compiler had a complete implementation of this, could macro_rules! become a library function built on top of procedural macros, rather than a built-in language mechanism?

That would depend on some of the precise details of the RFC; my own preference would be for the implementation of procedural macros to leave that possibility open.

@xeno-by xeno-by commented on the diff Apr 9, 2016
text/0000-proc-macros.md
+
+## Tokens
+
+Procedural macros will primarily operate on tokens. There are two main benefits
+to this principle: flexibility and future proofing. By operating on tokens, code
+passed to procedural macros does not need to satisfy the Rust parser, only the
+lexer. Stabilising an interface based on tokens means we need only commit to
+not changing the rules around those tokens, not the whole grammar. I.e., it
+allows us to change the Rust grammar without breaking procedural macros.
+
+In order to make the token-based interface even more flexible and future-proof,
+I propose a simpler token abstraction than is currently used in the compiler.
+The proposed system may be used directly in the compiler or may be an interface
+wrapper over a more efficient representation.
+
+Since macro expansion will operate purely on tokens, we must keep hygiene
@xeno-by xeno-by commented on the diff Apr 9, 2016
text/0000-proc-macros.md
+allows us to change the Rust grammar without breaking procedural macros.
+
+In order to make the token-based interface even more flexible and future-proof,
+I propose a simpler token abstraction than is currently used in the compiler.
+The proposed system may be used directly in the compiler or may be an interface
+wrapper over a more efficient representation.
+
+Since macro expansion will operate purely on tokens, we must keep hygiene
+information on tokens, rather than on `Ident` AST nodes (we might be able to
+optimise by not keeping such info for all tokens, but that is an implementation
+detail). We will also keep span information for each token, since that is where
+a record of macro expansion is maintained (and it will make life easier for
+tools. Again, we might optimise internally).
+
+A token is a single lexical element, for example, a numeric literal, a word
+(which could be an identifier or keyword), a string literal, or a comment.
@xeno-by
xeno-by Apr 9, 2016

What about whitespace? Is there any way to track spaces, newlines, etc?

@eddyb
eddyb Apr 9, 2016 Member

In the current compiler, you can do that by looking up the source between adjacent tokens' spans.

@nrc
nrc Apr 21, 2016 Contributor

Yeah, that should be even easier now with spans for more stuff around, but essentially you just look it up in the source span.
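
A sketch of that lookup; `source_text` and the span offset methods are assumed (no such API is specified), but the technique is exactly "read the source between two adjacent spans":

```
use std::ops::Range;

// Hypothetical context API: spans expose byte offsets, and the context
// can return the source text for a byte range.
fn whitespace_between<'a>(cx: &'a MacroContext,
                          a: &TokenTree, b: &TokenTree) -> &'a str {
    // Whitespace (and comments) between two tokens is whatever source
    // lies between the end of the first span and the start of the second.
    let gap: Range<usize> = a.span.end()..b.span.start();
    cx.source_text(gap)
}
```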

@xeno-by xeno-by commented on the diff Apr 9, 2016
text/0000-proc-macros.md
+
+Procedural macros are a somewhat unpleasant corner of Rust at the moment. It is
+hard to argue that some kind of reform is unnecessary. One could find fault with
+this proposed reform in particular (see below for some alternatives). Some
+drawbacks that come to mind:
+
+* providing such a low-level API risks never seeing good high-level libraries;
+* the design is complex and thus will take some time to implement and stabilise,
+ meanwhile unstable procedural macros are a major pain point in current Rust;
+* dealing with tokens and hygiene may discourage macro authors due to complexity,
+ hopefully that is addressed by library crates.
+
+The actual concept of procedural macros also has drawbacks: executing arbitrary
+code in the compiler makes it vulnerable to crashes and possibly security issues,
+macros can introduce hard to debug errors, macros can make a program hard to
+comprehend, it risks creating de facto dialects of Rust and thus fragmentation
@xeno-by
xeno-by Apr 9, 2016

Has the problem with dialects already come up in your experience with macros 1.0? What do you plan to do if this ends up being an issue?

@nrc
nrc Apr 21, 2016 Contributor

To some extent this exists already, for example, crates which use serde vs rustc_serialize are effectively incompatible. However, due to macro_rules not really allowing ways to extend the language and current procedural macros being unstable, it is not a big issue.

Mitigation strategies are limited. Giving early, strong official support (or providing a more ergonomic alternative if the feature is not great) are the basic options. If a dialect becomes popular, then working with it to maintain interoperability lessens the damage.

@tomaka tomaka commented on an outdated diff Apr 10, 2016
text/0000-proc-macros.md
+uses.
+
+Initially, it will only be legal to apply `#[cfg(macro)]` to a whole crate and
+the `#[macro]` and `#[macro_attribute]` attributes may only appear within a
+`#[cfg(macro)]` crate. This has the effect of partitioning crates into macro-
+defining and non-macro defining crates. Macros may not be used in the crate in
+which they are defined, although they may be called as regular functions. In the
+future, I hope we can relax these restrictions so that macro and non-macro code
+can live in the same crate.
+
+Importing macros for use means using `extern crate` to make the crate available
+and then using `use` imports or paths to name macros, just like other items.
+Again, see [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) for more
+details.
+
+When a `#[cfg(macro)]` crate is `extern crate`ed, its items (even public ones)
@tomaka
tomaka commented Apr 10, 2016

About #[cfg(macro)], if I understand correctly the process that is proposed by the RFC is that:

  • Cargo parses each dependency of the Cargo.toml and determines the ones that contain macros/syntax-extensions, depending on the plugin attribute.
  • Cargo then compiles each crate of the dependency tree with rustc, and indicates to rustc whether each crate must be compiled with or without #[cfg(macro)]. Cargo also indicates the correct build target. For example if you cross-compile from x86 to ARM, the macro-providing crates will be compiled for x86 and the other ones for ARM.
  • Here is a blurry point: in the final rustc invocation, does Cargo also indicate whether each extern crate is a macro-providing crate or not, or is it automatically detected by rustc?

If the first solution is chosen, then I think it shouldn't be difficult to immediately support crates that provide both macros and non-macros instead of splitting crates in two categories. A crate that provides both macros and non-macros would already have been compiled twice by Cargo, and rustc would simply store two paths to each extern crate (one for macros, one for non-macros).

I'm in favor of immediately allowing macros and non-macros in the same crates, as it's a semver nightmare to have to use two interdependent crates. Imagine if you had to declare all your macro_rules! in a separate crate, that would be very annoying. It's the same for procedural macros.

@gereeter gereeter commented on the diff Apr 10, 2016
text/0000-proc-macros.md
+    // expansion or delimiting items.
+    Exclamation, // `!`
+    Dollar, // `$`
+    // Not actually sure if we need this or if semicolons can be treated like
+    // other punctuation.
+    Semicolon, // `;`
+    Eof, // Do we need this?
+
+    // Word is defined by Unicode Standard Annex 31 -
+    // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
+    Word(Symbol),
+    Punctuation(char),
+}
+
+pub enum Delimiter {
+    None,
@gereeter
gereeter Apr 10, 2016

Why not just remove this case and use Option<Delimiter>?

@nrc
nrc Apr 21, 2016 Contributor

It's an extra level of nesting. The two are equivalent, but I find the non-Option version more ergonomic (just one level of pattern matching, etc.).

@canndrew
Contributor

Can there be a guarantee that different macro invocations execute in parallel? Because it would be useful if invocations could communicate with each other over channels or something.

@llogiq
Contributor
llogiq commented Apr 18, 2016

I'd like to see some more consideration of span handling: With readability-oriented lints, we often want to know where some code comes from, which is apparently a not-so-easy problem with current Rust. The questions I'd like answers to are:

  • Does this code (completely or partially) originate from within a macro? e.g. in add!(x, y) → x + y the + originates from the macro, while the x and y come from the source.
  • Does this code come from one source or has it been stitched together by expansion? (e.g. for that last example the latter is the case, whereas in format!("{}", x + y), the x + y obviously originates from a single span of code.)
@canndrew
Contributor

With readability-oriented lints, we often want to know where some code comes from

It would be good if the current line!, column!, file! and module_path! macros took an optional uint argument to specify the macro-invocation-depth they refer to. So line!(0) refers to the exact line where line! was written, line!(1) refers to the line where the macro calling line! was invoked (or a compile error if it wasn't invoked through a macro) and so forth.

@paulstansifer
paulstansifer commented Apr 19, 2016 edited

I don't see any mention of procedural macros using libsyntax to parse Rust syntax inside their arguments; is there a reason that this can't happen? It ought to address the problem of syntactic evolution of the core language; procedural macros would just do the same parsing Rust does.

(I haven't looked at libsyntax in a while, but if it still kills the current task on syntax error, that might be a problem. (A problem which, ironically, ? would be immensely helpful in solving.) It's what macro_rules! macros do, and that problem is pretty obnoxiously limiting to macro invocation syntax.)

Edit: Oops, I just hadn't read far enough; I hadn't thought about the problem of instability. It seems like the libmacro proposal addresses this problem.

@paulstansifer

Regarding the libmacro proposal (I'm not sure this is the right forum to discuss this), is it true that expand_macro will be unnecessary unless one wants to muck with hygiene or expansion order? Writing a macro that emits the macro invocation in question should be preferable in a large majority of cases, I believe.

@eddyb
Member
eddyb commented Apr 19, 2016 edited

@paulstansifer it's not as much the expansion order in terms of side-effects as it is the nesting and looking at the actual tokens resulting from the expansion, e.g. you might want to parse a string literal which is the result of a macro invocation.

@paulstansifer

@eddyb Thanks. That sounds close to what I expected; something analogous to Racket's local-expand.

@paulstansifer

Regarding punctuation, I believe that tokenization that follows Rust's tokenization rules is much better than having a separate macro tokenizer. It should be reliably possible to take an existing Rust expression (say) and drop it in as an argument to a procedural macro which can expand to that expression and have the end result parse exactly the same way (that's why not having access to libsyntax is disappointing).

One thing that's unclear to me: if the Rust lexer is not used, how will this system convert between the two token streams? Doing it manually seems hard and error-prone, but writing the tokens out to a string and having Rust lex the string loses hygiene information (unless we use hygiene to freshen the names just before writing them out).

@eddyb
Member
eddyb commented Apr 19, 2016 edited

@paulstansifer libmacro would internally use a non-stable representation and libsyntax would use libmacro.
libmacro will most likely be Rust's lexer (unless @nrc's plans diverged from what I know, that is).

@paulstansifer

What happens if I try to use a procedural macro in my implementation of a procedural macro? It seems like, in the proposed version where each "phase" of procedural macros is in its own crate, that might work, but there's no way to nest #[macro] in the "Inline procedural macros" proposal. Would having an optional phase argument work (e.g., I define the first macro with #[macro(2)], so that I can use it to define a #[macro(1)] ... the number corresponding to the number of times that Racket's begin-for-syntax would need to be wrapped around something)?

@paulstansifer

@eddyb That sounds very good to me. But then why does libsyntax need to exist at all? Is it just to avoid rewriting things that depend on it? In that event, I'm not sure why libsyntax can't be stabilized, but I'm happy as long as wrapping everything in my_noop_macro!(...) is guaranteed to have no effect.

@eddyb
Member
eddyb commented Apr 19, 2016

@paulstansifer Well, the point is to keep the minimal amount of code for the public stable API in libmacro, and have the rest in libsyntax.
However, if the latter ends up small enough (i.e. libmacro includes the whole parser for some reason), it might get fused to the rest of the compiler.

@paulstansifer

@eddyb I'm confused; how could libmacro's pattern-matching tool match $e:expr (etc.) without access to (almost) the entire parser?

@eddyb
Member
eddyb commented Apr 19, 2016

@paulstansifer Since the goal there is to scan a token stream as fast as possible, it could use a much smaller "validation" parser.
For example, two things that it wouldn't need to understand are precedence (because it doesn't create the AST) and statement parsing (because statements are always found within {...} and those can be taken as whole).

@paulstansifer

@eddyb Ahh, got it. My guess is that it might be easier to get that performance savings by doing some sort of AST smuggling (i.e., use the spans to determine if the AST you matched was just dropped wholesale into the output; if so use an embedded AST instead of those tokens. Hm, that might be able to happen at that level. But I think using embedded ASTs is safe; the important thing is that the macro itself can create code just by concatenating tokens if it wants to.)

Also, I think most of libsyntax is useful to macro authors (pretty-printing, error output, quasiquotation, token trees, AST traversal tools, even identifier spell-checking, if the macro author wants to be particularly generous about errors).

@eddyb
Member
eddyb commented Apr 19, 2016

@paulstansifer The really cool thing is that you don't have to use the spans, if the token streams are not owned, but reuse the same buffers.
You can cache the parsed AST based on the tokens it came from. This is why I want to have a TokenStream API that focuses on reusing existing streams than creating them from scratch, and quasi-quoting to make the common cases simpler (again, quasi-quoting can register a constant token stream once and reuse it multiple times which would allow reusing parts of the parsed AST as well).

@eddyb
Member
eddyb commented Apr 19, 2016

@paulstansifer The useful stuff that we can stabilize without worrying too much will likely end up in libmacro, except anything dealing with ASTs, which will be provided by crates which will be on crates.io, like syntex.

@paulstansifer

@eddyb Nice!

Anyway, to zoom out some, I think this RFC is very good. (Also, I would like to register my vote in favor of deprecating macros-with-an-extra-identifier and in favor of letting macros look at their delimiters and reject callers who use the wrong one.)

Also, the execution phasing stuff I asked about is probably not high-priority, since core Rust is already a large and useful language.

@nixpulvis nixpulvis commented on the diff Apr 20, 2016
text/0000-proc-macros.md
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+### macros with an extra identifier
+
+We currently allow procedural macros to take an extra ident after the macro name
+and before the arguments, e.g., `foo! bar(...)` where `foo` is the macro name
+and `bar` is the extra identifier. This is used for `macro_rules` and is useful
+for macros which define classes of items, rather than instances of items. E.g.,
+a `struct!` macro might be used similarly to the `struct` keyword.
+
+My feeling is that this macro form is not used enough to justify its existence.
+From a design perspective, it encourages uses of macros for language extension,
+rather than syntactic abstraction. I feel that such macros are at higher risk of
+making programs incomprehensible and of fragmenting the ecosystem.
@nixpulvis
nixpulvis Apr 20, 2016

I disagree that this form should be removed. The ability to write a macro like object! Foo { ... } is great. Building languages on languages is one of the great things about macros.

@ticki
ticki Apr 20, 2016 Contributor

This is one of my favorite features of macros. I would oppose removing this form.

@nikomatsakis
nikomatsakis Aug 31, 2016 Contributor

I too like being able to define macros this way, but I am wary of the many complications. E.g., I would like to put some tokens before:

pub state_machine! Foo {
   ...
}

and maybe after:

state_machine! Foo: Bar { ... }

I'm not quite sure how to cover the full range of possibilities. Not supporting it seems like a simpler starting point.

@canndrew
Contributor

If we're going to keep my_struct! Ident { .. } macros then they should be expanded to support things like having generics on Ident: my_struct! Ident<T> { .. }. Also, they should only be usable in item positions - not in, say, an expression (does this restriction currently exist?).

@canndrew
Contributor

Also, if we're going to keep my_struct! Ident { .. } macros then it might be consistent to add other kinds of macros that resemble other syntactic forms. For instance my_fn! foo() -> T { .. } or my_return! expr. We'd have to be smart about how we restrict their usage in order to prevent them from being abused to write unreadable code. For example it should always be very visible where the macro invocation ends and regular syntax starts again.

@thepowersgang
Contributor

@canndrew The problem with expanding the syntax for macro invocations is the complexity of parsing them: knowing when to stop parsing the macro invocation and return to the original context.

With the existing two, it's just a simple matter of optionally allowing an identifier after the ! in a macro invocation (before expecting a delimited token tree). This can be parsed without knowing anything about how the macro will eventually be expanded (or even knowing that it's valid).
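
As a sketch of how simple that rule is (all types and methods below are stand-ins, not the compiler's real parser API):

struct Ident;
struct Delimited; // one `(...)`, `[...]`, or `{...}` token tree
struct Invocation { name: Ident, extra_ident: Option<Ident>, args: Delimited }

struct Parser; // stand-in for the compiler's parser
impl Parser {
    fn expect_ident(&mut self) -> Ident { Ident }
    fn expect_bang(&mut self) {}
    fn eat_ident(&mut self) -> Option<Ident> { None }
    fn expect_delimited(&mut self) -> Delimited { Delimited }
}

// `foo!(...)` and `foo! bar(...)` share one rule: the identifier after `!`
// is optional, and no macro definition is ever consulted.
fn parse_macro_invocation(p: &mut Parser) -> Invocation {
    let name = p.expect_ident();
    p.expect_bang();
    let extra_ident = p.eat_ident();
    let args = p.expect_delimited();
    Invocation { name, extra_ident, args }
}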

@paulstansifer

This is a problem that is pretty much inherent to macros in C-like syntax (really, pretty much any language that has more than S-expressions): the only way a parser knows when a particular syntactic form ends is by distinguishing forms from each other (i.e., if it couldn't tell the difference between while and if, it wouldn't know whether to stop after consuming an expression and a block), but macros can't get that privilege unless we want to allow the parser to look up macro definitions (and I don't think that Rust is that kind of language).

I once heard the (possibly joking) proposal that the number of exclamation points after a macro invocation indicate the number of token trees to consume as arguments, so that you don't have to wrap them all in a delimiter. Joking or not, it's probably the best way to expand macro invocation syntax without requiring a delimiter around everything.

@eddyb
Member
eddyb commented Apr 20, 2016

@paulstansifer One suggestion I sort of like is stopping at {...} or ; or when an unbalanced delimiter is seen (e.g. { foo!(1)(2)(3)(4) } would pass all 4 literals and their parens).

@paulstansifer

@eddyb In a lot of cases, that would be really nice, but I'd be really worried about something like this:

{
    probably_do_this_once!( macro_argument ) //oops, forgot semicolon!
    ought_to_always_do_this();
}

...being easy to accidentally do, easy to miss on testing, and hard to debug. I also fear there would be a lot of weird edge cases for expression macros. (make_fn!(5)(supposed_to_be_function_arg) might trip someone up, but at least it looks weird for the original intention)

@eddyb
Member
eddyb commented Apr 20, 2016

@paulstansifer Ah, I forgot how it's ambiguous for expression macros. Still could use it outside functions, I suppose?

@Ericson2314
Contributor

@paulstansifer the good thing is that token trees are a hell of a lot more like s-expressions than C-style tokens are.

@eddyb Can we formulate a rule based purely on token trees that in practice roughly amounts to that? I'm thinking this is also a good chance to solve the macro-calls-in-ident-position problem.

@eddyb
Member
eddyb commented Apr 20, 2016 edited

@Ericson2314 The rule is based on token-trees, isn't it? Or do you mean for the "potentially an expression macro" special-case which needs to stop at the first delimited sequence?
You could just always stop at the first delimited sequence (or ;, whichever comes first), I guess.
That would still let you write c_for! x = 0, x < n, x += 1 {...}, for example.
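
A sketch of that consumption rule over hypothetical token trees (invented types again, not a proposal): take trees up to and including the first delimited group or `;`.

enum TokenTree { Group(Vec<TokenTree>), Token(char) } // hypothetical

// For `c_for! x = 0, x < n, x += 1 {...}`, this consumes everything through
// the trailing `{...}` group.
fn invocation_extent(input: &[TokenTree]) -> usize {
    for (i, tt) in input.iter().enumerate() {
        match tt {
            TokenTree::Group(_) | TokenTree::Token(';') => return i + 1,
            _ => {}
        }
    }
    input.len()
}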

@paulstansifer

@Ericson2314 But in Lisps, syntactic forms like macro invocations are always exactly one S-expression. In Rust, the number of token trees in while depends on what you use as the expression. (e.g. while (x != 0) { ... } is 3 token trees, but while x != 0 { ... } is 5 token trees.) So built-in syntactic forms are always going to be able to do things that macro invocations cannot.

@eddyb Wouldn't that make wrapping the 0 in parentheses break the macro invocation?

I claim that C-style syntax is just incapable of being extended with arbitrary statements that look and feel like C. The token trees buy us a lot of syntactic flexibility inside the macro invocations, but we need delimiters to find them in the first place. (Again, this is assuming that we need to be able to parse a file without looking up its macro definitions. In languages where that isn't true, token trees will (I think!) let you scope your extensions to the parser. But that's neither here nor there.)

@Ericson2314
Contributor

@eddyb The latter. I meant taking exactly one token tree for "expression macros", but multiple for "item macros", which I believe is equivalent to the second part of what you said?

@Ericson2314
Contributor

@paulstansifer Yes, that's true. On the upside, while the Rust grammar is allowed to associate the elements of a sequence of token trees into new subtrees, it is not allowed to split a node. For these reasons, I've long wanted to formulate a notion of an "N+1 round grammar" that respects the structure imposed by previous grammars (s-expressions, token trees, etc.).

@comex
comex commented Apr 21, 2016 edited

Something I just thought of that I haven't seen mentioned: interaction with incremental compilation. If things haven't changed since RFC 1298, the current plan for incremental compilation is to redo parsing and macro expansion every time, so this won't be a practical concern in the immediate future. But there are a few reasons to believe it might eventually be desirable to implement incremental parsing:

(1) There are good use cases for procedural macros that are quite expensive to run (not just to process a macro's output but actually to run it); I'm not sure what people are using today, but examples include perfect hash table generators, bindgen as a macro (parsing arbitrary C code - someday I hope someone writes a good C++ bindings generator, which would be even more expensive), etc.

(2) Even without individually expensive macros, there is no guarantee that rustc can, or will be able to in the future, parse a reasonably large crate from scratch fast enough to preserve a feeling of interactivity in IDEs. For example, on my computer, rustc -Z time-passes claims to take ~250ms to parse libsyntax, whereas I've seen 200ms cited on Rust lists as a benchmark for IDE responses to feel interactive/instantaneous. rustc could get faster, but there are larger crates and slower machines, and that doesn't count whatever other work an incremental compilation run would have to do.

(3) Anyway, 200ms is lame; my definition of "instant" is one frame at 60fps (16ms), and to be absurdly speculative for a moment, I think if someone wrote a Rust compiler from scratch to be incremental from the ground up, it could react that quickly in most cases to the kinds of small changes caused by a human entering keystrokes - but certainly not without incremental parsing.* I'm not volunteering to write one - it's probably really hard and I have no idea if anyone will ever do so - but the language itself should avoid putting obstacles in the way of a hypothetical attempt.

So I think it would be a good idea to consider a few (fairly minor) points:

  • It seems like most procedural macros are, semantically, pure functions of their input token stream (plus any results of compiler-intermediated lookups, as mentioned below), so there should be some sort of attribute on macro definitions to allow a future compiler to cache expansions. Perhaps even the default.
  • Others seem to want global state across a crate, which... ick. eddyb has already mentioned the undesirability of plugins storing things in random statics, despite the use cases for depending on things defined elsewhere in the crate (like pnet). Incremental parsing is just one more reason to strongly discourage the former, but provide suitable alternative APIs for the latter (ideally in the first version, to avoid plugin authors hacking around the limitation). If that includes any API to store arbitrary data along with an item/declaration, it should be a byte string or other serializable type, rather than something like Box<Any> - supporting out-of-process plugin invocation already sort of demands this, unless you want to commit to running multiple invocations of the same plugin in the same process, but a future compiler could persist the same data to disk. Also, if possible, the APIs should prefer associating immutable data with declarations over providing mutable data stores, which implicitly serialize plugin invocations (but maybe there are use cases that fundamentally depend on the latter - it wouldn't make caching impossible, just a bit slower).
  • Other macros, like bindgen!, fundamentally depend on external state (the .h file to parse and its dependencies) but could in principle do their own caching, e.g. imitating make by keeping track of which files were actually used and their mtimes, and upon an incremental rebuild statting them to check whether none were changed. I suppose one option would be for the plugin to have its own database somewhere, and just output a copy of the same tokens upon a cache hit; future-rustc would then hash the token output and compare with its own database to determine whether reparsing is necessary, with no additional API surface required. But this seems inelegant, because then state would be spread across multiple files and time would be wasted generating tokens just to throw them away. It might be nicer if the plugin could just ask the compiler to associate its token output with an opaque cache key, and upon rebuild be prompted with the key and given the opportunity to return "up to date" (instead of tokens).

The last point sounds tricky enough that perhaps it should be deferred until incremental parsing is a reality (if ever), but at least the design should try to avoid making such an interface more difficult to implement in the future. I think the former two are reasonable to build into the design from the start. Any comments?
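
To make the last point concrete, here is one hypothetical shape such an interface could take (names and types invented for illustration, not part of the RFC):

struct TokenStream; // stand-in for whatever the macro API exposes

// What an incrementally-aware macro could return on a rebuild:
enum ExpansionResult {
    // Fresh output; the compiler stores it under the macro's opaque cache key.
    Tokens(TokenStream),
    // The inputs the macro tracks (e.g. header mtimes) are unchanged, so the
    // compiler may reuse the tokens it previously stored under the same key.
    UpToDate,
}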

* (Not always, obviously. If nothing else - and there are many things else - generics are Turing complete, and one keystroke can change one identifier to another and thereby require an arbitrary amount of computation to re-typecheck. But it doesn't have to be always.)

@Ericson2314
Contributor

@comex I've long hoped incremental compilation could be built on top of a library for a DAG of cached computations. If we had such a thing, then the procedural macro interface could just be another DAG to stitch together with rustc's actions---very slick.

@mahkoh mahkoh commented on the diff Apr 22, 2016
text/0000-proc-macros.md
+pub enum StringKind {
+ Regular,
+ Byte,
+}
+
+// A Symbol is a possibly-interned string.
+pub struct Symbol { ... }
+```
+
+### Open question: `Punctuation(char)` and multi-char operators.
+
+Rust has many compound operators, e.g., `<<`. It's not clear how best to deal
+with them. If the source code contains "`+ =`", it would be nice to distinguish
+this in the token stream from "`+=`". On the other hand, if we represent `<<` as
+a single token, then the macro may need to split them into `<`, `<` in generic
+position.
@mahkoh
mahkoh Apr 22, 2016 Contributor

Expressions cannot be parsed without infinite lookahead if || is not a single token.

@eddyb
eddyb Apr 22, 2016 Member

You already need infinite lookahead if you count whitespace tokens.

@mahkoh
mahkoh Apr 22, 2016 edited Contributor

As far as I know, nobody is proposing whitespace tokens. In any case, even with infinite lookahead it's impossible to distinguish a || b | { b: B } from a | | b | { b: B }. Given the right choice of types, both of these parse into (quite different) syntax trees and pass all compile time checks.

@eddyb
eddyb Apr 22, 2016 Member

Ah, that's not a lookahead problem; it's just plain ambiguous.
I agree that distinguishing between || and | | is important; it's one of the first things I complained about and the reason for this open question, but GitHub has hidden that thread.

@jimmycuadra jimmycuadra referenced this pull request in rusoto/rusoto May 5, 2016
Closed

Could not compile `rusoto_codegen` #256

@mattico
mattico commented May 10, 2016

More macro-related bikeshedding (sorry, I can't resist):

Add an "argument" to the macro keyword to specify its scope:

pub macro(attr) foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream;

pub macro(fn) foo(TokenStream, &mut MacroContext) -> TokenStream;

// Possible combined syntax
pub macro(attr, fn) foo(Option<TokenStream>, Option<TokenStream>, &mut MacroContext) -> TokenStream;

// Possibly specify delimiters
pub macro(fn("{", "[", "(")) foo(TokenStream, &mut MacroContext) -> TokenStream;

// Possibly use the bare macro keyword for macro by example
pub macro foo($a:ident) => {}

// Or maybe some other argument?
pub macro(rules) foo($a:ident) => {}
@Ericson2314
Contributor

Is it out of scope to ask how Cargo should treat procedural macro crates? IMO they fall awkwardly in between dev-dependencies, build-dependencies and normal dependencies. For example

  • Use build, not host platform, like build-dependencies
  • May be re-exported in pattern-based macros (?!), necessitating dependency propagation.
@roosmaa roosmaa commented on the diff Jul 26, 2016
text/0000-proc-macros.md
+I propose a simpler token abstraction than is currently used in the compiler.
+The proposed system may be used directly in the compiler or may be an interface
+wrapper over a more efficient representation.
+
+Since macro expansion will not operate purely on tokens, we must keep hygiene
+information on tokens, rather than on `Ident` AST nodes (we might be able to
+optimise by not keeping such info for all tokens, but that is an implementation
+detail). We will also keep span information for each token, since that is where
+a record of macro expansion is maintained (and it will make life easier for
+tools. Again, we might optimise internally).
+
+A token is a single lexical element, for example, a numeric literal, a word
+(which could be an identifier or keyword), a string literal, or a comment.
+
+A token stream is a sequence of tokens, e.g., `a b c;` is a stream of four
+tokens - `['a', 'b', 'c', ';'']`.
@roosmaa
roosmaa Jul 26, 2016

';'' feels like there's one extra single-quote here? The extra quote is also present in the following paragraph.

@nrc nrc added the I-nominated label Aug 30, 2016
@nrc
Contributor
nrc commented Aug 30, 2016 edited

I'd like to propose we move this RFC to final comment period (note that there is a change to the RFC acceptance process here to get more consensus from the team before moving to FCP). Note that there are some questions below that should be answered either before or during FCP.

Summary: this RFC proposes a long-term solution for procedural macros (aka compiler plugins, aka syntax extensions). This builds on earlier RFCs which specify the naming of macros. The key component of this RFC is basing macros on tokens rather than the AST and improving the ergonomics of declaring macros.

There has been considerable discussion on this thread. The overall sentiment has been positive. I believe the undecided questions are summarised below. Although there have been very many useful and constructive comments, I don't think any particular comments need highlighting here.

Questions:

  • the syntax for the function backing a macro - #[macro] fn foo(...) ... vs macro foo(...) ... (vs some other variations).
  • whether we should continue to support the macro! ident ( ... ) form of macros.
  • interaction with Cargo, incremental compilation, and some other tooling issues. I believe these issues can be decided as part of implementation or in a follow-up RFC.

I propose moving to FCP with an inclination to accept, once the questions above are decided.

@rust-lang/lang members, please check off your name to signal agreement. Leave a comment with concerns or objections. Others, please leave comments. Thanks!

@withoutboats

whether we should continue to support the macro! ident ( ... ) form of macros.

This can be an extension proposed in a future RFC, can't it? There's nothing incompatible about adding new syntactic forms for procedural macros in the future?

@jimmycuadra

This can be an extension proposed in a future RFC, can't it? There's nothing incompatible about adding new syntactic forms for procedural macros in the future?

Yes, although we want to motivate people to switch everything using macros 1.0 to macros 2.0 so that macros 1.0 could eventually be removed. If there is something you can do now that you couldn't do in the new system from the get go, it could work against that goal.

@nrc
Contributor
nrc commented Aug 30, 2016

I envisage this RFC replacing the existing procedural macro system, so the facility for macros with idents would disappear completely at some point. I can understand how users of such macros might not want that. Putting it back in the future feels pretty sub-optimal for existing users.

macros 1.0 could eventually be removed

Given that current proc macros are unstable and kind of crufty, I would expect them to be deprecated and removed sooner rather than later (speaking for myself, the rest of the lang team may want to move slower).

@nikomatsakis nikomatsakis commented on an outdated diff Aug 31, 2016
text/0000-proc-macros.md
+ pub span: Span,
+ pub hygiene: HygieneObject,
+}
+
+pub enum TokenKind {
+ Sequence(Delimiter, TokenStream),
+
+ // The content of the comment can be found from the span.
+ Comment(CommentKind),
+
+ // Symbol is the string contents, not including delimiters. It would be nice
+ // to avoid an allocation in the common case that the string is in the
+ // source code. We might be able to use `&'Codemap str` or something.
+// `Option<usize>` is for the count of `#`s if the string is a raw string. If
+ // the string is not raw, then it will be `None`.
+ String(Symbol, Option<usize>, StringKind),
@nikomatsakis
nikomatsakis Aug 31, 2016 Contributor

Nit: I think we should use struct variants.

@nikomatsakis nikomatsakis and 1 other commented on an outdated diff Aug 31, 2016
text/0000-proc-macros.md
+ InnerDoc,
+ OuterDoc,
+}
+
+pub enum StringKind {
+ Regular,
+ Byte,
+}
+
+// A Symbol is a possibly-interned string.
+pub struct Symbol { ... }
+```
+
+### Open question: `Punctuation(char)` and multi-char operators.
+
+Rust has many compound operators, e.g., `<<`. It's not clear how best to deal
@nikomatsakis
nikomatsakis Aug 31, 2016 Contributor

I would imagine we can just accumulate any number of adjacent (not whitespace-separated) characters into a single punctuation.

@nikomatsakis
nikomatsakis Aug 31, 2016 Contributor

Sorry, that was too vague. Initially I was thinking we could use an &str but since we probably don't want to thread a lifetime, you could imagine instead that we just pass enough info that you can ask the MacroContext for the precise string:

impl MacroContext {
    fn token_str(&self, token: &Token) -> &str;
}
@sgrif
sgrif Sep 5, 2016 Contributor

I would imagine we can just accumulate any number of adjacent (not whitespace-separated) characters into a single punctuation.

That sounds dangerously close to the C++ issue of requiring spaces in nested generics.

@nikomatsakis
Contributor
nikomatsakis commented Aug 31, 2016 edited

OK so I'm by-and-large in favor of this RFC and I think it is pointing us in the right direction. My biggest concern with the present text is that it ignores the existence of the "Macros 1.1" RFC. I think that these things are basically compatible, but it'd be nice to see the text reconciled with Macros 1.1 and re-using terms from that RFC where applicable.

In particular, I think the signatures of all these various kinds of procedural macros (including the custom derives) ought to be roughly the same, right? (I think we removed the context from there, but I think that was a mistake. :)

UPDATE: To be clear, I am formally objecting to moving to FCP until the text is updated, just because I think that updating should happen and doing so before FCP makes sense -- not because I expect any major surprises.

@nikomatsakis
Contributor

One other issue I wanted to raise but which does not, I think, need to block moving forward here:

Using a #[macro] annotated fn as the fundamental basis for extension is kind of limiting. If we want to later add more kinds of metadata about the macro, or other kinds of queries that the compiler may want to run, we are kind of stuck. I suspect that ultimately we will want a "procedural macro" trait that you can implement for a dummy type. That said, I think that even then we may still want #[macro]-annotated functions as a shorthand that is equivalent to a kind of minimal trait impl, which is why I say this doesn't strike me as a blocking concern.

Some examples off the top of my head of things we might want to do that would require more metadata or additional fns and which may not be easily accommodated by extending the #[macro] annotation:

  • procedural macros may want to give information to the compiler to enable better diagnostics or help messages, or to tweak the expansion ordering in some way or another;
  • we may want a "dummy expansion" that can be used for refactoring and type-checking without doing the full expansion
  • we may want to adopt a more declarative approach to hygiene as @wycats has proposed, so that we can handle refactorings more correctly without running the full expansion.
@Zoxc
Zoxc commented Aug 31, 2016

I'd still like something that accurately tracks whitespace/strings given to macros. Whitespace interpreters are important!

@Ericson2314
Contributor

@Zoxc One can always use string literals. The relationship between Rust and token trees is like that between Lisps and s-expressions, and I'd like to keep it that way.

@nrc
Contributor
nrc commented Oct 7, 2016

I've updated the RFC to take into account macros 1.1 and feedback in the thread.

@rfcbot fcp merge

@rfcbot
rfcbot commented Oct 7, 2016 edited

Team member @nrc has proposed to merge this. The next step is review by the rest of the tagged teams:

Concerns:

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nrc
Contributor
nrc commented Oct 7, 2016 edited
+Attribute-like:
+
+```
+#[prco_macro_attribute]
@pnkfelix
pnkfelix Oct 13, 2016 Member

typo: prco

+The second argument is the tokens for the AST node the attribute is placed on.
+Note that in order to compute the tokens to pass here, the compiler must be able
+to parse the code the attribute is applied to. However, the AST for the node
+passed to the macro is discarded, it is not passed to the macro nor used by the
@pnkfelix
pnkfelix Oct 13, 2016 edited Member

this phrasing cannot be correct as stated: "the AST for the node passed to the macro [...] is not passed to the macro" ...

  • It may seem like I am acting dense, but no, I am honestly confused by the construction of the sentence.

Q: Is the idea here just that whatever be the structure passed as the second argument to a proc_macro_attribute procedure, nothing that procedure does can affect the AST that second argument was generated from?

  • it's hard to see how this distinction matters, since the AST is replaced by the one generated from parsing the returned tokens, no?
+ // Word is defined by Unicode Standard Annex 31 -
+ // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
+ Word(Symbol),
+ Punctuation(char),
@pnkfelix
pnkfelix Oct 13, 2016 edited Member

You gave an example above of the token stream ['a', 'b', 'c', ';']. My first impression upon reading that is that all four of the listed tokens belong to the same category, but then I saw this Word/Punctuation distinction in the variants here.

Is ';' considered Punctuation? (Edit: oh, the answer is that it's listed above as its own variant, Semicolon.)

  • What about ,?
  • What about +?
  • What about && or -> or <-?
  • Edit: ah, down below I see that you have pointed out multi-char operators as an open question. So I guess all of these cases are considered Punctuation, and it's up to the parser to take a sequence [&, &] and turn it into &&? Sounds painful for macro authors (indeed, that pain seems to also be discussed below...)
@pnkfelix
Member

@rfcbot concern multi-char-operators

The RFC lists "Punctuation(char) and multi-char operators" as an open question. I think I would want to see a particular solution to this question settled before we accept. (Some were listed but I cannot infer which would actually be implemented.)

+position.
+
+I had hoped to represent each character as a separate token. However, to make
+pattern matching backwards compatible, we would need to combine some tokens. In
@nikomatsakis
nikomatsakis Oct 19, 2016 edited Contributor

Can someone elaborate on this? I am confused. What does it mean to be "completely backwards compatible" -- backwards compatible with what?

@eddyb
eddyb Oct 19, 2016 Member

Something like able to tell apart << and < <, presumably? Consider a << b > ::c vs a < < b > ::c.

+ Char(char),
+
+ // These tokens are treated specially since they are used for macro
+ // expansion or delimiting items.
@nikomatsakis
nikomatsakis Oct 19, 2016 edited Contributor

Can someone elaborate a bit here too -- why do these tokens need to be treated specially? It seems like some particular procedural macro might treat them separately, but why are we involved here? Maybe I'm missing some context.

@eddyb
eddyb Oct 19, 2016 Member

They shouldn't be, we just had a bad history of treating $foo (or worse, $bar:ty) as a single token.
cc @jseyfried (who I believe has removed at least some of those cases)

@jseyfried
jseyfried Oct 19, 2016 edited

We only treat $foo and $bar:ty as a single token when parsing with a non-zero quote_depth, which (afaik) happens only in macro_rules's tt fragment parser and in the quasi-quoter. In particular, today's procedural macros see $foo as two tokens.
(btw, I agree that these tokens shouldn't be treated specially)

@nikomatsakis
Contributor

On the topic of multibyte characters, a trick I used in my LR(1) Rust grammar (that I stole from somewhere else, the Java grammar maybe?) seems relevant. In that case, I changed it so that the character < generates one of two distinct tokens depending on what comes next. So e.g. if you have a < b that is these tokens: ID <[] ID whereas a << b is these tokens: ID <[<] <[] ID. Note that < is either <[<] (meaning: a < followed by another < without whitespace) or <[] (meaning: a < followed by whitespace or some other thing).

One could imagine having PunctuationAlone(char) and PunctuationJoint(char), where the latter means "a punctuation character with another punctuation character coming immediately afterwards". So << would be PunctuationJoint('<') PunctuationAlone('<'). Probably with better names.

This trick basically has the best of both worlds from what I can tell.
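
A sketch of what matching against such tokens could look like (types invented here for illustration; this predates any real API):

enum Spacing { Alone, Joint }

// `<<` lexes as Punct('<', Joint) then Punct('<', _); `< <` does not.
struct Punct(char, Spacing);

fn starts_with_shl(tokens: &[Punct]) -> bool {
    matches!(tokens, [Punct('<', Spacing::Joint), Punct('<', _), ..])
}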

@nikomatsakis
Contributor
nikomatsakis commented Oct 19, 2016 edited

Procedural macros which take an identifier before the argument list (e.g., foo! bar(...)) will not be supported (at least initially).

Strictly for the record, I personally think these can be a really useful form, but I'm happy to leave them out.

My main use case is something like neon, which uses (or plans to use) macros to declare JavaScript classes. It's just nicer to write:

class! Foo {
}

and less great to write:

neon! {
    class Foo {
    }
}

But I know there are all kinds of niggly details (do you accept class! Foo: Bar { ... }? etc.) and leaving it out to start seems fine.

@eddyb
Member
eddyb commented Oct 19, 2016 edited

So << would be PunctuationJoint('<') PunctuationAlone('<').

Not quite (see below). What about Op(Joint | Alone, char)?

  • eat(<): Op(_, '<')
  • eat(<<): Op(Joint, '<') Op(_, '<') (could be <<<T>::CONST)

EDIT: I like Op more, it's shorter than Punctuation. Any better suggestions?
Also, who else wishes they could write a pattern once and then reuse it?

pattern Op2(a: char, b: char) = [Op(Joint, a), Op(_, b)];
pattern Shl = Op2('<', '<');
pattern Shr = Op2('>', '>');

Add slice-like pattern-matching on random-access iterators and you have yourself one of the nicest setups for writing parsers, ever, in a systems programming language, maybe nicer than some FP ones.

EDIT2: Ahhhh I want this so badly (and this doesn't even use VG/const generics):

pattern BinOpChar = '+' | '-' | '*' | '/' | '%' | '^' | '|' | '&';
pattern Op2(a: char, b: char) = [Op(Joint, a), Op(_, b)];
pattern Op3(a: char, b: char, c: char) = [Op(Joint, a), Op(Joint, b), Op(_, c)];
pattern Op1Assign([a]: [char; 1]) = Op2(a, '=');
pattern Op2Assign([a, b]: [char; 2]) = Op3(a, b, '=');
pattern OpAssign(op: [char]) = Op2Assign(op @ ['<', '<']) |
                               Op2Assign(op @ ['>', '>']) |
                               Op1Assign(op @ [BinOpChar]);
// ...
// Random example of "lowering":
match tokens {
    [x @ Ident(_), ..OpAssign(op), ..] => [x, Op(Alone, '='), x, op],
    // ...
}
@Ericson2314
Contributor
Ericson2314 commented Oct 19, 2016 edited

I am concerned about the lack of an analog to Racket's (import (for-syntax ..)), but this also applies to macros 1.1 so probably best to discuss it in that tracking issue, and update this post-merge based on the resolution there.

@nikomatsakis
Contributor
nikomatsakis commented Oct 31, 2016 edited

@pnkfelix would you consider this approach to adjacent operators (or, more specifically, @eddyb's single-variant formulation here) to resolve your "multi-char-operators" concern?

@nrc, maybe edit RFC to include it?

@Zoxc
Zoxc commented Oct 31, 2016

@nikomatsakis What about DSLs which want custom operators like <=> without allowing <= > too?

@eddyb
Member
eddyb commented Oct 31, 2016

@Zoxc Read my description (which @nikomatsakis linked above): you can check for chaining indefinitely. I have an example for compound assignment such as >>=, which can be distinguished from >> =.

@pnkfelix
Member
pnkfelix commented Nov 2, 2016

@nrc @nikomatsakis @eddyb I think the options you have outlined would address my concern (which was mostly about the fact that the known problem was unaddressed; I did not mean to imply the problem was unsolvable).

@pnkfelix
Member
pnkfelix commented Nov 3, 2016

@rfcbot resolved multi-char-operators

@nrc
Contributor
nrc commented Nov 28, 2016

ping for approval - @aturon @withoutboats #1566 (comment)

I plan to update the RFC with the recent discussion about multi-char-operators.

@aturon
Contributor
aturon commented Nov 29, 2016

@nrc I've re-read the RFC and the thread. Like most everyone else, I'm broadly in favor of the overall direction here (operating on token streams, easing the declaration system). My sense is that there's a lot of room for iteration on the details, but I'm taking this RFC as largely about setting the overall direction of exploration. In particular, I imagine that libproc_macro itself is going to take significant iteration that will feed back into the design elements here.

I'm also in agreement with the various points of scope cutting (e.g., making proc_macro work at crate granularity, leaving out foo! bar (..) macros). There's going to be a lot of work to put this new system together, so anywhere we can punt extensions to the future, we should.

👍 from me!

@withoutboats

👍 from me. I don't think this is impacted by the conversation on #1584, at least not in any way that should block the RFC from being accepted.

@rfcbot
rfcbot commented Nov 30, 2016

🔔 This is now entering its final comment period, as per the review above. 🔔

@aturon aturon referenced this pull request in rust-lang/rust Dec 13, 2016
Open

Tracking issue for RFC 1566: Procedural macros #38356

@aturon aturon merged commit 1c2a50d into rust-lang:master Dec 13, 2016
@aturon
Contributor
aturon commented Dec 13, 2016

The RFC bot has gotten stuck, but almost two weeks have elapsed since FCP, and positive consensus around this feature remains. I'm merging the RFC! Thanks @nrc!

Tracking issue

+two kinds exist today, and other than naming (see
+[RFC 1561](https://github.com/rust-lang/rfcs/pull/1561)) the syntax for using
+these macros remains unchanged. If the macro is called `foo`, then a function-
+like macro is used with syntax `foo!(...)`, and an attribute-like macro with
@est31
est31 Dec 14, 2016

So this means you can't call procedural macros with [] syntax anymore, like vec! for example does?

@Connorcpu
Connorcpu Dec 21, 2016

No, it's talking about function-like (foo!(...), foo![...], foo!{...}) vs attribute macros (#[my_macro] ...)

@bors bors added a commit to rust-lang/rust that referenced this pull request Jan 17, 2017
@bors bors Auto merge of #38842 - abonander:proc_macro_attribute, r=jseyfried
Implement `#[proc_macro_attribute]`

This implements `#[proc_macro_attribute]` as described in rust-lang/rfcs#1566

The following major (hopefully non-breaking) changes are included:

* Refactor `proc_macro::TokenStream` to use `syntax::tokenstream::TokenStream`.
    * `proc_macro::tokenstream::TokenStream` no longer emits newlines between items, this can be trivially restored if desired
    * `proc_macro::TokenStream::from_str` does not try to parse an item anymore, moved to `impl MultiItemModifier for CustomDerive` with more informative error message

* Implement `#[proc_macro_attribute]`, which expects functions of the kind `fn(TokenStream, TokenStream) -> TokenStream`
    * Reactivated `#![feature(proc_macro)]` and gated `#[proc_macro_attribute]` under it
    * `#![feature(proc_macro)]` and `#![feature(custom_attribute)]` are mutually exclusive
    * adding `#![feature(proc_macro)]` makes the expansion pass assume that any attributes that are not built-in, or introduced by existing syntax extensions, are proc-macro attributes

* Fix `feature_gate::find_lang_feature_issue()` to not use `unwrap()`

    * This change wasn't necessary for this PR, but it helped debugging a problem where I was using the wrong feature string.

* Move "completed feature gate checking" pass to after "name resolution" pass

    * This was necessary for proper feature-gating of `#[proc_macro_attribute]` invocations when the `proc_macro` feature flag isn't set.

Prototype/Litmus Test: [Implementation](https://github.com/abonander/anterofit/blob/proc_macro/service-attr/src/lib.rs#L13) -- [Usage](https://github.com/abonander/anterofit/blob/proc_macro/service-attr/examples/post_service.rs#L35)
27d0d45
@bors bors added a commit to rust-lang/rust that referenced this pull request Jan 18, 2017
@bors bors Auto merge of #38842 - abonander:proc_macro_attribute, r=jseyfried
Implement `#[proc_macro_attribute]`

8926da2