RFC: Macro Expansion for Macro Input #2320

Open · wants to merge 17 commits into base: master
@pierzchalski

pierzchalski commented Feb 2, 2018

@pierzchalski pierzchalski referenced this pull request Feb 3, 2018

Open

Tracking issue: declarative macros 2.0 #39412

9 of 19 tasks complete
@alexreg

alexreg commented Feb 3, 2018

So the idea would be to implement the lift macro you mentioned in rust-lang/rust#39412 (comment) using this macro expansion API?

@alexreg

alexreg commented Feb 3, 2018

@pierzchalski Incidentally, you probably want to CC/assign @jseyfried to this PR.

@pierzchalski

pierzchalski commented Feb 3, 2018

@alexreg Whoops! Done.

@alexreg

alexreg commented Feb 4, 2018

On second thought, maybe better to CC @petrochenkov given @jseyfried's long-term absence?

@petrochenkov

Contributor

petrochenkov commented Feb 5, 2018

maybe better to CC @petrochenkov

Sorry, can't say anything useful here, I haven't written a single procedural macro in my life and didn't touch their implementation in the compiler either.

@pierzchalski

pierzchalski commented Feb 5, 2018

This is a language/compiler RFC, so I guess @nikomatsakis and @nrc are two other people to CC. Anyone else who would be interested?

@alexreg

alexreg commented Feb 5, 2018

@petertodd Oh, sorry. I gathered from your comments on the declarative macros 2.0 RFC that you knew something of the macros system in general. My bad.


* Greatly increases the potential for hairy interactions between macro calls. This opens up more of the implementation to be buggy (that is, by restricting how macros can be expanded, we might keep implementation complexity in check).

* Relies on proc macros being in a separate crate, as discussed in the reference level explanation [above](#reference-level-explanation). This makes it harder to implement any future plans of letting proc macros be defined and used in the same crate.

@Centril

Centril Feb 5, 2018

Contributor

I'd like to highlight this drawback. Are the gains in this RFC enough to outweigh this drawback?

@alexreg

alexreg Feb 5, 2018

Indeed, why does it require a separate crate for proc macros? Can you elaborate?

@pierzchalski

pierzchalski Feb 5, 2018

Thinking about it more, this expansion API doesn't add any extra constraints to where a proc macro can be defined, so I guess this shouldn't really be here.

Originally I was worried about macro name resolution (I thought having proc macros in a separate crate at the call site would make that easier, but given that there are other issues involving macro paths this seems redundant to worry about), and about collecting definitions in an 'executable' form.

Declarative macros can basically be run immediately after they're parsed, because they're all compositions of pre-existing, built-in, purely syntactic compiler magic. Same-crate procedural macros would need to be 'pre-compiled', as if they were tiny inline build.rs files scattered throughout your code. I thought this would interact poorly in situations like this:

```rust
#[macro_use]
extern crate some_crate;

#[proc_macro]
fn my_proc_macro(ts: TokenStream) -> TokenStream { ... }

fn main() {
    some_crate::a_macro!(my_proc_macro!(foo));
}
```

How does some_crate::a_macro! know how to expand my_proc_macro!?

In hindsight, this is just a roundabout way of hitting an existing problem with same-crate proc macros:

```rust
// Not a proc-macro.
fn helper(ts: TokenStream) -> TokenStream { ... }

#[proc_macro]
fn a_macro(ts: TokenStream) -> TokenStream {
    let helped_ts = helper(ts);
    ...
}

fn main() {
    a_macro!(foo);
}
```

Same question: how does a_macro! know how to evaluate helper? I think whatever answer we find there will translate to this macro expansion problem.

Anyway, I'm now slightly more confident that that particular drawback isn't introduced by this RFC. Should I remove it?

@alexreg

alexreg Feb 5, 2018

Yeah, I'd tend to agree with that assessment. Is there an RFC open for same-crate proc macros currently? If so, I'd be curious to read it over.

@pierzchalski

pierzchalski Feb 6, 2018

I remember reading some fleeting comments about it, but I just had a quick look around and I can't find anything about plans for it.

@Centril

Centril Feb 6, 2018

Contributor

I'm no expert wrt. proc macros, but I'd also be interested in any resources wrt. same-crate macros.

Thanks for the detailed review and changes =)

@alexreg

alexreg Feb 6, 2018

@pierzchalski On a related note, my WIP PR can be found here: rust-lang/rust#47992 (comment). I'm going to make another big commit & push in an hour I think.

@pierzchalski pierzchalski changed the title Add macro expansion API to proc macros RFC: Add macro expansion API to proc macros Feb 5, 2018

Update 0000-proc-macro-expansion-api.md
Remove 'same crate proc macro' drawback and replace it with discussion under reference explanation, since it's an issue that isn't introduced by this RFC and will also probably share a solution.

@sgrif sgrif added the T-lang label Feb 8, 2018


Built-in macros already look more and more like proc macros (or at the very least could be massaged into acting like them), and so they can also be added to the definition map.

Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it?

@jseyfried

jseyfried Feb 9, 2018

Yeah, this information is available at expansion time. Resolving the macro shouldn't be a problem.

@pierzchalski

pierzchalski commented Feb 9, 2018

I just realised that one of the motivations for this feature (the lift! macro alluded to by @alexreg) wouldn't actually be made possible by this RFC. lift! needs to lift the contained macro up two levels:

```rust
#[proc_macro]
fn lift(ts: TokenStream) -> TokenStream {
    let mut mac_c = ...;
    mac_c.call_from(...);
    //              ^^^
    // This needs to be the span/scope/context of, in this
    // example, `main`: the caller of `m`, which is the caller of `lift!`.
    ...
}

macro m() {
    lift!(m_helper!()); // Should set the caller context of `m_helper!` to the
                        // caller context of `m!`.
}

fn main() {
    m!();
}
```

But the current Span API doesn't allow such shenanigans. @jseyfried, does the RFC you mentioned here hold any hope? How exciting a change is it?

@alexreg

alexreg commented Feb 9, 2018

@pierzchalski Yeah, it looks like either we'd have to bake this lift macro into the compiler, or extend the proc macro API (ideally to provide a whole stack of syntax contexts for macro expansions).

@llogiq

Contributor

llogiq commented Mar 9, 2018

Good job! I've wanted a solution for this for some time. I see two possible problems with the solution this RFC PR suggests:

1. If we have multiple procedural macros, their order of execution may change the result. Consider `proc_macro_a`, which wants to ignore macros, just passing `ExprMac` nodes unchanged, whereas `proc_macro_b` will expand them. If `proc_macro_a` runs before `proc_macro_b`, all is well, and the macro authors don't need to care about what could have led to the result. However, if `proc_macro_b` runs before `proc_macro_a`, the latter will only see the expansion of the expressions, and now `proc_macro_a`'s author has to worry about whether an expression comes from an expanded macro.
   A simple solution would be to extend the registry API so that proc macros can register themselves as pre-expansion or post-expansion. Pre-expansion macros won't be allowed to fold an `Expr` to something expanded (which would need a marker and a detection visitor), while post-expansion macros will see the expressions after macro expansion (and could find out what led to this particular code via the expansion info).
   A possible extension would be to introduce a third, during-expansion category, which is allowed to expand macros but may get the AST at any stage in the expansion chain.
2. Compiler-internal macros may expand to something that is not allowable outside the compiler (see `__unstable_column!()` for example). Expanding one from within a macro could:
   - fail, as is currently the case: the macro aborts with a panic. This is suboptimal for obvious reasons.
   - return a `Result` that may contain an error object of some sort. This is still not optimal: for example, the `vec![]` macro contains such a thing, and it is one we likely want to have expanded. But we can probably deal with that by making the error object return the partial expansion up to the point which caused the error, which should suffice for most cases.
   - go through, allowing proc macro authors to reach into compiler internals. This is not something we want to stabilize, ever.
@pierzchalski

pierzchalski commented Apr 3, 2018

@llogiq sorry for the late reply!

I'm not sure what point you're trying to make in (1) - if I change the order of two macro calls, I don't really expect the same result in general, similar to if I change the order of two function calls. Do you have a concrete example of a proc macro which wants to ignore/pass-through macro nodes but which also cares if an expression comes from a macro expansion?

Also re. (1), I'm not overly familiar with the expansion process but as far as I understand and recall, the current setup is recursive fixpoint expansion, which makes it hard to have cleanly delineated pre- and post-expansion phases for macros to register themselves for. Can you clarify how these would work in that context?

Regarding (2), one dodgy solution is to have the macro expansion utility functions be internals-aware by having a blacklist of "do not expand" macros, but that's pretty close to outright stabilising them.

@llogiq

Contributor

llogiq commented Apr 3, 2018

To answer (2): in mutagen, I'd like to avoid mutating assert! and similar macros, so I'm interested not only in whether code comes from a macro, but also which one. On the other hand, I'd like to mutate other macro calls, e.g. vec![..] or println!(..). This should also explain (1): mutagen, as a procedural macro, may see a mixture of pre- and post-expansion macro calls, and cannot currently look into the former.

I'm OK with getting the resulting code if I also get expansion info, and also get a way of expanding macros so I can look into them.
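Sketched concretely, the expansion-info idea amounts to each token remembering which macro produced it. The types below are hypothetical stand-ins (not the real `proc_macro` API), just to show how a tool like mutagen could skip `assert!`-derived code while still mutating, say, `vec![..]` expansions:

```rust
// Hypothetical token type: `expanded_from` records which macro (if any)
// produced this token. Nothing here is the real `proc_macro` API.
#[derive(Debug)]
struct Token {
    text: String,
    expanded_from: Option<String>,
}

// Skip anything that came out of `assert!`-family macros; mutate the rest.
fn should_mutate(t: &Token) -> bool {
    !matches!(
        t.expanded_from.as_deref(),
        Some("assert") | Some("assert_eq") | Some("assert_ne")
    )
}

fn main() {
    let tokens = vec![
        Token { text: "a + b".into(), expanded_from: None },
        Token { text: "left == right".into(), expanded_from: Some("assert_eq".into()) },
        Token { text: "xs.push(1)".into(), expanded_from: Some("vec".into()) },
    ];
    let mutable: Vec<&Token> = tokens.iter().filter(|t| should_mutate(t)).collect();
    assert_eq!(mutable.len(), 2);
}
```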

@pierzchalski

pierzchalski commented Apr 3, 2018

So I don't know what changes @jseyfried is making to how contexts and scopes are handled, but I agree that sounds like the right place to put this information (about how a particular token was created or expanded).

Putting it in spans definitely sounds more workable than trying to wrangle invocations to guarantee you see things pre- or post-expansion, but it also means doing a lot more design work to identify what information you need and in what form.

@llogiq

Contributor

llogiq commented Apr 5, 2018

One thing I think we need is a way for proc macros to mark what they changed (and for quote! to use it automatically).

@nrc nrc self-assigned this Apr 30, 2018

@nrc

Member

nrc commented May 1, 2018

I just realised that one of the motivations for this feature (the lift! macro alluded to by @alexreg) wouldn't actually be made possible by this RFC. lift! needs to lift the contained macro up two levels:

iiuc, lift is eager expansion? That was covered by #1628 for declarative macros, which I still think is a nice thing to add. If we did add it for decl macros, then we should do something for proc macros too.

@nrc

Member

nrc commented May 1, 2018

Re compiler internals and inspection: I would expect that the result of expansion would be a TokenStream, and that it could be inspected to see what the macro expanded to (one could also inspect the macro before expansion to get some details). I would expect that 'stability hygiene' would handle access to compiler internals, and that its implementation would not allow macro authors to arbitrarily apply that to tokens.

@nrc

Member

nrc commented May 1, 2018

Thanks for this RFC @pierzchalski! I agree that this is definitely a facility we want to provide for macro authors. My primary concern is that this is a surprisingly complex feature and it might be better to try and handle a more minimal version as a first iteration. It might be a good idea to try and avoid any hygiene stuff in a first pass (but keep the API future-compatible in this direction), that would work well with the macros 1.2 work.

It is worth considering how to handle expansion order (although it might be worth just making sure we are future-compatible, rather than spec'ing this completely). Consider the following macro uses:

```rust
foo!(baz!());
bar!(); // expands to `macro baz() {}`
```

If foo is expanded before bar, then baz won't be defined and building will fail. However, if baz! were written directly in the program it would succeed - https://play.rust-lang.org/?gist=32998f65348efbeffdfbe106b0063eeb&version=nightly&mode=debug

Then consider a macro that wants to expand two macros where one is defined by the other - it might be nice if the macro could try different expansion orders. I think all that is needed is for the compiler to tell the macro why expansion failed - is it due to a failed name lookup, or something going wrong during the actual expansion stage.

Which brings to mind another possible problem - what happens if the macro we're expanding panics? Should that be caught by the compiler or the macro requesting expansion?
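One way to model the "tell the macro why expansion failed" idea is an error enum that distinguishes a failed name lookup (retrying after expanding something else may help) from a failure or panic during expansion itself. Everything below is a hypothetical sketch with mock types, not a proposed concrete API:

```rust
use std::collections::HashSet;

// Hypothetical error type: the compiler reports *why* expansion failed,
// so the requesting macro can decide whether to try a different order.
#[derive(Debug, PartialEq)]
enum ExpansionError {
    UnresolvedPath(String), // name lookup failed; may succeed after other expansions
    ExpansionPanicked,      // the expanded macro itself panicked
}

// Mock of one expansion attempt against the set of currently-defined macros.
fn try_expand(name: &str, defined: &HashSet<String>) -> Result<String, ExpansionError> {
    if !defined.contains(name) {
        Err(ExpansionError::UnresolvedPath(name.to_string()))
    } else if name == "panics" {
        Err(ExpansionError::ExpansionPanicked)
    } else {
        Ok(format!("<expansion of {}>", name))
    }
}

fn main() {
    let mut defined: HashSet<String> = HashSet::new();
    // `foo!(baz!())` before `bar!()` has run: `baz` is not yet defined...
    assert_eq!(
        try_expand("baz", &defined),
        Err(ExpansionError::UnresolvedPath("baz".to_string()))
    );
    // ...so the requesting macro expands `bar!` first (which defines `baz`)
    // and retries.
    defined.insert("baz".to_string());
    assert!(try_expand("baz", &defined).is_ok());
}
```

The `UnresolvedPath` case is what lets a macro back off and try a different expansion order, while `ExpansionPanicked` captures nrc's last question about panicking expansions.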

@nrc

Member

nrc commented May 1, 2018

Is there prior art for this? What do the Scheme APIs for this look like?

The full API provided by `proc_macro` and used by `syn` is more flexible than suggested by the use of `parse_expand` and `parse_meta_expand` above. To begin, `proc_macro` defines a struct, `MacroCall`, with the following interface:

```rust
struct MacroCall {...};

@nrc

nrc May 2, 2018

Member

Without getting too deep into a bikeshed, I think something like ExpansionBuilder would be a better name

fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self;
fn call_from(self, from: Span) -> Self;

@nrc

nrc May 2, 2018

Member

I think we should leave this to a later iteration

fn call_from(self, from: Span) -> Self;
fn expand(self) -> Result<TokenStream, Diagnostic>;

@nrc

nrc May 2, 2018

Member

The error type should probably be an enum of different ways things can go wrong, and where there are compile errors we probably want a Vec of Diagnostics, rather than just one.

```

The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. The scope of the spans of `path` is used to resolve the macro definition; this is unlikely to work unless all the tokens have the same scope.
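To make the builder shape concrete, here is an illustrative mock of the proposed API. `MacroCall`, `Span`, and `TokenStream` are stand-in types defined locally, not the real `proc_macro` ones, and the string-based "expansion" is purely for demonstration:

```rust
// Stand-in types; the real API would use `proc_macro::Span`/`TokenStream`.
#[derive(Debug, PartialEq)]
struct Span(u32);
type TokenStream = String;

struct MacroCall {
    path: TokenStream,
    args: TokenStream,
    from: Option<Span>,
}

impl MacroCall {
    fn new_proc(path: TokenStream, args: TokenStream) -> Self {
        MacroCall { path, args, from: None }
    }
    // Consume-and-return builder step, mirroring `fn call_from(self, from: Span) -> Self`.
    fn call_from(self, from: Span) -> Self {
        MacroCall { from: Some(from), ..self }
    }
    // Mirrors `fn expand(self) -> Result<TokenStream, Diagnostic>`, with a
    // plain String standing in for the error type.
    fn expand(self) -> Result<TokenStream, String> {
        match self.from {
            Some(_) => Ok(format!("expanded {}!({})", self.path, self.args)),
            None => Err("no caller context set".to_string()),
        }
    }
}

fn main() {
    let out = MacroCall::new_proc("println".to_string(), "\"hi\"".to_string())
        .call_from(Span(0))
        .expand();
    assert_eq!(out.unwrap(), "expanded println!(\"hi\")");
}
```

The consuming builder steps keep the API future-proof: new configuration methods can be added without breaking existing callers.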

@nrc

nrc May 2, 2018

Member

Overall, I really like the idea of using a Builder API - it keeps things simple and is future-proof

@llogiq

I think this is going in a good direction. We might want to flesh out some corner cases and possibly add more examples.


Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments.

This proposal requires that some tokens carry extra semantic information, similar to the existing `Span` API. Since that API (and its very existence) is in a state of flux, details of this 'I am a macro call that you need to expand!' marker may need to wait until those have settled.
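As a rough model of the marking idea (all types hypothetical; the real mechanism would live in `Span` or the token representation): the compiler's expansion pass only expands calls that carry an explicit expand-me mark, leaving unmarked token trees, such as another macro's arguments, alone.

```rust
// Hypothetical model: a call site with an expand-me mark set by the
// emitting proc macro. Not a real compiler data structure.
#[derive(Debug)]
struct Call {
    name: String,
    marked: bool,
}

// One expansion pass: only marked calls are expanded.
fn expansion_pass(calls: &[Call]) -> Vec<String> {
    calls
        .iter()
        .filter(|c| c.marked)
        .map(|c| format!("expand {}", c.name))
        .collect()
}

fn main() {
    let calls = vec![
        Call { name: "super::b".to_string(), marked: true },
        // An inner argument of `super::b!` is not ours to expand.
        Call { name: "c".to_string(), marked: false },
    ];
    assert_eq!(expansion_pass(&calls), vec!["expand super::b".to_string()]);
}
```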

@llogiq

llogiq Nov 23, 2018

Contributor

The implementation may also choose to put the marks elsewhere, as long as the mark gets followed up.

@pierzchalski

pierzchalski Nov 23, 2018

I'm not sure what point you're making - is it that the new token implementation might not have token attributes attached to tokens at all, and instead have some sort of registry as the public API?

@llogiq

llogiq Nov 23, 2018

Contributor

When I write an RFC, I try to describe behavior and avoid implementation details. That may be overcautious, so feel free to ignore my previous comment.


The caller of `foo!` probably imagines that `baz!` will be expanded within `mod b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity.

@llogiq

llogiq Nov 23, 2018

Contributor

Could the resolver look into module structure within the TokenStream? Or would this break backwards compatibility?

We should also note that this only applies to procedural bang-macros, because with attributes, the resolver already has the AST available, including any possible macro structure, so unless they change the latter, they can safely ignore it.

@pierzchalski

pierzchalski Nov 23, 2018

Letting the compiler look into the token stream beyond identifying the macro call sounds possibly risky - there are a lot more parsing corner cases (what if my marked macro just so happens to be near a bunch of unstructured tokens that are shaped like a mod?). If we were proposing marking entire items or expressions for the compiler to interpret, I'd be more comfortable with that.

@llogiq

llogiq Nov 23, 2018

Contributor

Sounds like a plan. Mark a complete TokenStream to have all macros within expanded? That would at the very least greatly simplify the usage for macro authors, at the cost of some complexity for expansion.


* Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose?

@llogiq

llogiq Nov 23, 2018

Contributor

We should probably be cautious when specifying expansion order; at the very least, allowing unrelated macro expansions (for some definition of relatedness) to run in parallel could improve compile times, and that is something I, at least, don't want to preclude.


* This API allows for a first-pass solution to the problems listed in the [motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones?

* How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system?

@llogiq

llogiq Nov 23, 2018

Contributor

This would give us an interesting option: pre-expansion lints currently don't see #[cfg(..)]s and #[cfg_attr]s at all. If we decided that those should be available during expansion, we might lint them. On the other hand, this would likely break many existing lints, and possibly throw expanding macros for a loop.

@pierzchalski

pierzchalski Nov 23, 2018

Hm, I went looking for where I previously found discussion around the early expansion of #[cfg] but I only found discussion about that with respect to #[derive], not attributes in general. I'd be curious to see a crater run where we only expand #[cfg] early for #[derive] - are there any attribute macros that actually care about #[cfg]?

@llogiq

llogiq Nov 23, 2018

Contributor

Not that I know of. AFAIR there are some clever hacks for rustdoc, which is basically hooking into the compiler to create a combined view for all targets, but apart from that, no macros I know of can see into #[cfg]s.

@pierzchalski

pierzchalski Nov 24, 2018

Ah, sorry, I meant to ask whether there are any attribute macros that would be unhappy if they could suddenly see #[cfg]s in their input. If not, then leaving #[cfg]s unexpanded might be one of those sneaky "technically breaking but not actually" changes that we sometimes allow.

If #[cfg]s are only expanded for #[derive], then we can:

- View #[derive] as a normal attribute that merely eager-expands #[cfg]s in its input before continuing.
- View #[doc] as a normal attribute that merely inspects #[cfg]s to do coverage.

Is there value to this sort of unification of reasoning?

@llogiq

llogiq Nov 24, 2018

Contributor

I honestly don't know. Presumably, attribute macros should be fine with getting the code multiple times, but my knowledge of what others do with proc macros is limited.

@llogiq

Contributor

llogiq commented Nov 23, 2018

I feel this RFC is shaping up nicely. From my understanding, there are still some improvements to be made regarding the path problem, but the whole thing looks pretty thorough already. There is only one section we should develop some more: The "how to teach this" section.

"Procedural macro recursion" should be taught as the parallel to declarative macro recursion, just that the self-call must be quoted to conform to the interface. Depending on the final interface (the easiest version of which should be a simple attribute on a proc macro function), the API documentation should be able to give enough hints for most use cases.

One thing I've hinted at is the inability of the current proposal to learn how a macro call is parsed without expanding it. I suppose mutagen might be the sole user of such an interface, though, so I'm OK with leaving this use case out of the current iteration, as long as I can find traces of the original arguments by comparing the spans (which should be possible today).

@pierzchalski pierzchalski changed the title RFC: Add macro expansion API to proc macros RFC: Macro Expansion for Macro Input Nov 24, 2018

@pierzchalski

pierzchalski commented Nov 24, 2018

I think the details of a solution to the path problem strongly affect the final API, so that might be worth discussing in more detail. If we have the following proc macro call:

```rust
my_proc_macro! {
    some other stuff;
    mod a {          // <-- x
        super::b! {  // <-- x
            c!();    // <-- y
        }            // <-- x
    }                // <-- x
}
```

If we mark all of the relevant tokens (the lines labelled x and y) and say "treat this as an expression or item interpolated where my_proc_macro! was called, then expand all the macros within", then this solves the immediate problem.

Unfortunately, this naively marks the inner call on the line labelled y, which we don't actually know should be eagerly expanded: we're making a judgement about what super::b! wants to do with its input (extra design constraint: this should be hard to do accidentally!).

A previous version of this RFC specified marking all of the tokens of the call (but not the arguments) rather than just a distinguishing token like ! or #. The natural extension here is to mark all the tokens other than the arguments for super::b!; this feels a little like an image editor 'paint' tool, where the boundaries are the arguments to other macros.

As another example, this is what this looks like when there's a nested attribute macro:

```rust
my_proc_macro! {
    some other stuff;
    mod a {                // <-- a
        #[some_attr_macro( // <-- a
            foo!()         // <-- b
        )]                 // <-- a
        super::b! {        // <-- b
            c!()           // <-- b
        }                  // <-- b
    }                      // <-- a
}
```

Here the problem is more extensive: we don't want my_proc_macro! to expand any of foo!, super::b!, or c!, since those are all the responsibility of #[some_attr_macro]. Hence, we only want to mark the lines labelled a.

Explicitly marking all the call tokens simplifies things like backtracking to parse bang macros, and explicitly marking the environment lets the compiler be a bit smarter about paths. TODO: does this enable anything else interesting? Maybe macro definitions?

I think, for now, we can leave marking the environment as future work, since it's forward-compatible (both as an implementation and semantically) with only marking the call itself.

@pierzchalski

pierzchalski commented Nov 24, 2018

I'm still not clear on mutagen's relationship to macro expansion. You mention wanting to try and ensure mutagen can 'backtrack' an expansion to only mutate the arguments. Does that mean that in a call like this:

```rust
foo!(my custom keywords: x = a + b);
```

You want mutagen to be able to deduce that x = a + b is meant to be a statement that is eligible to be mutated, without going through the expansion process? If so, then yeah, that seems rather hard to do. A simple maybe-solution is to allow authors to hygienically add arbitrary tags to tokens to track them in the output, but that's definitely more than just an extension of this RFC.

@llogiq

Contributor

llogiq commented Nov 24, 2018

I trust macro authors to leave the spans of their arguments in place, so while it certainly won't be trivial, tracing the arguments backwards from the source would work for me, so let's not worry about this too much for now.

Regarding the path problem, we should ensure that macro authors can only ever mark outer macros to be expanded. Otherwise we'd end up manipulating expansion order for inner macros, and I think we really want to avoid that.

@llogiq

Contributor

llogiq commented Nov 25, 2018

Talking to other Rustaceans during RustFest gave me an idea of how to deal with the path conundrum: we could require macro parsing to mark TokenStream fragments with the types they get parsed as. If a macro parses twice, the earlier markers must be overwritten.

This would allow us to a) see into things like module structure in a TokenStream once it's parsed, and b) see how a macro will interpret its arguments. Win-win! 😄 Even better, we could pre-parse all macro arguments before expansion, thus allowing macros visibility into other macro calls' arguments without needing to expand them first.

@pierzchalski

pierzchalski commented Nov 26, 2018

Currently in macroland there is exactly one place where users can specify the AST-kind of the tokens that get passed in, and that's decl macros. There's one other place where users can currently expect a particular kind, and that's the body of attr macros (I don't know if that's a hard guarantee or if we've left open the option of attributes being applied to any delineated tokens).

Everywhere else, the compiler doesn't know what kind of AST a token will/is/should be parsed as. I imagine any AST structure on proc macros would need to be some opt-in extra configuration (probably on #[proc_macro] re-using the macro_rules! syntax), which would limit the usefulness for tools like mutagen. If it wasn't opt-in, I'd be wary of the compiler having to do some kind of "best effort" attempt to tag unstructured tokens.

I could imagine a utility library of decl macros which exist solely to tag token AST kinds during recursive calls, e.g. macro_rules! tag_item { ($i:item) => {$i}; }, so that a proc macro that wanted to expand a macro within a parsed module might emit:

```rust
my_proc_macro! {
    some custom arguments;
    tag_item! { mod a { super::b!() } }
}
```

But that doesn't really address the desire to see 'intended' AST kind information pre-expansion.

Overall, it's an interesting member of the (apparently quite fruitful!) family of macro ideas that consist of tagging the tokens themselves in various ways. It definitely deserves its own RFC but also kind of convinces me to punt on the path problem for any initial proposal here.

@pierzchalski

pierzchalski commented Nov 26, 2018

Actually, there's another (less exciting) way to deal with the path problem. Once we have the ability to 1) emit macro calls in our output and 2) ensure those are expanded at least once before we're re-expanded, that's enough for the compiler to provide magic utilities like expand_all! or expand_once! or expand_marked! that do what their names suggest while taking in whole well-kinded ASTs.

@pierzchalski

pierzchalski commented Nov 26, 2018

Oh, I just realised how gnarly the path problem can actually get. Consider this:

```rust
my_proc_macro! {
    mod a {
        macro ma { ... };
        super::b::mb!();
    };
    some custom stuff;
    mod b {
        macro mb { ... };
        super::a::ma!();
    };
}
```

If my_proc_macro wants to fully expand its item inputs, should the above successfully expand? If so, what intermediate states of expansion and resolution does the compiler see? What intermediate states does my_proc_macro see?

Another fun one:

```rust
macro x { ... };
m! {
    mod a {
        macro x { ... };
        m! {
            mod b {
                super::x!();
            }
        }
    }
}
```

It's "obvious" that the innermost call to super::x should resolve to a::x (is that always true? It's hard to imagine a use-case, but do we want to preclude the possibility of other behaviours?). All of a sudden it's not clear to me how and when the compiler should get all the information it needs, or what the user API should look like to minimise weird errors due to expansion order.

@llogiq

Contributor

llogiq commented Nov 26, 2018

Yes, there are weird corner cases pertaining to expansion order. As things are, a macro needs to be defined before it is used (from top to bottom). This is the reason why I advocated for a plain top-to-bottom expansion order, btw.
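The textual-ordering rule for `macro_rules!` can be seen in plain stable Rust. A minimal sketch (`first` and `later` are just illustrative names):

```rust
// `macro_rules!` visibility is textual: an invocation can only see
// definitions that appear above it in the source (absent `#[macro_use]`
// or `use` re-exports).
macro_rules! first {
    () => { 1 };
}

fn main() {
    assert_eq!(first!(), 1); // OK: `first` is defined above this point.
    // assert_eq!(later!(), 2); // error: cannot find macro `later`
}

#[allow(unused_macros)]
macro_rules! later {
    () => { 2 };
}
```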

So let's look at what should happen with your first example: If my_proc_macro doesn't care about macros, it just parses a space-delimited list of Items (let's keep our macro simple for the moment) and expands them into whatever output is needed (likely containing the arguments), which is then further expanded. So far, so good.

Should my_proc_macro want to delay execution to have its arguments expanded, we really need its parser – because otherwise we cannot know whether mod a { .. } is a module or part of our macro argument DSL. This means we need to require proc_macro_bang authors who want to expand stuff to first mark it up by parsing it, and then defer to the compiler. I should note that they really need to do so anyway, just to be able to find out whether their arguments contain any macro call at all!

Now that the arguments are parsed, the compiler knows that mod a is actually a module and can find the a::ma macro to expand in module b. The birds are singing, everyone's happy, we've saved the day. Hooray! 🎉

This also means that the users of my_proc_macro as well as any outer proc macros can see how those macros interpret their argument, once they defer themselves.

There is a possible wrinkle if the my_proc_macro creates a macro definition that is then used by its arguments, but in that case the macro definition needs to be expanded before the argument macro calls are, and it shouldn't be working otherwise.

So to reiterate my extended proposal: We need syn's parse functions (and macros etc.) to mark the original tokens with the types they parsed. We also need the declarative macro matchers to do the same to their arguments (this part should be both self-evident and simple to implement). After parsing, proc macros can decide whether to go on or whether to defer their execution until the macros in their argument token stream have been expanded.

So we no longer need a flag to say something should be expanded, we just need the self-deferral and the token marking by the parser, and we need a way to mark spans in the TokenStream as some type by the parser in a way that will survive quote!ing.

This also means that resolve must update itself on each macro expansion.

@pierzchalski

pierzchalski commented Nov 27, 2018

Yes, macro_rules! macros must be syntactically defined before use, but macro macros don't need to be. The modules, definitions, and uses in that example work on current 2018-nightly.

Anyway, once again for my own benefit I'm going to walk through how I think your extended tagging proposal would behave. When a proc macro re-emits itself as in the above example:

  • The compiler re-parses the current AST looking for new calls and definitions (which is what it does today).
  • It encounters the re-emitted call, and detects the tagged tokens (possibly separated by untagged tokens!).
  • It then pretends to interpolate them at the call site, in the sense that temporarily the compiler acts as though the file contents aren't the call to my_proc_macro but instead are just the tagged tokens:
    mod a {
        macro ma { ... };
        super::b::mb!();
    }
    mod b {
        macro mb { ... };
        super::a::ma!();
    }
    This clarifies precisely how the compiler will do resolution (this also works neatly in the nested example).
  • The compiler then goes on its merry way, expanding macros and finding definitions (including these mutually dependent ones), until it can't expand these tokens any more.
  • Then it un-interpolates these expanded tokens back into my_proc_macro, which becomes eligible for the next round of expansion.

Note that I'm not trying to specify an implementation, but I am trying to determine what the compiler will look like it's doing when it does these tag-based expansions.

I kind of like this as a model to explain eager expansion ("the compiler temporarily pretends your macro doesn't exist but its tagged arguments do"), although there are a few bits to clarify:

  • How to express something like "only expand this macro once, not to completion".
  • How to treat syntactically invalid tagged elements (e.g. an expression being temporarily-interpolated into item position).
  • How to handle something outside of my_proc_macro trying to refer to crate::a::ma (simplest idea: delay until my_proc_macro fully expands, but that only handles very simple cases).

Notably, nothing mentioned so far actually needs to refer to the AST type tags as type tags: there's really only one "pretend I am here and my surrounding macro is not" tag.

@pierzchalski

pierzchalski commented Nov 27, 2018

If the idea actually is for the compiler to automatically eagerly expand any appropriately tagged tokens, and make macro macros auto-tag their inputs, and have tags be preserved, you end up with a weird sort of easy-to-accidentally-opt-in eager expansion for decl macros. Consider this definition and call:

macro eager_stringify {
    (pre: $e:expr) => {eager_stringify!(post: $e)},
    (post: $e:expr) => {stringify!($e)}
}

eager_stringify!(pre: concat!("a", "b"));

It behaves differently depending on whether the compiler is allowed to inspect and expand intermediate tagged tokens.
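For contrast, the lazy behaviour is observable on stable Rust today: the inner call reaches stringify! unexpanded. A small sketch (the comment about the eager result describes the proposal, not current behaviour):

```rust
fn main() {
    // Lazy (current) behavior: `stringify!` sees the literal tokens of
    // the inner call, not its expansion.
    let lazy = stringify!(concat!("a", "b"));
    assert!(lazy.contains("concat")); // the inner macro name survives

    // This is what eager expansion would feed to `stringify!` instead:
    assert_eq!(concat!("a", "b"), "ab");
    // Under the eager proposal, a tagged `$e` would already be `"ab"` by
    // the time `stringify!` runs, so the output would differ.
}
```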

@llogiq

Contributor

llogiq commented Nov 27, 2018

Thanks again for fleshing out my reasoning and asking the hard questions! So here are (hopefully correct) answers:

Deferred expansion means that all things marked as macros in the arguments are expanded once, not eagerly until there is nothing left. If a macro wants to defer itself until all inner macros have been completely expanded, it will be possible to do so with syn helper functions.

Syntactically invalid elements mean that the parse has failed (and hopefully returns a suitable error). proc macros will very likely either try again with a different argument type or panic with that error. In the former case, the argument will be (hopefully correctly) re-tagged, whereas the compiler will show the error and bail in the latter case, as far as I'm concerned.

Once our proc_macro has parsed and thus marked the macro definition and deferred itself, we need to expand it before using the macro definition (because expansion could change it). For example, consider the following:

#[overflow(saturate)]
macro_rules! add_two { ($a:expr, $b:expr) => { $a + $b } }

add_two!(1u32, 2)

I'd like overflower to be able to expand the macro definition to:

macro_rules! add_two { ($a:expr, $b:expr) => { $a.saturating_add($b) } }

This of course needs to be done before the add_two macro is expanded. So once the macro definition is marked, resolve can store this as "defined but not ready yet" until expansion is done. Note that it would be possible to create dependency cycles this way, and we can either detect those or just set a limit on expansion iterations.
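The target of that rewrite can be written out by hand today. A minimal sketch of the saturating output overflower would like to generate (the `#[overflow(saturate)]` attribute itself is the hypothetical part; `add_two_saturating` is just an illustrative name):

```rust
// Hand-expanded form of `add_two` under a hypothetical `#[overflow(saturate)]`.
macro_rules! add_two_saturating {
    ($a:expr, $b:expr) => { $a.saturating_add($b) };
}

fn main() {
    assert_eq!(add_two_saturating!(1u32, 2), 3);
    // The point of the rewrite: no overflow panic/wrap at the top end.
    assert_eq!(add_two_saturating!(u32::MAX, 2), u32::MAX);
}
```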

@pierzchalski

pierzchalski commented Nov 27, 2018

Ok, I'm now pretty sure that AST type-tagging is orthogonal to macro expansion, precisely because:

  • The compiler's behaviour can be modelled as pseudo-interpolating tokens back to the callsite.
  • Most macro authors know what AST type they expect to be expanded into.
  • When a macro author identifies tokens they want to expand, they already know what AST type those tokens are.
  • Also, a surprising amount of the time, if you wrap something in an item declaration, it can go anywhere other than inside a trait-like definition or impl, so macro authors and libraries like syn have a lot of flexibility in that regard.

That is, this setup already behaves as though everyone already knows what AST type each token would be tagged with, so giving proc macro authors the ability to tag AST type but also requiring them to do so for early expansion doesn't seem to be adding value (but does add some still-unaddressed complications regarding decl macros).

It's a constructive thought experiment for how to extend the token interface to give tools like mutagen more expansion introspection power, so I'm looking forward to collaborating on a future RFC, if you want 😀.

@pierzchalski

pierzchalski commented Nov 28, 2018

Hmm, I'm wondering if there's any real use-case for single-step expansion. I'm currently trying to pin down the semantics of this setup, and while it's pretty straightforward with "expand to completion" semantics it's surprisingly frustrating when we allow single-step.

We want to allow "expand to completion" for efficiency, but then you get weird interactions when a single-stepping macro tries to single-step expand a to-completion macro.

@llogiq

Contributor

llogiq commented Nov 28, 2018

Ok, so basically that makes the expansion algorithm look like "for an X macro, try to parse all arguments as X or leave them unstructured" (for X in item, expr, etc.), which will work perfectly in the absence of argument type information, and should solve the path conundrum. As I've told you before, mutagen will be able to work even without that information, because macro expansion will leave us enough information in the spans to work out where things came from, so I'm happy even with the reduced API.

I also think that as long as macro expansion can be traced back by following the span information, there is basically no use case for expanding once that cannot be emulated (notably, a proc macro could walk through the span expansion info and restore any particular state it happens to find). There is a small risk that proc macros may mess up the spans, and we should advise implementers to make sure spans can be traced back no matter what the procedural macro returns.

I note that doing eager expansion will also ameliorate the risk of proc macros tripping over expansion states that are private to the compiler, as is currently possible when partially expanding panic!() because this has a #[allow_internal_unstable] attribute and uses the __rust_unstable_column!() macro internally.

@pierzchalski

pierzchalski commented Nov 29, 2018

Alright, I shall once again write a million words going into excruciating detail about how a path-problem-handling, complete eager expansion, token-tagging solution to macro input expansion might work.

Key concepts

Token expansion scopes

A token is in possibly many expansion scopes. A scope determines what definitions are 'stable' with respect to other expansion scopes. Consider this:

macro m_1() {}

eager_1! {
    macro m_2() { m_1!(); }
    eager_2! {
        m_2!();
    }
}

m_2!();

When eager_1 eagerly expands its input, it will eagerly expand eager_2. Then, eager_2 will want to resolve and expand m_2!. It should be able to; the tokens defining m_2 are stable for the duration of the expansion of eager_2 because eager_1 can't remove them while eager_2 is expanding.

Contrast this with the top-level call to m_2!. Since the definition of m_2 is within the ongoing expansion of eager_1, those tokens aren't stable with respect to the top-level call. If eager_1 decides to not include the definition of m_2 in its final expansion result, or if it changes the definition of m_2, then the top-level call of m_2! would exhibit inconsistent behaviour if it were allowed to resolve to the definition within eager_1.

There are also hygiene reasons why the top-level call to m_2 shouldn't resolve, but we can imagine that m_2 is a shared identifier token and we're within a larger expansion context.

How macros mark output

Any tokens that are emitted by a macro and marked for eager expansion are given a fresh expansion scope.

Expansion eligibility

The existing macro expansion process is roughly:

  • Collect macro invocations and definitions.
  • If they can be resolved, resolve and expand them.
  • Repeat.

To add opt-in eager expansion, we change "collect macro invocations" to "collect childless macro invocations".
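As a toy model of the collect-expand-repeat loop above (strings instead of token trees, no hygiene, no nested arguments; `expand_to_fixpoint` and the `double` macro are invented for illustration):

```rust
use std::collections::HashMap;

// Toy model of the fixpoint expansion loop: repeatedly find an invocation,
// expand it, and start over until no invocations remain.
fn expand_to_fixpoint(
    mut src: String,
    defs: &HashMap<&str, fn(&str) -> String>,
) -> String {
    loop {
        let mut expanded_any = false;
        for (name, expander) in defs {
            let call = format!("{}!(", name);
            if let Some(start) = src.find(&call) {
                // Find the matching close paren (assumes no nesting in args).
                let args_start = start + call.len();
                let end = src[args_start..].find(')').unwrap() + args_start;
                let replacement = expander(&src[args_start..end]);
                src.replace_range(start..=end, &replacement);
                expanded_any = true;
            }
        }
        if !expanded_any {
            return src; // no invocations left: fixpoint reached
        }
    }
}

fn main() {
    let mut defs: HashMap<&str, fn(&str) -> String> = HashMap::new();
    defs.insert("double", |s| format!("{0} + {0}", s));
    let out = expand_to_fixpoint("let x = double!(y);".to_string(), &defs);
    assert_eq!(out, "let x = y + y;");
    println!("{}", out);
}
```

The eligibility rules below then amount to restricting which invocations this loop is allowed to pick on each iteration.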

A macro is eligible for expansion if all of the following hold:

  • All of the tokens in a macro call are in a scope S.
  • Among the tokens in the macro call, none of them are in a child scope of S.
  • The tokens in a macro call aren't surrounded by another macro call in S. This rules out 'inner' eager expansion, for instance:
    a! {      // If these are all in scope S,
        b! {} // and S has no children within `a`,
    }         // `a` can be expanded but not `b`.

Resolution

To add path resolution among eager macro calls, we change "collect macro definitions" to "collect macro definitions and their scopes".

A macro definition m in an expansion scope S is an eligible resolution for any macro call to m! in S, or in any child scope of S.

Expansion

When a child scope C has no macro expansions left, the resulting tokens are interpolated to the parent scope P, tracking the original spans.

When a macro in a child scope C is expanded, any surrounding macro invocation syntax in the parent scope P is ignored.

For example, the following is weird but works:

macro m() { struct X; }

eager! {
    foo! {                   // <- Unmarked.
        mod a {              // <+-- Marked.
            my_handy_marker: // <- Unmarked.
            m!();            // <+-- Marked.
        }                    // <+-- Marked.
    }                        // <- Unmarked.
}

The compiler will expand the marked tokens as though they were:

mod a {
    m!();
}

And will interpolate the result as:

eager! {
    foo! {
        mod a {
            my_handy_marker:
            struct X;
        }
    }
}
@llogiq

Contributor

llogiq commented Nov 30, 2018

Looks good to me, please include this in the RFC. 👍 I'll re-review it and then this should be ready for FCP, unless I find something. Perhaps @Centril will also want to take another look?

@petrochenkov

Contributor

petrochenkov commented Nov 30, 2018

I haven't followed the discussion so far, but plan to read this in the next week or so.
(I learned some stuff about proc macros and expansion since #2320 (comment).)

pierzchalski added some commits Dec 2, 2018

This proposal:

* Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose?
* Parallel expansion has been brought up as a future improvement. The above specified expansion order blocks macro expansion on the expansion of any 'inner' marked macros, but doesn't specify any other orderings. Is this flexible enough?

@llogiq

llogiq Dec 2, 2018

Contributor

I for one think so.

@petrochenkov

Contributor

petrochenkov commented Dec 15, 2018

So I read this RFC, and some previous RFCs (#1628) and now reading through the discussion thread.

Apparently it's hard to understand the motivation for the RFC as written without reading the conversation history between @pierzchalski and @llogiq.
The reason is that somewhere after #2320 (comment) the RFC made a 180 degree turn from straightforward fn please_expand(invocation: TokenStream) -> Result<TokenStream, SomeErrorType> into something very different.

I'll finish reading the conversation, then will re-read the RFC, but so far things that looked pretty scary to me:

  • The eagerly expanded input is not contiguous and represents a list of chunks; I'm not sure the correspondence between the output and input chunks can be reliably established.
  • Eager expansions introduce their own little speculative worlds in which names can be introduced rather than just used, moreover even module structures can be formed.
@petrochenkov

Contributor

petrochenkov commented Dec 15, 2018

FWIW, a "straightforward please_expand" is what is used right now by built-in macros that apply eager expansion to their arguments, like env!.

It's somewhat buggy, and speculative expansion already causes unexpected things like rust-lang/rust#52363 to happen, but the main problem is that expansion is still largely formulated internally in terms of AST fragments rather than token streams.

So you have to say "take this input expression and give me an output expression" in the implementation of env!, rather than "take this location-agnostic input token stream and give me a location-agnostic output token stream".
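That built-in eager expansion is observable on stable Rust today: env! accepts a concat! call where a string literal is expected. (This sketch assumes the PATH environment variable is set at compile time, as it is on typical systems.)

```rust
fn main() {
    // `env!` eagerly expands its argument: the `concat!` call below is
    // expanded to the string literal "PATH" before `env!` itself runs.
    let path = env!(concat!("PA", "TH"));
    assert!(!path.is_empty());
    assert_eq!(concat!("PA", "TH"), "PATH");
}
```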
