Implement converting an AST to a token tree #43081
alexcrichton added the A-macros-2.0 label on Jul 6, 2017
This was referenced Jul 6, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Jul 14, 2017
alexcrichton referenced this issue on Jul 14, 2017: Implement tokenization for some items in proc_macro #43230 (Merged)
Mark-Simulacrum added the C-feature-request label on Jul 28, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Jul 28, 2017
bors added a commit that referenced this issue on Jul 28, 2017
mattico added a commit to mattico/rust that referenced this issue on Jul 29, 2017
matthewhammer pushed a commit to matthewhammer/rust that referenced this issue on Aug 3, 2017
alexcrichton referenced this issue on Mar 2, 2018: Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644 (Closed)
alexcrichton referenced this issue on Apr 2, 2018: Inconsistent tokens between the same struct defined inside vs outside of function #49604 (Closed)
This was referenced Apr 10, 2018
alexcrichton referenced this issue on Apr 19, 2018: Hygiene break in macros involving string containing single quote #50061 (Closed)
alexcrichton referenced this issue on Apr 27, 2018: Attribute macros invoked at crate root have issues #41430 (Open)
alexcrichton referenced this issue on May 17, 2018: Procedural macros stringify inputs with macro invocations too aggressively #50840 (Closed)
alexcrichton added a commit to alexcrichton/rust that referenced this issue on May 22, 2018
alexcrichton referenced this issue on May 22, 2018: rustc: Correctly pretty-print macro delimiters #50971 (Merged)
alexcrichton added a commit to alexcrichton/rust that referenced this issue on May 22, 2018
alexcrichton added the A-macros-1.2 label on May 22, 2018
This was referenced May 22, 2018
alexcrichton added C-bug and removed C-feature-request labels on May 22, 2018
bors added a commit that referenced this issue on May 24, 2018
mark-i-m added a commit to mark-i-m/rust that referenced this issue on May 24, 2018
HMPerson1 added a commit to HMPerson1/rust-clippy that referenced this issue on Oct 26, 2018
macpp referenced this issue on Oct 27, 2018: proc_macro::Span::source_file() returns incorrect paths #54677 (Closed)
alexcrichton referenced this issue on Nov 1, 2018: None-delimited groups missing from the input of derive macro #55565 (Open)
SergioBenitez referenced this issue on Nov 15, 2018: Ignore non-semantic tokens for 'probably_eq' streams. #55971 (Merged)
bors added a commit that referenced this issue on Nov 19, 2018
alexcrichton referenced this issue on Dec 10, 2018: Weird diagnostics with wasm_bindgen and language server #1097 (Closed)
HMPerson1 added a commit to HMPerson1/rust-clippy that referenced this issue on Dec 21, 2018
HMPerson1 added a commit to HMPerson1/rust-clippy that referenced this issue on Dec 21, 2018
dtolnay referenced this issue on Dec 23, 2018: Can't access local variable when on a nested macro #7 (Open)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue on Dec 23, 2018
bors added a commit that referenced this issue on Dec 24, 2018
petrochenkov referenced this issue on Dec 30, 2018: macro_rules macros that expand to proc_macro invocations produce wrong hygiene information. #57207 (Closed)
@alexcrichton or @petrochenkov would it be possible to make an order-of-magnitude estimate of effort for a real fix? Multiple person-weeks, multiple months, most of a year? Does most of the difficulty come from performance implications? I imagine the naive fix of making the AST look more like Syn's AST, where every token is individually represented, would be pretty bad for compiler performance.
I've been hitting this issue over and over and over since it was filed. It wasn't so bad with only derive macros, where the symptom was usually just error messages being printed in the wrong place, but now with attribute macros and function-like macros I often find it totally impossible to build composable macros (as seen in #57207), and the consequence of losing spans is no longer cosmetic: code that should compile doesn't compile.
@dtolnay This may regress performance of expansion-heavy compilation by a few percent, but correctness seems more important and this can be optimized later. Tokens attached to non-terminals will fix the decl macro issues, but not
Alternatively, the AST as it exists now would only be created after expansion, and only tokens plus the things necessary for expansion, like attribute targets, would exist before that (unfortunately, too much is needed to resolve and expand an attribute).
Thanks! That's very helpful.
Let's try to find the right person to at least be thinking passively about this problem, or else it will keep sitting here racking up mentions as every procedural macro library hits it. Would @eddyb be the most appropriate person?
cc @matklad who was also interested in an alternative AST.
Thanks for the ping @petrochenkov! It's interesting that I am experimenting with macro expansion in rust-analyzer at this very moment :)
I don't have much expertise here: I don't really know how rust macros actually work, and have taken only a cursory look at the spans/syntax context infrastructure. Below are my random thoughts about ast/token tree conversion in the context of IDEs:
In rust-analyzer, the syntax trees are not based on token trees (in an IDE context, braces are not always balanced, and you need to account for trivia, etc). My current plan is to make token trees strictly a macro-specific interface and materialize them only when expanding macro invocations.
And, because rust-analyzer syntax trees are pretty heavyweight, I also want (rust-analyzer/rust-analyzer#386) to introduce a "post-expansion" AST based on a more traditional hierarchy-of-enums representation.
One way to achieve that would be a libsyntax2-style tree, whose "real" representation is a sequence of token trees, and where AST nodes don't store data themselves besides pointers into the token tree.
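To make the last idea a bit more concrete, here is a minimal sketch of a tree whose real data is a flat token list, with AST nodes holding only index ranges into it. Everything in it (`TokenBuffer`, `LetStmt`, `parse_let`) is a hypothetical name invented for illustration; it is not how libsyntax, libsyntax2, or rust-analyzer are actually laid out.

```rust
// Toy sketch: the token list is the source of truth, and AST nodes only
// store positions in it. All names here are hypothetical.
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Punct(char),
    Literal(String),
}

/// Owns every token of a file, in source order.
pub struct TokenBuffer {
    tokens: Vec<Token>,
}

/// AST node for `let <name> = <init>;` that stores no token data itself.
pub struct LetStmt {
    whole: Range<usize>, // all tokens of the statement
    name: usize,         // index of the bound identifier
    init: Range<usize>,  // tokens of the initializer expression
}

impl TokenBuffer {
    pub fn new(tokens: Vec<Token>) -> Self {
        TokenBuffer { tokens }
    }

    /// Recognizes `let <ident> = <tokens...> ;` starting at index `start`.
    pub fn parse_let(&self, start: usize) -> Option<LetStmt> {
        let t = &self.tokens;
        if t.get(start)? != &Token::Ident("let".to_string()) {
            return None;
        }
        match t.get(start + 1)? {
            Token::Ident(_) => {}
            _ => return None,
        }
        if t.get(start + 2)? != &Token::Punct('=') {
            return None;
        }
        // The statement ends at the first `;` after the `=`.
        let semi = (start + 3..t.len()).find(|&i| t[i] == Token::Punct(';'))?;
        Some(LetStmt {
            whole: start..semi + 1,
            name: start + 1,
            init: start + 3..semi,
        })
    }
}

impl LetStmt {
    /// Converting the node back to tokens is just a slice: nothing is lost.
    pub fn tokens<'a>(&self, buf: &'a TokenBuffer) -> &'a [Token] {
        &buf.tokens[self.whole.clone()]
    }

    pub fn name<'a>(&self, buf: &'a TokenBuffer) -> &'a Token {
        &buf.tokens[self.name]
    }

    pub fn init<'a>(&self, buf: &'a TokenBuffer) -> &'a [Token] {
        &buf.tokens[self.init.clone()]
    }
}
```

With this layout, "AST to token tree" is a lookup rather than a reconstruction, so whatever span data the tokens carry survives by construction. The cost is that every accessor needs the buffer passed in.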
I think we probably just haven't put enough weight on this issue yet because it wasn't really all that pressing earlier, but I definitely agree with @dtolnay that it's far more pressing now!
I've always thought that the hard part of this is that we'd basically have to make libsyntax's AST look like syn's, which I figured was a monumental task given the size and usage of libsyntax. I never really considered the performance implications, but I wouldn't be surprised if it had more than a few regressions.
Thinking on this for a bit though, we may be able to get to a more short-term solution. Up to now
A strategy like that, which is specific to
In any case the best way forward with this, I think, is to try to brainstorm a bit to figure out a few possibilities of what a "reasonable" solution might look like, and then bring this up with the broader compiler team because it has a lot of implications on the compiler!
How feasible is it to implement ToTokens for the existing AST without data structure changes? We care most about preserving spans of identifiers as those are what lead to compilation failures. As I understand it, for every identifier represented in the AST there should be a way to form a proc_macro::Span corresponding to its hygiene context. As a first pass, we care less about spans of other tokens like keywords and punctuation since those would typically only affect error message placement.
I'm not sure I've actually got a great idea of how feasible that would be. I feel like it would be difficult because we don't track all spans everywhere, but that's based on a mental snapshot of the AST from like 3 years ago. Diagnostics have come a long way since then, which often comes with annotating more parts of the AST with more spans.
I'm pretty sure, however, that most punctuation in the AST doesn't have span information. For example nothing in
(oh right, also keywords aren't really tracked at all)
It may just not actually be that hard to implement a raw
That's what I mean. We can implement conversions from AST to token stream (similar to ToTokens) in a way that preserves spans of all identifiers while inventing spans (as currently done) for keywords and punctuation.
@alexcrichton or @petrochenkov would one of you be able to give this a shot for one of the simplest AST types? Then we could see whether other people can fill in the work for the rest of the AST.
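For illustration, the proposal above might look roughly like the following on a toy AST. This is a sketch under stated assumptions, not rustc's or `proc_macro`'s real types: `Span`, `Token`, `Ident`, `UnitStruct`, and this `ToTokens` trait are all made up here, and `ctxt` merely stands in for a hygiene context. The point is only that identifier tokens reuse the span stored in the AST while keyword and punctuation tokens get an invented one.

```rust
// Toy sketch, not rustc's or proc_macro's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
    pub ctxt: u32, // stand-in for a hygiene context (assumption)
}

impl Span {
    /// Placeholder for tokens whose location the AST never recorded.
    pub const DUMMY: Span = Span { lo: 0, hi: 0, ctxt: 0 };
}

#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Keyword(&'static str),
    Ident(String),
    Punct(char),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

/// Identifiers are the one thing this toy AST tracks a span for.
pub struct Ident {
    pub name: String,
    pub span: Span,
}

/// `struct Name;` with no span stored for `struct` or `;`.
pub struct UnitStruct {
    pub name: Ident,
}

pub trait ToTokens {
    fn to_tokens(&self, out: &mut Vec<Token>);
}

impl ToTokens for Ident {
    fn to_tokens(&self, out: &mut Vec<Token>) {
        // Preserve the real span so name resolution and hygiene still work.
        out.push(Token {
            kind: TokenKind::Ident(self.name.clone()),
            span: self.span,
        });
    }
}

impl ToTokens for UnitStruct {
    fn to_tokens(&self, out: &mut Vec<Token>) {
        // Keyword and punctuation spans are invented; at worst this moves
        // where an error message points.
        out.push(Token { kind: TokenKind::Keyword("struct"), span: Span::DUMMY });
        self.name.to_tokens(out);
        out.push(Token { kind: TokenKind::Punct(';'), span: Span::DUMMY });
    }
}
```

Converting a `UnitStruct` whose name has span `{ lo: 7, hi: 10, ctxt: 3 }` yields three tokens, of which only the middle one carries that span.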
Before we give this a stab, I'd want to game out the viability of this strategy. I feel like the punctuation/keyword spans are basically unused for
japaric referenced this issue on Jan 10, 2019: #[entry] attribute disguises application errors #163 (Open)
@eddyb had some ideas at the Rust All Hands for how to fix this. As I understand it, we want to treat
This will be impactful not just for procedural macros. We identified that IDEs also care about span information being captured accurately in macro-expanded code.
Mentioning @alexreg who may be interested in tackling this issue.
elichai commented on Feb 6, 2019
Just so I understand: is this issue about the Span? i.e.
alexcrichton commented on Jul 6, 2017 (edited by petrochenkov)
Updated description
First implemented in #43230, the compiler can now losslessly tokenize a few AST nodes, when necessary, from the original token stream. Currently, however, this is somewhat buggy:
The "real bug" here is that we haven't actually implemented this ticket. The initial implementation in #43230 was effectively just a heuristic, and not even a great one! As a result, we still basically need the ability to losslessly tokenize an AST node back to its original set of tokens.
Some bugs that arise from this are:
- #[cfg] modifies the AST but doesn't invalidate the cache -- #48644
- macro_rules! and procedural macros don't play well together -- #49846
- Invoking a macro with brackets loses span information -- #50840
Original Description
There's an associated FIXME in the code right now, and to fix this we'll need to implement tokenization of an AST node. Right now the thinking of how to implement this is to save all TokenStream instances adjacent to an AST node, and use that instead of converting back into a token stream.
cc @dtolnay, @nrc, @jseyfried
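As a rough illustration of the "save the tokens next to the AST node" approach from the original description, and of the `#[cfg]` cache bug listed in the updated one, here is a small self-contained sketch. The `Item` and `Token` types and the `strip_attr` method are hypothetical and far simpler than anything in the compiler. It only shows the shape of the idea: use the cached tokens while the node is untouched, and drop the cache as soon as the node is modified so it cannot go stale.

```rust
// Toy sketch with hypothetical types; not the compiler's data structures.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
    pub offset: usize, // byte offset in the original source
}

pub struct Item {
    attrs: Vec<String>,
    name: String,
    /// Tokens the node was parsed from; `None` once the node has been edited.
    cached_tokens: Option<Vec<Token>>,
}

impl Item {
    /// Builds a node as the parser would, keeping the original tokens.
    pub fn parsed(attrs: Vec<String>, name: &str, tokens: Vec<Token>) -> Self {
        Item {
            attrs,
            name: name.to_string(),
            cached_tokens: Some(tokens),
        }
    }

    /// Removes an attribute. Any change to the node invalidates the cache,
    /// which is the step a stale-cache bug would be missing.
    pub fn strip_attr(&mut self, attr: &str) {
        let before = self.attrs.len();
        self.attrs.retain(|a| a.as_str() != attr);
        if self.attrs.len() != before {
            self.cached_tokens = None;
        }
    }

    /// Returns the tokens and whether they are the lossless originals.
    pub fn to_tokens(&self) -> (Vec<Token>, bool) {
        match &self.cached_tokens {
            Some(tokens) => (tokens.clone(), true),
            None => (self.reconstruct_tokens(), false),
        }
    }

    /// Lossy fallback: rebuilds tokens from the AST with no real offsets.
    fn reconstruct_tokens(&self) -> Vec<Token> {
        let mut out = Vec::new();
        for attr in &self.attrs {
            out.push(Token { text: format!("#[{}]", attr), offset: 0 });
        }
        out.push(Token { text: "struct".to_string(), offset: 0 });
        out.push(Token { text: self.name.clone(), offset: 0 });
        out.push(Token { text: ";".to_string(), offset: 0 });
        out
    }
}
```

The fallback path is where position information is lost, so the fewer mutations that force it, the better; a fuller design would presumably try to edit the cached tokens in step with the node rather than discard them.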