Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro: implement `TokenTree`, `TokenKind`, hygienic `quote!`, and other API #40939

Merged
merged 13 commits into from Jul 6, 2017

Conversation

@jseyfried
Copy link
Contributor

jseyfried commented Mar 31, 2017

All new API is gated behind #![feature(proc_macro)] and may be used with #[proc_macro], #[proc_macro_attribute], and #[proc_macro_derive] procedural macros.

More specifically, this PR adds the following in proc_macro:

// `TokenStream` constructors:
impl TokenStream { fn empty() -> TokenStream { ... } }
impl From<TokenTree> for TokenStream { ... }
impl From<TokenKind> for TokenStream { ... }
impl<T: Into<TokenStream>> FromIterator<T> for TokenStream { ... } 
macro quote($($t:tt)*) { ... } // A hygienic `TokenStream` quoter

// `TokenStream` destructuring: 
impl TokenStream { fn is_empty(&self) -> bool { ... } }
impl IntoIterator for TokenStream { type Item = TokenTree; ... }

struct TokenTree { span: Span, kind: TokenKind }
impl From<TokenKind> for TokenTree { ... }
impl Display for TokenTree { ... }

struct Span { ... } // a region of source code along with expansion/hygiene information
impl Default for Span { ... } // a span from the current procedural macro definition
impl Span { fn call_site() -> Span { ... } } // the call site of the current expansion
fn quote_span(span: Span) -> TokenStream;

enum TokenKind {
    Group(Delimiter, TokenStream), // A delimited sequence, e.g. `( ... )`
    Term(Term), // a unicode identifier, lifetime ('a), or underscore
    Op(char, Spacing), // a punctuation character (`+`, `,`, `$`, etc.).
    Literal(Literal), // a literal character (`'a'`), string (`"hello"`), or number (`2.3`)
}

enum Delimiter {
    Parenthesis, // `( ... )`
    Brace, // `[ ... ]`
    Bracket, // `{ ... }`
    None, // an implicit delimiter, e.g. `$var`, where $var is  `...`.
}

struct Term { ... } // An interned string
impl Term {
    fn intern(string: &str) -> Symbol { ... }
    fn as_str(&self) -> &str { ... }
}

enum Spacing {
    Alone, // not immediately followed by another `Op`, e.g. `+` in `+ =`.
    Joint, // immediately followed by another `Op`, e.g. `+` in `+=`
}

struct Literal { ... }
impl Display for Literal { ... }
impl Literal {
    fn integer(n: i128) -> Literal { .. } // unsuffixed integer literal
    fn float(n: f64) -> Literal { .. } // unsuffixed floating point literal
    fn u8(n: u8) -> Literal { ... } // similarly: i8, u16, i16, u32, i32, u64, i64, f32, f64
    fn string(string: &str) -> Literal { ... }
    fn character(ch: char) -> Literal { ... }
    fn byte_string(bytes: &[u8]) -> Literal { ... }
}

For details on quote! hygiene, see this example and declarative macros 2.0.

r? @nrc

@jseyfried jseyfried force-pushed the jseyfried:proc_macro_api branch 2 times, most recently from 30f3956 to 40f81aa Mar 31, 2017

@jseyfried

This comment has been minimized.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Mar 31, 2017

Can't we call TokenTree just Token? I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.
However, I still think we can keep simpler names in the API.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Mar 31, 2017

Nice! I like the overall approach, some comments on the details:

impl IntoIterator for TokenStream { type Item = TokenTree; ... }

Would impls for &TokenStream and &mut TokenStream also be useful?

struct Symbol { ... } // An interned string

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else? Identifier, Ident, Word, … ? (Are the first two not appropriate because "identifier" excludes keywords?)

Joint, // immediately followed by another `Op`, e.g. `+` in `+=`

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each? (>>= or ... three tokens?) Please expand the doc-comments in the PR to explain this.

impl Display for Literal { ... }
impl Literal {

I’d also add:

  • fn byte_string(bytes: &[u8]) -> Literal, much more convenient that building a Sequence of u8 literals for a slice literal. (And more efficient?)
  • Maybe fn byte(byte: u8) -> Literal similar to Literal::u8 but with byte literal syntax, though it only makes a difference when looking at generated code for debugging.
  • fn as_str(&self) -> Option<&str> and other fn as_*(&self) -> Option<*> methods to extract the value. (Though number literals without a type suffix are tricky.) The Display impl technically provides the same information, but writing a Rust parser (even just for literals) is a complex undertaking.

In src/libproc_macro/lib.rs

#[unstable(feature = "proc_macro", issue = "38356")]
#[macro_export]
macro_rules! quote { () => {} }

In src/libproc_macro/quote.rs

macro_rules! quote {
    () => { TokenStream::empty() };
    ($($t:tt)*) => { [ $( quote_tree!($t), )* ].iter().cloned().collect::<TokenStream>() };
}

Why are there two quote macros, with one empty, and only the empty one exported?

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Mar 31, 2017

I think this PR addresses all of my previous comments in #38356 (comment) 👍

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 31, 2017

☔️ The latest upstream changes (presumably #40950) made this pull request unmergeable. Please resolve the merge conflicts.

@jseyfried

This comment has been minimized.

Copy link
Contributor Author

jseyfried commented Apr 1, 2017

@eddyb

I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.

Do you mean w.r.t. the internals? I believe we can still use the internal flattened representation you proposed with this API (imo, would be nice but not high priority).

@SimonSapin

Would impls for &TokenStream and &mut TokenStream also be useful?

impl IntoIterator for &TokenStream { type Item = TokenTree; ... } and/or
impl TokenStream { fn iter(&self) -> TokenIter { ... } } could be useful -- both would be equivalent to tokens.clone().into_iter().

impl IntoIterator for &TokenStream { type Item = &TokenTree; ... } isn't possible (yet, perhaps), and would make an internal "flattened" internal representation more complex / less efficient.

TokenStream uses shared, ref-counted memory, so I don't think impl IntoIterator for &mut TokenStream makes sense.

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else?

Yeah. Currently, Symbol is just an interned string, so Word/Ident don't seem appropriate. If we restrict the type to valid TokenKind::Words, I'd prefer Word for symmetry with TokenKind. Otherwise, perhaps Str/InternedStr? cc @nrc

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each?

Yeah, it's TokenKind::Op(char, OpKind), so a just single character per Op -- I'll mention that in the doc comment.

I’d also add [more methods for Literal]

Yeah, I agree that we should have Literal::{byte, byte_string, as_*}. This API isn't intended to be exhaustive, e.g. we also want API for emitting "first class" diagnostics (@sergio is working on this).

Why are there two quote macros, with one empty, and only the empty one exported?

The non-empty one is incomplete (by design) and only used to bootstrap the internal implementation of proc_macro::quote. The exported one is a placeholder for the internal implementation (à la lang items). After we support stability hygiene and stability-checking macro resolutions, we should be able to replace the internal implementation with a #[proc_macro].

@jseyfried jseyfried force-pushed the jseyfried:proc_macro_api branch from 9999e58 to 6151672 Apr 3, 2017

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 5, 2017

☔️ The latest upstream changes (presumably #40811) made this pull request unmergeable. Please resolve the merge conflicts.

@nrc
Copy link
Member

nrc left a comment

There's a bunch of changes requested, but most are just about docs, and I don't think there is anything major. Assuming all the fixes are straightforward, then r=me

pub struct TokenStream {
inner: TokenStream_,
}
pub struct TokenStream(tokenstream::TokenStream);

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Not this PR (and pretty nitty), but just pointing out that it should either token_stream or Tokenstream, currently the naming is inconsistent.


/// An iterator over `TokenTree`s.
#[unstable(feature = "proc_macro", issue = "38356")]
pub struct TokenIter {

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Should we call this a TokenTreeIter? That leaves room for a flattening TokenIter and means the name and target correspond

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

@nrc Yeah, TokenTreeIter makes more sense.

On a similar note, @eddyb proposed using TokenNode instead of TokenKind for better correspondence to TokenTree and to leave room for Token/TokenKind -- what do you think?

}

impl TokenTree {
fn from_raw(stream: tokenstream::TokenStream, next: &mut Option<tokenstream::TokenStream>)

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

I don't like from_raw as a name - 'raw' can mean so many different things, and perhaps the most likely here is as in raw string which is wrong. How about from_internal or from_rustc_internal or something like that?

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

I'll change to from_internal.

TokenTree { span: Span(span), kind: kind }
}

fn to_raw(self) -> tokenstream::TokenStream {

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

likewise here.

And somewhat connected, is there a way to mark the API which we expect macro authors to use? Is it all the pub API? Or is some of that transitional or only intended for the compiler to use?

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

It's all the pub API not in __internal (which is unusable without #![feature(proc_macro_internals)]).

@@ -80,7 +524,11 @@ pub struct LexError {
/// all of the contents.
#[unstable(feature = "proc_macro_internals", issue = "27812")]
#[doc(hidden)]
#[path = ""]

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

what is this for? I'm not sure what the effect is

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

mod quote; in __internal; this allows the file to be libproc_macro/quote.rs. Thinking about this some more, I'll move mod quote into the crate root and remove the #[path = ""].


//! # Quasiquoter
//! This file contains the implementation internals of the quasiquoter provided by `qquote!`.

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Could you add an overview of how the quasi-quoter works please?

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

Will do.

@@ -196,6 +201,10 @@ impl TokenStream {
}
}

pub fn builder() -> TokenStreamBuilder {

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

I think this would be better as a free function rather than a method on TokenStream

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Or it could be a new method on TokenStreamBuilder

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

I'll change to TokenStreamBuilder::new() (motivation for TokenStream::builder() was to avoid making people use anything else into their scopes).

@@ -225,13 +234,107 @@ impl TokenStream {
}
true
}

pub fn as_tree(self) -> (TokenTree, bool /* joint? */) {

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Could these public methods get docs please? (In particular here, some more explanation of what joint means would be good).

This comment has been minimized.

@lfairy

lfairy Apr 10, 2017

Contributor

Also, since this takes self by value, should this be named into_tree instead? Or are the conventions different for reference counted things

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

@nrc will do.
@lfairy Perhaps, I don't think it matters much.

@@ -8,47 +8,37 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

#![feature(plugin, plugin_registrar, rustc_private)]

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

It looks like anyone with an existing procedural macro is going to have to make some significant changes. Could you write up either here or on internals or somewhere, a guide to the changes needed to make a proc macro work after this PR please?

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

Existing procedural macros will lose the syntax::tokenstream::TokenStream quoter from proc_macro_internals, AFAIK other than that they won't have to make any changes. I could include a syntax::tokenstream::TokenStream quoter for back-compat.

How many people are actually using "existing procedural macros" though? This PR doesn't effect the widely used legacy plugin system; I didn't think many people were using SyntaxExtension::ProcMacro from #36154.

@@ -455,3 +461,38 @@ pub fn is_op(tok: &Token) -> bool {
_ => true,
}
}

#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);

This comment has been minimized.

@nrc

nrc Apr 10, 2017

Member

Is this meant to be used by macro authors? If so could you document with why and when.

This comment has been minimized.

@jseyfried

jseyfried Apr 10, 2017

Author Contributor

No, nothing in libsyntax can be used by proc macro authors.

}

#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);

This comment has been minimized.

@lfairy

lfairy Apr 10, 2017

Contributor

As an aside: if the dynamic borrow checks end up too inefficient, then this could be re-implemented on top of MoveCell.

This comment has been minimized.

@SimonSapin

SimonSapin Apr 10, 2017

Contributor

std::cell::Cell is now like MoveCell :) rust-lang/rfcs#1651

let mut opt_stream = self.0.borrow_mut();
if opt_stream.is_none() {
*opt_stream = Some(f());
};

This comment has been minimized.

@lfairy

lfairy Apr 10, 2017

Contributor

Nit: extra semicolon

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Apr 17, 2017

Friendly ping that there are merge conflicts and some comments for you @jseyfried! ❤️

@abonander

This comment has been minimized.

Copy link
Contributor

abonander commented Apr 23, 2017

Should we use a different feature gate for the unstable APIs? I'm under the impression that language feature gates and library feature gates are meant to be distinct from each other, and proc_macro currently falls under the language side.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Apr 24, 2017

Hmm, is there a strict separation between language/lib feature-gates? I wouldn't think so.

@abonander

This comment has been minimized.

Copy link
Contributor

abonander commented Apr 24, 2017

It's kind of implied by the structure of the Unstable Book but that might be more of an issue for #41476 where this came up originally.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Apr 25, 2017

status: still waiting for #40847.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented May 2, 2017

status: still waiting for #40847.

1 similar comment
@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented May 8, 2017

status: still waiting for #40847.

@jseyfried

This comment has been minimized.

Copy link
Contributor Author

jseyfried commented Jul 6, 2017

@alexcrichton Awesome, thanks!

For the record, the failing test was because tokens from a macros 1.1-style proc-macro (i.e. via FromStr for TokenStream) now are given the proc-macro's call site span, as opposed to the proc-macros's call site span with extra expansion information before this PR.

Since we no longer have the extra expansion information, @nrc's save-analysis code tries to re-parses the call site span and expects it to match the proc-macro generated tokens.

I had to get rid of the expansion information for backwards compatibility -- if the expansion information showed that the span came from a procedural macro 2.0, then the underlying tokens would resolve hygienically (as per hygiene 2.0). This would be a backwards-incompatible change, and would mean that stringifying, reparsing and returning the input of a proc-macro would change its semantics, even if the input were entirely unexpanded.

If we want to fix this, we could instead mark tokens from FromStr for TokenStream specially so that save-analysis would know that they are "weird", but so that they would still behave like unexpanded tokens w.r.t. hygiene. I don't have a strong opinion here.

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jul 9, 2017

The lack of expansion info broke clippy together with proc macros, since code inside derives will now trigger lints. Is there any new way to detect expanded code? If not, what needs to be done in order to create expansion info that doesn't influence save analysis.

See rust-lang/rust-clippy#1877

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Jul 13, 2017

Since we no longer have the extra expansion information, @nrc's save-analysis code tries to re-parses the call site span and expects it to match the proc-macro generated tokens.

Ah, I wondered why I was seeing so many errors in the RLS :-(

I think we must fix this. This feels like another reason not to conflate macro hygiene with expansion history. That might be too hard a way to fix this though.

If we want to fix this, we could instead mark tokens from FromStr for TokenStream specially so that save-analysis would know that they are "weird", but so that they would still behave like unexpanded tokens w.r.t. hygiene. I don't have a strong opinion here.

Not using spans for hygiene seems a much nicer fix here - the span should reflect the expansion history and the hygiene should be altered (to erase hygiene, essentially). I expect we want the correct expansion info for errors as well as save-analysis.

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Jul 13, 2017

I already have a fix in #43179

@jseyfried

This comment has been minimized.

Copy link
Contributor Author

jseyfried commented Jul 13, 2017

@nrc
We could instead just include hygiene bending information along with expansion information so that span expansion info and hygiene info remain unified (@oli-obk's solution is fine for impl FromStr for TokenStream, so we're OK for now).

That is, I don't think think hygiene bending requires us to split up span/hygiene info or construct an "fake" expansion history just for hygiene. I think it'd be nicer to have a single source of truth for expansion info that is expressive enough for hygiene bending.

bors added a commit that referenced this pull request Jul 18, 2017

Auto merge of #43247 - est31:master, r=alexcrichton
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089

bors added a commit that referenced this pull request Jul 19, 2017

Auto merge of #43247 - est31:master, r=alexcrichton
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089

bors added a commit that referenced this pull request Jul 19, 2017

Auto merge of #43247 - est31:master, r=alexcrichton
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089

bors added a commit that referenced this pull request Jul 20, 2017

Auto merge of #43247 - est31:master, r=alexcrichton
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089
@nrc

This comment has been minimized.

Copy link
Member

nrc commented Jul 25, 2017

So #43179 did not fix the expansion info issue (at least not completely) - I'm still seeing errors in the RLS where we are treating macro-generated names as hand-written ones (and I think this is causing #43371 too).

bors added a commit that referenced this pull request Aug 1, 2017

Auto merge of #43533 - nrc:macro-save, r=jseyfried,
Three small fixes for save-analysis

First commit does some naive deduplication of macro uses. We end up with lots of duplication here because of the weird way we get this data (we extract a use for every span generated by a macro use).

Second commit is basically a typo fix.

Third commit is a bit interesting, it partially reverts a change from #40939 where temporary variables in format! (and thus println!) got a span with the primary pointing at the value stored into the temporary (e.g., `x` in `println!("...", x)`). If `format!` had a definition it should point at the temporary in the macro def, but since it is built-in, that is not possible (for now), so `DUMMY_SP` is the best we can do (using the span in the callee really breaks save-analysis because it thinks `x` is a definition as well as a reference).

There aren't a test for this stuff because: the deduplication is filtered by any of the users of save-analysis, so it is purely an efficiency change. I couldn't actually find an example for the second commit that we have any machinery to test, and the third commit is tested by the RLS, so there will be a test once I update the RLS version and and uncomment the previously failing tests).

r? @jseyfried

@aidanhs aidanhs referenced this pull request Aug 10, 2017

Closed

Mis-calculated spans #43796

bors added a commit that referenced this pull request Feb 15, 2019

Auto merge of #58476 - nnethercote:rm-LazyTokenStream, r=<try>
Remove `LazyTokenStream`.

`LazyTokenStream` was added in #40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 20, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58476 - nnethercote:rm-LazyTokenStream, r=p…
…etrochenkov

Remove `LazyTokenStream`.

`LazyTokenStream` was added in rust-lang#40939. Perhaps it was an effective optimization then, but no longer. This PR removes it, making the code both simpler and faster.

r? @alexcrichton
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.