Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review proc macro API 1.2 #50473

Merged
merged 7 commits into from May 16, 2018

Conversation

Projects
None yet
@petrochenkov
Copy link
Contributor

petrochenkov commented May 6, 2018

cc #38356

Summary of applied changes:

  • Documentation for proc macro API 1.2 is expanded.
  • Renamed APIs: Term -> Ident, TokenTree::Term -> TokenTree::Ident, Op -> Punct, TokenTree::Op -> TokenTree::Punct, Op::op -> Punct::as_char.
  • Removed APIs: Ident::as_str, use Display impl for Ident instead.
  • New APIs (not stabilized in 1.2): Ident::new_raw for creating a raw identifier (I'm not sure new_x it's a very idiomatic name though).
  • Runtime changes:
    • Punct::new now ensures that the input char is a valid punctuation character in Rust.
    • Ident::new ensures that the input str is a valid identifier in Rust.
    • Lifetimes in proc macros are now represented as two joint tokens - Punct('\'', Spacing::Joint) and Ident("lifetime_name_without_quote") similarly to multi-character operators.
  • Stabilized APIs: None yet.

A bit of motivation for renaming (although it was already stated in the review comments):

  • With my compiler frontend glasses on Ident is the single most appropriate name for this thing, especially if we are doing input validation on construction. TokenTree::Ident effectively wraps token::Ident or ast::Ident + is_raw, its meaning is "identifier" and it's already named ident in declarative macros.
  • Regarding Punct, the motivation is that Op is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named it Punct (first proposed in the original RFC, then by @SimonSapin) , together with input validation it's now a subset of ASCII punctuation character category (u8::is_ascii_punctuation).
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented May 6, 2018

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 6, 2018

@dtolnay @alexcrichton @eddyb @nrc
Could you go through the comments marked with "REVIEW" and tell what you think?

@petrochenkov petrochenkov referenced this pull request May 6, 2018

Closed

Tracking issue for RFC 1566: Procedural macros #38356

24 of 31 tasks complete
@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 6, 2018

☔️ The latest upstream changes (presumably #50453) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -207,7 +236,7 @@ pub mod token_stream {

/// `quote!(..)` accepts arbitrary tokens and expands into a `TokenStream` describing the input.
/// For example, `quote!(a + b)` will produce a expression, that, when evaluated, constructs
/// the `TokenStream` `[Word("a"), Op('+', Alone), Word("b")]`.
/// the `TokenStream` `[Word("a"), Punct('+', Alone), Word("b")]`.

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

Also, Word is now Term, right?

/// An identifier or lifetime identifier.
Ident(Ident),
/// A single punctuation character (`+`, `,`, `$`, etc.).
Punct(Punct),

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

Oh I didn't expect these changes. I should leave most of this up to @alexcrichton (I do not have a strong preference either way).

'^', '&', '|', '@', '.', ',', ';', ':', '#', '$', '?'];
if !LEGAL_CHARS.contains(&ch) {
panic!("unsupported character `{:?}`", ch)
}

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

Validation seems really nice, I think previously it's been deferred until conversion to the compiler-internal forms is needed.

#[derive(Copy, Clone, Debug)]
#[unstable(feature = "proc_macro", issue = "38356")]
pub struct Term {
pub struct Ident {
// REVIEW(INTERNAL) Symbol + Span is actually `ast::Ident`! We can use it here.

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

I can confirm that this would not cause issues with #49219.

@@ -1199,7 +1300,7 @@ impl TokenTree {
'#' => Pound,
'$' => Dollar,
'?' => Question,
_ => panic!("unsupported character {}", op),
_ => panic!("unsupported character {}", ch),

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

Can this become unreachable!() now? As only a finite set of characters can be used.

/// An identifier (`ident`) or lifetime identifier (`'ident`).
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Ident` to be `Copy`?

This comment has been minimized.

@eddyb

eddyb May 6, 2018

Member

If we do keep it Copy, that means we can never soundly offer an &self -> &str API. See #38356 (comment).

This comment has been minimized.

@dtolnay

dtolnay May 7, 2018

Member

I personally don't mind if we remove the Copy here.

@petrochenkov petrochenkov changed the title [WIP] Review proc macro API 1.2 Review proc macro API 1.2 May 7, 2018

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 7, 2018

All the changes here look great to me, thanks @petrochenkov! While we've already FCP'd for stability in #38356 I think the Ident name is probably better than Term and I'm personally ambivalent on Op or Punct, but am happy to go with Punct.

I'd prefer to land this all at once if possible to reduce churn in the ecosystem, so I'd be mostly in favor of going ahead and stabilizing everything in this PR (if you're ok with that). I don't mind doing the work myself for the APIs here and pushing those up (annotation-wise). Along those lines, want to remove as_str as well from this?

/// May fail for a number of reasons, for example, if the string contains unbalanced delimiters
/// or characters not existing in the language.
///
/// REVIEW The function actually panics on any error and never returns `LexError`.

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

I think this doesn't match the original implementation, when we first started we were sure to return an error I thought...

In any case I'd think it's a bug that this panics, it should definitely be returning an error (but we can fix that later)

/// Creates a token stream containing a single token tree.
///
/// REVIEW We don't generally have impls `From<T> for Collection<T>`, but I see why this exists
/// REVIEW from practical point of view.

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

Yeah this is currently the main way to go from a TokenTree to a TokenStream, and I think we also accidentally stabilized it already so we're required to keep it.

@@ -227,6 +256,9 @@ pub fn quote_span(span: Span) -> TokenStream {
}

/// A region of source code, along with macro expansion information.
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Span` to be `Copy`? Yes.

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

Yeah I think we're likely to always want Span to be Copy

TokenTree::Literal(ref tt) => tt.fmt(f),
}
}
}

/// REVIEW the impls below are kind of `From<T> for Option<T>`, not strictly necessary,
/// REVIEW but convenient. No harm, I guess. I'd actually like to see impls
/// REVIEW `From<Group/Ident/Punct/Literal> for TokenStream` to avoid stuttering like

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

I suspect impls like these will grow and come to exist over time yeah, that's definitely the direction that libstd has taken.

/// forms of `Spacing` returned.
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Punct` to be `Copy`?

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

I think this is less critical than Span to be copy, and wouldn't mind going either way here. Making this not-Copy is probably the most conservative thing to do

This comment has been minimized.

@dtolnay

dtolnay May 7, 2018

Member

I agree, let's go with no Copy.

/// which can be further configured with the `set_span` method below.
///
/// REVIEW Why we even use `char` here? There's no reason to use unicode here.
/// REVIEW I guess because it's more convenient to write `new('+')` than `new(b'+')`, that's ok.

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

Yes let's stick to char for maximal future flexibility.

///
/// REVIEW Again, there's no need for unicode here,
/// REVIEW except for maybe future compatibility in case Rust turns into APL,
/// REVIEW but if it's more convenient to use `char` then that's okay.

This comment has been minimized.

@alexcrichton

alexcrichton May 7, 2018

Member

Yes for future flexibility I think char's the way to go here

@dtolnay
Copy link
Member

dtolnay left a comment

The reason we decided against TokenTree::Ident was because the matcher $i:ident does not accept lifetimes. I would be opposed to calling it Ident here if it can hold lifetimes and $i:ident cannot.

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 7, 2018

@dtolnay
I'd be totally okay with accepting lifetimes in $i: ident (with or without dropping $l: lifetime) if we are ready to cement lifetimes as a "lexical class inside identifiers" as opposed to "' plus an identifier" (which proc macro API already does).
There's internals thread about this https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716/10. I don't think the original suggestion is a good idea anymore, but see https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716/11 in particular about the "lexical class inside identifiers" alternative.

The key questions here are probably:

  • Is it important to discern between identifiers and lifetimes in procedural macros? Why?
    • We want to minimize API surface. It's always possible to detect lifetimes using the first character.
  • Is it important to discern between identifiers and lifetimes in declarative macros? Why?
    • I'd personally want to minimize the language surface (and simplify compiler as well if we drop $l: lifetime). It's possible to detect lifetimes in macros left sides if we keep lifetime as a subset of ident (kinda like path is a subset of ty/expr/pat) or allow lifetime idents in macro "variables" ($'var).
@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 7, 2018

@alexcrichton

I'd prefer to land this all at once if possible to reduce churn in the ecosystem, so I'd be mostly in favor of going ahead and stabilizing everything in this PR (if you're ok with that). I don't mind doing the work myself for the APIs here and pushing those up (annotation-wise). Along those lines, want to remove as_str as well from this?

Sure, I'll add stable annotations and remove as_str.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented May 7, 2018

Is it important to discern between identifiers and lifetimes in declarative macros? Why?

Probably to parse anything looking like a list of generics, if you want to tell apart 'a and T.

@dtolnay

This comment has been minimized.

Copy link
Member

dtolnay commented May 7, 2018

It's possible to detect lifetimes in macros left sides if we keep lifetime as a subset of ident

I would need to play with this in nightly before I would feel comfortable committing to it. What I wouldn't want is:

  • We stabilize TokenTree::Ident under the expectation that we will make $:ident accept lifetimes.
  • We change $:ident in decl macros to accept lifetimes, while keeping $:lifetime.
  • After some use we discover this is worse this way than having $:ident and $:lifetime disjoint.
  • We revert $:ident to reject lifetimes.
  • Now proc macro and decl macro are inconsistent in an obnoxious way.
@dtolnay

This comment has been minimized.

Copy link
Member

dtolnay commented May 7, 2018

Probably to parse anything looking like a list of generics, if you want to tell apart 'a and T.

Hmm, can we imagine any macros that would need to handle lifetime parameters and type parameters differently? There is some appeal in being able to say the distinction never matters until much later in the compile process.

@dtolnay

This comment has been minimized.

Copy link
Member

dtolnay commented May 7, 2018

One place I can imagine running into trouble is if a macro somehow needs to handle optional lifetimes on a reference.

macro_rules! m {
    (& $($p:lifetime)? $i:ident) => {}
}

m!(&foo);
m!(&'a foo);
error: local ambiguity: multiple parsing options: built-in NTs lifetime ('p') or ident ('i').
 --> src/main.rs:4:5
  |
4 | m!(&foo);
  |     ^^^
@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 7, 2018

@dtolnay

local ambiguity: multiple parsing options: built-in NTs lifetime ('p') or ident ('i').

This is a weird error. I never investigated how our macro matching works in detail, but I expected it to be modeled after regular expressions (greedy by default) and work in both cases, whether lifetime is a subset of ident or not.

extern crate regex;
use regex::Regex;

fn main() {
    let not_subset = Regex::new(r"a?b").unwrap();
    assert!(not_subset.is_match("ab"));
    assert!(not_subset.is_match("b"));
    let subset = Regex::new(r"a?\w").unwrap();
    assert!(subset.is_match("ab"));
    assert!(subset.is_match("aa"));
    assert!(subset.is_match("b"));
    assert!(subset.is_match("a"));
}

http://play.rust-lang.org/?gist=372d5a5f25219670dabc89a1dfb536ba&version=nightly&mode=debug

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 8, 2018

We have 6 weeks if we want to stabilize the API in 1.28.

@eddyb
Can the lifetime question get a slot in the next lang team meeting?

  • Q1. Treating lifetimes as a lexical subset of identifiers (identifier = '? XID_Start XID_Continue*), closing https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716/11. Procedural macros currently treat lifetimes like this.
  • Q2. Accepting lifetimes in $i: ident matcher in declarative macros, delaying stabilization of $l: lifetime (#34303) until a good use case not covered by ident is found.
  • Q3. How to refer to both identifiers and lifetime identifiers if they are kept separate. Is "term" previously used by proc macro API a good name?

@petrochenkov petrochenkov force-pushed the petrochenkov:pmapi branch from a2e7f6c to dab8c0a May 15, 2018

@petrochenkov petrochenkov changed the title Review and stabilize proc macro API 1.2 Review proc macro API 1.2 May 15, 2018

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

petrochenkov commented May 15, 2018

@bors r=alexcrichton

Stabilization commit for later cherry-picking: petrochenkov@3e71e49

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 15, 2018

📌 Commit dab8c0a has been approved by alexcrichton

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented May 16, 2018

@bors p=1

Unblocks #49219

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 16, 2018

⌛️ Testing commit dab8c0a with merge 2a3f536...

bors added a commit that referenced this pull request May 16, 2018

Auto merge of #50473 - petrochenkov:pmapi, r=alexcrichton
Review proc macro API 1.2

cc #38356

Summary of applied changes:
- Documentation for proc macro API 1.2 is expanded.
- Renamed APIs: `Term` -> `Ident`, `TokenTree::Term` -> `TokenTree::Ident`, `Op` -> `Punct`, `TokenTree::Op` -> `TokenTree::Punct`, `Op::op` -> `Punct::as_char`.
- Removed APIs: `Ident::as_str`, use `Display` impl for `Ident` instead.
- New APIs (not stabilized in 1.2): `Ident::new_raw` for creating a raw identifier (I'm not sure `new_x` it's a very idiomatic name though).
- Runtime changes:
    - `Punct::new` now ensures that the input `char` is a valid punctuation character in Rust.
    - `Ident::new` ensures that the input `str` is a valid identifier in Rust.
    - Lifetimes in proc macros are now represented as two joint tokens - `Punct('\'', Spacing::Joint)` and `Ident("lifetime_name_without_quote")` similarly to multi-character operators.
- Stabilized APIs: None yet.

A bit of motivation for renaming (although it was already stated in the review comments):
- With my compiler frontend glasses on `Ident` is the single most appropriate name for this thing, *especially* if we are doing input validation on construction. `TokenTree::Ident` effectively wraps `token::Ident` or `ast::Ident + is_raw`, its meaning is "identifier" and it's already named `ident` in declarative macros.
- Regarding `Punct`, the motivation is that `Op` is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named it `Punct` (first proposed in [the original RFC](rust-lang/rfcs#1566), then [by @SimonSapin](#38356 (comment))) , together with input validation it's now a subset of ASCII punctuation character category (`u8::is_ascii_punctuation`).
@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 16, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 2a3f536 to master...

@bors bors merged commit dab8c0a into rust-lang:master May 16, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 21, 2018

rustc: Fix procedural macros generating lifetime tokens
This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens
produced by procedural macros ended up getting lost in translation in the
compiler and not actually producing parseable code. The issue lies in the fact
that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for
gluing joint tokens together forgot to take this into account so the lifetime
inside of `Ident` was missing the leading tick!

The `glue` implementation here is updated to create a new `Symbol` in these
situations to manufacture a new `Ident` with a leading tick to ensure it parses
correctly.

Closes rust-lang#50942

kennytm added a commit to kennytm/rust that referenced this pull request May 22, 2018

Rollup merge of rust-lang#50946 - alexcrichton:fix-parse-lifetime, r=…
…petrochenkov

rustc: Fix procedural macros generating lifetime tokens

This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens
produced by procedural macros ended up getting lost in translation in the
compiler and not actually producing parseable code. The issue lies in the fact
that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for
gluing joint tokens together forgot to take this into account so the lifetime
inside of `Ident` was missing the leading tick!

The `glue` implementation here is updated to create a new `Symbol` in these
situations to manufacture a new `Ident` with a leading tick to ensure it parses
correctly.

Closes rust-lang#50942

@lfairy lfairy referenced this pull request May 27, 2018

Open

Missing crates #23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.