proc_macro: implement `TokenTree`, `TokenKind`, hygienic `quote!`, and other API #40939

Merged
merged 13 commits into from Jul 6, 2017

@jseyfried
Contributor

jseyfried commented Mar 31, 2017

All new API is gated behind #![feature(proc_macro)] and may be used with #[proc_macro], #[proc_macro_attribute], and #[proc_macro_derive] procedural macros.

More specifically, this PR adds the following in proc_macro:

// `TokenStream` constructors:
impl TokenStream { fn empty() -> TokenStream { ... } }
impl From<TokenTree> for TokenStream { ... }
impl From<TokenKind> for TokenStream { ... }
impl<T: Into<TokenStream>> FromIterator<T> for TokenStream { ... } 
macro quote($($t:tt)*) { ... } // A hygienic `TokenStream` quoter

// `TokenStream` destructuring: 
impl TokenStream { fn is_empty(&self) -> bool { ... } }
impl IntoIterator for TokenStream { type Item = TokenTree; ... }

struct TokenTree { span: Span, kind: TokenKind }
impl From<TokenKind> for TokenTree { ... }
impl Display for TokenTree { ... }

struct Span { ... } // a region of source code along with expansion/hygiene information
impl Default for Span { ... } // a span from the current procedural macro definition
impl Span { fn call_site() -> Span { ... } } // the call site of the current expansion
fn quote_span(span: Span) -> TokenStream;

enum TokenKind {
    Group(Delimiter, TokenStream), // A delimited sequence, e.g. `( ... )`
    Term(Term), // a unicode identifier, lifetime ('a), or underscore
    Op(char, Spacing), // a punctuation character (`+`, `,`, `$`, etc.).
    Literal(Literal), // a literal character (`'a'`), string (`"hello"`), or number (`2.3`)
}

enum Delimiter {
    Parenthesis, // `( ... )`
    Brace, // `{ ... }`
    Bracket, // `[ ... ]`
    None, // an implicit delimiter, e.g. `$var`, where $var is `...`.
}

struct Term { ... } // An interned string
impl Term {
    fn intern(string: &str) -> Term { ... }
    fn as_str(&self) -> &str { ... }
}

enum Spacing {
    Alone, // not immediately followed by another `Op`, e.g. `+` in `+ =`.
    Joint, // immediately followed by another `Op`, e.g. `+` in `+=`
}

struct Literal { ... }
impl Display for Literal { ... }
impl Literal {
    fn integer(n: i128) -> Literal { ... } // unsuffixed integer literal
    fn float(n: f64) -> Literal { ... } // unsuffixed floating point literal
    fn u8(n: u8) -> Literal { ... } // similarly: i8, u16, i16, u32, i32, u64, i64, f32, f64
    fn string(string: &str) -> Literal { ... }
    fn character(ch: char) -> Literal { ... }
    fn byte_string(bytes: &[u8]) -> Literal { ... }
}
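To make the nesting in the listing above concrete, here is a small self-contained model. The types are hypothetical, simplified mirrors rather than the proc_macro crate itself: spans are omitted, a stream is just a Vec, and Term/Literal hold plain strings. The Display impl shows how Group delimiters and Op spacing determine the printed source.

```rust
use std::fmt;

// Hypothetical, simplified mirrors of the proposed types (no spans, no interning).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Delimiter { Parenthesis, Brace, Bracket, None }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Spacing { Alone, Joint }

#[derive(Clone, Debug)]
pub enum TokenKind {
    Group(Delimiter, Vec<TokenKind>), // a delimited sequence of token trees
    Term(String),                     // identifier, lifetime, or `_`
    Op(char, Spacing),                // exactly one punctuation character
    Literal(String),                  // literal source text, e.g. `1` or `"hi"`
}

impl Delimiter {
    // Opening and closing text for each delimiter; `None` prints nothing.
    fn pair(self) -> (&'static str, &'static str) {
        match self {
            Delimiter::Parenthesis => ("(", ")"),
            Delimiter::Brace => ("{", "}"),
            Delimiter::Bracket => ("[", "]"),
            Delimiter::None => ("", ""),
        }
    }
}

// Print a sequence of tokens: a space after every token except a `Joint` op.
fn write_stream(f: &mut fmt::Formatter, tokens: &[TokenKind]) -> fmt::Result {
    for (i, token) in tokens.iter().enumerate() {
        write!(f, "{}", token)?;
        let joint = matches!(token, TokenKind::Op(_, Spacing::Joint));
        if i + 1 < tokens.len() && !joint {
            write!(f, " ")?;
        }
    }
    Ok(())
}

impl fmt::Display for TokenKind {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            TokenKind::Group(delim, inner) => {
                let (open, close) = delim.pair();
                write!(f, "{}", open)?;
                write_stream(f, inner)?;
                write!(f, "{}", close)
            }
            TokenKind::Term(name) => write!(f, "{}", name),
            TokenKind::Op(ch, _) => write!(f, "{}", ch),
            TokenKind::Literal(text) => write!(f, "{}", text),
        }
    }
}
```

In this model `f(a += 1)` is a Term followed by a parenthesized Group whose `+=` is two Op tokens, the first one Joint, and it prints as `f (a += 1)`. The real Display impl may choose different spacing; the point is only that Joint suppresses the separator after an Op.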

For details on quote! hygiene, see this example and declarative macros 2.0.

r? @nrc

@eddyb

Member

eddyb commented Mar 31, 2017

Can't we call TokenTree just Token? I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.
However, I still think we can keep simpler names in the API.

@SimonSapin

Contributor

SimonSapin commented Mar 31, 2017

Nice! I like the overall approach, some comments on the details:

impl IntoIterator for TokenStream { type Item = TokenTree; ... }

Would impls for &TokenStream and &mut TokenStream also be useful?

struct Symbol { ... } // An interned string

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else? Identifier, Ident, Word, … ? (Are the first two not appropriate because "identifier" excludes keywords?)

Joint, // immediately followed by another `Op`, e.g. `+` in `+=`

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each? (>>= or ... three tokens?) Please expand the doc-comments in the PR to explain this.

impl Display for Literal { ... }
impl Literal {

I’d also add:

  • fn byte_string(bytes: &[u8]) -> Literal, much more convenient than building a Sequence of u8 literals for a slice literal. (And more efficient?)
  • Maybe fn byte(byte: u8) -> Literal similar to Literal::u8 but with byte literal syntax, though it only makes a difference when looking at generated code for debugging.
  • fn as_str(&self) -> Option<&str> and other fn as_*(&self) -> Option<*> methods to extract the value. (Though number literals without a type suffix are tricky.) The Display impl technically provides the same information, but writing a Rust parser (even just for literals) is a complex undertaking.

In src/libproc_macro/lib.rs

#[unstable(feature = "proc_macro", issue = "38356")]
#[macro_export]
macro_rules! quote { () => {} }

In src/libproc_macro/quote.rs

macro_rules! quote {
    () => { TokenStream::empty() };
    ($($t:tt)*) => { [ $( quote_tree!($t), )* ].iter().cloned().collect::<TokenStream>() };
}

Why are there two quote macros, with one empty, and only the empty one exported?
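The internal quote! quoted above walks its input one token tree at a time with $($t:tt)*. Here is a minimal, non-hygienic sketch of that same pattern. The toy_quote! name is hypothetical, and it only turns each tree back into text with stringify!: it builds no TokenStream and attaches no spans.

```rust
// Hypothetical toy, not hygienic: each `$t` is one token tree, so a delimited
// group such as `(0)` arrives as a single unit, as it would for `quote_tree!($t)`.
macro_rules! toy_quote {
    () => { Vec::<String>::new() };
    ($($t:tt)*) => { vec![$( stringify!($t).to_string() ),*] };
}
```

For example, toy_quote!(f(0);) produces the three trees "f", "(0)" and ";": the parenthesized group is never split apart at this level.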

@SimonSapin

Contributor

SimonSapin commented Mar 31, 2017

I think this PR addresses all of my previous comments in #38356 (comment) 👍

@bors

Contributor

bors commented Mar 31, 2017

☔️ The latest upstream changes (presumably #40950) made this pull request unmergeable. Please resolve the merge conflicts.

@jseyfried

Contributor

jseyfried commented Apr 1, 2017

@eddyb

I'm a bit disappointed in how I fooled myself we could get rid of the "tree" model but flattening seems to not be in the cards.

Do you mean w.r.t. the internals? I believe we can still use the internal flattened representation you proposed with this API (imo, would be nice but not high priority).

@SimonSapin

Would impls for &TokenStream and &mut TokenStream also be useful?

impl IntoIterator for &TokenStream { type Item = TokenTree; ... } and/or
impl TokenStream { fn iter(&self) -> TokenIter { ... } } could be useful -- both would be equivalent to tokens.clone().into_iter().

impl IntoIterator for &TokenStream { type Item = &TokenTree; ... } isn't possible (yet, perhaps), and would make an internal "flattened" representation more complex / less efficient.

TokenStream uses shared, ref-counted memory, so I don't think impl IntoIterator for &mut TokenStream makes sense.
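Here is a self-contained sketch of why a by-reference iterator can cheaply yield owned token trees when the stream is reference-counted. Everything in it is hypothetical: TokenTree is a stand-in wrapping a String, the stream is an Rc<[TokenTree]>, and the iterator is called TokenTreeIter.

```rust
use std::rc::Rc;

// Hypothetical stand-in; the real type carries a span and a kind.
#[derive(Clone, Debug, PartialEq)]
pub struct TokenTree(pub String);

// Shared, reference-counted storage: cloning a stream only bumps a refcount.
#[derive(Clone, Debug)]
pub struct TokenStream(Rc<[TokenTree]>);

impl TokenStream {
    pub fn new(trees: Vec<TokenTree>) -> Self {
        TokenStream(trees.into())
    }
}

// Holds its own handle to the shared storage plus a cursor.
pub struct TokenTreeIter {
    stream: TokenStream,
    next: usize,
}

impl Iterator for TokenTreeIter {
    type Item = TokenTree;
    fn next(&mut self) -> Option<TokenTree> {
        let tree = self.stream.0.get(self.next)?.clone();
        self.next += 1;
        Some(tree)
    }
}

impl IntoIterator for TokenStream {
    type Item = TokenTree;
    type IntoIter = TokenTreeIter;
    fn into_iter(self) -> TokenTreeIter {
        TokenTreeIter { stream: self, next: 0 }
    }
}

// Iterating by reference clones only the handle and yields owned trees,
// which is the same as `tokens.clone().into_iter()`.
impl<'a> IntoIterator for &'a TokenStream {
    type Item = TokenTree;
    type IntoIter = TokenTreeIter;
    fn into_iter(self) -> TokenTreeIter {
        self.clone().into_iter()
    }
}
```

A borrowed item type (&TokenTree) would tie every item to storage that a flattened internal representation might not keep in tree form, which is why the sketch hands out owned clones.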

To me the word "symbol" evokes punctuation, excluding letters. Could this be called something else?

Yeah. Currently, Symbol is just an interned string, so Word/Ident don't seem appropriate. If we restrict the type to valid TokenKind::Words, I'd prefer Word for symmetry with TokenKind. Otherwise, perhaps Str/InternedStr? cc @nrc

So the = in += would be a separate token? &&, .., or :: would similarly be two tokens each?

Yeah, it's TokenKind::Op(char, OpKind), so just a single character per Op -- I'll mention that in the doc comment.
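Under the one-character-per-Op model, a multi-character operator becomes a run of Joint ops ending in an Alone op. Here is a minimal sketch with hypothetical helpers split_op and join_ops. It assumes that the operator is followed by whitespace or a non-punctuation token, so its last character is Alone.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Spacing {
    Alone,
    Joint,
}

// Hypothetical helper: one `(char, Spacing)` pair per character of an operator.
// Every character but the last is `Joint`; the last is `Alone`.
pub fn split_op(op: &str) -> Vec<(char, Spacing)> {
    let count = op.chars().count();
    op.chars()
        .enumerate()
        .map(|(i, c)| {
            let spacing = if i + 1 < count { Spacing::Joint } else { Spacing::Alone };
            (c, spacing)
        })
        .collect()
}

// Inverse direction: `Joint` glues a character to the next one,
// `Alone` is followed by a space (trailing space trimmed).
pub fn join_ops(ops: &[(char, Spacing)]) -> String {
    let mut out = String::new();
    for &(c, spacing) in ops {
        out.push(c);
        if spacing == Spacing::Alone {
            out.push(' ');
        }
    }
    out.trim_end().to_string()
}
```

So `>>=` is three ops (Joint, Joint, Alone), while two Alone ops `+` and `=` print as `+ =`, matching the Alone example in the description.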

I’d also add [more methods for Literal]

Yeah, I agree that we should have Literal::{byte, byte_string, as_*}. This API isn't intended to be exhaustive, e.g. we also want API for emitting "first class" diagnostics (@sergio is working on this).

Why are there two quote macros, with one empty, and only the empty one exported?

The non-empty one is incomplete (by design) and only used to bootstrap the internal implementation of proc_macro::quote. The exported one is a placeholder for the internal implementation (à la lang items). After we support stability hygiene and stability-checking macro resolutions, we should be able to replace the internal implementation with a #[proc_macro].

@bors

Contributor

bors commented Apr 5, 2017

☔️ The latest upstream changes (presumably #40811) made this pull request unmergeable. Please resolve the merge conflicts.

@nrc

There's a bunch of changes requested, but most are just about docs, and I don't think there is anything major. Assuming all the fixes are straightforward, then r=me

src/libproc_macro/lib.rs
pub struct TokenStream {
inner: TokenStream_,
}
pub struct TokenStream(tokenstream::TokenStream);

@nrc

nrc Apr 10, 2017

Member

Not this PR (and pretty nitty), but just pointing out that it should be either token_stream or Tokenstream; currently the naming is inconsistent.

src/libproc_macro/lib.rs
/// An iterator over `TokenTree`s.
#[unstable(feature = "proc_macro", issue = "38356")]
pub struct TokenIter {

@nrc

nrc Apr 10, 2017

Member

Should we call this a TokenTreeIter? That leaves room for a flattening TokenIter and means the name and target correspond.

@jseyfried

jseyfried Apr 10, 2017

Contributor

@nrc Yeah, TokenTreeIter makes more sense.

On a similar note, @eddyb proposed using TokenNode instead of TokenKind for better correspondence to TokenTree and to leave room for Token/TokenKind -- what do you think?

src/libproc_macro/lib.rs
}
impl TokenTree {
fn from_raw(stream: tokenstream::TokenStream, next: &mut Option<tokenstream::TokenStream>)

@nrc

nrc Apr 10, 2017

Member

I don't like from_raw as a name: 'raw' can mean so many different things, and perhaps the most likely reading here is 'raw' as in raw string, which is wrong. How about from_internal or from_rustc_internal or something like that?

@jseyfried

jseyfried Apr 10, 2017

Contributor

I'll change to from_internal.

src/libproc_macro/lib.rs
TokenTree { span: Span(span), kind: kind }
}
fn to_raw(self) -> tokenstream::TokenStream {

@nrc

nrc Apr 10, 2017

Member

likewise here.

And somewhat connected, is there a way to mark the API which we expect macro authors to use? Is it all the pub API? Or is some of that transitional or only intended for the compiler to use?

@jseyfried

jseyfried Apr 10, 2017

Contributor

It's all the pub API not in __internal (which is unusable without #![feature(proc_macro_internals)]).

src/libproc_macro/lib.rs
@@ -80,7 +524,11 @@ pub struct LexError {
/// all of the contents.
#[unstable(feature = "proc_macro_internals", issue = "27812")]
#[doc(hidden)]
#[path = ""]

@nrc

nrc Apr 10, 2017

Member

What is this for? I'm not sure what the effect is.

@jseyfried

jseyfried Apr 10, 2017

Contributor

It's for the mod quote; inside __internal: it allows the file to be libproc_macro/quote.rs. Thinking about this some more, I'll move mod quote into the crate root and remove the #[path = ""].

src/libproc_macro/quote.rs
//! # Quasiquoter
//! This file contains the implementation internals of the quasiquoter provided by `qquote!`.

@nrc

nrc Apr 10, 2017

Member

Could you add an overview of how the quasi-quoter works please?

@jseyfried

jseyfried Apr 10, 2017

Contributor

Will do.

src/libsyntax/tokenstream.rs
@@ -196,6 +201,10 @@ impl TokenStream {
}
}
pub fn builder() -> TokenStreamBuilder {

@nrc

nrc Apr 10, 2017

Member

I think this would be better as a free function rather than a method on TokenStream

@nrc

nrc Apr 10, 2017

Member

Or it could be a new method on TokenStreamBuilder

@jseyfried

jseyfried Apr 10, 2017

Contributor

I'll change to TokenStreamBuilder::new() (the motivation for TokenStream::builder() was to avoid making people import anything else into their scopes).

src/libsyntax/tokenstream.rs
@@ -225,13 +234,107 @@ impl TokenStream {
}
true
}
pub fn as_tree(self) -> (TokenTree, bool /* joint? */) {

@nrc

nrc Apr 10, 2017

Member

Could these public methods get docs please? (In particular here, some more explanation of what joint means would be good).

@lfairy

lfairy Apr 10, 2017

Contributor

Also, since this takes self by value, should this be named into_tree instead? Or are the conventions different for reference-counted things?

@jseyfried

jseyfried Apr 10, 2017

Contributor

@nrc Will do.
@lfairy Perhaps; I don't think it matters much.

@@ -8,47 +8,37 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.
#![feature(plugin, plugin_registrar, rustc_private)]

@nrc

nrc Apr 10, 2017

Member

It looks like anyone with an existing procedural macro is going to have to make some significant changes. Could you write up either here or on internals or somewhere, a guide to the changes needed to make a proc macro work after this PR please?

@jseyfried

jseyfried Apr 10, 2017

Contributor

Existing procedural macros will lose the syntax::tokenstream::TokenStream quoter from proc_macro_internals; AFAIK, other than that, they won't have to make any changes. I could include a syntax::tokenstream::TokenStream quoter for back-compat.

How many people are actually using "existing procedural macros" though? This PR doesn't affect the widely used legacy plugin system; I didn't think many people were using SyntaxExtension::ProcMacro from #36154.

src/libsyntax/parse/token.rs
@@ -455,3 +461,38 @@ pub fn is_op(tok: &Token) -> bool {
_ => true,
}
}
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);

@nrc

nrc Apr 10, 2017

Member

Is this meant to be used by macro authors? If so, could you document why and when to use it?

@jseyfried

jseyfried Apr 10, 2017

Contributor

No, nothing in libsyntax can be used by proc macro authors.

src/libsyntax/parse/token.rs
}
#[derive(Clone, Eq, PartialEq, Debug)]
pub struct LazyTokenStream(RefCell<Option<TokenStream>>);

@lfairy

lfairy Apr 10, 2017

Contributor

As an aside: if the dynamic borrow checks end up too inefficient, then this could be re-implemented on top of MoveCell.

@SimonSapin

SimonSapin Apr 10, 2017

Contributor

std::cell::Cell is now like MoveCell :) rust-lang/rfcs#1651

src/libsyntax/parse/token.rs
let mut opt_stream = self.0.borrow_mut();
if opt_stream.is_none() {
*opt_stream = Some(f());
};

@lfairy

lfairy Apr 10, 2017

Contributor

Nit: extra semicolon
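The reviewed helper fills a RefCell<Option<TokenStream>> on first use. Below is a generic, self-contained sketch of that lazy-initialization pattern, written without the stray semicolon. It also shows the Cell-based alternative mentioned above, which relies on Cell::take and Cell::set working for non-Copy values after RFC 1651. The names LazyRefCell and LazyTakeCell are purely illustrative.

```rust
use std::cell::{Cell, RefCell};

// Same shape as the reviewed snippet: compute once, then hand out clones.
pub struct LazyRefCell<T>(RefCell<Option<T>>);

impl<T: Clone> LazyRefCell<T> {
    pub fn new() -> Self {
        LazyRefCell(RefCell::new(None))
    }

    pub fn force<F: FnOnce() -> T>(&self, f: F) -> T {
        let mut opt = self.0.borrow_mut(); // runtime-checked borrow
        if opt.is_none() {
            *opt = Some(f());
        } // no trailing `;` is needed after an `if` block
        opt.clone().expect("initialized above")
    }
}

// Cell-based variant: move the value out, clone it, and put it back.
// There is no borrow flag; a re-entrant call from inside `f` would see an
// empty cell instead of panicking as the RefCell version does.
pub struct LazyTakeCell<T>(Cell<Option<T>>);

impl<T: Clone> LazyTakeCell<T> {
    pub fn new() -> Self {
        LazyTakeCell(Cell::new(None))
    }

    pub fn force<F: FnOnce() -> T>(&self, f: F) -> T {
        let value = self.0.take().unwrap_or_else(f);
        self.0.set(Some(value.clone()));
        value
    }
}
```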

@carols10cents

Member

carols10cents commented Apr 17, 2017

Friendly ping that there are merge conflicts and some comments for you @jseyfried! ❤️

@abonander

Contributor

abonander commented Apr 23, 2017

Should we use a different feature gate for the unstable APIs? I'm under the impression that language feature gates and library feature gates are meant to be distinct from each other, and proc_macro currently falls under the language side.

@nikomatsakis

Contributor

nikomatsakis commented Apr 24, 2017

Hmm, is there a strict separation between language/lib feature-gates? I wouldn't think so.

@abonander

Contributor

abonander commented Apr 24, 2017

It's kind of implied by the structure of the Unstable Book but that might be more of an issue for #41476 where this came up originally.

@arielb1

Contributor

arielb1 commented Apr 25, 2017

status: still waiting for #40847.

@arielb1

Contributor

arielb1 commented May 2, 2017

status: still waiting for #40847.

@carols10cents

Member

carols10cents commented May 8, 2017

status: still waiting for #40847.

@Mark-Simulacrum

Member

Mark-Simulacrum commented May 14, 2017

Still waiting.

@alexcrichton

Member

alexcrichton commented May 20, 2017

Since enduring awful error messages with my toy async/await idea, I now have quite a bit of interest in this :)

Along those lines I was hoping I could start doing the legwork to prepare libraries like syn to take advantage of this API, and I figured I could write down some things I'm learning as I go along. So far everything's related to the API of proc_macro as a library, and I've got:

  • Perhaps the various exported structs should all implement Debug? (just a nice convenience)
  • For floating point literals we may wish to panic if you pass in NaN, infinity, or negative infinity. I don't think we have literals for those, right?
  • The quote crate (on crates.io) currently has support for byte string literals and hexadecimal integer literals. I wonder if perhaps this should also have support for those literal types? The byte string one is relatively easy, but the hexadecimal integer literal one is pretty interesting. It also interacts with how quote! { #integer_variable } produces something like 1i32 by default (as opposed to 1), I think for disambiguation. Should the API here give a level of control over what the stringified integer literal looks like?

Nothing major at all for landing this PR, just wanted to jot down my thoughts!
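Here is one possible shape for that kind of control over the rendered text. It is sketched with a hypothetical int_literal helper and Radix enum, neither of which is part of the proposed API. A hex rendering is only safe with integer suffixes: in 0x1f32, the f32 is read as more hex digits, so the result is an integer literal.

```rust
// Hypothetical formatting options; not part of the proposed `Literal` API.
#[derive(Clone, Copy)]
pub enum Radix {
    Decimal,
    Hex,
}

// Render an integer literal's source text with an optional type suffix.
// Only integer suffixes (`u8`, `i32`, ...) are safe here: a float suffix
// after hex digits would be lexed as more hex digits.
pub fn int_literal(n: u64, radix: Radix, suffix: Option<&str>) -> String {
    let digits = match radix {
        Radix::Decimal => n.to_string(),
        Radix::Hex => format!("{:#x}", n), // e.g. 255 -> "0xff"
    };
    format!("{}{}", digits, suffix.unwrap_or(""))
}
```

With this shape, quote!'s current default corresponds to int_literal(1, Radix::Decimal, Some("i32")), which gives "1i32", while passing None gives the unsuffixed "1".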

@alexcrichton

Member

alexcrichton commented May 20, 2017

It also looks like doc comments are mapped to a Literal, but I don't think there's a way to create a Literal from a doc comment str?

@dtolnay

Member

dtolnay commented May 23, 2017

Unsuffixed numeric literals

Like Alex mentioned, I think these are important. For example a proc macro containing this:

quote! {
    f(0);
}

Using the current API there would be no way for quote! to correctly map this to tokens because we can't know what integer type f expects.

Having additional constructors Literal::integer and Literal::float would solve this.
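To see why unsuffixed constructors matter, here is a hypothetical, simplified Literal with just three variants and a Display impl. An unsuffixed 0 lets type inference pick whatever integer type f expects, while 0u8 pins it. The sketch also shows a pitfall for a float constructor: Display on f64 prints 2.0 as 2, which would be read back as an integer literal, so the sketch uses Debug formatting instead. Negative values would still need a separate - token, and the sketch ignores that.

```rust
use std::fmt;

// Hypothetical stand-in for three of the proposed constructors.
pub enum Literal {
    Integer(i128), // like `Literal::integer`: no suffix, type is inferred
    Float(f64),    // like `Literal::float`: no suffix, type is inferred
    U8(u8),        // like `Literal::u8`: always carries the `u8` suffix
}

impl fmt::Display for Literal {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            Literal::Integer(n) => write!(f, "{}", n),
            // `{:?}` keeps the decimal point (`2.0`); `{}` would print `2`,
            // which would be read back as an integer literal.
            Literal::Float(x) => write!(f, "{:?}", x),
            Literal::U8(n) => write!(f, "{}u8", n),
        }
    }
}
```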

Byte-string literals

How else could it handle quote! { b"\xFF" }? There should be Literal constructors. Maybe also for byte literals b'\xFF'? But those can always be a u8 instead.
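Here is a sketch of how a byte-string constructor might render its source text. It uses a hypothetical byte_string_literal function built on std::ascii::escape_default. That function leaves printable ASCII as-is and escapes quotes, backslashes, \t, \r and \n, plus every other byte as \xNN with lowercase hex, which Rust accepts inside byte strings.

```rust
// Hypothetical rendering for a `Literal::byte_string(&[u8])`-style constructor.
pub fn byte_string_literal(bytes: &[u8]) -> String {
    let mut out = String::from("b\"");
    for &b in bytes {
        // Each byte expands to one or more ASCII characters of escape text.
        out.extend(std::ascii::escape_default(b).map(char::from));
    }
    out.push('"');
    out
}
```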

Boolean words

Are true and false represented as TokenKind::Word or TokenKind::Literal? When handing tokens back to the compiler, will it accept either one? Also there should be Literal::boolean to construct a boolean literal.

Naming of Op

What does Op stand for? If "operator", I would not consider things like , and $ to be operators. Is there a better name?

Canonical quote! macro

What is the advantage of providing a canonical quote! macro in proc_macro? No, I am not just asking this because I have a quote crate. It seems out of place in that every other piece of API added here is crucial to passing code into and out of a proc macro. But with the right rest of the API, quote! should be possible to implement on top of it. Is there a part I am missing that gives quote! superpowers over what would be possible in a crate? Why not open this up to innovation in the crate ecosystem?

Deref for Symbol

Let's not. There are so many methods on str and you would never want to call any of them on a Symbol. Like 6 trim methods and 6 split methods? It pollutes the rustdoc of Symbol and makes it impossible to find the methods you will actually want to use. I would prefer an explicit method to extract a &str.

@jseyfried

jseyfried May 23, 2017

Contributor

@alexcrichton

Perhaps the various exported structs should all implement Debug? (just a nice convenience)

Agreed, will do.

For floating point literals we may wish to panic if you pass in NaN, infinity, or negative infinity. I don't think we have literals for those, right?

Good point. I believe everything works as is, the only issue is asymmetry -- Token is more general than a token in the source since the latter can't represent NaN as a single Literal token.

Perhaps (some of) the Literal::* constructors should instead be free functions that return TokenStream?
E.g. float_literal: f32 -> TokenStream would map NaN to f32::NAN (which will always resolve as expected due to hygiene).
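Here is a sketch of that free-function idea. It is modelled as returning the token text as a `String` rather than a real `TokenStream`, and the function itself is hypothetical. Non-finite values become paths to the `f32` constants instead of panicking. dtolnay argues for the panicking behaviour later in the thread.

```rust
/// Hypothetical `float_literal`, returning token text instead of a real
/// `TokenStream`. Non-finite values have no literal form, so they become
/// paths to the `f32` constants.
pub fn float_literal(f: f32) -> String {
    if f.is_nan() {
        "f32::NAN".to_string()
    } else if f == f32::INFINITY {
        "f32::INFINITY".to_string()
    } else if f == f32::NEG_INFINITY {
        "f32::NEG_INFINITY".to_string()
    } else {
        // `{:?}` keeps a `.0` or an exponent, so the text stays a float literal.
        // Negative values would really be a `-` token followed by the literal.
        format!("{:?}f32", f)
    }
}
```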

I wonder if perhaps this should also have support for [byte string literals and hexadecimal integer literals]?

I think so, less sure about hexadecimal integers. This API isn't intended to be complete -- there are a variety of orthogonal improvements that we can work on in parallel once this lands.

It also looks like doc comments are mapped to a Literal, but I don't think there's a way to create a Literal from a doc comment str?

Yeah, I'm not sure if this is the best way to treat doc comments -- we might instead add a fifth variant TokenKind::DocComment (unaesthetic imo since the other four are more fundamental/general) or do something else.

Should the API here give a level of control to what the stringified integer literal looks like?

Yes, I think we should expose this. I'll add @dtolnay's Literal::integer and Literal::float.

@dtolnay

Byte-string literals

Yeah, this should be in the API.

Are true and false represented as TokenKind::Word or TokenKind::Literal?

Right now, it's TokenKind::Word like other keywords. I think this is simpler, since it avoids the TokenKind::Word("true") vs Token::boolean(true) issue you mentioned and means unicode identifiers are valid Words without exception (I think). That being said, I'm open to a different model.

Is there a better name [than Op / operator]?

Perhaps -- maybe something like Char?
IIRC @eddyb proposed Op for brevity.

... with the right rest of the API, quote! should be possible to implement on top of it.

Agreed. With the Literal API you mentioned earlier and a function quote_span: Span -> TokenStream, I believe you could implement quote! from the other API. We don't have quote_span yet, but it is easy to add.

Is there a part I am missing that gives quote! superpowers over what would be possible in a crate?

The missing piece is quoting spans. If you e.g. just use the dummy span, then hygiene and expansion backtraces won't work (both are determined by Spans' expansion information), and it will limit how powerful the compiler error messages can be. I'm not sure exactly what primitives would be best to expose here, but quote_span: Span -> TokenStream seems like a good start -- it's straightforward and sufficient to implement quote!.
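To make the bootstrapping idea concrete, here is a hypothetical "stage 0" quoter sketch. Each input token is turned into Rust source that rebuilds it at expansion time. The token model is simplified, and the generated `TokenTree { span, kind }` and `Term::intern` expressions assume the API listed in this PR. `span_expr` stands in for whatever `quote_span(span)` would emit.

```rust
/// Hypothetical, simplified token model used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Spacing {
    Alone,
    Joint,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Term(String),
    Op(char, Spacing),
}

/// "Stage 0" quoting: turn one input token into Rust source that rebuilds it
/// when the generated code runs. Substituting a dummy span for `span_expr`
/// is what would lose hygiene and expansion backtraces.
pub fn quote_token(tok: &Token, span_expr: &str) -> String {
    let kind = match tok {
        Token::Term(s) => format!("TokenKind::Term(Term::intern({:?}))", s),
        Token::Op(c, spacing) => format!("TokenKind::Op({:?}, Spacing::{:?})", c, spacing),
    };
    format!("TokenTree {{ span: {}, kind: {} }}", span_expr, kind)
}
```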

Why not open this up to innovation in the crate ecosystem?

It is open to the crate ecosystem. proc_macro::quote! is just a declarative macro item in proc_macro; people are free to use third_party::quote;. I'm envisioning that third parties won't "publicly" depend on proc_macro at all (i.e. end users won't directly import anything from proc_macro).

A third party library could just re-export proc_macro::quote! or provide their own, more powerful version.

What is the advantage of providing a canonical quote! macro in proc_macro?

A benefit right now is we can experiment with a basic quote! without waiting for third party libraries.

Also, it should be useful for implementing a more powerful quote! -- whenever I implement quote! I find I need to "bootstrap" it with a limited "stage 0" version for use in the implementation, which is generally tedious and uninteresting code. Overall, I think a basic quote! is worth its weight to provide in proc_macro, even if abstractions over proc_macro extend, change, or rewrite it.

Let's not [Deref for Symbol]

But Symbol is just an interned str right now -- it could be Interned<str> where Interned<T> is a general smart pointer. In other words if perf weren't an issue, it'd just be String (like how Op is char).
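For comparison, here is a minimal interner sketch. It is hypothetical and is not rustc's actual `Symbol` implementation. The symbol is a `Copy` index, and the string is reached through an explicit `as_str` call rather than through `Deref<Target = str>`. That is the shape dtolnay argues for below.

```rust
use std::collections::HashMap;

/// Hypothetical interned symbol: a cheap, copyable index into an `Interner`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Hypothetical interner mapping strings to symbols and back.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Explicit accessor, used instead of `Deref<Target = str>`.
    pub fn as_str(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```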

@dtolnay

dtolnay May 23, 2017

Member
  • For inf and nan - I would prefer Alex's suggestion of panicking when passed to the Literal constructor. I have never had a use case where I would have wanted them to turn into f32::NAN.

  • I agree, we don't need hex (or in general, human-friendly) literals in this initial implementation.

  • In hindsight how do we feel about the macro_rules handling of doc comments?

    macro_rules! doc {
        (# [ doc = $s:tt ]) => { $s }
    }
    
    fn main() {
        // prints " informative "
        println!("{:?}", doc!(/** informative */));
    }

    I thought we were good with that. Can we do doc comments that way here? # + bracketed content.

    One limitation is that there wouldn't be a way to track what is currently is_sugared_doc. For example, stringify!(/** informative */) prints as a doc comment, not #[doc]. Would a proc macro ever require knowing whether it is a sugared doc? I don't think so; in fact, it may be better if they can't.

    If necessary, one possible hackaround would be to treat it as Joint(#) + Joint([) + Word(doc) where the [ is joint instead of alone when sugared.

    I agree TokenKind::DocComment is not great but I could accept that too.

  • Char instead of Op isn't great either. Char means character literal 'c'.

    Related: I'm also not happy with Symbol. Something like $ is colloquially way more strongly associated with the word "symbol" than "operator".

    I will try to come up with some good alternatives.

  • TokenKind::Word for true and false is great. I agree with the justification you gave.

  • If that's all that is missing to empower an external quote! implementation, I would include it here. No need to put off something we know we need.

  • Regarding Deref for Symbol: I don't find that compelling. I understand it is just a smart pointer to str but that doesn't seem like the intended way to interact with it or conceptualize it. It should be comparatively rare for a proc macro to need the str of a Symbol or to call str methods on it, so there should be an explicit indication that the user meant to do that.
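Here is a minimal sketch of the `# [ doc = "..." ]` desugaring. It uses a made-up flat token model; `Tok` is hypothetical and is not proc_macro's type. The doc comment's text becomes a string token inside a bracketed group, which is what the `macro_rules!` example above matches.

```rust
/// Hypothetical flat token model used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Op(char),
    Word(String),
    Str(String),
    Group(char, Vec<Tok>), // opening delimiter and contents
}

/// Desugar the text of an outer doc comment into `# [ doc = "..." ]`.
pub fn desugar_doc(text: &str) -> Vec<Tok> {
    vec![
        Tok::Op('#'),
        Tok::Group(
            '[',
            vec![
                Tok::Word("doc".to_string()),
                Tok::Op('='),
                Tok::Str(text.to_string()),
            ],
        ),
    ]
}
```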

@SimonSapin

SimonSapin May 23, 2017

Contributor

unicode identifiers are valid Words without exception (I think)

Non-ASCII identifiers go through some amount of normalization, so Symbol::from(s).deref() == s doesn’t hold for every s: &str. If that can’t be the case since Symbol is also used in other cases, then a struct Ident(Symbol); wrapper type doing normalization is needed.

@dtolnay

dtolnay May 23, 2017

Member

Here are four naming ideas.

enum TokenKind {
    Group(Delimiter, TokenStream),
    Term(Term),
    Punctuation(char, Spacing),
    Literal(Literal),
}

Group—I like this strictly better than Sequence. People are used to thinking of parentheses in expressions as grouping things. Sequence sounds flat.

Term—I like this one, except that, unfortunately, it is a prefix of words like terminate, terminal, terminus, which may be confusing. I like that the meaning perfectly covers both keywords and names while excluding punctuation symbols.

Punctuation—This is the right word but lots of characters. Punct could work but is inconsistent because nothing else is abbreviated (after removing Op). Alternatives: Key, Mark, Symbol, Sign, Rune

Spacing—Instead of PunctuationKind of course.
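This small renderer shows what `Spacing` buys. It works over a hypothetical subset of the proposed enum that keeps only `Term` and `Punctuation` and omits `Group` and `Literal`. A `Joint` character is printed without a trailing space, so `+` followed by `=` renders as `+=` instead of `+ =`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Spacing {
    Alone, // followed by whitespace or a non-punctuation token
    Joint, // immediately followed by another punctuation character
}

/// Hypothetical subset of the proposed `TokenKind`.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Term(String),
    Punctuation(char, Spacing),
}

/// Render tokens, putting a space after every token except `Joint` punctuation.
pub fn render(tokens: &[TokenKind]) -> String {
    let mut out = String::new();
    for (i, tok) in tokens.iter().enumerate() {
        let joint = match tok {
            TokenKind::Term(s) => {
                out.push_str(s);
                false
            }
            TokenKind::Punctuation(c, spacing) => {
                out.push(*c);
                *spacing == Spacing::Joint
            }
        };
        if !joint && i + 1 < tokens.len() {
            out.push(' ');
        }
    }
    out
}
```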

@dtolnay

dtolnay May 23, 2017

Member

Discussed above:

  • Debug for everything
  • Literal constructor panics on bad floats
  • Literal::integer
  • Literal::float
  • Literal::byte_string
  • Naming in TokenKind (my previous comment or similar)
  • Remove Deref for Symbol
  • Treat doc comments as #[doc = ...] instead of Literal
  • quote_span
@Mark-Simulacrum

Mark-Simulacrum Jun 27, 2017

Member

@bors retry

Timeouts across the board.

@Mark-Simulacrum

Mark-Simulacrum Jun 27, 2017

Member

@bors r-

Still seeing #40939 (comment); not sure actually why bors tested this again...

@bors

bors Jun 30, 2017

Contributor

☔️ The latest upstream changes (presumably #42902) made this pull request unmergeable. Please resolve the merge conflicts.

@aidanhs

aidanhs Jul 5, 2017

Member

@jseyfried just a quick reminder that there are some test failures that need a look!

rustbuild: Only -Zsave-analysis for libstd
Don't pass the flag when we're compiling the compiler or other related tools
@alexcrichton

Member

alexcrichton commented Jul 5, 2017

@bors r=nrc

@bors

bors Jul 5, 2017

Contributor

📌 Commit 78fdbfc has been approved by nrc

@alexcrichton

alexcrichton Jul 5, 2017

Member

I rebased this on master, fixed a conflict or two, and added 78fdbfc. According to @nrc we were only supposed to do -Zsave-analysis for libstd anyway, and it has the nice side effect of papering over the bugs seen on CI.

@bors

bors Jul 5, 2017

Contributor

⌛️ Testing commit 78fdbfc with merge 4d526e0...

bors added a commit that referenced this pull request Jul 5, 2017

Auto merge of #40939 - jseyfried:proc_macro_api, r=nrc
proc_macro: implement `TokenTree`, `TokenKind`, hygienic `quote!`, and other API

All new API is gated behind `#![feature(proc_macro)]` and may be used with `#[proc_macro]`, `#[proc_macro_attribute]`, and `#[proc_macro_derive]` procedural macros.

More specifically, this PR adds the following in `proc_macro`:
```rust
// `TokenStream` constructors:
impl TokenStream { fn empty() -> TokenStream { ... } }
impl From<TokenTree> for TokenStream { ... }
impl From<TokenKind> for TokenStream { ... }
impl<T: Into<TokenStream>> FromIterator<T> for TokenStream { ... }
macro quote($($t:tt)*) { ... } // A hygienic `TokenStream` quoter

// `TokenStream` destructuring:
impl TokenStream { fn is_empty(&self) -> bool { ... } }
impl IntoIterator for TokenStream { type Item = TokenTree; ... }

struct TokenTree { span: Span, kind: TokenKind }
impl From<TokenKind> for TokenTree { ... }
impl Display for TokenTree { ... }

struct Span { ... } // a region of source code along with expansion/hygiene information
impl Default for Span { ... } // a span from the current procedural macro definition
impl Span { fn call_site() -> Span { ... } } // the call site of the current expansion
fn quote_span(span: Span) -> TokenStream;

enum TokenKind {
    Group(Delimiter, TokenStream), // A delimited sequence, e.g. `( ... )`
    Term(Term), // a unicode identifier, lifetime ('a), or underscore
    Op(char, Spacing), // a punctuation character (`+`, `,`, `$`, etc.).
    Literal(Literal), // a literal character (`'a'`), string (`"hello"`), or number (`2.3`)
}

enum Delimiter {
    Parenthesis, // `( ... )`
    Brace, // `{ ... }`
    Bracket, // `[ ... ]`
    None, // an implicit delimiter, e.g. `$var`, where $var is `...`.
}

struct Term { ... } // An interned string
impl Term {
    fn intern(string: &str) -> Term { ... }
    fn as_str(&self) -> &str { ... }
}

enum Spacing {
    Alone, // not immediately followed by another `Op`, e.g. `+` in `+ =`.
    Joint, // immediately followed by another `Op`, e.g. `+` in `+=`
}

struct Literal { ... }
impl Display for Literal { ... }
impl Literal {
    fn integer(n: i128) -> Literal { .. } // unsuffixed integer literal
    fn float(n: f64) -> Literal { .. } // unsuffixed floating point literal
    fn u8(n: u8) -> Literal { ... } // similarly: i8, u16, i16, u32, i32, u64, i64, f32, f64
    fn string(string: &str) -> Literal { ... }
    fn character(ch: char) -> Literal { ... }
    fn byte_string(bytes: &[u8]) -> Literal { ... }
}
```
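This small helper shows how each `Delimiter` variant pairs with its source characters. `delimiter_chars` is hypothetical and is not part of the listed API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Delimiter {
    Parenthesis, // `( ... )`
    Brace,       // `{ ... }`
    Bracket,     // `[ ... ]`
    None,        // implicit, invisible delimiter
}

/// Hypothetical helper: the open/close characters for a delimiter,
/// or `None` for the implicit delimiter, which has no source text.
pub fn delimiter_chars(d: Delimiter) -> Option<(char, char)> {
    match d {
        Delimiter::Parenthesis => Some(('(', ')')),
        Delimiter::Brace => Some(('{', '}')),
        Delimiter::Bracket => Some(('[', ']')),
        Delimiter::None => None,
    }
}
```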
For details on `quote!` hygiene, see [this example](20a9048) and [declarative macros 2.0](#40847).

r? @nrc
@bors

bors Jul 6, 2017

Contributor

☀️ Test successful - status-appveyor, status-travis
Approved by: nrc
Pushing 4d526e0 to master...

@bors bors merged commit 78fdbfc into rust-lang:master Jul 6, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
homu Test successful

@jseyfried jseyfried deleted the jseyfried:proc_macro_api branch Jul 6, 2017

@jseyfried

jseyfried Jul 6, 2017

Contributor

@alexcrichton Awesome, thanks!

For the record, the test failed because tokens from a macros 1.1-style proc-macro (i.e. via FromStr for TokenStream) are now given the proc-macro's call site span. Before this PR, they were given the proc-macro's call site span with extra expansion information.

Since we no longer have the extra expansion information, @nrc's save-analysis code tries to re-parse the call site span and expects it to match the proc-macro generated tokens.

I had to get rid of the expansion information for backwards compatibility -- if the expansion information showed that the span came from a procedural macro 2.0, then the underlying tokens would resolve hygienically (as per hygiene 2.0). This would be a backwards-incompatible change, and would mean that stringifying, reparsing and returning the input of a proc-macro would change its semantics, even if the input were entirely unexpanded.

If we want to fix this, we could instead mark tokens from FromStr for TokenStream specially so that save-analysis would know that they are "weird", but so that they would still behave like unexpanded tokens w.r.t. hygiene. I don't have a strong opinion here.

@oli-obk

oli-obk Jul 9, 2017

Contributor

The lack of expansion info broke clippy together with proc macros, since code inside derives will now trigger lints. Is there any new way to detect expanded code? If not, what needs to be done to create expansion info that doesn't influence save-analysis?

See rust-lang-nursery/rust-clippy#1877

@nrc

nrc Jul 13, 2017

Member

Since we no longer have the extra expansion information, @nrc's save-analysis code tries to re-parse the call site span and expects it to match the proc-macro generated tokens.

Ah, I wondered why I was seeing so many errors in the RLS :-(

I think we must fix this. This feels like another reason not to conflate macro hygiene with expansion history. That might be too hard a way to fix this though.

If we want to fix this, we could instead mark tokens from FromStr for TokenStream specially so that save-analysis would know that they are "weird", but so that they would still behave like unexpanded tokens w.r.t. hygiene. I don't have a strong opinion here.

Not using spans for hygiene seems a much nicer fix here - the span should reflect the expansion history and the hygiene should be altered (to erase hygiene, essentially). I expect we want the correct expansion info for errors as well as save-analysis.

@oli-obk

oli-obk Jul 13, 2017

Contributor

I already have a fix in #43179

@jseyfried

jseyfried Jul 13, 2017

Contributor

@nrc
We could instead just include hygiene bending information along with expansion information so that span expansion info and hygiene info remain unified (@oli-obk's solution is fine for impl FromStr for TokenStream, so we're OK for now).

That is, I don't think hygiene bending requires us to split up span/hygiene info or construct a "fake" expansion history just for hygiene. I think it'd be nicer to have a single source of truth for expansion info that is expressive enough for hygiene bending.

bors added a commit that referenced this pull request Jul 18, 2017

Auto merge of #43247 - est31:master, r=alexcrichton
Tidy: allow common lang+lib features

This allows changes to the Rust language that have both library
and language components to share one feature gate.

The feature gates need to be "about the same change", so that both
library and language components must either be both unstable, or
both stable, and share the tracking issue.

Removes the ugly "proc_macro" exception added by #40939.

Closes #43089

bors added a commit that referenced this pull request Jul 19, 2017

bors added a commit that referenced this pull request Jul 19, 2017

bors added a commit that referenced this pull request Jul 20, 2017

@nrc

nrc Jul 25, 2017

Member

So #43179 did not fix the expansion info issue (at least not completely) - I'm still seeing errors in the RLS where we are treating macro-generated names as hand-written ones (and I think this is causing #43371 too).

bors added a commit that referenced this pull request Aug 1, 2017

Auto merge of #43533 - nrc:macro-save, r=jseyfried,
Three small fixes for save-analysis

First commit does some naive deduplication of macro uses. We end up with lots of duplication here because of the weird way we get this data (we extract a use for every span generated by a macro use).

Second commit is basically a typo fix.

Third commit is a bit interesting: it partially reverts a change from #40939 where temporary variables in `format!` (and thus `println!`) got a span with the primary pointing at the value stored into the temporary (e.g., `x` in `println!("...", x)`). If `format!` had a definition, the span would point at the temporary in the macro def, but since it is built-in, that is not possible (for now), so `DUMMY_SP` is the best we can do (using the span in the callee really breaks save-analysis because it thinks `x` is a definition as well as a reference).

There aren't any tests for this stuff. The duplicates are already filtered out by the users of save-analysis, so the deduplication is purely an efficiency change. I couldn't actually find an example for the second commit that we have any machinery to test, and the third commit is tested by the RLS, so there will be a test once I update the RLS version and uncomment the previously failing tests.

r? @jseyfried
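The first commit's "naive deduplication" can be pictured as keeping only the first copy of each identical macro-use record, since one record is extracted per span a macro generates. This is a minimal sketch under stated assumptions, not the save-analysis code: `MacroUse`, its fields, `mk`, and `dedup_macro_uses` are hypothetical names, and rustc's real data types differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a save-analysis record of one macro use.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MacroUse {
    /// Name of the invoked macro, e.g. `println`.
    name: String,
    /// Call-site byte range of the invocation (hypothetical representation).
    callsite: (u32, u32),
}

/// Keeps the first occurrence of each distinct record, preserving order.
/// Every span generated by one invocation yields an identical record, so
/// exact-equality deduplication collapses them into one.
fn dedup_macro_uses(uses: Vec<MacroUse>) -> Vec<MacroUse> {
    let mut seen = HashSet::new();
    uses.into_iter()
        // `insert` returns false if the record was already seen.
        .filter(|u| seen.insert(u.clone()))
        .collect()
}

/// Hypothetical helper for building records in the demo.
fn mk(name: &str, start: u32, end: u32) -> MacroUse {
    MacroUse { name: name.to_string(), callsite: (start, end) }
}

fn main() {
    // Two spans generated by the same `println!` call, plus one `format!` call.
    let uses = vec![mk("println", 10, 30), mk("println", 10, 30), mk("format", 40, 60)];
    let deduped = dedup_macro_uses(uses);
    assert_eq!(deduped, vec![mk("println", 10, 30), mk("format", 40, 60)]);
    println!("{:?}", deduped);
}
```

Because downstream consumers already drop duplicates, a pass like this only shrinks the emitted data; it does not change what the consumers end up seeing.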

@aidanhs aidanhs referenced this pull request Aug 10, 2017

Closed

Mis-calculated spans #43796
