Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

Closed
dtolnay opened this issue Apr 2, 2018 · 9 comments · Fixed by #49597
Closed

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

dtolnay opened this issue Apr 2, 2018 · 9 comments · Fixed by #49597
Assignees
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)

Comments

@dtolnay
Copy link
Member

dtolnay commented Apr 2, 2018

I have not minimized this yet but in TeXitoi/structopt#88 we are seeing inexplicable behavior when iterating over tokens of a struct field doc comment. Related to #49545 so mentioning @alexcrichton.

One of their test cases contains the following struct.

/// Lorem ipsum
#[derive(StructOpt, PartialEq, Debug)]
struct LoremIpsum {
    /// Fooify a bar
    /// and a baz
    #[structopt(short = "f", long = "foo")]
    foo: bool,
}

Within their macro implementation we are seeing the desugared doc comment of /// Fooify a bar having an Alone spacing:

Op(Op { op: '=', spacing: Alone, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// Fooify a bar), None)))

while the desugared /// and a baz has a Joint spacing. I believe the Joint is incorrect because only an Op followed by another Op should be able to have Joint spacing.

Op(Op { op: '=', spacing: Joint, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// and a baz), None)))

I stuck the following loop at the top of their derive entry point:

    for tt in input.clone() {
        println!("{:#?}", tt);
    }

and it indicates that the first doc comment has kind: Tree while the second has kind: JointTree.

                        TokenStream {
                            kind: Tree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// Fooify a bar
                                    )
                                )
                            )
                        },
                        TokenStream {
                            kind: JointTree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// and a baz
                                    )
                                )
                            )
                        },
@alexcrichton
Copy link
Member

@dtolnay I think I know what's happening here in terms of where the Joint is coming from but I'm also a little confused as to where TeXitoi/structopt#88 is happening.

I was unable to get JointTree or Joint to show up though unfortunately in a smaller hello-world test with just adding some doc comments.

I'll see what I can about the spurious Joint at least though

@TeXitoi
Copy link
Contributor

TeXitoi commented Apr 2, 2018

The bug has nothing related to the PR, that's just the first time the error is reported. I can reproduce the failed assertion on master.

But there is a description of what happen in the PR. Basically, there is a 2 line doc comment, but only the first one is processed.

@TeXitoi
Copy link
Contributor

TeXitoi commented Apr 2, 2018

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Yeah I couldn't reproduce this in a straightforward minimized macro either, but checking out the structopt PR at commit 5000840d1a6b79cf704caa927b4f5b15d29978c9 and cargo test --features nightly does reproduce it. I am curious what you find!

Their problem is here + here where a Joint doc = "..." is not considered a legal Meta.

@alexcrichton
Copy link
Member

Ok thanks for the info! I've confirmed that this is fixed by #49597 where I jiggered things around a bit. Namely it appears that the Alone spacing is indeed what we need to fix this.

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Repro script

#!/bin/sh

cargo new --lib structopt_derive
cargo new --lib repro

echo >structopt_derive/src/lib.rs '
#![feature(proc_macro)]

extern crate proc_macro;
use proc_macro::{TokenStream, TokenTreeIter, TokenTree, TokenNode, Spacing};

#[proc_macro_derive(StructOpt)]
pub fn derive_structopt(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();
    assert_eq!(iter.next().unwrap().to_string(), "struct");
    assert_eq!(iter.next().unwrap().to_string(), "S");
    let mut inner = unwrap_group(iter.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut x = unwrap_group(inner.next().unwrap());
    assert_eq!(x.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(x.next().unwrap()), x.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut y = unwrap_group(inner.next().unwrap());
    assert_eq!(y.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(y.next().unwrap()), y.next().unwrap());

    TokenStream::empty()
}

fn unwrap_group(tt: TokenTree) -> TokenTreeIter {
    match tt.kind {
        TokenNode::Group(_, s) => s.into_iter(),
        _ => unimplemented!(),
    }
}

fn unwrap_spacing(tt: TokenTree) -> Spacing {
    match tt.kind {
        TokenNode::Op(_, s) => s,
        _ => unimplemented!(),
    }
}
'

echo >>structopt_derive/Cargo.toml '
[lib]
proc-macro = true
'

echo >repro/src/lib.rs '
#![allow(dead_code)]

#[macro_use]
extern crate structopt_derive;

fn f() {
    #[derive(StructOpt)]
    struct S {
        /// X
        /// Y
        #[doc(hidden)]
        foo: bool,
    }
}
'

echo >>repro/Cargo.toml '
structopt_derive = { path = "../structopt_derive" }
'

cargo build --manifest-path repro/Cargo.toml

Output

Alone "/// X"
Joint "/// Y"

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Interestingly in the script if you move struct S outside of the fn f, then the output is correct.

Alone "/// X"
Alone "/// Y"

@alexcrichton when you say "jiggered things around a bit" is that of the intentional sort or the accidentally-fixed-it sort? It would be good to understand why the token stream was different depending on whether the macro is invoked inside a function or outside -- not clear in your PR what might have fixed that.

@alexcrichton
Copy link
Member

@dtolnay sure yeah, worth documenting!

So I'm not really sure why, but the JointTree vs Tree is what's going on here. For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why. Assuming that though we know that the is_joint is true for the second macro. The is_joint is then later used for spacing in Op tokens, and in the op! macro you'll see how the last token uses the op_kind and all previous ones are Spacing::Joint.

Unfortunately I fat-fingered this a bit and in the doc comment expansion the code also uses op! as a sort of shortcut to TokenTree::Op. This is the bug, however, as the tokens are inheriting the is_joint variable and Joint spacing accidentally. The internal tokens don't have the same spacing, and we don't actually have any ability to communicate a Joint doc comment so that information needs to get lost (instead of preserved at a sort of random location).

In the PR I made these are both explicitly changed to Spacing::Alone as they're supposed to be (as the operators aren't actually joint with anything).

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why.

This is really what I was interested in. I filed #49604 to follow up. Thanks!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 5, 2018
…chenkov

proc_macro: Reorganize public API

This commit is a reorganization of the `proc_macro` crate's public user-facing
API. This is the result of a number of discussions at the recent Rust All-Hands
where we're hoping to get the `proc_macro` crate into ship shape for
stabilization of a subset of its functionality in the Rust 2018 release.

The reorganization here is motivated by experiences from the `proc-macro2`,
`quote`, and `syn` crates on crates.io (and other crates which depend on them).
The main focus is future flexibility along with making a few more operations
consistent and/or fixing bugs. A summary of the changes made from today's
`proc_macro` API is:

* The `TokenNode` enum has been removed and the public fields of `TokenTree`
  have also been removed. Instead the `TokenTree` type is now a public enum
  (what `TokenNode` was) and each variant is an opaque struct which internally
  contains `Span` information. This makes the various tokens a bit more
  consistent, require fewer wrappers, and otherwise provides good
  future-compatibility as opaque structs are easy to modify later on.

* `Literal` integer constructors have been expanded to be unambiguous as to what
  they're doing and also allow for more future flexibility. Previously
  constructors like `Literal::float` and `Literal::integer` were used to create
  unsuffixed literals and the concrete methods like `Literal::i32` would create
  a suffixed token. This wasn't immediately clear to all users (the
  suffixed/unsuffixed aspect) and having *one* constructor for unsuffixed
  literals required us to pick a largest type which may not always be true. To
  fix these issues all constructors are now of the form
  `Literal::i32_unsuffixed` or `Literal::i32_suffixed` (for all integral types).
  This should allow future compatibility as well as being immediately clear
  what's suffixed and what isn't.

* Each variant of `TokenTree` internally contains a `Span` which can also be
  configured via `set_span`. For example `Literal` and `Term` now both
  internally contain a `Span` rather than having it stored in an auxiliary
  location.

* Constructors of all tokens are called `new` now (aka `Term::intern` is gone)
  and most do not take spans. Manufactured tokens typically don't have a fresh
  span to go with them and the span is purely used for error-reporting
  **except** the span for `Term`, which currently affects hygiene. The default
  spans for all these constructed tokens is `Span::call_site()` for now.

  The `Term` type's constructor explicitly requires passing in a `Span` to
  provide future-proofing against possible hygiene changes. It's intended that a
  first pass of stabilization will likely only stabilize `Span::call_site()`
  which is an explicit opt-in for "I would like no hygiene here please". The
  intention here is to make this explicit in procedural macros to be
  forwards-compatible with a hygiene-specifying solution.

* Some of the conversions for `TokenStream` have been simplified a little.

* The `TokenTreeIter` iterator was renamed to `token_stream::IntoIter`.

Overall the hope is that this is the "final pass" at the API of `TokenStream`
and most of `TokenTree` before stabilization. Explicitly left out here is any
changes to `Span`'s API which will likely need to be re-evaluated before
stabilization.

All changes in this PR have already been reflected to the [`proc-macro2`],
`quote`, and `syn` crates. New versions of all these crates have also been
published to crates.io.

Once this lands in nightly I plan on making an internals post again summarizing
the changes made here and also calling on all macro authors to give the APIs a
spin and see how they work. Hopefully pending no major issues we can then have
an FCP to stabilize later this cycle!

[`proc-macro2`]: https://docs.rs/proc-macro2/0.3.1/proc_macro2/

Closes rust-lang#49596
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants