Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

dtolnay · 2018-04-02T18:48:42Z

I have not minimized this yet but in TeXitoi/structopt#88 we are seeing inexplicable behavior when iterating over tokens of a struct field doc comment. Related to #49545 so mentioning @alexcrichton.

One of their test cases contains the following struct.

/// Lorem ipsum
#[derive(StructOpt, PartialEq, Debug)]
struct LoremIpsum {
    /// Fooify a bar
    /// and a baz
    #[structopt(short = "f", long = "foo")]
    foo: bool,
}

Within their macro implementation we are seeing the desugared doc comment of /// Fooify a bar having an Alone spacing:

Op(Op { op: '=', spacing: Alone, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// Fooify a bar), None)))

while the desugared /// and a baz has a Joint spacing. I believe the Joint is incorrect because only an Op followed by another Op should be able to have Joint spacing.

Op(Op { op: '=', spacing: Joint, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// and a baz), None)))

I stuck the following loop at the top of their derive entry point:

    for tt in input.clone() {
        println!("{:#?}", tt);
    }

and it indicates that the first doc comment has kind: Tree while the second has kind: JointTree.

                        TokenStream {
                            kind: Tree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// Fooify a bar
                                    )
                                )
                            )
                        },
                        TokenStream {
                            kind: JointTree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// and a baz
                                    )
                                )
                            )
                        },

The text was updated successfully, but these errors were encountered:

alexcrichton · 2018-04-02T20:09:53Z

@dtolnay I think I know what's happening here in terms of where the Joint is coming from but I'm also a little confused as to where TeXitoi/structopt#88 is happening.

I was unable to get JointTree or Joint to show up though unfortunately in a smaller hello-world test with just adding some doc comments.

I'll see what I can about the spurious Joint at least though

TeXitoi · 2018-04-02T20:18:02Z

The bug has nothing related to the PR, that's just the first time the error is reported. I can reproduce the failed assertion on master.

But there is a description of what happen in the PR. Basically, there is a 2 line doc comment, but only the first one is processed.

TeXitoi · 2018-04-02T20:20:02Z

https://github.com/TeXitoi/structopt/blob/master/tests/doc-comments-help.rs first test.

dtolnay · 2018-04-02T20:20:10Z

Yeah I couldn't reproduce this in a straightforward minimized macro either, but checking out the structopt PR at commit 5000840d1a6b79cf704caa927b4f5b15d29978c9 and cargo test --features nightly does reproduce it. I am curious what you find!

Their problem is here + here where a Joint doc = "..." is not considered a legal Meta.

alexcrichton · 2018-04-02T20:47:51Z

Ok thanks for the info! I've confirmed that this is fixed by #49597 where I jiggered things around a bit. Namely it appears that the Alone spacing is indeed what we need to fix this.

dtolnay · 2018-04-02T20:48:01Z

Repro script

#!/bin/sh

cargo new --lib structopt_derive
cargo new --lib repro

echo >structopt_derive/src/lib.rs '
#![feature(proc_macro)]

extern crate proc_macro;
use proc_macro::{TokenStream, TokenTreeIter, TokenTree, TokenNode, Spacing};

#[proc_macro_derive(StructOpt)]
pub fn derive_structopt(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();
    assert_eq!(iter.next().unwrap().to_string(), "struct");
    assert_eq!(iter.next().unwrap().to_string(), "S");
    let mut inner = unwrap_group(iter.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut x = unwrap_group(inner.next().unwrap());
    assert_eq!(x.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(x.next().unwrap()), x.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut y = unwrap_group(inner.next().unwrap());
    assert_eq!(y.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(y.next().unwrap()), y.next().unwrap());

    TokenStream::empty()
}

fn unwrap_group(tt: TokenTree) -> TokenTreeIter {
    match tt.kind {
        TokenNode::Group(_, s) => s.into_iter(),
        _ => unimplemented!(),
    }
}

fn unwrap_spacing(tt: TokenTree) -> Spacing {
    match tt.kind {
        TokenNode::Op(_, s) => s,
        _ => unimplemented!(),
    }
}
'

echo >>structopt_derive/Cargo.toml '
[lib]
proc-macro = true
'

echo >repro/src/lib.rs '
#![allow(dead_code)]

#[macro_use]
extern crate structopt_derive;

fn f() {
    #[derive(StructOpt)]
    struct S {
        /// X
        /// Y
        #[doc(hidden)]
        foo: bool,
    }
}
'

echo >>repro/Cargo.toml '
structopt_derive = { path = "../structopt_derive" }
'

cargo build --manifest-path repro/Cargo.toml

Output

Alone "/// X"
Joint "/// Y"

dtolnay · 2018-04-02T20:58:22Z

Interestingly in the script if you move struct S outside of the fn f, then the output is correct.

Alone "/// X"
Alone "/// Y"

@alexcrichton when you say "jiggered things around a bit" is that of the intentional sort or the accidentally-fixed-it sort? It would be good to understand why the token stream was different depending on whether the macro is invoked inside a function or outside -- not clear in your PR what might have fixed that.

alexcrichton · 2018-04-02T21:46:28Z

@dtolnay sure yeah, worth documenting!

So I'm not really sure why, but the JointTree vs Tree is what's going on here. For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why. Assuming that though we know that the is_joint is true for the second macro. The is_joint is then later used for spacing in Op tokens, and in the op! macro you'll see how the last token uses the op_kind and all previous ones are Spacing::Joint.

Unfortunately I fat-fingered this a bit and in the doc comment expansion the code also uses op! as a sort of shortcut to TokenTree::Op. This is the bug, however, as the tokens are inheriting the is_joint variable and Joint spacing accidentally. The internal tokens don't have the same spacing, and we don't actually have any ability to communicate a Joint doc comment so that information needs to get lost (instead of preserved at a sort of random location).

In the PR I made these are both explicitly changed to Spacing::Alone as they're supposed to be (as the operators aren't actually joint with anything).

dtolnay · 2018-04-02T22:25:50Z

For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why.

This is really what I was interested in. I filed #49604 to follow up. Thanks!

…chenkov proc_macro: Reorganize public API This commit is a reorganization of the `proc_macro` crate's public user-facing API. This is the result of a number of discussions at the recent Rust All-Hands where we're hoping to get the `proc_macro` crate into ship shape for stabilization of a subset of its functionality in the Rust 2018 release. The reorganization here is motivated by experiences from the `proc-macro2`, `quote`, and `syn` crates on crates.io (and other crates which depend on them). The main focus is future flexibility along with making a few more operations consistent and/or fixing bugs. A summary of the changes made from today's `proc_macro` API is: * The `TokenNode` enum has been removed and the public fields of `TokenTree` have also been removed. Instead the `TokenTree` type is now a public enum (what `TokenNode` was) and each variant is an opaque struct which internally contains `Span` information. This makes the various tokens a bit more consistent, require fewer wrappers, and otherwise provides good future-compatibility as opaque structs are easy to modify later on. * `Literal` integer constructors have been expanded to be unambiguous as to what they're doing and also allow for more future flexibility. Previously constructors like `Literal::float` and `Literal::integer` were used to create unsuffixed literals and the concrete methods like `Literal::i32` would create a suffixed token. This wasn't immediately clear to all users (the suffixed/unsuffixed aspect) and having *one* constructor for unsuffixed literals required us to pick a largest type which may not always be true. To fix these issues all constructors are now of the form `Literal::i32_unsuffixed` or `Literal::i32_suffixed` (for all integral types). This should allow future compatibility as well as being immediately clear what's suffixed and what isn't. * Each variant of `TokenTree` internally contains a `Span` which can also be configured via `set_span`. For example `Literal` and `Term` now both internally contain a `Span` rather than having it stored in an auxiliary location. * Constructors of all tokens are called `new` now (aka `Term::intern` is gone) and most do not take spans. Manufactured tokens typically don't have a fresh span to go with them and the span is purely used for error-reporting **except** the span for `Term`, which currently affects hygiene. The default spans for all these constructed tokens is `Span::call_site()` for now. The `Term` type's constructor explicitly requires passing in a `Span` to provide future-proofing against possible hygiene changes. It's intended that a first pass of stabilization will likely only stabilize `Span::call_site()` which is an explicit opt-in for "I would like no hygiene here please". The intention here is to make this explicit in procedural macros to be forwards-compatible with a hygiene-specifying solution. * Some of the conversions for `TokenStream` have been simplified a little. * The `TokenTreeIter` iterator was renamed to `token_stream::IntoIter`. Overall the hope is that this is the "final pass" at the API of `TokenStream` and most of `TokenTree` before stabilization. Explicitly left out here is any changes to `Span`'s API which will likely need to be re-evaluated before stabilization. All changes in this PR have already been reflected to the [`proc-macro2`], `quote`, and `syn` crates. New versions of all these crates have also been published to crates.io. Once this lands in nightly I plan on making an internals post again summarizing the changes made here and also calling on all macro authors to give the APIs a spin and see how they work. Hopefully pending no major issues we can then have an FCP to stabilize later this cycle! [`proc-macro2`]: https://docs.rs/proc-macro2/0.3.1/proc_macro2/ Closes rust-lang#49596

dtolnay added the A-macros-2.0 Area: Declarative macros 2.0 (#39412) label Apr 2, 2018

dtolnay assigned alexcrichton Apr 2, 2018

dtolnay mentioned this issue Apr 2, 2018

Disallow default_value and required for Option TeXitoi/structopt#88

Merged

alexcrichton mentioned this issue Apr 2, 2018

proc_macro: Reorganize public API #49597

Merged

dtolnay mentioned this issue Apr 3, 2018

Inconsistent tokens between the same struct defined inside vs outside of function #49604

Closed

alexcrichton mentioned this issue Apr 3, 2018

stdsimd-verify failures rust-lang/stdarch#411

Closed

alexcrichton closed this as completed in #49597 Apr 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

dtolnay commented Apr 2, 2018 •

edited

Loading

alexcrichton commented Apr 2, 2018

TeXitoi commented Apr 2, 2018

TeXitoi commented Apr 2, 2018

dtolnay commented Apr 2, 2018

alexcrichton commented Apr 2, 2018

dtolnay commented Apr 2, 2018

dtolnay commented Apr 2, 2018

alexcrichton commented Apr 2, 2018

dtolnay commented Apr 2, 2018

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

Comments

dtolnay commented Apr 2, 2018 • edited Loading

alexcrichton commented Apr 2, 2018

TeXitoi commented Apr 2, 2018

TeXitoi commented Apr 2, 2018

dtolnay commented Apr 2, 2018

alexcrichton commented Apr 2, 2018

dtolnay commented Apr 2, 2018

Repro script

Output

dtolnay commented Apr 2, 2018

alexcrichton commented Apr 2, 2018

dtolnay commented Apr 2, 2018

dtolnay commented Apr 2, 2018 •

edited

Loading