Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store tokens alongside more AST expressions #70091

Closed
wants to merge 8 commits into from

Conversation

Aaron1011
Copy link
Member

See #43081 (comment)
This PR calls collect_tokens during the parsing of more AST nodes, and stores the captured tokens in the parsed AST structs. These tokens are then used in nt_to_tokenstream to avoid needing to stringify AST nodes.

Since this implementation completely ignores attributes, it will probably explode when given any kind of complicated input. This PR is intended mainly to estimate the performance impact of collecting and storing more tokens - a correct implementation will most likely have to do more work than this.

I've only implemented token collecting for a few types of expressions - the current expression parsing implementation makes it difficult to get the proper tokens for every expression.

Nevertheless, this is able to bootstrap libstd, and generate better error messages for a simple proc-macro example: (https://github.com/Aaron1011/for-await-test)

#![feature(stmt_expr_attributes, proc_macro_hygiene)]
use futures::stream::Stream;
use futures_async_stream::for_await;

async fn collect(stream: impl Stream<Item = i32>) -> Vec<i32> {
    let mut vec = Vec::new();
    #[for_await]
    for value in stream.foo() {
        vec.push(value);
    }
    vec
}

fn main() {
    println!("Hello, world!");
}

on the latest nightly:

error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:7:5
  |
7 |     #[for_await]
  |     ^^^^^^^^^^^^ method not found in `impl Stream<Item = i32>`

error: aborting due to previous error

with this PR:

error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:8:25
  |
8 |     for value in stream.foo() {
  |                         ^^^ method not found in `impl Stream<Item = i32>`

@rust-highfive

This comment has been minimized.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 18, 2020
@Aaron1011
Copy link
Member Author

r? @petrochenkov

@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2020-03-18T01:53:34.0708748Z ========================== Starting Command Output ===========================
2020-03-18T01:53:34.0711629Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/8234da10-2c46-43e5-9224-84df59cb321e.sh
2020-03-18T01:53:34.0711951Z 
2020-03-18T01:53:34.0716774Z ##[section]Finishing: Disable git automatic line ending conversion
2020-03-18T01:53:34.0737531Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T01:53:34.0741868Z Task         : Get sources
2020-03-18T01:53:34.0742201Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-18T01:53:34.0742515Z Version      : 1.0.0
2020-03-18T01:53:34.0743792Z Author       : Microsoft
---
2020-03-18T01:53:35.2090725Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-03-18T01:53:35.2099111Z ##[command]git config gc.auto 0
2020-03-18T01:53:35.2114085Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-03-18T01:53:35.2117649Z ##[command]git config --get-all http.proxy
2020-03-18T01:53:35.2133976Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/70091/merge:refs/remotes/pull/70091/merge
---
2020-03-18T02:00:27.2129929Z     Checking rustc_parse v0.0.0 (/checkout/src/librustc_parse)
2020-03-18T02:00:28.7219048Z error: unused variable: `tokens`
2020-03-18T02:00:28.7220458Z     --> src/librustc_parse/parser/expr.rs:2134:9
2020-03-18T02:00:28.7221567Z      |
2020-03-18T02:00:28.7222538Z 2134 |         tokens: Option<TokenStream>,
2020-03-18T02:00:28.7224281Z      |         ^^^^^^ help: consider prefixing with an underscore: `_tokens`
2020-03-18T02:00:28.7226264Z      = note: `-D unused-variables` implied by `-D warnings`
2020-03-18T02:00:28.7226793Z 
2020-03-18T02:00:29.5294120Z error: aborting due to previous error
2020-03-18T02:00:29.5294879Z 
2020-03-18T02:00:29.5294879Z 
2020-03-18T02:00:29.5374184Z error: could not compile `rustc_parse`.
2020-03-18T02:00:29.5378485Z 
2020-03-18T02:00:29.5379181Z To learn more, run the command again with --verbose.
2020-03-18T02:00:29.5396461Z warning: build failed, waiting for other jobs to finish...
2020-03-18T02:00:30.3367502Z error: build failed
2020-03-18T02:00:30.3370440Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "check" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "2" "--release" "--color" "always" "--features" " llvm" "--manifest-path" "/checkout/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
2020-03-18T02:00:30.3372442Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap check
2020-03-18T02:00:30.3372972Z Build completed unsuccessfully in 0:04:14
2020-03-18T02:00:30.3373389Z == clock drift check ==
2020-03-18T02:00:30.3373779Z   local time: Wed Mar 18 02:00:29 UTC 2020
2020-03-18T02:00:30.3373779Z   local time: Wed Mar 18 02:00:29 UTC 2020
2020-03-18T02:00:30.3374253Z   network time: Wed, 18 Mar 2020 02:00:30 GMT
2020-03-18T02:00:30.3374686Z == end clock drift check ==
2020-03-18T02:00:30.9943242Z 
2020-03-18T02:00:31.0023438Z ##[error]Bash exited with code '1'.
2020-03-18T02:00:31.0049128Z ##[section]Finishing: Run build
2020-03-18T02:00:31.0107013Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T02:00:31.0112430Z Task         : Get sources
2020-03-18T02:00:31.0112855Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-18T02:00:31.0113231Z Version      : 1.0.0
2020-03-18T02:00:31.0113485Z Author       : Microsoft
2020-03-18T02:00:31.0113485Z Author       : Microsoft
2020-03-18T02:00:31.0113909Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
2020-03-18T02:00:31.0114401Z ==============================================================================
2020-03-18T02:00:31.3848888Z Cleaning any cached credential from repository: rust-lang/rust (GitHub)
2020-03-18T02:00:31.3914358Z ##[section]Finishing: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T02:00:31.4042609Z Cleaning up task key
2020-03-18T02:00:31.4044468Z Start cleaning up orphan processes.
2020-03-18T02:00:31.4302879Z Terminate orphan process: pid (3625) (python)
2020-03-18T02:00:31.4533403Z ##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@Centril Centril self-assigned this Mar 18, 2020
@petrochenkov
Copy link
Contributor

@bors try

@bors
Copy link
Contributor

bors commented Mar 19, 2020

⌛ Trying commit e56b697 with merge e58cfd9...

bors added a commit that referenced this pull request Mar 19, 2020
[WIP] Store tokens alongside more AST expressions

See #43081 (comment)
This PR calls `collect_tokens` during the parsing of more AST nodes, and stores the captured tokens in the parsed AST structs. These tokens are then used in `nt_to_tokenstream` to avoid needing to stringify AST nodes.

Since this implementation completely ignores attributes, it will probably explode when given any kind of complicated input. This PR is intended mainly to estimate the performance impact of collecting and storing more tokens - a correct implementation will most likely have to do more work than this.

I've only implemented token collecting for a few types of expressions - the current expression parsing implementation makes it difficult to get the proper tokens for every expression.

Nevertheless, this is able to bootstrap libstd, and generate better error messages for a simple proc-macro example: (https://github.com/Aaron1011/for-await-test)

```rust
#![feature(stmt_expr_attributes, proc_macro_hygiene)]
use futures::stream::Stream;
use futures_async_stream::for_await;

async fn collect(stream: impl Stream<Item = i32>) -> Vec<i32> {
    let mut vec = Vec::new();
    #[for_await]
    for value in stream.foo() {
        vec.push(value);
    }
    vec
}

fn main() {
    println!("Hello, world!");
}
```

on the latest nightly:

```
error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:7:5
  |
7 |     #[for_await]
  |     ^^^^^^^^^^^^ method not found in `impl Stream<Item = i32>`

error: aborting due to previous error
```

with this PR:

```
error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:8:25
  |
8 |     for value in stream.foo() {
  |                         ^^^ method not found in `impl Stream<Item = i32>`
```
@petrochenkov petrochenkov added S-waiting-on-perf Status: Waiting on a perf run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 19, 2020
@bors
Copy link
Contributor

bors commented Mar 19, 2020

☀️ Try build successful - checks-azure
Build commit: e58cfd9 (e58cfd95ea92ee5f46c4ff923ebee0b7ef08f793)

@petrochenkov
Copy link
Contributor

@rust-timer build e58cfd9

@rust-timer
Copy link
Collaborator

Queued e58cfd9 with parent 6724d58, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit e58cfd9, comparison URL.

@Aaron1011
Copy link
Member Author

Those results are pretty disappointing. However, the worst regressions seem to be on 'weird' crates - e.g. deep-vector is just an invocation of vec! with 8k lines of input. I'll see if there's anything that we can do speed to collect_tokens for these kinds of cases.

@Aaron1011
Copy link
Member Author

Aaron1011 commented Mar 19, 2020

@petrochenkov: What do you think about modifying TokenStream to store a Range alongside the Lrc<Vec<TreeAndJoint>>? Instead of building up a new Vec in collect_tokens, we could just clone the existing Lrc<Vec<TreeAndJoint>>, and store a Range indicating the slice of tokens that we actually captured. The tokens captured by collect_tokens should always have matching delimiters, so we shouldn't need to worry about capturing only 'part of' a TokenTree:Delimited.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 19, 2020
@petrochenkov
Copy link
Contributor

We can reduce (2) to almost zero by collecting tokens only for expressions that actually have attributes (which is a very small percent of expressions)

So, I looked at the uses of nt_to_tokenstream and besides passing tokens to attribute macros it's also used for passing nonterminals generated by macro_rules to proc macros.

So, we need to collect tokens in two cases:

  • parse_expr is called from librustc_expand\mbe\macro_parser.rs for parsing a nonterminal.
  • parsing the expression after expression attributes

In other cases the collection shouldn't be required.
Limiting token collection to these two cases may also simplify the parsing part.

@petrochenkov
Copy link
Contributor

@Aaron1011
#70091 (comment) may be a good solution if we end up collecting all the tokens.
But I'd like to first try to not collect them at all unless necessary.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 21, 2020
@Aaron1011
Copy link
Member Author

parsing the expression after expression attributes

@petrochenkov: How does this interact with custom inner attributes? We need to already be collecting tokens by the time we parse them.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 22, 2020
@eddyb
Copy link
Member

eddyb commented Mar 23, 2020

@Aaron1011 Btw, you slicing idea is one of the original reasons for TokenStream using reference-counting and whatnot, we just never got around to implementing it.
Not to mention proc macros have no easy way to access anything like that.

@petrochenkov
Copy link
Contributor

TokenStream using reference-counting and whatnot, we just never got around to implementing it.

Some history of TokenStream representation - #57004 (comment).
Are we going on the second cycle? :D

@petrochenkov
Copy link
Contributor

@Aaron1011

How does this interact with custom inner attributes? We need to already be collecting tokens by the time we parse them.

Good catch.
To cover inner attributes we need to collect tokens for blocks, array/tuple literals, and other constructions that can act as "containers" for inner attributes.

However, the main question is - what we are aiming for here, a practical improvement, or a proper holistic solution?
I'd say it's the former.

  • If we don't fix inner attributes it's still an improvement, inner macro attributes are rare and feature-gated.
  • Even if we collect tokens for expressions with inner attributes, they will still fail the probably_equal_for_proc_macro check (because it will compare token streams before and after the inner attribute macro is removed). Items with inner macro attributes currently go through pretty-printing despite the tokens being collected for items.
  • Even if we fix the above problem somehow, fragments with cfgs will still fail the probably_equal_for_proc_macro check.
  • Some other cases like unnecessary path disambiguators (type A = Vec::<u8>;) will fail the check as well because we do not keep them in AST precisely.

A proper solution is a big design task, which I'm certainly not ready to drive now.
Perhaps something like @matklad's redesign of AST and parser will address these issues long term somehow.

In the meantime, we can improve the situation for common cases in ways that don't regress compile times. This largely means ignoring inner macro attributes.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2020
@petrochenkov
Copy link
Contributor

(To clarify, I'm not against making improvements to treatment of inner attributes in a separate PR with a separate perf run, as one more incremental improvement.)

@Dylan-DPC-zz Dylan-DPC-zz marked this pull request as draft April 10, 2020 13:54
@Dylan-DPC-zz Dylan-DPC-zz changed the title [WIP] Store tokens alongside more AST expressions Store tokens alongside more AST expressions Apr 10, 2020
@Dylan-DPC-zz
Copy link

r? @petrochenkov

@Dylan-DPC-zz
Copy link

r? @oli-obk

@oli-obk
Copy link
Contributor

oli-obk commented May 14, 2020

This is waiting for author

r? @petrochenkov

@petrochenkov
Copy link
Contributor

Could close due to inactivity as well, it's been ~1.5 months.

bors added a commit to rust-lang-ci/rust that referenced this pull request May 24, 2020
…r=petrochenkov

Store tokens inside `ast::Expr`

This is a smaller version of rust-lang#70091.

We now store captured tokens inside `ast::Expr`, which allows us to avoid some reparsing in `nt_to_tokenstream`. To try to mitigate the performance impact, we only collect tokens when we've seen an outer attribute.

This makes progress towards solving rust-lang#43081. There are still many things left to do:

* Collect tokens for other AST items.
* Come up with a way to handle inner attributes (we need to be collecting tokens by the time we encounter them)
* Avoid re-parsing when a `#[cfg]` attr is used.

However, this is enough to fix spans for a simple example, which I've included as a test case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants