Migrate parser to the new span combining scheme #126763

petrochenkov · 2024-06-20T20:42:12Z

Summary:

If a, b, ..., z are multiple consecutive span nodes that need to be combined into a single new span node, then its combined span should be

span(a).to(span(b)).to(span(c))....to(span(z))

"Span node" is either an individual token (every token has a span), or some larger AST node having a Span field.

Examples:

-10 // an expression
a + 11 // one more expression
a::<u8>::b // a path

The combined span for these pieces of code will be

// expressions are span nodes because they have their own spans in AST, `-` doesn't have a larger AST node so it's treated as a primitive token
span(expr(-10)) = span(token(-)).to(span(expr(10)));
span(expr(a + 11)) = span(expr(a)).to(span(token(+))).to(span(expr(11)));
// the whole generic arg is a span node because it has its own span in AST
span(path(a::<u8>::b)) = span(ident(a)).to(span(token(::))).to(span(generic_arg(<u8>))).to(span(token(::))).to(span(ident(b)));
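
A minimal sketch of the rule above, assuming a toy `Span` that is only a byte range. The `Span` struct and the `combine` helper here are illustrative and are not rustc's types; in particular, this toy version carries no macro context.

```rust
/// Toy span: a half-open byte range. Unlike rustc's real `Span`,
/// it carries no macro (hygiene) context.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
}

impl Span {
    /// Smallest span covering both `self` and `end`.
    fn to(self, end: Span) -> Span {
        Span { lo: self.lo.min(end.lo), hi: self.hi.max(end.hi) }
    }
}

/// span(a).to(span(b))....to(span(z)) for a sequence of span nodes.
/// Returns `None` for an empty sequence.
fn combine(nodes: &[Span]) -> Option<Span> {
    let (first, rest) = nodes.split_first()?;
    Some(rest.iter().fold(*first, |acc, s| acc.to(*s)))
}

fn main() {
    // `a + 11`: expr(a) at 0..1, token(+) at 2..3, expr(11) at 4..6.
    let nodes = [
        Span { lo: 0, hi: 1 },
        Span { lo: 2, hi: 3 },
        Span { lo: 4, hi: 6 },
    ];
    println!("{:?}", combine(&nodes)); // Some(Span { lo: 0, hi: 6 })
}
```

With plain byte ranges the chain gives the same answer as first-to-last; the difference only shows up once spans carry macro contexts.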

Status quo:

Currently the resulting span is typically built like span_first_token.to(span_last_token).
So everything in the middle, as well as the internal structure (i.e. nodes, as opposed to tokens), is ignored.

Why we need to change it:

The to operation will automatically take macro variables into account, and will try to put the resulting span into the most suitable macro context (this was implemented in #119673).
E.g. in $tt + 5 the combined expression span will be put into the context of the macro using $tt as a macro parameter.
Note that the same thing often happens in the current parser as well, but not consistently, e.g. $a::$b will produce an incorrect resulting path span because the :: in the middle is not considered.
This will also give us a single, reasonably predictable rule for combining AST spans.
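
To make the $a::$b case concrete, here is a toy model that tracks only the context of each token and drops positions entirely. The `Ctxt` enum and the rule inside `to` are assumptions made for illustration (a mixed pair resolves to the macro body, i.e. the macro that uses the argument); this is not how rustc's `Span::to` is actually implemented.

```rust
/// Where a token was written (toy stand-in for a hygiene context).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ctxt {
    /// Passed in as a macro argument, e.g. `$a`.
    CallSite,
    /// Written literally in the macro definition, e.g. `::`.
    MacroBody,
}

/// ASSUMPTION (toy rule): if the two sides disagree, the result goes to
/// the macro body, the context that uses the argument.
fn to(a: Ctxt, b: Ctxt) -> Ctxt {
    if a == b { a } else { Ctxt::MacroBody }
}

/// Status quo: only the first and the last token are looked at.
fn first_to_last(nodes: &[Ctxt]) -> Option<Ctxt> {
    Some(to(*nodes.first()?, *nodes.last()?))
}

/// Proposed scheme: every span node takes part in the chain.
fn chained(nodes: &[Ctxt]) -> Option<Ctxt> {
    let (first, rest) = nodes.split_first()?;
    Some(rest.iter().fold(*first, |acc, c| to(acc, *c)))
}

fn main() {
    use Ctxt::*;
    let tt_plus_5 = [CallSite, MacroBody, MacroBody]; // `$tt + 5`
    let a_colons_b = [CallSite, MacroBody, CallSite]; // `$a::$b`

    // Both approaches agree here: MacroBody.
    println!("{:?} {:?}", first_to_last(&tt_plus_5), chained(&tt_plus_5));
    // Here they differ: first-to-last never sees the `::` from the macro body.
    println!("{:?} {:?}", first_to_last(&a_colons_b), chained(&a_colons_b));
}
```

In this model first-to-last puts $a::$b into the call-site context because both ends come from macro arguments, while the chain ends up in the macro body.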

Implementation:

This work should parallelize reasonably well (but may require a one-time initial setup).
I'll review PRs doing this; they can be assigned to me.
It may be convenient to have a rolling value in the Parser structure for the span of the current (or previous?) span node.
The parser has a lot of bespoke diagnostic logic (including snapshotting) that stands in the way of any systematic improvements like this.
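
One possible shape for the "rolling value" idea, as a rough sketch only. The `Parser` here is a hypothetical toy, and `node_span`, `start_node` and `finish_node` are made-up names rather than anything in rustc_parse; spans are plain byte ranges without contexts.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
}

impl Span {
    fn to(self, end: Span) -> Span {
        Span { lo: self.lo.min(end.lo), hi: self.hi.max(end.hi) }
    }
}

/// Hypothetical toy parser over a list of token spans.
struct Parser {
    tokens: Vec<Span>,
    pos: usize,
    /// Rolling span of the node currently being built.
    node_span: Option<Span>,
}

impl Parser {
    fn new(tokens: Vec<Span>) -> Self {
        Parser { tokens, pos: 0, node_span: None }
    }

    /// Fold one more span node into the rolling value.
    fn extend(&mut self, s: Span) {
        self.node_span = Some(match self.node_span {
            Some(cur) => cur.to(s),
            None => s,
        });
    }

    /// Consume one token (if any) and fold its span into the rolling value.
    fn bump(&mut self) {
        if let Some(&tok) = self.tokens.get(self.pos) {
            self.pos += 1;
            self.extend(tok);
        }
    }

    /// Begin a nested node; returns the enclosing node's rolling span.
    fn start_node(&mut self) -> Option<Span> {
        self.node_span.take()
    }

    /// End a nested node: return its span and fold it, as a single span
    /// node, into the enclosing node's rolling span.
    fn finish_node(&mut self, outer: Option<Span>) -> Option<Span> {
        let inner = self.node_span;
        self.node_span = outer;
        if let Some(s) = inner {
            self.extend(s);
        }
        inner
    }
}

/// Parses `-10 + 1` (tokens at 0..1, 1..3, 4..5, 6..7) and returns the
/// spans of the unary and the binary expression.
fn demo() -> (Option<Span>, Option<Span>) {
    let sp = |lo, hi| Span { lo, hi };
    let mut p = Parser::new(vec![sp(0, 1), sp(1, 3), sp(4, 5), sp(6, 7)]);
    let outer = p.start_node();
    let saved = p.start_node();
    p.bump(); // `-`
    p.bump(); // `10`
    let unary = p.finish_node(saved);
    p.bump(); // `+`
    p.bump(); // `1`
    let binary = p.finish_node(outer);
    (unary, binary)
}

fn main() {
    println!("{:?}", demo());
}
```

The point of the start/finish pair is that a finished child contributes to its parent as one span node, rather than the parent looking at the child's tokens again.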

How this can be tested:

Make a macro that emits complex nodes using tokens from different contexts, e.g.

macro m($l:tt $op:tt $r:tt) {
    2 + 3;
    $l + 3;
    2 $op 3;
    2 + $r;
    $l $op 3;
    $l + $r;
    2 $op $r;
    $l $op $r;
}

m!(2 + 3);

and emit some diagnostic using those nodes' spans (perhaps by adding a special internal diagnostic for this testing).
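
The macro above uses the unstable `macro` item syntax (the decl_macro feature). As a quick sanity check that all eight token combinations parse and evaluate, a `macro_rules!` variant along these lines should work on stable. It is only a sketch: it does not look at spans at all, so it is not a substitute for the diagnostic-based test.

```rust
// Same eight combinations of literal vs. substituted tokens as in the
// `macro m` example, collected into an array so they can be checked.
macro_rules! m {
    ($l:tt $op:tt $r:tt) => {
        [
            2 + 3,
            $l + 3,
            2 $op 3,
            2 + $r,
            $l $op 3,
            $l + $r,
            2 $op $r,
            $l $op $r,
        ]
    };
}

fn main() {
    let results = m!(2 + 3);
    // Every combination spells out `2 + 3` one way or another.
    assert!(results.iter().all(|&x| x == 5));
    println!("{:?}", results);
}
```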


petrochenkov · 2024-06-20T20:46:48Z

It may also be possible to introduce a smarter n-ary span combining operation to(a1, ..., aN) instead of the current binary to, but it may be more expensive and will make spans better only in very rare cases.
So combining using a chain of binary to operations should be fine for now.

petrochenkov · 2024-06-24T11:35:42Z

"either an individual token"

The "token" here is really a "token tree", despite the Rust parser working mostly with flattened token sequences at the moment.
So span(#[word]) should ideally be span(#) to span([word]) and not span(#) to span([) to span(word) to span(]).

Also, span combining for delimited groups can ignore all their internal tokens and only consider their delimiters.
Opening and closing delimiters always have the same context, and cannot be passed to macros separately in any way.
So span([word]) should be span([) to span(]) and not span([) to span(word) to span(]).
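
A small sketch of the token-tree view, assuming a toy `TokenTree` enum and a context-free `Span` (both are illustrative names, not rustc's definitions). A delimited group computes its span from the delimiters alone and then takes part in the outer chain as a single span node.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
}

impl Span {
    fn to(self, end: Span) -> Span {
        Span { lo: self.lo.min(end.lo), hi: self.hi.max(end.hi) }
    }
}

/// Toy token tree: a single token or a delimited group.
#[allow(dead_code)] // `inner` is intentionally never read by `span()`
enum TokenTree {
    Token(Span),
    Delimited { open: Span, close: Span, inner: Vec<TokenTree> },
}

impl TokenTree {
    fn span(&self) -> Span {
        match self {
            TokenTree::Token(s) => *s,
            // Only the delimiters matter; the inner tokens are ignored.
            TokenTree::Delimited { open, close, .. } => open.to(*close),
        }
    }
}

/// Chain of binary `to` over a sequence of token trees.
fn combine(trees: &[TokenTree]) -> Option<Span> {
    let (first, rest) = trees.split_first()?;
    Some(rest.iter().fold(first.span(), |acc, t| acc.to(t.span())))
}

/// Builds `#[word]`: `#` at 0..1, `[` at 1..2, `word` at 2..6, `]` at 6..7.
fn attr_example() -> Vec<TokenTree> {
    let sp = |lo, hi| Span { lo, hi };
    vec![
        TokenTree::Token(sp(0, 1)),
        TokenTree::Delimited {
            open: sp(1, 2),
            close: sp(6, 7),
            inner: vec![TokenTree::Token(sp(2, 6))],
        },
    ]
}

fn main() {
    let trees = attr_example();
    // span(#[word]) = span(#) to span([word]), and span([word]) = span([) to span(])
    println!("{:?}", combine(&trees));
}
```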

danik292 · 2024-08-16T09:06:46Z

@rustbot assing

danik292 · 2024-08-16T09:07:24Z

@rustbot claim

danik292 · 2024-08-16T09:09:24Z

@petrochenkov do you mentor this?

petrochenkov · 2024-08-16T11:53:08Z

@danik292 Yes

danik292 · 2024-08-16T14:58:21Z

@petrochenkov OK, where are spans implemented?

petrochenkov · 2024-08-17T19:42:57Z

I suggest implementing this for some specific AST nodes that are not too complex, e.g. for paths maybe (fn parse_path in rustc_parse), or for something in types (compiler/rustc_parse/src/parser/ty.rs).
The current (and/or previous) node span may need to be kept in struct Parser.

I'd also suggest starting with some testing infra, like a new attribute for showing spans in an AST node.

#[rustc_show_spans(expr)]
fn foo() {
  let x = a + b;
}

should show a diagnostic (probably a note) similar to this

#[rustc_show_spans(expr)]
fn foo() {
  let x = a + b;
          ^
              ^
          ^^^^^
}

See e.g. rustc_effective_visibility (in ./compiler) for an example of adding a new internal attribute.
The logic for showing spans should likely live in the rustc_ast_passes crate.
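
For a rough idea of the output layout only: the sketch below prints one caret line per span under a single source line. `rustc_show_spans` is the hypothetical attribute proposed above and does not exist; `render_carets` is likewise a made-up helper that does not use rustc's diagnostic machinery, and the column numbers are hard-coded by hand.

```rust
/// One caret line per span, aligned to the columns of a single source line.
/// Spans are half-open `(lo, hi)` column ranges.
fn render_carets(spans: &[(usize, usize)]) -> Vec<String> {
    spans
        .iter()
        .map(|&(lo, hi)| {
            format!("{}{}", " ".repeat(lo), "^".repeat(hi.saturating_sub(lo)))
        })
        .collect()
}

fn main() {
    let line = "  let x = a + b;";
    // Columns of `a`, `b`, and the whole `a + b` expression.
    let spans = [(10, 11), (14, 15), (10, 15)];
    println!("{line}");
    for carets in render_carets(&spans) {
        println!("{carets}");
    }
}
```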

noahmbright · 2024-11-04T21:45:25Z

Is the main issue that something like $a::$b would have an incorrect span? I haven't fully understood the problem yet, but if the span is like a start, a length, and a context, how does the problem come from ignoring the middle tokens?

rustbot added the needs-triage label Jun 20, 2024

jieyouxu added the C-cleanup label and removed the needs-triage label Jun 20, 2024

petrochenkov mentioned this issue Jun 24, 2024

How to determine hygienic context for "non-atomic" code fragments #50122

Closed

rustbot assigned danik292 Aug 16, 2024
