Remove `ast::Lit::kind` #101885

nnethercote · 2022-09-16T07:05:50Z

This is an alternative to the "Remove token::Lit from ast::Lit commit in #101562.

r? @petrochenkov

rustbot · 2022-09-16T07:05:53Z

Some changes occurred in src/librustdoc/clean/types.rs

cc @camelid

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

nnethercote · 2022-09-16T07:11:02Z

The best thing about this: ast::Lit shrinks from 48 bytes to 20 bytes. Though that's more than we need, going below 32 bytes doesn't currently provide any extra benefit. And couple of places in the code get a little cleaner.

The downside is that every literal has to be processed at least twice: once during parsing to detect errors (where we discard the converted semantic literal) and again during lowering to HIR. On some benchmarks this is noticeable:

Benchmark	Profile	Scenario	% Change	Significance Factor?
deunicode-1.3.1	check	full	17.39%	86.93x
coercions	check	full	3.09%	15.43x
unicode_categories-0.1.1	check	full	1.24%	6.18x
pest-2.1.3	check	full	0.63%	3.17x
deep-vector	check	full	0.51%	2.53x
unicode-normalization-0.1.19	check	full	0.31%	1.53x
ctfe-stress-5	check	full	-0.31%	1.55x

deunicode-1.3.1 is interesting. It uses include_bytes! to create a 400 KB byte vector. With the old code there was no unescaping, and with the new code it gets unescaped once, which accounts for the big slowdown.

The code is a bit rough, with some njn comments on things that need cleaning up. Also, I haven't converted clippy code yet. And I haven't removed the symbol_unescaped field from ast::StrLit, though that will be easy.

I'm not sure if this is a good idea, but I thought I'd do it to help the discussion in #101562.

petrochenkov · 2022-09-16T12:41:51Z

The downside is that every literal has to be processed at least twice: once during parsing to detect errors (where we discard the converted semantic literal) and again during lowering to HIR.

The goal of this change is exactly to stop semantically processing the literals during parsing (parsing Rust proper syntax, not syntaxes introduced by macros), and doing it only if they reach HIR, similarly to what float literals do now.

macro_rules! m { ($tt:tt) => () }

m!("str"suffix); // OK right now

#[cfg(FALSE)]
fn f() {
    "str"suffix; // ERROR, but probably shouldn't be
                 // will be OK if semantic checking of literals is delayed until HIR
}

fn main() {
    "str"suffix; // ERROR, expected semantic error
}

Semantic literals are sometimes needed before lowering, typically in attributes.
I guess the simplest solution is to just continue eagerly converting them to semantic form when parsing attributes specifically.

If this is done (together with moving spans "upwards" from #101562), then most of AST will basically keep token::Lits directly and ast::Lit will be eliminated.

deunicode-1.3.1 is interesting. It uses include_bytes! to create a 400 KB byte vector. With the old code there was no unescaping, and with the new code it gets unescaped once, which accounts for the big slowdown.

include_bytes is kind of a separate issue (#65818), we need to figure out how to process it without extra conversions anyway.

nnethercote · 2022-09-16T23:26:46Z

The goal of this change is exactly to stop semantically processing the literals during parsing (parsing Rust proper syntax, not syntaxes introduced by macros), and doing it only if they reach HIR, similarly to what float literals do now.

I tried removing the check-and-discard code from Lit::from_token:

// We need to do the conversion to detect errors, even though we
// discard the result.
_ = LitKind::from_token_lit(token_lit)?;

and I got tons of ui test failures. But that's likely because I have lots of LitKind::from_token_lit(x).unwrap() calls in the places where the semantic literals are needed before lowering. I guess I could make those places fallible and then hopefully things will work out and the double processing will be avoided.

Semantic literals are sometimes needed before lowering, typically in attributes. I guess the simplest solution is to just continue eagerly converting them to semantic form when parsing attributes specifically.

Yes, it's a tiny fraction of the cases, so shouldn't matter for performance.

If this is done (together with moving spans "upwards" from #101562), then most of AST will basically keep token::Lits directly and ast::Lit will be eliminated.

I noticed that too. 👍 ast::LitKind will still exist.

It sounds like you are more in favour of this PR's approach than the first commit in #101562?

petrochenkov · 2022-09-17T09:12:47Z

It sounds like you are more in favour of this PR's approach than the first commit in #101562?

I need to investigate this a bit, I just don't have time right now.
For now I suggest just landing #101562 without ceab0b4.

nnethercote · 2022-09-19T08:52:53Z

@bors try @rust-timer queue

rust-timer · 2022-09-19T08:52:55Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-09-19T08:53:04Z

⌛ Trying commit c5a1b53e13f7ded9f8e9a269dc3c8a865e977e38 with merge d39d569d36b74d4b16fe4a1451452b23e16ae7ff...

nnethercote · 2022-09-19T08:55:34Z

@petrochenkov: I have finished this. I haven't done StrLit, that can be done in a later PR. I also haven't moved the Span out of ast::Lit.

The places that need the semantic literal prior to lowering are a little more complicated in their error handling, and a few error messages have changed, but they are all quite obscure ones. Local measurements show that it is now performance-neutral, except for deunicode-1.3.1.

bors · 2022-09-19T10:16:55Z

☀️ Try build successful - checks-actions
Build commit: d39d569d36b74d4b16fe4a1451452b23e16ae7ff (d39d569d36b74d4b16fe4a1451452b23e16ae7ff)

rust-timer · 2022-09-19T10:16:57Z

Queued d39d569d36b74d4b16fe4a1451452b23e16ae7ff with parent efa717b, future comparison URL.

rust-timer · 2022-09-19T11:35:23Z

Finished benchmarking commit (d39d569d36b74d4b16fe4a1451452b23e16ae7ff): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	0.5%	[0.3%, 0.7%]	6
Regressions ❌ (secondary)	1.2%	[1.2%, 1.2%]	2
Improvements ✅ (primary)	-0.3%	[-0.5%, -0.2%]	5
Improvements ✅ (secondary)	-0.4%	[-0.8%, -0.2%]	12
All ❌✅ (primary)	0.1%	[-0.5%, 0.7%]	11

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.2%	[3.0%, 6.3%]	4
Improvements ✅ (primary)	-1.1%	[-2.1%, -0.6%]	11
Improvements ✅ (secondary)	-2.9%	[-5.1%, -1.4%]	5
All ❌✅ (primary)	-1.1%	[-2.1%, -0.6%]	11

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.3%	[2.3%, 2.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.5%	[-4.5%, -2.5%]	2
All ❌✅ (primary)	-	-	0

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

nnethercote · 2022-09-19T21:40:43Z

Some small performance ups and downs, but overall it's roughly neutral, as expected.

@rustbot label: +perf-regression-triaged

Because they are sometimes hot and each have a single call site.

This will become interesting in the next commit.

This shrinks `ast::Lit` kind considerably. The conversion to semantic literals now happens at AST lowering time, with a few exceptions where the literal values are needed before that.

bors · 2022-09-21T23:03:19Z

☔ The latest upstream changes (presumably #101558) made this pull request unmergeable. Please resolve the merge conflicts.

petrochenkov · 2022-10-07T09:57:03Z

MetaItem and friends are used in both HIR and AST (I'd say primarily in HIR rather than in AST).
So keeping a lowered literal there is quite reasonable.

Maybe we should keep ast::Lit as is and apply the ast::Lit -> token::Lit change only to expressions (ExprKind::Lit).
This will also be a minimal change that can go to lang team for approval (not checking literals in configured out code is a language change).

(If there are ways to make literal parsing in attributes more lazy too, then we can try them later, probably without further lang team approval.)
@rustbot author

nnethercote · 2022-10-10T04:12:02Z

We seem to be in agreement that ast::Lit should be trimmed down.

In Shrink ast::Expr harder #101562 I removed the token_lit field, which is an "early lowering" approach, similar to what we currently do.
In this PR I removed the kind field, which is the "late lowering" approach.

You don't seem happy with either approach, and now you're suggesting a hybrid approach where some literals (in attributes) get early lowering and some (in expressions) get late lowering. I think this would be a clumsy mix that is worse than the other two approaches.

In fact, the "late lowering" approach above is also a hybrid approach, because some early lowerings are required to process attributes. It's just that the early-lowered literals aren't stored.

Therefore, I think the token_lit removal from #101562 is best. I don't see much benefit of late lowering. And token_lit removal doesn't change any existing behaviour.

petrochenkov · 2022-10-10T17:35:18Z

And token_lit removal doesn't change any existing behaviour.

I do want to change the language behavior though, that's what the removed FIXME is about, AST shrinkage by itself is a secondary goal.

You don't seem happy with either approach, and now you're suggesting a hybrid approach where some literals (in attributes) get early lowering and some (in expressions) get late lowering. I think this would be a clumsy mix that is worse than the other two approaches.
In fact, the "late lowering" approach above is also a hybrid approach, because some early lowerings are required to process attributes. It's just that the early-lowered literals aren't stored.

Literals in attributes (Attribute/AttrItem) are not lowered, they are lowered in MetaItems which are not even stored in AST and by themselves are a lowered semantic representation of built-in attributes.
Two kinds of lowering in question (AST -> HIR and Attribute -> MetaItem) just don't match perfectly with each other (MetaItems are occasionally needed before HIR, and in other cases not needed at all?), but otherwise it looks quite logical to me - semantic representation needs semantic literals, syntactic representation can use token literals.

nnethercote · 2022-10-11T06:20:29Z

What's the benefit of the proposed language change? Can you give an example?

petrochenkov · 2022-10-11T06:42:08Z

Better consistency between arguments of fn-like and attribute macros (Remove ast::Lit::kind #101885 (comment)), better consistency with float literals which are checked late.
Avoiding doing unnecessary work, literals are not converted to semantic form if that form is not used, AST size shrinking also fits into this.

nnethercote · 2022-11-20T22:57:44Z

We ended up taking a different approach in #102944.

rust-highfive assigned petrochenkov Sep 16, 2022

rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Sep 16, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 16, 2022

This comment has been minimized.

Sign in to view

nnethercote mentioned this pull request Sep 16, 2022

Shrink ast::Expr harder #101562

Merged

nnethercote force-pushed the rm-ast-Lit-kind branch from dd76cd6 to c5a1b53 Compare September 19, 2022 08:52

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 19, 2022

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 19, 2022

This comment has been minimized.

Sign in to view

rustbot added the perf-regression-triaged The performance regression has been triaged. label Sep 19, 2022

nnethercote added 3 commits September 21, 2022 08:06

Inline integer_lit and float_lit.

0bed23b

Because they are sometimes hot and each have a single call site.

Add a test.

ca00cfb

This will become interesting in the next commit.

Remove ast::Lit::kind.

434b81e

This shrinks `ast::Lit` kind considerably. The conversion to semantic literals now happens at AST lowering time, with a few exceptions where the literal values are needed before that.

nnethercote force-pushed the rm-ast-Lit-kind branch from c5a1b53 to 434b81e Compare September 20, 2022 22:10

nnethercote mentioned this pull request Oct 5, 2022

Overhaul ThinVec usage #100666

Closed

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 7, 2022

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 10, 2022

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2022

nnethercote mentioned this pull request Oct 12, 2022

Use token::Lit in ast::ExprKind::Lit. #102944

Merged

nnethercote closed this Nov 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `ast::Lit::kind` #101885

Remove `ast::Lit::kind` #101885

nnethercote commented Sep 16, 2022

rustbot commented Sep 16, 2022

nnethercote commented Sep 16, 2022 •

edited

This comment has been minimized.

petrochenkov commented Sep 16, 2022

nnethercote commented Sep 16, 2022

petrochenkov commented Sep 17, 2022

nnethercote commented Sep 19, 2022

rust-timer commented Sep 19, 2022

bors commented Sep 19, 2022

nnethercote commented Sep 19, 2022

bors commented Sep 19, 2022

rust-timer commented Sep 19, 2022

rust-timer commented Sep 19, 2022

This comment has been minimized.

nnethercote commented Sep 19, 2022

bors commented Sep 21, 2022

petrochenkov commented Oct 7, 2022

nnethercote commented Oct 10, 2022

petrochenkov commented Oct 10, 2022

nnethercote commented Oct 11, 2022

petrochenkov commented Oct 11, 2022

nnethercote commented Nov 20, 2022

Remove ast::Lit::kind #101885

Remove ast::Lit::kind #101885

Conversation

nnethercote commented Sep 16, 2022

rustbot commented Sep 16, 2022

nnethercote commented Sep 16, 2022 • edited

This comment has been minimized.

petrochenkov commented Sep 16, 2022

nnethercote commented Sep 16, 2022

petrochenkov commented Sep 17, 2022

nnethercote commented Sep 19, 2022

rust-timer commented Sep 19, 2022

bors commented Sep 19, 2022

nnethercote commented Sep 19, 2022

bors commented Sep 19, 2022

rust-timer commented Sep 19, 2022

rust-timer commented Sep 19, 2022

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

This comment has been minimized.

nnethercote commented Sep 19, 2022

bors commented Sep 21, 2022

petrochenkov commented Oct 7, 2022

nnethercote commented Oct 10, 2022

petrochenkov commented Oct 10, 2022

nnethercote commented Oct 11, 2022

petrochenkov commented Oct 11, 2022

nnethercote commented Nov 20, 2022

Remove `ast::Lit::kind` #101885

Remove `ast::Lit::kind` #101885

nnethercote commented Sep 16, 2022 •

edited