Optimize `fold_ty` #107627

nnethercote · 2023-02-03T05:45:43Z

Micro-optimizing the heck out of the important fold_ty methods.

r? @oli-obk

nnethercote · 2023-02-03T05:46:09Z

Best reviewed one commit at a time.

@bors try @rust-timer queue

bors · 2023-02-03T05:46:18Z

⌛ Trying commit a1df2c5d1e9a41e6f4fdceac31a831ba01763108 with merge 14f439f45a195e76c41d576f2e6aeac48603aae3...

bors · 2023-02-03T08:26:36Z

☀️ Try build successful - checks-actions
Build commit: 14f439f45a195e76c41d576f2e6aeac48603aae3 (14f439f45a195e76c41d576f2e6aeac48603aae3)

rust-timer · 2023-02-03T12:06:26Z

Finished benchmarking commit (14f439f45a195e76c41d576f2e6aeac48603aae3): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.2%	[1.2%, 1.2%]	4
Improvements ✅ (primary)	-0.7%	[-2.1%, -0.2%]	40
Improvements ✅ (secondary)	-0.8%	[-1.6%, -0.2%]	35
All ❌✅ (primary)	-0.7%	[-2.1%, -0.2%]	40

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.0%	[2.0%, 2.0%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.4%	[-2.8%, -1.2%]	4
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.3%	[-1.4%, -1.1%]	5
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.3%	[-1.4%, -1.1%]	5

compiler-errors

r=me with nits or not, unless you want a review from oli specifically

compiler-errors · 2023-02-03T18:32:42Z

compiler/rustc_infer/src/infer/freshen.rs

-                ty::IntVar(v),
-                ty::FreshIntTy,
-            ),
+                #[cfg(debug_assertions)]


Personally think this should stay an "always" assertion

Agreed. To avoid any performance issues, this could call a `#[cold]` function with the `bug!` inside, instead of having the formatting inside the main function.

I tried changing it back to an always assertion, and it had a noticeable perf impact, e.g. the instruction count for wg-grammar increased by 0.7%. I then tried the #[cold] function and it made a small improvement, but was still 0.5% worse.

So I will leave this as is, but I will add a comment about it.
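A minimal sketch of the two options compared in this thread, assuming a toy `Kind` enum in place of the compiler's type kinds and a plain `panic!` in place of `bug!`. The names `Kind`, `unexpected_kind`, `fold_always_checked`, and `fold_debug_checked` are hypothetical and do not come from the PR.

```rust
/// Hypothetical stand-in for the compiler's type kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Int,
    Infer(u32),
    Bound(u32),
}

/// Out-of-line failure path. `#[cold]` and `#[inline(never)]` keep the
/// panic formatting code out of the caller's body.
#[cold]
#[inline(never)]
fn unexpected_kind(kind: Kind) -> ! {
    panic!("unexpected kind: {kind:?}")
}

/// Option A: the check is always compiled in, but its failure path is cold.
pub fn fold_always_checked(kind: Kind) -> Kind {
    match kind {
        Kind::Infer(_) => Kind::Int, // hypothetical "freshening" step
        Kind::Bound(_) => unexpected_kind(kind),
        _ => kind,
    }
}

/// Option B: the checking arm only exists when debug assertions are enabled.
/// Without them, `Bound` falls through to the wildcard arm unchecked.
pub fn fold_debug_checked(kind: Kind) -> Kind {
    match kind {
        Kind::Infer(_) => Kind::Int,
        #[cfg(debug_assertions)]
        Kind::Bound(_) => unexpected_kind(kind),
        _ => kind,
    }
}
```

Option A still pays for the extra comparison on every call, which is one plausible reason a cold helper recovers only part of the cost; the measurements quoted above are the authoritative numbers, not this sketch.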

compiler/rustc_infer/src/infer/resolve.rs

compiler/rustc_infer/src/infer/mod.rs

compiler-errors · 2023-02-03T18:38:44Z

compiler/rustc_infer/src/infer/freshen.rs

            }
-
-            ty::Generator(..)


Additional question: Does the wildcard have a perf difference over the exhaustive match? Otherwise, I kinda prefer the exhaustive match.

It probably does. Transforming exhaustive matches to a wildcard in the code generation may be a good idea if so.

It depends on the use case. If the match can be converted to a table lookup, the exhaustive match will have one less branch in LLVM, but have a bigger lookup table: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c7b21a7f9d032aea5aa261953a85d735

For actual branching logic, it doesn't really matter. The exhaustive match may produce a larger lookup table in LLVM IR, but both forms become the same thing at the assembly level.
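For illustration, here is a small pair of functions of the kind this comment describes: a match that maps every variant to a constant, written once exhaustively and once with a wildcard. This is not the code from the linked playground; `Op`, `cost_exhaustive`, and `cost_wildcard` are hypothetical names, and whether the optimizer emits a lookup table or branches depends on the compiler version and optimization level.

```rust
/// Hypothetical enum with a handful of fieldless variants.
#[derive(Clone, Copy)]
pub enum Op {
    Add,
    Sub,
    Mul,
    Div,
    Rem,
    Neg,
}

/// Exhaustive form: every variant is listed, so adding a new variant
/// forces this function to be revisited.
pub fn cost_exhaustive(op: Op) -> u32 {
    match op {
        Op::Add => 1,
        Op::Sub => 1,
        Op::Mul => 3,
        Op::Div => 20,
        Op::Rem => 20,
        Op::Neg => 1,
    }
}

/// Wildcard form: only the special cases are listed; everything else
/// shares the default arm.
pub fn cost_wildcard(op: Op) -> u32 {
    match op {
        Op::Mul => 3,
        Op::Div | Op::Rem => 20,
        _ => 1,
    }
}
```

Both functions return the same values for every variant, so any difference between them is purely in the generated code and in how the source reacts to new variants.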

With the debug assertion for `Placeholder` and `Bound` in place, doing an exhaustive match is awkward, so I've left this unchanged as well. If it helps, this leaves this method not dissimilar to `ShallowResolver::fold_ty`, which has the form `if let ty::Infer(v) = ty.kind() { ... } else { ty }`.

(It makes me sad that we discover the need to do micro-optimizations like re-encoding a big or-pattern arm as a wildcard; I, like @compiler-errors, find value in the exhaustive match from the view point of maintenance. Are we keeping track of efforts, if any, to put such a transformation into rustc itself?)

(I could even imagine a `#[rustc_*]` attribute that would tell the compiler to convert a given arm into a wildcard at the end of the match. That would provide a way to satisfy @compiler-errors' preference for the exhaustive match and also ease experiments like this one that @nnethercote is doing, right?)

The problem here is not "wildcard is faster than manually listing the alternatives". The problem is the assertion on Placeholder and Bound. A debug assertion is faster, which makes sense. And once you have the debug assertion for those variants, having a wildcard is a lot easier.

If that assertion wasn't necessary, then you can do an exhaustive match that is the same speed as a wildcard match. (I just tried it out; same speed.) Though I would argue that an exhaustive match probably isn't appropriate when ty::Infer gets treatment A and every other variant gets treatment B.
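A sketch of why the debug-only assertion pushes the match toward a wildcard, using a toy `Kind` enum rather than the compiler's types. `Kind`, `resolve`, `fold`, and `fold_if_let` are hypothetical names. Because the asserting arm is compiled out when debug assertions are off, some other arm has to cover `Placeholder` and `Bound` in those builds, and a wildcard does that without duplicating the variant list.

```rust
/// Hypothetical stand-in for the compiler's type kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Int,
    Float,
    Infer(u32),
    Placeholder(u32),
    Bound(u32),
}

/// Hypothetical resolution step for an inference variable.
fn resolve(v: u32) -> Kind {
    if v % 2 == 0 { Kind::Int } else { Kind::Float }
}

/// Treatment A for `Infer`, treatment B for everything else, plus a
/// debug-only check on the variants that should never show up here.
pub fn fold(kind: Kind) -> Kind {
    match kind {
        Kind::Infer(v) => resolve(v),
        #[cfg(debug_assertions)]
        Kind::Placeholder(_) | Kind::Bound(_) => panic!("unexpected kind: {kind:?}"),
        // Required: when the arm above is compiled out, this is the only
        // arm left that covers `Placeholder` and `Bound`.
        _ => kind,
    }
}

/// The same two-treatment shape written as `if let ... else`.
pub fn fold_if_let(kind: Kind) -> Kind {
    if let Kind::Infer(v) = kind { resolve(v) } else { kind }
}
```

For every input that does not trip the debug check, the two functions agree, which is the sense in which the wildcard form resembles the `if let` form.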

compiler/rustc_infer/src/infer/freshen.rs

So one doesn't have to be constructed every time.

`!t.has_non_region_infer()` is the test used in `OpportunisticVarResolver`, and catches a few cases that `!t.needs_infer()` misses.
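The difference between the two bailout tests can be modelled with a small flag set. This is a simplified sketch, not the compiler's actual definitions: the flag constants, the `Ty` struct, and the `can_skip_*` helpers are assumptions made for illustration, and it assumes the resolver in question only replaces type and const inference variables, so a value containing only region variables has nothing to resolve.

```rust
// Hypothetical flag bits recording which kinds of inference variables
// occur somewhere inside a type.
pub const HAS_TY_INFER: u8 = 1 << 0;
pub const HAS_RE_INFER: u8 = 1 << 1;
pub const HAS_CT_INFER: u8 = 1 << 2;

/// Toy type that caches its flags, so both tests are a single mask check.
pub struct Ty {
    pub flags: u8,
}

impl Ty {
    /// True if any inference variable (type, region, or const) occurs.
    pub fn needs_infer(&self) -> bool {
        self.flags & (HAS_TY_INFER | HAS_RE_INFER | HAS_CT_INFER) != 0
    }

    /// True only if a type or const inference variable occurs.
    pub fn has_non_region_infer(&self) -> bool {
        self.flags & (HAS_TY_INFER | HAS_CT_INFER) != 0
    }
}

/// Looser bailout test: gives up on skipping as soon as any variable exists.
pub fn can_skip_with_needs_infer(t: &Ty) -> bool {
    !t.needs_infer()
}

/// Tighter bailout test: also skips values whose only variables are regions.
pub fn can_skip_with_non_region_infer(t: &Ty) -> bool {
    !t.has_non_region_infer()
}
```

In this model, a value whose flags are just `HAS_RE_INFER` is skipped by the second test but not by the first, which is the kind of extra case the commit message refers to.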

nnethercote · 2023-02-05T22:30:20Z

I addressed most of the suggestions, mostly by adding comments. I couldn't address the ones about the match in freshen.rs because it hurt perf, as explained above.

Based on @compiler-errors' previous "r=me with nits or not", I will say:
@bors r=compiler-errors

Thanks for the reviews!

bors · 2023-02-05T22:30:21Z

📌 Commit 4aec134 has been approved by compiler-errors

It is now in the queue for this repository.

bors · 2023-02-05T23:13:44Z

⌛ Testing commit 4aec134 with merge 14ea63a...

bors · 2023-02-06T02:08:51Z

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing 14ea63a to master...

rust-timer · 2023-02-06T03:28:04Z

Finished benchmarking commit (14ea63a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.1%	[1.0%, 1.1%]	2
Regressions ❌ (secondary)	2.3%	[0.6%, 4.2%]	12
Improvements ✅ (primary)	-0.4%	[-0.5%, -0.3%]	18
Improvements ✅ (secondary)	-0.7%	[-1.6%, -0.2%]	30
All ❌✅ (primary)	-0.3%	[-0.5%, 1.1%]	20

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.4%	[3.2%, 3.6%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.4%	[-1.4%, -1.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.8%	[-1.4%, 3.6%]	3

Cycles

This benchmark run did not return any relevant results for this metric.

nnethercote · 2023-02-06T21:12:24Z

The post-merge perf run has regressions in keccak, cranelift-codegen, and tt-muncher that weren't in the pre-merge run. These regressions all appear to be random fluctuations that were all reversed again in #107627. keccak and cranelift-codegen appear to be one-off fluctuations; tt-muncher has entered a new noisy period.

@rustbot label: +perf-regression-triaged

rustbot assigned oli-obk Feb 3, 2023

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Feb 3, 2023

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 3, 2023

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 3, 2023

compiler-errors approved these changes Feb 3, 2023

compiler-errors reviewed Feb 3, 2023

oli-obk reviewed Feb 4, 2023

nnethercote added 5 commits February 6, 2023 08:50

Put a ShallowResolver within OpportunisticVarResolver.

bac7628

So one doesn't have to be constructed every time.

Improve early bailout test in resolve_vars_if_possible.

f08a337

`!t.has_non_region_infer()` is the test used in `OpportunisticVarResolver`, and catches a few cases that `!t.needs_infer()` misses.

Inline OpportunisticVarResolver::fold_ty.

c2cf3f7

Split and inline ShallowResolver::fold_ty.

fb8e681

Split and inline TypeFreshener::fold_ty.

4aec134
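The "split and inline" commits above can be read as the following general pattern, sketched here on a toy resolver. The names `Ty`, `Resolver`, and `known` are hypothetical, and the helper name `fold_infer_ty` is an illustrative choice; the real methods operate on the compiler's types and may differ in detail. The idea is that the cheap discriminant check is marked `#[inline]` so it lands in each caller, while the heavier work for inference variables stays in a separate out-of-line function.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    Infer(u32),
}

/// Toy resolver holding the inference variables solved so far.
pub struct Resolver {
    pub known: HashMap<u32, Ty>,
}

impl Resolver {
    /// Hot path: one comparison, small enough to inline into callers.
    #[inline]
    pub fn fold_ty(&self, ty: Ty) -> Ty {
        if let Ty::Infer(v) = ty {
            // Fall back to the original type when the variable is unsolved.
            self.fold_infer_ty(v).unwrap_or(ty)
        } else {
            ty
        }
    }

    /// Slow path: kept out of line so that it does not bloat each call site.
    #[inline(never)]
    fn fold_infer_ty(&self, v: u32) -> Option<Ty> {
        self.known.get(&v).copied()
    }
}
```

With this split, the common case where `ty` is not an inference variable never leaves the caller, which is the effect the commits aim for.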

nnethercote force-pushed the optimize-fold_ty branch from a1df2c5 to 4aec134 on February 5, 2023 22:27

bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Feb 5, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 6, 2023

bors merged commit 14ea63a into rust-lang:master Feb 6, 2023

rustbot added this to the 1.69.0 milestone Feb 6, 2023

bors mentioned this pull request Feb 6, 2023

Pattern types MVP #107606

Closed

rustbot added the perf-regression Performance regression. label Feb 6, 2023

nnethercote deleted the optimize-fold_ty branch February 6, 2023 21:09

nnethercote mentioned this pull request Feb 6, 2023

Remove OnHit callback from query caches. #107667

Merged

rustbot added the perf-regression-triaged The performance regression has been triaged. label Feb 6, 2023

pnkfelix mentioned this pull request Feb 7, 2023

Run expand-yaml-anchors in x test tidy #107704

Merged
