Optimize fold_ty #107627

Merged
merged 5 commits into from Feb 6, 2023

nnethercote
Contributor

Micro-optimizing the heck out of the important fold_ty methods.

r? @oli-obk

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 3, 2023
@nnethercote
Contributor Author

Best reviewed one commit at a time.

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 3, 2023
@bors
Contributor

bors commented Feb 3, 2023

⌛ Trying commit a1df2c5d1e9a41e6f4fdceac31a831ba01763108 with merge 14f439f45a195e76c41d576f2e6aeac48603aae3...

@bors
Contributor

bors commented Feb 3, 2023

☀️ Try build successful - checks-actions
Build commit: 14f439f45a195e76c41d576f2e6aeac48603aae3 (14f439f45a195e76c41d576f2e6aeac48603aae3)

@rust-timer
Collaborator

Finished benchmarking commit (14f439f45a195e76c41d576f2e6aeac48603aae3): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.2% | [1.2%, 1.2%] | 4 |
| Improvements ✅ (primary) | -0.7% | [-2.1%, -0.2%] | 40 |
| Improvements ✅ (secondary) | -0.8% | [-1.6%, -0.2%] | 35 |
| All ❌✅ (primary) | -0.7% | [-2.1%, -0.2%] | 40 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.4% | [-2.8%, -1.2%] | 4 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-1.4%, -1.1%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.3% | [-1.4%, -1.1%] | 5 |

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 3, 2023
Member

@compiler-errors left a comment

r=me with nits or not, unless you want a review from oli specifically

ty::IntVar(v),
ty::FreshIntTy,
),
#[cfg(debug_assertions)]
Member

Personally think this should stay an "always" assertion

Contributor

agreed. To avoid any performance issues, this could call a #[cold] function with the bug! inside instead of having the formatting inside the main function

Contributor Author

I tried changing it back to an always assertion, and it had a noticeable perf impact, e.g. the instruction count for wg-grammar increased by 0.7%. I then tried the #[cold] function and it made a small improvement, but was still 0.5% worse.

So I will leave this as is, but I will add a comment about it.
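The `#[cold]` suggestion discussed above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `Kind`, `freshen`, and `unexpected_kind` are hypothetical names, and `panic!` stands in for the compiler-internal `bug!` macro. The idea is that the failure path, including its formatting code, lives in a separate out-of-line function so the hot function stays small.

```rust
// Hypothetical stand-in for the inference-variable kinds handled by the freshener.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Var(u32),
    Fresh(u32),
    Bound(u32),
}

/// Out-of-line failure path. `#[cold]` hints that calls are unlikely, and
/// `#[inline(never)]` keeps the panic formatting out of the caller's body.
#[cold]
#[inline(never)]
fn unexpected_kind(kind: Kind) -> ! {
    // `panic!` is used here as a stand-in for rustc's internal `bug!`.
    panic!("unexpected kind while freshening: {:?}", kind)
}

/// Always-on check: the unexpected variant is still tested in every build,
/// but the cost of formatting the message is moved out of the hot function.
pub fn freshen(kind: Kind) -> Kind {
    match kind {
        Kind::Var(v) => Kind::Fresh(v),
        Kind::Fresh(_) => kind,
        Kind::Bound(_) => unexpected_kind(kind),
    }
}
```

As reported in the comment above, in this PR even the `#[cold]` variant still measured about 0.5% worse on wg-grammar, so a sketch like this is a starting point for measurement rather than a guaranteed win.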

compiler/rustc_infer/src/infer/resolve.rs (resolved)
compiler/rustc_infer/src/infer/mod.rs (resolved)
}

ty::Generator(..)
Member

Additional question: Does the wildcard have a perf difference over the exhaustive match? Otherwise, I kinda prefer the exhaustive match.

Contributor

It probably does. Transforming exhaustive matches to a wildcard in the code generation may be a good idea if so.

Contributor

It depends on the use case. If the match can be converted to a table lookup, the exhaustive match will have one less branch in LLVM, but have a bigger lookup table: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c7b21a7f9d032aea5aa261953a85d735

For actual branching logic, it doesn't really matter. There may be a larger lookup table in LLVM IR, but that will become the same thing at the assembly level
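To make the two forms being compared concrete, here is a minimal sketch with a hypothetical `Kind` enum (not the compiler's `TyKind`). Both functions return the same result for every variant; the difference is that the exhaustive version stops compiling when a new variant is added, which is the maintenance benefit mentioned in the review. The sketch says nothing about the generated code, which would need to be checked separately.

```rust
// Hypothetical enum standing in for a much larger set of type kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Bool,
    Int,
    Float,
    Str,
    Infer,
}

/// Exhaustive form: every variant is named, so adding a variant to `Kind`
/// forces this function to be revisited.
pub fn needs_resolving_exhaustive(kind: Kind) -> bool {
    match kind {
        Kind::Infer => true,
        Kind::Bool | Kind::Int | Kind::Float | Kind::Str => false,
    }
}

/// Wildcard form: one variant gets special treatment, everything else
/// falls through to the catch-all arm.
pub fn needs_resolving_wildcard(kind: Kind) -> bool {
    match kind {
        Kind::Infer => true,
        _ => false,
    }
}
```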

Contributor Author

With the debug assertion for Placeholder and Bound in place, doing an exhaustive match is awkward, so I've left this unchanged as well. If it helps, this leaves this method not dissimilar to ShallowResolver::fold_ty, which has the form if let ty::Infer(v) = ty.kind() { ... } else { ty }.

Member

@pnkfelix Feb 7, 2023

(It makes me sad that we discover the need to do micro-optimizations like re-encoding a big or-pattern arm as a wildcard; I, like @compiler-errors, find value in the exhaustive match from the viewpoint of maintenance. Are we keeping track of efforts, if any, to put such a transformation into rustc itself?)

Member

@pnkfelix Feb 7, 2023

(I could even imagine a #[rustc_*] attribute that would tell the compiler to convert a given arm into a wildcard at the end of the match. That would provide a way to make @compiler-errors happy and also ease experiments like this one that @nnethercote is doing, right?)

Contributor Author

The problem here is not "wildcard is faster than manually listing the alternatives". The problem is the assertion on Placeholder and Bound. A debug assertion is faster, which makes sense. And once you have the debug assertion for those variants, having a wildcard is a lot easier.

If that assertion wasn't necessary, then you can do an exhaustive match that is the same speed as a wildcard match. (I just tried it out; same speed.) Though I would argue that an exhaustive match probably isn't appropriate when ty::Infer gets treatment A and every other variant gets treatment B.
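The shape described above, where one variant gets treatment A, two variants are asserted against only in debug builds, and everything else gets treatment B, can be sketched like this. The `Ty` enum and `fold_ty` signature are hypothetical simplifications, not the compiler's real types. Because the assertion arm is compiled out when debug assertions are disabled, a wildcard arm is the simplest way to keep the match valid in both configurations.

```rust
// Hypothetical, heavily simplified type representation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int,
    Infer(u32),
    Placeholder(u32),
    Bound(u32),
}

/// Treatment A for `Infer`, treatment B (return unchanged) for everything else.
/// `resolve` is an assumed callback that maps an inference variable to a type.
pub fn fold_ty(ty: Ty, resolve: impl Fn(u32) -> Ty) -> Ty {
    match ty {
        Ty::Infer(v) => resolve(v),
        // This arm exists only when debug assertions are enabled; in other
        // builds these variants fall through to the wildcard below.
        #[cfg(debug_assertions)]
        Ty::Placeholder(_) | Ty::Bound(_) => panic!("unexpected type: {:?}", ty),
        _ => ty,
    }
}
```

Listing every remaining variant explicitly instead of `_` would require duplicating the `Placeholder` and `Bound` patterns under the opposite `cfg`, which is the awkwardness the comment refers to.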

So one doesn't have to be constructed every time.
`!t.has_non_region_infer()` is the test used in
`OpportunisticVarResolver`, and catches a few cases that
`!t.needs_infer()` misses.
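The difference between the two early-exit tests named in this commit message can be illustrated with a small sketch. Everything below is an assumption made for illustration: the flag constants, the `Ty` struct, and the idea that `needs_infer` also counts region inference variables while `has_non_region_infer` does not. Under that assumption, a type containing only region inference variables is skipped by the second test but not by the first.

```rust
// Hypothetical flag bits, loosely modelled on cached per-type flags.
pub const HAS_TY_INFER: u8 = 1 << 0;
pub const HAS_CT_INFER: u8 = 1 << 1;
pub const HAS_RE_INFER: u8 = 1 << 2;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Ty {
    pub id: u32,
    pub flags: u8,
}

impl Ty {
    /// Assumed semantics: any inference variable, including regions.
    pub fn needs_infer(self) -> bool {
        (self.flags & (HAS_TY_INFER | HAS_CT_INFER | HAS_RE_INFER)) != 0
    }

    /// Assumed semantics: type or const inference variables only.
    pub fn has_non_region_infer(self) -> bool {
        (self.flags & (HAS_TY_INFER | HAS_CT_INFER)) != 0
    }
}

/// A resolver that only handles type and const variables can return early
/// whenever none are present. `slow_path_calls` counts how often the early
/// exit did not apply, purely so the effect is observable.
pub fn fold_ty(t: Ty, slow_path_calls: &mut u32) -> Ty {
    if !t.has_non_region_infer() {
        return t;
    }
    *slow_path_calls += 1;
    // Pretend resolution succeeded and cleared the type/const inference flags.
    Ty { id: t.id, flags: t.flags & !(HAS_TY_INFER | HAS_CT_INFER) }
}
```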
@nnethercote
Contributor Author

I addressed most of the suggestions, mostly by adding comments. I couldn't address the ones about the match in freshen.rs because it hurt perf, as explained above.

Based on @compiler-errors' previous "r=me with nits or not", I will say:
@bors r=compiler-errors

Thanks for the reviews!

@bors
Contributor

bors commented Feb 5, 2023

📌 Commit 4aec134 has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 5, 2023
@bors
Contributor

bors commented Feb 5, 2023

⌛ Testing commit 4aec134 with merge 14ea63a...

@bors
Contributor

bors commented Feb 6, 2023

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing 14ea63a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 6, 2023
@bors bors merged commit 14ea63a into rust-lang:master Feb 6, 2023
@rustbot rustbot added this to the 1.69.0 milestone Feb 6, 2023
@bors bors mentioned this pull request Feb 6, 2023
@rust-timer
Collaborator

Finished benchmarking commit (14ea63a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [1.0%, 1.1%] | 2 |
| Regressions ❌ (secondary) | 2.3% | [0.6%, 4.2%] | 12 |
| Improvements ✅ (primary) | -0.4% | [-0.5%, -0.3%] | 18 |
| Improvements ✅ (secondary) | -0.7% | [-1.6%, -0.2%] | 30 |
| All ❌✅ (primary) | -0.3% | [-0.5%, 1.1%] | 20 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.4% | [3.2%, 3.6%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.4% | [-1.4%, -1.4%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.8% | [-1.4%, 3.6%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

@rustbot rustbot added the perf-regression Performance regression. label Feb 6, 2023
@nnethercote nnethercote deleted the optimize-fold_ty branch February 6, 2023 21:09
@nnethercote
Contributor Author

The post-merge perf run has regressions in keccak, cranelift-codegen, and tt-muncher that weren't in the pre-merge run. These regressions all appear to be random fluctuations that were all reversed again in #107627. keccak and cranelift-codegen appear to be one-off fluctuations; tt-muncher has entered a new noisy period.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Feb 6, 2023
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.