Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Represent Result<usize, Box<T>> as ScalarPair(i64, ptr) #121668

Merged
merged 5 commits into from
Mar 13, 2024

Conversation

erikdesjardins
Copy link
Contributor

@erikdesjardins erikdesjardins commented Feb 27, 2024

This allows types like Result<usize, std::io::Error> (and integers of differing sign, e.g. Result<u64, i64>) to be passed in a pair of registers instead of through memory, like Result<u64, u64> or Result<Box<T>, Box<U>> are today.

Fixes #97540.

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 27, 2024
Comment on lines +9 to +17
// CHECK-LABEL: @insert_int
#[no_mangle]
pub fn insert_int(x: usize) -> Result<usize, Box<()>> {
// CHECK: start:
// CHECK-NEXT: inttoptr i{{[0-9]+}} %x to ptr
// CHECK-NEXT: insertvalue
// CHECK-NEXT: ret { i{{[0-9]+}}, ptr }
Ok(x)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that this just works, inserting inttoptr/ptrtoint as necessary with only a layout change and no changes to codegen. It feels like something in codegen is a bit too permissive about adding casts whenever it needs to...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is optimized IR, so it's likely that rustc did something much more boring but LLVM optimized it to this.

Add -C no-prepopulate-passes in the compile-flags if you want to look at what cg_llvm is doing directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, you get an inttoptr like that after LLVM optimizes a store+load: https://rust.godbolt.org/z/qMPWr7Tzz

Copy link
Member

@scottmcm scottmcm Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Rust would actually prefer this example to be a GEP off null, like #121242, I think. Not that this PR would need to do that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did I forget that...

Yeah, the IR we generate just spills to an alloca.

Comment on lines 35 to +41
// CHECK-LABEL: @result_nop_match_32
#[no_mangle]
pub fn result_nop_match_32(x: Result<i32, u32>) -> Result<i32, u32> {
// CHECK: start
// CHECK-NEXT: ret i64 %0
// CHECK: start:
// CHECK-NEXT: insertvalue { i32, i32 }
// CHECK-NEXT: insertvalue { i32, i32 }
// CHECK-NEXT: ret { i32, i32 }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is no longer just a ret instruction, this is the fewest possible instructions for scalar pair layout, since you need to pack the two separate arguments into a single return value.

For real, non-noop code, the scalar pair layout is arguably slightly better, since passing the components in separate arguments means that consuming code can just use the second argument directly, instead of having to shift it down from the top 32 bits of an i64 reg.

It was kinda janky in the first place for this test to rely on the fact that integers with differing signedness don't get scalar pair layout; Result<u32, u32> has always gotten the same scalar pair layout (since 1.27) that Result<i32, u32> now gets with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, agreed it's fine to change this test like this. The important part is that it's not branching, not icmping, not selecting, etc.

@Noratrieb
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 27, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 27, 2024
Represent `Result<usize, Box<T>>` as ScalarPair(i64, ptr)

This allows types like `Result<usize, std::io::Error>` (and integers of differing sign, e.g. `Result<u64, i64>`) to be passed in a pair of registers instead of through memory, like `Result<u64, u64>` or `Result<Box<T>, Box<U>>` are today.

Fixes rust-lang#97540.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Feb 27, 2024

⌛ Trying commit 4dabbcb with merge 620bc57...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 27, 2024

☀️ Try build successful - checks-actions
Build commit: 620bc57 (620bc57fd23a71e77bfab2c9db4cb12dff02e970)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (620bc57): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.7% [0.4%, 0.8%] 5
Regressions ❌
(secondary)
0.4% [0.2%, 1.2%] 27
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.8% [-0.8%, -0.8%] 1
All ❌✅ (primary) 0.7% [0.4%, 0.8%] 5

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.8% [-3.8%, -3.8%] 1
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.0% [0.0%, 0.0%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) 0.0% [0.0%, 0.0%] 4

Bootstrap: 649.3s -> 650.378s (0.17%)
Artifact size: 311.08 MiB -> 311.04 MiB (-0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 27, 2024
@erikdesjardins
Copy link
Contributor Author

Hmm, nearly all of the changed benchmarks seem to be noisy, so it's hard to tell if it's real or not...

@Noratrieb
Copy link
Member

looks like noise and it being perf neutral to me

@erikdesjardins erikdesjardins marked this pull request as ready for review February 28, 2024 22:56
@erikdesjardins
Copy link
Contributor Author

In that case...

r? compiler

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@davidtwco davidtwco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me but I'm not familiar enough with this code to anticipate all the consequences of the change.

@davidtwco
Copy link
Member

r? compiler

@rustbot rustbot assigned nnethercote and unassigned davidtwco Mar 4, 2024
@nnethercote
Copy link
Contributor

I'm not a good reviewer for this, let's try r? @scottmcm and cc @cuviper for good luck.

@rustbot rustbot assigned scottmcm and unassigned nnethercote Mar 5, 2024
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Mar 6, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 6, 2024
It seems that LLVM 17 doesn't fully optimize out unwrap_unchecked.

We can just loosen the check lines to account for this, since we don't
really care about the exact instructions, we just want to make sure that
inttoptr/ptrtoint aren't used for Box.
@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Mar 7, 2024

The job x86_64-gnu-llvm-17 failed!

It seems LLVM 17 doesn't optimize out the unwrap_unchecked. Loosened the check lines since we don't care about the precise instruction sequence.

@erikdesjardins
Copy link
Contributor Author

Ping @scottmcm, I made some changes to the codegen tests so they also pass on LLVM 17. Let me know if you'd prefer to revert the last commit and use min-llvm-version: 18 instead.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like I accidentally commented on the commit, which has github a bit confused. Let's try this instead...

(And do @rustbot ready once you're happy with it, please.)

tests/codegen/common_prim_int_ptr.rs Outdated Show resolved Hide resolved
tests/codegen/common_prim_int_ptr.rs Outdated Show resolved Hide resolved
tests/codegen/common_prim_int_ptr.rs Show resolved Hide resolved
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2024
Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>
@erikdesjardins
Copy link
Contributor Author

Yeah, something like that works.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 13, 2024
@scottmcm
Copy link
Member

Oh, right, scalar pair, so clearly more than one parameter 🤦

Test updates look good:
@bors r=scottmcm,oli-obk rollup=never

@bors
Copy link
Contributor

bors commented Mar 13, 2024

📌 Commit 9f55200 has been approved by scottmcm,oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2024
@bors
Copy link
Contributor

bors commented Mar 13, 2024

⌛ Testing commit 9f55200 with merge 3cbb932...

@bors
Copy link
Contributor

bors commented Mar 13, 2024

☀️ Test successful - checks-actions
Approved by: scottmcm,oli-obk
Pushing 3cbb932 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 13, 2024
@bors bors merged commit 3cbb932 into rust-lang:master Mar 13, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 13, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3cbb932): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.1% [1.1%, 1.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.5% [-3.5%, -3.5%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.0% [-3.0%, -3.0%] 1
All ❌✅ (primary) - - 0

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.0% [0.0%, 0.2%] 5
Regressions ❌
(secondary)
1.0% [1.0%, 1.3%] 28
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) 0.0% [0.0%, 0.2%] 5

Bootstrap: 675.199s -> 675.071s (-0.02%)
Artifact size: 310.80 MiB -> 310.73 MiB (-0.02%)

@rustbot rustbot removed the perf-regression Performance regression. label Mar 13, 2024
@erikdesjardins erikdesjardins deleted the commonprim branch March 13, 2024 21:36
@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Mar 13, 2024

Quoth nnethercote: Are the binary size increases expected?

This change looks basically the same as #121665 (comment), where all of the regressions are due to a uniform ~4k increase in the stdlib "runtime" that's included in every binary.

image

I'm looking into whether we can make some improvements to the stdlib backtrace code as a whole, either by cha—wait a minute.

Didn't helloworld-tiny start at 305,488 bytes last time too? It sure did--and broadly the same (+/- a few hundred bytes) for all the other 1% regressions.

So it was noise all along?

(I'm still gonna look into some backtrace size improvements though, since all the big functions in helloworld are backtrace related. Edit: opened #122462 as a first step)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Result that uses niches resulting in a final size of 16 bytes emits poor LLVM IR due to ABI