Separate immediate and in-memory ScalarPair representation #118991

nikic · 2023-12-15T16:50:54Z

Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.

This currently works fairly well, but runs into problems with #116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple (i32, i128) needs to be represented in-memory as { i32, [3 x i32], i128 } to satisfy alignment requirements. Using { i32, i128 } instead will result in the second element being stored at the wrong offset (prior to LLVM 18).
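The offset arithmetic behind this example can be sketched as follows. This is only an illustration, not rustc's layout code: `align_up` and `second_field_offset` are hypothetical helper names, and the sizes and alignments are passed in as assumptions (4-byte i32, 16-byte-aligned i128 under the updated data layout, 8-byte-aligned i128 under the older x86 one).

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    (offset + align - 1) & !(align - 1)
}

/// Offset of the second field of a pair, given the first field's size
/// and the second field's alignment (hypothetical helper).
fn second_field_offset(first_size: u64, second_align: u64) -> u64 {
    align_up(first_size, second_align)
}

fn main() {
    // Assumed: i32 occupies 4 bytes, i128 is 16-byte aligned.
    let offset = second_field_offset(4, 16);
    let padding = offset - 4;
    // The padding bytes between the fields are what the [3 x i32] filler covers.
    println!("i128 at offset {}, {} bytes of padding", offset, padding);
    // Assumed older layout: i128 aligned to 8 bytes, giving a different offset.
    println!("with align 8: offset {}", second_field_offset(4, 8));
}
```

If the two sides disagree on the alignment, they disagree on where the second element lives, which is the mismatch the explicit filler avoids.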

Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.
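As a rough model of the boolean part, converting between the two forms amounts to a zero-extension when storing and a truncation when loading. The function names below are hypothetical and only mirror the idea; they are not the rustc functions themselves.

```rust
/// Memory form of a bool: a byte holding 0 or 1 (the i8 form).
fn to_memory(immediate: bool) -> u8 {
    // Zero-extend the one-bit value to a full byte.
    immediate as u8
}

/// Immediate form of a bool: only the low bit is kept (the i1 form).
fn to_immediate(byte: u8) -> bool {
    // Truncate the byte to its low bit.
    (byte & 1) != 0
}

fn main() {
    for b in [false, true] {
        let byte = to_memory(b);
        println!("{} -> {} -> {}", b, byte, to_immediate(byte));
    }
}
```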

The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.
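To make the byte-offset addressing concrete, here is a small sketch that stores an immediate pair into a padded byte buffer and loads it back. The constants and function names are assumptions chosen for a `(u32, u128)` pair with a 16-byte-aligned second element; nothing here is taken from the compiler.

```rust
/// Assumed byte offset of the second element for (u32, u128).
const SECOND_OFFSET: usize = 16;
/// Assumed total size of the in-memory form, padding included.
const TOTAL_SIZE: usize = 32;

/// Store an immediate pair into its padded in-memory form.
fn store_pair(pair: (u32, u128), mem: &mut [u8; TOTAL_SIZE]) {
    // First element sits at the base address.
    mem[0..4].copy_from_slice(&pair.0.to_ne_bytes());
    // Second element is addressed by byte offset, not by field index.
    mem[SECOND_OFFSET..SECOND_OFFSET + 16].copy_from_slice(&pair.1.to_ne_bytes());
}

/// Load the two elements back into an immediate pair.
fn load_pair(mem: &[u8; TOTAL_SIZE]) -> (u32, u128) {
    let mut a = [0u8; 4];
    a.copy_from_slice(&mem[0..4]);
    let mut b = [0u8; 16];
    b.copy_from_slice(&mem[SECOND_OFFSET..SECOND_OFFSET + 16]);
    (u32::from_ne_bytes(a), u128::from_ne_bytes(b))
}

fn main() {
    let mut mem = [0u8; TOTAL_SIZE];
    store_pair((7, 1u128 << 100), &mut mem);
    println!("{:?}", load_pair(&mem));
}
```

The immediate value stays a plain two-element pair. Only the store and load steps need to know the padded layout.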

rustbot · 2023-12-15T16:51:03Z

r? @b-naber

(rustbot has picked a reviewer for you, use r? to override)

maurer · 2023-12-15T18:06:24Z

I tested #116672 locally on top of this change, and the issue we were hitting before (building with built-in LLVM or LLVM-17 fails during stage 2) is resolved.

tests/codegen/intrinsics/transmute.rs

scottmcm · 2023-12-15T23:23:51Z

tests/codegen/scalar-pair-bool.rs

 #[no_mangle]
 pub fn pair_bool_bool(pair: (bool, bool)) -> (bool, bool) {
    pair
 }

-// CHECK: define{{.*}}{ i8, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)
+// CHECK: define{{.*}}{ i1, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)


Oh, this is amazing! 🎉

Should save truncs in the uses that (since there's no trunc nuw) I've sometimes seen cause poor codegen.

cuviper · 2023-12-15T23:39:34Z

Perf results are interesting in #116672 (comment), but we should test this on its own too...

@bors try @rust-timer queue

bors · 2023-12-15T23:40:43Z

⌛ Trying commit c2fd26a with merge 8f49c16...

bors · 2023-12-16T01:07:50Z

☀️ Try build successful - checks-actions
Build commit: 8f49c16 (8f49c1689d271405ae5c5994468ab01b777caa70)

rust-timer · 2023-12-16T03:20:13Z

Finished benchmarking commit (8f49c16): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.7%, 0.8%]	5
Regressions ❌ (secondary)	0.4%	[0.2%, 0.6%]	23
Improvements ✅ (primary)	-0.7%	[-1.6%, -0.1%]	14
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	-0.4%	[-1.6%, 0.8%]	19

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.9%	[2.9%, 2.9%]	1
Regressions ❌ (secondary)	3.8%	[3.0%, 4.3%]	3
Improvements ✅ (primary)	-2.7%	[-2.7%, -2.7%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-2.7%, 2.9%]	2

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.7%, 0.7%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-1.4%, -0.6%]	3
Improvements ✅ (secondary)	-2.3%	[-2.3%, -2.3%]	1
All ❌✅ (primary)	-0.6%	[-1.4%, 0.7%]	4

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	1
Improvements ✅ (secondary)	-1.3%	[-1.3%, -1.3%]	1
All ❌✅ (primary)	0.1%	[-0.0%, 0.2%]	4

Bootstrap: 671.894s -> 673.337s (0.21%)
Artifact size: 312.51 MiB -> 312.50 MiB (-0.00%)

lqd · 2023-12-16T12:55:15Z

(Note: most of the small benchmarks with regressions here are currently being slightly noisy)

b-naber · 2023-12-22T21:14:22Z

Don't think I understand all the intricacies here. Maybe r? @davidtwco ?

nagisa · 2024-01-02T12:28:14Z

tests/codegen/personality_lifetimes.rs

@@ -24,7 +24,7 @@ pub fn test() {
    let _s = S;
    // Check that the personality slot alloca gets a lifetime start in each cleanup block, not just
    // in the first one.
-    // CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32 }
+    // CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32, [1 x i32] }


One thing I don’t quite understand is why this changed. Presumably alignment? But this looks like it could also be 32-bit aligned on 32-bit architectures, in which case [1 x i32] only serves to increase the size?

I think this is just because our struct generation makes all padding (including trailing padding) explicit. It makes no practical difference here, as long as the Rust and LLVM data layouts agree. If, for example, the first element were an i128, then after the alignment change having a [3 x i32] at the end would be important; otherwise LLVM would create an allocation that is too small.
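The trailing padding in question can be worked out with a short sketch. The helper names are hypothetical, and the (size, align) pairs are assumptions: an 8-byte pointer on 64-bit targets, a 4-byte pointer on 32-bit targets, and a 16-byte-aligned i128.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    (offset + align - 1) & !(align - 1)
}

/// Trailing padding, in bytes, of a two-field struct.
/// Each field is given as (size, align).
fn trailing_padding(a: (u64, u64), b: (u64, u64)) -> u64 {
    let (a_size, a_align) = a;
    let (b_size, b_align) = b;
    let b_offset = align_up(a_size, b_align);
    let end = b_offset + b_size;
    // The struct size is rounded up to the largest field alignment.
    let total = align_up(end, a_align.max(b_align));
    total - end
}

fn main() {
    // { ptr, i32 } with an assumed 8-byte pointer: one i32 of trailing padding.
    println!("64-bit: {} bytes", trailing_padding((8, 8), (4, 4)));
    // Same struct with an assumed 4-byte pointer: no trailing padding.
    println!("32-bit: {} bytes", trailing_padding((4, 4), (4, 4)));
    // { i128, i32 } with an assumed 16-byte-aligned i128: three i32s of padding.
    println!("i128 first: {} bytes", trailing_padding((16, 16), (4, 4)));
}
```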

nagisa · 2024-01-02T12:38:02Z

compiler/rustc_codegen_llvm/src/intrinsic.rs

@@ -179,7 +179,10 @@ impl<'ll, 'tcx> IntrinsicCallMethods<'tcx> for Builder<'_, 'll, 'tcx> {
                unsafe {
                    llvm::LLVMSetAlignment(load, align);
                }
-                self.to_immediate(load, self.layout_of(tp_ty))
+                if !result.layout.is_zst() {
+                    self.store(load, result.llval, result.align);


Why was this change necessary? Is it because there might now be trailing padding where previously there wasn’t any?

The volatile_load implementation blindly loads the value using the in-memory type, so it can produce loads of array and (arbitrary) struct types. These are not really valid immediates as far as rustc is concerned (and non-canonical as far as LLVM is concerned).

The way this ends up getting handled is that if the value has Scalar ABI we convert it to an immediate, while everything else is left alone. After this change, this no longer works for ScalarPair ABI, which would need an adjustment.

What this change does is to never treat the value as an immediate in the first place, and just directly store it back in its in-memory representation.

The implementation of volatile_load is really questionable in general (we really shouldn't be generating array/struct loads), but it's not really clear how it should be implemented given that we made the major design mistake of allowing volatile loads of arbitrary types, which is not a well-defined operation. We just leave it up to LLVM to interpret this in some way...

nagisa · 2024-01-02T12:41:24Z

compiler/rustc_codegen_llvm/src/builder.rs


            let mut load = |i, scalar: abi::Scalar, layout, align, offset| {
-                let llptr = self.struct_gep(pair_ty, place.llval, i as u64);
+                let llptr = if i == 0 {
+                    place.llval


Hm, this if seems like a code smell to me. Since there are just two calls to this closure (as far as I can tell) and this if is the sole use of the i argument, perhaps consider constructing the llptr in the caller and passing it into the closure instead?

The i argument is also used to determine llty.

I've tried moving this into the caller, but this ends up being rather ugly, especially as self is captured by the closure, so we can't use it outside without further changes.
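A toy model of the situation, for illustration only: `Builder`, `byte_gep` and `load_pair` below are hypothetical stand-ins, with plain integers in place of LLVM values. It shows a closure that picks the pointer from the index while holding a mutable borrow of `self`.

```rust
/// Hypothetical stand-in for the codegen builder; it records emitted operations.
struct Builder {
    ops: Vec<String>,
}

impl Builder {
    /// Pretend byte-gep: returns `base + offset` and records the operation.
    fn byte_gep(&mut self, base: usize, offset: usize) -> usize {
        self.ops.push(format!("gep +{}", offset));
        base + offset
    }

    /// Load both elements; the closure derives the pointer from the index.
    fn load_pair(&mut self, base: usize, second_offset: usize) -> (usize, usize) {
        let mut load = |i: usize| {
            // Element 0 lives at the base pointer; element 1 needs a byte offset.
            let ptr = if i == 0 { base } else { self.byte_gep(base, second_offset) };
            self.ops.push(format!("load elem {}", i));
            ptr
        };
        // `self` is mutably borrowed by `load` until its last call, so the
        // caller cannot use `self` to build the pointers in between.
        (load(0), load(1))
    }
}

fn main() {
    let mut b = Builder { ops: Vec::new() };
    let ptrs = b.load_pair(100, 16);
    println!("{:?} {:?}", ptrs, b.ops);
}
```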

compiler/rustc_codegen_llvm/src/type_of.rs

compiler/rustc_codegen_ssa/src/mir/place.rs

nagisa · 2024-01-02T14:40:04Z

@bors r+

bors · 2024-01-02T14:40:07Z

📌 Commit 8e64fc9 has been approved by nagisa

It is now in the queue for this repository.

bors · 2024-01-02T14:40:07Z

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

bors · 2024-01-05T10:10:19Z

⌛ Testing commit 8e64fc9 with merge 8143a57...

bors · 2024-01-05T10:36:49Z

💔 Test failed - checks-actions

nikic · 2024-01-05T11:41:19Z

@bors r=nagisa

bors · 2024-01-05T11:41:21Z

📌 Commit 3cd6cde has been approved by nagisa

It is now in the queue for this repository.

bors · 2024-01-05T14:32:00Z

⌛ Testing commit 3cd6cde with merge 432fffa...

bors · 2024-01-05T16:29:58Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 432fffa to master...

rust-timer · 2024-01-05T18:28:32Z

Finished benchmarking commit (432fffa): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.6%, 0.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-1.5%, -0.6%]	13
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	-0.9%	[-1.5%, 0.6%]	14

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.7%	[-3.7%, -3.7%]	1
Improvements ✅ (secondary)	-1.3%	[-1.3%, -1.3%]	2
All ❌✅ (primary)	-3.7%	[-3.7%, -3.7%]	1

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	7.7%	[7.3%, 8.1%]	5
Improvements ✅ (primary)	-1.2%	[-1.3%, -1.1%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.2%	[-1.3%, -1.1%]	2

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	4
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.0%, 0.2%]	7

Bootstrap: 670.411s -> 668.199s (-0.33%)
Artifact size: 311.14 MiB -> 311.12 MiB (-0.01%)

rustbot assigned b-naber Dec 15, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 15, 2023

tgross35 mentioned this pull request Dec 15, 2023

LLVM 18 x86 data layout update #116672

Merged

scottmcm reviewed Dec 15, 2023

tests/codegen/intrinsics/transmute.rs (resolved)

scottmcm reviewed Dec 15, 2023

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 15, 2023

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 16, 2023

rustbot assigned davidtwco and unassigned b-naber Dec 22, 2023

nagisa reviewed Jan 2, 2024

Address review comments

8e64fc9

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2024

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 5, 2024

Make test compatible with 32-bit as well

3cd6cde

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 5, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 5, 2024

bors merged commit 432fffa into rust-lang:master Jan 5, 2024
12 checks passed

rustbot added this to the 1.77.0 milestone Jan 5, 2024

rustbot removed the perf-regression Performance regression. label Jan 5, 2024

krtab mentioned this pull request Mar 31, 2024

Matching on ASCII digits no longer optimized #123305

Open
