
Return values up to 128 bits in registers #76986

Merged · 2 commits · Sep 27, 2020

Conversation

@jonas-schievink (Contributor) commented Sep 20, 2020

This fixes #26494 (comment) by making Rust's default ABI pass return values up to 128 bits in size in registers, just like the System V ABI.

The result is that these functions from the comment linked above now generate identical code, making the Rust ABI as efficient as the "C" ABI:

pub struct Stats { x: u32, y: u32, z: u32, }

pub extern "C" fn sum_c(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}

pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}
Both now compile to:

sum_rust:
	movl	(%rsi), %eax
	addl	(%rdi), %eax
	movl	4(%rsi), %ecx
	addl	4(%rdi), %ecx
	movl	8(%rsi), %edx
	addl	8(%rdi), %edx
	shlq	$32, %rcx
	orq	%rcx, %rax
	retq

@rust-highfive (Collaborator) commented:

r? @estebank

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Sep 20, 2020
@jonas-schievink (Contributor, Author) commented:

r? @nagisa perhaps? cc @eddyb

@rust-highfive assigned nagisa and unassigned estebank on Sep 20, 2020
@Mark-Simulacrum (Member) commented:

Is there a reason not to do this on all x86_64 targets?

@bors try @rust-timer queue

@rust-timer (Collaborator) commented:

Awaiting bors try build completion

@bors (Contributor) commented Sep 20, 2020

⌛ Trying commit 1be289d9c76441135b72186b133e949411657024 with merge 90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25...

@bors (Contributor) commented Sep 20, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25

@rust-timer (Collaborator) commented:

Queued 90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25 with parent 1fd5b9d, future comparison URL.

@rust-timer (Collaborator) commented:

Finished benchmarking try commit (90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup.

@bors rollup=never

@jonas-schievink (Contributor, Author) commented:

Hey, that's neat: the improvements are in the compiler itself, so this seems to have a practical impact! It looks like most regressions are in LLVM, which is somewhat expected, since it might deal with the code differently now.

@calebsander (Contributor) commented:

pub struct Stats { x: u32, y: u32, z: u32, }

pub extern "C" fn sum_c(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}
example::sum_c:
        mov     rax, qword ptr [rsi]
        add     rax, qword ptr [rdi]
        mov     edx, dword ptr [rsi + 8]
        add     edx, dword ptr [rdi + 8]
        mov     cl, byte ptr [rsi + 12]
        add     cl, byte ptr [rdi + 12]
        movzx   ecx, cl
        shl     rcx, 32
        or      rdx, rcx
        ret

The generated assembly here doesn't seem to match the code. It looks like you've used struct Stats { x: u64, y: u32, z: u8 } instead? The assembly I get (using the posted code) is:

example::sum_c:
        mov     eax, dword ptr [rsi]
        add     eax, dword ptr [rdi]
        mov     ecx, dword ptr [rsi + 4]
        add     ecx, dword ptr [rdi + 4]
        mov     edx, dword ptr [rsi + 8]
        add     edx, dword ptr [rdi + 8]
        shl     rcx, 32
        or      rax, rcx
        ret

@jonas-schievink (Contributor, Author) commented:

@calebsander ah, sorry, my mistake. Updated the snippet.

@pftbest (Contributor) commented Sep 21, 2020

Why not make this change for all 64-bit platforms? All major ones do this in C: https://godbolt.org/z/ds1ezh

@jonas-schievink (Contributor, Author) commented:

Because I only know x86_64.

@nagisa (Member) commented Sep 26, 2020

Can you add a codegen test for this that verifies we don't materialize the return value into stack/memory?

r=me after that.
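
For reference, a codegen test of that shape might look roughly like the sketch below. It follows the usual rustc codegen-test conventions (compiletest headers plus FileCheck CHECK lines over the emitted LLVM IR), but the function name and exact patterns here are illustrative assumptions, not the test that actually landed:

// compile-flags: -O
// only-x86_64

#![crate_type = "lib"]

pub struct Stats {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

// The return value should stay in registers: the function must not take a
// hidden `sret` out-pointer argument in the emitted LLVM IR.
// CHECK: define{{.*}}@sum(
// CHECK-NOT: sret
#[no_mangle]
pub fn sum(a: &Stats, b: &Stats) -> Stats {
    Stats { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z }
}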

@jonas-schievink (Contributor, Author) commented:

@bors r=nagisa

@bors (Contributor) commented Sep 26, 2020

📌 Commit cc2ba3b has been approved by nagisa

@bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) on Sep 26, 2020
@bors (Contributor) commented Sep 27, 2020

⌛ Testing commit cc2ba3b with merge 62fe055...

@bors (Contributor) commented Sep 27, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: nagisa
Pushing 62fe055 to master...

@bors added the merged-by-bors label (This PR was explicitly merged by bors.) on Sep 27, 2020
@bors merged commit 62fe055 into rust-lang:master on Sep 27, 2020
@rustbot added this to the 1.48.0 milestone on Sep 27, 2020
@ecstatic-morse (Contributor) commented:

The final perf results for this PR are in. Instruction counts have increased on most benchmarks, and task-clock shows no improvement.

This is a bit disappointing, as I expected this PR to be a pretty clear win. The try run also showed small losses across the board (never trust the emoji), although stress tests of the trait resolution code (keccak and inflate) seemed to fare better than they did in the final run. Possibly the change in #77041 caused the discrepancy? @jonas-schievink Am I missing anything? If not, I think we should consider reverting this unless there are benefits outside of perf.

@jonas-schievink (Contributor, Author) commented:

Oh, that's disappointing. But this PR was mostly aimed at improving the generated code, not speeding up rustc.

Seems likely that #77041 has resulted in the same improvements I saw here.

@pftbest (Contributor) commented Sep 29, 2020

It looks to me like the only heavy operation here is a string comparison for spec.target_spec().arch.as_str().

Can it be removed somehow? For example, replace it with a check for pointer size == 64.
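
To make the suggestion concrete, a hypothetical variant of the max_ret_by_val helper added by this PR (quoted in the review thread below) could key off the pointer width instead of the architecture string. This is only a sketch of that idea, reusing the Size, HasDataLayout, and HasTargetSpec items from the surrounding compiler code, not code from the PR:

// Hypothetical rewrite: decide by pointer width rather than by comparing the
// architecture name string.
pub fn max_ret_by_val<C: HasTargetSpec + HasDataLayout>(spec: &C) -> Size {
    let ptr_size = spec.data_layout().pointer_size;
    if ptr_size.bits() == 64 {
        // Up to two 64-bit registers' worth of return value.
        Size::from_bits(128)
    } else {
        ptr_size
    }
}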

@jonas-schievink (Contributor, Author) commented:

The regressions are in LLVM from what I can tell, not in the code I added.

@pftbest (Contributor) commented Sep 29, 2020

Oh I see, that's too bad.

@ecstatic-morse (Contributor) commented:

Ah that's true. It seems like rustc just doesn't benefit from this very much at runtime. Here's the task-clock graph starting from the parent of this commit. Obviously there's a lot of noise here, but check builds remained about the same while codegen-ed builds regressed slightly. That said, I would guess there's code in the ecosystem that would benefit from this, and the compile-time regressions are pretty small.

Comment on lines +614 to +624
/// Returns the maximum size of return values to be passed by value in the Rust ABI.
///
/// Return values beyond this size will use an implicit out-pointer instead.
pub fn max_ret_by_val<C: HasTargetSpec + HasDataLayout>(spec: &C) -> Size {
    match spec.target_spec().arch.as_str() {
        // System-V will pass return values up to 128 bits in RAX/RDX.
        "x86_64" => Size::from_bits(128),

        _ => spec.data_layout().pointer_size,
    }
}
A Member commented:

Is there anything wrong with 2 * pointer_size? IIRC we already return pairs in two registers.

A Member commented:

That is, we already do 2 * pointer_size on all architectures, just not for arbitrary data, only pairs of scalars.

@jonas-schievink (Contributor, Author) commented:

Yeah, that makes sense. Opened #77434.
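
For illustration, the 2 * pointer_size variant discussed above might look roughly like the sketch below; the actual change went in via #77434 and may differ in detail:

// Sketch: allow return values up to two pointer-widths by value on every
// target, instead of special-casing x86_64.
pub fn max_ret_by_val<C: HasTargetSpec + HasDataLayout>(spec: &C) -> Size {
    Size::from_bits(spec.data_layout().pointer_size.bits() * 2)
}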

@jonas-schievink deleted the ret-in-reg branch on October 1, 2020, 22:32
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 3, 2020
…-boogalo, r=nagisa

Returns values up to 2*usize by value

Addresses rust-lang#76986 (comment) and rust-lang#76986 (comment) by doing the optimization on all targets.

This matches what we do for functions returning `&[T]` and other fat pointers, so it should be Harmless™
Labels: merged-by-bors (This PR was explicitly merged by bors.), S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.)

Successfully merging this pull request may close these issues:

Rust should use registers more aggressively