Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using LLVM struct types for alloca #122053

Merged
merged 5 commits into from Apr 24, 2024
Merged

Conversation

erikdesjardins
Copy link
Contributor

The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using [N x i8] is a more direct way to specify that we want N bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.

Split out from #121577.

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 6, 2024
@erikdesjardins
Copy link
Contributor Author

Test changes will conflict with #122050.

Additionally, I believe SROA still uses the alloca type in some cases, so this may cause perf regressions, and hence not be viable, until LLVM 19 (or later).

@nikic
Copy link
Contributor

nikic commented Mar 6, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
@bors
Copy link
Contributor

bors commented Mar 6, 2024

⌛ Trying commit 26be569 with merge 52e34ca...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
Stop using LLVM struct types for alloca

The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.

Split out from rust-lang#121577.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Mar 6, 2024

☀️ Try build successful - checks-actions
Build commit: 52e34ca (52e34ca98c60292a68ff4f7b01a3bb5d813cda61)

1 similar comment
@bors
Copy link
Contributor

bors commented Mar 6, 2024

☀️ Try build successful - checks-actions
Build commit: 52e34ca (52e34ca98c60292a68ff4f7b01a3bb5d813cda61)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (52e34ca): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.2%] 2
Regressions ❌
(secondary)
0.3% [0.2%, 0.4%] 6
Improvements ✅
(primary)
-1.6% [-1.6%, -1.6%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-1.6%, 0.2%] 3

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.4% [-2.0%, -1.0%] 3
Improvements ✅
(secondary)
-2.3% [-2.9%, -1.7%] 2
All ❌✅ (primary) -1.4% [-2.0%, -1.0%] 3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 647.585s -> 646.111s (-0.23%)
Artifact size: 175.07 MiB -> 175.01 MiB (-0.03%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 6, 2024
@compiler-errors
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 15, 2024
@bors
Copy link
Contributor

bors commented Mar 15, 2024

⌛ Trying commit 8536da4 with merge 7a9b98b...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 15, 2024
Stop using LLVM struct types for alloca

The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.

Split out from rust-lang#121577.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Mar 15, 2024

☀️ Try build successful - checks-actions
Build commit: 7a9b98b (7a9b98bb9b00c608f4174583891c151b87723e9f)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7a9b98b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 0.6%] 9
Regressions ❌
(secondary)
0.7% [0.2%, 3.7%] 9
Improvements ✅
(primary)
-1.7% [-1.7%, -1.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-1.7%, 0.6%] 10

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.5% [-2.5%, -2.5%] 1
Improvements ✅
(secondary)
-2.1% [-2.5%, -1.7%] 2
All ❌✅ (primary) -2.5% [-2.5%, -2.5%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.8% [3.8%, 3.9%] 2
Improvements ✅
(primary)
-1.1% [-1.1%, -1.1%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.1% [-1.1%, -1.1%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 671.201s -> 669.516s (-0.25%)
Artifact size: 311.47 MiB -> 311.44 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 15, 2024
@erikdesjardins
Copy link
Contributor Author

Both perf runs look somewhat consistent. In terms of instruction count:

Improvements to exa. These look legit, cachegrind diffs show LLVM does less work. The resulting exa opt binaries are the same size (within 8 bytes), so this seems to be a real improvement and not because some optimization was prevented.

Regressions on a few other benchmarks--stm32f4, cranelift-codegen, tuple-stress, and externs show up in both perf runs. These also look real--the cachegrind diffs for those four all look something like:

--------------------------------------------------------------------------------
-- Function:file summary
--------------------------------------------------------------------------------
  Ir_________  function:file

>  25,868,377  <rustc_lexer::cursor::Cursor>::advance_token:???
> -16,715,810  <hashbrown::raw::RawTable<(rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>:???
>  16,715,810  <hashbrown::raw::RawTable<(rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>:???
>  -4,391,303  <rustc_metadata::creader::CrateMetadataRef>::opt_item_name:???
>   4,064,505  rustc_metadata::rmeta::decoder::cstore_impl::provide_extern::associated_item:???
>   1,188,404  <rustc_span::span_encoding::Span>::to:???
>  -1,179,620  <rustc_parse::parser::Parser>::parse_expr_assoc_with:???
...

that is, a bunch of inlining noise, and a real regression in rustc_lexer::cursor::Cursor::advance_token. It seems like this change causes advance_token to consistently execute more instructions.

Looking at cycle count, while exa still has a significant improvement, the cycle count changes for stm32f4, cranelift-codegen, tuple-stress, and externs are all well below the significance threshold (some tiny improvements, some tiny regressions, looks like noise). So perhaps advance_token isn't actually slower, it just gets codegenned in a way that executes more instructions.

Looking at the asm diff for advance_token, a lot of code is moved around, but there don't seem to be any significant changes, except for this diff at an early exit near the beginning of the function (although I could have missed something--full diff here, after normalizing constants etc.):

 	mov    rdx,QWORD PTR [rsi]
 	mov    rdi,QWORD PTR [rsi+0xXXXX]
 	mov    ecx,DWORD PTR [rsi+0xXXXX]
 	add    ecx,edx
 	sub    ecx,edi
+	shl    r15,0x20
+	movzx  r8d,r13b
+	shl    r8d,0x18
+	or     r8,r15
+	movzx  r9d,r12b
+	shl    r9d,0x10
+	or     r9,r8
+	shl    r14d,0x8
+	movzx  r8d,r14w
+	or     r8,r9
+	movzx  r9d,bpl
+	or     r9,r8
 	sub    rdi,rdx
 	mov    QWORD PTR [rsi+0xXXXX],rdi
-	mov    BYTE PTR [rax],r11b
-	mov    BYTE PTR [rax+0xXXXX],bpl
-	mov    BYTE PTR [rax+0xXXXX],r14b
-	mov    BYTE PTR [rax+0xXXXX],r15b
-	mov    DWORD PTR [rax+0xXXXX],r12d
+	mov    QWORD PTR [rax],r9
 	mov    DWORD PTR [rax+0xXXXX],ecx
 	add    rsp,0x28
 	pop    rbx

This uses 7 more instructions, but it combines 5 stores into 1, which seems better.

So, given that, and the lack of regressions in cycle count on those four benchmarks, I think this is not actually a regression, and we can go ahead with this change.

@erikdesjardins erikdesjardins marked this pull request as ready for review March 15, 2024 19:15
@rustbot
Copy link
Collaborator

rustbot commented Mar 15, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 12, 2024
Stop using LLVM struct types for alloca

The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.

Split out from rust-lang#121577.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Apr 12, 2024

⌛ Testing commit daaaacd with merge 662ae7e...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 12, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 12, 2024
@nikic
Copy link
Contributor

nikic commented Apr 15, 2024

Looks like there are two extra variants of the stack protector test for Windows, which will need the same updates as the Linux test.

@RalfJung
Copy link
Member

@bors r-
(PR got re-queued)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 17, 2024
@erikdesjardins
Copy link
Contributor Author

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 23, 2024
@nikic
Copy link
Contributor

nikic commented Apr 24, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Apr 24, 2024

📌 Commit 6df27ef has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 24, 2024
@bors
Copy link
Contributor

bors commented Apr 24, 2024

⌛ Testing commit 6df27ef with merge 29a56a3...

@bors
Copy link
Contributor

bors commented Apr 24, 2024

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 29a56a3 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 24, 2024
@bors bors merged commit 29a56a3 into rust-lang:master Apr 24, 2024
13 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 24, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (29a56a3): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.3%] 8
Regressions ❌
(secondary)
0.4% [0.2%, 1.1%] 17
Improvements ✅
(primary)
-1.9% [-1.9%, -1.9%] 1
Improvements ✅
(secondary)
-0.4% [-0.4%, -0.4%] 1
All ❌✅ (primary) -0.0% [-1.9%, 0.3%] 9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.8% [-2.8%, -2.8%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.8% [-2.8%, -2.8%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.7% [1.5%, 2.0%] 3
Improvements ✅
(primary)
-1.2% [-1.2%, -1.2%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.2% [-1.2%, -1.2%] 1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.2%, -0.2%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.2% [-0.2%, -0.2%] 1

Bootstrap: 675.097s -> 673.216s (-0.28%)
Artifact size: 316.24 MiB -> 315.45 MiB (-0.25%)

@erikdesjardins erikdesjardins deleted the alloca branch April 24, 2024 12:10
@erikdesjardins
Copy link
Contributor Author

Spot-checking a few of those instruction count regressions, e.g. unicode-normalization:

--------------------------------------------------------------------------------
-- Function:file summary
--------------------------------------------------------------------------------
  Ir_______  function:file

> 2,007,011  <rustc_lexer::cursor::Cursor>::advance_token:???
>   927,357  <rustc_parse::parser::Parser>::collect_tokens_for_expr::<<rustc_parse::parser::Parser>::parse_expr_prefix::{closure#7}>::{closure#0}:???
>  -859,667  <rustc_parse::parser::Parser>::parse_expr_prefix::{closure#7}:???
>   321,256  <rustc_span::span_encoding::Span>::to:???
>  -263,369  <rustc_parse::parser::Parser>::parse_expr_assoc_with:???
>  -127,459  <rustc_parse::lexer::tokentrees::TokenTreesReader>::bump:???
>   -66,640  <rustc_parse::parser::Parser>::parse_expr_prefix:???
>    43,746  <rustc_parse::parser::Parser>::parse_expr_dot_or_call:???
...

this looks like the same thing I investigated above (#122053 (comment)), where we merge 5 stores into 1 in advance_token, which is better, but involves using more instructions.

So given the relative lack of cycle count changes (a small improvement on one benchmark, and a small regression on another benchmark), I think this change doesn't have any significant perf impact.

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Apr 29, 2024

Edit: Deleting comment, meant to leave it on another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet