rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

eddyb · 2022-06-28T10:33:59Z

As opaque pointers in LLVM become inevitable, let's review our usage of LLVM "struct" types:

representing Rust types
- ~~constants~~ (gone since miri, we just flatly serialize the mix of raw bytes and relocations)
- ~~pointee types~~ (will be gone with opaque pointers)
- alloca: only for size (we already always supply the alignment explicitly)
  - impact of replacing the type with [i8 x sizeof(T)] still needs to be measured, I have a sneaking suspicion that (even post-opaque-pointers) some LLVM passes use the type to make decisions
- getelementptr: only for offset computation
  - this PR replaces them with byte offsets, which might interfere with some optimizations (as I worry about alloca), but I doubt LLVM would have a hard time rewriting the GEP to fit the type (i.e. this might push back all the pessimizations to a future alloca de-typing)
  - my intuition back when I tried to get involved with the opaque pointers work, was that LLVM was wasting a lot of time (re)computing¹ offsets² when a more pragmatic IR would just put the constant offset in the instruction (at most you'd end up with what I call "integer dot product", for the full form that indexes nested arrays)
    _{¹can't be easily cached IIRC because types are interned in the Context but only a Module can specify a datalayout}
    _{²you want offsets as any decision based on "fields" would likely miss some optimization opportunities}
not representing Rust types (but automatically generated for e.g. ABI reasons)
- the common one here is "returning more than one value" (as LLVM doesn't support the feature itself, but forces emitters to go through the tuple-returning idiom)
- wherever possible, rustc_codegen_ssa's abstractions should work with e.g. SmallVec<[Bx::Value; 2]> (or have some sort of Bx::ValueBundle for handling trickier situations like invoke, where the invoke and the extracts are in different BBs)

I never got LLVM devs to agree with me that aggregate types in LLVM were a mistake, but if we can get away with not using them (outside of "multiple return values"), maybe that will make a difference.
_{(a commonly cited reason was debugging the IR, but wouldn't you want more readable DWARF metadata in that case? LLVM types are already quite lossy. also, the representation of LLVM constants I was floating back then in the context of removing LLVM aggregates, turned into a miri "abstract allocation" representation proposal, so it wasn't entirely a waste of time)}

A couple notes for context:

historically the lack of opaque pointers, and the (perceived?) need to represent Rust types (or rather, their layouts) using LLVM aggregates, had delayed (and complicated) what we now call the rustc ABI/layout system (thankfully we didn't end up needing cycle detection for "mutual recursion" groups)
I was looking into generalizing "scalar pair" layouts the other day, and the interaction with LLVM struct types is getting in the way a lot (both coming up with appropriate LLVM types for "scalar components", and subsequent GEPs)

Some additional concerns come from backends that are even higher-level than LLVM:

rustc_codegen_gcc (cc @antoyo)
- not super familiar with it, but I suspect it could also switch fully to untyped pointers
rustc_codegen_spirv
- currently takes advantage of being able to represent Rust struct types (and a limited selection of enums, based on layout) as SPIR-V structs
- already handles pointer-casting GEPs (i.e. it tries to recover a castless GEP based on the offset), so all it really needs is to see the layout in alloca and all the call ABI handling
  - that is, its backend Values would be tracking a richer SPIR-V type than rustc_codegen_ssa needs or wants for the backend Type, in an opaque-pointer/aggreggate-less world
- long-term the plan is to introduce a SPIR-V derivative IR that would more readily allow both representing the "untyped memory" semantics Rust wants, and legalizing it away (mostly into SSA values, but inferring "typed memory" is also an option for some things)

I'm not sure how well we can test the LLVM optimization impact of this PR, but to some extent a perf run might be able to reflect at least the impact on rustc's code.

cc @rust-lang/wg-llvm @bjorn3 @tmiasko

rust-highfive · 2022-06-28T10:34:03Z

Some changes occured to rustc_codegen_gcc

cc @antoyo

rustbot · 2022-06-28T10:34:03Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo

eddyb · 2022-06-28T10:36:06Z

@bors try @rust-timer queue

rust-timer · 2022-06-28T10:36:09Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-06-28T10:36:18Z

⌛ Trying commit 67d2c9f with merge 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f...

eddyb · 2022-06-28T10:53:52Z

r? @nikic

nikic · 2022-06-28T11:19:35Z

impact of replacing the type with [i8 x sizeof T] still needs to be measured, I have a sneaking suspicion that (even post-opaque-pointers) some LLVM passes use the type to make decisions

SROA does make use of the alloca type, though only a fallback. I'm not sure how important it is. The relevant code is here: https://github.com/llvm/llvm-project/blob/04dac2ca7c06d0ce173e53527e3b90a07e3b325d/llvm/lib/Transforms/Scalar/SROA.cpp#L4259

this PR replaces them with byte offsets, which might interfere with some optimizations (as I worry about alloca), but I doubt LLVM would have a hard time rewriting the GEP to fit the type (i.e. this might push back all the pessimizations to a future alloca de-typing)

If the offset is constant, this is okay in theory. LLVM is pretty good about not caring about GEP types when it comes to constant offsets (the same is not true if variable offsets are involved). Though the last time I actually tried to do this as an InstCombine canonicalization, I observed significant codegen impact, and didn't analyze it further at the time.

my intuition back when I tried to get involved with the opaque pointers work, was that LLVM was wasting a lot of time (re)computing1 offsets2 when a more pragmatic IR would just put the constant offset in the instruction (at most you'd end up with what I call "integer dot product", for the full form that indexes nested arrays)

Your intuition is correct.

I think generally this change makes sense, but I'm not sure whether now is the right time to make it -- we're currently still on LLVM 14 without opaque pointers, where this would have higher impact.

compiler/rustc_codegen_ssa/src/mir/place.rs

nikic · 2022-06-28T11:22:46Z

FWIW changing GEP representation to use offsets is on my long term roadmap, but it would be another major effort, and needs the typed pointer removal to be finalized first anyway.

bors · 2022-06-28T12:09:43Z

☀️ Try build successful - checks-actions
Build commit: 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f (2f9a464f3f179bba982ccfa5b3d354d32b6dc88f)

rust-timer · 2022-06-28T12:09:48Z

Queued 2f9a464f3f179bba982ccfa5b3d354d32b6dc88f with parent 64eb9ab, future comparison URL.

rust-timer · 2022-06-28T13:15:17Z

Finished benchmarking commit (2f9a464f3f179bba982ccfa5b3d354d32b6dc88f): comparison url.

Instruction count

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	0.8%	1.5%	8
Regressions 😿 (secondary)	3.3%	8.9%	11
Improvements 🎉 (primary)	-0.7%	-3.8%	14
Improvements 🎉 (secondary)	-1.5%	-2.5%	7
All 😿🎉 (primary)	-0.2%	-3.8%	22

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	10.2%	14.1%	5
Improvements 🎉 (primary)	-3.9%	-3.9%	1
Improvements 🎉 (secondary)	-5.2%	-6.3%	4
All 😿🎉 (primary)	-3.9%	-3.9%	1

Cycles

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	6.0%	12.1%	8
Improvements 🎉 (primary)	-3.2%	-3.2%	1
Improvements 🎉 (secondary)	-2.6%	-2.6%	1
All 😿🎉 (primary)	-3.2%	-3.2%	1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

eddyb · 2022-06-28T13:41:14Z

@Mark-Simulacrum ^^ is it normal that there's nothing under the "Bootstrap timings" heading?

Either way, pretty mixed results, though it seems like opt benefits and debug loses (at least when the amplitude is high enough, there's probably second-order effects in there as well).

Maybe we need opaque pointers before this can work well enough? The extra casts might be a thorn in the side of debug mode.

Sadly, looking at the opaque pointers perf run, seems like it would be a bit of a pain to do a fair comparison (since it would have to be opaque-pointers vs opaque-pointers+this PR).

lqd · 2022-06-28T14:17:17Z

@Mark-Simulacrum ^^ is it normal that there's nothing under the "Bootstrap timings" heading?

It seems rustc is currently broken on the perfbot https://perf.rust-lang.org/status.html

antoyo · 2022-06-28T14:33:43Z

rustc_codegen_gcc (cc @antoyo)

not super familiar with it, but I suspect it could also switch fully to untyped pointers

I'm assuming the API to things accessing pointers like load() will still carry the type information, so that the GCC codegen will be able to cast to the right pointer type. Is this correct?

eddyb · 2022-06-28T19:50:13Z

I'm assuming the API to things accessing pointers like load() will still carry the type information, so that the GCC codegen will be able to cast to the right pointer type. Is this correct?

Oh, right, just accounting for opaque pointer types would probably make what I'm doing here work.
(With opaque pointers, pointee types are provided to alloca/GEP/load but not tracked in the pointer type)

I suppose I should ask if GCC will treat ((char*)base) + offset much worse than &((Foo*)base)->field when they're equivalent (i.e. offset == offsetof(Foo, field)).
And separately/on top of that, whether declaring stack locals (LLVM alloca) as char[sizeof(Foo)] is worse than just Foo (not relevant to this PR, just the logical next step).

IOW, are types just used as a impractical proxy for the underlying untyped memory semantics or are they actually special-cased? (for LLVM it can vary per pass but generally it leans towards the former)

eddyb · 2022-06-29T09:47:01Z

Okay at least deep-vector seems to be one small file with no dependencies, so at least profiling LLVM and whatnot should be maximally doable (but from some quick testing I will need release-debuginfo = true, haven't done a local LLVM build in a while, this is going to be fun).

nikic · 2022-09-09T19:00:22Z

I think the numbers here are good enough to land this. I expect this will have mixed improvements and regressions in codegen. We can address reported regressions in the next LLVM release. See #101210 for an example where this should be an improvement, and I've been seeing patterns like this quite often recently.

r=me with that one fixme dropped.

Kobzol · 2022-09-27T07:49:49Z

Can we add a codegen test to check if this helps with #101210? Or should it be in a follow-up PR?

compiler/rustc_codegen_ssa/src/mir/place.rs

nikic · 2022-12-17T09:12:05Z

As it's been quite a while, rerunning perf to sanity check.

@bors try @rust-timer queue

bors · 2022-12-17T09:12:45Z

⌛ Trying commit 4d803a0 with merge 8456a3d8aa534b17f06fbd8ec8caa743630805e5...

bors · 2022-12-17T11:53:02Z

☀️ Try build successful - checks-actions
Build commit: 8456a3d8aa534b17f06fbd8ec8caa743630805e5 (8456a3d8aa534b17f06fbd8ec8caa743630805e5)

rust-timer · 2022-12-17T13:12:28Z

Finished benchmarking commit (8456a3d8aa534b17f06fbd8ec8caa743630805e5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.3%, 0.4%]	5
Regressions ❌ (secondary)	0.4%	[0.2%, 0.9%]	16
Improvements ✅ (primary)	-0.8%	[-1.5%, -0.4%]	13
Improvements ✅ (secondary)	-1.5%	[-3.6%, -0.5%]	7
All ❌✅ (primary)	-0.5%	[-1.5%, 0.4%]	18

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.3%	[2.3%, 2.3%]	1
Regressions ❌ (secondary)	3.6%	[2.5%, 5.6%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.3%	[2.3%, 2.3%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.3%	[-1.8%, -1.0%]	9
Improvements ✅ (secondary)	-2.1%	[-3.6%, -1.1%]	5
All ❌✅ (primary)	-1.3%	[-1.8%, -1.0%]	9

erikdesjardins · 2023-01-07T16:59:54Z

@nikic the fixme comment has been removed and I think perf still looks reasonable, are there any other changes you want here?

nikic · 2023-03-11T09:02:54Z

For the record, https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699?u=nikic is the LLVM proposal to move to purely offset-based GEPs.

WaffleLapkin · 2023-03-12T21:22:13Z

What's the status here? Are merge conflicts the last thing that is stopping this from being merged? Are the next steps [rebase, re-check perf, r=nikic]?

Dylan-DPC · 2023-05-14T08:18:52Z

What's the status here? Are merge conflicts the last thing that is stopping this from being merged? Are the next steps [rebase, re-check perf, r=nikic]?

@WaffleLapkin this is marked as blocked on rust-lang/rustc-perf#1345 as per this comment though i'm not sure if that's still a concern

Stop using LLVM struct types for alloca, byval, sret, and many GEPs This is an extension of rust-lang#98615, extending the removal from field offsets to most places that it's feasible right now. (It might make sense to split this PR up, but I want to test perf with everything.) For `alloca`, `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's layout algorithm. Particularly for `alloca`, it is likely that a future LLVM will change to a representation where you only specify the size. For GEPs, upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which is the same thing we do here. \*: Since we always explicitly specify the alignment. For `byval`, this wasn't the case until rust-lang#112157. †: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue here. r? `@ghost`

Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Split out from rust-lang#121577. r? `@nikic`

Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Fixes rust-lang#121719. Split out from rust-lang#121577. r? `@nikic`

erikdesjardins · 2024-03-06T00:00:18Z

This change was landed in #121665, so this PR can be closed.

rustc_codegen_ssa: don't use LLVM struct types for field offsets.

67d2c9f

rust-highfive assigned lcnr Jun 28, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 28, 2022

This comment was marked as outdated.

Sign in to view

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 28, 2022

This comment was marked as resolved.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 28, 2022

This comment was marked as outdated.

Sign in to view

rust-highfive assigned jackh726 and unassigned lcnr Jun 28, 2022

nikic reviewed Jun 28, 2022

View reviewed changes

compiler/rustc_codegen_ssa/src/mir/place.rs Show resolved Hide resolved

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 28, 2022

nikic mentioned this pull request Sep 1, 2022

Regression: Inlined loops not fully evaluating with const-eable dependencies #101082

Closed

eddyb mentioned this pull request Dec 4, 2022

non-power-of-2 Simd types have wrong size rust-lang/portable-simd#319

Closed

pnkfelix reviewed Dec 16, 2022

View reviewed changes

compiler/rustc_codegen_ssa/src/mir/place.rs Outdated Show resolved Hide resolved

Update compiler/rustc_codegen_ssa/src/mir/place.rs

4d803a0

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 17, 2022

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 17, 2022

Dylan-DPC removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 21, 2023

nikic mentioned this pull request Jan 2, 2024

Separate immediate and in-memory ScalarPair representation #118991

Merged

erikdesjardins mentioned this pull request Feb 25, 2024

Stop using LLVM struct types for alloca, byval, sret, and many GEPs #121577

Closed

erikdesjardins mentioned this pull request Feb 27, 2024

Always generate GEP i8 / ptradd for struct offsets #121665

Merged

nikic closed this Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

eddyb commented Jun 28, 2022 •

edited

Loading

rust-highfive commented Jun 28, 2022

rustbot commented Jun 28, 2022

This comment was marked as outdated.

eddyb commented Jun 28, 2022

This comment was marked as resolved.

rust-timer commented Jun 28, 2022

bors commented Jun 28, 2022

This comment was marked as outdated.

eddyb commented Jun 28, 2022

nikic commented Jun 28, 2022

nikic commented Jun 28, 2022

bors commented Jun 28, 2022

rust-timer commented Jun 28, 2022

rust-timer commented Jun 28, 2022

eddyb commented Jun 28, 2022

lqd commented Jun 28, 2022

antoyo commented Jun 28, 2022

eddyb commented Jun 28, 2022

eddyb commented Jun 29, 2022

nikic commented Sep 9, 2022

Kobzol commented Sep 27, 2022

nikic commented Dec 17, 2022

This comment has been minimized.

bors commented Dec 17, 2022

bors commented Dec 17, 2022

This comment has been minimized.

rust-timer commented Dec 17, 2022

erikdesjardins commented Jan 7, 2023 •

edited

Loading

nikic commented Mar 11, 2023

WaffleLapkin commented Mar 12, 2023

Dylan-DPC commented May 14, 2023 •

edited

Loading

erikdesjardins commented Mar 6, 2024

rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

rustc_codegen_ssa: don't use LLVM struct types for field offsets. #98615

Conversation

eddyb commented Jun 28, 2022 • edited Loading

rust-highfive commented Jun 28, 2022

rustbot commented Jun 28, 2022

This comment was marked as outdated.

eddyb commented Jun 28, 2022

This comment was marked as resolved.

rust-timer commented Jun 28, 2022

bors commented Jun 28, 2022

This comment was marked as outdated.

eddyb commented Jun 28, 2022

nikic commented Jun 28, 2022

nikic commented Jun 28, 2022

bors commented Jun 28, 2022

rust-timer commented Jun 28, 2022

rust-timer commented Jun 28, 2022

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

eddyb commented Jun 28, 2022

lqd commented Jun 28, 2022

antoyo commented Jun 28, 2022

eddyb commented Jun 28, 2022

eddyb commented Jun 29, 2022

nikic commented Sep 9, 2022

Kobzol commented Sep 27, 2022

nikic commented Dec 17, 2022

This comment has been minimized.

bors commented Dec 17, 2022

bors commented Dec 17, 2022

This comment has been minimized.

rust-timer commented Dec 17, 2022

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

erikdesjardins commented Jan 7, 2023 • edited Loading

nikic commented Mar 11, 2023

WaffleLapkin commented Mar 12, 2023

Dylan-DPC commented May 14, 2023 • edited Loading

erikdesjardins commented Mar 6, 2024

eddyb commented Jun 28, 2022 •

edited

Loading

erikdesjardins commented Jan 7, 2023 •

edited

Loading

Dylan-DPC commented May 14, 2023 •

edited

Loading