Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor type memory layouts and ABIs, to be more general and easier to optimize. #45225

Merged
merged 69 commits into from
Nov 20, 2017

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Oct 12, 2017

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:

  • Variants describes the plurality of sum types (where applicable)
    • Single is for one inhabited/active variant, including all C structs and unions
    • Tagged has its variants discriminated by an integer tag, including C enums
    • NicheFilling uses otherwise-invalid values ("niches") for all but one of its inhabited variants
  • FieldPlacement describes the number and memory offsets of fields (if any)
    • Union has all its fields at offset 0
    • Array has offsets that are a multiple of its stride; guarantees all fields have one type
    • Arbitrary records all the field offsets, which can be out-of-order
  • Abi describes how values of the type should be passed around, including for FFI
    • Uninhabited corresponds to no values, associated with unreachable control-flow
    • Scalar is ABI-identical to its only integer/floating-point/pointer "scalar component"
    • ScalarPair has two "scalar components", but only applies to the Rust ABI
    • Vector is for SIMD vectors, typically #[repr(simd)] structs in Rust
    • Aggregate has arbitrary contents, including all non-transparent C structs and unions

Size optimizations implemented so far:

  • ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
    • Option<!> is 0 bytes
    • Result<T, !> has the same size as T
  • using arbitrary niches, not just 0, to represent a data-less variant, e.g.:
    • Option<bool>, Option<Option<bool>>, Option<Ordering> are all 1 byte
    • Option<char> is 4 bytes
  • using a range of niches to represent multiple data-less variants, e.g.:
    • enum E { A(bool), B, C, D } is 1 byte

Code generation now takes advantage of Scalar and ScalarPair to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@eddyb
Copy link
Member Author

eddyb commented Oct 12, 2017

Started a crater run (we'll run cargobomb later, but that takes longer).

@Gankra
Copy link
Contributor

Gankra commented Oct 12, 2017

holy shit

@scottmcm
Copy link
Member

Wow! I think this might fix #43278 too.

} else if let layout::Abi::ScalarPair(ref a, ref b) = self.layout.abi {
// Offsets have to match either first or second field.
assert_eq!(offset, a.value.size(ccx).abi_align(b.value.align(ccx)));
bcx.struct_gep(self.llval, 1)
Copy link
Contributor

@arielb1 arielb1 Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this always 1? Because you do a pointercast instead of a gep in the offset 0 case?

Copy link
Member Author

@eddyb eddyb Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the assert ensures it.

@@ -1,14 +0,0 @@
// Copyright 2013 The Rust Project Developers. See the COPYRIGHT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those things that don't trigger a typeck error but is totally broken, and I didn't want to deal with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "totally broken" mean in this context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something hid the ICE and #30276 was closed, but that was accidental. This is more or less a bug in WF-checking of function arguments in signatures, which @nikomatsakis somewhat insisted on keeping around, which allows fn(Unsized) to even exist (it shouldn't, for now).

@@ -54,9 +54,6 @@ pub struct PackedPair(u8, u32);
// CHECK-LABEL: @pkd_pair
#[no_mangle]
pub fn pkd_pair(pair1: &mut PackedPair, pair2: &mut PackedPair) {
// CHECK: [[V1:%[a-z0-9]+]] = load i8, i8* %{{.*}}, align 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the case this test was intended to check unreachable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, yeah, ScalarPair only works by having the placement of the second scalar at an ABI-aligned offset, so it's incompatible with packing.

}
/// Describes how the fields of a type are located in memory.
#[derive(PartialEq, Eq, Hash, Debug)]
pub enum FieldPlacement {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do "general" enums use FieldPlacement::Union for the discriminant only and "niche" enums use FieldPlacement::Arbitrary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because a tag is always at offset 0 whereas a niche can be at some offset - we could have a variant of FieldPlacement that's just one Size to avoid the Vec but that's about it.

Copy link
Contributor

@arielb1 arielb1 Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why we can't use FieldPlacement::Arbitrary for both of them? "Normal" structs with one field don't suddenly use FieldPlacement::Union because they can.

Copy link
Member Author

@eddyb eddyb Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, FieldPlacement::Arbitrary is a bit of a waste, and I considered using FieldPlacement::Union for normal structs too, but then I have checks for Union in some places which would conflict with that now. I think you're right and I should make the change here.

EDIT: done.

EDIT2: That's funny, not using FieldPlacement::Union results in some missed optimizations (codegen/{alloc-optimisation,vec-optimizes-away}). I'll try to see what I can do about it.

EDIT3: None of the tricks I've tried work, I'll just leave it like this to avoid pessimizing tagged enums for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to that effect?

@eddyb eddyb force-pushed the trans-abi branch 2 times, most recently from c0e7215 to 08eb7c7 Compare October 12, 2017 23:01
@eddyb
Copy link
Member Author

eddyb commented Oct 12, 2017

I've just moved Uninhabited from Variants to Abi - that way, we can still compute field types for uninhabited types, but also signal the fact that there's no valid values anyway.

@eddyb eddyb force-pushed the trans-abi branch 2 times, most recently from a5b5722 to d595119 Compare October 12, 2017 23:51
@eddyb
Copy link
Member Author

eddyb commented Oct 12, 2017

@scottmcm Your original example from #43278 compiles to a single ret i32 %0 with this PR.

@mrhota
Copy link
Contributor

mrhota commented Oct 13, 2017

@eddyb also would seem to fix #5977

any relevance to #14540? #6791? And am I right in thinking this work puts us on a path for doing some of the optimizations here rust-lang/rfcs#1230?

For travelers from the future, also see the older, far less ambitious PRs #31215 and #36237.

@bors
Copy link
Contributor

bors commented Oct 13, 2017

☔ The latest upstream changes (presumably #45031) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb
Copy link
Member Author

eddyb commented Oct 13, 2017

@mrhota Are you sure you meant #6791? That issue was closed more than three years ago.

The problem with #31215 and #36237 was that they only added code where this PR is a net negative.

This PR implements most of rust-lang/rfcs#1230, the description there is confused in places (i.e. it's a list of special cases, but several of them happen to be the exact same optimization, just scattered).
The cases I couldn't do here:

  • Option<Option<(&T, &U)>> requires us to initialize the second &U with a non-zero value for the inner Option, which we don't do right now (and could be expensive in general)
  • enum E { A(&T / Enum / etc., A...), B(B...), C(C...) } (where A... takes up at least as much space as each of B... and C...) requires trickier search logic, and would be hard/impossible to integrate with the current rustc_trans::mir::constant, we'd have to wait for miri to experiment with it
  • enum F { A(T, A), B(T, B) } as (T, enum G { A(A), B(B) }) where we already are able to optimize G - this includes enum Foo { A(&u8, &u8), B(&u8) } and many others - requires effectively a 2D layout process (although, just like the previous case, it's tricky and there may be many valid approaches) and will likely need a bunch more caching, around niches probably

EDIT: for an updated view on these hard cases, see rust-lang/rfcs#1230 (comment).

My approach with this PR has been to avoid doing anything that would pessimize codegen, and one example of this is that enum X { A(bool), B, C } and enum Y { A, B, C(bool) } use (invalid for bool) values 2 and 3 for the nullary variants, but enum Z { A, B(bool), C } uses 2 and 4, wasting 3, which (if you were to transmute it) matches B, to avoid a more runtime-expensive value remapping when the discriminant value is needed (e.g. for every match).

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Oct 13, 2017
@eddyb
Copy link
Member Author

eddyb commented Oct 13, 2017

I can't reproduce the Travis failure locally, even after commenting out most of my config.toml 😞

@eddyb
Copy link
Member Author

eddyb commented Oct 13, 2017

@alexcrichton @Mark-Simulacrum I hope you don't mind that I just pushed a temporary commit to build on Travis without LLVM 3.7, to check if it's some old LLVM bug getting triggered.

@eddyb eddyb force-pushed the trans-abi branch 2 times, most recently from 4c4d081 to 5d92b1a Compare October 13, 2017 17:40
@eddyb
Copy link
Member Author

eddyb commented Oct 13, 2017

Confirmed successful build with newer (4.0?) LLVM. I'll try to reproduce locally.

@eddyb
Copy link
Member Author

eddyb commented Oct 13, 2017

@alexcrichton @Mark-Simulacrum May I suggest outputting a signal number by default?
Currently, all I see (for example, in Travis CI) is:

error: Could not compile `quote`.

But if I turn verbose mode on, there's an extra (manually wrapped for your convenience):

Caused by:
  process didn't exit successfully: `
/home/eddy/Projects/rust-2/build/bootstrap/debug/rustc --crate-name quote
/home/eddy/.cargo/registry/src/github.com-1ecc6299db9ec823/quote-0.3.15/src/lib.rs
--crate-type lib --emit=dep-info,link -C opt-level=2 -C metadata=396e074f31421a34
-C extra-filename=-396e074f31421a34
--out-dir /home/eddy/Projects/rust-2/build/x86_64-unknown-linux-gnu/stage2-tools/release/deps
-L dependency=/home/eddy/Projects/rust-2/build/x86_64-unknown-linux-gnu/stage2-tools/release/deps
--cap-lints allow` (signal: 11, SIGSEGV: invalid memory reference)

Printing just "rustc exited with signal: 11, SIGSEGV: invalid memory reference" by default would be really great, although I'd understand if it's too much of an edge case to bother with.

tromey added a commit to tromey/rust that referenced this pull request Sep 21, 2018
The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR rust-lang#45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes rust-lang#32920
Closes rust-lang#32924
Closes rust-lang#52762
Closes rust-lang#53153
bors added a commit that referenced this pull request Sep 23, 2018
Fix DWARF generation for enums

The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR #45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes #32920
Closes #32924
Closes #52762
Closes #53153
tromey added a commit to tromey/rust that referenced this pull request Oct 1, 2018
The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR rust-lang#45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes rust-lang#32920
Closes rust-lang#32924
Closes rust-lang#52762
Closes rust-lang#53153
bors added a commit that referenced this pull request Oct 1, 2018
Fix DWARF generation for enums

The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR #45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes #32920
Closes #32924
Closes #52762
Closes #53153
tromey added a commit to tromey/rust that referenced this pull request Oct 30, 2018
The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR rust-lang#45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes rust-lang#32920
Closes rust-lang#32924
Closes rust-lang#52762
Closes rust-lang#53153
bors added a commit that referenced this pull request Oct 30, 2018
Fix DWARF generation for enums

The DWARF generated for Rust enums was always somewhat unusual.
Rather than using DWARF constructs directly, it would emit magic field
names like "RUST$ENCODED$ENUM$0$Name" and "RUST$ENUM$DISR".  Since
PR #45225, though, even this has not worked -- the ad hoc scheme was
not updated to handle the wider variety of niche-filling layout
optimizations now available.

This patch changes the generated DWARF to use the standard tags meant
for this purpose; namely, DW_TAG_variant and DW_TAG_variant_part.

The patch to implement this went in to LLVM 7.  In order to work with
older versions of LLVM, and because LLVM doesn't do anything here for
PDB, the existing code is kept as a fallback mode.

Support for this DWARF is in the Rust lldb and in gdb 8.2.

Closes #32920
Closes #32924
Closes #52762
Closes #53153
siretty pushed a commit to siretty/llvm-project that referenced this pull request Feb 17, 2019
n Rust, an enum that carries data in the variants is, essentially, a
discriminated union. Furthermore, the Rust compiler will perform
space optimizations on such enums in some situations. Previously,
DWARF for these constructs was emitted using a hack (a magic field
name); but this approach stopped working when more space optimizations
were added in rust-lang/rust#45225.

This patch changes LLVM to allow discriminated unions to be
represented in DWARF. It adds createDiscriminatedUnionType and
createDiscriminatedMemberType to DIBuilder and then arranges for this
to be emitted using DWARF's DW_TAG_variant_part and DW_TAG_variant.

Note that DWARF requires that a discriminated union be represented as
a structure with a variant part. However, as Rust only needs to emit
pure discriminated unions, this is what I chose to expose on
DIBuilder.

Patch by Tom Tromey!

Differential Revision: https://reviews.llvm.org/D42082

llvm-svn: 324426
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Oct 2, 2019
…tly Rust (from servo:stylo-size-of); r=emilio

This is on top of servo/servo#19285.

Rust Nightly has new enum memory layout optimizations: rust-lang/rust#45225

Source-Repo: https://github.com/servo/servo
Source-Revision: 17e97b9320fdb7cdb33bbc5f4d0fde0653bbf2e4

UltraBlame original commit: 6de571030d5d998dcadbd3dac602fa006395165c
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Oct 2, 2019
…tly Rust (from servo:stylo-size-of); r=emilio

This is on top of servo/servo#19285.

Rust Nightly has new enum memory layout optimizations: rust-lang/rust#45225

Source-Repo: https://github.com/servo/servo
Source-Revision: 17e97b9320fdb7cdb33bbc5f4d0fde0653bbf2e4

UltraBlame original commit: 6de571030d5d998dcadbd3dac602fa006395165c
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Oct 2, 2019
…tly Rust (from servo:stylo-size-of); r=emilio

This is on top of servo/servo#19285.

Rust Nightly has new enum memory layout optimizations: rust-lang/rust#45225

Source-Repo: https://github.com/servo/servo
Source-Revision: 17e97b9320fdb7cdb33bbc5f4d0fde0653bbf2e4

UltraBlame original commit: 6de571030d5d998dcadbd3dac602fa006395165c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet