Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwrapping a Result<u32,!> produces non-trivial code #43278

Closed
scottmcm opened this issue Jul 17, 2017 · 7 comments · Fixed by #45225
Closed

Unwrapping a Result<u32,!> produces non-trivial code #43278

scottmcm opened this issue Jul 17, 2017 · 7 comments · Fixed by #45225
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@scottmcm
Copy link
Member

fn foo(x: Result<u32,!>) -> u32 { x.unwrap() }

compiles in release to

define i32 @foo(i64) unnamed_addr #1 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %1 = and i64 %0, 4294967295
  %cond.i = icmp eq i64 %1, 0
  br i1 %cond.i, label %"_ZN47_$LT$core..result..Result$LT$T$C$$u20$E$GT$$GT$6unwrap17h6dfb0b0940cfa89bE.exit", label %bb3.i

bb3.i:                                            ; preds = %start
; call core::result::unwrap_failed
  tail call fastcc void @_ZN4core6result13unwrap_failed17h70f49da6467af679E()
  unreachable

"_ZN47_$LT$core..result..Result$LT$T$C$$u20$E$GT$$GT$6unwrap17h6dfb0b0940cfa89bE.exit": ; preds = %start
  %self.sroa.4.0.extract.shift.i = lshr i64 %0, 32
  %self.sroa.4.0.extract.trunc.i = trunc i64 %self.sroa.4.0.extract.shift.i to i32
  ret i32 %self.sroa.4.0.extract.trunc.i
}

given that the Err variant is !, the type system information should be sufficient for it to become just

  ret i32 %0

Playground repro: https://play.rust-lang.org/?gist=af0170fb612114692ff35dab58c81615&version=nightly&mode=release

@Mark-Simulacrum Mark-Simulacrum added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jul 19, 2017
@nagisa
Copy link
Member

nagisa commented Jul 21, 2017

This is generally applicable to all enums, structs etc with uninhabitable fields.

@scottmcm
Copy link
Member Author

scottmcm commented Jul 21, 2017

Looks like there are two distinct improvement opportunities here:

  • enum layout is storing a discriminant even when none is needed. (One- and zero-variant enums correctly don't store a discriminant, but both Result<T, !> and Result<!, !> do.)
  • the monomorphized matching in unwrap is unaware that the uninhabited variant is unreachable, and thus is unable to remove the dead code in the arm.

Should I split the issue?

@SimonSapin
Copy link
Contributor

Doesn’t fixing the former require fixing the latter? While the end result is two different kinds of improvement, they’re very related.

@nagisa
Copy link
Member

nagisa commented Jul 22, 2017

Neither codegen nor layout are really concerned with this issue. Match MIR generation should simply be made aware of the impossibility of the variants. That being said, there’s a caveat that involves unsafe code guidelines. i.e. I’m not aware of any decision to truly consider variants with never type in them as “unused discriminants”.

@scottmcm
Copy link
Member Author

Well, both of these generate the expected branchless code:

fn foo2(x: Result<u32,!>) -> u32 { 
    match x {
        Ok(a) => a,
    }
}

fn foo3(x: Result<u32,!>) -> u32 { 
    match x {
        Ok(a) => a,
        Err(_) => panic!(),
    }
}

Which implies to me that the never type variant is already an unused discriminant. (I updated my previous post to note that it's the monomorphized unwrap that's the problem, not match codegen in general.)

@SimonSapin I figured it'd be possible to update the MIR discriminant and as operations to account for different layout without actually telling the optimizer enough to remove the dead branch. I could easily be wrong, though.

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@dhardy
Copy link
Contributor

dhardy commented Aug 29, 2017

Another workaround (not mine):

pub fn foo(x: Result<u32,!>) -> u32 { x.unwrap_or_else(|e| e) }

There's still some shift and truncate instructions (Result<u32, !> is 64-bits?), but no branching.

I did some benchmarking using this and found no overhead using Result<T, !> compared to T in this case.

bors added a commit that referenced this issue Oct 14, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
@scottmcm
Copy link
Member Author

Looks like this PR will fix both parts: #45225 (comment)

bors added a commit that referenced this issue Oct 29, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 13, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 17, 2017
Short-circuiting internal iteration with Iterator::try_fold & try_rfold

These are the core methods in terms of which the other methods (`fold`, `all`, `any`, `find`, `position`, `nth`, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).

Based off the `Try` trait, so works with both `Result` and `Option` (:tada: #42526).  The `try_fold` rustdoc examples use `Option` and the `try_rfold` ones use `Result`.

AKA continuing in the vein of PRs #44682 & #44856 for more of `Iterator`.

New bench following the pattern from the latter of those:
```
test iter::bench_take_while_chain_ref_sum          ... bench:   1,130,843 ns/iter (+/- 25,110)
test iter::bench_take_while_chain_sum              ... bench:     362,530 ns/iter (+/- 391)
```

I also ran the benches without the `fold` & `rfold` overrides to test their new default impls, with basically no change.  I left them there, though, to take advantage of existing overrides and because `AlwaysOk` has some sub-optimality due to #43278 (which 45225 should fix).

If you're wondering why there are three type parameters, see issue #45462

Thanks for @bluss for the [original IRLO thread](https://internals.rust-lang.org/t/pre-rfc-fold-ok-is-composable-internal-iteration/4434) and the rfold PR and to @cuviper for adding so many folds, [encouraging me](#45379 (comment)) to make this PR, and finding a catastrophic bug in a pre-review.
bors added a commit that referenced this issue Nov 19, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Jun 13, 2018
Replace `core::iter::AlwaysOk<T>` by `Result<T, !>`

#43278 has been fixed, so we don't need this struct anymore.

(Actually we don't even need `.unwrap()` thanks to `#![feature(exhaustive_patterns)]`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants