Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization: nonnull assumptions cannot reason over select #48319

Closed
ojeda opened this issue Jan 31, 2021 · 11 comments
Closed

Missed optimization: nonnull assumptions cannot reason over select #48319

ojeda opened this issue Jan 31, 2021 · 11 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@ojeda
Copy link

ojeda commented Jan 31, 2021

Bugzilla Link 48975
Version trunk
OS All
CC @davidbolvansky,@jdm,@aqjune,@nikic,@rotateright

Extended Description

LLVM fails to optimize:

define nonnull i32* @​f(i32** %0) {
start:
    %1 = load i32*, i32** %0, align 8
    %2 = icmp eq i32* %1, null
    %3 = bitcast i32** %0 to i32*
    %4 = select i1 %2, i32* null, i32* %3
    ; %assume = icmp ne i32* %4, null
    ; call void @​llvm.assume(i1 %assume)
    ret i32* %4
}

into:

define nonnull i32* @​f(i32** %0) {
start:
    %1 = bitcast i32** %0 to i32*
    ret i32* %1
}

which Alive2 confirms as valid (i.e. return type is nonnull => %4 is nonnull => the second branch of select was taken => %4 is %3).

Uncommenting the explicit assume (instead of relying on the nonnull return attribute) doesn't help either. However, providing the assume directly on %2 or a !nonnull on the load both work:

define i32* @​src(i32** %0) {
start:
    %1 = load i32*, i32** %0, align 8, !nonnull !{}
    %2 = icmp eq i32* %1, null
    ; %assume = xor i1 %2, true
    ; call void @​llvm.assume(i1 %assume)
    %3 = bitcast i32** %0 to i32*
    %4 = select i1 %2, i32* null, i32* %3
    ret i32* %4
}

Thus it would seem like LLVM is not realizing that if %4 is nonnull, then the select must come from the false branch, which implies %1 is false.

This is a reduction from Rust code such as:

#![feature(option_result_unwrap_unchecked)]
pub struct V(Vec<u8>);

pub unsafe fn f(x: &mut Option<V>) -> &mut V {
    x.as_mut().unwrap_unchecked()
}

There, the unwrap_unchecked() provides the nonnull assumption, while as_mut() generates the select. However, if one manually expands the Rust code, LLVM finds the optimization, though:

pub struct V(Vec<u8>);

pub unsafe fn f(x: &mut Option<V>) -> &mut V {
    let y = match x {
        Some(ref mut v) => Some(v),
        None => None,
    };

    match y {
        Some(v) => v,
        None => core::hint::unreachable_unchecked(),
    }
}

because this ends up at:

*** IR Dump After SROA ***
; Function Attrs: norecurse nounwind nonlazybind readonly uwtable
define align 8 dereferenceable(24) %X* @&#8203;_ZN7example1f17h543f6e99cb8f036dE(%"std::option::Option<X>"* readonly align 8 dereferenceable(24) %0) unnamed_addr #&#8203;0 !dbg !&#8203;6 {
  %2 = bitcast %"std::option::Option<X>"* %0 to {}**, !dbg !&#8203;10
  %3 = load {}*, {}** %2, align 8, !dbg !&#8203;10
  %4 = icmp eq {}* %3, null, !dbg !&#8203;10
  %5 = getelementptr inbounds %"std::option::Option<X>", %"std::option::Option<X>"* %0, i64 0, i32 0, i64 0, !dbg !&#8203;10
  %6 = select i1 %4, i64* null, i64* %5, !dbg !&#8203;10
  %7 = icmp eq i64* %6, null, !dbg !&#8203;11
  br i1 %7, label %8, label %9, !dbg !&#8203;11

8:                                                ; preds = %1
  unreachable, !dbg !&#8203;12

9:                                                ; preds = %1
  %10 = bitcast i64* %6 to %X*, !dbg !&#8203;13
  ret %X* %10, !dbg !&#8203;14
}

*** IR Dump After Early CSE w/ MemorySSA ***
; Function Attrs: norecurse nounwind nonlazybind readonly uwtable
define align 8 dereferenceable(24) %X* @&#8203;_ZN7example1f17h543f6e99cb8f036dE(%"std::option::Option<X>"* readonly align 8 dereferenceable(24) %0) unnamed_addr #&#8203;0 !dbg !&#8203;6 {
  %2 = bitcast %"std::option::Option<X>"* %0 to {}**, !dbg !&#8203;10
  %3 = load {}*, {}** %2, align 8, !dbg !&#8203;10
  %4 = icmp eq {}* %3, null, !dbg !&#8203;10
  %5 = getelementptr inbounds %"std::option::Option<X>", %"std::option::Option<X>"* %0, i64 0, i32 0, i64 0, !dbg !&#8203;10
  %6 = select i1 %4, i64* null, i64* %5, !dbg !&#8203;10
  br i1 %4, label %7, label %8, !dbg !&#8203;11

7:                                                ; preds = %1
  unreachable, !dbg !&#8203;12

8:                                                ; preds = %1
  %9 = bitcast i64* %6 to %X*, !dbg !&#8203;13
  ret %X* %9, !dbg !&#8203;14
}

i.e. the branch on %7 (after the select) is transformed into a branch on %4 (before the select).

However, when Rust code uses unwrap_unchecked() and similar functions, LLVM optimizes those first, then integrates, but by then the branches aren't there anymore (similar to what is shown in the reduction), and the optimization above does not take place. Then, since it cannot reason over the select, the suboptimal code is generated.

In Rust, using macros instead of functions for those that contain unreachable() hints can be a workaround.

@ojeda
Copy link
Author

ojeda commented Jan 31, 2021

assigned to @aqjune

@nikic
Copy link
Contributor

nikic commented Jan 31, 2021

I believe the lack of an passingValueIsAlwaysUndefined()-style optimization for select came up in recent SimplifyCFG patch as well. Though here the desired semantics would be more along the lines of passingValueIsAlwaysPoison(), so as to cover the nonnull attribute case as well.

@aqjune
Copy link
Contributor

aqjune commented Feb 9, 2021

Would it make sense if the optimization we'd like to implement goes into InstCombine?
Selects can be directly introduced by the frontend language, and having the transformations in InstCombine will help optimize such code too.

@aqjune
Copy link
Contributor

aqjune commented Feb 14, 2021

https://reviews.llvm.org/D96663

BTW, is it okay to return an uninitialized value or pass such value to a function argument if 'unsafe' keyword is used in Rust?

If it is UB, it might be a good idea to attach noundef attributes to the return value and function arguments from the Rust side. It will bring more optimization opportunities to LLVM.

@ojeda
Copy link
Author

ojeda commented Feb 15, 2021

Thanks a lot! That was quick.

With the current semantics, AFAIK, all variables are assumed to be properly initialized, including parameters and return values. This is true regardless of unsafe (in Rust, unsafe does not change what is UB or not, it only unblocks some language features like dereferencing a pointer). One needs to use MaybeUninit to work with uninitialized memory.

So yeah, Rust is able to use noundef. Having said that, at least for this example, it wouldn't make a difference since dereferenceable implies noundef already, no?

@aqjune
Copy link
Contributor

aqjune commented Jun 21, 2021

It should be fixed now. Can you reproduce the fix?

@ojeda
Copy link
Author

ojeda commented Jun 22, 2021

Compiler Explorer has not picked up yet the new HEAD, I will confirm the fix as soon as that happens.

@ojeda
Copy link
Author

ojeda commented Jun 24, 2021

The compiler test added works in CE: https://godbolt.org/z/1vcj63xc8

As soon as rustc supports LLVM 13 I will build it locally to confirm the optimization works end-to-end.

By the way, what about the reduced example in the top? i.e.

define nonnull i32* @&#8203;f(i32** %0) {
start:
    %1 = load i32*, i32** %0, align 8
    %2 = icmp eq i32* %1, null
    %3 = bitcast i32** %0 to i32*
    %4 = select i1 %2, i32* null, i32* %3
    ret i32* %4
}

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@ojeda
Copy link
Author

ojeda commented Dec 13, 2021

Starting with rustc 1.56.0 (which uses LLVM 13). it gets optimized properly now: https://godbolt.org/z/4T7xW1EqT.

@fhahn
Copy link
Contributor

fhahn commented Dec 13, 2021

Thanks for confirming. I'm closing the issue now, please feel free to reach out in case there's still an issue.

@fhahn fhahn closed this as completed Dec 13, 2021
@ojeda
Copy link
Author

ojeda commented Dec 13, 2021

It's nothing!

There is the pending question about the reduced example, but perhaps it should go into a new issue:

By the way, what about the reduced example in the top? i.e.

define nonnull i32* @&#8203;f(i32** %0) {
start:
    %1 = load i32*, i32** %0, align 8
    %2 = icmp eq i32* %1, null
    %3 = bitcast i32** %0 to i32*
    %4 = select i1 %2, i32* null, i32* %3
    ret i32* %4
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants