Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra null check in slice iterator's .next().is_none() #37945

Closed
bluss opened this issue Nov 22, 2016 · 19 comments
Closed

Extra null check in slice iterator's .next().is_none() #37945

bluss opened this issue Nov 22, 2016 · 19 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@bluss
Copy link
Member

bluss commented Nov 22, 2016

This is an interesting codegen bug (Note that this reproduces in a special case setting and in many regular contexts, this problem does not exist).

Expected behavior:

This code should just be a pointer comparison (ptr == end leads to None being returned)

use std::slice::Iter;
pub fn is_empty_1(xs: Iter<f32>) -> bool {
    {xs}.next().is_none()
}

Actual behavior:

It compiles to the equivalent of ptr == end || ptr.is_null().

Playground link. Use release mode.

version: rustc 1.15.0-nightly (0bd2ce6 2016-11-19)

Additional notes:

This version has the expected codegen, without the null check.

pub fn is_empty_2(xs: Iter<f32>) -> bool {
    xs.map(|&x| x).next().is_none()
}
@bluss bluss changed the title Extra null check in slice iterator's .next() Extra null check in slice iterator's .next().is_none() Nov 22, 2016
@arielb1
Copy link
Contributor

arielb1 commented Nov 23, 2016

We just plain never emit nonnull metadata, don't we? Ah, we do have an assume. Need to figure out why it is being eaten.

@arielb1 arielb1 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Nov 23, 2016
@bluss
Copy link
Member Author

bluss commented Nov 23, 2016

Assume shouldn't even be needed, should it. Creating a shared reference should be enough of a non-null assertion.

@arielb1
Copy link
Contributor

arielb1 commented Nov 23, 2016

It's SROA this time, eating !nonnull when it is on an alloca:

define zeroext i1 @sroa_fail(float**) {
entry-block:
  %buf0 = alloca float*
  %buf1 = alloca i8*

  %load0 = load float*, float** %0, align 8
  store float* %load0, float** %buf0

  %load1 = load float*, float** %buf0, align 8, !nonnull !0

  %bc = bitcast float* %load1 to i8*
  store i8* %bc, i8** %buf1, align 8

  %load2 = load i8*, i8** %buf1, align 8
  %ret = icmp eq i8* %load2, null
  ret i1 %ret
}

!0 = !{}

The first load/store pair comes from a lowered memcpy.

@arielb1
Copy link
Contributor

arielb1 commented Nov 23, 2016

@luqmana
Copy link
Member

luqmana commented Nov 24, 2016

I hacked together an LLVM patch that preserves the nonnull metadata on Loads in SROA/mem2reg: luqmana/llvm@da58c74

With that, the extraneous null check is gone:

define zeroext i1 @_ZN9sroa_fail10is_empty_117hfeb3b6a38ecff321E(%"core::slice::Iter<f32>"* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %xs.sroa.0.0..sroa_idx = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 0
  %xs.sroa.0.0.copyload = load float*, float** %xs.sroa.0.0..sroa_idx, align 8, !nonnull !0
  %xs.sroa.4.0..sroa_idx12 = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 1
  %xs.sroa.4.0.copyload = load float*, float** %xs.sroa.4.0..sroa_idx12, align 8, !nonnull !0
  %1 = icmp eq float* %xs.sroa.0.0.copyload, %xs.sroa.4.0.copyload
  ret i1 %1
}

@bluss
Copy link
Member Author

bluss commented Nov 24, 2016

that particular nonnull, where does it come from? The assume?

@bluss
Copy link
Member Author

bluss commented Nov 24, 2016

My question asked differently, will this also have an extra null check?

pub struct MockIter {
    start: *const f32,
    end: *const f32,
}

impl MockIter {
    unsafe fn next<'a>(&mut self) -> Option<&'a f32> {
        if self.start != self.end {
            let ptr = self.start;
            self.start = self.start.offset(1);
            Some(&*ptr)
        } else {
            None
        }
    }
}

pub fn is_empty_3(xs: MockIter) -> bool {
    unsafe {
        {xs}.next().is_none()
    }
}

I am keen on having stable rust (no assume superpowers) be the best it can be.

@luqmana
Copy link
Member

luqmana commented Nov 24, 2016

@bluss The nonnull metadata do come from the assume; the InstCombine pass turns assume( (load addr) != null ) to a load addr !nonnull.
So unfortunately is_empty_3 would still have the extra null check.

@luqmana luqmana added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Nov 24, 2016
@bluss
Copy link
Member Author

bluss commented Nov 24, 2016

Even a simple identity function fn non_null(x: &f32) -> &f32 has the attribute dereferenceable(4) on the return value. Maybe it can be used to insert a non null tag here?

@bluss
Copy link
Member Author

bluss commented Nov 25, 2016

Can I help with getting that patch upstreamed somehow?

@luqmana
Copy link
Member

luqmana commented Nov 28, 2016

@bluss I've submitted it upstream ( https://reviews.llvm.org/D27114 )

@bluss
Copy link
Member Author

bluss commented Dec 13, 2016

Are all the three parts needed in the patch? Since one was apparently not a valid optimization.

@bluss
Copy link
Member Author

bluss commented Feb 22, 2017

https://reviews.llvm.org/rL294897 was merged upstream, not sure how much/little that solves.

@luqmana
Copy link
Member

luqmana commented Mar 22, 2017

Ok, I finally committed https://reviews.llvm.org/D27114 upstream. So we can either cherypick the commit or just wait til the next LLVM update.

@bluss
Copy link
Member Author

bluss commented Mar 22, 2017

Whew! Congratulations and thanks for working on it.

@luqmana
Copy link
Member

luqmana commented Mar 23, 2017

Using the testcase above

#![crate_type="lib"]

use std::slice::Iter;

// weird codegen (aliasing / None vs null bug?)
pub fn is_empty_1(xs: Iter<f32>) -> bool {
    {xs}.next().is_none()
}

// good codegen
pub fn is_empty_2(xs: Iter<f32>) -> bool {
    xs.map(|&x| x).next().is_none()
}

With the LLVM patch, both functions generate the same IR:

"core::slice::Iter<f32>" = type { float*, float*, %"core::marker::PhantomData<&f32>" }
%"core::marker::PhantomData<&f32>" = type {}
%"unwind::libunwind::_Unwind_Exception" = type { i64, void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [6 x i64] }
%"unwind::libunwind::_Unwind_Context" = type {}

; Function Attrs: nounwind uwtable
define zeroext i1 @_ZN3bad10is_empty_117h1c4be4edff235dbfE(%"core::slice::Iter<f32>"* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %xs.sroa.0.0..sroa_idx = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 0
  %xs.sroa.0.0.copyload = load float*, float** %xs.sroa.0.0..sroa_idx, align 8, !nonnull !0
  %xs.sroa.4.0..sroa_idx13 = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 1
  %xs.sroa.4.0.copyload = load float*, float** %xs.sroa.4.0..sroa_idx13, align 8, !nonnull !0
  %1 = icmp eq float* %xs.sroa.0.0.copyload, %xs.sroa.4.0.copyload
  ret i1 %1
}

; Function Attrs: nounwind uwtable
define zeroext i1 @_ZN3bad10is_empty_217h72beabc012005049E(%"core::slice::Iter<f32>"* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
entry-block:
  %xs.sroa.0.0..sroa_idx = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 0
  %xs.sroa.0.0.copyload = load float*, float** %xs.sroa.0.0..sroa_idx, align 8, !nonnull !0
  %xs.sroa.4.0..sroa_idx14 = getelementptr inbounds %"core::slice::Iter<f32>", %"core::slice::Iter<f32>"* %0, i64 0, i32 1
  %xs.sroa.4.0.copyload = load float*, float** %xs.sroa.4.0..sroa_idx14, align 8, !nonnull !0
  %1 = icmp eq float* %xs.sroa.0.0.copyload, %xs.sroa.4.0.copyload
  ret i1 %1
}

; Function Attrs: nounwind
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1

attributes #0 = { nounwind uwtable }
attributes #1 = { nounwind }

!0 = !{}

The patch is a bit more annoying to backport I now realize, since it uses a relatively new method (copyMetadata on Instructions) that's not in our fork which requires cherry picking two more commits.

@arielb1
Copy link
Contributor

arielb1 commented Mar 24, 2017

Could you open a PR against rust-lang/llvm ?

@vadimcn
Copy link
Contributor

vadimcn commented May 3, 2017

@luqmana: Looks like this patch causes a LLVM assertion: https://bugs.llvm.org/show_bug.cgi?id=32902. Commenting out your changes in SROA.cpp makes assert go away.
(as of llvm-mirror/llvm@edadbde)

@arielb1
Copy link
Contributor

arielb1 commented Jun 15, 2017

Confirming fixed with #40914. With LLVM 5.0 there's also https://bugs.llvm.org/show_bug.cgi?id=33470, but that does not affect Rust's LLVM 4.0.

arielb1 added a commit to arielb1/rust that referenced this issue Jul 12, 2017
This still does not work on 32-bit archs because of an LLVM limitation,
but this is only an optimization, so let's push it on 64-bit only for now.

Fixes rust-lang#37945
bors added a commit that referenced this issue Jul 14, 2017
[LLVM] Avoid losing the !nonnull attribute in SROA

Fixes #37945.

r? @alexcrichton
@arielb1 arielb1 mentioned this issue Aug 14, 2017
43 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

4 participants