
Compiler fails to optimize out bounds checks on 32-bit x86 and ARM #22924

Closed
rprichard opened this Issue Mar 1, 2015 · 3 comments

rprichard (Contributor) commented Mar 1, 2015

Test case:

#[inline(never)]
pub fn my_write(dst: &mut [u8]) {
    if dst.len() >= 41 {
        dst[40] = 17;
    }
}

fn main() {
    let mut buf = [0u8; 41];
    my_write(&mut buf);
}

The 32-bit x86 and ARM disassembly for my_write includes two checks of the slice length and a call to panic_bounds_check. The bounds check is optimized away on x86_64.

Here's the x86 assembly:

_ZN8my_write20he7b6161ce2fc6d9beaaE:
    subl    $12, %esp
    cmpl    $41, 4(%ecx)
    jb  .LBB0_3
    movl    4(%ecx), %eax
    cmpl    $41, %eax
    jb  .LBB0_4
    movl    (%ecx), %eax
    movb    $17, 40(%eax)
.LBB0_3:
    addl    $12, %esp
    retl
.LBB0_4:
    movl    %eax, 8(%esp)
    movl    $40, 4(%esp)
    movl    $panic_bounds_check_loc20, (%esp)
    calll   _ZN9panicking18panic_bounds_check20hdb38771954ce4aaf58rE

LLVM IR (note that the length field is loaded twice: once via the 64-bit load of the whole fat pointer in entry-block, and again as a separate i32 load in then-block-17-, and the two length comparisons are never combined):

; Function Attrs: noinline uwtable
define internal fastcc void @_ZN8my_write20he7b6161ce2fc6d9beaaE({ i8*, i32 }* noalias nocapture dereferenceable(8)) unnamed_addr #0 {
entry-block:
  %1 = bitcast { i8*, i32 }* %0 to i8*
  %2 = bitcast { i8*, i32 }* %0 to i64*
  %3 = load i64* %2, align 4
  %.sroa.3.0.extract.shift.i.i = lshr i64 %3, 32
  %.sroa.3.0.extract.trunc.i.i = trunc i64 %.sroa.3.0.extract.shift.i.i to i32
  %4 = icmp ugt i32 %.sroa.3.0.extract.trunc.i.i, 40
  %5 = trunc i64 %3 to i32
  %6 = inttoptr i32 %5 to i8*
  br i1 %4, label %then-block-17-, label %next-block

then-block-17-:                                   ; preds = %entry-block
  %7 = getelementptr inbounds { i8*, i32 }* %0, i32 0, i32 1
  %8 = load i32* %7, align 4
  %9 = icmp ult i32 %8, 41
  br i1 %9, label %cond, label %next, !prof !0

next:                                             ; preds = %then-block-17-
  %10 = getelementptr inbounds i8* %6, i32 40
  store i8 17, i8* %10, align 1
  br label %next-block

cond:                                             ; preds = %then-block-17-
  tail call void @_ZN9panicking18panic_bounds_check20hdb38771954ce4aaf58rE({ %str_slice, i32 }* noalias readonly dereferenceable(12) @panic_bounds_check_loc20, i32 40, i32 %8)
  unreachable

next-block:                                       ; preds = %entry-block, %next
  tail call void @llvm.lifetime.end(i64 8, i8* %1)
  ret void
}

Compiler version:

rustc 1.0.0-nightly (e233987ce 2015-02-27) (built 2015-02-28)
binary: rustc
commit-hash: e233987ce1de88a48db2ce612019ba644d3cf5dd
commit-date: 2015-02-27
build-date: 2015-02-28
host: i686-unknown-linux-gnu
release: 1.0.0-nightly

My ARM compiler is older -- 2015-02-18 or so.

rprichard (Contributor, Author) commented Mar 1, 2015

The problem goes away if I stick this at the start of my_write:

let dst = dst;
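
A minimal sketch of the test case with that workaround applied (reconstructed from the comment above; only the rebinding is added):

#[inline(never)]
pub fn my_write(dst: &mut [u8]) {
    // Rebinding the argument to a local apparently lets both length reads
    // go through the same value, so the duplicate check is elided.
    let dst = dst;
    if dst.len() >= 41 {
        dst[40] = 17;
    }
}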
dotdash (Contributor) commented Mar 4, 2015

http://llvm.org/bugs/show_bug.cgi?id=22786

We could work around this by special-casing slices and copying them with two loads/stores. Not sure if it's worth it.
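
As a rough, hypothetical illustration of that proposed workaround (this is not rustc trans code): a slice is a (data pointer, length) pair, and copying the pair field by field keeps the length visible as an ordinary value instead of something hidden behind one wide memory copy.

// Hypothetical sketch, not compiler internals: copy a fat pointer as two
// scalar moves rather than one opaque memory copy of the whole pair.
struct FatPtr {
    data: *mut u8,
    len: usize,
}

fn copy_fat_ptr(src: &FatPtr, dst: &mut FatPtr) {
    dst.data = src.data; // first load/store: the data pointer
    dst.len = src.len;   // second load/store: the length
}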

cmr (Member) commented Mar 25, 2015

@dotdash this should obviously be fixed upstream; it will help more than just us, I assume.

cmr self-assigned this Mar 25, 2015

dotdash added a commit to dotdash/rust that referenced this issue Jun 18, 2015

Pass fat pointers in two immediate arguments instead of an indirect argument

This has a number of advantages. The obvious one is that we don't have
to put the data into memory but can keep it in registers. Since we're
currently passing a pointer anyway (instead of using e.g. a known offset
on the stack, which is what the `byval` attribute would achieve), we
only use a single additional register for each fat pointer, but save at
least two pointers worth of stack in exchange (sometimes more because
more than one copy gets eliminated). On archs that pass arguments on the
stack, we save a pointer worth of stack even without considering the
omitted copies.

Additionally, LLVM can optimize the code a lot better, to a large degree
due to the fact that lots of copies are gone or can be optimized away.
Additionally, we can now emit attributes on the data and/or vtable
pointers contained in the fat pointer, potentially allowing for even
more optimizations.

LLVM passes are about 3-7% faster (depending on the crate), and the
resulting code is also a few percent smaller, for example:

   text    data  filename
5671479 3941461  before/librustc-d8ace771.so
5447663 3905745  after/librustc-d8ace771.so

1944425 2394024  before/libstd-d8ace771.so
1896769 2387610  after/libstd-d8ace771.so

Fixes rust-lang#22924

Cc rust-lang#22891 (at least for fat pointers the code is good now)

dotdash added a commit to dotdash/rust that referenced this issue Jun 19, 2015

Pass fat pointers in two immediate arguments
This has a number of advantages compared to creating a copy in memory
and passing a pointer. The obvious one is that we don't have to put the
data into memory but can keep it in registers. Since we're currently
passing a pointer anyway (instead of using e.g. a known offset on the
stack, which is what the `byval` attribute would achieve), we only use a
single additional register for each fat pointer, but save at least two
pointers worth of stack in exchange (sometimes more because more than
one copy gets eliminated). On archs that pass arguments on the stack, we
save a pointer worth of stack even without considering the omitted
copies.

Additionally, LLVM can optimize the code a lot better, to a large degree
due to the fact that lots of copies are gone or can be optimized away.
Additionally, we can now emit attributes like nonnull on the data and/or
vtable pointers contained in the fat pointer, potentially allowing for
even more optimizations.

This results in LLVM passes being about 3-7% faster (depending on the
crate), and the resulting code is also a few percent smaller, for
example:

   text    data  filename
5671479 3941461  before/librustc-d8ace771.so
5447663 3905745  after/librustc-d8ace771.so

1944425 2394024  before/libstd-d8ace771.so
1896769 2387610  after/libstd-d8ace771.so

I had to remove a call in the backtrace-debuginfo test, because LLVM can
now merge the tails of some blocks when optimizations are turned on,
which can't correctly preserve line info.

Fixes rust-lang#22924

Cc rust-lang#22891 (at least for fat pointers the code is good now)
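
As a rough Rust-level analogy of the change described above (hypothetical illustration, not the code rustc actually generates): passing the slice's two halves as separate immediate arguments hands LLVM the length as a plain value, whereas passing a pointer to an in-memory pair forces loads that LLVM may not be able to prove redundant.

// Hypothetical analogy only; the real change is in how rustc lowers slice
// arguments to LLVM, not in user-visible signatures.
pub struct RawSlice {
    pub data: *mut u8,
    pub len: usize,
}

// "Before": the callee receives a pointer to the pair stored in memory and
// must load the length (possibly more than once) from that memory.
pub unsafe fn my_write_indirect(dst: *mut RawSlice) {
    if (*dst).len >= 41 {
        *(*dst).data.offset(40) = 17;
    }
}

// "After": the two halves arrive as immediate arguments, the length is a
// plain value, and the redundant length check folds away.
pub unsafe fn my_write_immediate(data: *mut u8, len: usize) {
    if len >= 41 {
        *data.offset(40) = 17;
    }
}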

dotdash added a commit to dotdash/rust that referenced this issue Jun 20, 2015

Pass fat pointers in two immediate arguments

bors added a commit that referenced this issue Jun 20, 2015

Auto merge of #26411 - dotdash:fat_in_registers, r=Aatch

dotdash added a commit to dotdash/rust that referenced this issue Jun 20, 2015

Pass fat pointers in two immediate arguments

bors added a commit that referenced this issue Jun 20, 2015

Auto merge of #26411 - dotdash:fat_in_registers, r=aatch

bors closed this in #26411 Jun 20, 2015

jroesch added a commit to jroesch/rust that referenced this issue Jul 21, 2015

Pass fat pointers in two immediate arguments

thepowersgang added a commit to thepowersgang/rust that referenced this issue Jul 25, 2015

Pass fat pointers in two immediate arguments