Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small SIMD test fails with --release but passes without #50154

Closed
danielrh opened this issue Apr 22, 2018 · 17 comments · Fixed by #55073
Closed

Small SIMD test fails with --release but passes without #50154

danielrh opened this issue Apr 22, 2018 · 17 comments · Fixed by #55073
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@danielrh
Copy link

danielrh commented Apr 22, 2018

Full repro here:
https://github.com/danielrh/simd_playground run with cargo test --release
failing test is here:

#![feature(stdsimd)]
mod test {
#[test]
fn baseline() {
use std::simd::*;
   let symbol = 2i16;
   let inc = 1i16;
   let data = i16x16::new(4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64);
   let one_to_16 = i16x16::new(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
   let increment_v = i16x16::splat(inc);
   let mask_v = unsafe {
            ::std::arch::x86_64::_mm256_cmpgt_epi16(::std::arch::x86_64::__m256i::from_bits(one_to_16),
                                                   ::std::arch::x86_64::__m256i::from_bits(i16x16::splat(i16::from(symbol))))
    };
    let output = data + (increment_v & i16x16::from_bits(mask_v));
    let mut xfinal = [0i16; 16];
    output.store_unaligned(&mut xfinal);
    assert_eq!(xfinal, [4, 8, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61, 65]);
}
}

using:

binary: rustc
commit-hash: ac3c2288f9f9d977acb46406ba60033d65165a7b
commit-date: 2018-04-18
host: x86_64-apple-darwin
release: 1.27.0-nightly
LLVM version: 6.0

on OSX 10.12.6 (16G1314)

and also on linux

rustc --verbose --version
rustc 1.27.0-nightly (ac3c2288f 2018-04-18)
binary: rustc
commit-hash: ac3c2288f9f9d977acb46406ba60033d65165a7b
commit-date: 2018-04-18
host: x86_64-unknown-linux-gnu
release: 1.27.0-nightly
LLVM version: 6.0

I was unable to reproduce the above problem by making a simple main with the above function, so it could be something to do with the build options.

One more note: the same exact test and code have been working for months with stdsimd 0.0.3 and 0.0.4 crate. I couldn't get that crate to build with nightly 1.27.0, so I couldn't see if it still worked.

@sfackler
Copy link
Member

Are you compiling with the avx2 target feature enabled?

@danielrh
Copy link
Author

Great question: I can avoid the bug by specifying RUSTFLAGS="-C target-cpu=core-avx-i" cargo test --release as the build command line

But in the past, llvm has been able to polyfill the instructions down to SSE2 with stdsimd-0.0.4 and therefore I didn't have to specify any flag at all.

I've noticed that asking LLVM to polyfill the instructions from avx2 down to core-avx-i actually improves performance on most available AVX2 hardware unless your instructions are sufficiently dense. This is because the AVX2 instructions downclock the chip for some time, and so it's much better to keep the full clock speed, but then keep the AVX2 code around for newer chips like skylake.

I really loved writing AVX2 intrinsics and having the compiler match them to my desired architecture...having 4 code paths (avx2, avx and SSE4.2 and SSE2) to match my desired targets is a significant support burden, so from my perspective, the old behavior with the SSE2 polyfill was ideal.

@hanna-kruppe
Copy link
Contributor

This... really isn't what those intrinsics are for. Sometimes the path of least resistance for the compiler is to treat some intrinsics as a generic operation that can be lowered to other instruction sets as well, but that is not at all guaranteed. If you want something portable across different SIMD instruction sets, you should use the (in-development) portable SIMD types, not AVX2 intrinsics.

@danielrh
Copy link
Author

danielrh commented Apr 23, 2018

Good suggestion about the portable simd types: I was able to make this helper function which seems to translate into the avx2 intrinsic when needed. Would this kind of thing be worth providing for all portable SIMD types?

#[inline(always)]
fn cmp_gt_i16x16(lhs: i16x16, rhs: i16x16) -> i16x16 {
    let lz = rhs - lhs;
    let sign_bit = lz & i16x16::splat(-32768);
    sign_bit >> 15
}

I do, however, think that this should either error out in the development build, or preferably yield a compiler error (or at least SIGILL) instead of providing wrong arithmetic results in release. Also, I suspect this is a recent LLVM bug in their polyfill...and may still be worth correcting

@alexcrichton alexcrichton added A-simd Area: SIMD (Single Instruction Multiple Data) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels May 7, 2018
@alexcrichton
Copy link
Member

This is an issue with opt-level 3 specifically and I believe is a bug inside of LLVM. The problem is that we're passing all arguments by reference (the SIMD arguments) and LLVM is accidentally promoting them to by-value which is known to produce bugs.

Specifically LLVM's Promote 'by reference' arguments to scalars on SCC pass is promoting pass-by-reference to pass-by-value, which is invalid in the sense of how we're expecting to use these functions.

I've opened an upstream LLVM bug at https://bugs.llvm.org/show_bug.cgi?id=37358

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 5, 2018

fn baseline() {
    let data = i16x16::new(4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64);
    let one_to_16 = i16x16::new(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    let output = one_to_16.gt(i16x16::splat(2i16)).select(data + 1i16, data);
    // note: if the mask is often false for all lanes you could guard the select
    // behind an `if mask.any() { ... }`
    assert_eq!(
        output,
        i16x16::new(4, 8, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61, 65)
    );
}
  • @alexcrichton I've pinged a couple of more people on that LLVM bug, it would be nice to get this fixed for LLVM 7. The code in the OP uses unsafe to call an AVX function, but this is safe (defined behavior) if the code path is only reached in a CPU with AVX. Therefore, this bug is turning defined behavior into undefined behavior, and all of this in stable Rust, so we should probably mark this with I-unsound.

Is there a way to tell which from the I-unsound issues affect stable Rust only? There are a lot of them, and many affect only nightly Rust, but it is hard to tell them apart.

@alexcrichton
Copy link
Member

Thanks for the extra pings @gnzlbg! Let's see how that plays out...

@hellow554
Copy link
Contributor

@gnzlbg Your playground does not work anymore :(

2 | use std::simd::i16x16;
  |          ^^^^ Could not find `simd` in `std`

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 23, 2018

@hellow554 you need to use the packed_simd crate, std::simd is (hopefully temporarily) not part of std anymore.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 4, 2018

So it appears that this won't be fixed in LLVM any time soon, and AFAICT this is not something we can easily warn about in the Rust side of things for the time being :/

@XAMPPRocky XAMPPRocky added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Sep 25, 2018
@raphlinus
Copy link
Contributor

raphlinus commented Oct 14, 2018

I find this bug unfortunate, as I'm trying to do safe wrappers (called fearless_simd), but this basically makes that approach unworkable. If the bug won't be fixed soon, maybe we should document the danger zone.

I read the llvm issue. It's interesting that this bug has persisted so long without getting triggered; it's evidence that the way people use C++ and Rust are quite different in spite of the similar approaches to zero-cost abstractions etc.

@alexcrichton
Copy link
Member

I'm hoping that I woke up on the right side of the bed this morning as after reading #55059 I was struck with inspiration about how we might solve this, manifested in #55073. If others are familiar with LLVM review on that would be greatly appreciated!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 14, 2018
The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
bors added a commit that referenced this issue Oct 14, 2018
rustc: Fix (again) simd vectors by-val in ABI

The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes #50154
Closes #52636
Closes #54583
Closes #55059

[quite a lot]: #47743
[discussion]: #44367
[wasn't]: #50154
alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 16, 2018
The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
kennytm added a commit to kennytm/rust that referenced this issue Oct 18, 2018
rustc: Fix (again) simd vectors by-val in ABI

The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 19, 2018
The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
Manishearth added a commit to Manishearth/rust that referenced this issue Oct 20, 2018
The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
@alexcrichton alexcrichton reopened this Oct 23, 2018
@alexcrichton
Copy link
Member

I'm posting a revert for the fix in #55281 because I don't think the fix was quite right (causing segfaults for me). LLVM, however, in the meantime should have an official fix, so this should hopefully get closed out in the near future once that lands.

@mati865
Copy link
Contributor

mati865 commented Feb 12, 2019

Upstream fix has landed in Rust's LLVM fork: rust-lang/llvm-project@3d36e5c

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 12, 2019

Does this still reproduce with nightly? If not we can close this.

@andersk
Copy link
Contributor

andersk commented Feb 12, 2019

Fails with nightly-2019-01-26 == rustc 1.33.0-nightly (bf669d1 2019-01-25).
Passes with nightly-2019-01-27 == rustc 1.33.0-nightly (20c2cba 2019-01-26).

This is consistent with the fix being in the LLVM update of #57675, so yeah, I think we can close this.

@araspik
Copy link

araspik commented Aug 9, 2019

Is there any reason this is not yet closed?

@nikic nikic closed this as completed Aug 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.