Possible incorrect inclusion of AVX2 instructions when not enabled #53

Closed
danking opened this issue Feb 9, 2017 · 3 comments

danking commented Feb 9, 2017

Hi there,

AFAICT, libsimdpp is erroneously including AVX2 instructions when I neither enable them with a compiler switch nor enable them with the instruction set selection macro. I have a file called fail.cpp:

#define SIMDPP_ARCH_X86_SSE2

#include <simdpp/simd.h>
#include <inttypes.h>

using namespace simdpp;

int main(int argc, char ** argv) {
  return 0;
}

uint64<2> bad(uint64<2> x, uint64<2> y) {
  return bit_andnot(x, y);
}

which I compile with this invocation:

g++ -march=native -std=c++11 -Ilibsimdpp-2.0-rc2 -Wall -Werror fail.cpp

and then I take a look at a.out:

[ec2-user@ip-172-31-54-96 c]$ objdump -M intel -d a.out
...
00000000004005f2 <_Z3badN6simdpp9arch_sse26uint64ILj2EvEES2_>:
  4005f2:       55                      push   rbp
  4005f3:       48 89 e5                mov    rbp,rsp
  4005f6:       48 81 ec 20 02 00 00    sub    rsp,0x220
...
  40085a:       c5 f9 df 85 30 ff ff    vpandn xmm0,xmm0,XMMWORD PTR [rbp-0xd0]

and it includes vpandn which, AFAIK, is an AVX2 instruction. Moreover, this triggers a SIGILL on my machine, so at the very least it's not compatible with my architecture.

Have I done something wrong? Perhaps a bad flag somewhere?

When I configure libsimdpp with cmake ., it correctly concludes that I lack AVX2:

...
-- Performing Test CAN_RUN_X86_AVX
-- Performing Test CAN_RUN_X86_AVX - Success
-- Performing Test CAN_RUN_X86_AVX2
-- Performing Test CAN_RUN_X86_AVX2 - Failed
...
[ec2-user@ip-172-31-54-96 c]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
[ec2-user@ip-172-31-54-96 c]$ cat /proc/cpuinfo | grep flags
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
[ec2-user@ip-172-31-54-96 c]$ gcc -march=native -Q --help=target | grep avx
  -march=                               core-avx-i
  -mavx                                 [enabled]
  -mavx2                                [disabled]
  -mavx256-split-unaligned-load         [disabled]
  -mavx256-split-unaligned-store        [disabled]
  -mprefer-avx128                       [disabled]
  -msse2avx                             [disabled]
  -mtune=                               core-avx-i
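
The AVX versus AVX2 distinction visible in the /proc/cpuinfo flags can also be checked from inside a program. The following is a minimal sketch, assuming GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; the `SimdSupport` struct and `detect_simd_support` function are hypothetical names used only for illustration, and on other compilers or architectures the sketch simply reports no support.

```cpp
// Hypothetical helper type: which of the two extensions the running CPU reports.
struct SimdSupport {
    bool avx;
    bool avx2;
};

// Queries the running CPU via GCC/Clang builtins (assumption: x86 target).
// On any other compiler or architecture both fields stay false.
SimdSupport detect_simd_support() {
    SimdSupport s = {false, false};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // harmless if the runtime has already initialized it
    s.avx  = __builtin_cpu_supports("avx")  != 0;
    s.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    return s;
}
```

On a machine with the flags listed above, one would expect `avx` to be true and `avx2` to be false, which matches the CAN_RUN_X86_AVX and CAN_RUN_X86_AVX2 results from cmake.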
Author

danking commented Feb 9, 2017

FWIW, I expected it to emit pandn, i.e.:

__m128i _mm_andnot_si128 (__m128i a, __m128i b)

Owner

p12tic commented Feb 17, 2017

Hi, I suspect that the issue is not related to libsimdpp itself, as an AVX-equipped CPU should support the instruction in question.

Processors that support AVX also support the version of vpandn that operates on XMM registers (ref). The v in vpandn only indicates that the instruction uses the VEX prefix, a separate instruction encoding scheme introduced with AVX. Most SSE through SSE4 instructions are available in VEX-prefixed form on AVX. For integer instructions such as vpandn, AVX2 is required only for the forms that operate on 256-bit YMM registers.
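
To see why the compiler is allowed to pick the VEX-encoded form, it can help to check which instruction sets the compiler itself considers enabled, independently of the SIMDPP_ARCH_X86_SSE2 macro. The sketch below assumes GCC or Clang, which predefine `__SSE2__`, `__AVX__` and `__AVX2__` according to the `-march` and `-m` flags; `enabled_simd_targets` is a hypothetical helper name, not part of libsimdpp.

```cpp
#include <string>

// Reports which x86 SIMD extensions the compiler was told it may use.
// The macros are predefined by GCC/Clang from -march / -m flags; they say
// nothing about what the CPU running the binary actually supports.
std::string enabled_simd_targets() {
    std::string out;
#if defined(__SSE2__)
    out += "sse2 ";
#endif
#if defined(__AVX__)
    out += "avx ";   // VEX encoding allowed: 128-bit ops may appear as v-prefixed
#endif
#if defined(__AVX2__)
    out += "avx2 ";  // 256-bit integer ops on YMM registers allowed
#endif
    if (out.empty()) {
        out = "none";
    }
    return out;
}
```

With the `-march=native` invocation from the report, the `gcc -Q --help=target` output suggests the result would mention avx but not avx2, which is consistent with seeing a 128-bit vpandn in the disassembly.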

Mixing legacy (non-VEX) SSE instructions with VEX-encoded instructions can incur a large performance penalty, so when AVX is enabled compilers emit vpandn for _mm_andnot_si128 instead of the older pandn.

Could you please check whether code that just uses __m128i _mm_andnot_si128 (__m128i a, __m128i b) runs on your CPU when it is compiled for AVX? Not being able to run the 128-bit vpandn instruction on an AVX CPU would be very strange. The fact that the CAN_RUN_X86_AVX test succeeded makes it even stranger, because it would mean that the CPU can run some VEX-prefixed instructions but not others.
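
A check along these lines could look like the sketch below. It is only an illustration under stated assumptions: `U64x2` and `andnot_u64x2` are hypothetical names, the intrinsic path is used when the compiler defines `__SSE2__`, and a scalar fallback is included so the file still builds elsewhere. `_mm_andnot_si128(a, b)` computes `(~a) & b`.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics, including _mm_andnot_si128
#endif

// Hypothetical helper type: two 64-bit lanes, low lane first.
struct U64x2 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Computes (~a) & b per lane, using _mm_andnot_si128 when SSE2 is enabled.
U64x2 andnot_u64x2(U64x2 a, U64x2 b) {
#if defined(__SSE2__)
    // _mm_set_epi64x takes the high lane first, then the low lane.
    __m128i va = _mm_set_epi64x(static_cast<long long>(a.hi),
                                static_cast<long long>(a.lo));
    __m128i vb = _mm_set_epi64x(static_cast<long long>(b.hi),
                                static_cast<long long>(b.lo));
    __m128i vr = _mm_andnot_si128(va, vb);
    std::uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), vr);  // out[0] is the low lane
    return U64x2{out[0], out[1]};
#else
    // Scalar fallback for targets without SSE2.
    return U64x2{~a.lo & b.lo, ~a.hi & b.hi};
#endif
}
```

Calling `andnot_u64x2` from a small `main`, building once with `-msse2` only and once with `-mavx`, and inspecting both binaries with `objdump -M intel -d` should show whether the compiler switches from pandn to vpandn and whether the AVX build runs without a SIGILL.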

Author

danking commented Feb 21, 2017

Upon further investigation, I cannot reproduce the SIGILL. The only plausible explanation I have is that while moving some source code around I brought along an old binary and failed to cleanly recompile before trying to execute it. Everything is copacetic now.

My primary confusion came from looking at the assembly and the faulty thinking that vpandn was an AVX2 instruction. The Intel Intrinsics Guide doesn't list a vpandn when filtered to AVX, heh. Thanks for the pointer to http://www.felixcloutier.com/x86/, I'll be sure to check there before filing any more errant bugs 😬.

Thanks for your time and the library!

danking closed this as completed Feb 21, 2017