Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i686 gcc 12: regression at -O1 or -O2 #981

Open
mr-c opened this issue Jan 21, 2023 · 9 comments
Open

i686 gcc 12: regression at -O1 or -O2 #981

mr-c opened this issue Jan 21, 2023 · 9 comments
Labels
compiler-bug help wanted We lack either the expertise or time for this. Basically, "patches welcome".

Comments

@mr-c
Copy link
Collaborator

mr-c commented Jan 21, 2023

test:         x86/sse/emul/c
start time:   16:39:16
duration:     0.03s
result:       exit status 1
command:      MALLOC_PERTURB_=148 '/<<PKGBUILDDIR>>/gcc_test/test/x86/sse-emul-c'
----------------------------------- stderr -----------------------------------
../test/x86/sse.c:4307: assertion failed: r[0] == test_vec[i].r[0] (19623 == 292)
../test/x86/sse.c:4348: assertion failed: r[0] == test_vec[i].r[0] (19623 == 292)
@mr-c
Copy link
Collaborator Author

mr-c commented Apr 29, 2023

To reproduce

cd docker
./simde-dev.sh
# now inside the container
simde-reset-build.sh i686-gcc-12-debflags
meson compile -C i686-gcc-12-debflags
meson test -C i686-gcc-12-debflags --print-errorlogs

Confirmed with https://github.com/simd-everywhere/simde/releases/tag/v0.7.4-rc3

@mr-c mr-c added the help wanted We lack either the expertise or time for this. Basically, "patches welcome". label Apr 29, 2023
@mr-c
Copy link
Collaborator Author

mr-c commented May 30, 2023

Will probably be helped by #1025

@easyaspi314
Copy link
Contributor

I'll investigate. It is very much possible that this is due to MMX.

@easyaspi314
Copy link
Contributor

easyaspi314 commented May 31, 2023

Ok, this is not related to MMX.

Doing some tests, it seems that this is a GCC bug specific to GCC 12 that has been fixed in GCC 12.2.1. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Baaasically, it seems that there was an incorrect guard for whether the MULH pattern was autovectorizable and when it was targeting an unsupported instruction set it just shrugged and tried to do a MULH on each machine word in the vector:

# x86_64 -mno-mmx -mno-sse (artificial)
mulhi_pu16:
        mov     rax, rsi
        mul     rdi
        mov     rax, rdx
        ret

# i686 -mno-mmx -mno-sse
mulhi_pu16:
        push    ebx
        mov     eax, DWORD PTR [esp+12]
        mul     DWORD PTR [esp+20]
        mov     eax, DWORD PTR [esp+16]
        mov     ecx, DWORD PTR [esp+8]
        mov     ebx, edx
        mul     DWORD PTR [esp+24]
        mov     eax, ecx
        mov     DWORD PTR [ecx], ebx
        mov     DWORD PTR [ecx+4], edx
        pop     ebx
        ret     4

# arm -mfloat-abi=soft
mulhi_pu16:
        ldrd    ip, r1, [sp]
        umull   ip, r2, ip, r2
        umull   r1, r3, r1, r3
        strd    r2, r3, [r0]
        bx      lr

It seems that we just need to break up the autovec pattern, which amusingly is as simple as a backwards loop :trollface:

@mr-c
Copy link
Collaborator Author

mr-c commented May 31, 2023

@easyaspi314 thanks for the research!

Can you be more specific as to how we can workaround this for those using the affected versions of GCC?

@mr-c
Copy link
Collaborator Author

mr-c commented May 31, 2023

Ah, I'll try __attribute__((optimize("no-tree-vectorize")))

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c14

@easyaspi314
Copy link
Contributor

easyaspi314 commented Jun 1, 2023

Ah, I'll try __attribute__((optimize("no-tree-vectorize")))

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c14

attribute optimize does work temporarily but it does disable inlining which emits a warn/error.

   /* GCC 12.1-12.2.0 emit garbage if vector.umulh is not
     * a legal operation. Looping backwards breaks the
     * vectorization pattern. */
#if defined(SIMDE_GCC_BUG_106322)
   for (int i = num; i-- > 0; ) {
#else
   for (int i = 0; i < num; i++) {
#endif

should in theory work but I haven't tested thoroughly

@mr-c
Copy link
Collaborator Author

mr-c commented Jun 1, 2023

Hmm. Neither reversing the loop nor using __attribute__((optimize("no-tree-vectorize"))) fixed the tests under i686 GCC-12 with -O2 (via https://github.com/simd-everywhere/simde/blob/master/docker/cross-files/i686-gcc-12-debflags.cross )

diff --git a/simde/x86/sse.h b/simde/x86/sse.h
index 80d4c6b3..2250e66d 100644
--- a/simde/x86/sse.h
+++ b/simde/x86/sse.h
@@ -3435,7 +3435,8 @@ simde_mm_mulhi_pu16 (simde__m64 a, simde__m64 b) {
       r_.neon_u16 = t3;
     #else
       SIMDE_VECTORIZE
-      for (size_t i = 0 ; i < (sizeof(r_.u16) / sizeof(r_.u16[0])) ; i++) {
+      //for (size_t i = 0 ; i < (sizeof(r_.u16) / sizeof(r_.u16[0])) ; i++) {
+      for (size_t i = (sizeof(r_.u16) / sizeof(r_.u16[0])); i > 0 ; i--) {
         r_.u16[i] = HEDLEY_STATIC_CAST(uint16_t, ((HEDLEY_STATIC_CAST(uint32_t, a_.u16[i]) * HEDLEY_STATIC_CAST(uint32_t, b_.u16[i])) >> UINT32_C(16)));
       }
     #endif

@easyaspi314
Copy link
Contributor

That backwards loop is off by one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-bug help wanted We lack either the expertise or time for this. Basically, "patches welcome".
Projects
None yet
Development

No branches or pull requests

2 participants