internal compiler error when compiling on ARM v8 ThunderX2 #13622

undertherain · 2019-05-25T13:03:51Z

Compilation of NumPy 1.16.x fails on a machine with a Cavium ThunderX2 CPU.

Versions 1.15.4 and below compile normally.

Reproducing code example:

Get the latest stable release and run python3 setup.py install, or use pip install.

Error message:

The first error that does not come from _configtest.c is:

```
In file included from numpy/core/include/numpy/npy_math.h:580:0,
                 from numpy/core/src/npymath/npy_math_common.h:9,
                 from numpy/core/src/npymath/npy_math_complex.c.src:34:
numpy/core/src/npymath/npy_math_internal.h.src: In function ‘npy_cacoshf’:
numpy/core/src/npymath/npy_math_internal.h.src:482:12: internal compiler error: Segmentation fault
     return @kind@@c@(x, y);
            ^~~~~~~~~~~~~~~
0x994aff crash_signal
```

Full compilation log: https://pastebin.com/83Jj3knk

Environment:

OS: CentOS Linux release 7.6.1810 (AltArch)
Python 3.7.2 (several other versions also tried)
GCC 7.4.0 (also tried the system default, 4.8.5)
CPU:

Family: ARM
Manufacturer: Cavium Inc.
ID: F1 0A 1F 43 00 00 00 00
Signature: Implementor 0x43, Variant 0x1, Architecture 15, Part 0x0af, Revision 1
Version: Cavium ThunderX2(R) CPU CN9975 v2.1 @ 2.0GHz


tylerjereddy · 2019-05-28T03:00:33Z

Hmm, ARMv8 testing is part of our CI these days, so I wonder what is so different about your setup. I'm pretty sure Shippable uses Ubuntu for its native builds, so that's one difference.

undertherain · 2019-05-29T13:55:35Z

Well, one thing I should mention is that we have x64 nodes with an almost identical software stack (same OS, same package versions, Python installed with the same Spack command, etc.), and NumPy 1.16.x compiles fine there.
I'll only be able to physically access the servers in a couple of weeks. If no other solution emerges by then, I'll try live-booting an Ubuntu ARM server image and compiling on it.

crbaird · 2019-06-07T17:34:24Z

I'm also seeing this issue on CentOS 7.6. Interestingly, it builds just fine on the same hardware with SLE 12 SP4.

mattip · 2019-06-11T06:30:01Z

I am not sure we can help with this too much since our CI succeeds. What compiler is SLE 12 SP4 using?

You might be able to extract this as a stand-alone compiler bug: look a few lines up in the log for the actual gcc call and its options, then whittle the example down for as long as it still fails, starting by removing unneeded include paths. My command line is below. Note that the file you want to compile is actually the processed numpy/core/src/npymath/npy_math.c.

```
x86_64-linux-gnu-gcc -Ibuild/src.linux-x86_64-3.6/numpy/core/src/npymath -Inumpy/core/include -Ibuild/src.linux-x86_64-3.6/numpy/core/include/numpy -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/usr/include/python3.6m -I/path/to/python/include/python3.6m -Ibuild/src.linux-x86_64-3.6/numpy/core/src/common -Ibuild/src.linux-x86_64-3.6/numpy/core/src/npymath -Ibuild/src.linux-x86_64-3.6/numpy/core/src/common -Ibuild/src.linux-x86_64-3.6/numpy/core/src/npymath -c numpy/core/src/npymath/npy_math.c
```
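Following that advice, a starting point for a stand-alone reduction might look like the file below. It is a hypothetical sketch, not a confirmed reproducer: it hand-expands one instance of the two-argument template in npy_math_internal.h.src (kind = copysign, c = f, which is an assumption about which wrapper gets inlined into npy_cacoshf) and adds a small cacosh-style caller. static inline stands in for NPY_INPLACE, and reduce.c and cacoshf_tail are illustrative names. Compile it with the same flags as the failing gcc call and delete code for as long as the ICE persists.

```c
/*
 * reduce.c: hypothetical starting point for a stand-alone reduction.
 * Compile with the flags from the failing gcc call (including the -O level)
 * and keep deleting code for as long as the ICE still occurs.
 */
#include <math.h>

/* Hand expansion of the npy_math_internal.h.src template for
 * kind = copysign, c = f; static inline stands in for NPY_INPLACE. */
static inline float npy_copysignf(float x, float y)
{
    return copysignf(x, y);
}

/* Simplified cacosh-style tail: |Im w| + i * copysign(Re w, Im z),
 * returned through out-parameters so the wrapper is inlined here. */
void cacoshf_tail(float w_re, float w_im, float z_im,
                  float *out_re, float *out_im)
{
    *out_re = fabsf(w_im);
    *out_im = npy_copysignf(w_re, z_im);
}
```

If this does not crash, the next step would be to paste in more of the preprocessed npy_math_complex.c until it does.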

eric-wieser · 2019-06-11T06:47:38Z

It would be useful to know the value of NPY_USE_C99_COMPLEX in your build; that seems like the most likely difference between 1.15 and 1.16.

I attempted to reduce it here, but it didn't fail with any of the compilers I tried.

linedot · 2019-06-17T21:02:53Z

I'm experiencing the same issue on CentOS 7.6 on Huawei Taishan 2280 ARM64 servers.
The reduced example compiles without errors with native GCC 8.2.0, 8.3.0 and 9.1.0.
Any advice on further debugging?

Edit: The bug does not occur with -O0 on GCC 8.2.0, and does not occur at all with GCC 9.1.0.

ginomcevoy · 2019-08-27T21:19:37Z

I am also experiencing this issue on this environment:
Processor: ThunderX2
OS: RHEL 7.5 with updates and a custom Python 3.6 installation
compiler: GCC 7.2.1 (RHSCL 3.0)

I believe the difference between the aarch64 CI and the failing environments could be related to the glibc version. NumPy uses the "numpy" version of npy_cacoshf (npy_math_complex.c.src:1389 in the source) if the glibc version is below 2.18, and the "glibc" version otherwise (npy_math_complex.c:5343 in the build). CentOS 7.6 uses glibc 2.17.
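To check which path a given system takes, a compile-time test along the lines below could help. It is a sketch that assumes a glibc-based system, with the 2.18 threshold taken from the comment above; NumPy's actual configuration logic may be structured differently, and USE_LIBC_CACOSH, glibc_version_at_least and cacosh_source are hypothetical names.

```c
/* On glibc, standard headers such as <stdio.h> pull in <features.h>,
 * which defines __GLIBC__, __GLIBC_MINOR__ and __GLIBC_PREREQ. */
#include <stdio.h>

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#  if __GLIBC_PREREQ(2, 18)
#    define USE_LIBC_CACOSH 1   /* glibc >= 2.18: use the C library's version */
#  else
#    define USE_LIBC_CACOSH 0   /* e.g. CentOS 7 ships glibc 2.17 */
#  endif
#else
#  define USE_LIBC_CACOSH 0     /* not glibc: assume the bundled fallback */
#endif

/* Run-time counterpart of __GLIBC_PREREQ: is major.minor at least
 * req_major.req_minor? (e.g. for a version parsed from
 * gnu_get_libc_version()). */
int glibc_version_at_least(int major, int minor, int req_major, int req_minor)
{
    return major > req_major || (major == req_major && minor >= req_minor);
}

/* Reports which cacoshf implementation this build would select. */
const char *cacosh_source(void)
{
    return USE_LIBC_CACOSH ? "C library cacoshf" : "bundled npy_cacoshf";
}
```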

After experimenting with the compiler and NumPy build flags, I confirmed that the issue occurs when GCC tries to inline the function. Adding either -O0 or -fno-inline makes the compilation succeed, but neither is desirable.

The only (ugly) workaround I have found so far that does not affect other functions is to force -O0 on the function definition. Here is the source diff for the workaround (GNU GCC only, of course):

```diff
diff --git a/numpy/core/src/npymath/npy_math_complex.c.src b/numpy/core/src/npymath/npy_math_complex.c.src
index dad3812..adcd83c 100644
--- a/numpy/core/src/npymath/npy_math_complex.c.src
+++ b/numpy/core/src/npymath/npy_math_complex.c.src
@@ -1385,7 +1385,7 @@ npy_catan@c@(@ctype@ z)
 #endif

 #ifndef HAVE_CACOSH@C@
-@ctype@
+@ctype@ __attribute__((optimize("-O0")))
 npy_cacosh@c@(@ctype@ z)
 {
     /*
```
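The same per-function -O0 could also be expressed with GCC's optimize pragmas, where push_options and pop_options restrict the change to the functions defined between them. This is a sketch with stand-in functions (cacoshf_imag_part and scaled_abs are hypothetical, not NumPy code). The GCC manual describes the optimize attribute, which these pragmas apply, as intended for debugging rather than production code, so either form is best treated as a temporary workaround.

```c
#include <math.h>

/* Limit the override to GCC proper; clang also defines __GNUC__. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif

/* Hypothetical stand-in for the affected function: built at -O0 by GCC. */
float cacoshf_imag_part(float w_re, float z_im)
{
    return copysignf(w_re, z_im);
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif

/* Functions defined after pop_options use the command-line -O level again. */
float scaled_abs(float x, float k)
{
    return k * fabsf(x);
}
```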

More debugging info:

The noinline, __attribute__((noinline)) and __attribute__((gnu_inline)) attributes didn't work for me.
Setting -std=c89 or -std=gnu89 works for that file, but NumPy expects the C99 standard elsewhere.
GCC 4.8.5 can compile the file, even with -std=c99, for some reason.

charris · 2019-08-27T21:35:20Z

@ginomcevoy Thanks for the informative debug info. You should be able to compile with GCC 7.2.1 even without the -std=c99 flag; I believe it is only needed for the 4.8 series. But that shouldn't make any difference here. Hmm...

charris · 2019-08-27T21:37:00Z

Note that c99 is only needed for NumPy >= 1.17.

ginomcevoy · 2019-08-28T04:49:51Z

Yes, I forgot to mention that I was trying to install NumPy 1.17; that is why I didn't go with -std=c89, even though it works for that particular file. GCC 7.2.1 defaults to -std=gnu11, and compilation fails with that, with -std=c99, and with any later standard. I had the same problem with GCC 8.2.0, but not with GCC 4.8.5 (which is terrible at optimizing code for ThunderX2).

I was able to compile NumPy after that ugly fix, and only one test failed for me (TestComplexFunctions.test_loss_of_precision[complex256]).

siddhesh · 2019-10-09T21:54:57Z

An ICE would typically mean a compiler bug. I'll see if I can find a centos box to reproduce this.

EDIT: I should add that I just built numpy with gcc 8.3.0, so whatever the bug, it's likely to have already been fixed and only a backport may be necessary to get this working again.

EDIT2: Fixed gcc version. Sorry, checked the wrong machine, too many tabs open :/

siddhesh · 2019-10-10T17:15:22Z

Sorry I couldn't find a suitable machine to reproduce this. If someone can give me access I'll be happy to help isolate the problem.

maxim-kuvyrkov · 2019-10-23T14:51:19Z

I've reproduced this on Cortex-A72 with both CentOS 7.6 and Ubuntu 18.04, and I'm looking into it.
The hunch that this is a glibc bug (also my first guess) seems to be wrong: the crash reproduces with glibc 2.27.

Please report bugs like this to the upstream community (https://gcc.gnu.org/bugzilla/). The GNU Toolchain community prioritizes ICE (internal compiler error) and wrong-code bugs. Most of the time you just need to attach pre-processed source file and cc1 command line -- add "-v -save-temps" to compilation flags to get these.

nSircombe · 2019-10-23T15:01:50Z

I think it was reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90075

...and fixed, I believe.

NumPy builds with 9.2.0 on Aarch64 for me now.

maxim-kuvyrkov · 2019-10-23T15:12:24Z

@nSircombe , thanks! This saved me some cycles digging into this further. The patch was backported to gcc-7 and gcc-8 release branches, and will be in next update releases, which distros should pick up.
GCC 7.5 is expected to release in the next couple of weeks, and will be the final GCC 7 update.
Is anything else required from GCC side of things? Or should this be closed?

nSircombe · 2019-10-23T16:20:13Z

I haven't tested the newer GCC 7 and 8 releases, but I can confirm that pip install numpy works with 9.2.0.

maxim-kuvyrkov · 2019-10-23T16:47:11Z

Confirmed that GCC 7 built from current gcc-7-branch works fine.

mattip · 2019-10-23T18:01:14Z

Thanks for the detective work. Should we wait for the fixed toolchains to be released before closing this?

BaptisteGerondeau · 2019-11-14T13:56:49Z

Release 1.17.4 has moved things around, so the workaround patch above no longer applies. Here is what works for me at the moment:

```diff
--- numpy/core/src/npymath/npy_math_internal.h.src.orig	2019-11-14 12:20:01.387180922 +0000
+++ numpy/core/src/npymath/npy_math_internal.h.src	2019-11-14 12:19:17.960646234 +0000
@@ -477,7 +477,7 @@
  * #KIND = ATAN2,HYPOT,POW,FMOD,COPYSIGN#
  */
 #ifdef HAVE_@KIND@@C@
-NPY_INPLACE @type@ npy_@kind@@c@(@type@ x, @type@ y)
+NPY_INPLACE __attribute__((optimize("-O0"))) @type@ npy_@kind@@c@(@type@ x, @type@ y)
 {
     return @kind@@c@(x, y);
 }
```

I'm not sure whether this is the "best" (least bad) workaround, so feel free to give some feedback!
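One way to keep such a workaround from slowing down unaffected toolchains is to gate it on the target and compiler version. The sketch below assumes the ICE is limited to AArch64 builds with GCC 7 before 7.5 or GCC 8 before 8.4, the first releases expected to carry the backported fix for GCC bug 90075; that version range is an assumption rather than something verified in this thread, and WORKAROUND_PR90075 is a hypothetical macro name. It shows the copysign/float instance of the template with the gate applied (NPY_INPLACE omitted).

```c
#include <math.h>

/* Assumed affected range: AArch64 with GCC 7 before 7.5 or GCC 8 before 8.4.
 * Everything else, including clang (which also defines __GNUC__),
 * compiles the function at the normal optimization level. */
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__) && \
    ((__GNUC__ == 7 && __GNUC_MINOR__ < 5) || \
     (__GNUC__ == 8 && __GNUC_MINOR__ < 4))
#  define WORKAROUND_PR90075 __attribute__((optimize("-O0")))
#else
#  define WORKAROUND_PR90075 /* no workaround needed */
#endif

/* Template instance: kind = copysign, c = f, type = float. */
WORKAROUND_PR90075 float npy_copysignf(float x, float y)
{
    return copysignf(x, y);
}
```

Gating on the compiler version means the attribute disappears automatically once a distribution ships a fixed GCC, instead of pinning the function at -O0 forever.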

mattip · 2020-03-07T21:06:35Z

Closing, please reopen if there are still problems with this particular toolchain.

tylerjereddy added the component: build label on May 28, 2019.

koomie mentioned this issue on Jun 8, 2019 in numpy (v1.17.4) openhpc/ohpc#960 (closed).

undertherain mentioned this issue on Jul 31, 2019 in numpy compilation fails for numpy >=16.0 dl4fugaku/dl4fugaku#2 (open).

mattip closed this as completed on Mar 7, 2020.

This was referenced on Apr 22, 2020 in:

manylinux2014 docker image doesn't support numpy pypa/manylinux#548 (closed)

CI: Add arm64 in travis-ci scipy/scipy#11867 (merged)
