Local build testing broken on Fedora 35 by #21154. #21529

charris · 2022-05-17T19:12:49Z

Testing fails locally when checking long double with

ERROR numpy/core/tests/test_longdouble.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00...
ERROR numpy/core/tests/test_multiarray.py - KeyboardInterrupt
ERROR numpy/core/tests/test_scalar_methods.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00...
ERROR numpy/core/tests/test_scalarmath.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00...
ERROR numpy/core/tests/test_umath.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00' for...

There are three problems I can see:

The warning was not issued previous to ENH, BLD: Fix math feature detection for wasm #21154.
The warning should be ignored, maybe should not be issued (numpy/core/getlimits.py line 346).
Why is CI not failing?

EDIT: Ignoring the warning leads to actual errrors:

ERROR numpy/core/tests/test_longdouble.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_scalar_methods.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_scalarmath.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_umath.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'

EDIT2: gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)

The text was updated successfully, but these errors were encountered:

charris · 2022-05-17T19:51:45Z

@hoodmane Thoughts?

hoodmane · 2022-05-17T19:58:15Z

Does this reproduce in some docker image? If you have a reproduction I can look into it.

charris · 2022-05-17T20:05:27Z

I'm just doing python runtests.py -,m'full' on my local machine. Did a bisect to find the troublesome commit.

hoodmane · 2022-05-17T20:09:59Z

Okay, I will try running that and see if it also happens on my local machine (Ubuntu 20.04 LTS). I guess so far there is no reason to think the problem is Fedora-specific.

charris · 2022-05-17T20:16:13Z

More info:

CFLAGS = "-Wall -O3 -pipe -fomit-frame-pointer -fno-strict-aliasing -Wmaybe-uninitialized -Wdeprecated-declarations -march=native -Wimplicit-function-declaration"

Changing the optimization level to -O2 doesn't help. The failures occur during test collection with Pytest 6.2.5.

hoodmane · 2022-05-17T20:18:57Z

I get some errors in numpy/typing/tests/test_typing.py::test_reveal but the test suite runs:

Details

$ git checkout 3f9fada45852a8f1b3fd9e0b760f34425099896c
<Blah blah blah detached head>
$ python runtests.py -m'full'
Building, see build.log...
    ... build in progress
Build OK
/home/hood/Documents/programming/numpy/build/testenv/lib/python3.9/site-packages/numpy/_pytesttester.py:145: DeprecationWarning: 

  `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
  of the deprecation of `distutils` itself. It will be removed for
  Python >= 3.12. For older Python versions it will remain present.
  It is recommended to use `setuptools < 60.0` for those Python versions.
  For more details, see:
    https://numpy.org/devdocs/reference/distutils_status_migration.html 

<... test progress info ...>

  from numpy.distutils import cpuinfo
NumPy version 1.23.0.dev0+974.g3f9fada45
NumPy relaxed strides checking option: True
NumPy CPU features:  SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX* AVX512_CNL* AVX512_ICL*

...

FAILED numpy/typing/tests/test_typing.py::test_reveal[numeric] - AssertionError: Reveal mismatch at line 108
FAILED numpy/typing/tests/test_typing.py::test_reveal[arrayterator] - AssertionError: Reveal mismatch at line 12
FAILED numpy/typing/tests/test_typing.py::test_reveal[stride_tricks] - AssertionError: Reveal mismatch at line 24
FAILED numpy/typing/tests/test_typing.py::test_reveal[ndarray_misc] - AssertionError: Reveal mismatch at line 168
FAILED numpy/typing/tests/test_typing.py::test_reveal[dtype] - AssertionError: Reveal mismatch at line 51
FAILED numpy/typing/tests/test_typing.py::test_reveal[nditer] - AssertionError: Reveal mismatch at line 10
FAILED numpy/typing/tests/test_typing.py::test_reveal[array_constructors] - AssertionError: Reveal mismatch at line 171
FAILED numpy/typing/tests/test_typing.py::test_reveal[flatiter] - AssertionError: Reveal mismatch at line 8
FAILED numpy/typing/tests/test_typing.py::test_reveal[multiarray] - AssertionError: Reveal mismatch at line 37
FAILED numpy/typing/tests/test_typing.py::test_reveal[fromnumeric] - AssertionError: Reveal mismatch at line 122
FAILED numpy/typing/tests/test_typing.py::test_reveal[testing] - AssertionError: Reveal mismatch at line 33
FAILED numpy/typing/tests/test_typing.py::test_reveal[index_tricks] - AssertionError: Reveal mismatch at line 19
12 failed, 20000 passed, 168 skipped, 31 xfailed, 3 xpassed in 431.85s (0:07:11)

charris · 2022-05-17T20:19:13Z

Pytest 7.1.2 also has the problem.

hoodmane · 2022-05-17T20:19:31Z

I am using pytest v6.2.5

charris · 2022-05-17T20:20:19Z

What compiler are you using?

charris · 2022-05-17T20:23:10Z

I believe @seberg also runs Fedora and he hasn't reported any problems. I suppose it could be hardware dependent, but that would be strange.

hoodmane · 2022-05-17T20:25:21Z

Info about my Python and compiler:

Details

Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> sysconfig.get_config_var("CC")
'x86_64-linux-gnu-gcc -pthread'

$ x86_64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

hoodmane · 2022-05-17T20:28:35Z

Well if we can find a docker image with Fedora 35 + Python, it would be great to check if it reproduces in that.

charris · 2022-05-17T20:40:58Z

Out of curiosity, did you change any of the strings used to detect the long double type? The problem occurs during the Pytest collection because of the code that is run during the test module import. There seem to be at least two bugs:

The type isn't detected before the fallback (it used to be).
The fallback doesn't set machar._str_smallest_normal, and I would guess machar._str_smallest_subnormal as well.

hoodmane · 2022-05-17T21:21:01Z

I don't think I changed long double support detection. If long double is missing, then maybe feature_detection_math.h would be troublesome because it mixes declarations with long double with other declarations that would be expected to exist even if there is no long double support. But if the machine does support long double I'm not exactly sure where that would have changed as a result of my patch. I think the long double detection happens before math library detection? I will have to look at the code again more carefully.

charris · 2022-05-17T21:37:25Z

For some reason I'm suspicious of strtold_l, and maybe the changes in OPTIONAL_STDFUNCS.

charris · 2022-05-17T22:28:23Z

OK, I found the source of the problem, the tested value -0.1 goes through a double before being converted to long double and then to a string. Compare float128 from string:

In [11]: array('-0.1', np.float128).tostring()[0:10]
<ipython-input-11-65dd16c3070a>:1: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  array('-0.1', np.float128).tostring()[0:10]
Out[11]: b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

to from python float (double):

In [12]: array(-0.1, np.float128).tostring()[0:10]
<ipython-input-12-c826205185fb>:1: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  array(-0.1, np.float128).tostring()[0:10]
Out[12]: b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

The latter is what is being checked and found wanting.

charris · 2022-05-17T23:03:06Z

That is what is going on, but I don't know why it happens.

hoodmane · 2022-05-17T23:42:53Z

Huh. Well these two things specify numbers that are not equal: array('-0.1', np.float128) means the 128 bit float closest to -1/10, whereas array(-0.1, np.float128) means the 128 bit float that is upcast from the closest double to -1/10. Because the representation is little endian, the first two bytes represent the least significant part of the mantissa and are different, whereas the mantissa and the more significant parts of the exponent are the same. It's surprising that 96 bits of the 112 bit mantissas match when we only know for certain that 53 bits will match.

seberg · 2022-05-18T01:22:44Z

I think we had a similar problem that might be solved with -1/10 here on systems that were missing longdouble (use float64 instead, but maybe still store it differently).

It might be that pytest changed how it shows/deals with warnings, and previously it was just ignored because it was already triggered during test import.

(I mainly use debian for long time, so I would not have noticed fedora changes.)

charris · 2022-05-18T03:06:35Z

It is the string conversion that fails. If I install the broken version and import

In [3]: np.array('-0.1', dtype='<f16').tobytes()[0:10]
Out[3]: b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

Which is not correct. That is why I am suspecting the changes respecting "strtold_l".

Your division workaround works

In [7]: (np.array(-1, dtype='<f16')/10).tobytes()[0:10]
Out[7]: b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

That doesn't help with the broken string conversion, though. ISTR that we have our own conversion routine for long doubles.

charris · 2022-05-18T03:47:53Z

And the problem is that HAVE_STRTOLD_L is not defined in the failing versions.

Working version, v1.22.3:

charris@fc [numpy.git ((v1.22.3))]$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.10/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

Failing version

charris@fc [numpy.git (main)]$ grep -r HAVE_STRTOLD_L build
<nada>

charris · 2022-05-18T03:50:21Z

I suspect the other moved functions have suffered a similar fate.

hoodmane · 2022-05-18T04:11:07Z

For the record, on my system on 3f9fada:

$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.9/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

charris · 2022-05-18T04:28:51Z

I get

charris@fc [~]$ diff config-before.h config-after.h
36d35
< #define HAVE_STRTOLD_L 1
65a65
> #define HAVE_ATTRIBUTE_OPTIMIZE_OPT_2 1
215d214
< #define NPY_RELAXED_STRIDES_CHECKING 1

Try pulling main from upstream and running git clean -xdfq before python3 runtests.py -b to make sure you are current. I'm also running Python 3.10.4, but I don't think that matters.

hoodmane · 2022-05-18T04:55:02Z

Yeah, I still see #define HAVE_STRTOLD_L 1 in config.h when I build from current main branch with Python 3.10.4:

$ git checkout main
$ git pull upstream main
$ git clean -xdfq
$ python3.10 runtests.py -b
$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.10/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

charris · 2022-05-18T08:08:40Z

The problem is probably compiler related:

INFO: gcc: _configtest.c
_configtest.c: In function ‘main’:
_configtest.c:9:3: error: argument 1 null where non-null expected [-Werror=nonnull]
    9 |   strtold_l((const char*) 0, (char**) 0, (locale_t) 0);
      |   ^~~~~~~~~
In file included from _configtest.c:1:
/usr/include/stdlib.h:303:20: note: in a call to function ‘strtold_l’ declared ‘nonnull’
  303 | extern long double strtold_l (const char *__restrict __nptr,
      |                    ^~~~~~~~~
_configtest.c:9:3: error: argument 3 null where non-null expected [-Werror=nonnull]
    9 |   strtold_l((const char*) 0, (char**) 0, (locale_t) 0);
      |   ^~~~~~~~~
In file included from _configtest.c:1:
/usr/include/stdlib.h:303:20: note: in a call to function ‘strtold_l’ declared ‘nonnull’
  303 | extern long double strtold_l (const char *__restrict __nptr,
      |                    ^~~~~~~~~

charris · 2022-05-18T08:57:21Z

Werror=nonnull apparently comes along with -Wformat, which is turned on by -Wall. It can be disabled in CFLAGS with -Wno-nonnull. Seems like a recent gcc thing. Not sure what turns it into an error instead of a warning, and it is likely a bug for it the happen for prototypes. It can also be turned off in the code with a pragma.

seberg · 2022-05-18T17:27:13Z

@charris it sounds like we still need to add that warning ignore prgama to ensure we robustify against this issue? A similar thing was observed in another PR/issue and I suspect it was for the same reason (but on a different platform).

GCC 11 fails the configuration test for detecting `strtold_l` when the `-Wall` flag is present because `-Werror=nonnull` is activated. The fix is to add a pragma to `numpy/core/feature_detection_locale.h` to ignore that error. Closes numpy#21529.

charris · 2022-05-18T18:28:03Z

See #21534. Note that this is only for GCC.

GCC 11 fails the configuration test for detecting `strtold_l` when the `-Wall` flag is present because `-Werror=nonnull` is activated. The fix is to add a pragma to `numpy/core/feature_detection_locale.h` to ignore that error. Closes numpy#21529.

charris added the 00 - Bug label May 17, 2022

charris closed this as completed May 18, 2022

charris mentioned this issue May 18, 2022

BUG: Fix GCC error during build configuration #21534

Merged

charris mentioned this issue May 18, 2022

BUG: Fix GCC error during build configuration #21536

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Local build testing broken on Fedora 35 by #21154. #21529

Local build testing broken on Fedora 35 by #21154. #21529

charris commented May 17, 2022 •

edited

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022 •

edited

hoodmane commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022 •

edited

seberg commented May 18, 2022

charris commented May 18, 2022

charris commented May 18, 2022

charris commented May 18, 2022

hoodmane commented May 18, 2022

charris commented May 18, 2022

hoodmane commented May 18, 2022 •

edited

charris commented May 18, 2022

charris commented May 18, 2022

seberg commented May 18, 2022

charris commented May 18, 2022

Local build testing broken on Fedora 35 by #21154. #21529

Local build testing broken on Fedora 35 by #21154. #21529

Comments

charris commented May 17, 2022 • edited

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022

hoodmane commented May 17, 2022

charris commented May 17, 2022 • edited

hoodmane commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

charris commented May 17, 2022

hoodmane commented May 17, 2022 • edited

seberg commented May 18, 2022

charris commented May 18, 2022

charris commented May 18, 2022

charris commented May 18, 2022

hoodmane commented May 18, 2022

charris commented May 18, 2022

hoodmane commented May 18, 2022 • edited

charris commented May 18, 2022

charris commented May 18, 2022

seberg commented May 18, 2022

charris commented May 18, 2022

charris commented May 17, 2022 •

edited

charris commented May 17, 2022 •

edited

hoodmane commented May 17, 2022 •

edited

hoodmane commented May 18, 2022 •

edited