Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local build testing broken on Fedora 35 by #21154. #21529

Closed
charris opened this issue May 17, 2022 · 29 comments · Fixed by #21534
Closed

Local build testing broken on Fedora 35 by #21154. #21529

charris opened this issue May 17, 2022 · 29 comments · Fixed by #21534
Labels

Comments

@charris
Copy link
Member

charris commented May 17, 2022

Testing fails locally when checking long double with

ERROR numpy/core/tests/test_longdouble.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00...
ERROR numpy/core/tests/test_multiarray.py - KeyboardInterrupt
ERROR numpy/core/tests/test_scalar_methods.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00...
ERROR numpy/core/tests/test_scalarmath.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00...
ERROR numpy/core/tests/test_umath.py - UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\xff\x7f\x00\x00\x00\x00' for...

There are three problems I can see:

  1. The warning was not issued previous to ENH, BLD: Fix math feature detection for wasm  #21154.
  2. The warning should be ignored, maybe should not be issued (numpy/core/getlimits.py line 346).
  3. Why is CI not failing?

EDIT: Ignoring the warning leads to actual errrors:

ERROR numpy/core/tests/test_longdouble.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_scalar_methods.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_scalarmath.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'
ERROR numpy/core/tests/test_umath.py - AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'

EDIT2: gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)

@charris
Copy link
Member Author

charris commented May 17, 2022

@hoodmane Thoughts?

@hoodmane
Copy link
Contributor

Does this reproduce in some docker image? If you have a reproduction I can look into it.

@charris
Copy link
Member Author

charris commented May 17, 2022

I'm just doing python runtests.py -,m'full' on my local machine. Did a bisect to find the troublesome commit.

@hoodmane
Copy link
Contributor

Okay, I will try running that and see if it also happens on my local machine (Ubuntu 20.04 LTS). I guess so far there is no reason to think the problem is Fedora-specific.

@charris
Copy link
Member Author

charris commented May 17, 2022

More info:

CFLAGS = "-Wall -O3 -pipe -fomit-frame-pointer -fno-strict-aliasing -Wmaybe-uninitialized -Wdeprecated-declarations -march=native -Wimplicit-function-declaration"

Changing the optimization level to -O2 doesn't help. The failures occur during test collection with Pytest 6.2.5.

@hoodmane
Copy link
Contributor

I get some errors in numpy/typing/tests/test_typing.py::test_reveal but the test suite runs:

Details
$ git checkout 3f9fada45852a8f1b3fd9e0b760f34425099896c
<Blah blah blah detached head>
$ python runtests.py -m'full'
Building, see build.log...
    ... build in progress
Build OK
/home/hood/Documents/programming/numpy/build/testenv/lib/python3.9/site-packages/numpy/_pytesttester.py:145: DeprecationWarning: 

  `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
  of the deprecation of `distutils` itself. It will be removed for
  Python >= 3.12. For older Python versions it will remain present.
  It is recommended to use `setuptools < 60.0` for those Python versions.
  For more details, see:
    https://numpy.org/devdocs/reference/distutils_status_migration.html 

<... test progress info ...>

  from numpy.distutils import cpuinfo
NumPy version 1.23.0.dev0+974.g3f9fada45
NumPy relaxed strides checking option: True
NumPy CPU features:  SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX* AVX512_CNL* AVX512_ICL*

...

FAILED numpy/typing/tests/test_typing.py::test_reveal[numeric] - AssertionError: Reveal mismatch at line 108
FAILED numpy/typing/tests/test_typing.py::test_reveal[arrayterator] - AssertionError: Reveal mismatch at line 12
FAILED numpy/typing/tests/test_typing.py::test_reveal[stride_tricks] - AssertionError: Reveal mismatch at line 24
FAILED numpy/typing/tests/test_typing.py::test_reveal[ndarray_misc] - AssertionError: Reveal mismatch at line 168
FAILED numpy/typing/tests/test_typing.py::test_reveal[dtype] - AssertionError: Reveal mismatch at line 51
FAILED numpy/typing/tests/test_typing.py::test_reveal[nditer] - AssertionError: Reveal mismatch at line 10
FAILED numpy/typing/tests/test_typing.py::test_reveal[array_constructors] - AssertionError: Reveal mismatch at line 171
FAILED numpy/typing/tests/test_typing.py::test_reveal[flatiter] - AssertionError: Reveal mismatch at line 8
FAILED numpy/typing/tests/test_typing.py::test_reveal[multiarray] - AssertionError: Reveal mismatch at line 37
FAILED numpy/typing/tests/test_typing.py::test_reveal[fromnumeric] - AssertionError: Reveal mismatch at line 122
FAILED numpy/typing/tests/test_typing.py::test_reveal[testing] - AssertionError: Reveal mismatch at line 33
FAILED numpy/typing/tests/test_typing.py::test_reveal[index_tricks] - AssertionError: Reveal mismatch at line 19
12 failed, 20000 passed, 168 skipped, 31 xfailed, 3 xpassed in 431.85s (0:07:11)

@charris
Copy link
Member Author

charris commented May 17, 2022

Pytest 7.1.2 also has the problem.

@hoodmane
Copy link
Contributor

I am using pytest v6.2.5

@charris
Copy link
Member Author

charris commented May 17, 2022

What compiler are you using?

@charris
Copy link
Member Author

charris commented May 17, 2022

I believe @seberg also runs Fedora and he hasn't reported any problems. I suppose it could be hardware dependent, but that would be strange.

@hoodmane
Copy link
Contributor

Info about my Python and compiler:

Details
Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> sysconfig.get_config_var("CC")
'x86_64-linux-gnu-gcc -pthread'
$ x86_64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

@hoodmane
Copy link
Contributor

Well if we can find a docker image with Fedora 35 + Python, it would be great to check if it reproduces in that.

@charris
Copy link
Member Author

charris commented May 17, 2022

Out of curiosity, did you change any of the strings used to detect the long double type? The problem occurs during the Pytest collection because of the code that is run during the test module import. There seem to be at least two bugs:

  1. The type isn't detected before the fallback (it used to be).
  2. The fallback doesn't set machar._str_smallest_normal, and I would guess machar._str_smallest_subnormal as well.

@hoodmane
Copy link
Contributor

I don't think I changed long double support detection. If long double is missing, then maybe feature_detection_math.h would be troublesome because it mixes declarations with long double with other declarations that would be expected to exist even if there is no long double support. But if the machine does support long double I'm not exactly sure where that would have changed as a result of my patch. I think the long double detection happens before math library detection? I will have to look at the code again more carefully.

@charris
Copy link
Member Author

charris commented May 17, 2022

For some reason I'm suspicious of strtold_l, and maybe the changes in OPTIONAL_STDFUNCS.

@charris
Copy link
Member Author

charris commented May 17, 2022

OK, I found the source of the problem, the tested value -0.1 goes through a double before being converted to long double and then to a string. Compare float128 from string:

In [11]: array('-0.1', np.float128).tostring()[0:10]
<ipython-input-11-65dd16c3070a>:1: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  array('-0.1', np.float128).tostring()[0:10]
Out[11]: b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

to from python float (double):

In [12]: array(-0.1, np.float128).tostring()[0:10]
<ipython-input-12-c826205185fb>:1: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  array(-0.1, np.float128).tostring()[0:10]
Out[12]: b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

The latter is what is being checked and found wanting.

@charris
Copy link
Member Author

charris commented May 17, 2022

That is what is going on, but I don't know why it happens.

@hoodmane
Copy link
Contributor

hoodmane commented May 17, 2022

Huh. Well these two things specify numbers that are not equal: array('-0.1', np.float128) means the 128 bit float closest to -1/10, whereas array(-0.1, np.float128) means the 128 bit float that is upcast from the closest double to -1/10. Because the representation is little endian, the first two bytes represent the least significant part of the mantissa and are different, whereas the mantissa and the more significant parts of the exponent are the same. It's surprising that 96 bits of the 112 bit mantissas match when we only know for certain that 53 bits will match.

@seberg
Copy link
Member

seberg commented May 18, 2022

I think we had a similar problem that might be solved with -1/10 here on systems that were missing longdouble (use float64 instead, but maybe still store it differently).

It might be that pytest changed how it shows/deals with warnings, and previously it was just ignored because it was already triggered during test import.

(I mainly use debian for long time, so I would not have noticed fedora changes.)

@charris
Copy link
Member Author

charris commented May 18, 2022

It is the string conversion that fails. If I install the broken version and import

In [3]: np.array('-0.1', dtype='<f16').tobytes()[0:10]
Out[3]: b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

Which is not correct. That is why I am suspecting the changes respecting "strtold_l".

Your division workaround works

In [7]: (np.array(-1, dtype='<f16')/10).tobytes()[0:10]
Out[7]: b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'

That doesn't help with the broken string conversion, though. ISTR that we have our own conversion routine for long doubles.

@charris
Copy link
Member Author

charris commented May 18, 2022

And the problem is that HAVE_STRTOLD_L is not defined in the failing versions.

Working version, v1.22.3:

charris@fc [numpy.git ((v1.22.3))]$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.10/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

Failing version

charris@fc [numpy.git (main)]$ grep -r HAVE_STRTOLD_L build
<nada>

@charris
Copy link
Member Author

charris commented May 18, 2022

I suspect the other moved functions have suffered a similar fate.

@hoodmane
Copy link
Contributor

For the record, on my system on 3f9fada:

$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.9/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

@charris
Copy link
Member Author

charris commented May 18, 2022

I get

charris@fc [~]$ diff config-before.h config-after.h
36d35
< #define HAVE_STRTOLD_L 1
65a65
> #define HAVE_ATTRIBUTE_OPTIMIZE_OPT_2 1
215d214
< #define NPY_RELAXED_STRIDES_CHECKING 1

Try pulling main from upstream and running git clean -xdfq before python3 runtests.py -b to make sure you are current. I'm also running Python 3.10.4, but I don't think that matters.

@hoodmane
Copy link
Contributor

hoodmane commented May 18, 2022

Yeah, I still see #define HAVE_STRTOLD_L 1 in config.h when I build from current main branch with Python 3.10.4:

$ git checkout main
$ git pull upstream main
$ git clean -xdfq
$ python3.10 runtests.py -b
$ grep -r HAVE_STRTOLD_L build
build/src.linux-x86_64-3.10/numpy/core/include/numpy/config.h:#define HAVE_STRTOLD_L 1

@charris
Copy link
Member Author

charris commented May 18, 2022

The problem is probably compiler related:

INFO: gcc: _configtest.c
_configtest.c: In function ‘main’:
_configtest.c:9:3: error: argument 1 null where non-null expected [-Werror=nonnull]
    9 |   strtold_l((const char*) 0, (char**) 0, (locale_t) 0);
      |   ^~~~~~~~~
In file included from _configtest.c:1:
/usr/include/stdlib.h:303:20: note: in a call to function ‘strtold_l’ declared ‘nonnull’
  303 | extern long double strtold_l (const char *__restrict __nptr,
      |                    ^~~~~~~~~
_configtest.c:9:3: error: argument 3 null where non-null expected [-Werror=nonnull]
    9 |   strtold_l((const char*) 0, (char**) 0, (locale_t) 0);
      |   ^~~~~~~~~
In file included from _configtest.c:1:
/usr/include/stdlib.h:303:20: note: in a call to function ‘strtold_l’ declared ‘nonnull’
  303 | extern long double strtold_l (const char *__restrict __nptr,
      |                    ^~~~~~~~~

@charris
Copy link
Member Author

charris commented May 18, 2022

Werror=nonnull apparently comes along with -Wformat, which is turned on by -Wall. It can be disabled in CFLAGS with -Wno-nonnull. Seems like a recent gcc thing. Not sure what turns it into an error instead of a warning, and it is likely a bug for it the happen for prototypes. It can also be turned off in the code with a pragma.

@charris charris closed this as completed May 18, 2022
@seberg
Copy link
Member

seberg commented May 18, 2022

@charris it sounds like we still need to add that warning ignore prgama to ensure we robustify against this issue? A similar thing was observed in another PR/issue and I suspect it was for the same reason (but on a different platform).

charris added a commit to charris/numpy that referenced this issue May 18, 2022
GCC 11 fails the configuration test for detecting `strtold_l` when the
`-Wall` flag is present because `-Werror=nonnull` is activated. The fix
is to add a pragma to `numpy/core/feature_detection_locale.h` to ignore
that error.

Closes numpy#21529.
@charris
Copy link
Member Author

charris commented May 18, 2022

See #21534. Note that this is only for GCC.

charris added a commit to charris/numpy that referenced this issue May 18, 2022
GCC 11 fails the configuration test for detecting `strtold_l` when the
`-Wall` flag is present because `-Werror=nonnull` is activated. The fix
is to add a pragma to `numpy/core/feature_detection_locale.h` to ignore
that error.

Closes numpy#21529.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants