What is wrong if np.zeros(2) == 0 returns array([False, True], dtype=bool)? #6251

Closed
andsor opened this issue Aug 26, 2015 · 36 comments

@andsor

andsor commented Aug 26, 2015

Hi, this is my first issue here in this great package, so please bear with me :-)
I encountered the behaviour in the title, along with the following strange pattern, in an HPC environment.

import numpy as np
for i in range(42):
    result = np.zeros(i) == 0.0
    print(i, np.nonzero(np.invert(result))[0])

prints

0 []
1 []
2 [0]
3 [0]
4 [0]
5 [0]
6 [0]
7 [0]
8 []
9 [8]
10 [8]
11 [8]
12 [8]
13 [8]
14 [8]
15 [8]
16 []
17 [16]
18 [16]
19 [16]
20 [16]
21 [16]
22 [16]
23 [16]
24 []
25 [24]
26 [24]
27 [24]
28 [24]
29 [24]
30 [24]
31 [24]
32 []
33 [32]
34 [32]
35 [32]
36 [32]
37 [32]
38 [32]
39 [32]
40 []
41 [40]
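The output above is regular: lengths 0 and 1 and exact multiples of 8 compare correctly, and every other length has exactly one wrong element, at index 8 * (n // 8), i.e. the first element after the last full block of 8. A minimal sketch that encodes this reading of the output and checks it against some of the printed lines; the block size of 8 is inferred from the indices above, not taken from NumPy's source, and the helper names are hypothetical.

```python
# Hypothetical helpers for checking the pattern in the printed output.
# BLOCK = 8 is inferred from the wrong indices, not from NumPy's source.
BLOCK = 8


def parse_line(line):
    """Parse an output line such as '9 [8]' into (9, [8])."""
    n_str, rest = line.split(maxsplit=1)
    inner = rest.strip()[1:-1]  # drop the surrounding brackets
    return int(n_str), [int(tok) for tok in inner.split()]


def predicted_wrong_indices(n, block=BLOCK):
    """Indices that compared wrongly for length n, per the observed pattern."""
    # Lengths 0 and 1 and exact multiples of the block size came out correct.
    if n < 2 or n % block == 0:
        return []
    # Otherwise only the first element after the last full block was wrong.
    return [block * (n // block)]


sample = ["0 []", "1 []", "2 [0]", "7 [0]", "8 []", "9 [8]", "24 []", "41 [40]"]
for line in sample:
    n, observed = parse_line(line)
    assert observed == predicted_wrong_indices(n), line
```

The position of the bad element suggests that something goes wrong where a blocked (vectorized) loop hands over to the remaining elements, which is consistent with the SSE2 findings later in the thread; it is a hint, not a proof.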
np.show_config()
lapack_mkl_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    define_macros = [('ATLAS_INFO', '"\\"3.10.2\\""')]
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    language = c
    library_dirs = ['/usr/nld/atlas-3.10.2/lib']
atlas_threads_info:
    define_macros = [('ATLAS_INFO', '"\\"3.10.2\\""')]
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    language = f77
    library_dirs = ['/usr/nld/atlas-3.10.2/lib']
lapack_opt_info:
    define_macros = [('ATLAS_INFO', '"\\"3.10.2\\""')]
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    language = f77
    library_dirs = ['/usr/nld/atlas-3.10.2/lib']
atlas_blas_threads_info:
    define_macros = [('ATLAS_INFO', '"\\"3.10.2\\""')]
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    language = c
    library_dirs = ['/usr/nld/atlas-3.10.2/lib']
mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE

NumPy was compiled with

export ATLAS=/usr/nld/atlas-3.10.2/lib/libtatlas.so
export LAPACK=/usr/nld/atlas-3.10.2/lib/liblapack.a
export BLAS=/usr/nld/atlas-3.10.2/lib/libcblas.a
  • Python 3.4.3 CPython
  • GCC 4.3.4 [gcc-4_3-branch revision 152973]
  • Linux-2.6.32.59-0.7-default-x86_64-with-SuSE-11-x86_64
$ pip list
decorator (4.0.2)
ipython (4.0.0)
ipython-genutils (0.1.0)
nose (1.3.7)
numpy (1.9.2)
path.py (7.7.1)
pexpect (3.3)
pickleshare (0.5)
pip (7.1.2)
pkgconfig (1.1.0)
readline (6.2.4.1)
scipy (0.16.0)
setuptools (18.2)
simplegeneric (0.8.1)
six (1.9.0)
traitlets (4.0.0)
$ /bin/ls /usr/nld/atlas-3.10.2/lib 
libatlas.a  libcblas.a  libf77blas.a  liblapack.a  libptcblas.a  libptf77blas.a  libsatlas.so  libtatlas.so

By the way, under Python 2.7.9 on the same system with the same setup, this bug does not occur.

@andsor
Author

andsor commented Aug 26, 2015

In a fresh environment with the same Python 3.4.3, I installed NumPy without ATLAS etc., and it still gives the same error.

@pv
Member

pv commented Aug 26, 2015 via email

@andsor
Author

andsor commented Aug 26, 2015

Hey, thanks. So the comparison operation itself gives incorrect results; for example, here is the output for i == 2:

2
[  0.   0.]
[False  True]
[ True False]
[0]
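The script that printed these five lines is not shown in the thread. A plausible reconstruction, assuming the lines are i, the array, the comparison, its inversion, and the indices of wrong elements; the helper name diagnose is hypothetical.

```python
import numpy as np


def diagnose(i):
    """Return the intermediate values printed for one array length."""
    a = np.zeros(i)
    result = a == 0.0              # the elementwise comparison under suspicion
    inverted = np.invert(result)   # True wherever the comparison came out False
    wrong = np.nonzero(inverted)[0]
    return a, result, inverted, wrong


if __name__ == "__main__":
    i = 2
    a, result, inverted, wrong = diagnose(i)
    print(i)
    print(a)
    print(result)
    print(inverted)
    print(wrong)
```

On a correctly built NumPy the comparison line should be all True and the last line an empty array.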

@andsor
Author

andsor commented Aug 26, 2015

Here is Python 2.7.9:

$ /usr/nld/python-2.7.9-cluster/bin/python
Python 2.7.9 (default, Mar  5 2015, 16:40:16) 
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux2

And here is Python 3.4.3:

$ /usr/nld/python-3.4.3-cluster/bin/python3
Python 3.4.3 (default, Mar  5 2015, 17:21:36) 
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux

@pv
Member

pv commented Aug 26, 2015 via email

@andsor
Author

andsor commented Aug 26, 2015

Thanks, I'll check that. BTW, I just compiled NumPy 1.9.2 for the GCC-4.3-compiled Python 3.4.3 above, but with GCC 4.7.2, and the bug disappears.

@pv
Member

pv commented Aug 26, 2015 via email

@andsor
Author

andsor commented Aug 26, 2015

Wow, even across different Python versions with the same compiler.

@andsor
Author

andsor commented Aug 26, 2015

OK, NumPy 1.7.2 works under the GCC-4.3.4-compiled Python 3.4.3.

@andsor
Author

andsor commented Aug 26, 2015

OK, confirmed: under the GCC-4.3.4-compiled Python 3.4.3, NumPy 1.9.2 compiled with GCC 4.3.4 works after changing numpy/core/src/umath/simd.inc.src from
...
#ifdef NPY_HAVE_SSE2_INTRINSICS
...
to
...
#undef NPY_HAVE_SSE2_INTRINSICS
#ifdef NPY_HAVE_SSE2_INTRINSICS
...
and reinstalling from a clean build. np.zeros(2) == 0.0 then returns array([ True, True], dtype=bool).
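Checking only np.zeros(2) leaves other lengths and operators untested. A broader sanity check for a rebuilt NumPy, sketched under the assumption that other comparisons and dtypes may go through similar vectorized loops; find_bad_comparisons is a hypothetical helper, not part of NumPy.

```python
import operator

import numpy as np

# Expected result of comparing 0.0 with 0.0 for each comparison operator.
EXPECTED = {
    operator.eq: True,
    operator.ne: False,
    operator.lt: False,
    operator.le: True,
    operator.gt: False,
    operator.ge: True,
}


def find_bad_comparisons(max_len=64, dtypes=(np.float32, np.float64)):
    """Return (dtype, operator, length, wrong indices) for every mismatch found."""
    bad = []
    for dtype in dtypes:
        for n in range(max_len):
            a = np.zeros(n, dtype=dtype)
            for op, expected in EXPECTED.items():
                result = op(a, 0.0)
                wrong = np.nonzero(result != expected)[0]
                if wrong.size:
                    bad.append((np.dtype(dtype).name, op.__name__, n, wrong.tolist()))
    return bad


if __name__ == "__main__":
    problems = find_bad_comparisons()
    print("OK" if not problems else problems)
```

Covering many lengths matters here because, in the output at the top of the thread, only lengths that are not multiples of 8 went wrong.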

@andsor
Author

andsor commented Aug 26, 2015

So, to summarize, in my setting the bug seems to occur only when all of the following conditions are met:

  • the cluster's standard C compiler (GCC 4.3.4)
  • the SSE2 vectorization introduced after NumPy 1.7.2
  • Python 3.4.3

@juliantaylor
Contributor

A compiler that miscompiles vectorized code on an HPC cluster is a pretty clear-cut case for convincing the admins to upgrade. The cluster is likely producing wrong results in other programs as well.

Though I'd like to check that out: are there any custom patches on the GCC?
Can you provide the disassembly of sse2_binary_DOUBLE_equal via objdump -d on umath.so?
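Rather than grepping the whole listing for "sse", the disassembly of one function can be cut out of the full objdump -d output. A sketch in Python; the script and helper names are hypothetical, it assumes GNU objdump's usual "address <symbol>:" header lines, and a function that was inlined into its caller will not have a block of its own.

```python
import re
import subprocess
import sys

# Symbol header lines in `objdump -d` output look like "0000000000012340 <name>:".
HEADER = re.compile(r"^[0-9a-f]+ <(?P<name>[^>]+)>:$")


def disassemble(path):
    """Return the text of `objdump -d path` (requires binutils on PATH)."""
    proc = subprocess.run(["objdump", "-d", path],
                          check=True, capture_output=True, text=True)
    return proc.stdout


def extract_function(disassembly, symbol):
    """Return the lines of `symbol`, plus compiler clones like 'symbol.isra.0'."""
    picked, inside = [], False
    for line in disassembly.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group("name")
            inside = name == symbol or name.startswith(symbol + ".")
        elif line.startswith("Disassembly of section"):
            inside = False
        if inside and line.strip():
            picked.append(line)
    return picked


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract.py path/to/umath.so sse2_binary_DOUBLE_equal
    print("\n".join(extract_function(disassemble(sys.argv[1]), sys.argv[2])))
```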

@juliantaylor
Contributor

I assume numpy.test() fails too?

@andsor
Author

andsor commented Aug 26, 2015

numpy.test() fails with many "AssertionError: Arrays are not equal" errors.

@andsor
Author

andsor commented Aug 26, 2015

Ran 5484 tests in 23.528s

FAILED (KNOWNFAIL=6, SKIP=14, errors=104, failures=1046)

@andsor
Author

andsor commented Aug 26, 2015

Here is the objdump of umath.so, filtered with | grep sse:
https://gist.github.com/andsor/4ad5460f8bc641124233

@andsor
Author

andsor commented Aug 26, 2015

How do I find out whether there are custom patches on the GCC?

@andsor
Author

andsor commented Aug 26, 2015

Thanks for your help!

@pv
Member

pv commented Aug 26, 2015 via email

@juliantaylor
Contributor

As it works with Python 2, I assume it's an aliasing issue in the ordered-compare part. A bit strange; I didn't think there would be anything for the compiler to screw up.

Out of curiosity, what kind of hardware is the cluster using that it is still on GCC 4.3? Itanium? 4.3 shouldn't support anything much newer than that.

@andsor
Author

andsor commented Aug 26, 2015

@pv Ah, I see, obviously. Is there a standard way to upload that 3.9 MB file?

@andsor
Author

andsor commented Aug 26, 2015

OK, here is the full output of $ objdump -d numpy-test/lib/python3.4/site-packages/numpy/core/umath.cpython-34m.so:

https://gist.github.com/andsor/4ad5460f8bc641124233#file-umath-objdump

@njsmith
Member

njsmith commented Aug 26, 2015

Re hardware: "Linux-2.6.32.59-0.7-default-x86_64-with-SuSE-11-x86_64"

@juliantaylor
Contributor

xz-compressed, it should go down to a few kilobytes; you can send it by email, e.g. to jtaylor.debian@googlemail.com
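The xz command-line tool is the usual way to do this; the same .xz container can also be written from Python's standard library. A minimal sketch; xz_compress is a hypothetical helper name.

```python
import lzma
import shutil


def xz_compress(src, dst=None):
    """Compress file `src` to `src + '.xz'` (or to `dst`) in the xz format."""
    dst = dst or src + ".xz"
    # lzma.open writes the .xz container by default; preset 9 favours size over speed.
    with open(src, "rb") as fin, lzma.open(dst, "wb", preset=9) as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, xz_compress("umath.objdump") would write umath.objdump.xz next to the input. Disassembly listings are highly repetitive, which is why they compress so well.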

@andsor
Author

andsor commented Aug 26, 2015

Here is the objdump for the Python 2.7.9 version:
https://gist.github.com/andsor/4ad5460f8bc641124233#file-umath-python-2-7-9-objdump

@juliantaylor
Contributor

Is the Python 3 build a debug build? It did not even inline npy_is_aligned, which is like three instructions. The Python 2 build did.
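An alignment test on a pointer is just a mask-and-compare, which is why it should inline to a handful of instructions. A Python analog for illustration, assuming a power-of-two alignment; is_aligned is a hypothetical helper that mirrors the idea of NumPy's internal C function, not its exact code. SSE2 registers hold 16 bytes (two doubles), so vectorized loops typically process a few leading elements one by one until the data pointer is 16-byte aligned.

```python
import numpy as np


def is_aligned(arr, alignment):
    """True if the array's data pointer is a multiple of `alignment` bytes."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power of two, addr % alignment == addr & (alignment - 1).
    return (arr.ctypes.data & (alignment - 1)) == 0


a = np.zeros(10)  # float64: 8 bytes per element
# a[1:] starts 8 bytes later, so at most one of these can be 16-byte aligned.
print(is_aligned(a, 16), is_aligned(a[1:], 16))
```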

@juliantaylor
Contributor

Oh wait, both didn't... so don't expect good performance with that build.
It is a strict-aliasing issue and should be fixable, but I still recommend you use a newer compiler.

@andsor
Author

andsor commented Aug 26, 2015

Well, I am not that familiar with building Python, but the admins say they configured and installed it with

./configure --prefix=/usr/nld/python-3.4.3-cluster --with-ensurepip=install
make ; make install

However, I am not sure whether it was a clean install or whether they used it for the desktops before. Just guessing here, sorry.

@juliantaylor
Contributor

Can you put a NumPy build log into the gist?

@andsor
Author

andsor commented Aug 26, 2015

juliantaylor added a commit to juliantaylor/numpy that referenced this issue Aug 26, 2015
Didn't think the violation could cause issues but apparently some
compilers do mess it up. Also the old code is overly complicated, don't
know what I was thinking ...

Closes numpygh-6251
@juliantaylor
Contributor

Hm, OK, it looks like a normal build, though the assembly is still a bit strange.
Can you try this patch: https://github.com/numpy/numpy/pull/6252.diff
Apply it with patch -p1 <filename from the NumPy root.

@andsor
Author

andsor commented Aug 26, 2015

Magnificent. That fixes it.

@andsor
Author

andsor commented Aug 26, 2015

(Of course still better to have a newer compiler, though)

@juliantaylor
Contributor

Good, thanks for testing.

@andsor
Author

andsor commented Aug 26, 2015

For the record: np.test()

Running unit tests for numpy
NumPy version 1.9.2
NumPy is installed in /home/sorge/numpy-test/lib/python3.4/site-packages/numpy
Python version 3.4.3 (default, Mar 5 2015, 17:21:36) [GCC 4.3.4 [gcc-4_3-branch revision 152973]]
nose version 1.3.7
...
Ran 5592 tests in 35.359s

OK (KNOWNFAIL=6, SKIP=14)
<nose.result.TextTestResult run=5592 errors=0 failures=0>
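To compare this run with the earlier failing one ("FAILED (KNOWNFAIL=6, SKIP=14, errors=104, failures=1046)"), the nose summary lines can be parsed into counts. A small sketch; parse_summary is a hypothetical helper and only handles the "OK" / "FAILED (key=value, ...)" shape seen in this thread.

```python
import re

# nose summary lines, e.g. "FAILED (KNOWNFAIL=6, SKIP=14, errors=104, failures=1046)".
SUMMARY = re.compile(r"^(?P<status>OK|FAILED)(?: \((?P<counts>[^)]*)\))?$")


def parse_summary(line):
    """Return (status, {name: count}) for a nose result line."""
    m = SUMMARY.match(line.strip())
    if m is None:
        raise ValueError(f"not a nose summary line: {line!r}")
    counts = {}
    if m.group("counts"):
        for item in m.group("counts").split(","):
            key, value = item.strip().split("=")
            counts[key.lower()] = int(value)
    return m.group("status"), counts


before = parse_summary("FAILED (KNOWNFAIL=6, SKIP=14, errors=104, failures=1046)")
after = parse_summary("OK (KNOWNFAIL=6, SKIP=14)")
for key in ("errors", "failures"):
    # Keys missing from a summary (as in an OK run) are treated as zero.
    print(key, before[1].get(key, 0), "->", after[1].get(key, 0))
```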

@andsor
Author

andsor commented Aug 26, 2015

Thanks for all your help, guys. That's it for me tonight!

andsor pushed a commit to andsor/numpy that referenced this issue Aug 26, 2015
jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016

4 participants