Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: nan returned by np.linalg.det while it should be 0 on arm64 mac #22025

Open
matteoacrossi opened this issue Jul 22, 2022 · 34 comments
Open
Labels

Comments

@matteoacrossi
Copy link

matteoacrossi commented Jul 22, 2022

Describe the issue:

np.linalg.det on a 4x4 real matrix returns nan, while it should be 0.

If the same code example is run on an intel64 linux machine, the determinant is zero for both matrices.

This happens both with conda-forge and pypi numpy, on both python 3.9 and 3.10.

Reproduce the code example:

import numpy as np
np.linalg.det(np.array([[1.0000000000000000e+000, 0.0000000000000000e+000,
         0.0000000000000000e+000, 0.0000000000000000e+000],
        [0.0000000000000000e+000, 1.0000000000000000e+000,
         3.8307904270117927e-146, 1.4674955295685193e-291],
        [0.0000000000000000e+000, 3.8307904270117927e-146,
         1.4674955295685193e-291, 0.0000000000000000e+000],
        [0.0000000000000000e+000, 1.4674955295685193e-291,
         0.0000000000000000e+000, 0.0000000000000000e+000]]))
/opt/homebrew/Caskroom/miniforge/base/envs/numpy-test/lib/python3.10/site-packages/numpy/linalg/linalg.py:2154: RuntimeWarning: overflow encountered in det
  r = _umath_linalg.det(a, signature=signature)
/opt/homebrew/Caskroom/miniforge/base/envs/numpy-test/lib/python3.10/site-packages/numpy/linalg/linalg.py:2154: RuntimeWarning: invalid value encountered in det
  r = _umath_linalg.det(a, signature=signature)
nan

This one instead works fine:

np.linalg.det(np.array([[1.0000000000000000e+000, 0.0000000000000000e+000,
         0.0000000000000000e+000, 0.0000000000000000e+000],
        [0.0000000000000000e+000, 1.0000000000000000e+000,
         3.8307904315347111e-146, 1.4674955330337903e-291],
        [0.0000000000000000e+000, 3.8307904315347111e-146,
         1.4674955330337899e-291, 0.0000000000000000e+000],
        [0.0000000000000000e+000, 1.4674955330337903e-291,
         0.0000000000000000e+000, 0.0000000000000000e+000]]))

I'm also attaching a .zip file with the matrices saved as .npy files in case there are numerical precision problems:
numpy_nan.zip.

test_nan.npy gives the nan while for test_ok.npy the determinant is correct.

Error message:

No response

NumPy/Python version information:

1.23.1 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:27:15) [Clang 11.1.0 ]

Output of np.show_config():

>>> np.show_config()
openblas64__info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP
    not found = ASIMDDP,ASIMDFHM
@rossbar
Copy link
Contributor

rossbar commented Jul 22, 2022

Thanks for the report @matteoacrossi - since these are only 4x4 arrays, would you mind embedding the examples here in the issue? I don't like downloading random zip files from the internet :)

You can embed the examples in a fenced Python code block in markdown, like so:

```python
test_nan = np.array([<whatever>])
test_ok = np.array([<whatever>])
```

@matteoacrossi
Copy link
Author

@rossbar sure, I updated the OP

@rossbar
Copy link
Contributor

rossbar commented Jul 22, 2022

Ah in this case there may be precision problems with the text representation of the floating point numbers. I can't reproduce but maybe that's because my values aren't the same as yours. I guess we will need to find a better way to share exact values...

@matteoacrossi
Copy link
Author

That is why I put the .npy files in the zip file. But I can reproduce the problem by pasting the text in the OP on my system (M1 Pro Macbook Pro)

@RBrearton
Copy link

FYI, with numpy 1.22.4 and python 3.10.5, the following code (copy pastable) gives nan on on my machine (macbook pro, M1 max chip) when it should clearly give 0. It's only slightly modified (simplified a bit) from OP's example.

import numpy as np

np.linalg.det(np.array([[1, 0, 0, 0],
                        [0, 1, 3.8307904270117927e-146, 0],
                        [0, 3.8307904270117927e-146, 1.4674955295685193e-291, 0],
                        [0, 0, 0, 0]]))

The first time I run the code in an interactive python environment, I also get the warning:

/path/to/env/lib/python3.10/site-packages/numpy/linalg/linalg.py:2146: RuntimeWarning: overflow encountered in det
  r = _umath_linalg.det(a, signature=signature)
/path/to/env/lib/python3.10/site-packages/numpy/linalg/linalg.py:2146: RuntimeWarning: invalid value encountered in det
  r = _umath_linalg.det(a, signature=signature)

Weirdly, the RuntimeWarning only shows up once, but the output is always nan (maybe there's just something I don't know about RuntimeWarnings and this makes sense?)

I played around with the last digit of the element whose exponent is e-291 for fun. The values 1.4674955295685193e-291 and 1.4674955295685194e-291 give NaN, but 1.4674955295685192e-291 and 1.4674955295685195e-291 give 0. The values that give 0 also, predictably, don't give a RuntimeWarning.

Looks like a fun bug. I would've been happy to dig deeper, but I have 0 idea how to investigate further.

@seberg
Copy link
Member

seberg commented Aug 8, 2022

Weirdly, the RuntimeWarning only shows up once, but the output is always nan (maybe there's just something I don't know about RuntimeWarnings and this makes sense?)

This is the default behavior of warnings in Python, you can customize it and further customize it using np.errstate.

It sounds like an OpenBLAS issue, is this in rosetta or native mode? Could you also install threadpoolctl and paste the output of threadpoolctl.threadpool_info()

@matteoacrossi
Copy link
Author

matteoacrossi commented Aug 8, 2022

I'm not using Rosetta (at least, I'm confident that I'm not).

This is the output of threadpoolctl.threadpool_info() when using the conda-forge numpy:

>>> threadpoolctl.threadpool_info()
[{'user_api': 'blas', 
  'internal_api': 'openblas', 
  'prefix': 'libopenblas', 
  'filepath': '/opt/homebrew/Caskroom/miniforge/base/envs/numpy-test/lib/python3.10/site-packages/numpy/.dylibs/libopenblas64_.0.dylib', 
  'version': '0.3.20', 
  'threading_layer': 'pthreads', 
  'architecture': 'armv8', 
  'num_threads': 10}]

When using the pip-installed numpy I get

>>> threadpoolctl.threadpool_info()
[]

Edit (mattip): reformatting

@mattip
Copy link
Member

mattip commented Aug 8, 2022

From the output above, you are running the arm64 variant and using the homebrew-provided OpenBLAS 0.3.20. It is strange that the pip-installed numpy does not show anything, I think you need to import numpy first:

python -c "import numpy, threadpoolctl, pprint; \
pprint.pprint(threadpoolctl.threadpool_info())"

OpenBLAS released 0.3.21 yesterday, I wonder if that will change anything.

@matteoacrossi
Copy link
Author

Indeed, after importing numpy I got the output, sorry about that:

[{'architecture': 'armv8',
  'filepath': '/opt/homebrew/Caskroom/miniforge/base/envs/testnumpy/lib/python3.10/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 10,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.20'}]

How do I get the latest openblas?

@mattip
Copy link
Member

mattip commented Aug 8, 2022

How do I get the latest openblas?

You can build from source (not recommended) or wait a few weeks until the conda forge feedstock is updated, and then install that. Note that it seems you are only using the pip installed numpy, to install from conda forge you would have to do

pip uninstall numpy
conda install openblas==0.3.21
conda install --force-reinstall numpy

@h-vetinari
Copy link
Contributor

or wait a few weeks until the conda forge feedstock is updated

my goodness, I hope that's not the general impression of the time necessary for builds to arrive in conda-forge 😅

Normally it should be a routine thing (couple of days depending how quickly someone sees it). Occasionally there are other issues - seems there are some segfaults with 0.3.21 that will need solving, and in this case that means a currently indeterminate hold for analysis & fix, but hopefully it'll not be "weeks" even so.

@mattip
Copy link
Member

mattip commented Aug 8, 2022

Sorry, I didn't mean to disparage the great work the conda-forge people do to get packages out quickly.

@h-vetinari
Copy link
Contributor

No offence taken. Long delays can & do happen, but hopefully not often enough to be considered the norm. ;-)

@h-vetinari
Copy link
Contributor

Openblas 0.3.21 is available in conda-forge. Could you give it a whirl?

@matteoacrossi
Copy link
Author

I tried updating Openblas 0.3.21 with conda-forge (checked with threadpool_info()) and I get the same nan problem. Also updated numpy to 1.23.2 but no change.

@h-vetinari
Copy link
Contributor

Could you post the output of conda list please?

@matteoacrossi
Copy link
Author

conda list
# packages in environment at /opt/homebrew/Caskroom/miniforge/base/envs/numpy-test:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2022.6.15            h4653dfc_0    conda-forge
libblas                   3.9.0           16_osxarm64_openblas    conda-forge
libcblas                  3.9.0           16_osxarm64_openblas    conda-forge
libcxx                    14.0.6               h04bba0f_0    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libgfortran               5.0.0           11_3_0_hd922786_25    conda-forge
libgfortran5              11.3.0              hdaf2cc0_25    conda-forge
liblapack                 3.9.0           16_osxarm64_openblas    conda-forge
libopenblas               0.3.21          openmp_hc731615_2    conda-forge
libsqlite                 3.39.2               h2c9beb0_1    conda-forge
libzlib                   1.2.12               ha287fd2_2    conda-forge
llvm-openmp               14.0.4               hd125106_0    conda-forge
ncurses                   6.3                  h07bb92c_1    conda-forge
numpy                     1.23.2          py310h127c7cf_0    conda-forge
openssl                   3.0.5                h7aea29f_1    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
python                    3.10.0          h43b31ca_3_cpython    conda-forge
python_abi                3.10                    2_cp310    conda-forge
readline                  8.1.2                h46ed386_0    conda-forge
setuptools                65.1.0          py310hbe9552e_0    conda-forge
sqlite                    3.39.2               h40dfcc0_1    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               he1e0b03_0    conda-forge
tzdata                    2022c                h191b570_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h57fd34a_0    conda-forge

BTW @h-vetinari, it says here that it is a known issue. Is there a issue I can subscribe to?

@h-vetinari
Copy link
Contributor

Thanks. Looks like you're on the newest gfortran builds, which makes this yet another issue compared to the previous ones.

BTW @h-vetinari, it says here that it is a known issue. Is there a issue I can subscribe to?

Known only in the sense that I was collecting issue links I was aware of, not "known" in the sense that we're not trying to fix it. ;)
That issue is not a bad place to subscribe to.

@h-vetinari
Copy link
Contributor

Known only in the sense that I was collecting issue links I was aware of, not "known" in the sense that we're not trying to fix it. ;)

I had missed that my overview issue had been edited. I don't have an issue for tracking this in upstream openblas unfortunately.

@martin-frbg
Copy link

martin-frbg commented Aug 24, 2022

probably older problem, may not be limited to m1 (at least i could reproduce it with armv8 target as well, while truly generic C kernels are ok - have not tried on non-apple arm64 yet though). not sure if genuinely openblas bug or just an fma accuracy effect or compiler behaviour wrt denormals. istr there are more numpy issues about linalg.det division by zero warnings ?

@seberg
Copy link
Member

seberg commented Feb 21, 2023

istr there are more numpy issues about linalg.det division by zero warnings ?

There are at least in the complex version. I tracked down the reason (on the OpenBLAS version I am using, which also shows this issue) to the code (also below) https://github.com/xianyi/OpenBLAS/blob/974acb39ff86121a5a94be4853f58bd728b56b81/lapack/getf2/zgetf2_k.c#L115-L125

	if (fabs(temp1) >= fabs(temp2)){
	  ratio = temp2 / temp1;
	  den = dp1 /(temp1 * ( 1 + ratio * ratio));
	  temp3 =  den;
	  temp4 = -ratio * den;
	} else {
	  ratio = temp1 / temp2;
	  den = dp1 /(temp2 * ( 1 + ratio * ratio));
	  temp3 =  ratio * den;
	  temp4 = -den;
	}

The issue is that it is (on my machine) compiled to:

    0x1065527e0 <+532>: 0x1e60c120   fabs   d0, d9
    0x1065527e4 <+536>: 0x1e60c141   fabs   d1, d10
    0x1065527e8 <+540>: 0x1e612000   fcmp   d0, d1
->  0x1065527ec <+544>: 0x1e6a1920   fdiv   d0, d9, d10
    0x1065527f0 <+548>: 0x1f402001   fmadd  d1, d0, d0, d8
    0x1065527f4 <+552>: 0x1e610941   fmul   d1, d10, d1
    0x1065527f8 <+556>: 0x1e611901   fdiv   d1, d8, d1
    0x1065527fc <+560>: 0x1e610800   fmul   d0, d0, d1
    0x106552800 <+564>: 0x1e614021   fneg   d1, d1
    0x106552804 <+568>: 0x1e691942   fdiv   d2, d10, d9
    0x106552808 <+572>: 0x1f422043   fmadd  d3, d2, d2, d8
    0x10655280c <+576>: 0x1e630923   fmul   d3, d9, d3
    0x106552810 <+580>: 0x1e631903   fdiv   d3, d8, d3
    0x106552814 <+584>: 0x1e614042   fneg   d2, d2
    0x106552818 <+588>: 0x1e620862   fmul   d2, d3, d2
    0x10655281c <+592>: 0x1e63bc00   fcsel  d0, d0, d3, lt
    0x106552820 <+596>: 0x1e62bc21   fcsel  d1, d1, d2, lt

(the marked line is the one that causes the warning, it divides 1 by 0.)

If you look closer, the compiler executes both branches there and merges the result or so. That is wrong w.r.t. to FPEs. (I am surprise that doing 4 instead of 2 divisions seems like a an unsafe optimization.)

Wondering if @Developer-Ecosystem-Engineering wants to have a look at that. It isn't quite related to this issue probably, but does lead to spurious warnings in the tests.

@martin-frbg
Copy link

Uh, I do not think the compiler should be doing this, is this GCC or LLVM being creative ? One could probably slap a #pragma on this file to disable optimization at least with "known bad" compiler versions...

@seberg
Copy link
Member

seberg commented Feb 21, 2023

Should be clang on M1, maybe more. It does seem s a bit overly creative. If these were vector registers/instructions it would be less weird...

I am seeing it with both the wheel and conda-forge openblas on M1. (I assume both is clang on Mac, but would have to dig for the versions. Probably 13.0.1 for conda, as that is what Python has.)

@seberg
Copy link
Member

seberg commented Feb 21, 2023

Not identical code, but I think this shows similar thing on clang 14+15 for armv8, I think: https://godbolt.org/z/xE96cTvh7

IIRC, MacOS versioned clang differently, so that may match up.

@charris
Copy link
Member

charris commented Feb 21, 2023

We ran into a similar problem with gcc-11.1 where the compiler executed both branches of an if statement in parallel for speed, but did not reset the flags for the branch not taken. The problem was fixed in the next minor gcc release. A local temporary fix was to compile at a lower optimization level.

@martin-frbg
Copy link

@charris thanks - do you recall if this was also on M1 (or at least either of OSX/arm64) ?

@charris
Copy link
Member

charris commented Feb 21, 2023

do you recall if this was also on M1

No, it was Intel. I only noticed because fedora is cutting edge and I upgrade twice a year, which has its drawbacks :) See #18949 for the discussion.

@martin-frbg
Copy link

Thx, good to know (though probably no point in trying to proactively "fix" such things everywhere in OpenBLAS' C codes). Now to pragma or not to pragma this known case of zgetf2_k.c ...

@seberg
Copy link
Member

seberg commented Feb 21, 2023

Hehe, I was hoping @Developer-Ecosystem-Engineering has a thought. Also not sure what pragma might actually work (beyond -O0).

The original issue here is maybe still more interesting. I can repro it in the 1.24.2 wheels locally on M1, but I guess that doesn't mean it isn't fixed already.

@martin-frbg
Copy link

I'm not even sure if clang supports more fine-grained than "optimize off" (especially if we want it to work with older versions), must admit I did not look at who`s behind the account you tagged 🙂

@seberg
Copy link
Member

seberg commented Feb 21, 2023

Ah, the account is apple engineer(s) looking into M1 specific issues mainly.

@martin-frbg
Copy link

Returning to the original issue, on non-OSX arm64 I can reproduce it even with a fully generic, C-only build of OpenBLAS with compiler optimizations turned off. Have not tracked down the bit of code where the (alleged) overflow occurs though.

@matteoacrossi
Copy link
Author

matteoacrossi commented Feb 26, 2023

I instead switched to accelerate from OpenBLAS and don't reproduce the problem anymore.

@martin-frbg
Copy link

ok, problem seems to be in OpenBLAS' lapack/getf2 (getf2_k.c) where a division by a minuscule value overflows (code prevents division by exact zero only, should probably cut off around 1.e-300 or so)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants