BUG: nan returned by np.linalg.det while it should be 0 on arm64 mac #22025
Thanks for the report @matteoacrossi - since these are only 4x4 arrays, would you mind embedding the examples here in the issue? I don't like downloading random zip files from the internet :) You can embed the examples in a fenced Python code block in markdown, like so:
@rossbar sure, I updated the OP.
Ah, in this case there may be precision problems with the text representation of the floating point numbers. I can't reproduce, but maybe that's because my values aren't the same as yours. I guess we will need to find a better way to share exact values...
That is why I put the `.npy` files in the attached zip.
FYI, with numpy 1.22.4 and python 3.10.5, the following code (copy-pastable) gives nan on my machine (MacBook Pro, M1 Max chip) when it should clearly give 0. It's only slightly modified (simplified a bit) from the OP's example.

```python
import numpy as np

np.linalg.det(np.array([[1, 0, 0, 0],
                        [0, 1, 3.8307904270117927e-146, 0],
                        [0, 3.8307904270117927e-146, 1.4674955295685193e-291, 0],
                        [0, 0, 0, 0]]))
```

The first time I run the code in an interactive python environment, I also get the warning:

Weirdly, the RuntimeWarning only shows up once, but the output is always nan (maybe there's just something I don't know about RuntimeWarnings and this makes sense?).

I played around with the last digit of the element whose exponent is e-291 for fun. The values 1.4674955295685193e-291 and 1.4674955295685194e-291 give NaN, but 1.4674955295685192e-291 and 1.4674955295685195e-291 give 0. The values that give 0 also, predictably, don't give a RuntimeWarning. Looks like a fun bug. I would've been happy to dig deeper, but I have 0 idea how to investigate further.
This is the default behavior of warnings in Python; you can customize it with the `warnings` module filters. It sounds like an OpenBLAS issue. Is this in Rosetta or native mode? Could you also install `threadpoolctl` and post the output of `threadpoolctl.threadpool_info()`?
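[Editor's note] The once-only behavior mentioned above is Python's default warning filter, not anything NumPy-specific. A minimal, standard-library-only sketch of making every occurrence visible (the `emit` helper is just for illustration):

```python
import warnings

def emit(n):
    # Raise the same RuntimeWarning n times from the same call site.
    for _ in range(n):
        warnings.warn("divide by zero encountered in det", RuntimeWarning)

# Under the default filter, a repeated warning from the same location is
# shown only once; the "always" action reports every occurrence.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", RuntimeWarning)
    emit(3)
print(len(caught))  # 3
```

Using `warnings.simplefilter("error", RuntimeWarning)` instead turns the warning into an exception, which makes it easy to locate the offending call.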
I'm not using Rosetta (at least, I'm confident that I'm not). This is the output of `threadpoolctl.threadpool_info()`:

```python
>>> threadpoolctl.threadpool_info()
[{'user_api': 'blas',
  'internal_api': 'openblas',
  'prefix': 'libopenblas',
  'filepath': '/opt/homebrew/Caskroom/miniforge/base/envs/numpy-test/lib/python3.10/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'version': '0.3.20',
  'threading_layer': 'pthreads',
  'architecture': 'armv8',
  'num_threads': 10}]
```

When using the pip-installed numpy I get

```python
>>> threadpoolctl.threadpool_info()
[]
```

Edit (mattip): reformatting
From the output above, you are running the OpenBLAS bundled inside the numpy wheel (note the `numpy/.dylibs` filepath).
OpenBLAS released 0.3.21 yesterday; I wonder if that will change anything.
Indeed, after importing `numpy`, `threadpoolctl.threadpool_info()` gives:

```python
[{'architecture': 'armv8',
  'filepath': '/opt/homebrew/Caskroom/miniforge/base/envs/testnumpy/lib/python3.10/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 10,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.20'}]
```

How do I get the latest openblas?
You can build from source (not recommended) or wait a few weeks until the conda-forge feedstock is updated, and then install that. Note that it seems you are only using the OpenBLAS bundled with the numpy wheel at the moment.
My goodness, I hope that's not the general impression of the time necessary for builds to arrive in conda-forge 😅 Normally it should be a routine thing (a couple of days, depending on how quickly someone sees it). Occasionally there are other issues: it seems there are some segfaults with 0.3.21 that will need solving, and in this case that means a currently indeterminate hold for analysis & fix, but hopefully it'll not be "weeks" even so.
Sorry, I didn't mean to disparage the great work the conda-forge people do to get packages out quickly.
No offence taken. Long delays can & do happen, but hopefully not often enough to be considered the norm. ;-)
OpenBLAS 0.3.21 is available in conda-forge. Could you give it a whirl?
I tried updating to OpenBLAS 0.3.21 with conda-forge (checked with `conda list`), but I still get the same `nan` result.
Could you post the output of `threadpoolctl.threadpool_info()`?
BTW @h-vetinari, it says here that it is a known issue. Is there an issue I can subscribe to?
Thanks. Looks like you're on the newest gfortran builds, which makes this yet another issue compared to the previous ones.
Known only in the sense that I was collecting issue links I was aware of, not "known" in the sense that we're not trying to fix it. ;)
I had missed that my overview issue had been edited. I don't have an issue for tracking this in upstream OpenBLAS, unfortunately.
Probably an older problem, and it may not be limited to M1 (at least I could reproduce it with the armv8 target as well, while truly generic C kernels are OK; I have not tried on non-Apple arm64 yet though). Not sure if it's genuinely an OpenBLAS bug or just an FMA accuracy effect or compiler behaviour w.r.t. denormals. ISTR there are more numpy issues about linalg.det division-by-zero warnings?
There are at least in the complex version. I tracked down the reason (on the OpenBLAS version I am using, which also shows this issue) to this code (also below): https://github.com/xianyi/OpenBLAS/blob/974acb39ff86121a5a94be4853f58bd728b56b81/lapack/getf2/zgetf2_k.c#L115-L125

```c
if (fabs(temp1) >= fabs(temp2)) {
    ratio = temp2 / temp1;
    den   = dp1 / (temp1 * (1 + ratio * ratio));
    temp3 = den;
    temp4 = -ratio * den;
} else {
    ratio = temp1 / temp2;
    den   = dp1 / (temp2 * (1 + ratio * ratio));
    temp3 = ratio * den;
    temp4 = -den;
}
```

The issue is that it is (on my machine) compiled to:
(the marked line is the one that causes the warning; it divides 1 by 0.) If you look closer, the compiler executes both branches there and merges the results, or something like it. That is wrong w.r.t. FPEs. (I am surprised that doing 4 divisions instead of 2 seems like a profitable optimization, and it is an unsafe one.) Wondering if @Developer-Ecosystem-Engineering wants to have a look at that. It probably isn't quite related to this issue, but it does lead to spurious warnings in the tests.
Uh, I do not think the compiler should be doing this. Is this GCC or LLVM being creative? One could probably slap a #pragma on this file to disable optimization, at least with "known bad" compiler versions...
Should be clang on M1, maybe more. It does seem a bit overly creative. If these were vector registers/instructions it would be less weird... I am seeing it with both the wheel and conda-forge OpenBLAS on M1. (I assume both use clang on Mac, but I would have to dig for the versions. Probably 13.0.1 for conda, as that is what Python has.)
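[Editor's note] The evaluate-both-then-select pattern is the same hazard NumPy users hit with `np.where`, which computes both branch arrays eagerly before selecting. A minimal analogue of the spurious divide-by-zero warning:

```python
import warnings
import numpy as np

temp = np.array([4.0, 0.0])

# Both arguments of np.where are fully evaluated before the selection,
# so 1.0/temp divides by zero even though that element's result is
# discarded; the same effect as the if-converted machine code.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.where(temp != 0.0, 1.0 / temp, 0.0)

print(result)            # selected values are correct: 0.25 and 0.0
print(len(caught) >= 1)  # True: a spurious RuntimeWarning still fired
```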
Not identical code, but I think this shows a similar thing on clang 14+15 for armv8: https://godbolt.org/z/xE96cTvh7 IIRC, macOS versions clang differently, so that may match up.
We ran into a similar problem with gcc-11.1, where the compiler executed both branches of an if statement in parallel for speed but did not reset the flags for the branch not taken. The problem was fixed in the next minor gcc release. A local temporary fix was to compile at a lower optimization level.
@charris thanks, do you recall if this was also on M1 (or at least either of OSX/arm64)?
No, it was Intel. I only noticed because Fedora is cutting edge and I upgrade twice a year, which has its drawbacks :) See #18949 for the discussion.
Thx, good to know (though there is probably no point in trying to proactively "fix" such things everywhere in OpenBLAS' C code). Now, to pragma or not to pragma this known case of zgetf2_k.c...
Hehe, I was hoping @Developer-Ecosystem-Engineering has a thought. Also not sure what pragma might actually work (beyond turning optimization off entirely). The original issue here is maybe still more interesting. I can repro it in the 1.24.2 wheels locally on M1, but I guess that doesn't mean it isn't fixed already.
I'm not even sure if clang supports anything more fine-grained than "optimize off" (especially if we want it to work with older versions). Must admit I did not look at who's behind the account you tagged 🙂
Ah, the account is Apple engineer(s) looking into M1-specific issues mainly.
Returning to the original issue: on non-OSX arm64 I can reproduce it even with a fully generic, C-only build of OpenBLAS with compiler optimizations turned off. Have not tracked down the bit of code where the (alleged) overflow occurs, though.
I instead switched to
OK, the problem seems to be in OpenBLAS' lapack/getf2 (getf2_k.c), where a division by a minuscule value overflows (the code prevents division by exact zero only; it should probably cut off around 1.e-300 or so).
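[Editor's note] A standalone sketch of that failure mode (illustrative numbers, not the actual getf2 pivot values): a guard against exact zero lets a denormal pivot through, its reciprocal overflows to inf, and a later multiply by zero turns the inf into nan.

```python
pivot = 5e-324          # smallest positive denormal double: not zero,
                        # so a "pivot != 0" cutoff does not reject it
if pivot != 0.0:
    inv = 1.0 / pivot   # 1/5e-324 exceeds the double range: inf

print(inv)              # inf
print(inv * 0.0)        # nan, which is how the determinant ends up nan
```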
Describe the issue:

`np.linalg.det` on a 4x4 real matrix returns `nan`, while it should be 0. If the same code example is run on an intel64 linux machine, the determinant is zero for both matrices. This happens both with conda-forge and pypi numpy, on both python 3.9 and 3.10.

Reproduce the code example:

This one instead works fine:

I'm also attaching a .zip file with the matrices saved as `.npy` files in case there are numerical precision problems: numpy_nan.zip. `test_nan.npy` gives the `nan`, while for `test_ok.npy` the determinant is correct.

Error message:

No response

NumPy/Python version information:

1.23.1 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:27:15) [Clang 11.1.0 ]

Output of `np.show_config()`: