New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad numerical matrix inversion with Ubuntu 18.04, gcc 7.5, and a Xeon Skylake-W processor #31007
Comments
Attachment: inverse_prob.py.gz |
comment:3
I have confirmed that this issue is fixed in the upstream repo:
However, the problem is present even in the most recent official release of 0.3.12. |
comment:4
Is there a particular commit that we can apply as a patch? |
Attachment: avx2.patch.gz |
comment:5
By bisecting, I determined that it is the commit fix avx2 detection which fixes the problem. It is possible to apply this as a patch to our current OpenBLAS 0.3.9, see attachment, and this does resolve the issue on my system. The underlying issue seems to be this, where OpenBLAS is incorrectly identifying my processor as lacking AVX2, even though it has it. I don't exactly understand how this causes the observed behavior; if I add NO_AVX2=1 the |
Commit: |
New commits:
|
Author: Nathan Dunfield |
comment:8
This does not seem to be relevant anymore, as the fix is included in Openblas 0.3.13, which is contained in Sage since 9.3.beta6. |
comment:9
Replying to @mwageringel:
Yes, I agree this was fixed by #29913 (Upgrade OpenBLAS to 0.3.13). What is the correct way to close this ticket in this situation? |
comment:10
Replying to @NathanDunfield:
Thanks for the confirmation. We can set the milestone to duplicate and give a positive review. |
Reviewer: Nathan Dunfield, Markus Wageringel |
Changed author from Nathan Dunfield to none |
I built Sage 9.2 from source on Ubuntu 18.04 with gcc 7.5 in a Docker container running on an iMac Pro with a Xeon W-2150B processor (Skylake-W). Inversion of CDF matrices gives bad answers, which I traced down to scipy and finally to OpenBLAS 0.3.9. Attached is a simple script that takes an 18 x 18 matrix of complex numbers, "inverts" it, multiplies by the original, and compares to the identity. The expected behavior is:
but on the system in question I get a wildly wrong answer:
The OpenBLAS library was built with Haswell as the target, as its file name is:
Copying a version OpenBLAS built on Sandy Bridge hardware fixes the problem. On the same physical hardware with the Xeon W-2150B processor, the problem does not manifest itself in:
a) macOS 10.15, Python 3.8, scipy 1.5.2
b) Debian Sid, with official SageMath 9.2 package.
c) Ubuntu 18.04 and 20.04, with the python3-scipy package.
d) Latest Anaconda Docker image.
The failing Docker image in question has been posted as
computop/sage:9.2a1
.Possibly related to #27961.
CC: @jhpalmieri @vbraun @mkoeppe @embray
Component: numerical
Branch/Commit: u/mkoeppe/bad_numerical_matrix_inversion_with_ubuntu_18_04__gcc_7_5__and_a_xeon_skylake_w_processor @
2cc103f
Reviewer: Nathan Dunfield, Markus Wageringel
Issue created by migration from https://trac.sagemath.org/ticket/31007
The text was updated successfully, but these errors were encountered: