ENH: use OpenBLAS64 bit interfaces #13956
>>> np.__config__.show()
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]

For a symmetric matrix, the call fails with:

File "blahblah/python3.6/site-packages/numpy/linalg/linalg.py", line 1456, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
ValueError: On entry to DORMTR parameter number 12 had an illegal value

It is worth noting that this issue is not due to a memory limitation: diagonalization of a matrix of this dimension requires only 12 GB of memory (I watched htop for the whole process), while the workstation has 256 GB. In other words, though, you need more than 12 GB of memory to reproduce and catch this error. For matrices with dimension no more than 32766, everything seems fine, so I would be surprised if this issue had nothing to do with 16-bit integers.

Update: 3) Tested on another workstation with Ubuntu 16.04 and Python 3.5.2, with pip-installed numpy 1.16.2. The problem is exactly the same as described above.

Summary: though the error shows slightly different patterns, it persists across a wide range of numpy distributions and LAPACK backends. Relevant posts:
Short reproducer:
@mattip do you have a simple setup where you can test this with BLIS or maybe MKL, to see whether this is a numpy or an OpenBLAS problem? Otherwise, I guess I will try to remember to test it on our new machine.
I have identified the issue; it turns out to be an old problem related to the 32-bit int interface of LAPACK. See #5906 (comment). If I understand right, this issue is still present in numpy's lapack-lite interface implementation (the 4-byte int input parameter overflows). So basically, various LAPACK routines, including the eigensolvers and SVD, break if the matrix dimension is large enough. And such a crashing matrix size is not really large by the standards of modern hardware (roughly O(10^4) x O(10^4), with a working memory size of O(10) GB). IMHO, the need to do eigen- or SVD decompositions on matrices larger than 32767*32767 is becoming common as hardware develops, so it would be better if the 32-bit LAPACK interface in numpy were improved soon. Otherwise, I am not aware of any reasonable and simple way to do such a numerical task in Python right now (directly calling LAPACK routines from C or Fortran is always possible, though). Temporary workaround: For
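The 32767 threshold is consistent with the documented workspace minimums for LAPACK's dsyevd, the driver behind the DORMTR message above. A quick sketch of the arithmetic, assuming those LWORK formulas apply:

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2**31 - 1 = 2147483647

def dsyevd_lwork(n, eigvecs=True):
    """Documented minimum LWORK for LAPACK dsyevd (real symmetric eigensolver).

    JOBZ='V' (eigenvectors): LWORK >= 1 + 6*N + 2*N**2
    JOBZ='N' (values only):  LWORK >= 2*N + 1
    """
    return 1 + 6 * n + 2 * n * n if eigvecs else 2 * n + 1

# n = 32767 is exactly where the eigenvector workspace stops fitting in a
# 32-bit int, matching the reported failure threshold; the values-only path
# (which eigvalsh ends up on) stays tiny, which would explain why it works.
for n in (32766, 32767):
    print(n, dsyevd_lwork(n), dsyevd_lwork(n) > INT32_MAX)
```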
Yep. I was discussing this yesterday. I think we should open an issue so we could optionally use libraries compiled with 64-bit integers. They aren't common yet, so it should be a flag, and there needs to be some way of checking what the libraries use, but the time is coming, if not already past, when 32 bits will no longer serve. @eric-wieser Is it possible to compile the current fallback library with 64-bit integers? IIRC, it is a typedef. I know that most modern Fortran compilers have a flag that chooses between 32- and 64-bit indexes, so theoretically it can be done. The problem might be compatibility issues from having NumPy out there with different precisions.
I believe Julia already uses 64-bit BLAS via OpenBLAS; see e.g. JuliaLang/julia#4923. Previous discussion on the numpy lapack_lite side: #5906
Tagging with 1.18.0, not because we must fix it by then, but it would be awesome if we could make some progress here and not forget about it.
@isuruf & @martin-frbg may also be able to think of some roadblocks for this, if there are any.
Can't think of any roadblocks; it might actually make sense to use this example in the OpenBLAS FAQ and in descriptions of the INTERFACE64 build parameter. (I guess the current wording in Makefile.rule, dating back to GotoBLAS, could be both confusing and discouraging.)
Note that for scipy, you'll have to keep providing the 32-bit int interface even if you switch to the 64-bit one, to avoid breaking downstream packages, since downstream packages use
#15012 implements Julia's approach to this problem: (i) build openblas with
We have been testing 64-bit OpenBLAS for a while now. We should start releasing 64-bit wheels and see what breaks.
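As a side note, there is no official API for asking a numpy build which LAPACK integer width it uses. A rough heuristic sketch: the `lapack_ilp64_opt_info` section name is an assumption based on distutils-era build configs, and newer (Meson) builds, which lack `get_info`, simply report False here rather than guessing.

```python
import numpy as np

def lapack_is_ilp64():
    # Heuristic, not an official API: distutils-era numpy builds expose their
    # build-time BLAS/LAPACK sections via np.__config__.get_info(), and an
    # ILP64 build records a 'lapack_ilp64_opt_info' section (assumed name).
    get_info = getattr(np.__config__, "get_info", None)
    if get_info is None:
        # Newer Meson-built numpy has no get_info; report False conservatively.
        return False
    return bool(get_info("lapack_ilp64_opt_info"))

print(lapack_is_ilp64())
```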
Pushing this off again. Perhaps we should go to 64 bits for the 1.21 release?
Good idea to package 64-bit OpenBLAS in the future. What do you mean by the "dtype bundled functions", the NumPy fallbacks?
I was thinking of
I just checked, our
Please remind us to ship 64-bit OpenBLAS in the wheels after the 1.21 release, so there will be time to test it before 1.22.
1.21.0 has shipped; time to try this out?
Sure. This should be a change to use
This change was made, and I think we're happy that it's working. Despite some uncertainty about whether it broke something in SciPy, IIRC the conclusion was that that was unrelated. So I think we're shipping
I am planning on it. The change was made early in the release cycle, so the major downstream projects should have tested against it.
I'm going to close this for now. If we need to change things later, we can reopen.
The issue of np.linalg.eigh returning wrong results or crashing is still real (using numpy 1.26.2):

>>> import numpy as np
>>> n=32767
>>> b=np.random.rand(n)
>>> m_32767=np.diag(b)
>>> m_32767.shape
(32767, 32767)
>>> V_32767=np.linalg.eigh(m_32767)
** On entry to DSTEDC parameter number 8 had an illegal value
** On entry to DORMTR parameter number 12 had an illegal value
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1487, in eigh
w, vt = gufunc(a, signature=signature, extobj=extobj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 118, in _raise_linalgerror_eigenvalues_nonconvergence
raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.LinAlgError: Eigenvalues did not converge

where the exception "Eigenvalues did not converge" is very likely wrong and misleading. With n == 32766, the above example works fine. The underlying problem is that when computing the required workspace size, a 32-bit integer overflows. Switching to LAPACK implementations that use 64-bit integer inputs seemingly resolves the overflow issue, but in fact it just becomes harder to reproduce, because the critical matrix dimension is much larger. I have implemented a solution to the same problem in JAX (google/jax#19288) that leads to an overflow exception rather than wrong results or crashes. I think something similar is appropriate for NumPy as well.
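A guard of the kind described could be sketched in pure Python. Here `checked_eigh` and `eigh_workspace_overflows` are hypothetical names, and the threshold assumes dsyevd's documented requirement LWORK >= 1 + 6n + 2n**2 evaluated in a 32-bit LAPACK integer:

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max

def eigh_workspace_overflows(n, int_max=INT32_MAX):
    # dsyevd with JOBZ='V' needs LWORK >= 1 + 6*n + 2*n**2; if that does
    # not fit in the LAPACK integer type, the call is unsafe.
    return 1 + 6 * n + 2 * n * n > int_max

def checked_eigh(a):
    # Hypothetical wrapper: fail loudly instead of returning wrong results
    # or crashing inside LAPACK (with a 32-bit LAPACK integer this check
    # first triggers at n = 32767, matching the reproducer above).
    n = a.shape[-1]
    if eigh_workspace_overflows(n):
        raise OverflowError(
            f"eigh workspace for n={n} exceeds 32-bit LAPACK integer range")
    return np.linalg.eigh(a)

w, v = checked_eigh(np.eye(3))
print(w)  # eigenvalues of the identity: [1. 1. 1.]
```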
Can you please open a new issue instead, @pearu? 64-bit OpenBLAS in wheels was implemented a long time ago and is a large feature that is not about this particular bug.
Reproducing code example:

See this gist for reproducing code and system version information.

Basically, I used intelpython3, whose Python and numpy ship with Intel Parallel Studio XE 2019.3. The numpy version is 1.16.1.

For a symmetric matrix with dimension larger than or equal to 32767, np.linalg.eigh() immediately returns wrong results of all zeros (no error message). Other eigen functions like eigvalsh work as expected. 32767 reminds me of possible issues with the int16 data type. This may also be an issue with Intel MKL or the Intel builds of numpy or Python (very unlikely, since the issue exists for various distributions of numpy, including the default one linked to OpenBLAS). Anyway, I will open the issue here until I have further analysis of this problem and find a better place to report it.