
ENH: use OpenBLAS64 bit interfaces #13956

Closed
refraction-ray opened this issue Jul 10, 2019 · 26 comments

@refraction-ray

refraction-ray commented Jul 10, 2019

Reproducing code example:

See this gist for reproducing codes and system version information.

Basically, I used intelpython3, whose Python and NumPy ship with Intel Parallel Studio XE 2019.3. The NumPy version is 1.16.1.

For a symmetric matrix with dimension greater than or equal to 32767, np.linalg.eigh() immediately returns wrong results of all zeros (no error message). Other eigen functions such as eigvalsh work as expected. The value 32767 (2^15 - 1) reminds me of possible issues with the int16 data type.

This may also be an issue in Intel MKL or the Intel distribution of NumPy or Python (very unlikely, since the issue exists across various distributions of NumPy, including the default one linked against OpenBLAS). Anyway, I will keep the issue here until I have analyzed the problem further and found a better place to report it.

@refraction-ray
Author

refraction-ray commented Jul 10, 2019

  1. Also tested with pip-installed numpy (1.16.3) and Python 3.6.5 on an Ubuntu 18.04 server. A similar issue occurred.
>>> np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

For a symmetric matrix m with dimension 32767 or larger, np.linalg.eigh(m) fails with the error

File "blahblah/python3.6/site-packages/numpy/linalg/linalg.py", line 1456, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
ValueError: On entry to DORMTR parameter number 12 had an illegal value

When one tries the line np.linalg.eigh(m) a second time, Python exits with a Segmentation fault (core dumped) error.
np.linalg.eigvalsh(m), however, behaves correctly.

It is worth noting that this issue is not due to a memory limitation: diagonalization of a matrix of this dimension requires only about 12GB of memory (I watched htop for the whole process), while I have 256GB of memory on the workstation. In other words, you need more than 12GB of memory to reproduce and catch this error.

For a matrix with dimension 32766 or less, everything seems fine. Therefore, I would be surprised if this issue had nothing to do with int16.

Update: 3) Tested on another workstation with Ubuntu 16.04 and Python 3.5.2 with pip-installed NumPy 1.16.2. The problem is exactly as described above.
4) Tested on some supercomputers with RedHat 6.5, Python 3.6.3, and OpenBLAS-linked NumPy 1.14.5. For a symmetric matrix of size 32767 or larger, Python crashes with a segmentation fault on the first call to np.linalg.eigh(m). Again, eigvalsh works well, and matrices smaller than 32767 work well.
5) Tested on the same supercomputer, but with Anaconda 4.2.0, Python 3.5.2, and NumPy 1.13.1; the error is the same as in 4).

Summary: though the error shows slightly different patterns, it persists across a wide range of distributions and LAPACK backends.

Relevant posts:

@mattip
Member

mattip commented Jul 10, 2019

Short reproducer

import numpy as np
n = 32767
b = np.random.rand(n)
m_32767 = np.diag(b)
m_32767.shape
V_32767 = np.linalg.eigh(m_32767)

@seberg
Member

seberg commented Jul 11, 2019

@mattip do you have a simple setup where you can test it with BLIS or maybe MKL to see if this is numpy or openblas? Otherwise, I guess I will try to remember to test it on our new machine.

@refraction-ray
Author

refraction-ray commented Jul 15, 2019

I have identified the issue: it turns out to be an old problem related to the 32-bit int interface of LAPACK. See #5906 (comment). If I understand correctly, this issue is still present in numpy's lapack_lite interface implementation. (The 4-byte (32-bit) int input parameter lwork strongly limits the possible size of the workspace and thus the matrix dimension. To be more specific, the LAPACK routine dsyevd requires the lwork argument to be an integer of at least 1 + 6n + 2n^2. When the matrix dimension n is 2^15 - 1 = 32767, the required lwork exceeds 2^31 - 1, overflowing the 4-byte int type.)
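The overflow arithmetic above can be checked directly; a minimal sketch, computing the dsyevd workspace formula in Python's arbitrary-precision integers so only the comparison against the int32 limit is demonstrated:

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2**31 - 1 = 2147483647

def dsyevd_min_lwork(n):
    # Minimum workspace for LAPACK dsyevd with jobz='V', per the LAPACK
    # docs: 1 + 6*n + 2*n**2. Computed here with Python big ints, so the
    # demonstration itself cannot overflow.
    return 1 + 6 * n + 2 * n * n

print(dsyevd_min_lwork(32766) <= INT32_MAX)  # True: still fits in int32
print(dsyevd_min_lwork(32767) <= INT32_MAX)  # False: overflows int32
```

This matches the reported behavior: n = 32766 works, n = 32767 fails.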

So basically, various LAPACK routines, including the eigensolvers and SVD, break when the matrix dimension is large enough. And the crashing matrix size is not really that large by modern hardware standards (a matrix of roughly O(10^4) x O(10^4), with a working memory footprint of O(10) GB).

IMHO, the need to do eigen- or SVD decompositions on matrices larger than 32767*32767 is becoming common as hardware develops. So it would be better if the 32-bit LAPACK interface in numpy were improved soon. Otherwise, I am not aware of any reasonable and simple way to do such numerical tasks in Python today (directly calling LAPACK routines from C or Fortran is always possible, though).

Temporary workaround: for eigh, use scipy.linalg.eigh(). I have inspected the source code on both sides, and it seems that eigh in numpy uses the dsyevd LAPACK routine while eigh in scipy uses the dsyevr routine. The workspace requirement for dsyevr is small (O(N)) and thus free from 32-bit int overflow even for matrices with large dimensions.
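The difference between the two routines' workspace growth can be sketched numerically. The exact minimum-lwork formulas below are taken from the LAPACK documentation and should be treated as assumptions for this sketch:

```python
# Minimum workspace requirements per the LAPACK docs (assumed formulas):
#   dsyevd (used by numpy.linalg.eigh):  lwork >= 1 + 6*n + 2*n**2  (quadratic)
#   dsyevr (used by scipy.linalg.eigh):  lwork >= 26*n              (linear)

INT32_MAX = 2**31 - 1
n = 32767

lwork_syevd = 1 + 6 * n + 2 * n * n  # quadratic in n: overflows int32 here
lwork_syevr = 26 * n                 # linear in n: stays small

print(lwork_syevd > INT32_MAX)  # True: exceeds a 32-bit LAPACK integer
print(lwork_syevr)              # 851942, far below the 32-bit limit
```

This is consistent with dsyevr-based scipy.linalg.eigh working at sizes where dsyevd-based numpy.linalg.eigh fails.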

@charris
Member

charris commented Jul 15, 2019

Yep. I was discussing this yesterday. I think we should open an issue so we could optionally use libraries compiled with 64-bit integers. They aren't common yet, so it should be a flag, and there needs to be some way of checking what the libraries use, but the time is coming, if not already past, when 32 bits will no longer serve.

@eric-wieser Is it possible to compile the current fallback library with 64 bit integers? IIRC, it is a typedef.
@tylerjereddy Do you know if it is possible to compile OpenBLAS with 64 bit integers?
@stefanv Maybe we should put this on the road map, possibly along with quad precision.
@rgommers @pv Any thoughts about scipy in this regard?

I know that most modern Fortran compilers have a flag that chooses between 32 and 64 bit indexes, so theoretically, it can be done. The problem might be compatibility issues from having NumPy out there with different precisions.

@matthew-brett
Contributor

I believe Julia already uses 64 bit BLAS via OpenBLAS - e.g. JuliaLang/julia#4923

Previous discussion on Numpy lapack_lite: #5906

@seberg
Member

seberg commented Jul 16, 2019

Tagging with 1.18.0, not that we must fix it by then, but it would be awesome if we can make some progress here and do not forget about it.

@tylerjereddy
Contributor

@isuruf & @martin-frbg may also be able to think of some roadblocks for this, if there are any.

@martin-frbg

Can't think of any roadblocks; it might actually make sense to use this example in the OpenBLAS FAQ and in descriptions of the INTERFACE64 build parameter. (I guess the current wording in Makefile.rule, dating back to GotoBLAS, could be both confusing and discouraging.)

@isuruf
Contributor

isuruf commented Jul 18, 2019

Note that for scipy, you'll have to keep providing the 32-bit int interface even if you switch to a 64-bit interface, to avoid breaking downstream packages that use cimport scipy.linalg.cython_blas. This is doable with prefixed symbols.

@pv
Member

pv commented Nov 29, 2019

#15012 implements Julia's approach to this problem. (i) Build OpenBLAS with make INTERFACE64=1 SYMBOLSUFFIX=64_ to get libopenblas64_.so. (ii) export NPY_USE_BLAS64_=1 and build numpy to get 64-bit BLAS/LAPACK used everywhere. (iii) The symbol suffix should prevent the ABI clashes that usually plague 64-bit BLAS/LAPACK, so things don't segfault if you also use a Python library linked to 32-bit BLAS, or embed Python in an application linked to 32-bit BLAS.

@seberg seberg modified the milestones: 1.19.0 release, 1.20.0 release May 6, 2020
@mattip
Member

mattip commented May 6, 2020

We have been testing 64-bit OpenBLAS for a while now. We should start releasing 64-bit wheels and see what breaks.

@charris
Member

charris commented Nov 23, 2020

Pushing this off again. Perhaps we should go to 64 bits for the 1.21 release?

@charris
Member

charris commented May 4, 2021

@mattip @seberg We should discuss using the 64 bit libraries for the 1.21 wheels. If not for 1.21, we should plan on releasing 64 bit wheels for 1.22.0.dev. I'd like to standardize on 64 bits going forward, although we should check that the dtype bundled functions also support it.

@charris charris added the triage review Issue/PR to be discussed at the next triage meeting label May 4, 2021
@seberg
Member

seberg commented May 4, 2021

Good idea to package 64-bit OpenBLAS in the future. What do you mean by the "dtype bundled functions", the NumPy fallbacks?

@charris
Member

charris commented May 4, 2021

"dtype bundled functions"

I was thinking of dtype->f->dotfunc, which may already be fixed.

@seberg
Member

seberg commented May 4, 2021

I just checked, our ->dotfunc seems to be more or less fine. It works around blas limitations and our path uses intp.

@mattip mattip changed the title Wrong results on linalg.eigh for matrix with dimension larger than 32767 ENH: use OpenBLAS64 bit interfaces May 5, 2021
@mattip mattip modified the milestones: 1.21.0 release, 1.22.0 release May 5, 2021
@mattip
Member

mattip commented May 5, 2021

Please remind us to ship 64-bit OpenBLAS in the wheels after the 1.21 release, so there will be time to test it before 1.22.

@mattip mattip removed the triage review Issue/PR to be discussed at the next triage meeting label May 5, 2021
@h-vetinari
Contributor

Please remind us to ship 64-bit OpenBLAS in the wheels after the 1.21 release, so there will be time to test it before 1.22.

1.21.0 has shipped - time to try this out?

@mattip
Member

mattip commented Jun 28, 2021

Sure. This should be a change to use NPY_USE_BLAS_ILP64 in the MacPython/numpy-wheels repo (similar to the way it is done in this repo's CI), plus a release note here.

@rgommers
Member

This change was made, and I think we're happy that it's working. Despite some uncertainty about whether it broke something in SciPy, IIRC the conclusion was that that was unrelated. So I think we're shipping 1.22.0rc1 with a 64-bit OpenBLAS, right @charris?

@charris
Member

charris commented Nov 12, 2021

So I think we're shipping 1.22.0rc1 with a 64-bit OpenBLAS

I am planning on it. The change was made early in the release cycle, so the major downstream projects should have tested against it.

@charris charris closed this as completed Nov 12, 2021
@charris
Member

charris commented Nov 12, 2021

I'm going to close this for now. If we need to change things later we can reopen.

@pearu
Contributor

pearu commented Jan 10, 2024

The issue of np.linalg.eigh returning wrong results or crashing is still real (using numpy 1.26.2):

>>> import numpy as np
>>> n=32767
>>> b=np.random.rand(n)
>>> m_32767=np.diag(b)
>>> m_32767.shape
(32767, 32767)
>>> V_32767=np.linalg.eigh(m_32767)
 ** On entry to DSTEDC parameter number  8 had an illegal value
 ** On entry to DORMTR parameter number 12 had an illegal value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1487, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 118, in _raise_linalgerror_eigenvalues_nonconvergence
    raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.LinAlgError: Eigenvalues did not converge

where the exception "Eigenvalues did not converge" is very likely wrong and misleading. With n == 32766, the above example works fine.

The underlying problem is that the expression 1 + 6 * n + 2 * n * n used to compute lwork for the LAPACK syevd function overflows when n == 32767. This problem is not unique to syevd; it exists for all LAPACK functions whose work array sizes are quadratic with respect to the input size.

Switching to a LAPACK implementation that uses 64-bit integer inputs seemingly resolves the overflow issue, but in fact it just becomes harder to reproduce: the critical size becomes n == 2147483647, where the issue re-emerges as the lwork expression above overflows int64.

I have implemented a solution to the same problem in JAX (google/jax#19288) that will lead to an overflow exception rather than wrong results or crashes. I think something similar is appropriate for NumPy as well.
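A guard in the spirit of that fix can be sketched as follows. The function name and its placement are hypothetical (this is not the actual JAX or NumPy code); the idea is to compute the workspace size with Python's arbitrary-precision integers and raise instead of letting the value silently wrap in the LAPACK integer type:

```python
import numpy as np

def checked_syevd_lwork(n, lapack_int=np.int32):
    # Hypothetical overflow guard (inspired by google/jax#19288):
    # compute the dsyevd workspace size in exact integer arithmetic and
    # raise rather than silently overflowing the LAPACK integer type.
    lwork = 1 + 6 * n + 2 * n * n
    if lwork > np.iinfo(lapack_int).max:
        raise OverflowError(
            f"dsyevd workspace {lwork} for n={n} does not fit in "
            f"{np.dtype(lapack_int).name}"
        )
    return lwork
```

With the 32-bit interface this raises at n = 32767; with a 64-bit interface (lapack_int=np.int64) the same guard would only trigger near n = 2**31, matching the thresholds described above.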

@pearu pearu reopened this Jan 10, 2024
@rgommers
Member

Can you please open a new issue instead @pearu? 64-bit OpenBLAS in wheels was implemented a long time ago and is a large feature that is not about this particular bug.

@pearu
Contributor

pearu commented Jan 10, 2024

@rgommers , done in #25564.

@pearu pearu closed this as completed Jan 10, 2024