Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: linalg: 64-bit BLAS/LAPACK #11193

Draft
wants to merge 24 commits into
base: main
Choose a base branch
from
Draft

ENH: linalg: 64-bit BLAS/LAPACK #11193

wants to merge 24 commits into from

Conversation

pv
Copy link
Member

@pv pv commented Dec 9, 2019

This is a draft for discussion (do not merge).

I'd suggest implementing support for 64-bit BLAS (=ILP64) in the following way:

The public BLAS/LAPACK wrappers in scipy.linalg will continue providing 32-bit BLAS/LAPACK, and new extensions _fblas_64, _flapack_64, cython_lapack_64, cython_blas_64 are added to provide ILP64 BLAS routine API. For those cases where BLAS/LAPACK usage is a non-public detail (e.g. ARPACK), we can build only the 64-bit version.

The above is necessary, as a single Python extension DLL cannot link to both types of BLAS because the symbol names generally clash. While dynamically linking to different BLAS in different extension works in the usual case, it is not enough to avoid symbol clashes when e.g. the Python interpreter is embedded inside an application also linked to BLAS/LAPACK (on Linux, see here section 1.5.4 --- TODO: less problematic on osx & windows iiuc, but needs checking). To avoid this issue, we also add support for BLAS/LAPACK symbol name mangling which resolves it.

TODO: also static linking to the 32-bit BLAS probably also resolves the issue (easier for Intel MKL users)

In scipy.linalg, we'd add ilp64= flag to the blas/lapack function getters, as in nrm2 = get_blas_func('nrm2', ilp64='maybe'/True/False), and a nrm2.int_dtype attribute so that it's easier to deal with the work array types.

We might want to eventually drop the 32-bit BLAS altogether. One backward-compatible option could be to autogenerate int32 interface wrappers --- but I'm not sure how easy it is to get the specs automatically.

I added some BLAS64 support in numpy master some days ago. The changes in this "PR" assume the numpy.distutils setup in the follow-up PR numpy/numpy#15069. Currently, you can do

import numpy as np
from scipy.linalg import norm
x = np.zeros([2**31], dtype=np.float64)
x[-1] = 1
res = norm(x)
print(res)
assert res == 1.0

so I think the plan is feasible.

  • Finish ILP64 BLAS numpy.distutils support (ENH: add support for ILP64 OpenBLAS (without symbol suffix) numpy/numpy#15069)
  • Implement Fortran BLAS/LAPACK symbol name mangling
  • Implement scipy.linalg f2py .pyf file int32->int64 search-and-replace
  • Support ilp64 variants in get_blas/lapack_funcs
  • Decide what to do in cases where routines return integer arrays --- do we just bump to np.int64?
  • Use ilp64 variants when available everywhere in scipy.linalg
  • Add tests...

@pv pv force-pushed the blas-ilp64 branch 3 times, most recently from dfe6353 to 99ae717 Compare December 13, 2019 19:37
@tylerjereddy tylerjereddy added scipy.linalg needs-decision Items that need further discussion before they are merged or closed labels Dec 13, 2019
@tylerjereddy
Copy link
Contributor

Presumably the tests can be marked up in a similar way to your NumPy PR to avoid crushing machines with insufficient memory.

So far looks good upstream from the PRs I've reviewed related to this & my local tests with NumPy have been encouraging when using OpenBLAS + ILP64.

I don't have strong views on the linalg design decisions but maybe i.e., @ilayn might chime in?

@ilayn
Copy link
Member

ilayn commented Dec 16, 2019

This is really nice and, unfortunately, rather the low level part that I am not too familiar with but one thing I imagine will stay is the 32-bit version. Because many problems would start not fitting into memory with ILP64 while they were barely fitting in with the 32-bit due to the extra allocation that would take place for the same info. Hence, I guess if we enable this we are going to maintain both for some time.

I can do the legwork on the linalg side if needed but one question:

Are we going to need to modify the INTEGER keywords in the wrappers to INTEGER*4 or INTEGER*8 (or whatever the syntax would be) depending on the linked library?

It would have been amazing if we could choose the array type integer depending on the problem size but I think that is too ambitious.

@pv
Copy link
Member Author

pv commented Dec 16, 2019 via email

@pv pv closed this Dec 23, 2019
@pv pv reopened this Dec 23, 2019
@pv pv force-pushed the blas-ilp64 branch 2 times, most recently from 1aeec0c to 65203d6 Compare December 23, 2019 11:59
@pv
Copy link
Member Author

pv commented Dec 23, 2019

Ok, this is all the low-hanging fruit. What's left is:

scipy/sparse/linalg/dsolve/_superlu.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/_fblas.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/_flapack.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/_flinalg.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/_interpolative.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/cython_blas.cpython-37m-x86_64-linux-gnu.so
scipy/linalg/cython_lapack.cpython-37m-x86_64-linux-gnu.so

SuperLU supports only 32-bit int, and interpolative has assumptions about 4-byte integers in hard-coded work array sizes in many places, so their integer size cannot be changed. These two might be dealt with relatively lightweight 64-to-32 lapack wrappers.

I looked a bit into it, and general 64-to-32-bit LAPACK adapters are not really possible to write, because sizes of work arrays in several cases are not known (due to liwork queries). Such wrappers however are possible to do for all of BLAS, and for several LAPACK routines, but the rest of LAPACK would need to be compiled from sources. However, this would still avoid shipping the multi-arch kernels from openblas twice, so it might be useful to do in view of the binary distribution sizes.

This PR is probably too big to review, so I'll eventually split to a bit more manageable chunks.

@pv pv force-pushed the blas-ilp64 branch 5 times, most recently from 8a06cdb to e399fdd Compare January 1, 2020 21:49
@rgommers
Copy link
Member

This sounds like a good plan to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C/C++ Items related to the internal C/C++ code base enhancement A new feature or improvement Fortran Items related to the internal Fortran code base needs-decision Items that need further discussion before they are merged or closed scipy.linalg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants