ENH: linalg: 64-bit BLAS/LAPACK #11193
Conversation
(force-pushed dfe6353 to 99ae717)
Presumably the tests can be marked up in a similar way to your NumPy PR, to avoid crushing machines with insufficient memory. So far this looks good, based on the upstream PRs I've reviewed related to this, and my local tests with NumPy using OpenBLAS + ILP64 have been encouraging. I don't have strong views on the linalg design decisions, but maybe @ilayn might chime in?
This is really nice and, unfortunately, rather the low-level part that I am not too familiar with, but one thing I imagine will stay is the 32-bit version: many problems would stop fitting into memory with ILP64 while they were barely fitting with 32-bit, due to the extra allocation that would take place for the same info. Hence, if we enable this, I guess we are going to maintain both for some time. I can do the legwork on the linalg side if needed, but one question: are we going to need to modify the `INTEGER` keywords in the wrappers to `INTEGER*4` or `INTEGER*8` (or whatever the syntax would be) depending on the linked library?
It would have been amazing if we could choose the array integer type depending on the problem size, but I think that is too ambitious.
The integer size is here changed by compiler flags (and f2py flags), so only C code requires manual changes.
It's indeed already possible to switch between ILP64 and non-ILP64 as you like, for scipy.linalg. For Fortran code, you'd need to compile two versions (possible, but not necessarily sensible).
I wouldn't be so sure the memory usage matters in practice, as the increase is probably usually only 25% or so, and often much less. For LAPACK, the concern is routines with large iwork arrays, but are there any where the iwork really is significant compared to the fp arrays?
I think the practical advantages of using a single BLAS library are much more important than cases working close to machine memory (i.e., if it's a problem, just buy an extra 8 GB, or recompile SciPy with a 32-bit BLAS).
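As a rough editorial illustration of this point (numbers mine, not from the thread): for a typical LAPACK routine the integer workspace is O(n) while the floating-point data is O(n²), so doubling the integer width barely registers. For example, `dgesdd` on an n×n matrix uses an iwork of length 8n:

```python
# Back-of-the-envelope comparison (illustrative numbers, not from the PR):
# dgesdd on an n x n float64 matrix holds O(n^2) floating-point data but
# only an O(n) integer workspace (iwork of length 8*n per the LAPACK docs).
n = 50_000
fp_bytes = 8 * n * n          # the matrix alone: ~20 GB
iwork_int32 = 4 * 8 * n       # ~1.6 MB
iwork_int64 = 8 * 8 * n       # ~3.2 MB
print((iwork_int64 - iwork_int32) / fp_bytes)  # ~8e-05, i.e. negligible
```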
(force-pushed 1aeec0c to 65203d6)
Ok, this is all the low-hanging fruit. What's left is:
SuperLU supports only 32-bit ints, and `interpolative` has assumptions about 4-byte integers in hard-coded work array sizes in many places, so their integer size cannot be changed. These two might be dealt with via relatively lightweight 64-to-32 LAPACK wrappers.
I looked a bit into it, and general 64-to-32-bit LAPACK adapters are not really possible to write, because the sizes of work arrays are in several cases not known (due to liwork queries). Such wrappers are, however, possible for all of BLAS and for several LAPACK routines; the rest of LAPACK would need to be compiled from sources. This would still avoid shipping the multi-arch kernels from OpenBLAS twice, so it might be useful to do in view of binary distribution sizes.
This PR is probably too big to review, so I'll eventually split it into more manageable chunks.
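To make the wrapper idea concrete, here is an editorial sketch (not code from this PR) of a 64-to-32 shim for one BLAS routine, written in Python/ctypes for brevity; the real shims would be generated C, and the library name is an assumption about the local system:

```python
import ctypes
import numpy as np

# Editorial sketch of a 64-to-32 BLAS shim (not from this PR); real shims
# would be generated C code. The library name is a local-system assumption.
_blas32 = ctypes.CDLL("libblas.so.3")
_blas32.ddot_.restype = ctypes.c_double  # Fortran ddot returns a double

def ddot_from_int64(n, x, incx, y, incy):
    """ddot taking 64-bit integer arguments, forwarded to a 32-bit BLAS."""
    for v in (n, incx, incy):
        if not -2**31 <= int(v) < 2**31:
            raise OverflowError("argument does not fit in a 32-bit BLAS int")
    n32 = ctypes.c_int32(int(n))
    incx32 = ctypes.c_int32(int(incx))
    incy32 = ctypes.c_int32(int(incy))
    dptr = ctypes.POINTER(ctypes.c_double)
    # Fortran calling convention: all arguments passed by reference
    return _blas32.ddot_(ctypes.byref(n32), x.ctypes.data_as(dptr),
                         ctypes.byref(incx32), y.ctypes.data_as(dptr),
                         ctypes.byref(incy32))

x = np.arange(4.0)
print(ddot_from_int64(np.int64(4), x, 1, x, 1))  # 0 + 1 + 4 + 9 = 14.0
```

Note that the overflow check is exactly what a general LAPACK adapter cannot always do, since work array sizes are sometimes only known to the callee.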
(force-pushed 856bf68 to a4016ff)
(force-pushed 8a06cdb to e399fdd)
This sounds like a good plan to me.
This is a draft for discussion (do not merge).
I'd suggest implementing support for 64-bit BLAS (=ILP64) in the following way:
The public BLAS/LAPACK wrappers in scipy.linalg will continue providing 32-bit BLAS/LAPACK, and new extensions `_fblas_64`, `_flapack_64`, `cython_lapack_64`, and `cython_blas_64` are added to provide the ILP64 BLAS/LAPACK API. For those cases where BLAS/LAPACK usage is a non-public detail (e.g. ARPACK), we can build only the 64-bit version.
The above is necessary, as a single Python extension DLL cannot link to both types of BLAS, because the symbol names generally clash. While dynamically linking different extensions to different BLAS libraries works in the usual case, it is not enough to avoid symbol clashes when, e.g., the Python interpreter is embedded inside an application that is itself linked to BLAS/LAPACK (on Linux, see here, section 1.5.4 --- TODO: less problematic on OSX and Windows, if I understand correctly, but needs checking). To avoid this issue, we also add support for BLAS/LAPACK symbol name mangling, which resolves it.
TODO: statically linking the 32-bit BLAS probably also resolves the issue (easier for Intel MKL users)
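As a concrete illustration of the mangling idea (editorial; the library names are assumptions about the local system, and the `64_` suffix scheme is the one used by ILP64 OpenBLAS builds compiled with `SYMBOLSUFFIX=64_`), two differently named symbol sets can then coexist in one process:

```python
import ctypes

# Illustration only: library names are local-system assumptions.
# An ILP64 OpenBLAS built with SYMBOLSUFFIX=64_ renames every routine,
# so the two interfaces no longer collide at the symbol level.
blas32 = ctypes.CDLL("libopenblas.so.0")
blas64 = ctypes.CDLL("libopenblas64_.so.0")

print(blas32.ddot_)     # 32-bit integer interface
print(blas64.ddot_64_)  # 64-bit integer interface, distinct symbol name
```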
In scipy.linalg, we'd add an ilp64= flag to the BLAS/LAPACK function getters, as in `nrm2 = get_blas_func('nrm2', ilp64='maybe'/True/False)`, and an `nrm2.int_dtype` attribute so that it's easier to deal with the work array types.
We might want to eventually drop the 32-bit BLAS altogether. One backward-compatible option could be to autogenerate int32 interface wrappers --- but I'm not sure how easy it is to get the specs automatically.
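For a sense of how this would be used, here is an editorial sketch; it uses the existing getter name `get_blas_funcs`, while the `ilp64=` flag and `int_dtype` attribute follow the proposal above, so the spelling in a released SciPy may differ:

```python
import numpy as np
from scipy.linalg import get_blas_funcs

x = np.arange(10.0)
# ilp64= and .int_dtype follow the proposal in this PR; a released SciPy
# may spell them differently (e.g. an ilp64='preferred' style value).
nrm2 = get_blas_funcs('nrm2', (x,), ilp64=True)
print(nrm2(x))          # dispatches to the ILP64 build of the BLAS
print(nrm2.int_dtype)   # np.int64 here; np.int32 for the 32-bit wrappers
```

The `int_dtype` attribute matters because callers allocating integer work arrays (pivots, iwork) must match the integer width of whichever library the function was bound to.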
I added some BLAS64 support in numpy master some days ago. The changes in this "PR" assume the numpy.distutils setup in the follow-up PR numpy/numpy#15069. Currently, you can already build and test this branch against an ILP64 BLAS with that setup, so I think the plan is feasible.
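As an editorial aside, one quick way to see what the local NumPy was built against:

```python
import numpy as np
# Output format varies across NumPy versions; an ILP64-enabled build
# typically reports a separate ilp64 BLAS/LAPACK section or a "64_"
# symbol suffix in the build information.
np.show_config()
```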