BUG: scipy.sparse.linalg.spsolve: fix memory error caused by overflowing signed 32-bit int input to doubleCalloc #14979
@tylerjereddy, which code owners would be best to request a code review of this pull request from?
@liviofetahu We basically pull SuperLU changes from upstream (https://github.com/xiaoyeli/superlu), so this should be fixed there to avoid divergence between the copies.
Hmm, we also have the patch file (https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/SuperLU/scipychanges.patch), so I'm not as sure as I was when writing my previous comment.
I took a look at the patch file, @ilayn, and it seems this bug is not fixed there.
@liviofetahu, thanks for this fix. To make @ilayn's request clearer, can you please:
Thanks for the instructions, @rgommers. Here are the PR and the linked issue that I submitted to the upstream repo you mentioned:
Fixed the signatures of `doubleMalloc` and, in particular, `doubleCalloc` in the implementation file, since `malloc` may need to allocate >= 2^31 bytes: `SUPERLU_MALLOC` now passes the correct value to `void *superlu_python_module_malloc(size_t size)`, which internally calls `malloc(size)`. As a result, `work = doubleCalloc(n*nrhs)` now executes for right-hand sides B of size 46679 by 46680 filled with doubles. Also added some `(size_t)` casts so that indexing into buffers is correct, e.g. `work_col = &work[(size_t)j * (size_t)n]`.
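The `(size_t)` indexing casts can be illustrated with a small C sketch. It assumes a column-major buffer with `n` rows and non-negative dimensions; `column_offset` and `int_product_overflows` are hypothetical helper names for illustration, not SuperLU functions.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not SuperLU code): offset of column j in a
 * column-major buffer with n rows. Widening both factors to size_t
 * before multiplying mirrors the `(size_t)j * (size_t)n` cast, so the
 * product is not computed in (possibly overflowing) int arithmetic. */
static size_t column_offset(int j, int n)
{
    return (size_t)j * (size_t)n;
}

/* Hypothetical helper: returns 1 if a * b (both assumed non-negative)
 * would exceed INT_MAX. It divides instead of multiplying, so the check
 * itself cannot overflow. */
static int int_product_overflows(int a, int b)
{
    return a != 0 && b > INT_MAX / a;
}
```

For example, with `j = n = 46679` (the last column of the 46679 x 46680 right-hand side), `int_product_overflows` reports that `j * n` does not fit in an int, while `column_offset` returns 2178929041.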
Force-pushed from 8afa39d to 0d95f6b.
Thanks @liviofetahu! The upstream PR was already merged, so there's no need to include it in our patch file anymore. I rebased to fix the merge conflicts.
Thanks for taking care of this, @rgommers. I forgot to mention that I hadn't included my changes in scipychanges.patch, but I'm glad to hear there's no need to include them there anymore.
There was also a follow-up commit upstream to fix this bug for the other precisions, but rather than sync just that patch, we should update all of SuperLU in one go.
That sounds good. It shouldn't be hard now to put these changes into the other SuperLU source files that handle the other precisions.
CI failures are unrelated, in it goes. Thanks @liviofetahu!
Sounds good, @rgommers. I'm working on the fixes for the other source files in the upstream SuperLU repo that handle the other precisions.
You mean you'll submit a new PR here to sync xiaoyeli/superlu@afe15f3? That is the fix for the other precisions, unless I'm missing something.
Oh, it seems the changes to the other source files have already been made. Thanks for pointing that out. Looks like we're all set now.
BUG: For big linear systems AX = B, such as when the right-hand side B is a 46679 x 46680 matrix of doubles, `spsolve(A, B)` fails with a runtime memory error. The error is caused by the Python `spsolve` function calling

`x, info = _superlu.gssv(N, A.nnz, A.data, A.indices, A.indptr, b, flag, options=options)`

where the `_superlu` module is implemented in C and exposed to Python through the low-level Python C extension module API. `Py_gssv` in C calls `gssv` internally, and `gssv` in turn calls `dgstrs`, which calls `doubleCalloc(n * nrhs)` with `int n = L->nrow` and `int nrhs = B->ncol`.

Finally, `doubleCalloc` calls the `SUPERLU_MALLOC` macro, which is defined as the `void *superlu_python_module_malloc(size_t size)` function. That function assigns `mem_ptr = malloc(size);` and returns `NULL` if `malloc` returns `NULL` (meaning the system could not provide a contiguous block of the requested size). `buf` in `doubleCalloc` is therefore a null pointer, and `doubleCalloc` aborts with its failure message. All of this happens because `L->nrow * B->ncol` exceeds the maximum value of a signed 32-bit int: for the example above, 46679 * 46680 = 2,178,975,720 > 2,147,483,647. Since signed overflow is undefined behavior in C, the size that reaches `malloc` is no longer the intended `n * nrhs * sizeof(double)`.

Fixes #14984
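A minimal C sketch of the idea behind the fix: compute the `work` element count in `size_t` and allocate it. It assumes non-negative dimensions, and `rhs_elements` and `double_calloc_sketch` are hypothetical names, not the SuperLU or SciPy source. Unlike the real `doubleCalloc`, the sketch returns NULL instead of aborting so the caller can handle the failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: number of doubles for an nrow x ncol right-hand
 * side, computed in size_t so n * nrhs never overflows a 32-bit int.
 * Returns 0 on success, -1 if the byte count would not fit in size_t
 * (only reachable on platforms with a 32-bit size_t). */
static int rhs_elements(int nrow, int ncol, size_t *count)
{
    size_t n = (size_t)nrow;
    size_t nrhs = (size_t)ncol;
    if (nrhs != 0 && n > SIZE_MAX / sizeof(double) / nrhs)
        return -1;
    *count = n * nrhs;
    return 0;
}

/* Hypothetical stand-in for a doubleCalloc that takes a size_t count.
 * calloc zero-fills the buffer (all-zero bits read as 0.0 on IEEE 754
 * platforms) and returns NULL if the allocation fails. */
static double *double_calloc_sketch(size_t count)
{
    return calloc(count, sizeof(double));
}
```

For the 46679 x 46680 example, `rhs_elements` yields 2,178,975,720 elements (about 17.4 GB of doubles), a count that does not fit in a signed 32-bit int; whether `calloc` can then satisfy the request depends on the machine's available memory.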