Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

Merged
merged 1 commit into from
Nov 16, 2021

Conversation

liviofetahu
Copy link
Contributor

@liviofetahu liviofetahu commented Nov 4, 2021

BUG: For big linear systems AX = B such as when the right hand side B is of size 46679 x 46680 containing doubles, spsolve(A, B) gets stuck in a runtime memory error that looks like below:

Python(8835,0x10eafadc0) malloc: can't allocate region
:*** mach_vm_map(size=18446744065245585408, flags: 100) failed (error code=3)
Python(8835,0x10eafadc0) malloc: *** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "sparse_solve.py", line 14, in <module>
    x = spsolve(A, B)
  File "/Users/liviofetahu/sesco-llc/matricez/env/lib/python3.8/site-packages/scipy-1.8.0.dev0+1921.c30fefc-py3.8-macosx-10.15-x86_64.egg/scipy/sparse/linalg/dsolve/linsolve.py", line 204, in spsolve
    x, info = _superlu.gssv(N, A.nnz, A.data, A.indices, A.indptr,
RuntimeError: SUPERLU_MALLOC failed for buf in doubleCalloc()
 at line 693 in file scipy/sparse/linalg/dsolve/SuperLU/SRC/dmemory.c

which is caused by the Python spsolve function making a call to x, info = _superlu.gssv(N, A.nnz, A.data, A.indices, A.indptr, b, flag, options=options) where the _superlu module is implemented in C and exposed to Python through the low level Python C Extension Modules API.
Py_gssv in C calls gssv (inner-ly), and gssv in turn calls dgstrs, with this last one calling doubleCalloc(n * nrhs) where int n = L->nrow and int nrhs = B->ncol. Finally, doubleCalloc makes a call to the SUPERLU_MALLOC macro

buf = (double*) SUPERLU_MALLOC((size_t)n * sizeof(double));
if( !buf ) {
	ABORT("SUPERLU_MALLOC failed for buf in doubleCalloc()\n");
}

with the macro being defined as the void* superlu_python_module_malloc(size_t size) function that sets some member mem_ptr = malloc(size); and returns nullptr if malloc returns nullptr (meaning that the system wasn’t able to return a valid pointer to usable contiguous memory because it couldn’t allocate it), so buf in doubleCalloc is a null pointer and doubleCalloc aborts with the failure message explained above -- all this due to (L->nrow * B->ncol) exceeding the maximum value of a signed 32-bit int.

Fixes #14984

@liviofetahu liviofetahu changed the title Fix memory allocation in spsolve BUG: scipy.sparse.linalg.spsolve runtime memory error caused from bad input to malloc Nov 5, 2021
@liviofetahu liviofetahu changed the title BUG: scipy.sparse.linalg.spsolve runtime memory error caused from bad input to malloc BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit input values to doubleCalloc Nov 5, 2021
@liviofetahu liviofetahu changed the title BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit input values to doubleCalloc BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit int input to doubleCalloc Nov 5, 2021
@liviofetahu liviofetahu changed the title BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit int input to doubleCalloc BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc Nov 5, 2021
@liviofetahu
Copy link
Contributor Author

Who are some of the code owners that would be best to request a pull request code review from @tylerjereddy ?

@ilayn
Copy link
Member

ilayn commented Nov 5, 2021

@liviofetahu We are basically pulling the changes to SuperLU from upstream https://github.com/xiaoyeli/superlu They should be fixed there to avoid divergence between copies.

@ilayn
Copy link
Member

ilayn commented Nov 5, 2021

Hmm we also have the patch file https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/SuperLU/scipychanges.patch

I'm not sure as sure as I was writing my previous comment.

@liviofetahu
Copy link
Contributor Author

Hmm we also have the patch file https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/SuperLU/scipychanges.patch

I'm not sure as sure as I was writing my previous comment.

I took a look at the patch file @ilayn and it seems like this bug is not fixed there.

@rgommers
Copy link
Member

@liviofetahu thanks for this fix. To make the request of @ilayn more clear, can you please:

  1. Add these changes to scipychanges.patch, so that in case we update to a new SuperLU version this fix does not silently disappear.
  2. Submit these changes also as a PR to https://github.com/xiaoyeli/superlu

@liviofetahu
Copy link
Contributor Author

@liviofetahu thanks for this fix. To make the request of @ilayn more clear, can you please:

  1. Add these changes to scipychanges.patch, so that in case we update to a new SuperLU version this fix does not silently disappear.
  2. Submit these changes also as a PR to https://github.com/xiaoyeli/superlu

Thanks for the instructions @rgommers -- here's the PR and the linked issue that I submitted to the other repo that you mentioned:
xiaoyeli/superlu#56
xiaoyeli/superlu#57

Fixed the signature of doubleMalloc and particularly doubleCalloc:
possibly malloc needs to allocate >= 2^31 bytes.

Fixed the signature of doubleMalloc and doubleCalloc in the
implementation file: `SUPERLU_MALLOC` now passes the correct value to
`void *superlu_python_module_malloc(size_t size)` which internally calls
`malloc(size)`.

`work = doubleCalloc(n*nrhs)` now executes for right-hand sides B of
size 46679 by 46680 filled with doubles, and some `(size_t)` casts that
fix proper indexing into buffers such as
`work_col = &work[(size_t)j * (size_t)n]`.
@rgommers
Copy link
Member

Thanks @liviofetahu! Upstream PR was already merged, so no need anymore to include in our patch file. I rebased to fix the merge conflicts.

@liviofetahu
Copy link
Contributor Author

Thanks @liviofetahu! Upstream PR was already merged, so no need anymore to include in our patch file. I rebased to fix the merge conflicts.

Thanks for taking care of this @rgommers -- I forgot to mention that I forgot to include my changes in scipychanges.patch but I'm glad to hear from you that there is no need anymore to include the changes in the patch file.

@rgommers
Copy link
Member

There was also a follow up commit upstream to fix this bug for other precisions, but rather than sync just that patch, we should just update all of SuperLU in one go.

@rgommers rgommers added this to the 1.8.0 milestone Nov 16, 2021
@rgommers rgommers added the maintenance Items related to regular maintenance tasks label Nov 16, 2021
@liviofetahu
Copy link
Contributor Author

There was also a follow up commit upstream to fix this bug for other precisions, but rather than sync just that patch, we should just update all of SuperLU in one go.

That sounds good -- now it shouldn't be hard to put these changes into the other SuperLU source files that handle the other types of precision.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failures are unrelated, in it goes. Thanks @liviofetahu!

@rgommers rgommers merged commit 9484b9d into scipy:master Nov 16, 2021
@liviofetahu
Copy link
Contributor Author

CI failures are unrelated, in it goes. Thanks @liviofetahu!

Sounds good @rgommers -- working on the fixes for the other source files in the upstream SuperLU repo that are responsible for the various kinds of precision.

@rgommers
Copy link
Member

You mean you'll submit a new PR here to sync xiaoyeli/superlu@afe15f3? That is the fix for the other precisions, unless I'm missing something.

@liviofetahu
Copy link
Contributor Author

liviofetahu commented Nov 16, 2021

You mean you'll submit a new PR here to sync xiaoyeli/superlu@afe15f3? That is the fix for the other precisions, unless I'm missing something.

Oh, it seems like the changes to the other source files have been made -- thanks for pointing it out. Looks like we're all set now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.sparse.linalg
Projects
None yet
4 participants