BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

liviofetahu · 2021-11-04T20:58:15Z

BUG: For big linear systems AX = B such as when the right hand side B is of size 46679 x 46680 containing doubles, spsolve(A, B) gets stuck in a runtime memory error that looks like below:

Python(8835,0x10eafadc0) malloc: can't allocate region
:*** mach_vm_map(size=18446744065245585408, flags: 100) failed (error code=3)
Python(8835,0x10eafadc0) malloc: *** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "sparse_solve.py", line 14, in <module>
    x = spsolve(A, B)
  File "/Users/liviofetahu/sesco-llc/matricez/env/lib/python3.8/site-packages/scipy-1.8.0.dev0+1921.c30fefc-py3.8-macosx-10.15-x86_64.egg/scipy/sparse/linalg/dsolve/linsolve.py", line 204, in spsolve
    x, info = _superlu.gssv(N, A.nnz, A.data, A.indices, A.indptr,
RuntimeError: SUPERLU_MALLOC failed for buf in doubleCalloc()
 at line 693 in file scipy/sparse/linalg/dsolve/SuperLU/SRC/dmemory.c

which is caused by the Python spsolve function making a call to x, info = _superlu.gssv(N, A.nnz, A.data, A.indices, A.indptr, b, flag, options=options) where the _superlu module is implemented in C and exposed to Python through the low level Python C Extension Modules API.
Py_gssv in C calls gssv (inner-ly), and gssv in turn calls dgstrs, with this last one calling doubleCalloc(n * nrhs) where int n = L->nrow and int nrhs = B->ncol. Finally, doubleCalloc makes a call to the SUPERLU_MALLOC macro

buf = (double*) SUPERLU_MALLOC((size_t)n * sizeof(double));
if( !buf ) {
	ABORT("SUPERLU_MALLOC failed for buf in doubleCalloc()\n");
}

with the macro being defined as the void* superlu_python_module_malloc(size_t size) function that sets some member mem_ptr = malloc(size); and returns nullptr if malloc returns nullptr (meaning that the system wasn’t able to return a valid pointer to usable contiguous memory because it couldn’t allocate it), so buf in doubleCalloc is a null pointer and doubleCalloc aborts with the failure message explained above -- all this due to (L->nrow * B->ncol) exceeding the maximum value of a signed 32-bit int.

Fixes #14984

liviofetahu · 2021-11-05T19:45:18Z

Who are some of the code owners that would be best to request a pull request code review from @tylerjereddy ?

ilayn · 2021-11-05T20:50:30Z

@liviofetahu We are basically pulling the changes to SuperLU from upstream https://github.com/xiaoyeli/superlu They should be fixed there to avoid divergence between copies.

ilayn · 2021-11-05T20:54:04Z

Hmm we also have the patch file https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/SuperLU/scipychanges.patch

I'm not sure as sure as I was writing my previous comment.

liviofetahu · 2021-11-07T16:49:30Z

Hmm we also have the patch file https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/SuperLU/scipychanges.patch

I'm not sure as sure as I was writing my previous comment.

I took a look at the patch file @ilayn and it seems like this bug is not fixed there.

rgommers · 2021-11-10T10:20:59Z

@liviofetahu thanks for this fix. To make the request of @ilayn more clear, can you please:

Add these changes to scipychanges.patch, so that in case we update to a new SuperLU version this fix does not silently disappear.
Submit these changes also as a PR to https://github.com/xiaoyeli/superlu

liviofetahu · 2021-11-15T19:12:57Z

@liviofetahu thanks for this fix. To make the request of @ilayn more clear, can you please:

Add these changes to scipychanges.patch, so that in case we update to a new SuperLU version this fix does not silently disappear.

Submit these changes also as a PR to https://github.com/xiaoyeli/superlu

Thanks for the instructions @rgommers -- here's the PR and the linked issue that I submitted to the other repo that you mentioned:
xiaoyeli/superlu#56
xiaoyeli/superlu#57

Fixed the signature of doubleMalloc and particularly doubleCalloc: possibly malloc needs to allocate >= 2^31 bytes. Fixed the signature of doubleMalloc and doubleCalloc in the implementation file: `SUPERLU_MALLOC` now passes the correct value to `void *superlu_python_module_malloc(size_t size)` which internally calls `malloc(size)`. `work = doubleCalloc(n*nrhs)` now executes for right-hand sides B of size 46679 by 46680 filled with doubles, and some `(size_t)` casts that fix proper indexing into buffers such as `work_col = &work[(size_t)j * (size_t)n]`.

rgommers · 2021-11-16T15:45:35Z

Thanks @liviofetahu! Upstream PR was already merged, so no need anymore to include in our patch file. I rebased to fix the merge conflicts.

liviofetahu · 2021-11-16T15:48:16Z

Thanks @liviofetahu! Upstream PR was already merged, so no need anymore to include in our patch file. I rebased to fix the merge conflicts.

Thanks for taking care of this @rgommers -- I forgot to mention that I forgot to include my changes in scipychanges.patch but I'm glad to hear from you that there is no need anymore to include the changes in the patch file.

rgommers · 2021-11-16T15:56:15Z

There was also a follow up commit upstream to fix this bug for other precisions, but rather than sync just that patch, we should just update all of SuperLU in one go.

liviofetahu · 2021-11-16T16:03:12Z

There was also a follow up commit upstream to fix this bug for other precisions, but rather than sync just that patch, we should just update all of SuperLU in one go.

That sounds good -- now it shouldn't be hard to put these changes into the other SuperLU source files that handle the other types of precision.

rgommers

CI failures are unrelated, in it goes. Thanks @liviofetahu!

liviofetahu · 2021-11-16T16:42:41Z

CI failures are unrelated, in it goes. Thanks @liviofetahu!

Sounds good @rgommers -- working on the fixes for the other source files in the upstream SuperLU repo that are responsible for the various kinds of precision.

rgommers · 2021-11-16T17:15:44Z

You mean you'll submit a new PR here to sync xiaoyeli/superlu@afe15f3? That is the fix for the other precisions, unless I'm missing something.

liviofetahu · 2021-11-16T17:18:05Z

You mean you'll submit a new PR here to sync xiaoyeli/superlu@afe15f3? That is the fix for the other precisions, unless I'm missing something.

Oh, it seems like the changes to the other source files have been made -- thanks for pointing it out. Looks like we're all set now.

tylerjereddy added the scipy.sparse.linalg label Nov 4, 2021

liviofetahu changed the title ~~Fix memory allocation in spsolve~~ BUG: scipy.sparse.linalg.spsolve runtime memory error caused from bad input to malloc Nov 5, 2021

liviofetahu changed the title ~~BUG: scipy.sparse.linalg.spsolve runtime memory error caused from bad input to malloc~~ BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit input values to doubleCalloc Nov 5, 2021

liviofetahu changed the title ~~BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit input values to doubleCalloc~~ BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit int input to doubleCalloc Nov 5, 2021

liviofetahu changed the title ~~BUG: scipy.sparse.linalg.spsolve: fix runtime memory error caused from overflowing signed 32-bit int input to doubleCalloc~~ BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc Nov 5, 2021

rgommers force-pushed the fix-spsolve-allocation branch from 8afa39d to 0d95f6b Compare November 16, 2021 15:45

rgommers added this to the 1.8.0 milestone Nov 16, 2021

rgommers added the maintenance Items related to regular maintenance tasks label Nov 16, 2021

rgommers approved these changes Nov 16, 2021

View reviewed changes

rgommers merged commit 9484b9d into scipy:master Nov 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

liviofetahu commented Nov 4, 2021 •

edited

Loading

liviofetahu commented Nov 5, 2021

ilayn commented Nov 5, 2021

ilayn commented Nov 5, 2021

liviofetahu commented Nov 7, 2021

rgommers commented Nov 10, 2021

liviofetahu commented Nov 15, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021

rgommers left a comment

liviofetahu commented Nov 16, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021 •

edited

Loading

BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

BUG: scipy.sparse.linalg.spsolve: fix memory error caused from overflowing signed 32-bit int input to doubleCalloc #14979

Conversation

liviofetahu commented Nov 4, 2021 • edited Loading

liviofetahu commented Nov 5, 2021

ilayn commented Nov 5, 2021

ilayn commented Nov 5, 2021

liviofetahu commented Nov 7, 2021

rgommers commented Nov 10, 2021

liviofetahu commented Nov 15, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021

rgommers left a comment

Choose a reason for hiding this comment

liviofetahu commented Nov 16, 2021

rgommers commented Nov 16, 2021

liviofetahu commented Nov 16, 2021 • edited Loading

liviofetahu commented Nov 4, 2021 •

edited

Loading

liviofetahu commented Nov 16, 2021 •

edited

Loading