Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not segfault in svd(a) with VT.size > INT_MAX #20349

Merged
merged 2 commits into from Mar 28, 2024
Merged

Conversation

ev-br
Copy link
Member

@ev-br ev-br commented Mar 28, 2024

Reference issue

closes #14001

What does this implement/fix?

linalg.svd with too large matrices may segfault if m*n > int_max on large-memory machines (on smaller memory machines it may fail with a MemoryError instead). The root cause is an integer overflow in indexing 2D arrays, deep in the LAPACK code.

Thus, detect a possible error condition in the f2py wrapper and error out early.

Additional information

Suggested by @pearu in #14001 (comment)

Adding this kind of check to f2py may still make sense IMO. No reason to not guard against this specific segfault in the meantime though.

@ev-br ev-br added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg labels Mar 28, 2024
@ilayn
Copy link
Member

ilayn commented Mar 28, 2024

Looks good to me. We can also catch these in the Python side without passing them already to f2py but that's a minor point.

Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using an alternative approach that does not involve using sqrt and F_INT_MAX, and that provides perhaps a more useful exception message for the end users.

Notice also that checking overflow in m * n is an approximation for resolving the problem. There still exists combinations of m and n such that m * n does not overflow but can cause segfaults because the int overflow may occur from other terms that are added to m * n result (each lapack function may have different expression that involves m * n).

check(m <= sqrt(F_INT_MAX)/n)
integer intent(hide),depend(m,n) :: minmn = MIN(m,n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach of detecting integer overflow that does not involve introducing F_INT_MAX and not using sqrt, is (untested):

Suggested change
check(m <= sqrt(F_INT_MAX)/n)
integer intent(hide),depend(m,n) :: minmn = MIN(m,n)
integer intent(hide),depend(m,n) :: mn_overflow = m * n
check(m == 0 || mn_overflow / m == n)
integer intent(hide),depend(m,n) :: minmn = MIN(m,n)

or similar.

Notice that when the check fails, the check expression is displayed to the user. So, it is advisable to use an expression that users interpret it correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just use

check(m == 0 || (m * n) / m == n)

@ev-br ev-br force-pushed the gesdd_segfault branch 4 times, most recently from 7664655 to 0ac0e41 Compare March 28, 2024 11:58
if lapack_driver == 'gesdd' and compute_uv:
max_mn = max(a1.shape)
# XXX: revisit int32 when ILP64 lapack becomes a thing
if max_mn > math.sqrt(numpy.iinfo(numpy.int32).max):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using max(m, n) > sqrt(float32_max) is unnecessarily restrictive for cases where min(m, n) is relatively small.

Why not use exact overflow condition that is defined by gesdd source code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISTM the product of m*n does not really matter. Let a1.shape == (2, 53130) and compute_uv==True. The orthogonal matrix VT has shape (53130, 53130) and causes the overflow.

In the example of #14001 (comment), note that
print*, n*n gives -1472170396

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Demo:

$ nano gesdd.i4.f90 
ev-br@qgpu3:~/temp$ gfortran gesdd.i4.f90 -lblas -llapack
ev-br@qgpu3:~/temp$ ./a.out 
 -1472170396
 info =            0
 opt lwork =      1700166

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f1028a0c51f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1  0x7f102a82577b in ???
#2  0x7f102a83d898 in ???
#3  0x7f102a83da70 in ???
#4  0x7f102a7d1769 in ???
#5  0x561ff340c261 in ???
#6  0x561ff340c43a in ???
#7  0x7f10289f3d8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#8  0x7f10289f3e3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#9  0x561ff340b138 in ???
Segmentation fault (core dumped)

with

$ cat gesdd.i4.f90 
implicit none

character*1 :: jobz = 'A'

integer*4 :: m, n
integer*4 :: lda, ldu, ldvt, lwork
integer*4 :: info
integer*4, allocatable :: iwork(:)
real*8, allocatable :: a(:, :)
real*8, allocatable :: s(:)
real*8, allocatable :: u(:, :)
real*8, allocatable :: vt(:, :)
real*8, allocatable :: work(:)


! m < n only
m = 2 !4799    ! <<<<< HERE
n = 53130

lda = m
ldu = m
ldvt = n
allocate(a(lda, n), s(n), u(ldu, m), vt(ldvt, n))
allocate(iwork(8*n))

a = 1.d0

! workspace query
allocate(work(1))
lwork = -1

call dgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, iwork, info)

lwork = int(work(1))

print*, n*n
print*, 'info = ', info
print*, 'opt lwork = ', lwork


deallocate(work)
allocate(work(lwork))

call dgesdd(jobz, m, n, a, lda, s, u, ldu, vt, ldvt, work, lwork, iwork, info)
print*, 'info = ', info
!print*, s
print*, 'max(sigma) = ', maxval(s) 

end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, relaxed the condition for full_matrices=False, where the VT size is n, min(m, n).

@ev-br
Copy link
Member Author

ev-br commented Mar 28, 2024

Pushed an update: 1) The failure mode in gh-14001 is not m*n overflows; it's when the U and VT matrices are requested, they have the shape (m, m) and (n, n) and one of n*n or m*m overflows. So check that. 2) Move the check to the python level to simplify things and give a more informative error message.

linalg.svd with too large matrices may segfault if
max(m, n)*max(m, n) > int_max on large-memory machines (on smaller memory machines
it may fail with a MemoryError instead). The root cause is an integer overflow
in indexing 2D arrays, deep in the LAPACK code.
Thus, detect a possible error condition and bail out early.
@ev-br ev-br changed the title Do not segfault in svd(a) with a.size > INT_MAX Do not segfault in svd(a) with VT.size > INT_MAX Mar 28, 2024
Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one minor nit, otherwise, LGTM! Thanks, @ev-br!

scipy/linalg/_decomp_svd.py Outdated Show resolved Hide resolved
Co-authored-by: Pearu Peterson <pearu.peterson@gmail.com>
@ev-br
Copy link
Member Author

ev-br commented Mar 28, 2024

Thanks for the reviews Pearu, Ilhan. Merging as approved

@ev-br ev-br merged commit 4a343ab into scipy:main Mar 28, 2024
29 checks passed
@ev-br ev-br added this to the 1.14.0 milestone Mar 28, 2024
@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Mar 28, 2024
@rgommers rgommers mentioned this pull request Mar 29, 2024
92 tasks
@tylerjereddy tylerjereddy modified the milestones: 1.14.0, 1.13.0 Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-candidate This fix should be ported by a maintainer to previous SciPy versions. defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pycharm scipy SVD returning error code without message
4 participants