
Integer overflow in scipy.sparse.linalg.eigsh and scipy.sparse.linalg.eigs for large values of the parameter maxiter (raises ARPACK error -4) #11261

Open
ktgeier opened this issue Dec 22, 2019 · 2 comments



ktgeier commented Dec 22, 2019

The functions scipy.sparse.linalg.eigsh and scipy.sparse.linalg.eigs raise the misleading error message

scipy.sparse.linalg.eigen.arpack.arpack.ArpackError: ARPACK error -4: The maximum number of Arnoldi update iterations allowed must be greater than zero.

if the parameter maxiter exceeds the maximum value a 32-bit integer can hold (see example below).

The use of 32-bit integers is presumably dictated by the ARPACK Fortran interface, and the parameter maxiter is apparently passed on without an overflow check. An overflowed value then appears to ARPACK as a negative integer, which provokes the error quoted above.
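
The wraparound can be illustrated outside SciPy (this is a sketch of the presumed narrowing behaviour, not SciPy's actual code path):

```python
import numpy as np

# Truncating a Python int that exceeds the 32-bit range to int32
# yields a negative value -- this is what ARPACK then interprets
# as "maxiter <= 0".
too_big = np.iinfo(np.int32).max + 1           # 2**31, one past INT32_MAX
wrapped = np.array(too_big, dtype=np.int64).astype(np.int32)
print(int(wrapped))                            # -2147483648
```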

In my application, I am using scipy.sparse.linalg.eigsh to find the quantum-mechanical ground state of a Hamiltonian represented as an instance of scipy.sparse.linalg.LinearOperator. The default value of the option maxiter is n * 10, where n is the dimension of the linear operator. For large systems, this default can easily exceed the maximum value a 32-bit integer can hold. The error message above is then very confusing, especially since maxiter was never set explicitly.

As a minimal fix, I suggest checking the variable maxiter for overflow and, if it overflows, raising an error with a meaningful message. This could be realised by extending the existing conditional in the __init__ method of the class _ArpackParams in the file arpack.py:

if maxiter <= 0:  # this check is already done
    raise ValueError("maxiter must be positive, maxiter=%d" % maxiter)
elif maxiter > np.iinfo(np.int32).max:  # this check should be added
    raise ValueError("maxiter must not exceed the maximum value a 32-bit "
                     "integer can hold, maxiter=%d" % maxiter)

In addition, it would be good to choose the default value of maxiter more carefully, respecting the limit imposed by the use of 32-bit integers. I am not sure, though, whether there is an easy, general heuristic for a reasonable cutoff value.

For a very large system, choosing the default as a multiple of the dimension of the linear operator, as is currently the case, practically amounts to setting no cutoff at all: the computation will take forever if it does not converge within a reasonable number of iterations. One option would therefore be to use maxiter=np.iinfo(np.int32).max as the default and leave it to the user to pick a reasonable cutoff for their use case. Or, to stay closer to the current behaviour, use maxiter=min(n * 10, np.iinfo(np.int32).max) as the default.
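
The second proposal could be sketched as a hypothetical helper (default_maxiter is an illustrative name, not an existing SciPy function):

```python
import numpy as np

def default_maxiter(n):
    # Keep the current n * 10 heuristic, but clamp it to INT32_MAX
    # so the value always fits the 32-bit ARPACK interface.
    return min(n * 10, np.iinfo(np.int32).max)

print(default_maxiter(42))         # 420: small problems are unchanged
print(default_maxiter(300540195))  # 2147483647: large problems are clamped
```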

Reproducing code example:

import numpy as np
from scipy.sparse.linalg import eigsh

identity = np.eye(42)
maxiter = np.iinfo(np.int32).max + 1

evals, evecs = eigsh(identity, maxiter=maxiter)
print(evals)

Error message:

Traceback (most recent call last):
  File "scipy_arpack_maxiter_overflow.py", line 9, in <module>
    evals, evecs = eigsh(identity, maxiter=maxiter)
  File "/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1687, in eigsh
    params.iterate()
  File "/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 573, in iterate
    raise ArpackError(self.info, infodict=self.iterate_infodict)
scipy.sparse.linalg.eigen.arpack.arpack.ArpackError: ARPACK error -4: The maximum number of Arnoldi update iterations allowed must be greater than zero.

Scipy/Numpy/Python version information:

SciPy 1.4.1, NumPy 1.18.0, sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)

rlucas7 commented Mar 22, 2020

@laplacesdemon64 could you take a look at the proposal in #11302 and comment if that would solve the issue you are seeing?

It might help if you could comment on the magnitude of n values you use in your code.


ktgeier commented Mar 23, 2020

@rlucas7 In my application, n = 300540195 (corresponding to the Hilbert space dimension of a Bose-Hubbard system with 16 sites and 16 particles). This value is about 0.14 * np.iinfo(np.int32).max and, although not far from the limits of a 32-bit integer, it still fits. However, the default choice maxiter = n * 10 already overflows.
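
The figures quoted above can be verified directly:

```python
import numpy as np

# n itself fits in 32 bits, but the default maxiter = n * 10 does not.
n = 300540195
int32_max = np.iinfo(np.int32).max  # 2147483647

print(n <= int32_max)        # True: n fits
print(n * 10 <= int32_max)   # False: 3005401950 overflows
print(n / int32_max)         # ~0.14, as stated
```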

I support your proposal #11302 of moving BLAS/LAPACK/ARPACK etc. to ILP64. This is particularly important for scientific computing, where state-of-the-art applications are touching the limits of 32-bit integers. In particular, the use of 64-bit integers in the ARPACK interface would eliminate the problem I encountered since n * 10 would easily fit a 64-bit integer for reasonable problem sizes n.

Nonetheless, in my opinion, one should in general always check for overflow when copying an arbitrary-size Python integer to some finite-size integer datatype.
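
Such a check could look like the following (to_int32_checked is a hypothetical helper illustrating the idea, not an existing SciPy function):

```python
import numpy as np

def to_int32_checked(value, name="value"):
    # Validate an arbitrary-size Python int before narrowing it to a
    # fixed-size dtype, instead of letting it wrap silently.
    info = np.iinfo(np.int32)
    if not (info.min <= value <= info.max):
        raise OverflowError(
            "%s=%d does not fit in int32 [%d, %d]"
            % (name, value, info.min, info.max)
        )
    return np.int32(value)

print(to_int32_checked(420))  # 420: in range, passes through
try:
    to_int32_checked(np.iinfo(np.int32).max + 1, name="maxiter")
except OverflowError as exc:
    print(exc)  # explicit error instead of a silent wraparound
```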
