
Integer overflow in scipy.sparse.linalg.eigsh and scipy.sparse.linalg.eigs for large values of the parameter maxiter (raises ARPACK error -4) #11261

Open
ktgeier opened this issue Dec 22, 2019 · 2 comments



ktgeier commented Dec 22, 2019

The functions scipy.sparse.linalg.eigsh and scipy.sparse.linalg.eigs raise the misleading error message

scipy.sparse.linalg.eigen.arpack.arpack.ArpackError: ARPACK error -4: The maximum number of Arnoldi update iterations allowed must be greater than zero.

if the parameter maxiter exceeds the maximum value a 32-bit integer can hold (see example below).

The use of 32-bit integers is presumably dictated by the ARPACK Fortran interface, and the parameter maxiter is apparently passed on without an overflow check. An overflowed value then appears to ARPACK as a negative integer, which provokes the error quoted above.
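
The wraparound can be illustrated outside SciPy (this is a sketch of the presumed narrowing behaviour, not SciPy's actual code path):

```python
import numpy as np

# Truncating a Python int that exceeds the 32-bit range to int32
# yields a negative value -- this is what ARPACK then interprets
# as "maxiter <= 0".
too_big = np.iinfo(np.int32).max + 1           # 2**31, one past INT32_MAX
wrapped = np.array(too_big, dtype=np.int64).astype(np.int32)
print(int(wrapped))                            # -2147483648
```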

In my application, I am using scipy.sparse.linalg.eigsh to find the quantum-mechanical ground state of a Hamiltonian represented as an instance of scipy.sparse.linalg.LinearOperator. The default value of the option maxiter is n * 10, where n is the dimension of the linear operator. For large systems, this default can easily exceed the maximum value a 32-bit integer can hold. The error message above is then very confusing, especially since maxiter was never set explicitly.

As a minimal fix, I suggest checking the variable maxiter for overflow and, if it overflows, raising an error with a meaningful message. This could be realised by extending the existing conditional in the __init__ method of the class _ArpackParams in the file arpack.py:

if maxiter <= 0:  # this check is already done
    raise ValueError("maxiter must be positive, maxiter=%d" % maxiter)
elif maxiter > np.iinfo(np.int32).max:  # this check should be added
    raise ValueError("maxiter must not exceed the maximum value a 32-bit "
                     "integer can hold, maxiter=%d" % maxiter)

In addition, it would be good to choose the default value of maxiter more carefully, respecting the limit imposed by the use of 32-bit integers. I am not sure, though, whether there is an easy, general heuristic for a reasonable cutoff value.

For a very large system, choosing the default as a multiple of the dimension of the linear operator, as is currently the case, practically amounts to setting no cutoff at all: the computation will take forever if it does not converge within a reasonable number of iterations. One option would therefore be to use maxiter=np.iinfo(np.int32).max as the default and leave it to the user to pick a reasonable cutoff for their use case. Or, to stay closer to the current behaviour, use maxiter=min(n * 10, np.iinfo(np.int32).max) as the default.
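
The second proposal could be sketched as a hypothetical helper (default_maxiter is an illustrative name, not an existing SciPy function):

```python
import numpy as np

def default_maxiter(n):
    # Keep the current n * 10 heuristic, but clamp it to INT32_MAX
    # so the value always fits the 32-bit ARPACK interface.
    return min(n * 10, np.iinfo(np.int32).max)

print(default_maxiter(42))         # 420: small problems are unchanged
print(default_maxiter(300540195))  # 2147483647: large problems are clamped
```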

Reproducing code example:

import numpy as np
from scipy.sparse.linalg import eigsh

identity = np.eye(42)
maxiter = np.iinfo(np.int32).max + 1

evals, evecs = eigsh(identity, maxiter=maxiter)
print(evals)

Error message:

Traceback (most recent call last):
  File "scipy_arpack_maxiter_overflow.py", line 9, in <module>
    evals, evecs = eigsh(identity, maxiter=maxiter)
  File "/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1687, in eigsh
    params.iterate()
  File "/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 573, in iterate
    raise ArpackError(self.info, infodict=self.iterate_infodict)
scipy.sparse.linalg.eigen.arpack.arpack.ArpackError: ARPACK error -4: The maximum number of Arnoldi update iterations allowed must be greater than zero.

Scipy/Numpy/Python version information:

SciPy 1.4.1, NumPy 1.18.0, sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)

rlucas7 commented Mar 22, 2020

@laplacesdemon64 could you take a look at the proposal in #11302 and comment if that would solve the issue you are seeing?

It might help if you could comment on the magnitude of n values you use in your code.


ktgeier commented Mar 23, 2020

@rlucas7 In my application, n = 300540195 (corresponding to the Hilbert space dimension of a Bose-Hubbard system with 16 sites and 16 particles). This value is about 0.14 * np.iinfo(np.int32).max and, although not far from the limits of a 32-bit integer, it still fits. However, the default choice maxiter = n * 10 already overflows.
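
The figures quoted above can be verified directly:

```python
import numpy as np

# n itself fits in 32 bits, but the default maxiter = n * 10 does not.
n = 300540195
int32_max = np.iinfo(np.int32).max  # 2147483647

print(n <= int32_max)        # True: n fits
print(n * 10 <= int32_max)   # False: 3005401950 overflows
print(n / int32_max)         # ~0.14, as stated
```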

I support your proposal #11302 of moving BLAS/LAPACK/ARPACK etc. to ILP64. This is particularly important for scientific computing, where state-of-the-art applications are touching the limits of 32-bit integers. In particular, the use of 64-bit integers in the ARPACK interface would eliminate the problem I encountered since n * 10 would easily fit a 64-bit integer for reasonable problem sizes n.

Nonetheless, in my opinion, one should in general always check for overflow when copying an arbitrary-size Python integer to some finite-size integer datatype.
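
Such a check could look like the following (to_int32_checked is a hypothetical helper illustrating the idea, not an existing SciPy function):

```python
import numpy as np

def to_int32_checked(value, name="value"):
    # Validate an arbitrary-size Python int before narrowing it to a
    # fixed-size dtype, instead of letting it wrap silently.
    info = np.iinfo(np.int32)
    if not (info.min <= value <= info.max):
        raise OverflowError(
            "%s=%d does not fit in int32 [%d, %d]"
            % (name, value, info.min, info.max)
        )
    return np.int32(value)

print(to_int32_checked(420))  # 420: in range, passes through
try:
    to_int32_checked(np.iinfo(np.int32).max + 1, name="maxiter")
except OverflowError as exc:
    print(exc)  # explicit error instead of a silent wraparound
```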
