Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: Too large work array required -- computation cannot be performed with standard 32-bit LAPACK. #10337

Closed
seralouk opened this issue Jun 20, 2019 · 6 comments

Comments

@seralouk
Copy link

seralouk commented Jun 20, 2019

Dear Scipy community,

I am trying to run an SVD on a large matrix and I get the following error:

ValueError: Too large work array required -- computation cannot be performed with standard 32-bit LAPACK.

My code:

import numpy as np
from scipy.linalg import svd

a = np.ones((30000,30000))
u,s,vh = svd(a)
import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)

returns:

('1.1.0', '1.15.4', sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0))

I would expect to be able to run an SVD on a 30000 by 30000 matrix.

import platform
platform.architecture()

returns:

('64bit', '')

UPDATE:

It seems that svd(a, lapack_driver='gesvd') works, however I do not understand why the standard svd(a) with the deafult driver gesdd fails.

@AidanGG
Copy link
Contributor

AidanGG commented Jun 22, 2019

gesdd uses the divide and conquer approach (see Gu and Eisenstat 1995) for the SVD after the matrix has been reduced to bidiagonal form, which has much higher memory requirements than the implicit QR algorithm used in gesvd for the same problem.

You can see from the MKL gesdd documentation application notes the required working array size. For m=n=30000, this exceeds the 32-bit integers used in the BLAS/LAPACK interface. Also see numpy/numpy#5906.

I've had to run massive SVDs before, and the options that I'm aware of are to use MKL's or OpenBLAS' ILP64 LAPACK/ScaLAPACK directly, or Elemental.

@seralouk
Copy link
Author

Thanks for the details.

For n=m=30000 as dimensions of a matrix a the following seems to run without error:

u,s,v = svd(a, lapack_driver="gesvd")

I am not aware of MKL's or OpenBLAS' ILP64 LAPACK/ScaLAPACK directly, or Elemental. Could you guide me a bit around me?

@AidanGG
Copy link
Contributor

AidanGG commented Jun 23, 2019

Yes, gesvd will work because the QR algorithm it uses for the bidiagonal SVD stage does not require as much working space.

Since the QR algorithm is much slower than the bidiagonal divide and conquer (BDC) of gesdd, I suggested the alternatives for using BDC if you really wanted to use it. Otherwise, you can continue to use gesvd.

For shared memory systems:

  • LAPACK/LAPACKE through MKL with 64-bit ints, or OpenBLAS built with INTERFACE64=1. Other vendors may offer 64-bit interfaces as well (Cray etc.) (Fortran/C)
  • Eigen through the BDCSVD implementation (C++)

For distributed memory systems, using MPI:

  • ScaLAPACK, again with 64-bit ints (Fortran/C)
  • Elemental, built with 64-bit ints and backed by a BLAS/LAPACK implementation built with 64-bit ints, as above (C++)

@ilayn
Copy link
Member

ilayn commented Jun 24, 2019

As @AidanGG mentioned, this is a property of the underlying method of how the SVD is computed. I'll close this issue but please feel free to continue the discussion.

@gokceneraslan
Copy link
Contributor

One possibly dumb question: if I switch to MKL with 64-bit ints, or OpenBLAS built with INTERFACE64=1, would scipy directly use it without any modifications and it'd solve the Too large work array required error or not?

@AidanGG
Copy link
Contributor

AidanGG commented Jun 28, 2019

I believe that according to this issue numpy/numpy#5906 it will not, but someone more knowledgeable than me regarding how numpy and scipy work will have to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants