Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError in scipy.linalg.eigvalsh for large matrix #8205

Closed
mgbukov opened this issue Dec 7, 2017 · 20 comments · Fixed by #11304
Closed

ValueError in scipy.linalg.eigvalsh for large matrix #8205

mgbukov opened this issue Dec 7, 2017 · 20 comments · Fixed by #11304
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg
Milestone

Comments

@mgbukov
Copy link

mgbukov commented Dec 7, 2017

scipy.linalg.eigvalsh() throws ValueError() for large matrices. The bug appears for arbitrary matrices.

However, it also shows up in trivial examples, such as a large identity matrix, and large diagonal matrices with random coefficients in [0,1].

Reproducing code example:

import numpy as np 
import scipy.linalg as la

H=np.eye(6470)
#np.random.seed(0)
#H=np.diag(np.random.uniform(size=6470))
E=la.eigvalsh(H)

Error message:

Traceback (most recent call last):
  File "example0.py", line 7, in <module>
    E=la.eigvalsh(H)
  File "/Users/mbukov/anaconda3/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 734, in eigvalsh
    check_finite=check_finite)
  File "/Users/mbukov/anaconda3/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 384, in eigh
    iu=a1.shape[0], overwrite_a=overwrite_a)
ValueError: On entry to DSBRDB parameter number 12 had an illegal value

Scipy/Numpy/Python version information:

0.19.1 1.13.0 sys.version_info(major=3, minor=6, micro=2, releaselevel='final', serial=0)
@ilayn
Copy link
Member

ilayn commented Dec 7, 2017

Hi @mgbukov this is a duplicate of #6666. I've tested again with SciPy 1.0 and 1.1 and I can't still reproduce it. It might be a conda/lapack/mkl combo bug. I'm on OpenBLAS built with :

>>> la.lapack.ilaver()
(3, 7, 0)

@mgbukov
Copy link
Author

mgbukov commented Dec 8, 2017

Hi @ilayn, could be that this is a lapack issue. I my case, I also have

>>> la.lapack.ilaver()
(3, 7, 0)

@ilayn
Copy link
Member

ilayn commented Dec 8, 2017

Judging by the google results statistics, this looks like a special bug for MKL but not LAPACK.

@ilayn ilayn added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg labels Dec 8, 2017
@chris-n-self
Copy link

chris-n-self commented Dec 20, 2017

I'm having this problem too, I recently changed from homebrew python (on mac OS X) to anaconda and that's when it started happening (https://stackoverflow.com/questions/47836266/error-when-diagonalising-large-matrices-using-anaconda-scipy). So perhaps it is an anaconda bug and should be reported to them.

@ilayn
Copy link
Member

ilayn commented Dec 20, 2017

@chris-n-self I've checked it again and the routines reported here don't seem to be Reference LAPACK routines. But instead they pop up with MKL searches (such as DSBRDB, ZHBRDB). From the naming they sound like banded Symmetric/Hermitian matrix reduction to (again?) banded matrix routines.

That tells me that this has to be reported to Intel MKL which is a particular implementation, let alone the fact that we can't even fix anything about LAPACK bugs. This is beyond our reach. And I'm almost certain beyond Anaconda too.

If you can test a nonMKL SciPy inside anaconda that would settle the issue.

@brad-alt
Copy link

brad-alt commented Dec 22, 2017

@ilayn I'm able to reproduce this error in conda python2.7 using MKL -
For me the call to scipy.linalg.eigvalsh resolves to MKL function dsyevr. I wrote a c program to call MKL's fortran function dsyevr that successfully computed the eigenvals of a 6470+ matrix, and I tried some much larger matrices without issue as well. There is still probably an issue with MKL since this all works when scipy is using OpenBLAS. However, perhaps the MKL issue is still specific to how scipy calls dsyevr (it's got these work size parameters that should be optimized), and there can be a workaround.. I guess I'll look into the cython code to understand that better, which may reveal the MKL issue...

summary of test
My code is basically Intel's example for dsyevr, but I increased array size and had to be dynamically allocated to prevent segmentation fault (I can post code if interested): https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/dsyevr_ex.c.htm

Here is the make and test run:
brad@brad-G11CD:/projects/iss8205/dsyevr_mkl$ make
cc -m64 -I/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/include test.c -Wl,--start-group /opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_sequential.a /opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -o test
brad@brad-G11CD:
/projects/iss8205/dsyevr_mkl$ ./test
DSYEVR Example Program Results
The total number of eigenvalues found:6470

@ilayn
Copy link
Member

ilayn commented Dec 22, 2017

@brd4790 Thank you for the analysis. Indeed I've seen that we don't query liwork, lwork as we should have done but going with minimal required needed.

Did you call DSYEVR thrice with lwork, liwork supplying -1? If yes could you please also test in your C program if lowering liwork to 10*n and lwork to 26*n produce problems ?

@brad-alt
Copy link

@ilayn thanks for the suggestion! That does indeed reproduce the error and returns all zero eigenvalues. Same as noted in #6666 , this works for n = 6143 and fails for n > 6143. I can report this to Intel. Otherwise, what do you suggest? Is it worth optimizing the work parameters to work around this issue? Thanks!

Error message was "Intel MKL ERROR: Parameter 12 was incorrect on entry to DSBRDB."

@ilayn
Copy link
Member

ilayn commented Dec 22, 2017

@brd4790 Ouch. No, that means the blame is probably on us in the f2py wrapper 😃 I'll try to compile a PR to fix this as soon as I can confirm it locally.

In the meantime which lwork and liwork values did you use in your previous C program if I may ask? I mean the one that didn't give any errors.

Thank you for dissecting this further for us by the way.

EDIT: By the way, this still shows that MKL has an internal bug probably assuming the optimal lwork size but nevertheless we should have used lwork and liwork queries anyhow.

@brad-alt
Copy link

@ilayn right, that's what I was thinking is that MKL has a bug since dysevr is not living up to it's documented contract. I'll still report that to Intel for what it's worth.

To calculate the optimal values I called dysevr with lwork & liwork set to -1... I have no idea what the optimization is doing. For n=6144 , the optimization returns lwork = 768000 and liwork = 61440. So liwork is at the minimum, but not lwork.

happy to help, thank you!

@ilayn
Copy link
Member

ilayn commented Dec 23, 2017

@brd4790 It seems that the wrapper did not expose every possibility and while modifying it I've found out that proper fix will break the <s/d>evr signatures. I've already fixed the wrapper but it needs a dev-team decision. So it will take some more discussion to fix this apparently. Take a look at the linked PR above.

@brad-alt
Copy link

@ilayn thanks, that's cool.. I've never seen pyf signature file before. Just out of curiosity, what's the issue with marking the work sizes as optional and leaving the default values at the minimums?

@brad-alt
Copy link

FYI: Intel responded to me and referenced a similar issue in 2018 "MKLD-3350"
link: intel MKL topic 753935

@ilayn
Copy link
Member

ilayn commented Dec 24, 2017

Interesting, yesterday I got the same problem with DORMQR having an illegal parameter when I exposed all the inputs/outputs. There might be a bit more to this story then.

@brad-alt
Copy link

brad-alt commented Oct 5, 2018

@ilayn FYI Intel has fixed the MKL bug with minimum work sizes in release 19.0
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/753935

@hzlucq
Copy link

hzlucq commented Mar 4, 2019

@ilayn , I have IMKL/2019.2.187, numpy-1.16.2, scipy-1.2.1 and python/3.7.0, but I still have the same error with hermitian matrix , but it is correct with symm matrix as shown with the simple example code:
#################
(ENV_mkl_scipy) [luhuizho@build-node ~]$ cat simple.py
import numpy as np
from scipy.linalg import eigvalsh
print('*'10+ ' symm matrix ' + ''10)
eigvalsh(np.eye(6144))
print('
'10+ ' h matrix ' + ''*10)
eigvalsh(np.eye(6144)*1j)
##############
********** symm matrix **********
********** h matrix **********

Intel MKL ERROR: Parameter 12 was incorrect on entry to ZHBRDB.

We don't have trouble with numpy/scipy+openblas.

Could you help ?

thanks,

HuiZhong

@marvinlenk
Copy link

The problem still exists for me, using Anaconda 2019.10-py37 with mkl 2019.4 and scipy 1.3.1. A rather comprehensive description of my problem can be found here: https://stackoverflow.com/questions/54314529/mkl-error-parameter-12-for-large-matrices-with-scipy-linalg-eigvalsh-in-an

To make a long story short - I used the scipy lapack wrappers to do what is written in the eigvalsh (or rather eigh) routine and it reproduces the same error as mentioned right at the beginning, spitting out

Intel MKL ERROR: Parameter 12 was incorrect on entry to ZHBRDB

Then I checked the zheev function in Fortran - so the same function that eigh uses - linked against the same mkl as scipy and it worked. So I guesss the problem is hidden in the wrapper, probably some problems with calculating the work in combination with using MKL, since non-MKL versions work perfectly fine. Deeper analysis is atm beyond my skills.

@ev-br
Copy link
Member

ev-br commented Nov 7, 2019

I used the scipy lapack wrappers to do what is written in the eigvalsh (or rather eigh) routine and it reproduces the same error

and

Then I checked the zheev function in Fortran - so the same function that eigh uses

It would be helpful if you could add here both programs/scripts.

@ilayn
Copy link
Member

ilayn commented Jan 7, 2020

@marvinlenk #11304 I've renewed the wrappers for evr family. Unfortunately, the signature has changed as a result of this unification. Would be great if you can give any feedback you might have. I've already tested your SO example and all works, seemingly, OK .

@ilayn ilayn added this to the 1.5.0 milestone Feb 11, 2020
@marvinlenk
Copy link

@marvinlenk #11304 I've renewed the wrappers for evr family. Unfortunately, the signature has changed as a result of this unification. Would be great if you can give any feedback you might have. I've already tested your SO example and all works, seemingly, OK .

@ilayn Sorry for the late reply - it seems to be working like a charm now, thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants