A changing of dtype in scipy.stats._multivariate.multivariate_normal_gen() makes covariance matrix no longer positive semidefinite #9942
Comments
Thanks for reporting this issue. Could you add
Thanks |
Numpy 1.16.0, SciPy 1.1.0. I pickled and then zipped my |
Hm, if I run `abc = pickle.load(open('cov.pkl', 'rb'))`, I get ... Any ideas? |
@chrisb83 are you using numpy <1.16? pickle files aren't really portable like that (and a security risk, I wouldn't open some random ones). |
@ZhaofengWu if you want to provide large arrays as example and can't give the code to reproduce them, then please do so as |
Thanks. I will give a .npy in a few hours. |
Here are the two |
yes, I can reproduce this. Reason:
If you look at
I don't know where this condition comes from |
Also note that the smallest eigenvalue differs depending on dtype, so even if
|
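To make the dtype dependence concrete, here is a minimal sketch of a PSD check whose tolerance scales with the machine epsilon of the input dtype. The helper names `psd_tolerance` and `looks_psd` and the factors `1e3` / `1e6` are assumptions for illustration only, not a quote of the SciPy source. The test matrix is synthetic, with one eigenvalue deliberately set slightly below zero.

```python
import numpy as np


def psd_tolerance(eigenvalues):
    """Hypothetical dtype-dependent cutoff (the factors are assumed, not SciPy's)."""
    factor = {np.dtype(np.float32): 1e3, np.dtype(np.float64): 1e6}[eigenvalues.dtype]
    return factor * np.finfo(eigenvalues.dtype).eps * np.max(np.abs(eigenvalues))


def looks_psd(matrix):
    """Return True if no eigenvalue lies below minus the dtype-dependent tolerance."""
    w = np.linalg.eigvalsh(matrix)  # result has the same precision as the input
    return bool(w.min() >= -psd_tolerance(w))


# Synthetic symmetric matrix with one slightly negative eigenvalue (-1e-5).
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
spectrum = np.array([1.0, 0.5, 0.1, -1e-5])
cov32 = ((q * spectrum) @ q.T).astype(np.float32)
cov32 = (cov32 + cov32.T) / 2          # enforce exact symmetry, still float32
cov64 = cov32.astype(np.float64)       # same values, tighter tolerance

print("float32 passes:", looks_psd(cov32))
print("float64 passes:", looks_psd(cov64))
```

Under these assumed factors, the float32 tolerance is on the order of 1e-4 times the largest eigenvalue, while the float64 tolerance is on the order of 1e-10, so the same numbers can pass in one precision and fail in the other.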
Thanks! I think I understand now where this issue comes from, but I don't have enough of a statistics background to know whether this is the expected behavior. Although from a user's point of view, it feels like changing the |
Unfortunately, I don't have further insight into the criterion used to determine whether the matrix is PSD. In any case, it seems to be a borderline case, since there are eigenvalues slightly below 0 (which could be due to minor numerical errors, of course). Maybe other people know more about this. |
Can this method scipy/scipy/stats/_multivariate.py Line 378 in edfb7ae
|
@ZhaofengWu Potentially, but could you try adding a tiny bit of the identity matrix to your covariance matrix? Sometimes that is done to improve numerical stability without affecting much else. See gh-13231 for a little more information. |
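The suggestion above can be sketched as follows. The helper `add_jitter` and its default `scale` are hypothetical choices for illustration; the right amount of jitter depends on the scale of your data, and the matrix here is synthetic.

```python
import numpy as np


def add_jitter(cov, scale=1e-6):
    """Hypothetical helper: add a small multiple of the identity to `cov`.

    The jitter is `scale` times the mean diagonal entry, so it adapts to
    the overall magnitude of the matrix. Both choices are assumptions.
    """
    cov = np.asarray(cov, dtype=np.float64)
    n = cov.shape[0]
    jitter = scale * np.trace(cov) / n
    return cov + jitter * np.eye(n)


# Synthetic borderline matrix: smallest eigenvalue is about -1e-8.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
cov = (q * np.array([1.0, 0.5, 0.1, -1e-8])) @ q.T
cov = (cov + cov.T) / 2

cov_stable = add_jitter(cov)
print("min eigenvalue before:", np.linalg.eigvalsh(cov).min())
print("min eigenvalue after: ", np.linalg.eigvalsh(cov_stable).min())
```

Adding a multiple of the identity shifts every eigenvalue up by the same amount and leaves the eigenvectors unchanged, which is why it usually has little effect on anything other than the borderline directions.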
@tirthasheshpatel @tupui Do you think that the For instance, this user could perform an eigendecomposition on their matrix and store the decomposition instead of the original matrix. Reducing precision would no longer change whether the matrix is semidefinite or not. |
I think Covariance addresses the problem. Although, when I try to sample from the multivariate normal using the covariance object the OP provided, it gives me the same error:
Any idea @mdhaber why this could be happening or if it is related to the |
Yup, there's a small bug in scipy/scipy/stats/_multivariate.py Lines 858 to 860 in 3c0bc3f
It should pass in |
Closing this. Thanks for your work on the @ZhaofengWu FYI: With the new
import numpy as np
from scipy import stats
with np.load('cov.npy.zip') as data:
    cov = data['cov']
# compute the eigen-decomposition
w, v = np.linalg.eigh(cov)
# create a covariance object using the decomposition
cov_obj = stats.Covariance.from_eigendecomposition((w, v))
# create the distribution using the covariance object instead.
dist = stats.multivariate_normal(np.zeros(cov.shape[0], dtype=cov.dtype), cov_obj)
# now, the methods `pdf`, etc will utilize the decomposition to make the
# computation more efficient and also avoid recomputing the eigen-decomposition
# every time `pdf` or any other method of unfrozen distributions is called.
dist.logpdf(np.zeros(cov.shape[0]))  # 46479.578745868996
I hope this addresses your issue! |
Thanks @tirthasheshpatel! @ZhaofengWu Note also that I'd recommend saving the eigenvalue decomposition of your data if you want full control over whether your covariance is treated as singular or not when you load it up again. |
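A minimal sketch of that recommendation, using only NumPy: decompose once, decide explicitly how to treat tiny negative eigenvalues, and store the decomposition. The file name `cov_eig.npz`, the clipping rule, and the synthetic matrix are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# Synthetic rank-deficient covariance (5x5, rank 3), PSD up to rounding.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))
cov = a @ a.T

# Decompose once in float64 and make the singularity decision explicit:
# here, eigenvalues below zero are treated as exactly zero (an assumption).
w, v = np.linalg.eigh(cov)
w = np.clip(w, 0.0, None)

# Store the decomposition rather than the matrix itself.
path = os.path.join(tempfile.mkdtemp(), "cov_eig.npz")
np.savez(path, w=w, v=v)

# Later: reload and, if needed, rebuild the matrix from its factors.
with np.load(path) as data:
    w_loaded, v_loaded = data["w"], data["v"]

cov_rebuilt = (v_loaded * w_loaded) @ v_loaded.T
print("smallest stored eigenvalue:", w_loaded.min())
```

The reloaded pair `(w_loaded, v_loaded)` can then be passed to `stats.Covariance.from_eigendecomposition`, as in the example earlier in this thread, so no PSD check has to be repeated on a recast matrix.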
The `multivariate_normal_frozen` class calls the `_process_parameters` function of the `multivariate_normal_gen` class.
scipy/scipy/stats/_multivariate.py
Lines 734 to 735 in edfb7ae
For my covariance matrix (`cov.ndim == 2`), this method does nothing but change its dtype from `float32` to `float64`. However, this change of dtype apparently makes a positive semidefinite covariance matrix no longer one. Using PDB to pause below this function call, I get the following output (`cov` is my original covariance matrix, and `self.cov` is the output of `_process_parameters`):