Pinv enhancements #289

Closed
wants to merge 7 commits

4 participants

@jakevdp
SciPy member

This is a PR based on some of @vene's work in scikit-learn: see scikit-learn/scikit-learn#1015

There are two enhancements:

  • scipy.linalg.pinv2 is sped up by computing only the necessary singular values and not allocating the entire psigma matrix (see the sketch after this list)

  • scipy.linalg.pinvh added: this uses eigh to significantly speed up the pseudo-inverse computation in the symmetric/Hermitian case
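
As an illustration of the first point, here is a minimal sketch of the approach (my own reconstruction, not the exact PR code; pinv2_sketch is a hypothetical name, and the 1e3/1e6 factors replicate the old default cutoff): an economy-size SVD plus in-place column scaling replaces the old (m, n) psigma matrix and its two dense products.

import numpy as np
from scipy.linalg import svd

def pinv2_sketch(a, cond=None):
    # Economy-size SVD: u is (m, k), s is (k,), vh is (k, n), k = min(m, n)
    u, s, vh = svd(a, full_matrices=False)
    if cond is None:
        t = u.dtype.char.lower()
        cond = {'f': 1e3, 'd': 1e6}[t] * np.finfo(t).eps
    # Keep only the singular triplets above the relative cutoff
    rank = np.sum(s > cond * np.max(s))
    u = u[:, :rank] / s[:rank]  # scale columns of u; no (m, n) psigma needed
    return np.conjugate(np.dot(u, vh[:rank])).T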

Here are some simple benchmarks:

old pinv2 vs new pinv2:

old version:

In [1]: import numpy as np
In [2]: from scipy.linalg import pinv2
In [3]: np.random.seed(0)
In [4]: X = np.random.random((500, 100))
In [5]: %timeit pinv2(X)
10 loops, best of 3: 172 ms per loop

new version:

...
In [5]: %timeit pinv2(X)
10 loops, best of 3: 38.1 ms per loop

pinvh vs new pinv2:

In [1]: import numpy as np
In [2]: from scipy.linalg import pinv2, pinvh
In [3]: np.random.seed(0)
In [4]: X = np.random.random((500, 400))
In [5]: X = np.dot(X, X.T) # make symmetric positive semi-definite
In [6]: %timeit pinv2(X)
1 loops, best of 3: 736 ms per loop
In [7]: %timeit pinvh(X)
1 loops, best of 3: 320 ms per loop
@charris charris commented on the diff Aug 16, 2012
scipy/linalg/basic.py
if rcond is not None:
cond = rcond
if cond in [None,-1]:
- eps = np.finfo(np.float).eps
- feps = np.finfo(np.single).eps
- _array_precision = {'f': 0, 'd': 1, 'F': 0, 'D': 1}
- cond = {0: feps*1e3, 1: eps*1e6}[_array_precision[t]]
- m, n = a.shape
- cutoff = cond*np.maximum.reduce(s)
- psigma = np.zeros((m, n), t)
- for i in range(len(s)):
- if s[i] > cutoff:
- psigma[i,i] = 1.0/np.conjugate(s[i])
- #XXX: use lapack/blas routines for dot
- return np.transpose(np.conjugate(np.dot(np.dot(u,psigma),vh)))
+ t = u.dtype.char.lower()
+ factor = {'f': 1E3, 'd': 1E6}
@charris
SciPy member
charris added a note Aug 16, 2012

I wonder if the updated version in np.matrix_rank would be a better choice for the default condition? See the discussion at http://thread.gmane.org/gmane.comp.python.numeric.general/50396/focus=50912

@jakevdp
SciPy member
jakevdp added a note Aug 16, 2012

Interesting discussion: here I just duplicated the previous behavior. Any change would be a potential compatibility issue, so before making that decision we should probably discuss it on the mailing list.
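
For concreteness, the duplicated default cutoff (relative to the largest singular value) works out to roughly 2.2e-10 in double precision and 1.2e-4 in single precision:

import numpy as np
# factor * eps, as in the diff above
print(1e6 * np.finfo('d').eps)  # 2.220446049250313e-10 (float64)
print(1e3 * np.finfo('f').eps)  # ~1.1920929e-04 (float32)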

@jakevdp
SciPy member

I just had another thought: we should add a lower keyword to pinvh which is passed to eigh.
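
For context, lower is an existing eigh keyword that selects which triangle of the (assumed symmetric/Hermitian) input is read, so forwarding it would let pinvh accept matrices with only one triangle filled in. A quick illustration:

import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)
A = rng.randn(4, 4)
A = A + A.T  # make symmetric
# eigh(lower=True) reads only the lower triangle; the upper half is ignored
w_full, _ = eigh(A)
w_tril, _ = eigh(np.tril(A), lower=True)
print(np.allclose(w_full, w_tril))  # True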

@agramfort agramfort and 1 other commented on an outdated diff Aug 17, 2012
scipy/linalg/basic.py
+
+ Returns
+ -------
+ B : array, shape (N, N)
+
+ Raises
+ ------
+ LinAlgError
+ If eigenvalue does not converge
+
+ Examples
+ --------
+ >>> from numpy import *
+ >>> a = random.randn(9, 6)
+ >>> a = np.dot(a, a.T)
+ >>> B = pinv_symmetric(a)

docstring is not up to date, right?

@jakevdp
SciPy member
jakevdp added a note Aug 17, 2012

right - good catch

@jakevdp
SciPy member

@pv - I think this is ready for merge. @charris's comment about matrix rank would be interesting to explore, but I think it's an enhancement that would affect more than just this PR.

@pv pv commented on an outdated diff Aug 25, 2012
scipy/linalg/basic.py
+ True
+ >>> allclose(B, dot(B, dot(a, B)))
+ True
+
+ """
+ a = np.asarray_chkfinite(a)
+ s, u = decomp.eigh(a, lower=lower)
+
+ if rcond is not None:
+ cond = rcond
+ if cond in [None, -1]:
+ t = u.dtype.char.lower()
+ factor = {'f': 1E3, 'd': 1E6}
+ cond = factor[t] * np.finfo(t).eps
+
+ # unlike svd case, eigh can lead to negative eigenvalues
@pv
SciPy member
pv added a note Aug 25, 2012

More illuminating comment: "For hermitian matrices, singular values equal abs(eigenvalues)"
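
Putting that observation into code, a minimal pinvh sketch (my own reconstruction, not the exact PR code; pinvh_sketch is a hypothetical name) thresholds on the absolute value of the eigenvalues, since for a Hermitian matrix the singular values are abs(eigenvalues) and eigh can return negative eigenvalues:

import numpy as np
from scipy.linalg import eigh

def pinvh_sketch(a, cond=None, lower=True):
    a = np.asarray_chkfinite(a)
    s, u = eigh(a, lower=lower)
    if cond is None:
        t = u.dtype.char.lower()
        cond = {'f': 1e3, 'd': 1e6}[t] * np.finfo(t).eps
    # For Hermitian matrices, singular values equal abs(eigenvalues)
    above_cutoff = np.abs(s) > cond * np.max(np.abs(s))
    psigma_diag = np.zeros_like(s)
    psigma_diag[above_cutoff] = 1.0 / s[above_cutoff]
    # pinv(a) = u @ diag(psigma_diag) @ u.conj().T
    return np.dot(u * psigma_diag, np.conjugate(u).T)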

@pv
SciPy member
pv commented Aug 25, 2012

@jakevdp: looks good to me. Changing the default cutoff is maybe out of scope for this PR. It seems we don't have tests for single-precision data types?

@charris
SciPy member

It might be worth bringing over some of Matthew's documentation for np.matrix_rank; I think it is quite good. But I agree that changing the current default choices for the cutoff is independent of this PR.

@josef-pkt
SciPy member

a suggestion: can we add a keyword argument to optionally return the rank that was used, or the eigenvalues or singular values?

Looking briefly through the scikit-learn PR:
"Yes... though I don't think it's possible to do this quickly using the current pinv or pinv2. We should add a flag that optionally returns the number of singular values used in the computation."

In statsmodels we currently have the problem that we are doing two SVDs: one for the pinv and one to get the rank. The only option I have seen so far is to copy pinv directly into statsmodels with the additional return value.
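
A sketch of how the suggested flag could look, built on the pinv2 sketch above (return_rank is the keyword name the merged commit below ends up using; pinv2_with_rank is a hypothetical name). The point is that a single SVD serves both the pseudo-inverse and the rank:

import numpy as np
from scipy.linalg import svd

def pinv2_with_rank(a, cond=None, return_rank=False):
    u, s, vh = svd(a, full_matrices=False)
    if cond is None:
        t = u.dtype.char.lower()
        cond = {'f': 1e3, 'd': 1e6}[t] * np.finfo(t).eps
    rank = np.sum(s > cond * np.max(s))  # effective rank from the same SVD
    u = u[:, :rank] / s[:rank]
    B = np.conjugate(np.dot(u, vh[:rank])).T
    if return_rank:
        return B, rank
    return B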

@jakevdp
SciPy member

@josef-pkt: good suggestion on the flag to return the rank. It's a few lines of code that add a lot of flexibility to the use of the routines.

I think I've addressed all the comments (short of the re-working of singular value cutoff selection). This should be ready to merge.

@pv pv added a commit that referenced this pull request Oct 21, 2012
@pv pv ENH: linalg: Merge PR #289 Pinv enhancements
This is a PR based on some of @vene's work in scikit-learn: see
scikit-learn/scikit-learn#1015

There are three enhancements:

- scipy.linalg.pinv2 is sped up by only computing necessary singular
  values, and not allocating the entire psigma matrix

- scipy.linalg.pinvh added: this uses eigh to significantly speed up the
  pseudo-inverse computation in the symmetric/hermitian case

- add return_rank keyword to obtain the effective rank from all pinv*
  routines.
5adeb12
@pv
SciPy member
pv commented Oct 21, 2012

Thanks, merged.

@pv pv closed this Oct 21, 2012