Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak corrcoef #7414

Merged
merged 3 commits into from
Mar 14, 2016
Merged

Tweak corrcoef #7414

merged 3 commits into from
Mar 14, 2016

Conversation

charris
Copy link
Member

@charris charris commented Mar 13, 2016

Clip real and imag parts of result of corrcoef to the interval [-1, 1]. This doesn't really fix the issue with complex Hermitean results, nor is the diagonal guaranteed to be one, but it does take care of issue #7392. I put it here pending a decision of the proper course to take.

The input arrays are documented to have ndim <=2, so check for that
and raise a ValueError on failure.
c[i,:] /= (d * d[i])
stddev = sqrt(d.real)
c /= stddev[:, None]
c /= stddev[None, :]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we checked that this is faster? (I would certainly hope that it is, but you never know :-).) Also, having read up a bit more I withdraw my worry about numerical stability, in case you want to do the multiply-by-inverse thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about the same. Using multiplication makes it 2% faster for the test problem in the original fix. I think most of the time is always going to be in cov. However, I prefer this version for its clarity, OTOH, you get two zero division warnings when one of the diagonals is zero (which can happen).

@njsmith
Copy link
Member

njsmith commented Mar 13, 2016

Seems like a nice little improvement to me.

@@ -2517,6 +2523,12 @@ def corrcoef(x, y=None, rowvar=1, bias=np._NoValue, ddof=np._NoValue):

Notes
-----
Due to floating point rounding the resulting array may not be Hermitean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be "Hermit**_i**_an"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Thanks.

The non-nan elements of the result of corrcoef should satisfy the
inequality abs(x) <= 1 and the non-nan elements of the diagonal should
be exactly one. We can't guarantee those results due to roundoff, but
clipping the real and imaginary parts to the interval [-1, 1] improves
things to a small degree.

Closes numpy#7392.
This doesn't actually test much, as we don't have any inputs where that
was not already the case. But at least it is there and perhaps a fuzz
test can be added at a later date.
@charris
Copy link
Member Author

charris commented Mar 14, 2016

I'm going to merge this if there are no complaints.

@njsmith
Copy link
Member

njsmith commented Mar 14, 2016

👍

charris added a commit that referenced this pull request Mar 14, 2016
@charris charris merged commit 03e772a into numpy:master Mar 14, 2016
@charris charris deleted the tweak-corrcoef branch March 14, 2016 22:18
@charris charris removed this from the 1.11.0 release milestone Mar 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants