BUG: Fix bug for kendall corr when in DF num and bool #11560

Merged
jreback merged 1 commit into pandas-dev:master from roman-khomenko:roman-khomenko/fix-kendall-for-num-and-bool on Nov 13, 2015

Contributor

roman-khomenko commented Nov 9, 2015

Hi,

  1. When a DataFrame contains both numeric and boolean columns, the underlying
    NumPy array has dtype object, so np.isfinite(mat) raises an exception.

    I've fixed this by using com._ensure_float64, as is already done for the other correlation methods.

  2. I've skipped half of the computation, because the correlation matrix is symmetric.
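A minimal sketch of the failure described in the first point, using only public NumPy and pandas calls. Note that `astype(np.float64)` is used here as a stand-in for the internal `com._ensure_float64` helper; that substitution is an assumption made for illustration, not the code in this PR.

```python
import numpy as np
import pandas as pd

# Mixed boolean and integer columns, as in the PR's test case.
df = pd.DataFrame({"a": [True, False], "b": [1, 0]})

# Interleaving bool and int columns into one 2-D array yields dtype=object.
mat = df.values

# np.isfinite has no loop for object arrays, so it raises TypeError.
try:
    np.isfinite(mat)
    isfinite_failed = False
except TypeError:
    isfinite_failed = True

# Casting to float64 first (stand-in for com._ensure_float64) makes the
# finiteness mask computable again.
mat_float = mat.astype(np.float64)
mask = np.isfinite(mat_float)
```

After the cast, `mask` is an ordinary boolean array, and the pairwise loop can index with it as before.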

@jreback jreback commented on an outdated diff Nov 9, 2015

pandas/core/frame.py
 corrf = nanops.get_corr_func(method)
 K = len(cols)
 correl = np.empty((K, K), dtype=float)
 mask = np.isfinite(mat)
 for i, ac in enumerate(mat):
     for j, bc in enumerate(mat):
-        valid = mask[i] & mask[j]
-        if valid.sum() < min_periods:
-            c = NA
-        elif not valid.all():
-            c = corrf(ac[valid], bc[valid])
+        if i > j:
+            continue
+        elif i == j:
+            correl[i, i] = 1.

jreback Nov 9, 2015

Contributor

This is not correct: if all of the values are NaN, then the result is NaN, so you can either check for that case or let it fall through. Please test for this (if it's not done already).
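One way to combine the symmetry shortcut with the NaN concern raised here is to run the `min_periods` check before the diagonal shortcut, so an all-NaN column yields NaN rather than 1. The sketch below is illustrative only: `pairwise_corr` and `pearson` are hypothetical names, and the function is not the code that was merged.

```python
import numpy as np

def pearson(a, b):
    # Plain Pearson correlation of two 1-D float arrays.
    return np.corrcoef(a, b)[0, 1]

def pairwise_corr(mat, corrf, min_periods=1):
    # mat holds one variable per row; cast so np.isfinite is well defined.
    mat = np.asarray(mat, dtype=np.float64)
    K = mat.shape[0]
    correl = np.empty((K, K), dtype=float)
    mask = np.isfinite(mat)
    for i in range(K):
        # Only visit j >= i; the lower triangle is filled by mirroring.
        for j in range(i, K):
            valid = mask[i] & mask[j]
            if valid.sum() < min_periods:
                # Too few shared observations, including on the diagonal.
                c = np.nan
            elif i == j:
                c = 1.0
            else:
                c = corrf(mat[i][valid], mat[j][valid])
            correl[i, j] = c
            correl[j, i] = c
    return correl

rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [np.nan, np.nan, np.nan, np.nan],  # all-NaN variable
]
result = pairwise_corr(rows, pearson)
```

With this ordering, the third row and column of `result` are NaN throughout, including the diagonal entry, while the first two variables correlate perfectly.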

@jreback jreback commented on an outdated diff Nov 9, 2015

pandas/tests/test_frame.py
@@ -8028,6 +8028,14 @@ def test_corr_int(self):
         df3.cov()
         df3.corr()
+    def test_corr_int_and_boolean(self):
+        # when dtypes of pandas series are different
+        # then ndarray will have dtype=object,
+        # so it need to be properly handled
+        df = DataFrame({"a": [True, False], "b": [1, 0]})
+        for meth in ['pearson', 'kendall', 'spearman']:
+            df.corr(meth)

jreback Nov 9, 2015

Contributor

you need to test if you are getting the correct values.

jreback added the Numeric label Nov 9, 2015

Contributor

roman-khomenko commented Nov 9, 2015

@jreback Jeff,
I've fixed the NaN handling and added a test for it.

@jreback jreback commented on an outdated diff Nov 10, 2015

pandas/tests/test_frame.py
@@ -8028,6 +8030,19 @@ def test_corr_int(self):
         df3.cov()
         df3.corr()
+    def test_corr_int_and_boolean(self):
+        tm._skip_if_no_scipy()
+
+        # when dtypes of pandas series are different
+        # then ndarray will have dtype=object,
+        # so it need to be properly handled
+        df = DataFrame({"a": [True, False], "b": [1, 0]})
+
+        expected = np.ones((2, 2), dtype=np.float64)

jreback Nov 10, 2015

Contributor

construct an actual DataFrame here and use assert_frame_equal for comparison
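The suggestion above could look roughly like the following. This is a sketch rather than the test that was merged: it uses the public `pandas.testing.assert_frame_equal` import path (an assumption about the installed pandas version) and checks only `'pearson'`, since other methods may require SciPy.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Boolean and integer columns that carry the same information,
# so every pairwise correlation should be exactly 1.
df = pd.DataFrame({"a": [True, False], "b": [1, 0]})

# Build the expected result as a labelled DataFrame, not a bare ndarray,
# so the comparison also checks the index and column labels.
expected = pd.DataFrame(
    np.ones((2, 2), dtype=np.float64),
    index=["a", "b"],
    columns=["a", "b"],
)

result = df.corr(method="pearson")
assert_frame_equal(result, expected)
```

Comparing against a full DataFrame catches label or dtype regressions that a values-only check would miss.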

Contributor

jreback commented Nov 10, 2015

couple of comments. pls add a whatsnew (put in bug fixes), use this PR number as the issue number. squash, then ping when green.

jreback added this to the 0.17.1 milestone Nov 10, 2015

Contributor

roman-khomenko commented Nov 10, 2015

@jreback Done

@jreback jreback added a commit that referenced this pull request Nov 13, 2015

@jreback jreback Merge pull request #11560 from roman-khomenko/roman-khomenko/fix-kend…
…all-for-num-and-bool

BUG: Fix bug for kendall corr when in DF num and bool
49cd89b

@jreback jreback merged commit 49cd89b into pandas-dev:master Nov 13, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Contributor

jreback commented Nov 13, 2015

thanks!

roman-khomenko deleted the roman-khomenko:roman-khomenko/fix-kendall-for-num-and-bool branch Nov 13, 2015

Contributor

roman-khomenko commented Nov 13, 2015

@jreback Thank you for pandas!
