Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a more helpful error message for invalid correlation methods in DataFrame.corr #22298

Merged
merged 8 commits into from
Aug 18, 2018

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Aug 12, 2018

DataFrame.corr currently returns a KeyError for invalid correlation methods. The proposed change would instead return a ValueError with an error message reminding the user of the valid correlation methods.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(10, 3)))
df.corr(method="turkey")

@pep8speaks
Copy link

pep8speaks commented Aug 12, 2018

Hello @dsaxton! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 14, 2018 at 15:49 Hours UTC

@codecov
Copy link

codecov bot commented Aug 13, 2018

Codecov Report

Merging #22298 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22298      +/-   ##
==========================================
- Coverage   92.08%   92.05%   -0.03%     
==========================================
  Files         169      169              
  Lines       50706    50713       +7     
==========================================
- Hits        46691    46683       -8     
- Misses       4015     4030      +15
Flag Coverage Δ
#multiple 90.46% <100%> (-0.03%) ⬇️
#single 42.25% <0%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.25% <100%> (+0.01%) ⬆️
pandas/core/series.py 93.73% <100%> (+0.01%) ⬆️
pandas/core/internals/blocks.py 93.83% <0%> (-0.81%) ⬇️
pandas/core/dtypes/missing.py 92.98% <0%> (-0.59%) ⬇️
pandas/util/testing.py 85.85% <0%> (-0.21%) ⬇️
pandas/core/generic.py 96.44% <0%> (-0.05%) ⬇️
pandas/plotting/_core.py 83.48% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 95.41% <0%> (+0.07%) ⬆️
pandas/core/ops.py 96.71% <0%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 601d71f...5ba0bc8. Read the comment docs.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Aug 13, 2018
@TomAugspurger
Copy link
Contributor

Thanks. Could you add a release note in 0.24.0, and a test to ensure that code is hit for bad inputs?

@gfyoung gfyoung added the Error Reporting Incorrect or improved errors from pandas label Aug 13, 2018
@dsaxton
Copy link
Member Author

dsaxton commented Aug 13, 2018

I'm working on some additional modifications to the DataFrame correlation methods to address some potential bugs I found and close another existing issue, is it okay if I close this pull request and reopen another with all the changes?

@jreback
Copy link
Contributor

jreback commented Aug 13, 2018

no pls don’t

put independent changes in independent PRs

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test / fix this on Series as well.

@@ -6682,6 +6682,9 @@ def corr(self, method='pearson', min_periods=1):
c = corrf(ac, bc)
correl[i, j] = c
correl[j, i] = c
else:
raise ValueError("method must be either 'pearson', "
"'spearman' or 'kendall'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add ....{method} was supplied'.format(method=method)

@@ -130,6 +130,11 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)

def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check this with an assert_raises_regex (e.g. match the error)

add the gh issue number as a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Is the convention here to use the most specific regex that matches the error message, or just one that is reasonably specific?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable is fine

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. ping on green.

if method in ['pearson', 'spearman', 'kendall']:
return nanops.nancorr(this.values, other.values, method=method,
min_periods=min_periods)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the else here

@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)

def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))
pttrn = ("method must be either 'pearson', 'spearman', "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pttrn -> msg

@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)

def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the gh issue number as a comment

@@ -778,6 +778,14 @@ def test_corr_rank(self):
tm.assert_almost_equal(A.corr(B, method='kendall'), kexp)
tm.assert_almost_equal(A.corr(B, method='spearman'), sexp)

def test_corr_invalid_method(self):
s1 = pd.Series(np.random.randn(10))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

def test_corr_invalid_method(self):
s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(10))
pttrn = ("method must be either 'pearson', 'spearman', "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pttrn -> msg

@@ -468,6 +468,7 @@ Other API Changes
- :meth:`PeriodIndex.tz_convert` and :meth:`PeriodIndex.tz_localize` have been removed (:issue:`21781`)
- :class:`Index` subtraction will attempt to operate element-wise instead of raising ``TypeError`` (:issue:`19369`)
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
- :meth:`DataFrame.corr` now raises a ``ValueError`` instead of a ``KeyError`` when supplied with an invalid method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the PR number as we don't have an issue number

@dsaxton
Copy link
Member Author

dsaxton commented Aug 15, 2018

@jreback Looks like the tests passed. Let me know if there are any other changes that should be made, if not, thanks for all your help.

@TomAugspurger TomAugspurger merged commit 92dcf5f into pandas-dev:master Aug 18, 2018
@TomAugspurger
Copy link
Contributor

Thanks @dsaxton!

@dsaxton
Copy link
Member Author

dsaxton commented Aug 18, 2018

@TomAugspurger Happy to help!

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants