BUG: Inconsistent correlation between constant series (varies with number of rows) #37448

Closed
2 of 3 tasks
anders-kiaer opened this issue Oct 27, 2020 · 3 comments · Fixed by #37453
Labels: Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)


anders-kiaer commented Oct 27, 2020


Code Sample, a copy-pastable example

import pandas as pd

# Every row is identical, so both columns are constant; only the row count changes.
for length in [2, 3, 5, 10, 20]:
    print(pd.DataFrame(length*[[0.42, 0.1]], columns=["A", "B"]).corr())

gives

    A   B
A NaN NaN
B NaN NaN
    A    B
A NaN  NaN
B NaN  1.0
     A   B
A  1.0 NaN
B  NaN NaN
     A    B
A  1.0 -1.0
B -1.0  1.0
     A    B
A  1.0  1.0
B  1.0  1.0

Problem description

The output is inconsistent when the number of rows varies slightly. I would expect the correlation between two series, where at least one of them is constant, to be NaN.

This makes code that relies on, for example, dropna() after calculating corr() difficult to write and error-prone, because the behaviour is inconsistent.

Expected Output

Either consistent NaN output when calculating correlation with constant data, or a warning in the pandas.DataFrame.corr documentation stating that the returned correlation between constant series can be any of 1.0, -1.0, or NaN.
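
On affected versions, one possible workaround is to mask the constant columns explicitly instead of relying on dropna(). The following is only a minimal sketch, not part of pandas: `corr_masking_constants` is a hypothetical helper name, and it assumes that "constant" means a column with at most one distinct non-null value.

```python
import numpy as np
import pandas as pd


def corr_masking_constants(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: df.corr() with constant columns forced to NaN."""
    result = df.corr()
    # A column with at most one distinct non-null value has zero variance,
    # so its correlation with any other column is undefined.
    nunique = df[result.columns].nunique()
    constant = nunique[nunique <= 1].index
    result.loc[constant, :] = np.nan
    result.loc[:, constant] = np.nan
    return result


for length in [2, 3, 5, 10, 20]:
    df = pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"])
    print(corr_masking_constants(df))
```

With this guard the result no longer depends on how the floating-point rounding happens to fall for a given number of rows, at the cost of an extra nunique() pass over the data.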

Member

phofl commented Oct 27, 2020

Hi, thanks for your report.

The inconsistent output is a floating-point precision issue. With integer values, for example,

import pandas as pd

for length in [2, 3, 5, 10, 20]:
    print(pd.DataFrame(length*[[2, 1]], columns=["A", "B"]).corr())

consistently returns NaN.
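
To illustrate the floating-point effect in isolation, here is a small plain-Python sketch. It is not the algorithm pandas uses internally, and `sum_sq_dev` is a hypothetical helper name. It shows that the mean of identical integers is exact, while the mean of identical values such as 0.1 can be off by one rounding step, leaving a tiny nonzero "variance" where zero was expected. A correlation computed from such values is essentially rounding noise divided by rounding noise.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the arithmetic mean (plain Python floats)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Integers: the sum and the mean are exact, so every deviation is exactly zero.
print(sum_sq_dev([2, 2, 2]) == 0)  # True

# 0.1 has no exact binary representation: 0.1 + 0.1 + 0.1 rounds to
# 0.30000000000000004, so the computed mean differs from 0.1.
print((0.1 + 0.1 + 0.1) / 3 == 0.1)  # False

# The deviations are therefore tiny but nonzero.
print(sum_sq_dev([0.1, 0.1, 0.1]) > 0)  # True
```

Whether a given constant value and row count produce an exact or an inexact mean depends on how the rounding errors accumulate, which would be consistent with the result changing as the number of rows changes.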

phofl added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Oct 27, 2020
Author

anders-kiaer commented Oct 28, 2020

Thanks for the quick reply, @phofl!

These issues are perhaps somewhat related mathematically/conceptually: scipy/scipy#3728 and numpy/numpy#9631.

Member

phofl commented Oct 28, 2020

Yes, the underlying problem is the same.
