Unrelated column affects the result of Kruskal test #74

Closed
cbedetti opened this issue Nov 29, 2019 · 1 comment
@cbedetti cbedetti commented Nov 29, 2019

Here is an example:

>>> import numpy
>>> import pingouin
>>>
>>> df = pingouin.read_dataset('anova')
>>> pingouin.kruskal(data=df, dv='Pain threshold', between='Hair color')
             Source  ddof1       H     p-unc
Kruskal  Hair color      3  10.589  0.014172
>>>
>>> unrelated = list(range(len(df)))
>>> unrelated[10] = numpy.nan
>>> df["unrelated"] = unrelated
>>> pingouin.kruskal(data=df, dv='Pain threshold', between='Hair color')
             Source  ddof1      H     p-unc
Kruskal  Hair color      3  9.899  0.019444

The documentation says "NaN values are automatically removed", but I don't think it should remove rows whose only NaN values are in unrelated columns.

I don't know if other tests behave in the same way.
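
The behaviour above is consistent with missing values being dropped across every column of the DataFrame rather than only the columns the test uses. That is an assumption about the cause, not something confirmed from pingouin's source. A minimal sketch of the difference, using plain pandas and a made-up toy DataFrame (the column names `group` and `score` are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the pingouin 'anova' dataset).
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
df["unrelated"] = [0.0, 1.0, np.nan, 3.0, 4.0, 5.0]

# Dropping NaN across every column also discards row 2, although
# 'group' and 'score' are complete in that row.
n_all = len(df.dropna())

# Restricting the NaN check to the columns the test actually uses
# keeps all six rows.
n_subset = len(df.dropna(subset=["group", "score"]))

print(n_all, n_subset)  # 5 6
```

Selecting only the `dv` and `between` columns before removing missing values would leave the test result independent of any other column in the DataFrame.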

@raphaelvallat raphaelvallat self-assigned this Nov 29, 2019
@raphaelvallat raphaelvallat added the bug 💥 label Nov 29, 2019
Owner

@raphaelvallat raphaelvallat commented Nov 29, 2019

You are absolutely right, thanks for pointing that out @cbedetti. I just fixed that in commit 48565ef (develop branch), and this will be included in the next stable release (December 2019).

Thanks again!
