Statistics Review: kruskal (Trac #112) #639

Closed
scipy-gitbot opened this issue Apr 25, 2013 · 3 comments · Fixed by #5126

Comments

@scipy-gitbot

Original ticket http://projects.scipy.org/scipy/ticket/112 on 2006-04-02 by @rkern, assigned to @rkern.

The function kruskal in file source:trunk/Lib/stats/stats.py needs review.

Please look over the StatisticsReview guidelines and add your comments below.

@aeklant
Contributor

aeklant commented Jul 1, 2015

@rgommers ping

To Do:

  • Add examples
  • Perhaps add more references
  • Deal with empty input. It currently returns (nan, nan) or raises ZeroDivisionError. What would be the proper result here?
  • nan input gives inconsistent results: sometimes (nan, nan) and sometimes other results like:
In [31]: stats.kruskal([np.nan], [np.nan])
Out[31]: KruskalResult(statistic=1.0, pvalue=0.31731050786291404)

which looks like a job for the nan_check policy.
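One plausible reading of the output above, offered as a sketch rather than a trace of the actual SciPy code path: if the two nan values were simply assigned the distinct ranks 1 and 2 (an assumption, since nan never compares equal to itself and so would not be treated as a tie), the textbook Kruskal-Wallis formula without tie correction reproduces the reported numbers. The helper names `kruskal_h` and `chi2_sf_df1` are made up for this illustration.

```python
import math


def kruskal_h(rank_groups):
    """Kruskal-Wallis H from already-assigned ranks (no tie correction)."""
    n_total = sum(len(g) for g in rank_groups)
    # Sum over groups of (rank sum)^2 / group size
    sum_sq_over_n = sum(sum(g) ** 2 / len(g) for g in rank_groups)
    return 12.0 / (n_total * (n_total + 1)) * sum_sq_over_n - 3 * (n_total + 1)


def chi2_sf_df1(x):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Assumption: the two nan inputs were ranked 1 and 2, one per group.
h = kruskal_h([[1], [2]])
p = chi2_sf_df1(h)  # two groups -> 1 degree of freedom
print(h, p)
```

Under that assumption H comes out as 1.0 and the p-value as roughly 0.3173, which is consistent with the result shown, and suggests the nan values are being ranked like ordinary numbers instead of being propagated.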

@rgommers
Member

For the docstring also:

Verifying the results against R may be useful (and if it matches, add a test or note about that match to an existing test). http://www.r-tutor.com/elementary-statistics/non-parametric-methods/kruskal-wallis-test

The nan plan sounds good. Empty input gives KruskalResult(statistic=nan, pvalue=nan), which sounds right to me. The ZeroDivisionError needs fixing.
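A minimal sketch of the empty-input behaviour described above, assuming the guard is placed before any division by a group size. This is not the actual fix that was merged; `h_or_nan` is a hypothetical helper that works on pre-assigned ranks and returns only the statistic.

```python
import math


def h_or_nan(rank_groups):
    """Return the Kruskal-Wallis H statistic, or nan if any group is empty."""
    # Guard first: dividing by len(g) == 0 below would raise ZeroDivisionError.
    if not rank_groups or any(len(g) == 0 for g in rank_groups):
        return math.nan
    n_total = sum(len(g) for g in rank_groups)
    sum_sq_over_n = sum(sum(g) ** 2 / len(g) for g in rank_groups)
    return 12.0 / (n_total * (n_total + 1)) * sum_sq_over_n - 3 * (n_total + 1)


empty_case = h_or_nan([[], [1, 2]])   # nan instead of an exception
normal_case = h_or_nan([[1], [2]])    # ordinary computation still works
```

Returning nan for both the statistic and the p-value in the real function would keep the result type the same for empty and non-empty input.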

The code could do with a minor cleanup:

  • some comments (all except the one about Compute sum^2/n) are too obvious and can be removed
  • na is a terrible variable name; it can be changed to num_groups.

aeklant added a commit to aeklant/scipy that referenced this issue Aug 8, 2015
Added empty input handling and tidied up the code.
Documentation was improved and examples added.

closes scipygh-639
@rgommers
Member

rgommers commented Aug 9, 2015

Some discussion on exact calculation of p-values for small sample sizes can be found in gh-87. No need to do anything with that discussion/code, but good to link these PRs together.
