Bug in spearmanr function to calculate correlation (Trac #1852) #2371

Closed
scipy-gitbot opened this issue Apr 25, 2013 · 2 comments
Labels: invalid (Can't be reproduced, or is not actionable), Migrated from Trac, scipy.stats

Comments

@scipy-gitbot

Original ticket http://projects.scipy.org/scipy/ticket/1852 on 2013-02-26 by trac user damani, assigned to unknown.

I'm encountering a problem with the scipy v.11 library in Python 2.7: it gives spearmanr([1,2,3,4,5],[5,6,7,8,7]) = 0.8207, while scipy v.6 in Python 2.5 gives spearmanr([1,2,3,4,5],[5,6,7,8,7]) = 0.825 (which is correct according to the Spearman correlation formula).

The Spearman correlation for [1,2,3,4,5],[5,6,7,8,7], calculated online with the formula available at https://statistics.laerd.com/calculators/spearmans-rank-order-correlation-calculator-1.php, also gives 0.825.
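To make the discrepancy concrete, here is a minimal pure-Python sketch (no SciPy needed) that reproduces both numbers. It assumes, as the later comments in this thread explain, that the newer spearmanr value is the Pearson correlation of the ranks, with tied values given the average of the ranks they span (the default `method='average'` of `scipy.stats.rankdata`). The helper names `average_ranks`, `pearson`, `spearman_pearson_of_ranks` and `spearman_shortcut` are hypothetical, written only for this illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)


def spearman_pearson_of_ranks(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """Textbook shortcut 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]
print(round(spearman_pearson_of_ranks(x, y), 4))  # 0.8208
print(round(spearman_shortcut(x, y), 4))          # 0.825
```

The two 7s both receive rank 3.5, so the rank vectors are [1, 2, 3, 4, 5] and [1, 2, 3.5, 5, 3.5]. The Pearson route gives 8/sqrt(95) ≈ 0.8208, while the shortcut gives 1 - 6·3.5/120 = 0.825.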

The definition of the spearmanr function in SciPy v.11 is given at http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html#scipy.stats.spearmanr.

Maybe related to Ticket gh-1936.

@scipy-gitbot

@josef-pkt wrote on 2013-02-26

If I remember correctly, the change comes from fixing the tie handling. (I don't have R available right now to verify).

The online calculator prints this at the top of its output:

Please find below the calculation of Spearman's coefficient along with all the working!

Warning: We have detected that you have ties in your data. At undergraduate level, you may be required to know that the following formula should not be used. However, for school work it is probably acceptable - consult your own work/syllabus/teacher to be sure. 

Replacing one of the 7s with something slightly different (so that there is no tie), I get the same result as the calculator.
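The check described above can be sketched in plain Python. The replacement value 7.001 is a hypothetical choice of "something slightly different"; once the tie is gone, the rank-difference shortcut and the Pearson correlation of the ranks agree exactly. The helper `simple_ranks` is also hypothetical and, by design, does no tie handling.

```python
from math import sqrt


def simple_ranks(values):
    """Ranks 1..n; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7.001]  # hypothetical: the second 7 nudged to break the tie

rx, ry = simple_ranks(x), simple_ranks(y)
n = len(x)

# Textbook shortcut based on the rank differences d_i.
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
shortcut = 1 - 6 * d2 / (n * (n * n - 1))

# Pearson correlation of the ranks; both rank vectors are permutations of 1..n.
mean = (n + 1) / 2
cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
vx = sum((a - mean) ** 2 for a in rx)
vy = sum((b - mean) ** 2 for b in ry)
pearson_of_ranks = cov / sqrt(vx * vy)

print(shortcut, pearson_of_ranks)  # 0.9 0.9
```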

@scipy-gitbot

@WarrenWeckesser wrote on 2013-02-26

This is not a bug. As josef pointed out, in the example spearmanr([1,2,3,4,5], [5,6,7,8,7]) there is a tie in the second set of numbers (7 occurs twice). The value 0.825 comes from using the formula that assumes there are no ties.

See the Wikipedia article http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient; in particular, note that it gives two formulas. The second, expressed in terms of d_i = x_i - y_i (the difference between the ranks of the i-th pair), is only valid when there are no ties.
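As background (not part of the original thread), the d_i formula can be repaired for ties by replacing n(n² - 1)/12 with tie-adjusted sums of squares, S = (n³ - n)/12 - Σ(t³ - t)/12 over each group of t tied values, and using rho = (Sx + Sy - Σd²) / (2·sqrt(Sx·Sy)). Because average ranks on both sides share the same mean, this is algebraically identical to the Pearson correlation of the ranks, and with no ties it reduces to 1 - 6Σd²/(n(n² - 1)). A minimal sketch follows; the function names are hypothetical, and the ranks are written out by hand to keep it short.

```python
from collections import Counter
from math import sqrt


def tie_adjusted_ss(values):
    """Sum of squared deviations of average ranks: (n^3 - n)/12 minus (t^3 - t)/12 per tie group."""
    n = len(values)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    return (n ** 3 - n - ties) / 12


def spearman_tie_corrected(x, y, rank_x, rank_y):
    """rho = (Sx + Sy - sum d^2) / (2 * sqrt(Sx * Sy)), with d_i = rank_x[i] - rank_y[i]."""
    sx, sy = tie_adjusted_ss(x), tie_adjusted_ss(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return (sx + sy - d2) / (2 * sqrt(sx * sy))


x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]
# Average ranks, written out by hand: the two 7s share rank (3 + 4) / 2 = 3.5.
rank_x = [1, 2, 3, 4, 5]
rank_y = [1, 2, 3.5, 5, 3.5]

print(round(spearman_tie_corrected(x, y, rank_x, rank_y), 4))  # 0.8208
```

Here Sx = 10, Sy = 9.5 and Σd² = 3.5, giving 16 / (2·sqrt(95)) ≈ 0.8208, the value the newer spearmanr reports.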
