standard error calculation inconsistent between 'stats' and 'mstats' #3086

Closed
bulli92 opened this issue Nov 20, 2013 · 4 comments
Labels: defect, scipy.stats

@bulli92
Contributor

bulli92 commented Nov 20, 2013

The sem() routine is inconsistent between stats and mstats in two ways:

  1. The arguments differ:
    sem(a, axis=0, ddof=1) in stats
    sem(a, axis=0) in mstats
  2. The normalization differs:
    / sqrt(n-1) in mstats
    / sqrt(n) in stats
@rgommers
Member

The mstats version can get a ddof=1 argument added. The right denominator is sqrt(n), so that's a bug in the mstats version.

@josef-pkt
Member

I think these two cancel, so they should produce the same result:
the combined factor 1/n * 1/(n-1) can be split between the std and the final division in two ways.
(This is based on the description above, not on the source.)
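
To spell out the cancellation, here is a short worked derivation. The symbol $S$ (the sum of squared deviations from the mean) is notation introduced here for illustration, and the two forms are taken from the description in this issue rather than from the source:

```latex
\begin{align*}
S &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{stats (ddof=1):}\quad
  \frac{\sqrt{S/(n-1)}}{\sqrt{n}} &= \sqrt{\frac{S}{n\,(n-1)}} \\
\text{mstats:}\quad
  \frac{\sqrt{S/n}}{\sqrt{n-1}} &= \sqrt{\frac{S}{n\,(n-1)}}
\end{align*}
```

Both expressions reduce to the same value, so under these assumptions the two routines would differ only in how the factor is split, not in the result.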

@rgommers
Member

Ah, because of different ddof. You're right. So not a bug right now, but should still be made consistent.

@shaunagm
Contributor

shaunagm commented Jan 1, 2014

I'd like to resolve this issue. When you say it should be made consistent, should stats be changed to match mstats, or mstats be changed to match stats?

My understanding is that in stats, ddof is passed into the np.std calculation, so you get:

sqrt(sum-of-squared-deviations/(n-1))/sqrt(n)

Whereas in mstats, ddof is not passed into the np.std calculation, so you get:

sqrt(sum-of-squared-deviations/n)/sqrt(n-1)

My understanding is that the std calculation is where the degrees-of-freedom correction is more commonly applied. The stats version also lets the user override the default ddof of 1, whereas mstats hardcodes it. So I will go ahead and change mstats to match stats, but let me know if I should do otherwise!
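
As a rough check of the two forms above, and a sketch of what a ddof-aware masked version might look like, something along these lines could be used. The function names `sem_stats_style`, `sem_mstats_style`, and `masked_sem` are hypothetical helpers written for this comment; they are not the actual scipy source, only reconstructions of the formulas described in this thread:

```python
import numpy as np


def sem_stats_style(a, ddof=1):
    # Form described for stats: ddof goes into the std, then divide by sqrt(n).
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    return np.std(a, axis=0, ddof=ddof) / np.sqrt(n)


def sem_mstats_style(a):
    # Form described for mstats: plain std (ddof=0), then divide by sqrt(n-1).
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    return np.std(a, axis=0, ddof=0) / np.sqrt(n - 1)


def masked_sem(a, axis=0, ddof=1):
    # Hypothetical masked variant using the stats-style signature:
    # n counts only the unmasked entries along the axis.
    a = np.ma.asarray(a)
    n = a.count(axis=axis)
    return a.std(axis=axis, ddof=ddof) / np.sqrt(n)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sem_stats_style(data), sem_mstats_style(data))

# Masking an invalid entry should leave the result unchanged.
masked = np.ma.masked_invalid(data + [np.nan])
print(masked_sem(masked))
```

With the default ddof=1 the first two functions should agree, which matches the observation earlier in the thread that the current behaviour is consistent even though the signatures are not.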
