standard error calculation inconsistent between 'stats' and 'mstats' #3086

Closed
bulli92 opened this Issue Nov 20, 2013 · 4 comments

Contributor

bulli92 commented Nov 20, 2013

The routine sem() in stats and mstats is inconsistent in two ways:

  1. The arguments are different:
    sem(a, axis=0, ddof=1) for stats
    sem(a, axis=0) for mstats
  2. The normalization is different:
    / sqrt(n-1) in mstats
    / sqrt(n) in stats
Owner

rgommers commented Nov 24, 2013

The mstats version can get ddof=1 added. The right denominator is sqrt(n), so that's a bug in the mstats version.

Member

josef-pkt commented Nov 24, 2013

I think these two cancel, so they should produce the same result:
the factor 1/n * 1/(n-1) can be calculated in two ways
(based on the above description, not the source).
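
Spelled out, as a sketch: writing S for the sum of squared deviations from the mean (a symbol introduced here for illustration, not taken from the source), and assuming stats uses ddof=1 inside the std while mstats uses ddof=0, the two normalizations give the same value:

```latex
\[
\underbrace{\frac{1}{\sqrt{n}}\sqrt{\frac{S}{n-1}}}_{\text{stats, ddof}=1}
= \sqrt{\frac{S}{n\,(n-1)}}
= \underbrace{\frac{1}{\sqrt{n-1}}\sqrt{\frac{S}{n}}}_{\text{mstats}},
\qquad S = \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\]
```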

Owner

rgommers commented Nov 24, 2013

Ah, because of different ddof. You're right. So not a bug right now, but should still be made consistent.

Contributor

shaunagm commented Jan 1, 2014

I'd like to resolve this issue. When you say it should be made consistent, should stats be changed to match mstats, or mstats be changed to match stats?

My understanding is that, with stats, the degree of freedom is being passed into the np.std calculation, so you get:

sqrt(sum-of-squared-deviation/(n-1))/sqrt(n)

Whereas with mstats, the degree of freedom is not being passed into the np.std calculation, so you get:

sqrt(sum-of-squared-deviation/n)/sqrt(n-1)

My understanding is that the degrees-of-freedom correction is more commonly applied in the std calculation. The stats version also lets the user override the default ddof=1, whereas mstats hardcodes it. So I will go ahead and change mstats to match stats, but let me know if I should do otherwise!
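
To check the two formulas against each other, here is a minimal sketch. The helper names `sem_stats_style` and `sem_mstats_style` are hypothetical and only mirror the formulas described in this thread; they are not the actual scipy source, and the sample data is made up.

```python
import numpy as np


def sem_stats_style(a, axis=0, ddof=1):
    """ddof goes into the std; the result is divided by sqrt(n)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    return np.std(a, axis=axis, ddof=ddof) / np.sqrt(n)


def sem_mstats_style(a, axis=0):
    """std uses ddof=0; the correction is hardcoded as sqrt(n - 1)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    return np.std(a, axis=axis, ddof=0) / np.sqrt(n - 1)


data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])

# With the default ddof=1 the two styles agree up to floating-point error.
print(np.isclose(sem_stats_style(data), sem_mstats_style(data)))

# Only the stats style lets the caller choose a different ddof.
print(sem_stats_style(data, ddof=0), sem_mstats_style(data))
```

With the default ddof=1 the two agree; they only diverge when the caller overrides ddof, which the mstats-style signature does not allow.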

@rgommers rgommers closed this in 39af4d1 Jan 3, 2014
