# Unbiased estimates of variance/ std deviation (Trac #502) #1100

Closed
opened this Issue Oct 19, 2012 · 8 comments


### numpy-gitbot commented Oct 19, 2012

 Original ticket http://projects.scipy.org/numpy/ticket/502 on 2007-04-18 by @pierregm, assigned to unknown. Unbiased estimates of the variance and standard deviation are used far more often than their biased counterparts. Currently, the var/std methods/functions return biased estimates. Unbiased estimates can be obtained simply by multiplying the variance by n/float(n-1) (where n is the size of the array along a particular axis). This extra step (and the test on n it implies) becomes quickly tedious when used repeatedly. I suggest the introduction of 2 new methods (and the corresponding functions), varu and stdu, that would give direct access to the unbiased estimates.
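A minimal sketch of the correction the ticket describes: the biased (divide-by-n) variance can be turned into the unbiased estimate by multiplying by n/(n-1). The array values here are illustrative only.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Biased estimate: sum of squared deviations divided by n.
biased_var = np.var(x)

# Unbiased estimate via the n/(n-1) correction described in the ticket.
unbiased_var = biased_var * n / (n - 1)

print(biased_var, unbiased_var)  # 4.0 and 32/7 ≈ 4.5714
```

This is exactly the "extra step" the reporter finds tedious to repeat, motivating a built-in way to get the unbiased estimate directly.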

### numpy-gitbot commented Oct 19, 2012

 Milestone changed to `1.1` by @alberts on 2007-05-12

### numpy-gitbot commented Oct 19, 2012

 @charris wrote on 2007-05-13 There have been several discussions of this, and, IIRC, the upshot was that the biased estimates were preferred for numerical reasons. People want the unbiased estimate often enough that I think we should add a flag in the call, something like bias={0,1}.

### numpy-gitbot commented Oct 19, 2012

 @rkern wrote on 2007-05-13 My preference is to not use the term "bias," since it's wrong. The standard definition of bias, E[X-Xtrue], is inappropriate for quantities like variance and standard deviation. Notably, the square root of the N-1 ("unbiased") variance is not an unbiased estimate of the standard deviation. Using an appropriate definition of bias, E[log(X/Xtrue)], does give coherent estimates of variance and standard deviation. However, the factor is N-2, not N-1. My preference is to add a parameter that is subtracted from N rather than a flag that switches between N and N-1. The shortest name I can come up with is a bit obscure, though: "ddof", for "change in degrees of freedom".
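A sketch of the semantics rkern proposes (and which NumPy later adopted): instead of a two-state flag, the divisor is N - ddof, so ddof=0 gives the divide-by-N estimate, ddof=1 the divide-by-(N-1) estimate, and ddof=2 the N-2 factor he mentions. This uses the modern `ddof` keyword of `numpy.var`, which at the time of this comment was still only a proposal.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = x.size

# The divisor is n - ddof rather than a fixed n or n - 1.
for ddof in (0, 1, 2):
    manual = ((x - x.mean()) ** 2).sum() / (n - ddof)
    assert np.isclose(np.var(x, ddof=ddof), manual)
```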

### numpy-gitbot commented Oct 19, 2012

 Attachment added by trac user zouave on 2007-07-12: diff-std

### numpy-gitbot commented Oct 19, 2012

 trac user zouave wrote on 2007-07-12 I've just posted a patch that adds a "ddof" parameter as described above (by rkern), as well as a "pop_mean" parameter that allows users to supply an out-of-sample population mean to be used instead of calculating the in-sample mean. It affects ndarrays, masked arrays and matrices; doesn't affect fromnumeric.py stuff. The appropriate on-line doc is also updated (to the best of my knowledge). Any comments, questions, suggestions [or insults] are more than welcome. Honestly, my hope is to have this fixed by 1.0.4 (and that this release will come soon).

### numpy-gitbot commented Oct 19, 2012

 @mdehoon wrote on 2008-02-20

> charris wrote: something like bias={0,1}.
> rkern wrote: My preference is to not use the term "bias," since it's wrong.

One solution is to use ml={1,0}, with ml standing for maximum likelihood. Dividing by N gives the maximum likelihood estimate, and this is true both for the variance and the standard deviation.

### numpy-gitbot commented Oct 19, 2012

 @teoliphant wrote on 2008-03-07 In r4853 the ddof feature was added to var and std. So, this ticket can be closed.
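The feature as it landed: both `numpy.var` and `numpy.std` accept `ddof`, defaulting to 0 (divide by N). A short usage check with illustrative values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])  # mean 4, squared deviations sum to 8

# Default ddof=0 divides by N; ddof=1 divides by N - 1.
assert np.isclose(np.var(x), 8.0 / 3)          # biased: 8/3
assert np.isclose(np.var(x, ddof=1), 4.0)      # unbiased: 8/2
assert np.isclose(np.std(x, ddof=1), 2.0)      # sqrt of the ddof=1 variance
```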

### numpy-gitbot commented Oct 19, 2012

 Milestone changed to `1.0.5` by @stefanv on 2008-03-11