SUMM/ENH: inference for variance, variance estimation #8261

Open
josef-pkt opened this issue May 2, 2022 · 6 comments

Comments

@josef-pkt
Member

josef-pkt commented May 2, 2022

I don't know where to put hypothesis tests and confidence intervals for variance and standard deviation.
It should have the usual test_, confint_, tost_ and power functions for the one- and two-sample cases, as in rates and proportions.

update: preliminary decision on the module name: variance_moments. I didn't really like any of the shorter names.

The oneway case is handled in oneway with standard tests. Those mainly test equality of scale or dispersion measures, not necessarily of variance.

I want something specific to variance (and not to other dispersion measures) that can be used, for example, in functions like the zscore standardized mean or the coefficient of variation. One use case is MOVER confints for those.

related issue: where do we put skew and kurtosis functions? (e.g. extensions to the skew and kurtosis tests; we need one-sided alternatives.)
Can we have a module for variance and higher moments?

related:
#2765 organizing cov and corr, which is now in stats.multivariate
(outlier-)robust scale is in robust and in some open PRs. Note that those can only be converted to a variance measure under specific distributional assumptions. Related: robust definitions of skew and kurtosis, #6790 and in stattools.
weightstats has descriptive statistics in the class, but inference only for one- and two-sample means.

inferential statistics for a single correlation coefficient
I'm not sure where that goes

specific to variance

I would like the kurtosis-corrected version of Bonett, and similar variations on it.
Supporting code will need kurtosis estimators.
(aside: the standard kurtosis estimate is downward (!) biased for heavy-tailed distributions)
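A quick simulation sketch of the aside, assuming Student t data with 5 degrees of freedom (population excess kurtosis 6 / (df - 4) = 6) and samples of size 50. The distribution, sample size and seed are arbitrary illustration choices; scipy.stats.kurtosis supplies both the moment estimator (bias=True) and the adjusted one (bias=False).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
df = 5
# population excess kurtosis of Student t, finite only for df > 4
true_excess_kurt = 6.0 / (df - 4)
nobs, n_rep = 50, 2000

# one sample per row
samples = rng.standard_t(df, size=(n_rep, nobs))
# moment estimator g2 and bias-adjusted G2, computed row-wise
g2 = stats.kurtosis(samples, axis=1, fisher=True, bias=True)
G2 = stats.kurtosis(samples, axis=1, fisher=True, bias=False)

print(true_excess_kurt, g2.mean(), G2.mean())
```

In this setup both Monte Carlo averages should land well below the population value; the exact numbers depend on the seed.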

(I didn't see any reference that uses score confidence intervals or a score test for variance. A score test for one variance should be easy.)
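A minimal sketch of the normal-theory score test for H0: variance == value. In the normal model the information matrix is block-diagonal in (mean, variance), so with the mean at its MLE the score statistic reduces to sqrt(nobs / 2) * (var_mle / value - 1). The kurtosis_correction option, which replaces 2 by beta2 - 1 (the asymptotic relative variance of the variance estimate), is our own assumption in the spirit of the kurtosis-corrected intervals and is then no longer a pure score test. score_test_variance is a hypothetical name, not an existing statsmodels function.

```python
import numpy as np
from scipy import stats


def score_test_variance(x, value, kurtosis_correction=False):
    """Two-sided score test of H0: variance == value (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    nobs = x.shape[0]
    # MLE of the variance under normality (ddof=0)
    var_mle = np.mean((x - x.mean()) ** 2)
    # Var(var_mle) ~= sigma^4 * (beta2 - 1) / nobs, with beta2 = 3 under normality
    if kurtosis_correction:
        beta2 = stats.kurtosis(x, fisher=False, bias=True)
    else:
        beta2 = 3.0
    # information evaluated at the null value, as in a score test
    statistic = np.sqrt(nobs / (beta2 - 1)) * (var_mle / value - 1)
    pvalue = 2 * stats.norm.sf(np.abs(statistic))
    return statistic, pvalue
```

Inverting this statistic over value would give the corresponding score confint.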

Stata also has the Bonett confint; Minitab has a good working paper on Bonett, and a kurtosis test.

other ideas:
Can we use a transformation to normality (#3224) of the underlying data to get better variance inference, e.g. for data that are closer to log-normal? (I have not looked for references for this.)

Can we use a "working 4th moment or kurtosis function" with a 4th-moment-robust variance estimate to improve inference? E.g. based on GLM Gamma (which corresponds to a quadratic 4th moment if endog is the variance or squared residuals).

Right now, I mainly want the Bonett test, which looks easy to implement but needs a location estimate.
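A sketch of the Bonett (2006) interval for the variance as we read it from the paper and the Minitab white paper: the location in the 4th moment is a trimmed mean with trim proportion 1 / (2 sqrt(nobs - 4)), the sum of squares uses the ordinary mean, c = nobs / (nobs - z) is a small-sample adjustment, and the interval is built on the log scale. confint_variance_bonett is a hypothetical name, and the formula details should be checked against the references before relying on it.

```python
import numpy as np
from scipy import stats


def confint_variance_bonett(x, alpha=0.05):
    """Bonett-type confint for the variance (sketch, hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if n <= 4:
        raise ValueError("need more than 4 observations")
    z = stats.norm.isf(alpha / 2)
    var = x.var(ddof=1)
    # location for the 4th moment: trimmed mean, trim proportion 1 / (2 sqrt(n - 4))
    m = stats.trim_mean(x, 1 / (2 * np.sqrt(n - 4)))
    # kurtosis: trimmed-mean 4th moment, ordinary mean in the sum of squares
    kurt = n * np.sum((x - m) ** 4) / np.sum((x - x.mean()) ** 2) ** 2
    # small-sample adjustment factor
    c = n / (n - z)
    se = c * np.sqrt((kurt - (n - 3) / n) / (n - 1))
    # interval on the log scale, then back-transformed
    low = np.exp(np.log(c * var) - z * se)
    upp = np.exp(np.log(c * var) + z * se)
    return low, upp
```

Taking square roots of both limits gives the interval for the standard deviation.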

related multivariate version for covariance and correlation matrices: PR #6696
e.g.
https://github.com/statsmodels/statsmodels/pull/6696/files#diff-fc0daeb7f971edb0e8a8866ed966f86ced2c81767eb394f3a81a6a4761184595R180
has options for given kurt(osis) and general; the latter uses a 4th-moment estimate for cov_cov.
I doubt those functions work for the univariate case, i.e. when cov is a single variance.

@josef-pkt
Member Author

after reading around, it looks like we want to have at least 3 to 5 kurtosis estimators for use in the variance confint.

standard kurtosis estimate is unbiased
others are biased with min MSE
and the best candidate uses the trimmed mean in the 4th central moment estimate (but the standard mean in the variance estimate), following the original Bonett. One article trims only the upper tail for heavily skewed distributions.
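A sketch of three candidate excess-kurtosis estimators: the moment estimator g2, the adjusted G2 from Joanes and Gill (1998), which matches scipy.stats.kurtosis(..., bias=False), and a Bonett-style version with a trimmed mean in the 4th moment but the ordinary mean in the variance. The trim proportion is borrowed from Bonett's interval; the helper name and the dict keys are hypothetical.

```python
import numpy as np
from scipy import stats


def kurtosis_estimators(x):
    """Excess kurtosis estimates (sketch, hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    # moment estimator of excess kurtosis (biased)
    g2 = m4 / m2 ** 2 - 3
    # Joanes and Gill (1998) G2, ratio of k-statistics
    G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    # trimmed mean only in the 4th moment, ordinary mean in m2
    m = stats.trim_mean(x, 1 / (2 * np.sqrt(n - 4)))
    k_trim = np.mean((x - m) ** 4) / m2 ** 2 - 3
    return {"g2": g2, "G2": G2, "trimmed": k_trim}
```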

AFAICS (not reading everything) Minitab uses the SJ test to classify samples as Low, Medium or High kurtosis, but only uses it to add warnings and minimum sample size recommendations to the Bonett test.

default method for confint_variance?
maybe no default, i.e. method required initially.
My guess is that the best default would be conditional on skew and kurtosis, i.e. method="auto"

Bonett is a good default except for highly skewed and/or heavy-tailed distributions. In those cases some adjustments or transformations would be more accurate.

dataplot https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/sdconfli.htm
adds an option (using the median in the 4th moment estimate and the t instead of the normal distribution) from
Niwitpong and Kirdwichai (2008), "Adjusted Bonett Confidence Interval for Standard Deviation of Non-Normal Distributions", Thailand Statistician, Vol. 6, No. 1, pp. 1-6.
but that one is very conservative overall, and still somewhat conservative for heavy-tailed skewed distributions

two sample comparison
The following uses MOVER confints with Bonett univariate confints for a linear combination of variances, following the methods of Donner and Zou et al. There is another article like that. (We will get it almost for free once we have the univariate confint.)

Suwan, Sirima, and Sa-aat Niwitpong. 2013. “Interval Estimation for a Linear Function of Variances of Nonnormal Distributions That Utilize the Kurtosis.” Applied Mathematical Sciences 7: 4909–18. https://doi.org/10.12988/ams.2013.37366.
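A sketch of the MOVER combination for a linear combination sum(coeffs * theta) given separate univariate intervals, following the Zou and Donner construction: for a negative coefficient the roles of the lower and upper limits swap. mover_confint is a hypothetical helper; any univariate confint (Bonett, chi2, ...) can supply the inputs, and the numbers in the usage line are made up.

```python
import numpy as np


def mover_confint(estimates, confints, coeffs):
    """MOVER confint for sum(coeffs * estimates) (sketch, hypothetical helper)."""
    est = np.asarray(estimates, dtype=float)
    ci = np.asarray(confints, dtype=float)  # shape (k, 2): lower, upper
    c = np.asarray(coeffs, dtype=float)
    point = np.sum(c * est)
    # limits of each term c_i * theta_i; a negative c_i swaps lower and upper
    low_term = np.where(c > 0, c * ci[:, 0], c * ci[:, 1])
    upp_term = np.where(c > 0, c * ci[:, 1], c * ci[:, 0])
    low = point - np.sqrt(np.sum((c * est - low_term) ** 2))
    upp = point + np.sqrt(np.sum((upp_term - c * est) ** 2))
    return point, low, upp


# usage with made-up estimates and intervals: difference var1 - var2
point, low, upp = mover_confint([2.0, 1.0], [[1.0, 3.5], [0.5, 2.0]], [1.0, -1.0])
```

With a single term and coefficient 1 the construction returns the input interval unchanged, which is a useful sanity check.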

@josef-pkt
Member Author

found an article that makes the connection to the multivariate case, by Yuan and Bentler (#4144)

Yuan, Ke-Hai, Peter M. Bentler, and Wei Zhang. 2005. “The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case and Its Multivariate Implication.” Sociological Methods & Research 34 (2): 240–58.
https://doi.org/10.1177/0049124105280200.

looks good based on very brief skimming
uses chisquare with a statistic adjusted for kurtosis
I think that was close to my initial readings on kurtosis adjustments, but this article covers largely the univariate case

@josef-pkt
Member Author

my current selected reading list
there are more but I haven't read or looked closely at those or didn't keep track of them
(80 articles since I moved here from proportion)

one sample variance or standard deviation

Bonett, Douglas G. 2006. “Approximate Confidence Interval for Standard Deviation of Nonnormal Distributions.” Computational Statistics & Data Analysis 50 (3): 775–82. https://doi.org/10.1016/j.csda.2004.10.003.

Minitab documentation, white paper on 1-Sample Standard Deviation Test

Searls, Donald T., and Pichai Intarapanich. 1990. “A Note on an Estimator for the Variance That Utilizes the Kurtosis.” The American Statistician 44 (4): 295–96. https://doi.org/10.1080/00031305.1990.10475745.

Akyüz, Hayriye Esra, and Hamza Gamgam. 2017. “Interval Estimation for Nonnormal Population Variance with Kurtosis Coefficient Based on Trimmed Mean.” Turkiye Klinikleri Journal of Biostatistics 9 (3): 213–21. https://doi.org/10.5336/biostatic.2017-57348.

Banik, Shipra, Ahmed N. Albatineh, Moustafa Omar Ahmed Abu-Shawiesh, and B. M. Golam Kibria. 2014. “Estimating the Population Standard Deviation with Confidence Interval: A Simulation Study under Skewed and Symmetric Conditions.” International Journal of Statistics in Medical Research 3 (4): 356–67. https://doi.org/10.6000/1929-6029.2014.03.04.4.

Burch, Brent D. 2014. “Estimating Kurtosis and Confidence Intervals for the Variance under Nonnormality.” Journal of Statistical Computation and Simulation 84 (12): 2710–20. https://doi.org/10.1080/00949655.2013.840628.

———. 2017. “Distribution-Dependent and Distribution-Free Confidence Intervals for the Variance.” Statistical Methods & Applications 26 (4): 629–48. https://doi.org/10.1007/s10260-017-0385-z.

Niwitpong, Sa-aat, and Pianpool Kirdwichai. 2008. “Adjusted Bonett Confidence Interval for Standard Deviation of Non-Normal Distributions.” Thailand Statistician 6 (1): 1–16.

Wencheko, Eshetu, and Honest W. Chipoyera. 2009. “Estimation of the Variance When Kurtosis Is Known.” Statistical Papers 50 (3): 455–64. https://doi.org/10.1007/s00362-007-0084-1.

Yuan, Ke-Hai, Peter M. Bentler, and Wei Zhang. 2005. “The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case and Its Multivariate Implication.” Sociological Methods & Research 34 (2): 240–58.
https://doi.org/10.1177/0049124105280200.
(different approach from the others)

two samples

I didn't check the references systematically yet

Bonett, Douglas G. 2006. “Robust Confidence Interval for a Ratio of Standard Deviations.” Applied Psychological Measurement 30 (5): 432–39. https://doi.org/10.1177/0146621605279551.

Jan, Show-Li, and Gwowen Shieh. 2022. “A Comparative Study of TOST and UMPT Procedures for Evaluating Dispersion Equivalence.” Statistics in Biopharmaceutical Research 14 (2): 162–67. https://doi.org/10.1080/19466315.2020.1821762.

Shoemaker, Lewis H. 2003. “Fixing the F Test for Equal Variances.” The American Statistician 57 (2): 105–14. https://doi.org/10.1198/0003130031441.

Suwan, Sirima, and Sa-aat Niwitpong. 2013. “Interval Estimation for a Linear Function of Variances of Nonnormal Distributions That Utilize the Kurtosis.” Applied Mathematical Sciences 7: 4909–18. https://doi.org/10.12988/ams.2013.37366.

estimating kurtosis and skew

Joanes, D. N., and C. A. Gill. 1998. “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society: Series D (The Statistician) 47 (1): 183–89. https://doi.org/10.1111/1467-9884.00122.

An, Lihua, and S. Ejaz Ahmed. 2008. “Improving the Performance of Kurtosis Estimator.” Computational Statistics & Data Analysis 52 (5): 2669–81. https://doi.org/10.1016/j.csda.2007.09.024.

Guo, Yawen, and B. M. Golam Kibria. 2017. “Testing the Population Kurtosis Parameter: An Empirical Study with Applications.” International Journal of Computational and Theoretical Statistics 04 (01): 45–63. https://doi.org/10.12785/IJCTS/040104.

others

(I haven't read in this direction yet)

Bonett also has articles on regression residuals, and other spread/dispersion measures like MAD
e.g.

Bonett, Douglas G. 2005a. “Confidence Interval for Residual Mean Absolute Deviation in Regression Models.” Journal of Statistical Computation and Simulation 75 (8): 673–78. https://doi.org/10.1080/00949650412331299148.

———. 2005b. “Robust Confidence Interval for a Residual Standard Deviation.” Journal of Applied Statistics 32 (10): 1089–94. https://doi.org/10.1080/02664760500165339.

Bonett, Douglas G, and Edith Seier. 2003. “Confidence Intervals for Mean Absolute Deviations.” The American Statistician 57 (4): 233–36. https://doi.org/10.1198/0003130032323.

@josef-pkt
Member Author

I just realized that the references use different "contrasts" for the variance confidence interval in the univariate case

all based on the distribution of the sum of squares S = (nobs - ddof) * var

  • Bonett uses log(S / sigma2)
  • standard tests use S / sigma2, either as chi2 (under normality) or through the normal approximation var / sigma2 ~ N(1, V). In this case the confint of sigma has the critical values in the denominator
  • usual normal approximation var - sigma2 ~ N(0, V). In this case the confint is the usual one, with the critical values in the half-length
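A sketch of the normal-theory versions of these contrasts, assuming normal data so that var / sigma2 has standard deviation sqrt(2 / (nobs - ddof)); the "log" entry is the normal-theory counterpart of the Bonett contrast without any kurtosis correction. The function name and the dict keys are hypothetical.

```python
import numpy as np
from scipy import stats


def confint_variance_normal_theory(var, nobs, alpha=0.05, ddof=1):
    """Confints for sigma2 from the different contrasts (sketch, normal data)."""
    df = nobs - ddof
    S = df * var  # sum of squares
    z = stats.norm.isf(alpha / 2)
    se_rel = np.sqrt(2 / df)  # sd of var / sigma2 under normality
    return {
        # S / sigma2 ~ chi2(df): critical values in the denominator
        "chi2": (S / stats.chi2.isf(alpha / 2, df), S / stats.chi2.ppf(alpha / 2, df)),
        # var / sigma2 ~ N(1, se_rel**2): upper limit needs z * se_rel < 1
        "ratio": (var / (1 + z * se_rel), var / (1 - z * se_rel)),
        # var - sigma2 ~ N(0, (var * se_rel)**2): critical values in the half-length
        "diff": (var - z * var * se_rel, var + z * var * se_rel),
        # log(var / sigma2) ~ N(0, se_rel**2) by the delta method
        "log": (var * np.exp(-z * se_rel), var * np.exp(z * se_rel)),
    }
```

The "ratio" upper limit only exists for z * sqrt(2 / df) < 1, i.e. df of at least 8 at alpha = 0.05.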

@josef-pkt
Member Author

It looks like Bonett is mainly available in Minitab; Minitab has one-sample and two-sample ratio versions.
Minitab docs describe both pooled and separate kurtosis estimates. Bonett uses only pooled.
They obtain the 2-sample hypothesis test by inverting the Bonett confidence interval

R package asympTest has kurtosis-corrected inference for variance.
Details on how they compute standard errors are only available in the working paper version; the R Journal article is shortened, and the docstrings don't contain details.
https://arxiv.org/abs/0902.0506?context=math.ST

table 2 footer contains the formula for the variance of the variance.
It does not estimate kurtosis directly as in Bonett; it estimates the 4th moment minus the squared 2nd moment (kind of).
(I don't remember in which article I have seen that one used. Burch? Yuan/Bentler?)
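A sketch of an asymptotic interval built from the 4th central moment minus the squared 2nd moment, based on the standard result that the asymptotic variance of the sample variance is (mu4 - sigma^4) / nobs. This is our reading of the idea, not necessarily asympTest's exact formula (we have not checked it against the table 2 footer); the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def confint_variance_asymptotic(x, alpha=0.05):
    """Normal-approximation confint for the variance (sketch, hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    var = x.var(ddof=1)
    # Var(var) ~= (mu4 - sigma^4) / n, no separate kurtosis estimate needed
    se = np.sqrt((m4 - m2 ** 2) / n)
    z = stats.norm.isf(alpha / 2)
    return var - z * se, var + z * se
```

Under normality mu4 - sigma^4 = 2 sigma^4, so for normal data this is close to the plug-in "diff" interval.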

@josef-pkt
Member Author

check variance of log-normal distribution
scipy/scipy#10801
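For reference when checking, the closed-form log-normal moments (log-scale parameters mu and sigma) compared against scipy.stats.lognorm, which uses s = sigma and scale = exp(mu). The parameter values are arbitrary; this only checks the textbook formulas and says nothing about the content of the linked scipy issue.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.8  # arbitrary log-scale parameters

# closed-form moments of the log-normal distribution
mean = np.exp(mu + sigma ** 2 / 2)
var = (np.exp(sigma ** 2) - 1) * np.exp(2 * mu + sigma ** 2)
excess_kurt = (np.exp(4 * sigma ** 2) + 2 * np.exp(3 * sigma ** 2)
               + 3 * np.exp(2 * sigma ** 2) - 6)

# scipy parametrization: shape s = sigma, scale = exp(mu)
m, v, k = stats.lognorm(s=sigma, scale=np.exp(mu)).stats(moments="mvk")
print(mean, var, excess_kurt)
print(m, v, k)
```

The excess kurtosis grows very quickly with sigma, which is one reason normal-theory variance inference breaks down for log-normal-like data.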
