ENH: minimum (gof) distance estimator #7412
For copulas: Weiß, G. 2011. “Copula Parameter Estimation by Maximum-Likelihood and Minimum-Distance Estimators: A Simulation Study.” Computational Statistics 26: 31–54. https://doi.org/10.1007/s00180-010-0203-7. I only looked at the abstract. It sounds pretty negative on MD compared to MLE.
An application: browsing some literature turned up the following and related references. Markatou, Marianthi, Ayanendranath Basu, and Bruce Lindsay. 1997. “Weighted Likelihood Estimating Equations: The Discrete Case with Applications to Logistic Regression.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part II, 57 (2): 215–32. https://doi.org/10.1016/S0378-3758(96)00045-6. (I downloaded this and added it to Zotero in 2014, and again in 2016 when working on robust estimators.)
Aside: Hellinger distance is a density distance, not an edf/cdf distance.
Related literature: trimmed likelihood (observations downweighted to zero or dropped).
This article looks explicit enough to translate to code. It looks like they use the distance for aggregated relative frequencies or probabilities, see eq. 3.2 to 3.6, especially 3.3, the empirical relative frequency.
two likely candidates for implementation
No correction for Fisher consistency is included; that requires an adjustment, a bias correction of the estimated parameters.
Both are advertised as not needing a kernel density estimator, i.e. they use the empirical density. For discrete models, Hellinger distance or power divergence can be computed directly on the finite/countable support. Maybe similarly for multinomial models, i.e. the chi-square distance or similar that I computed for the ordered model (similar to the Hosmer-Lemeshow test).
(We might get measures for outlier identification based on the implied weights, even at the MLE, even if we don't do robust estimation.)
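As a sketch of the discrete case described above: a minimum Hellinger distance fit computed directly on empirical relative frequencies, with no kernel density estimator. The Poisson model, the seed, and all variable names here are illustrative assumptions, not statsmodels API.

```python
# Hypothetical sketch: minimum Hellinger distance estimation for a
# discrete model (Poisson) using the empirical pmf on the observed
# finite support -- no kernel density estimate needed.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(12345)
y = rng.poisson(3.0, size=500)  # illustrative sample, true lambda = 3

support = np.arange(y.max() + 1)
freq = np.bincount(y, minlength=support.size) / y.size  # empirical pmf

def hellinger(params):
    lam = params[0]
    pmf = stats.poisson.pmf(support, lam)
    # squared Hellinger distance restricted to the observed support
    return 0.5 * np.sum((np.sqrt(freq) - np.sqrt(pmf)) ** 2)

res = optimize.minimize(hellinger, x0=[y.mean()], method="Nelder-Mead")
lam_hat = res.x[0]  # should be close to the true lambda
```

The implied weights mentioned above could be read off by comparing `freq` with the fitted `pmf` cell by cell, which is where outlier identification would come in.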
Aside, AIC for MD: very brief look at the following article. Kurata, Sumito, and Etsuo Hamada. 2018. “A Robust Generalization and Asymptotic Properties of the Model Selection Criterion Family.” Communications in Statistics - Theory and Methods 47 (3): 532–47. https://doi.org/10.1080/03610926.2017.1307405. The same authors also have a comparison to other related IC versions.
Aside: the appendix in Luceno 2006 includes computational formulas for the KS D, AD A2, and CvM W2 statistics, and for variations of AD with different weights in the denominator. Luceño, Alberto. 2006. “Fitting the Generalized Pareto Distribution to Data Using Maximum Goodness-of-Fit Estimators.” Computational Statistics & Data Analysis 51 (2): 904–17. https://doi.org/10.1016/j.csda.2005.09.011.
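The standard computational formulas for the unweighted variants are easy to vectorize. This is only a sketch of the usual textbook versions (not copied from Luceno's appendix), assuming `u` holds the sorted probability integral transforms F(x_i; theta):

```python
# Classical edf statistics from probability integral transforms
# u_(1) <= ... <= u_(n): KS D, Cramer-von Mises W2, Anderson-Darling A2.
import numpy as np

def edf_statistics(u):
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    # Kolmogorov-Smirnov D = max(D+, D-)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    ks_d = max(d_plus, d_minus)
    # Cramer-von Mises W2
    cvm_w2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    # Anderson-Darling A2; u[::-1] supplies u_(n+1-i)
    ad_a2 = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))
    return ks_d, cvm_w2, ad_a2
```

The AD variations with different weights in the denominator would replace the implicit 1/(F(1-F)) weight; they are not included here.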
Note: the test statistic has a weighted sum of chi-square distributions, see #3363. Basu, A., A. Mandal, N. Martin, and L. Pardo. 2013. “Testing Statistical Hypotheses Based on the Density Power Divergence.” Annals of the Institute of Statistical Mathematics 65 (2): 319–48. https://doi.org/10.1007/s10463-012-0372-y.
Maximizing the Lq likelihood is off the table for now. The correction, the parameter transformation for (Fisher) consistency, is almost non-existent in the literature: there is no (clear) description of the transformation for specific cases, and I don't see a general way of deriving or computing it. Also, some of the small print in Ferrari and Yang 2010 has assumptions.
It might be possible to implement some special cases, maybe for Beta regression (*), but that might require some guesswork about how the parameter transformation for consistency is actually done. Some later articles by other authors also don't say anything, or much, about the reparameterization and (Fisher) consistency. (I didn't really read any of those articles, but was looking specifically for Fisher consistency or bias correction.) It is possible that the parameter transformation for canonical GLM/LEF is just theta = theta_e / q, but that is only in comments; I haven't seen a derivation or proof.
(*) Ribeiro, Terezinha K. A., and Silvia L. P. Ferrari. 2020. “Robust Estimation in Beta Regression via Maximum Lq-Likelihood.” arXiv:2010.11368 [stat], October. http://arxiv.org/abs/2010.11368.
Similar: density power divergence for Beta regression (I didn't look much at it). For several models the Fisher consistency term for the density power divergence is analytically available.
https://rdrr.io/rforge/distrMod/man/MDEstimator.html
https://stackoverflow.com/questions/67007706/how-to-calculate-efficient-minimum-distance-in-python-regression
edit: https://rdrr.io/rforge/fitdistrplus/man/mgedist.html
edf based gof criteria for estimation: KS, CvM, AD, following Luceno 2006.
Minimize a gof statistic (Hellinger, AD, Cramer-von Mises) to estimate parametric models or the parameters of a distribution.
Those are robust to outliers.
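A minimal sketch of the estimation idea: plug the parametric cdf into the CvM formula and minimize over the parameters. The normal model, the log-scale parameterization, and the variable names are all assumptions for illustration, not a proposed API.

```python
# Minimum-distance (maximum goodness-of-fit) estimation by minimizing
# the Cramer-von Mises statistic of the probability integral transforms.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(42)
y = np.sort(rng.normal(loc=5.0, scale=2.0, size=300))
n = y.size
i = np.arange(1, n + 1)

def cvm_stat(params):
    # W2 of u_i = F(y_(i); loc, scale); log-scale keeps scale positive
    loc, log_scale = params
    u = stats.norm.cdf(y, loc=loc, scale=np.exp(log_scale))
    return np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)

res = optimize.minimize(cvm_stat, x0=[y.mean(), np.log(y.std())],
                        method="Nelder-Mead")
loc_hat, scale_hat = res.x[0], np.exp(res.x[1])
```

Swapping `cvm_stat` for the KS or AD statistic gives the other maximum goodness-of-fit estimators; the AD variant needs guarding against u equal to 0 or 1.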
I looked at Hellinger a long time ago, but there is no code in the sandbox. The only related thing in the sandbox is mutual info, and that was more as a correlation measure.
This might be useful if we want to estimate predictive distributions.
#7142
maybe:
For some distributions, MLE for parameter estimation has had a bad reputation, e.g. genextreme, and alternative estimators have become popular in some fields, e.g. minimum spacings.
(However, even in many of those cases MLE works fine with some limitations and good starting values.)
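The spacings estimator mentioned above is usually implemented as maximum product of spacings: maximize the sum of log spacings of the fitted cdf. A hedged sketch for genextreme, where scipy's shape convention (c > 0 means bounded above) and the starting values are my assumptions:

```python
# Maximum product of spacings for genextreme: minimize the negative
# sum of log spacings of the fitted cdf over (shape, loc, log scale).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
y = np.sort(stats.genextreme.rvs(c=0.1, loc=0.0, scale=1.0,
                                 size=200, random_state=rng))

def neg_log_spacings(params):
    c, loc, log_scale = params
    u = stats.genextreme.cdf(y, c, loc=loc, scale=np.exp(log_scale))
    # spacings D_i = F(y_(i)) - F(y_(i-1)), padded with F=0 and F=1
    spacings = np.diff(np.concatenate(([0.0], u, [1.0])))
    if np.any(spacings <= 0):
        return np.inf  # parameters put mass outside the sample range
    return -np.sum(np.log(spacings))

res = optimize.minimize(neg_log_spacings, x0=[0.0, np.median(y), 0.0],
                        method="Nelder-Mead")
shape_hat, loc_hat, scale_hat = res.x[0], res.x[1], np.exp(res.x[2])
```

Unlike MLE for genextreme, the objective stays finite when the support boundary moves through sample points, which is part of why this estimator became popular for that family.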
One issue for these types of estimators and tests is whether they extend to conditional distributions, e.g. in a regression setting with explanatory variables.
related to gof testing in regression models:
#7154 gof EDF tests for regression
#5408
#3904