ENH: minimum (gof) distance estimator #7412

Open
josef-pkt opened this issue Apr 10, 2021 · 8 comments

Comments

@josef-pkt

josef-pkt commented Apr 10, 2021

https://rdrr.io/rforge/distrMod/man/MDEstimator.html
https://stackoverflow.com/questions/67007706/how-to-calculate-efficient-minimum-distance-in-python-regression

edit: https://rdrr.io/rforge/fitdistrplus/man/mgedist.html
EDF-based gof criteria for estimation (KS, CvM, AD) following Luceño (2006)

Minimize a gof statistic (Hellinger, AD, Cramér-von Mises) to estimate parametric models or the parameters of a distribution.

These are robust to outliers.

I looked at Hellinger distance a long time ago, but there is no code in the sandbox.
The only related code in the sandbox is mutual information, and that was intended more as a correlation measure.

This might be useful if we want to estimate predictive distributions.
#7142

maybe:
For some distributions MLE for parameter estimation has a bad reputation, e.g. genextreme, and alternative estimators have become popular in some fields, e.g. minimum spacings.
(However, even in many of those cases MLE works fine with some limitations and good starting values.)
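As a rough illustration of the minimum distance / "maximum goodness-of-fit" idea, a minimal sketch (not statsmodels code; the use of scipy's `cramervonmises` and `genextreme`, the data, and the starting-value strategy are all illustrative choices) that estimates genextreme parameters by minimizing the Cramér-von Mises statistic, with the MLE as starting values:

```python
import numpy as np
from scipy import optimize, stats

def cvm_objective(params, data):
    # params = (shape c, loc, scale) of scipy.stats.genextreme
    c, loc, scale = params
    if scale <= 0:
        return np.inf
    # Cramér-von Mises statistic of the data against the candidate cdf
    return stats.cramervonmises(data, stats.genextreme(c, loc, scale).cdf).statistic

data = stats.genextreme.rvs(-0.2, loc=1, scale=2, size=500, random_state=1234)

# use the MLE as starting values, then minimize the CvM distance (derivative free)
start = stats.genextreme.fit(data)
res = optimize.minimize(cvm_objective, start, args=(data,), method="Nelder-Mead")
print(res.x)  # minimum-CvM estimates of (c, loc, scale)
```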

One issue for these types of estimators and tests is whether they extend to conditional distributions, e.g. in a regression setting with explanatory variables.
Related to gof testing in regression models:
#7154 gof EDF tests for regression
#5408
#3904

@josef-pkt

For copulas:

Weiß, G. Copula parameter estimation by maximum-likelihood and minimum-distance estimators: a simulation study. Comput Stat 26, 31–54 (2011). https://doi.org/10.1007/s00180-010-0203-7

I only looked at the abstract. It sounds pretty negative on MD compared to MLE.

@josef-pkt

An application, from browsing some literature: a weighted likelihood representation for some minimum distance estimators, for outlier-robust estimation including discrete models.
#4266

The following and related references:

Markatou, Marianthi, Ayanedranath Basu, and Bruce Lindsay. 1997. “Weighted Likelihood Estimating Equations: The Discrete Case with Applications to Logistic Regression.” Journal of Statistical Planning and Inference, Robust Statistics and Data Analysis, Part II, 57 (2): 215–32. https://doi.org/10.1016/S0378-3758(96)00045-6.

(I downloaded this and added it to Zotero in 2014 and again in 2016 when working on robust estimators.)

Aside: Hellinger distance is a density distance, not an EDF/CDF distance.

Related literature: trimmed likelihood (observations downweighted to zero or dropped).

@josef-pkt

This article looks explicit enough to translate to code:
Lu, Zudi, Yer Van Hui, and Andy H. Lee. 2003. “Minimum Hellinger Distance Estimation for Finite Mixtures of Poisson Regression Models and Its Applications.” Biometrics 59 (4): 1016–26. https://doi.org/10.1111/j.0006-341X.2003.00117.x.

It looks like they use the distance for aggregate relative frequencies or probabilities, see eq. 3.2 to 3.6, especially 3.3, the empirical relative frequency.
AFAIR, that is also what we use in the Vuong test for Poisson, and what I used as a diagnostic (plot).
Hellinger uses the sqrt.
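A minimal sketch (illustrative only, not taken from the article, and not statsmodels API) of the squared Hellinger distance between the empirical relative frequencies of count data and the predicted probabilities of a Poisson model on a truncated support; the sqrt of the frequencies and probabilities is the "Hellinger uses sqrt" part:

```python
import numpy as np
from scipy import stats

def hellinger_count(y, prob):
    # squared Hellinger distance between empirical relative frequencies of the
    # counts y on support 0..k_max and model probabilities prob on that support
    k_max = len(prob) - 1
    freq = np.bincount(np.asarray(y), minlength=k_max + 1)[:k_max + 1] / len(y)
    return 0.5 * np.sum((np.sqrt(freq) - np.sqrt(prob))**2)

y = stats.poisson.rvs(3.0, size=1000, random_state=123)
support = np.arange(y.max() + 1)
prob = stats.poisson.pmf(support, y.mean())   # predicted probabilities at a fitted mean
print(hellinger_count(y, prob))
```

Minimizing this over the model parameters, instead of just evaluating it at the fitted mean, would give a minimum Hellinger distance estimator on the aggregated frequencies.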

@josef-pkt

Two likely candidates for implementation:

  • "density power divergence approach"
    includes bias correction term for fisher consistency of estimating equations, in literature available for Logit and Poisson, and gaussian
    references Basu, Gosh including article on GLM regression

  • "Maximum Lq-likelihood Estimation"

no correction for Fisher consistency include, requires adjustment, bias correction of estimated parameters

Both are advertised as not needing a kernel density estimator, i.e. they use the empirical density.

For discrete models, Hellinger distance or power divergence can be computed directly on the finite/countable support (a sketch follows below).
-> Check the predicted aggregate distribution or frequencies for count models, as in my ZI count notebook, as a measure of fit when comparing models. I only used the squared (chisquare?) difference in probabilities, AFAIR.
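A minimal sketch of the first candidate, using the density power divergence objective in the form I understand from the Basu et al. literature, evaluated directly on the countable support of a Poisson model so no kernel density estimate is needed; the choice of alpha, the truncation of the support, and all names are illustrative, not statsmodels code:

```python
import numpy as np
from scipy import optimize, stats

def dpd_objective(mu, y, alpha=0.5, k_max=None):
    # density power divergence objective for a Poisson(mu) model:
    #   sum_k f(k)**(1 + alpha)  -  (1 + 1/alpha) * mean_i f(y_i)**alpha
    # the first (summed) term plays the role of the Fisher consistency correction
    if mu <= 0:
        return np.inf
    if k_max is None:
        k_max = int(3 * max(y.max(), mu)) + 20   # crude truncation of the countable support
    support = np.arange(k_max + 1)
    f_support = stats.poisson.pmf(support, mu)
    f_obs = stats.poisson.pmf(y, mu)
    return np.sum(f_support**(1 + alpha)) - (1 + 1 / alpha) * np.mean(f_obs**alpha)

y = stats.poisson.rvs(3, size=500, random_state=0)
y[:25] = 30   # contaminate with a few outliers
res = optimize.minimize_scalar(dpd_objective, bounds=(0.1, 20), args=(y,), method="bounded")
print(res.x, y.mean())   # the DPD estimate should be pulled less by the outliers than the mean
```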

Maybe similarly for multinomial models, i.e. the chisquare distance or similar that I computed for the ordered model (similar to the Hosmer-Lemeshow test).
This would be more a gof statistic used for model selection, and not for estimating the parameters of a given model.

(We might get measures for outlier identification based on the implied weights, even at the MLE, even if we don't do robust estimation.)

@josef-pkt

josef-pkt commented Apr 19, 2021

Aside: AIC for MD

Very brief look at the following article.
They have a version of AIC that uses the ratio tr(inv(J) * K), which looks analogous to GAIC and TIC with an information matrix ratio
(definition 1 and the following parts).

Kurata, Sumito, and Etsuo Hamada. 2018. “A Robust Generalization and Asymptotic Properties of the Model Selection Criterion Family.” Communications in Statistics - Theory and Methods 47 (3): 532–47. https://doi.org/10.1080/03610926.2017.1307405.

Same authors, with a comparison to other related IC versions:
Kurata, Sumito, and Etsuo Hamada. 2020. “On the Consistency and the Robustness in Model Selection Criteria.” Communications in Statistics - Theory and Methods 49 (21): 5175–95. https://doi.org/10.1080/03610926.2019.1615093.
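For reference, the tr(inv(J) * K) penalty structure mentioned above is simple to compute once per-observation scores and the Hessian are available. This is a generic numpy sketch of a TIC-style criterion, not the criterion from the Kurata and Hamada papers:

```python
import numpy as np

def tic_like(loglike, score_obs, hessian):
    # score_obs: (nobs, k_params) per-observation scores at the estimate
    # hessian:   (k_params, k_params) Hessian of the total log-likelihood at the estimate
    K = score_obs.T @ score_obs                # outer product of scores
    J = -hessian                               # observed information
    penalty = np.trace(np.linalg.solve(J, K))  # tr(J^{-1} K); equals k_params under a correct model
    return -2 * loglike + 2 * penalty
```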

@josef-pkt

josef-pkt commented Apr 21, 2021

aside:
Computational formulas for EDF-based gof statistics are in the GOF class in statsmodels.sandbox.distributions.gof_new.
The GOF class includes d, d+, d-, a2, w2, and u2, v, a. (I don't remember what a is; likely a variant of the AD A2. Needs docstrings.)

The appendix in Luceño (2006) includes computational formulas for the KS D, AD A2, and CvM W2, and variations of AD with different weights in the denominator.

Luceño, Alberto. 2006. “Fitting the Generalized Pareto Distribution to Data Using Maximum Goodness-of-Fit Estimators.” Computational Statistics & Data Analysis 51 (2): 904–17. https://doi.org/10.1016/j.csda.2005.09.011.
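For comparison, the standard computational forms of these EDF statistics from sorted cdf values z_i = F(x_(i)) are short. This is a sketch using the usual textbook formulas, not a copy of gof_new or of Luceño's appendix:

```python
import numpy as np

def edf_statistics(cdf_values):
    # textbook computational formulas for KS D, CvM W2, AD A2 from cdf values
    z = np.sort(np.asarray(cdf_values))
    n = len(z)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)
    d_minus = np.max(z - (i - 1) / n)
    w2 = np.sum((z - (2 * i - 1) / (2 * n))**2) + 1 / (12 * n)
    a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
    return {"D": max(d_plus, d_minus), "W2": w2, "A2": a2}
```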

note:
The GOF class computes the cdf values in __init__ and assumes iid observations.
This needs an extension to independent but not identically distributed observations, e.g. random variables that depend on explanatory variables.
Also: creating random samples (rvs) inside the class is not really appropriate and should be removed. This was written in analogy to scipy's ks test. However, we might need rvs for the bootstrap case as in the extra bootstrap functions.

@josef-pkt

The test statistic has a weighted sum of chisquare distributions as its distribution #3363

Basu, A., A. Mandal, N. Martin, and L. Pardo. 2013. “Testing Statistical Hypotheses Based on the Density Power Divergence.” Annals of the Institute of Statistical Mathematics 65 (2): 319–48. https://doi.org/10.1007/s10463-012-0372-y.

@josef-pkt

josef-pkt commented May 7, 2021

Maximizing the Lq likelihood is off the table for now.

The correction, i.e. the parameter transformation for (Fisher) consistency, is almost nonexistent in the literature. There is no (clear) description of the transformation for specific cases, and I don't see a general way of deriving or computing it.

Also, reading some of the small print in Ferrari and Yang (2010): the consistency proof has the assumption
"Let q_n be a sequence such that q_n -> 1 as n -> inf".
This means the estimator is asymptotically MLE; q = 1 corresponds to MLE.
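For concreteness, the Lq "log" in Ferrari and Yang (2010) is L_q(u) = (u^(1-q) - 1) / (1 - q), which reduces to log(u) as q -> 1, so the q_n -> 1 assumption makes the estimator asymptotically the MLE. A minimal sketch (illustrative names, not statsmodels code):

```python
import numpy as np

def lq(u, q):
    # Lq "logarithm": (u**(1 - q) - 1) / (1 - q); equals log(u) in the limit q -> 1
    u = np.asarray(u, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(u)           # q = 1 gives the ordinary log-likelihood contribution
    return (u**(1.0 - q) - 1.0) / (1.0 - q)

# maximum Lq-likelihood objective: sum(lq(f(y_i; theta), q)) over observations;
# for q < 1 the contribution of very low density (outlying) observations is bounded,
# unlike the ordinary log-likelihood.
```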

It might be possible to implement some special cases, maybe for Beta regression (*). But that might require some guesswork about how the parameter transformation for consistency is actually done.

Some later articles by other authors also don't say anything, or much, about the reparameterization and (Fisher) consistency.
A reparameterization or parameter transformation might be computationally simpler than a Fisher consistency term in the estimating equation, but the theoretical derivation of the transformation is very unclear.

(I didn't really read any of those articles, but was looking specifically for Fisher consistency or bias correction.)

It is possible that the parameter transformation for the canonical GLM/LEF case is just theta = theta_e / q, but that is only mentioned in comments; I haven't seen any derivation or proof.

Ribeiro, Terezinha K. A., and Silvia L. P. Ferrari. 2020. “Robust Estimation in Beta Regression via Maximum Lq-Likelihood.” ArXiv:2010.11368 [Stat], October. http://arxiv.org/abs/2010.11368.
It seems to have an explicit reparameterization without showing where it comes from.
(I cannot find the supplementary material mentioned in the article.)

Similar: density power divergence for Beta regression (I didn't look much at it).
Ghosh, Abhik. 2019. “Robust Inference under the Beta Regression Model with Application to Health Care Studies.” Statistical Methods in Medical Research 28 (3): 871–88. https://doi.org/10.1177/0962280217738142.

For several models the Fisher consistency term for the density power divergence is analytically available.
Poisson and similar models require summing over the points of the density, similar to what we have in predict_prob.
