SUMM: outlier robust descriptive statistics, including correlation, acf #3152

josef-pkt · 2016-08-11T16:30:17Z

partially a followup to #838

other statistics for which we should also get outlier robust estimators

cov

...

acf, pacf

Dürre, Alexander, Roland Fried, and Tobias Liboschik. 2015. “Robust Estimation of (Partial) Autocorrelation.” Wiley Interdisciplinary Reviews: Computational Statistics 7 (3): 205–22. doi:10.1002/wics.1351.
(overview article, found by chance, I didn't search systematically)
(part of outlier handling in time series analysis, more specific issues are for parametric intervention based models)
(I also have a draft for nonparametric trimming/winsorizing based on fft)
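
As an illustration of what an outlier robust acf could look like, here is a minimal sketch of one simple estimator type: the Gnanadesikan-Kettenring identity applied to lagged pairs, with MAD as the robust scale. This is an assumption for illustration, not a proposed statsmodels API; the function names `mad_scale` and `robust_acf_gk` are hypothetical, and MAD could be swapped for a more efficient robust scale.

```python
import numpy as np


def mad_scale(x):
    """Median absolute deviation, scaled to be consistent for the normal sd."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


def robust_acf_gk(x, nlags=10):
    """Robust autocorrelation via the Gnanadesikan-Kettenring identity.

    For robustly standardized u, v the correlation is estimated by
    (s(u + v)**2 - s(u - v)**2) / (s(u + v)**2 + s(u - v)**2),
    where s is a robust scale (here MAD, an illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    acf = np.ones(nlags + 1)
    for k in range(1, nlags + 1):
        u, v = x[:-k], x[k:]
        # standardize both lagged series with median and MAD
        u = (u - np.median(u)) / mad_scale(u)
        v = (v - np.median(v)) / mad_scale(v)
        s_plus = mad_scale(u + v) ** 2
        s_minus = mad_scale(u - v) ** 2
        acf[k] = (s_plus - s_minus) / (s_plus + s_minus)
    return acf
```

The estimate is bounded in [-1, 1] by construction, but the resulting autocorrelation sequence is not guaranteed to be positive semidefinite.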

josef-pkt · 2016-08-11T16:34:05Z

scikit-learn has robust estimators for cov, but I don't think it can be used for multivariate auto-covariance or auto-correlation, at least not without ignoring that the same observations show up in several lagged series.

josef-pkt · 2020-03-02T02:32:11Z

robust mean estimate

trimean, with a variation by Gastwirth: takes a weighted average of the median and two quantiles, e.g. 0.25 and 0.75 (like iqr)
found by chance (semi-random search while reading around for #6526)

it looks like we can get t-test and confidence intervals (based on approximation with normal reference (*)):

Patel, Kartik R., Govind S. Mudholkar, and J. L. Indrasiri Fernando. 1988. “Student’s t Approximations for Three Simple Robust Estimators.” Journal of the American Statistical Association 83 (404): 1203–10. https://doi.org/10.2307/2290158.

I also saw some other articles with inference for order statistics/quantile based statistics, but didn't pay attention.
Note: MAD is the basic statistic for Levene-BF(median) for oneway comparison of variance/dispersion. see also #6563
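
A minimal sketch of the two location estimators mentioned above, assuming the standard textbook definitions: Tukey's trimean with weights (1/4, 1/2, 1/4) on the quartiles and median, and Gastwirth's estimator with weights (0.3, 0.4, 0.3) on the 1/3, 1/2 and 2/3 quantiles. The function names are hypothetical, and the sample quantiles use numpy's default linear interpolation, which is only one of several possible quantile definitions.

```python
import numpy as np


def quantile_average(x, probs, weights):
    """Weighted average of sample quantiles (numpy default interpolation)."""
    x = np.asarray(x, dtype=float)
    q = np.quantile(x, probs)
    return float(np.dot(weights, q))


def trimean(x):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    return quantile_average(x, [0.25, 0.5, 0.75], [0.25, 0.5, 0.25])


def gastwirth(x):
    """Gastwirth's estimator: 0.3 * Q(1/3) + 0.4 * median + 0.3 * Q(2/3)."""
    return quantile_average(x, [1 / 3, 0.5, 2 / 3], [0.3, 0.4, 0.3])
```

Both estimators ignore the extreme order statistics, so a single gross outlier in the sample does not move them.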

aside: an abandoned Julia package has trimean and a few others
https://github.com/mrxiaohe/RobustStats.jl

(*) just an idea:
The correct distribution of trimean depends on the local density at the 3 quartiles according to Patel et al. This is too messy, so simple approximations are used.
However, we have the local kernel density estimation for standard errors in the QuantileRegression model. trimean would be a weighted average of the estimates (of the constant) at three different quantiles.
But in QuantileRegression we don't estimate multiple quantiles at the same time, so we would have to combine the estimates (of the constant) from three different models.
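
A rough sketch of the idea in (*), without going through QuantileRegression: for iid data the sample quantiles at probabilities p_i <= p_j have asymptotic covariance (p_i - p_i p_j) / (n f(q_i) f(q_j)), so the variance of a weighted average of quantiles follows from a density estimate at those quantiles. The Gaussian kernel, the Silverman rule-of-thumb bandwidth and the function names are assumptions for illustration; this is not how the QuantileRegression standard errors are computed.

```python
import numpy as np


def gaussian_kde_at(x, points):
    """Gaussian kernel density estimate at `points` (Silverman bandwidth)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    iqr = np.subtract(*np.quantile(x, [0.75, 0.25]))
    sigma = min(x.std(ddof=1), iqr / 1.349)
    h = 0.9 * sigma * n ** (-0.2)
    z = (np.asarray(points, dtype=float)[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


def quantile_average_se(x, probs, weights):
    """Weighted average of sample quantiles and its asymptotic standard error.

    Assumes iid observations; the density at each quantile is replaced
    by a kernel estimate.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    q = np.quantile(x, p)
    f = gaussian_kde_at(x, q)
    # asymptotic covariance of sqrt(n) * (sample quantiles - true quantiles)
    cov = (np.minimum.outer(p, p) - np.outer(p, p)) / np.outer(f, f)
    est = float(w @ q)
    se = float(np.sqrt(w @ cov @ w / x.size))
    return est, se


# trimean with a normal-reference confidence interval
rng = np.random.default_rng(12345)
sample = rng.standard_normal(500)
tm, tm_se = quantile_average_se(sample, [0.25, 0.5, 0.75], [0.25, 0.5, 0.25])
ci = (tm - 1.96 * tm_se, tm + 1.96 * tm_se)
```

The off-diagonal covariance terms matter here: the three quantile estimates come from the same sample, so treating three separately estimated models as independent would understate the standard error.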

josef-pkt added wishlist type-enh comp-robust comp-stats labels Aug 11, 2016

josef-pkt mentioned this issue Dec 14, 2022

robust descriptive measures, skew, kurtosis #838

Closed
