SUMM: outlier robust descriptive statistics, including correlation, acf #3152

Open
josef-pkt opened this issue Aug 11, 2016 · 2 comments

Comments

@josef-pkt
Member

partially a followup to #838

other statistics for which we should also get outlier robust estimators

cov

...

acf, pacf

Dürre, Alexander, Roland Fried, and Tobias Liboschik. 2015. “Robust Estimation of (Partial) Autocorrelation.” Wiley Interdisciplinary Reviews: Computational Statistics 7 (3): 205–22. doi:10.1002/wics.1351.
(overview article, found by chance, I didn't search systematically)
(part of outlier handling in time series analysis, more specific issues are for parametric intervention based models)
(I also have a draft for nonparametric trimming/winsorizing based on fft)
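
To make the acf part concrete, here is a minimal sketch of one simple robust variant: a Spearman-type autocorrelation that correlates the ranks of the lagged pairs instead of the raw values. The name `rank_acf` is hypothetical, this is not necessarily one of the estimators recommended in the overview article, and it is not the fft based draft mentioned above.

```python
import numpy as np


def rank_acf(x, nlags=10):
    """Spearman-type autocorrelation (hypothetical helper).

    Computes the Pearson correlation of the ranks of (x[t], x[t-k])
    for k = 1, ..., nlags. Lag 0 is 1 by definition.
    """
    x = np.asarray(x, dtype=float)
    # ranks 0..n-1; ties are broken by position, not averaged,
    # which is fine for continuous data
    ranks = np.argsort(np.argsort(x)).astype(float)
    acf = np.ones(nlags + 1)
    for k in range(1, nlags + 1):
        acf[k] = np.corrcoef(ranks[:-k], ranks[k:])[0, 1]
    return acf
```

Because a large additive outlier only moves to the top rank instead of dominating the sums of squares, a few outliers change the estimate much less than in the usual sample acf. Note that for Gaussian data the rank correlation is not the same number as the Pearson correlation, so a consistency correction would be needed before comparing the two.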

@josef-pkt
Member Author

scikit-learn has robust estimators for cov, but I don't think it can be used for multivariate auto-covariance or auto-correlation, at least not without ignoring that the same observations show up in several lagged series.
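
A small sketch of what "the same observations show up in several lagged series" means in practice. The helper `lag_matrix` is hypothetical; its output is the kind of array one would have to pass to a generic robust covariance estimator (for example `sklearn.covariance.MinCovDet`), which treats the rows as independent observations.

```python
import numpy as np


def lag_matrix(x, nlags):
    """Rows are (x[t], x[t-1], ..., x[t-nlags]) for t = nlags, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[nlags - k: n - k] for k in range(nlags + 1)])


x = np.zeros(100)
x[50] = 10.0                      # a single additive outlier
X = lag_matrix(x, nlags=3)

# the one outlier contaminates nlags + 1 = 4 different rows
n_bad_rows = int((X != 0).any(axis=1).sum())
```

So the fraction of contaminated rows is roughly `nlags + 1` times the fraction of contaminated observations, and the rows are dependent, both of which a cross-sectional robust covariance estimator ignores.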

@josef-pkt
Member Author

robust mean estimate

trimean, with the variation by Gastwirth: takes a weighted average of the median and two quantiles, e.g. 0.25 and 0.75 (like iqr)
found by chance (semi-random search while reading around for #6526)
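
For reference, a minimal sketch of the two estimators using the usual textbook weights: Tukey's trimean is `(Q1 + 2 * median + Q3) / 4`, and Gastwirth's estimator is `0.3 * Q(1/3) + 0.4 * median + 0.3 * Q(2/3)`. The function names are hypothetical, and `np.quantile` is used with its default linear interpolation, which is just one of several quantile definitions.

```python
import numpy as np


def three_quantile_mean(x, p=0.25, w=0.25):
    """w * Q(p) + (1 - 2 * w) * median + w * Q(1 - p)."""
    q_lo, med, q_hi = np.quantile(np.asarray(x, dtype=float), [p, 0.5, 1 - p])
    return w * q_lo + (1 - 2 * w) * med + w * q_hi


def trimean(x):
    # Tukey's trimean: quartiles with weights 1/4, 1/2, 1/4
    return three_quantile_mean(x, p=0.25, w=0.25)


def gastwirth(x):
    # Gastwirth's estimator: tertiles with weights 0.3, 0.4, 0.3
    return three_quantile_mean(x, p=1 / 3, w=0.3)
```

Replacing the largest observation by an arbitrarily large value leaves both estimates unchanged as long as it stays above the upper quantile, which is the point of using them as robust mean estimates.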

it looks like we can get t-test and confidence intervals (based on approximation with normal reference (*)):

Patel, Kartik R., Govind S. Mudholkar, and J. L. Indrasiri Fernando. 1988. “Student’s t Approximations for Three Simple Robust Estimators.” Journal of the American Statistical Association 83 (404): 1203–10. https://doi.org/10.2307/2290158.

I also saw some other articles with inference for order statistics/quantile based statistics, but didn't pay attention.
Note: MAD is the basic statistic for Levene-BF(median) for oneway comparison of variance/dispersion. see also #6563

aside: an abandoned Julia package has trimean and a few other estimators
https://github.com/mrxiaohe/RobustStats.jl

(*) just an idea:
The correct distribution of trimean depends on the local density at the 3 quartiles according to Patel et al. This is too messy, so simple approximations are used.
However, we have the local kernel density estimation for standard errors in the QuantileRegression model. trimean would be a weighted average of the estimates (of the constant) at three different quantiles.
But in QuantileRegression we don't estimate multiple quantiles at the same time, so we would have to combine the estimates (of the constant) from three different models.
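
A rough sketch of the idea in (*), assuming i.i.d. data and using the standard asymptotic result for sample quantiles: the covariance of the sample quantiles at `p_i <= p_j` is approximately `p_i * (1 - p_j) / (n * f(q_i) * f(q_j))`, where `f` is the density. The density is estimated here with a plain Gaussian kernel and a rule-of-thumb bandwidth, which is an assumption of this sketch and not the kernel/bandwidth choice used by QuantileRegression, and this is not the t approximation from the article. The name `trimean_se` is hypothetical.

```python
import numpy as np


def trimean_se(x, probs=(0.25, 0.5, 0.75), weights=(0.25, 0.5, 0.25)):
    """Trimean and an asymptotic standard error (sketch, i.i.d. data)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    p = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    q = np.quantile(x, p)

    # Gaussian kernel density estimate at the three quantiles,
    # rule-of-thumb bandwidth based on a robust scale (assumes iqr > 0)
    q75, q25 = np.percentile(x, [75, 25])
    scale = min(x.std(ddof=1), (q75 - q25) / 1.34)
    h = 0.9 * scale * n ** (-0.2)
    z = (q[:, None] - x[None, :]) / h
    f = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

    # asymptotic covariance of the sample quantiles:
    # min(p_i, p_j) * (1 - max(p_i, p_j)) / (n * f_i * f_j)
    pmin = np.minimum.outer(p, p)
    pmax = np.maximum.outer(p, p)
    cov = pmin * (1 - pmax) / np.outer(f, f) / n

    est = w @ q
    se = np.sqrt(w @ cov @ w)
    return est, se
```

The off-diagonal terms of `cov` are exactly what is missing when the three quantiles are estimated in three separate models, so this joint covariance is the piece that would have to be added to combine the constants.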
