SUMM/ENH: extreme value distributions, pareto #5185

Open
josef-pkt opened this issue Sep 12, 2018 · 3 comments

@josef-pkt
Member

josef-pkt commented Sep 12, 2018

Another ancient theme with nothing public in statsmodels.

A brief GitHub search for Python repos on extreme value analysis:
https://github.com/wafo-project/pywafo package by Per A. Brodtkorb but GPL
https://github.com/kikocorreoso/scikit-extremes MIT alpha not released yet, GEV
https://github.com/AlexLSmith/extreme-value-analysis MIT, some interesting parts
https://github.com/kniehaus/EVT (no license, looks abandoned)

There are also several Python packages for Pareto/power-law distributions.

Almost everything in R is GPL
http://www.ral.ucar.edu/~ericg/softextreme.php contains a list of R packages

There are a few user-written commands in Stata that I never looked at:
https://ideas.repec.org/c/boc/bocode/s457953.html extreme
https://davegiles.blogspot.com/2015/01/extreme-value-modelling-in-stata.html (blog)
https://ideas.repec.org/c/boc/bocode/s456832.html paretofit
https://ideas.repec.org/c/boc/bocode/s456892.html gevfit

statsmodels:
statsmodels.miscmodels.tests.test_generic_mle has MyPareto as a test case, no externally verified numbers
sandbox distribution has genpareto and try_pot for some experiments and draft functions (from 2010)
sandbox distribution copula also has extreme value copulas
old times:
maximum spacing estimator in an old scipy issue (Per Brodtkorb)
discussion about how well MLE works with a good choice of start_params (Per Brodtkorb and me)
lmoments: Python wrapper of Fortran code in hydroclimpy
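
To illustrate the point about starting values, here is a minimal sketch using scipy only (not the statsmodels sandbox code or the MyPareto test case). It fits a generalized Pareto distribution to exceedances over a threshold with explicit starting values. The simulated data, the 90% threshold, and the starting values are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Simulated data; the true shape and scale are illustrative assumptions.
c_true, scale_true = 0.2, 2.0
x = stats.genpareto.rvs(c_true, loc=0, scale=scale_true,
                        size=20000, random_state=rng)

# Peaks over threshold: keep exceedances over a high empirical quantile.
threshold = np.quantile(x, 0.9)
exceed = x[x > threshold] - threshold

# Starting values: a small positive shape and the sample mean as scale.
c_start = 0.1
scale_start = exceed.mean()

# floc=0 fixes the location, because the threshold was already subtracted.
c_hat, loc_hat, scale_hat = stats.genpareto.fit(
    exceed, c_start, floc=0, scale=scale_start
)

print(c_hat, loc_hat, scale_hat)
```

For generalized Pareto data the exceedances are again generalized Pareto with the same shape and scale `scale_true + c_true * threshold`, which gives a quick sanity check on the estimates.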

This is mostly univariate distribution estimation without explanatory variables.
R's fExtremes might provide a useful table of contents for helper functions.

I never looked at the regression versions of those, i.e. conditional on explanatory variables as in VGAM.

(motivation: I was trying variation of miscmodels MyPareto for #5143 )

@josef-pkt
Member Author

Statsmodels doesn't have much of a comparative advantage for the univariate distribution case.
Regression with link functions would be more our domain.

One extra benefit is that these provide test cases for generic features, like the MyPareto unit tests.

Another use case that I never implemented: kernel density estimation with Pareto tails (available in Matlab).

@josef-pkt
Member Author

aside
https://cran.r-project.org/web/packages/texmex/index.html seems to have a sandwich cov_params estimator (but I didn't read the small print)

@josef-pkt
Member Author

another point, AFAIR:
pareto in scipy has lower bound=1 and not 0 (numpy.random.pareto, despite its name, samples the Lomax/Pareto II distribution, so its lower bound is 0). The distribution with lower bound=0 is lomax.
(In my current uncommitted example I use fixed loc=-1 to shift the lower bound to zero)
threshold in generalized pareto in scipy/numpy version needs to subtract 1.
(x - threshold) / scale
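
A quick check of the lower bounds and of the loc=-1 shift, as a sketch with scipy and numpy (the shape value `b` is an arbitrary illustrative choice). It also shows that Lomax with shape `b` coincides with genpareto with `c = 1/b` and `scale = 1/b`, and that `numpy.random.Generator.pareto` returns values below 1.

```python
import numpy as np
from scipy import stats

b = 3.0  # illustrative shape parameter
x = np.linspace(0.0, 10.0, 101)

# Lower end of the support for the standardized distributions.
lower_pareto = stats.pareto.support(b)[0]
lower_lomax = stats.lomax.support(b)[0]

# Shifting pareto by loc=-1 moves the lower bound to zero and gives lomax.
pdf_shifted = stats.pareto.pdf(x, b, loc=-1)
pdf_lomax = stats.lomax.pdf(x, b)

# Lomax is also a generalized Pareto with c = 1/b and scale = 1/b.
pdf_gpd = stats.genpareto.pdf(x, 1.0 / b, loc=0, scale=1.0 / b)

# numpy's sampler draws from Lomax (Pareto II), so values below 1 occur.
rng = np.random.default_rng(0)
sample = rng.pareto(b, size=10000)

print(lower_pareto, lower_lomax)
print(np.allclose(pdf_shifted, pdf_lomax), np.allclose(pdf_lomax, pdf_gpd))
print(sample.min())
```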
