statsmodels.distributions.edgeworth.ExpandedNormal - 4th cumulant higher than 4 #8326
Comments
The original PR by @ev-br is #1325. I haven't looked at this in a long time and don't remember the details. The problem is that the simple orthogonal polynomial expansions can have intervals with negative pdf and a nonmonotonic cdf if the distribution is too far away from the base distribution.
The first two cumulants are the mean and variance, but the polynomials are computed in terms of the standardized distribution, loc=0, scale=1.
(semi-random idea) It should be possible to use an orthogonal polynomial distribution approximation after a nonlinear transformation that brings the distribution closer to the normal or other base distribution. I'm not sure how this would work out. Some of those flexible transformation distributions already have a large number of parameters, and we would mainly need a transformation that reduces skewness and/or kurtosis. We would lose the simple parameterization in terms of cumulants.
Thanks for your comments. I had a look at the references in edgeworth.py, but I did not find this bound. I'm trying to generate slightly perturbed normal distributions, and I compare the two methods ExpandedNormal and pdf_moments. If I generate samples from these distributions, especially from the one made with ExpandedNormal and a high fourth cumulant, I get two different results: one with rvs sampling and one with the itsample package. The latter does not show the expected tails of the distribution, while the former does. So I tried to investigate the reason, and it turned out that if the fourth cumulant is not as high (below 4, sometimes 5), both methods produce similar samples. I'm surprised that the two methods can give different results (besides the PRNG), because both use an inverse cdf tool (line 1008 in _distn_infrastructure.py). Is it possible that this observation is related to the negative pdf?
That's possible, even likely. If the cdf is not monotonic, it will mess up the ppf and inverse cdf random sampling. In the mailing list thread linked from the original PR, Evgeni mentions negative pdf regions in the tail for the Edgeworth expansion. You can check by computing the pdf on a grid in the possibly affected region.
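A minimal sketch of such a grid check follows. It does not call statsmodels; instead it uses a hand-written expansion, assuming mean 0, variance 1, zero third cumulant, and only the fourth-cumulant term `1 + k4/24 * He4(x)`. The function names `edgeworth4_pdf` and `edgeworth4_cdf` are hypothetical. With statsmodels installed, the same check could be run on the `pdf` and `cdf` of an `ExpandedNormal` instance instead.

```python
import math
import numpy as np

def edgeworth4_pdf(x, k4):
    """Normal pdf times (1 + k4/24 * He4(x)).

    Assumption: standardized distribution with zero skewness,
    so only the fourth-cumulant correction term is present.
    """
    x = np.asarray(x, dtype=float)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    he4 = x**4 - 6.0 * x**2 + 3.0  # probabilists' Hermite polynomial He4
    return phi * (1.0 + k4 / 24.0 * he4)

def edgeworth4_cdf(x, k4):
    """Antiderivative of edgeworth4_pdf: Phi(x) - k4/24 * He3(x) * phi(x)."""
    x = np.asarray(x, dtype=float)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
    he3 = x**3 - 3.0 * x  # probabilists' Hermite polynomial He3
    return Phi - k4 / 24.0 * he3 * phi

grid = np.linspace(-6.0, 6.0, 2401)
for k4 in (3.0, 5.0):
    pdf = edgeworth4_pdf(grid, k4)
    cdf = edgeworth4_cdf(grid, k4)
    negative = grid[pdf < 0]                      # grid points with negative pdf
    monotone = bool(np.all(np.diff(cdf) >= 0))    # is the cdf nondecreasing?
    print(f"k4={k4}: min pdf={pdf.min():.4g}, "
          f"cdf monotone={monotone}, negative points={negative.size}")
```

Under these assumptions the correction factor has its minimum value `1 - k4/4` at `x = ±sqrt(3)`, so for positive `k4` the pdf first touches zero at `k4 = 4` and is negative around `|x| ≈ 1.4` to `2.0` beyond that. Whether this is what motivated the check in edgeworth.py is not established here.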
You mean to vary the values of the cumulants and check the different methods, similar to a Monte Carlo?
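One possible reading of this is a deterministic scan rather than a Monte Carlo: step through values of the fourth cumulant and record where the pdf first goes negative on a grid. The sketch below assumes a standardized distribution with zero third cumulant, so the sign of the pdf is the sign of the factor `1 + k4/24 * He4(x)`. The helper `min_correction_factor` is a hypothetical name, not part of statsmodels.

```python
import numpy as np

def min_correction_factor(k4, grid):
    """Smallest value of 1 + k4/24 * He4(x) over the grid.

    Assumption: mean 0, variance 1, zero skewness; the normal density is
    positive, so the pdf is negative wherever this factor is negative.
    """
    he4 = grid**4 - 6.0 * grid**2 + 3.0
    return float(np.min(1.0 + k4 / 24.0 * he4))

grid = np.linspace(-6.0, 6.0, 2401)
for k4 in np.arange(0.0, 8.5, 0.5):
    m = min_correction_factor(k4, grid)
    status = "ok" if m >= 0 else "negative pdf region"
    print(f"k4={k4:4.1f}  min factor={m:+.4f}  {status}")
```

A scan like this only covers the one-parameter family assumed above. With a nonzero third cumulant the admissible region is different, so the scan would have to cover both cumulants.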
I don't remember the details, sorry. The git history of my sandbox repo (GitHub remembers!) shows a separate commit, but not much else. It's possible that the threshold comes from some playing around with a couple of distributions.
@cube2022
Line 166 of edgeworth.py checks whether the imaginary part is zero and whether abs(r) is smaller than 4. If I choose a fourth cumulant higher than 4, I get this warning, and yes, in this case the imaginary parts are zero and abs(r) is smaller than 4. But where does this limit of 4 come from?
Is it based on a paper?
By the way: I assume that the first cumulant is zero and the second is 1 (scaling and centering in lines 163 and 164). I use statsmodels 0.13.2.
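The kind of check described above can be sketched as follows. This is not the statsmodels code. It assumes the correction polynomial for cumulants `[0, 1, 0, k4]` is `1 + k4/24 * (x**4 - 6*x**2 + 3)`, uses a hypothetical helper `real_roots_within`, and replaces an exact comparison of the imaginary part with `np.isclose`.

```python
import numpy as np

def real_roots_within(k4, radius=4.0):
    """Real roots r of the assumed correction polynomial with abs(r) < radius."""
    # Power-basis coefficients of 1 + k4/24 * (x**4 - 6*x**2 + 3),
    # highest degree first, as expected by np.roots.
    coeffs = [k4 / 24.0, 0.0, -k4 / 4.0, 0.0, 1.0 + k4 / 8.0]
    r = np.roots(coeffs)
    is_real = np.isclose(r.imag, 0.0)   # tolerance instead of an exact == 0
    return np.sort(r[is_real & (np.abs(r) < radius)].real)

for k4 in (3.0, 5.0):
    print(k4, real_roots_within(k4))
```

In this sketch the value 4 in `abs(r) < 4` is a distance from the mean in standard deviations, which is a separate quantity from the value of the fourth cumulant. Under the assumed polynomial, real roots appear only once `k4` reaches 4, and they then lie well inside that radius, which would be consistent with the warning appearing for a fourth cumulant above 4.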