statsmodels.distributions.edgeworth.ExpandedNormal - 4th cumulant higher than 4 #8326

Open
cube2022 opened this issue Jun 22, 2022 · 7 comments

Comments

@cube2022

Line 166 of edgeworth.py checks whether the imaginary part is zero and whether abs(r) is smaller than 4. If I choose a fourth cumulant higher than 4, I get this warning, and indeed in this case the imaginary parts are zero and abs(r) is smaller than 4. But where does this limit of 4 come from?
Is it based on a paper?
By the way: I assume that the first cumulant is zero and the second is 1 (scaling and centering in lines 163 and 164). I use statsmodels 0.13.2.

@cube2022 cube2022 changed the title statsmodels.distributions.edgeworth.ExpandedNormal - 4.cumulant higher than 4 statsmodels.distributions.edgeworth.ExpandedNormal - 4th cumulant higher than 4 Jun 22, 2022
@josef-pkt
Member

original PR by @ev-br is #1325

I haven't looked at this in a long time, and don't remember details.
Maybe Evgeni found the condition and bound in one of the cited articles.

The problem is that simple orthogonal polynomial expansions can have intervals with negative pdf and a nonmonotonic cdf if the distribution is too far away from the base distribution.
I had looked at several methods to "fix" those negative regions, but they all require numerical integration to compute or adjust the integration constant, AFAIR. This removes the main advantage of orthogonal polynomials which are simple to compute.
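
As a worked illustration of how such negative regions arise, consider the simplest case: a standardized distribution with zero skewness and an expansion truncated after the fourth-cumulant term. This truncated form is an assumption made for illustration. It is not taken from edgeworth.py, and nothing in this thread establishes that it is the origin of the threshold in the code.

```latex
f(x) \approx \varphi(x)\left[1 + \frac{\kappa_4}{24}\,\mathrm{He}_4(x)\right],
\qquad \mathrm{He}_4(x) = x^4 - 6x^2 + 3

\mathrm{He}_4'(x) = 4x^3 - 12x = 0 \;\Rightarrow\; x = 0 \ \text{or}\ x = \pm\sqrt{3},
\qquad \mathrm{He}_4(\pm\sqrt{3}) = 9 - 18 + 3 = -6

1 + \frac{\kappa_4}{24}\,\mathrm{He}_4(\pm\sqrt{3}) = 1 - \frac{\kappa_4}{4} < 0
\iff \kappa_4 > 4
```

In this simplified case the bracket, and with it the approximate pdf, first becomes negative at x = ±√3 once the fourth cumulant exceeds 4.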

> By the way: I assume that the first cumulant is zero and the second is 1. (scaling, centering in line 163,164)

The first two cumulants are the mean and the variance:
line 190 : mu, sigma = cum[0], np.sqrt(cum[1])

but the polynomials are computed in terms of the standardized distribution, loc=0, scale=1.
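
A minimal sketch of that standardization step, assuming the higher cumulants are rescaled by the matching power of sigma. The helper name `standardize_cumulants` is hypothetical and is not part of statsmodels.

```python
import math


def standardize_cumulants(cum):
    """Split cumulants into location, scale, and scale-free higher cumulants.

    cum[0] is the mean and cum[1] the variance. The k-th cumulant (k >= 3)
    is divided by sigma**k so that it refers to the loc=0, scale=1 distribution.
    Hypothetical helper for illustration, not the statsmodels implementation.
    """
    mu, sigma = cum[0], math.sqrt(cum[1])
    standardized = [c / sigma ** k for k, c in enumerate(cum[2:], start=3)]
    return mu, sigma, standardized


mu, sigma, lam = standardize_cumulants([1.0, 4.0, 0.0, 32.0])
print(mu, sigma, lam)  # 1.0 2.0 [0.0, 2.0]
```

With mean 0 and variance 1, as assumed in the question, the higher cumulants are already the standardized ones.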

@josef-pkt
Member

josef-pkt commented Jun 22, 2022

(semi-random idea)

It should be possible to use an orthogonal polynomial distribution approximation after a nonlinear transformation that brings the distribution closer to the normal or other base distribution.
PR #7246

I'm not sure how this would work out. Some of those flexible transformation distributions already have a large number of parameters, and we would mainly need a transformation that reduces skewness and/or kurtosis. We would lose the simple parameterization in terms of cumulants.
There is possibly a problem: we don't have a simple way to compute the ppf of the orthogonal polynomial expansion distribution, and we might need it to apply additional transformations.

@josef-pkt josef-pkt added this to the 0.15 milestone Jun 22, 2022
@cube2022
Author

Thanks for your comments. I had a look at the references in edgeworth.py, but I did not find this bound.

I'm trying to generate slightly degenerated normal distributions, and I compare the two methods ExpandedNormal and pdf_moments. If I generate samples from these distributions, especially from the one made with ExpandedNormal and a high fourth cumulant, I get two different results: one with rvs sampling and one with the itsample package. The latter does not show the expected tails of the distribution, while the former does.

Investigating the reason, I found that if the fourth cumulant is not as high (below 4, sometimes 5), both methods produce similar samples. I'm surprised that the two methods can give different results (apart from the PRNG), because both use an inverse-cdf tool (line 1008 in _distn_infrastructure.py). Is it possible that this observation is related to the negative pdf?

@josef-pkt
Member

That's possible or likely, if the cdf is not monotonic, then it will mess up the ppf and inverse cdf random sampling.
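
To illustrate the effect on the cdf, here is a minimal sketch using a toy expansion with zero skewness and only a fourth-cumulant correction. That toy form and the function names are assumptions for illustration, not the code in edgeworth.py. It counts how many grid steps have a decreasing cdf.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sketch_cdf(z, kappa4):
    """cdf belonging to the toy pdf phi(z) * (1 + kappa4/24 * He4(z)).

    Uses the identity d/dz [phi(z) * He3(z)] = -phi(z) * He4(z).
    """
    he3 = z ** 3 - 3.0 * z
    return normal_cdf(z) - kappa4 / 24.0 * normal_pdf(z) * he3


def count_decreasing_steps(kappa4, lo=-5.0, hi=5.0, n=1001):
    """Number of adjacent grid points where the cdf goes down instead of up."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    values = [sketch_cdf(z, kappa4) for z in grid]
    return sum(1 for a, b in zip(values, values[1:]) if b < a)


for kappa4 in (3.0, 5.0):
    print(kappa4, count_decreasing_steps(kappa4))
```

Wherever the cdf decreases, the same probability level is reached at more than one x, so a numerical inverse can return different points depending on where it starts its search.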

In the mailing list thread linked to in the original PR, Evgeni mentions negative pdf regions in the tail for the Edgeworth expansion.
I was working mainly on the Gram-Charlier expansion, and there the negative regions were just outside the "shoulders", i.e. where the pdf became small after the initial peak area, in the examples that I looked at.

You can check by computing the pdf on a grid in the possibly affected region.

@cube2022
Author

cube2022 commented Jun 22, 2022

> You can check by computing the pdf on a grid in the possibly affected region.

Do you mean varying the values of the cumulants and checking the different methods, similar to a Monte Carlo study?

@ev-br
Contributor

ev-br commented Jun 22, 2022

I don't remember details, sorry. The git history of my sandbox repo (github remembers!) shows a separate commit, but not much else. It's possible that the threshold came from playing around with a couple of distributions.

https://github.com/ev-br/edgeworth/commits/master

@josef-pkt
Member

@cube2022
no, compute the pdf for different values of x for given cumulants with a 4th cumulant like 5.
i.e. compute and plot the pdf(x) function for a parameter that might have a negative valued region.
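
A minimal sketch of such a grid check, assuming a toy expansion that keeps only the fourth-cumulant correction and ignores skewness. `sketch_pdf` and `negative_region` are hypothetical names and do not reproduce the full ExpandedNormal polynomial.

```python
import math


def sketch_pdf(x, cum):
    """Toy expansion pdf: normal density times (1 + kappa4/24 * He4(z)).

    cum = [mean, variance, third cumulant (ignored here), fourth cumulant].
    """
    mu, sigma = cum[0], math.sqrt(cum[1])
    z = (x - mu) / sigma
    kappa4 = cum[3] / sigma ** 4  # standardized fourth cumulant
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 + kappa4 / 24.0 * he4) / sigma


def negative_region(cum, lo=-5.0, hi=5.0, n=1001):
    """Grid points at which the toy pdf is negative."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [x for x in grid if sketch_pdf(x, cum) < 0.0]


bad = negative_region([0.0, 1.0, 0.0, 5.0])
if bad:
    print("pdf < 0 for |x| between",
          min(abs(x) for x in bad), "and", max(abs(x) for x in bad))
print("negative points with 4th cumulant 3:",
      len(negative_region([0.0, 1.0, 0.0, 3.0])))
```

For the real distribution, the same scan can be run by evaluating the pdf method of an ExpandedNormal instance on the grid instead of `sketch_pdf`.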


3 participants