statsmodels.distributions.edgeworth.ExpandedNormal - 4th cumulant higher than 4 #8326

cube2022 · 2022-06-22T12:38:18Z

Line 166 of edgeworth.py checks whether the imaginary part is zero and whether abs(r) is smaller than 4. If I choose a fourth cumulant higher than 4, I get this warning, and yes, in this case the imaginary parts are zero and abs(r) is smaller than 4. But where does this limit of 4 come from?
Is it based on a paper?
By the way: I assume that the first cumulant is zero and the second is 1 (scaling and centering in lines 163 and 164). I use statsmodels 0.13.2.

josef-pkt · 2022-06-22T14:05:19Z

original PR by @ev-br is #1325

I haven't looked at this in a long time, and don't remember details.
Maybe Evgeni found the condition and bound in one of the cited articles.

The problem is that the simple orthogonal polynomial expansions can have intervals where the pdf is negative, and a nonmonotonic cdf, if the distribution is too far away from the base distribution.
I had looked at several methods to "fix" those negative regions, but they all require numerical integration to compute or adjust the integration constant, AFAIR. This removes the main advantage of orthogonal polynomials, which is that they are simple to compute.
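
One way to see where a threshold of 4 for the fourth cumulant could come from is a back-of-the-envelope calculation. It assumes a standardized variable, a zero third cumulant, and an expansion truncated after the fourth-cumulant term; nothing in this thread confirms that this is the origin of the check in the code.

$$
f(x) = \varphi(x)\left[1 + \frac{\kappa_4}{24}\,He_4(x)\right],
\qquad He_4(x) = x^4 - 6x^2 + 3 .
$$

$He_4$ attains its minimum value $-6$ at $x^2 = 3$, so

$$
\min_x \left[1 + \frac{\kappa_4}{24}\,He_4(x)\right] = 1 - \frac{\kappa_4}{4},
$$

which is negative exactly when $\kappa_4 > 4$. The zeros of the bracket satisfy

$$
x^2 = 3 \pm \sqrt{6 - \frac{24}{\kappa_4}},
$$

so for positive $\kappa_4$ they are real only if $\kappa_4 \ge 4$. For $\kappa_4 = 5$ this gives $x \approx \pm 1.38$ and $x \approx \pm 2.02$, with a negative pdf in between, i.e. just outside the shoulders of the peak.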

> By the way: I assume that the first cumulant is zero and the second is 1 (scaling and centering in lines 163 and 164).

The first two cumulants are mean and variance:
line 190: `mu, sigma = cum[0], np.sqrt(cum[1])`

but the polynomials are computed in terms of the standardized distribution, loc=0, scale=1.
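
A minimal sketch of that standardization, assuming a zero third cumulant and an expansion truncated after the fourth-cumulant term. `expanded_pdf` and `k4_std` (the fourth cumulant of the already standardized variable) are hypothetical names for illustration, not the library's code:

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE


def expanded_pdf(x, mu, sigma, k4_std):
    """Truncated expansion pdf at x for location mu and scale sigma."""
    # Standardize: the polynomial is evaluated for loc=0, scale=1.
    y = (np.asarray(x, dtype=float) - mu) / sigma
    # 1 + k4/24 * He4(y), coefficients in order of increasing degree.
    poly = HermiteE([1.0, 0.0, 0.0, 0.0, k4_std / 24.0])
    phi = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)  # standard normal pdf
    return poly(y) * phi / sigma  # 1/sigma is the Jacobian of the rescaling


cum = [2.0, 9.0]                     # first two cumulants: mean and variance
mu, sigma = cum[0], np.sqrt(cum[1])  # same unpacking as the quoted line
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
pdf = expanded_pdf(x, mu, sigma, k4_std=1.0)
total = pdf.sum() * (x[1] - x[0])    # Riemann sum, should be close to 1
print(total)
```

The location and scale only shift and stretch the curve; whether the pdf dips below zero depends on the standardized higher cumulants alone.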

josef-pkt · 2022-06-22T14:29:54Z

(semi-random idea)

It should be possible to use an orthogonal polynomial distribution approximation after a nonlinear transformation that brings the distribution closer to the normal or other base distribution.
PR #7246

I'm not sure how this would work out. Some of those flexible transformation distributions already have a large number of parameters, and we would mainly need a transformation that reduces skewness and/or kurtosis. We would lose the simple parameterization in terms of cumulants.
There is possibly a problem: we don't have a simple way to compute the ppf of the orthogonal polynomial expansion distribution, and we might need it to apply additional transformations.

cube2022 · 2022-06-22T17:49:11Z

Thanks for your comments. I had a look at the references in edgeworth.py, but I did not find this bound. I'm trying to generate slightly degenerate normal distributions, and I compare the two methods ExpandedNormal and pdf_moments. If I generate samples from these distributions, especially from the one made with ExpandedNormal and a high fourth cumulant, I get two different results: one with rvs sampling and one with the itsample package. The latter does not show the expected tails of the distribution, while the former does. So I tried to investigate the reason, and it turned out that if the fourth cumulant is not too high (below 4, sometimes 5), both methods produce similar samples. I'm surprised that the two methods can give different results (besides the PRNG), because both use an inverse-cdf approach (line 1008 in _distn_infrastructure.py). Is it possible that this observation is related to the negative pdf?

josef-pkt · 2022-06-22T18:00:21Z

That's possible or even likely. If the cdf is not monotonic, it will mess up the ppf and inverse-cdf random sampling.
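
To illustrate the point about the ppf, here is a small sketch that assumes a standardized variable, a zero third cumulant, and an expansion truncated after the fourth-cumulant term, so that the cdf is Phi(x) - k4/24 * He3(x) * phi(x). `truncated_cdf` is a hypothetical helper written for this example, not the library's implementation:

```python
import math
import numpy as np


def truncated_cdf(x, k4):
    """Integral of phi(x) * (1 + k4/24 * He4(x)) from -inf to x."""
    x = np.asarray(x, dtype=float)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    he3 = x**3 - 3.0 * x  # probabilists' Hermite polynomial He3
    return Phi - k4 / 24.0 * he3 * phi


x = np.linspace(-6.0, 6.0, 12001)
cdf = truncated_cdf(x, k4=5.0)

# A proper cdf never decreases; here it does on part of the grid.
decreasing = bool(np.any(np.diff(cdf) < 0))

# Grid cells where cdf(x) - q changes sign: each one is a candidate "ppf(q)".
q = 0.96
crossings = x[:-1][np.diff(np.sign(cdf - q)) != 0]
print(decreasing, crossings)
```

When `cdf(x) = q` has more than one solution, which one a numerical inversion returns depends on the root finder and its starting bracket, so two inverse-cdf samplers can disagree in exactly that region.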

In the mailing list thread linked to in the original PR, Evgeni mentions negative pdf regions in the tail for the Edgeworth expansion.
I was working mainly on the Gram-Charlier expansion, and there the negative regions were just outside the "shoulders", i.e. where the pdf became small after the initial peak area, in the examples that I looked at.

You can check by computing the pdf on a grid in the possibly affected region.

cube2022 · 2022-06-22T18:11:14Z

> You can check by computing the pdf on a grid in the possibly affected region.

Do you mean to vary the values of the cumulants and check the different methods, similar to a Monte Carlo?

ev-br · 2022-06-22T18:28:23Z

I don't remember details, sorry. The git history of my sandbox repo (github remembers!) shows a separate commit, but not much else. It's possible that the threshold came from some playing around with a couple of distributions.

https://github.com/ev-br/edgeworth/commits/master

josef-pkt · 2022-06-22T18:28:46Z

@cube2022
No, compute the pdf for different values of x for given cumulants, with a 4th cumulant like 5,
i.e. compute and plot the pdf(x) function for a parameter that might have a negative-valued region.
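
A rough sketch of such a grid check. It does not call statsmodels; it evaluates a hand-written truncated expansion under the assumptions of a standardized variable, a zero third cumulant, and only the fourth-cumulant term kept. `truncated_pdf` is a hypothetical helper for illustration:

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE


def truncated_pdf(x, k4):
    """phi(x) * (1 + k4/24 * He4(x)) for a standardized variable."""
    poly = HermiteE([1.0, 0.0, 0.0, 0.0, k4 / 24.0])
    return poly(x) * np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


x = np.linspace(-6.0, 6.0, 2401)
for k4 in (1.0, 5.0):
    pdf = truncated_pdf(x, k4)
    neg = x[pdf < 0]  # grid points where the "density" is negative
    if neg.size:
        print(f"k4={k4}: pdf < 0 at {neg.size} grid points, "
              f"|x| between {np.abs(neg).min():.2f} and {np.abs(neg).max():.2f}, "
              f"min pdf = {pdf.min():.4f}")
    else:
        print(f"k4={k4}: pdf is non-negative on the grid")
```

To run the same check on the library object, replace `truncated_pdf(x, k4)` with `ExpandedNormal(cum).pdf(x)` for a cumulant list such as `[0.0, 1.0, 0.0, 5.0]`, and plot `pdf` against `x` to see where the curve dips below zero.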

cube2022 changed the title ~~statsmodels.distributions.edgeworth.ExpandedNormal - 4.cumulant higher than 4~~ statsmodels.distributions.edgeworth.ExpandedNormal - 4th cumulant higher than 4 Jun 22, 2022

josef-pkt added the comp-distributions label Jun 22, 2022

josef-pkt added this to the 0.15 milestone Jun 22, 2022

josef-pkt mentioned this issue Jun 22, 2022

SUMM: roadmap for 0.15 josef #8217
