# ENH: add the Edgeworth expansion based on the normal distribution #1325

Merged
merged 6 commits into from Oct 9, 2014

None yet

### 4 participants

Contributor
commented Jan 24, 2014
 This PR adds the Edgeworth expansion (normal distribution + cumulants), as discussed in this ML thread: https://groups.google.com/forum/#!topic/pystatsmodels/so83mPP9nkE (time flies) No significant changes since that email exchange last year.
Member
 Thanks Evgeni, it might take me a bit of time going back to this. as motivation: I coded a test recently, that used 4 term Edgeworth expansion to get better approximate p-values, because the test statistic has considerable skew in small samples. (However, I gave up after mean and variance, because I worried finding more typos in another two pages of formulas will take too much time.)
Contributor
commented Jan 24, 2014
 Well, I guess one advantage of this is that at least some part of the math is checked (FWIW) and covered by tests. (on at least the chi square example, where we know the answers). Speaking of which, I don't really understand the TravisCI failure --- distributions.test() passes locally with the scipy master, but statsmodels.test() errs left and right.
Member
 My guess is that it's a backwards compatibility issue. TravisCI uses the default scipy, which is 0.9.0 in Ubuntu precise
statsmodels/distributions/edgeworth.py
 + self._coef, self._mu, self._sigma = self._compute_coefs_pdf(cum) + self._herm_pdf = HermiteE(self._coef) + if self._coef.size > 2: + self._herm_cdf = HermiteE(-self._coef[1:]) + else: + self._herm_cdf = lambda x: 0. + + # warn if pdf(x) < 0 for some values of x within 4 sigma + r = np.real_if_close(self._herm_pdf.roots()) + r = (r - self._mu) / self._sigma + if r[(np.imag(r) == 0) & (np.abs(r) < 4)].any(): + mesg = 'PDF has zeros at %s ' % r + warnings.warn(mesg, UserWarning) + + kwds.update({'momtype': 0}) # use pdf, not ppf in self.moment() + super(ExpandedNormal, self).__init__(**kwds)
 josef-pkt Member given the test failure, I think kwds needs to include name
and 1 other commented on an outdated diff Jan 24, 2014
statsmodels/distributions/tests/test_edgeworth.py
 + assert_allclose(ne2.pdf(x), stats.norm.pdf(x, loc=3, scale=2)) + + def test_chi2_moments(self): + # construct the expansion for \chi^2 + N, df = 6, 15 + cum = [_chi2_cumulant(n+1, df) for n in range(N)] + ne = ExpandedNormal(cum, name='edgw_chi2') + + # compare the moments + assert_allclose([_chi2_moment(n, df) for n in range(N)], + [ne.moment(n) for n in range(N)]) + + # compate the pdf + m, s = df, np.sqrt(2*df) + x = np.linspace(m - s, m + s, 10) + assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),
 josef-pkt Member shape parameter df has to be positional in older scipy ev-br Contributor Indeed! Will fix both as soon as I can --- it's a bit awkward to do from the phone :-) /Zh On Jan 24, 2014 4:36 PM, "Josef Perktold" notifications@github.com wrote: In statsmodels/distributions/tests/test_edgeworth.py:  assert_allclose(ne2.pdf(x), stats.norm.pdf(x, loc=3, scale=2))  def test_chi2_moments(self):  # construct the expansion for \chi^2   N, df = 6, 15   cum = [_chi2_cumulant(n+1, df) for n in range(N)]   ne = ExpandedNormal(cum, name='edgw_chi2')   # compare the moments   assert_allclose([_chi2_moment(n, df) for n in range(N)],   [ne.moment(n) for n in range(N)])   # compate the pdf   m, s = df, np.sqrt(2*df)   x = np.linspace(m - s, m + s, 10)   assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),  shape parameter df has to be positional in older scipy — Reply to this email directly or view it on GitHubhttps://github.com/statsmodels/statsmodels/pull/1325/files#r9154059 .
Member
 It looks like you use two features of the distributions that are only in very recent scipy. Do you have a default name in scipy master?
Contributor
commented Jan 25, 2014
 After fixing the issues with the name and named args, there's still a numerical problem of some sort. Locally, all tests pass with scipy master, but with scipy 0.9 it fails to reconstruct a chi-square distribution [same failure on TravisCI]: pdf starts oscillating much earlier, hence ppf cannot be defined etc. Am wondering what were the relevant changes between 0.9 and 0.13. Will have a look into it.
Member
 the ppf brentq error is also because of a previous version of the ppf inversion that had fixed limits, .xa, .xb. It was change in scipy to use an expanding search if the f at the bounds don't have the same sign. base on the error message on TravisCI in _ppf_single_call Note the old version required a large xa and xb to be set in the definition of the distribution. I haven't tried the example, and don't know why the function itself would be different. Also I assume you are testing against a recent numpy with polynomial while TravisCI uses your compatibility function. I have no idea if that matters.
Member
 Excuse to bump scipy version in master for 0.6.0?
Member
 Also time to discuss numpy version?
Member
 Ubuntu LTS (recent scikit-learn discussion) ? some of the goodies in scipy are too recent and we would have to bump too high, many of Evgeni's improvements are only one release old. Do we need to move to more recent numpy than 1.6 ?
Contributor
commented Jan 28, 2014
 The current LTS has numpy 1.6 already. Ubuntu 13.10 ships scipy 0.12 and numpy 1.7; the next LTS is going to be 14.04, due this Spring. Unless I misread this: https://launchpad.net/ubuntu/+source/python-scipy, there's a chance of even having 0.13 in next LTS :-). The previous LTS, 10.04 is definitely at the end of life, even its 'server edition'. As far as this one is concerned: named args & default distribution name (which require scipy 0.13) are not needed, are in fact, removed from this PR. No problem here Hermite polynomials need numpy 1.6, so even the current LTS is OK. Something with xa and xb: In fact I don't know when they were finally removed. So far I did not manage to have a proper look at the failure beyond noticing that a (rather strange) combination of numpy 1.8 and scipy 0.9 still fails.
referenced this pull request Feb 20, 2014
Merged

#### Bug norm expan shapes #1403

Member
commented Apr 3, 2014
 Can you rebase this on master? We've bumped our minimum versions and we'll see how Travis does now.
Contributor
commented Apr 3, 2014
 @jseabold Done. I don't have a proper dev setup for statsmodels at the moment, so all I can say is that tests pass locally with scipy master & numpy 1.8. TravisCI?
Member
commented Apr 3, 2014
 Yes, travis should let us know.
 Coverage remained the same when pulling c3ffcae on ev-br:edgeworth into c70a709 on statsmodels:master.
referenced this pull request Apr 3, 2014
Merged

#### REF: ensure O(N log N) when using fft for acf #1196

Member
commented Apr 3, 2014
 Failure is unrelated. Good to merge?
Contributor
commented Apr 3, 2014
 Might want to remove the old numpy compat for _hermite (will do tomorrow)
Member
commented Apr 3, 2014
 Ok. Ping the PR when it's ready. (I won't be online tomorrow, likely).
 Coverage remained the same when pulling b191422 on ev-br:edgeworth into c70a709 on statsmodels:master.
Contributor
commented Apr 4, 2014
 Done. I think it's usable as is, even though there's plenty of room for improvement. For one, ppf (hence rvs) is sloth-slow since it goes through the generic path (no Cornish-Fisher).
Member
commented Apr 4, 2014
 Thanks. Would you mind also making a bullet point note in docs/source/release/version0.6.rst. We're trying to be better about release notes and advertising improvements.
Contributor
commented Apr 4, 2014
 Added a line. Feel free to edit or expand if desired.
 Coverage remained the same when pulling 233b88b on ev-br:edgeworth into c70a709 on statsmodels:master.
Member
 It looks like this could go in after a rebase?
added this to the 0.6 milestone Sep 26, 2014
added some commits Jan 24, 2014
 ev-br ENH: add the Edgeworth expansion based on the normal distribution Given a set of cumulants, construct a matching distribution as an asymptotic series starting from the normal distribution. a64fab6 ev-br MAINT: make it work with scipy < 0.13 377ea81 ev-br TST: remove an overly optimistic test 5d0e967 ev-br MAINT: remove hermite compat wrapper for numpy < 1.6 2427d58 ev-br DOC: add a note to release notes 539ab42
Contributor
commented Sep 28, 2014
 rebased
 Coverage increased (+0.05%) when pulling 09016f7 on ev-br:edgeworth into cfebe9a on statsmodels:master.
 Coverage increased (+0.05%) when pulling 3301a64 on ev-br:edgeworth into cfebe9a on statsmodels:master.
 ev-br MAINT: distributions: tweak the __init__ dec35ee
Contributor
commented Sep 28, 2014
 I'm not sure why trying to import NoseWrapper in distributions.__init__ causes test failures.
 Coverage increased (+0.05%) when pulling dec35ee on ev-br:edgeworth into cfebe9a on statsmodels:master.
Member
commented Oct 8, 2014
 Thanks. Merge? The NoseWrapper circular imports happened with the recent warnings refactor. Probably needs a fix but not critical IMO.
Member
commented Oct 9, 2014
 Gonna go ahead and merge this. Thanks.
merged commit 953d944 into statsmodels:master Oct 9, 2014

#### 1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
referenced this pull request in scipy/scipy Dec 13, 2014
Closed

#### Statistics Review: pdf_fromgamma (Trac #172) #699

deleted the ev-br:edgeworth branch Dec 13, 2014