# ENH: add the Edgeworth expansion based on the normal distribution #1325

Merged
merged 6 commits into from Oct 9, 2014

## Conversation

Projects
None yet
4 participants
Contributor

### ev-br commented Jan 24, 2014

 This PR adds the Edgeworth expansion (normal distribution + cumulants), as discussed in this ML thread: https://groups.google.com/forum/#!topic/pystatsmodels/so83mPP9nkE (time flies) No significant changes since that email exchange last year.
Member

### josef-pkt commented Jan 24, 2014

 Thanks Evgeni, it might take me a bit of time going back to this. as motivation: I coded a test recently, that used 4 term Edgeworth expansion to get better approximate p-values, because the test statistic has considerable skew in small samples. (However, I gave up after mean and variance, because I worried finding more typos in another two pages of formulas will take too much time.)
Contributor Author

### ev-br commented Jan 24, 2014

 Well, I guess one advantage of this is that at least some part of the math is checked (FWIW) and covered by tests. (on at least the chi square example, where we know the answers). Speaking of which, I don't really understand the TravisCI failure --- distributions.test() passes locally with the scipy master, but statsmodels.test() errs left and right.
Member

### josef-pkt commented Jan 24, 2014

 My guess is that it's a backwards compatibility issue. TravisCI uses the default scipy, which is 0.9.0 in Ubuntu precise

### josef-pkt reviewed Jan 24, 2014

 warnings.warn(mesg, UserWarning) kwds.update({'momtype': 0}) # use pdf, not ppf in self.moment() super(ExpandedNormal, self).__init__(**kwds)

#### josef-pkt Jan 24, 2014

Member

given the test failure, I think kwds needs to include name

### josef-pkt reviewed Jan 24, 2014

 # compate the pdf m, s = df, np.sqrt(2*df) x = np.linspace(m - s, m + s, 10) assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),

#### josef-pkt Jan 24, 2014

Member

shape parameter df has to be positional in older scipy

#### ev-br Jan 24, 2014

Author Contributor

Indeed! Will fix both as soon as I can --- it's a bit awkward to do from
the phone :-)

/Zh
On Jan 24, 2014 4:36 PM, "Josef Perktold" notifications@github.com wrote:

In statsmodels/distributions/tests/test_edgeworth.py:

•    assert_allclose(ne2.pdf(x), stats.norm.pdf(x, loc=3, scale=2))

• def test_chi2_moments(self):
•    # construct the expansion for \chi^2

•    N, df = 6, 15

•    cum = [_chi2_cumulant(n+1, df) for n in range(N)]

•    ne = ExpandedNormal(cum, name='edgw_chi2')

•    # compare the moments

•    assert_allclose([_chi2_moment(n, df) for n in range(N)],

•                    [ne.moment(n) for n in range(N)])

•    # compate the pdf

•    m, s = df, np.sqrt(2*df)

•    x = np.linspace(m - s, m + s, 10)

•    assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),


shape parameter df has to be positional in older scipy

Reply to this email directly or view it on GitHubhttps://github.com//pull/1325/files#r9154059
.

Member

### josef-pkt commented Jan 24, 2014

 It looks like you use two features of the distributions that are only in very recent scipy. Do you have a default name in scipy master?
Contributor Author

### ev-br commented Jan 25, 2014

 After fixing the issues with the name and named args, there's still a numerical problem of some sort. Locally, all tests pass with scipy master, but with scipy 0.9 it fails to reconstruct a chi-square distribution [same failure on TravisCI]: pdf starts oscillating much earlier, hence ppf cannot be defined etc. Am wondering what were the relevant changes between 0.9 and 0.13. Will have a look into it.
Member

### josef-pkt commented Jan 25, 2014

 the ppf brentq error is also because of a previous version of the ppf inversion that had fixed limits, .xa, .xb. It was change in scipy to use an expanding search if the f at the bounds don't have the same sign. base on the error message on TravisCI in _ppf_single_call Note the old version required a large xa and xb to be set in the definition of the distribution. I haven't tried the example, and don't know why the function itself would be different. Also I assume you are testing against a recent numpy with polynomial while TravisCI uses your compatibility function. I have no idea if that matters.
Member

### jseabold commented Jan 28, 2014

 Excuse to bump scipy version in master for 0.6.0?
Member

### jseabold commented Jan 28, 2014

 Also time to discuss numpy version?
Member

### josef-pkt commented Jan 28, 2014

 Ubuntu LTS (recent scikit-learn discussion) ? some of the goodies in scipy are too recent and we would have to bump too high, many of Evgeni's improvements are only one release old. Do we need to move to more recent numpy than 1.6 ?
Contributor Author

### ev-br commented Jan 28, 2014

 The current LTS has numpy 1.6 already. Ubuntu 13.10 ships scipy 0.12 and numpy 1.7; the next LTS is going to be 14.04, due this Spring. Unless I misread this: https://launchpad.net/ubuntu/+source/python-scipy, there's a chance of even having 0.13 in next LTS :-). The previous LTS, 10.04 is definitely at the end of life, even its 'server edition'. As far as this one is concerned: named args & default distribution name (which require scipy 0.13) are not needed, are in fact, removed from this PR. No problem here Hermite polynomials need numpy 1.6, so even the current LTS is OK. Something with xa and xb: In fact I don't know when they were finally removed. So far I did not manage to have a proper look at the failure beyond noticing that a (rather strange) combination of numpy 1.8 and scipy 0.9 still fails.

Merged

Member

### jseabold commented Apr 3, 2014

 Can you rebase this on master? We've bumped our minimum versions and we'll see how Travis does now.
Contributor Author

### ev-br commented Apr 3, 2014

 @jseabold Done. I don't have a proper dev setup for statsmodels at the moment, so all I can say is that tests pass locally with scipy master & numpy 1.8. TravisCI?
Member

### jseabold commented Apr 3, 2014

 Yes, travis should let us know.

### coveralls commented Apr 3, 2014

 Coverage remained the same when pulling c3ffcae on ev-br:edgeworth into c70a709 on statsmodels:master.

Merged

Member

### jseabold commented Apr 3, 2014

 Failure is unrelated. Good to merge?
Contributor Author

### ev-br commented Apr 3, 2014

 Might want to remove the old numpy compat for _hermite (will do tomorrow)
Member

### jseabold commented Apr 3, 2014

 Ok. Ping the PR when it's ready. (I won't be online tomorrow, likely).

### coveralls commented Apr 4, 2014

 Coverage remained the same when pulling b191422 on ev-br:edgeworth into c70a709 on statsmodels:master.
Contributor Author

### ev-br commented Apr 4, 2014

 Done. I think it's usable as is, even though there's plenty of room for improvement. For one, ppf (hence rvs) is sloth-slow since it goes through the generic path (no Cornish-Fisher).
Member

### jseabold commented Apr 4, 2014

 Thanks. Would you mind also making a bullet point note in docs/source/release/version0.6.rst. We're trying to be better about release notes and advertising improvements.
Contributor Author

### ev-br commented Apr 4, 2014

 Added a line. Feel free to edit or expand if desired.

### coveralls commented Apr 4, 2014

 Coverage remained the same when pulling 233b88b on ev-br:edgeworth into c70a709 on statsmodels:master.
Member

### jseabold commented Sep 26, 2014

 It looks like this could go in after a rebase?

### ev-br added some commits Jan 24, 2014

 ENH: add the Edgeworth expansion based on the normal distribution 
Given a set of cumulants, construct a matching distribution
as an asymptotic series starting from the normal distribution.
 a64fab6 
 MAINT: make it work with scipy < 0.13 
 377ea81 
 TST: remove an overly optimistic test 
 5d0e967 
 MAINT: remove hermite compat wrapper for numpy < 1.6 
 2427d58 
 DOC: add a note to release notes 
 539ab42 

### ev-brforce-pushed the ev-br:edgeworth branch from 233b88b to 539ab42Sep 28, 2014

Contributor Author

### ev-br commented Sep 28, 2014

 rebased

### coveralls commented Sep 28, 2014

 Coverage increased (+0.05%) when pulling 09016f7 on ev-br:edgeworth into cfebe9a on statsmodels:master.

### coveralls commented Sep 28, 2014

 Coverage increased (+0.05%) when pulling 3301a64 on ev-br:edgeworth into cfebe9a on statsmodels:master.
 MAINT: distributions: tweak the __init__ 
 dec35ee 

### ev-brforce-pushed the ev-br:edgeworth branch from 2e431ba to dec35eeSep 28, 2014

Contributor Author

### ev-br commented Sep 28, 2014

 I'm not sure why trying to import NoseWrapper in distributions.__init__ causes test failures.

### coveralls commented Sep 28, 2014

 Coverage increased (+0.05%) when pulling dec35ee on ev-br:edgeworth into cfebe9a on statsmodels:master.
Member

### jseabold commented Oct 8, 2014

 Thanks. Merge? The NoseWrapper circular imports happened with the recent warnings refactor. Probably needs a fix but not critical IMO.
Member

### jseabold commented Oct 9, 2014

 Gonna go ahead and merge this. Thanks.

### jseabold added a commit that referenced this pull request Oct 9, 2014

 Merge pull request #1325 from ev-br/edgeworth 
ENH: add the Edgeworth expansion based on the normal distribution
 953d944 

### jseabold merged commit 953d944 into statsmodels:master Oct 9, 2014 1 check passed

#### 1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

Closed