Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add the Edgeworth expansion based on the normal distribution #1325

Merged
merged 6 commits into from Oct 9, 2014

Conversation

Projects
None yet
4 participants
@ev-br
Copy link
Contributor

commented Jan 24, 2014

This PR adds the Edgeworth expansion (normal distribution + cumulants), as discussed in this ML thread:
https://groups.google.com/forum/#!topic/pystatsmodels/so83mPP9nkE

(time flies) No significant changes since that email exchange last year.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Jan 24, 2014

Thanks Evgeni, it might take me a bit of time going back to this.

as motivation:
I coded a test recently, that used 4 term Edgeworth expansion to get better approximate p-values, because the test statistic has considerable skew in small samples.
(However, I gave up after mean and variance, because I worried finding more typos in another two pages of formulas will take too much time.)

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Jan 24, 2014

Well, I guess one advantage of this is that at least some part of the math is checked (FWIW) and covered by tests.
(on at least the chi square example, where we know the answers).
Speaking of which, I don't really understand the TravisCI failure --- distributions.test() passes locally with the scipy master, but statsmodels.test() errs left and right.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Jan 24, 2014

My guess is that it's a backwards compatibility issue. TravisCI uses the default scipy, which is 0.9.0 in Ubuntu precise

warnings.warn(mesg, UserWarning)

kwds.update({'momtype': 0}) # use pdf, not ppf in self.moment()
super(ExpandedNormal, self).__init__(**kwds)

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Jan 24, 2014

Member

given the test failure, I think kwds needs to include name

# compate the pdf
m, s = df, np.sqrt(2*df)
x = np.linspace(m - s, m + s, 10)
assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Jan 24, 2014

Member

shape parameter df has to be positional in older scipy

This comment has been minimized.

Copy link
@ev-br

ev-br Jan 24, 2014

Author Contributor

Indeed! Will fix both as soon as I can --- it's a bit awkward to do from
the phone :-)

/Zh
On Jan 24, 2014 4:36 PM, "Josef Perktold" notifications@github.com wrote:

In statsmodels/distributions/tests/test_edgeworth.py:

  •    assert_allclose(ne2.pdf(x), stats.norm.pdf(x, loc=3, scale=2))
    
  • def test_chi2_moments(self):
  •    # construct the expansion for \chi^2
    
  •    N, df = 6, 15
    
  •    cum = [_chi2_cumulant(n+1, df) for n in range(N)]
    
  •    ne = ExpandedNormal(cum, name='edgw_chi2')
    
  •    # compare the moments
    
  •    assert_allclose([_chi2_moment(n, df) for n in range(N)],
    
  •                    [ne.moment(n) for n in range(N)])
    
  •    # compate the pdf
    
  •    m, s = df, np.sqrt(2*df)
    
  •    x = np.linspace(m - s, m + s, 10)
    
  •    assert_allclose(ne.pdf(x), stats.chi2.pdf(x, df=df),
    

shape parameter df has to be positional in older scipy


Reply to this email directly or view it on GitHubhttps://github.com//pull/1325/files#r9154059
.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Jan 24, 2014

It looks like you use two features of the distributions that are only in very recent scipy.
Do you have a default name in scipy master?

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2014

After fixing the issues with the name and named args, there's still a numerical problem of some sort. Locally, all tests pass with scipy master, but with scipy 0.9 it fails to reconstruct a chi-square distribution [same failure on TravisCI]: pdf starts oscillating much earlier, hence ppf cannot be defined etc. Am wondering what were the relevant changes between 0.9 and 0.13. Will have a look into it.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Jan 25, 2014

the ppf brentq error is also because of a previous version of the ppf inversion that had fixed limits, .xa, .xb.

It was change in scipy to use an expanding search if the f at the bounds don't have the same sign.
base on the error message on TravisCI in _ppf_single_call

Note the old version required a large xa and xb to be set in the definition of the distribution.

I haven't tried the example, and don't know why the function itself would be different.
Also I assume you are testing against a recent numpy with polynomial while TravisCI uses your compatibility function. I have no idea if that matters.

@jseabold

This comment has been minimized.

Copy link
Member

commented Jan 28, 2014

Excuse to bump scipy version in master for 0.6.0?

@jseabold

This comment has been minimized.

Copy link
Member

commented Jan 28, 2014

Also time to discuss numpy version?

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Jan 28, 2014

Ubuntu LTS (recent scikit-learn discussion) ?

some of the goodies in scipy are too recent and we would have to bump too high, many of Evgeni's improvements are only one release old.

Do we need to move to more recent numpy than 1.6 ?

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2014

The current LTS has numpy 1.6 already.
Ubuntu 13.10 ships scipy 0.12 and numpy 1.7; the next LTS is going to be 14.04, due this Spring.
Unless I misread this: https://launchpad.net/ubuntu/+source/python-scipy, there's a chance of even having 0.13 in next LTS :-).
The previous LTS, 10.04 is definitely at the end of life, even its 'server edition'.

As far as this one is concerned:

  • named args & default distribution name (which require scipy 0.13) are not needed, are in fact, removed from this PR. No problem here
  • Hermite polynomials need numpy 1.6, so even the current LTS is OK.
  • Something with xa and xb: In fact I don't know when they were finally removed.

So far I did not manage to have a proper look at the failure beyond noticing that a (rather strange) combination of numpy 1.8 and scipy 0.9 still fails.

@josef-pkt josef-pkt added PR labels Feb 19, 2014

@josef-pkt josef-pkt referenced this pull request Feb 20, 2014

Merged

Bug norm expan shapes #1403

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

Can you rebase this on master? We've bumped our minimum versions and we'll see how Travis does now.

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Apr 3, 2014

@jseabold Done. I don't have a proper dev setup for statsmodels at the moment, so all I can say is that tests pass locally with scipy master & numpy 1.8. TravisCI?

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

Yes, travis should let us know.

@coveralls

This comment has been minimized.

Copy link

commented Apr 3, 2014

Coverage Status

Coverage remained the same when pulling c3ffcae on ev-br:edgeworth into c70a709 on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

Failure is unrelated. Good to merge?

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Apr 3, 2014

Might want to remove the old numpy compat for _hermite (will do tomorrow)

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 3, 2014

Ok. Ping the PR when it's ready. (I won't be online tomorrow, likely).

@coveralls

This comment has been minimized.

Copy link

commented Apr 4, 2014

Coverage Status

Coverage remained the same when pulling b191422 on ev-br:edgeworth into c70a709 on statsmodels:master.

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2014

Done.
I think it's usable as is, even though there's plenty of room for improvement. For one, ppf (hence rvs) is sloth-slow since it goes through the generic path (no Cornish-Fisher).

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 4, 2014

Thanks. Would you mind also making a bullet point note in docs/source/release/version0.6.rst. We're trying to be better about release notes and advertising improvements.

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2014

Added a line. Feel free to edit or expand if desired.

@coveralls

This comment has been minimized.

Copy link

commented Apr 4, 2014

Coverage Status

Coverage remained the same when pulling 233b88b on ev-br:edgeworth into c70a709 on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member

commented Sep 26, 2014

It looks like this could go in after a rebase?

@jseabold jseabold added this to the 0.6 milestone Sep 26, 2014

@ev-br ev-br force-pushed the ev-br:edgeworth branch from 233b88b to 539ab42 Sep 28, 2014

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2014

rebased

@ev-br ev-br force-pushed the ev-br:edgeworth branch from 79d0e88 to 09016f7 Sep 28, 2014

@coveralls

This comment has been minimized.

Copy link

commented Sep 28, 2014

Coverage Status

Coverage increased (+0.05%) when pulling 09016f7 on ev-br:edgeworth into cfebe9a on statsmodels:master.

@coveralls

This comment has been minimized.

Copy link

commented Sep 28, 2014

Coverage Status

Coverage increased (+0.05%) when pulling 3301a64 on ev-br:edgeworth into cfebe9a on statsmodels:master.

@ev-br ev-br force-pushed the ev-br:edgeworth branch from 2e431ba to dec35ee Sep 28, 2014

@ev-br

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2014

I'm not sure why trying to import NoseWrapper in distributions.__init__ causes test failures.

@coveralls

This comment has been minimized.

Copy link

commented Sep 28, 2014

Coverage Status

Coverage increased (+0.05%) when pulling dec35ee on ev-br:edgeworth into cfebe9a on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member

commented Oct 8, 2014

Thanks. Merge? The NoseWrapper circular imports happened with the recent warnings refactor. Probably needs a fix but not critical IMO.

@jseabold

This comment has been minimized.

Copy link
Member

commented Oct 9, 2014

Gonna go ahead and merge this. Thanks.

jseabold added a commit that referenced this pull request Oct 9, 2014

Merge pull request #1325 from ev-br/edgeworth
ENH: add the Edgeworth expansion based on the normal distribution

@jseabold jseabold merged commit 953d944 into statsmodels:master Oct 9, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

@ev-br ev-br deleted the ev-br:edgeworth branch Dec 13, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.