Bootstrapping #420

Open
jseabold opened this Issue Aug 8, 2012 · 7 comments

Owner
jseabold commented Aug 8, 2012

See about incorporating this code into a bootstrapping framework. It's BSD licensed.

https://bitbucket.org/cevans/bootstrap/src

[Edit: Updated repo https://github.com/cgevans/scikits-bootstrap/]

Member
rgommers commented Aug 8, 2012

If you plan to work on this now, you may want to wait a few days. Someone just contacted me saying he had an updated version of that code that could be included in scipy. I also mentioned statsmodels and asked him to bring it up on the ML.

Owner
jseabold commented Aug 8, 2012

Sure. I likely won't be working on this anytime soon. There's plenty else to do right now.

cgevans commented Aug 8, 2012

As the author of the first package, I'd be happy to help, but I'm not really sure how the code or bootstrapping in general would fit into statsmodels. All I've been using the code for has been to get confidence limits for a function applied to independent data.
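For readers unfamiliar with that use case, here is a minimal sketch of a percentile bootstrap confidence interval for a function applied to independent observations. It is not taken from the linked package; the name `bootstrap_ci` and its parameters are hypothetical, and the plain percentile method is used instead of the more refined bca or abc intervals.

```python
import numpy as np


def bootstrap_ci(data, statfunc=np.mean, alpha=0.05, n_samples=10000, seed=None):
    """Percentile bootstrap confidence interval for statfunc(data).

    Hypothetical helper for illustration; assumes the rows of `data`
    are independent observations.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    # Resample rows with replacement and evaluate the statistic each time.
    stats = np.array(
        [statfunc(data[rng.integers(0, n, size=n)]) for _ in range(n_samples)]
    )
    # The interval limits are the empirical quantiles of the resampled statistic.
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper
```

Calling `bootstrap_ci(x, statfunc=np.median, seed=0)` would give approximate 95% limits for the median of `x`.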

Like @cgevans, I would be happy to help incorporate the code into statsmodels. In addition to using the code to calculate confidence limits for a statistic, or to calculate an effect size and its confidence limits when comparing two datasets, I also use it to calculate confidence bands for linear and non-linear regressions (by bootstrapping the residuals).
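A rough sketch of the residual bootstrap mentioned here, for a straight-line fit only. This is an illustration under stated assumptions, not the commenter's code: the function name `residual_bootstrap_band` is hypothetical, the model is ordinary least squares with an intercept, the errors are assumed exchangeable, and the band is a pointwise percentile band.

```python
import numpy as np


def residual_bootstrap_band(x, y, x_grid, alpha=0.05, n_samples=2000, seed=None):
    """Pointwise percentile confidence band for a straight-line fit.

    Hypothetical helper: resamples the OLS residuals, refits, and
    collects the refitted curves evaluated on x_grid.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)

    # Design matrices with an intercept column.
    X = np.column_stack([np.ones_like(x), x])
    X_grid = np.column_stack([np.ones_like(x_grid), x_grid])

    # Fit once on the original data and keep fitted values and residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted

    preds = np.empty((n_samples, x_grid.size))
    for i in range(n_samples):
        # Build a synthetic response from the fit plus resampled residuals.
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        preds[i] = X_grid @ beta_star

    lower, upper = np.percentile(
        preds, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return lower, upper
```

For a non-linear model the two `lstsq` calls would be replaced by whatever fitting routine is in use; the resampling step stays the same.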

Owner

I would appreciate it if someone worked more systematically on getting bootstrap into statsmodels. (Our current status is bits and pieces and examples.)

Bootstrap (or resampling in general including permutation tests) is a huge topic, and both packages could be pretty directly included with bootstrapping for univariate statistics and for one or two sample tests.
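As an illustration of the two-sample case, a minimal sketch of a permutation test for a difference in means. It is not code from either package; `permutation_test_mean_diff` is a hypothetical name, and the test assumes the observations are exchangeable under the null hypothesis.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=10000, seed=None):
    """Two-sided Monte Carlo permutation test for mean(a) - mean(b).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        # Shuffle the pooled sample and split it into groups of the original sizes.
        perm = rng.permutation(pooled)
        diff = perm[: a.size].mean() - perm[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```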

I don't see yet how either package can be applied to multivariate statistics or used in support of the (regression) models. Vectorizing (the statistic is 1d with more than one element) wouldn't be very difficult, but I don't think I have seen much for bca or abc confidence intervals in the multivariate case.

based on this we can work on extensions (or create separate code for other use cases)

one comment:
https://github.com/cgevans/scikits-bootstrap/blob/master/scikits/bootstrap/bootstrap.py#L181
outsourcing this is good, but I think an iterator would be more efficient (in terms of memory)
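To make the memory point concrete, a sketch of the iterator idea. It assumes the linked helper materializes all resampling indices at once as an `(n_samples, n)` array, which I have not verified against the current source; the generator below, with the hypothetical name `bootstrap_indices`, holds only one index vector at a time.

```python
import numpy as np


def bootstrap_indices(n, n_samples=10000, seed=None):
    """Yield one array of n resampling indices per bootstrap replicate.

    Hypothetical generator for illustration; memory use is O(n)
    rather than O(n_samples * n).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        yield rng.integers(0, n, size=n)
```

A caller can then write `[statfunc(data[idx]) for idx in bootstrap_indices(len(data))]` without ever storing the full index matrix.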

about unit tests: Constantine has already started. It isn't clear to me how to verify some of the code. Much of it is straightforward and regression tests should be enough, but more complicated methods like bca and abc could hide bugs.

My plan was to have pure bootstrap code (generic code and utilities, along with the code for Monte Carlo and Permutation tests) in statsmodels.resampling and application code in the corresponding directories. statsmodels.resampling is currently empty and the existing code is still spread out.

I'm busy with other things, but would be glad to review any pull requests.

Contributor
cdeil commented May 31, 2015

It would be great if the existing bootstrap functionality from those packages could be integrated into statsmodels (and updated to work with Python 3 if that's not already the case, see comment here).

Note: looks like @edschofield has started on implementing statsmodels.resampling in #2339.
