Feature Request: Boost dependencies should be modular #545
Thanks for the request and tips about Boost. I didn't even know they'd done that---we've been using it since before it moved to GitHub and I haven't really paid attention to their version control software. Any recommendations for integrating with the submodule structure that we already have: CmdStan, RStan, PyStan depend on Stan depend on Math. Stan uses phoenix as part of the parser, for instance. Part of the reason we just tossed in all of Boost is that it's easier on our developers---we don't have to fiddle with new bits of Boost as we introduce them (we haven't used any of the non-header-only libs). In R, we depend on Boost through the BH package, which I don't believe is 100% of Boost. And I know Jiqiang used to have a way of cutting down Boost. It's surprising how big it is even after only including what you use---the libraries we use tend to make liberal use of other Boost libs. I'd like to hear from the repo managers: @betanalpha, @seantalts, @syclik, @bgoodri, @ariddell |
mkdir -p /tmp/boost_1.62.0
cd /path/to/stan_math
find stan -name \*\.\[ch]pp -exec bcp --scan --boost=lib/boost_1.62.0 '{}' /tmp/boost_1.62.0/ \; &> /tmp/boost_1.62.0/bcp.log
cd /path/to/stan
find src -name \*\.\[ch]pp -exec bcp --scan --boost=/path/to/stan_math/lib/boost_1.62.0 '{}' /tmp/boost_1.62.0/ \; &> /tmp/boost_1.62.0/bcp.log
|
@bob-carpenter that's an interesting point about the boost libs getting used in stan as well as math. I didn't realize that would be the case. My own use case must be a corner case that doesn't use that subset of functionality in stan. @bgoodri since you already ran that, do you mind telling me the /tmp/boost size vs the ./lib/boost size? |
I'm happy with culling parts of boost if it can be done reliably and automatically. @joincamp the binaries (or Python extension modules) that Stan generates can be rather big. For example, |
@ariddell I'm a few levels removed from this, so I'm trying to make sense of it. I'm using the project https://github.com/facebookincubator/prophet which depends on pystan. I'm trying to make this all fit into a deployable package to run on AWS lambda (Ephemeral disk capacity 512MB or uncompressed zip/jar size 250 MB, depending on the approach I take). There are some strict limitations on uncompressed disk usage, and when I investigated the site-packages directory in my python virtual environment here are the heavy hitters.
After some very inelegant hacking and slashing, I was able to trim down a workable (for me) version that has these sizes:
...mostly by deleting modules from the Boost library (on the incorrect assumption that Boost was only being used by Math, and not downstream in Stan). Since I was able to make my particular use case work, I was hoping to go about this the right way and see if there was a more general solution. Because my assumption was incorrect, that may not be possible. I might just be in the wild west of monkey-patching. |
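The size survey described above (finding the heavy hitters in a virtual environment's site-packages directory) can be sketched roughly as follows. This is a minimal illustration, not the tool used in this thread: the function names `tree_size` and `heavy_hitters` are hypothetical, and the sizes are summed file sizes in bytes, which can differ somewhat from the on-disk usage that `du` reports.

```python
import os
from pathlib import Path


def tree_size(root):
    """Sum the sizes in bytes of all regular files under root.

    Symbolic links are skipped so that linked files are not counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def heavy_hitters(site_packages, top=10):
    """Return (name, bytes) for the largest top-level entries, biggest first."""
    sizes = []
    for entry in Path(site_packages).iterdir():
        if entry.is_symlink():
            continue
        size = tree_size(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((entry.name, size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Pointing `heavy_hitters` at the site-packages directory of the environment to be deployed gives a ranked list to compare against the AWS Lambda limits mentioned above.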
Boost is certainly a heavy hitter.
If prophet uses a fixed model you could probably throw away all the source (and _api). |
That should be perfect. It uses pkl files for the models. |
On Wed, May 3, 2017 at 3:32 PM, joincamp ***@***.***> wrote:
@bgoodri <https://github.com/bgoodri> since you already ran that, do you
mind telling me the /tmp/boost size vs the ./lib/boost size?
66M vs. 139M
|
70MB is better than 140MB, but still seems ridiculous given how few functions in
Boost we call! It's a nest of package-level includes.
- Bob
|
We don't need googletest at runtime, just for dev.
- Bob
|
gtest and cpplint are already excluded from the PyStan distribution. |
Looks like 10M of that is logs, too; the actual boost dir is only 55M.
|
For the record, here's what PyStan trims during release. Suggestions for additions would be welcome. If there are parts of boost we're certain are not going to be used, we can, at least, stop distributing a copy.
(from MANIFEST.in) |
@joincamp, I have exactly the same issue. Would you mind sharing a list of the modules that you removed from the Boost library or any other libraries? It seems you somehow managed to reduce the size of |
@depet I ran through some general cleaning procedures that I found elsewhere. It basically amounted to removing .pyc files, documentation, and tests. Here is a gist of the files in my distributable. Maybe you can diff it to yours to get an idea of what you might be able to remove. https://gist.github.com/joincamp/69f32ee84ef1eb9c1dfac2c5b4449739 |
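The general cleaning described above (removing .pyc files, documentation, and tests) could be approached with a dry-run listing like the sketch below. The directory names and suffixes are assumptions, not the exact procedure used here, and the function name `find_removable` is hypothetical. Some packages load files from their test or doc directories at run time, so any list like this should be reviewed and the deployed project regression-tested before anything is deleted.

```python
import os

# Assumed patterns; adjust for the packages actually being deployed.
REMOVABLE_DIRS = {"__pycache__", "tests", "test", "doc", "docs"}
REMOVABLE_SUFFIXES = (".pyc", ".pyo")


def find_removable(root):
    """Return sorted paths under root that match the removable patterns.

    Matching directories are reported once and not descended into.
    Nothing is deleted; this only builds a list for review.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name in REMOVABLE_DIRS:
                found.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # prune so os.walk skips this directory
        for name in filenames:
            if name.endswith(REMOVABLE_SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Keeping the step as a listing rather than a deletion makes it easy to diff the result against a known-good distributable, such as the gist linked above.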
Thanks a lot @joincamp - much appreciated. |
Closing this issue as the Boost lib has since been upgraded a bunch of times and we also prune it now to reduce the size. We could prune it further if this was desired. Please open another issue if that is the case. |
Summary:
The Boost super-library is included in full, but only a few of its modules appear to be used. The super-library is very large, which makes it hard to deploy Stan (PyStan in my case) to platforms that have size limitations (AWS Lambda in my case). Rather than bundling the full super-library, only the necessary modules should be included.
Description:
It appears that only the following boost modules are directly used. (Most likely need to use boostdep to determine the full subset of modules in use)
I don't do much in the way of C++, but I think this project should be using https://svn.boost.org/trac/boost/wiki/ModularBoost instead, or just manually embedding the subset of libraries used.
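As a rough way to see which parts of Boost are included directly, the sources can be scanned for `#include <boost/...>` lines. The sketch below is only an illustration: the function name `direct_boost_includes` is hypothetical, and it reports the first path component after `boost/`, which does not always correspond one-to-one to a Boost module (a header sitting directly in `boost/` is reported by its file name). It also finds direct includes only; the transitive dependencies are what tools such as boostdep and bcp resolve.

```python
import os
import re

# Matches e.g.  #include <boost/math/special_functions/gamma.hpp>
# and captures the first path component after "boost/".
BOOST_INCLUDE = re.compile(
    r'^\s*#\s*include\s*[<"]boost/([^/>"]+)', re.MULTILINE
)


def direct_boost_includes(source_root, extensions=(".hpp", ".cpp")):
    """Return the sorted set of top-level Boost names included under source_root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as handle:
                    found.update(BOOST_INCLUDE.findall(handle.read()))
    return sorted(found)
```

Running this over both the Math and Stan source trees would show whether a component such as phoenix is included anywhere before it is removed.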
Reproducible Steps:
In my case, using a project that depends on pystan (fbprophet), delete the boost modules that are not referenced anywhere in stan math (e.g. phoenix), then regression-test the parent project.
Current Output:
When modules like phoenix are removed, there do not appear to be any regressions in downstream projects.
Expected Output:
Much smaller distributable sizes.
Additional Information:
Maybe http://www.boost.org/doc/libs/master/tools/boostdep/doc/html/ can be used to ensure the right modules are included, and http://www.boost.org/doc/libs/1_64_0/tools/bcp/doc/html/index.html to generate the distributable.
Current Version:
v2.15.0