Binary builds from RSPM blow up image size #340

Closed
benz0li opened this issue Jan 26, 2022 · 21 comments · Fixed by #467
Labels: enhancement, help wanted

Comments

@benz0li
Contributor

benz0li commented Jan 26, 2022

Reason: The dynamic libraries are not stripped.

rocker/tidyverse (Ubuntu 20.04, RSPM, binary install):

root@89f0a27ec269:/usr/local/lib/R/site-library/vroom/libs# du -h vroom.so 
24M	vroom.so
root@89f0a27ec269:/usr/local/lib/R/site-library/vroom/libs# file vroom.so
vroom.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=88a2384029c2331d1d8f91274db23fb30edb9668, with debug_info, not stripped

custom build (Debian 11, CRAN, source install):

root@bfc0aad5ae64:/usr/local/lib/R/site-library/vroom/libs# du -h vroom.so 
848K	vroom.so
root@bfc0aad5ae64:/usr/local/lib/R/site-library/vroom/libs# file vroom.so
vroom.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=f6d17acb2cf4f8e75af5e97dcb7a17297ff5f9ee, stripped

Total size of /usr/local/lib/R/site-library:

  • rocker/tidyverse: 732 MB
  • custom build: 198 MB

@eddelbuettel Can you think of a reason why the dynamic libraries of the RSPM binaries are not stripped?
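
To see which packages account for the extra size, the `file` check shown above can be run across the whole library. The following is a minimal sketch, assuming the `file` utility is installed in the image; the function name `list_unstripped` is made up for illustration, and the default path is the site-library used in this thread.

```shell
#!/bin/sh
# Print the path of every shared object under an R library whose
# file(1) description contains "not stripped".
# Assumption: paths contain no ':' (everything from the first ':' on is cut).
list_unstripped() {
    lib_dir="${1:-/usr/local/lib/R/site-library}"
    find "$lib_dir" -type f -name '*.so' -exec file {} + 2>/dev/null |
        sed -n '/not stripped/s/:.*//p'
}
```

Calling `list_unstripped` with no argument scans the default path; piping the result to `wc -l` gives a quick count of unstripped libraries.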

@eitsupi
Member

eitsupi commented Jan 26, 2022

FYI, since rocker/tidyverse:devel has source installations, we can see this in the comparison between rocker/tidyverse:latest and rocker/tidyverse:devel.

@eddelbuettel
Member

@benz0li No idea. I am not all that involved with RSPM and use it little. You could try to strip post-installation... The reason may be a simple -g default in the system-wide compiler settings.

@benz0li
Contributor Author

benz0li commented Jan 27, 2022

@eddelbuettel Thanks for your feedback. Stripping the dynamic libraries post-installation reduces the size of /usr/local/lib/R/site-library as expected.

rocker/tidyverse (Ubuntu 20.04, RSPM, binary install):

root@769bd0aff019:/# du -sh /usr/local/lib/R/site-library/
732M	/usr/local/lib/R/site-library/
root@769bd0aff019:/# find /usr/local/lib/R/site-library/*/libs/ -name \*.so | xargs strip -s -p
root@769bd0aff019:/# du -sh /usr/local/lib/R/site-library/
198M	/usr/local/lib/R/site-library/

@eitsupi If you include stripping [the dynamic libraries] in your Dockerfiles/installation-scripts, it would reduce the image size significantly.
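
The one-liner above can be wrapped so that it tolerates unusual file names and reports the saving. This is a minimal sketch, assuming binutils `strip` is available in the image and reusing the flags from the command above (`-s` removes all symbols, `-p` preserves timestamps); the function name `strip_site_library` is hypothetical.

```shell
#!/bin/sh
# Strip every shared object in <lib>/<pkg>/libs/ and print the library
# size before and after, in kilobytes.
strip_site_library() {
    lib_dir="${1:-/usr/local/lib/R/site-library}"
    if [ ! -d "$lib_dir" ]; then
        echo "no such directory: $lib_dir" >&2
        return 1
    fi
    before=$(du -sk "$lib_dir" | cut -f1)
    # -exec ... {} + passes file names directly to strip, so names with
    # spaces are handled, unlike an unquoted find | xargs pipeline.
    find "$lib_dir" -type f -path '*/libs/*' -name '*.so' \
        -exec strip -s -p {} +
    after=$(du -sk "$lib_dir" | cut -f1)
    echo "site-library: ${before}K -> ${after}K"
}
```

In a Dockerfile this would have to run in the same RUN instruction as the package installation, because a later layer cannot shrink files that an earlier layer already stores.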

@benz0li
Contributor Author

benz0li commented Jan 27, 2022

I will contact RStudio and ask if there is a particular reason why the dynamic libraries of the RSPM binaries are not stripped.

@cboettig
Member

🎉 Thanks for the report, the suggestion to strip the shared libs on the images here and the follow up with RStudio :-)

@benz0li
Contributor Author

benz0li commented Jan 28, 2022

@jjallaire made sure this information gets to the right person.

@bdeitte

bdeitte commented Jan 28, 2022

I'm not sure on the reason here, and I'm checking with the team. You're also welcome to tag myself or @tylfin in the future and we can try to answer public package manager questions that come up.

@eddelbuettel
Member

eddelbuettel commented Jan 28, 2022

It may be a side effect of how 'default' compilation on Linux is set up. Debian, for example, always builds with -g ("devs want debug symbols") but then puts them into optional -dbg packages you can load on demand (which is actually slick). But if you then do not strip the symbols, you end up with bloat :-/ I blogged about that a few times:

but let's just say that I didn't manage to get R CMD INSTALL changed to use stripping by default 😿

So my suggestion above to do it locally (after RSPMs are unpacked) may be best for Rocker users; you guys could and should look into stripping at source. It will make things smaller and hence faster to ship across the wire and install.

@bdeitte

bdeitte commented Jan 29, 2022

This has started an interesting conversation here, so thanks for bringing this up. The thinking so far is that this wasn't originally a conscious choice; we just run R's package install command with mostly default flags/options. @glin found the comment about --strip interesting but also noted that the R extensions manual says CRAN prohibits stripping debug symbols:

1.6 Writing portable packages
The strip utility is platform-specific (and CRAN prohibits removing debug symbols). For example the options --strip-debug and --strip-unneeded of the GNU version are not supported on macOS nor Solaris: the POSIX standard for strip does not mention any options, and what calling it without options does is platform-dependent. Stripping a .so file could even prevent it being dynamically loaded into R on an untested platform.

Our suggestion right now would be to have you strip this as part of building your images, but we're still discussing it.

@benz0li
Contributor Author

benz0li commented Jan 29, 2022

Thanks for all the additional information. Very interesting, indeed.

The following has no effect on package installations from RSPM when building the rocker images:

## ensure installation is stripped
Sys.setenv("_R_SHLIB_STRIP_"="true")

Regarding 1.6 Writing portable packages: As RSPM serves pre-compiled binaries that are specific per Linux distribution + version, is there any harm in stripping the .so files then?

@eddelbuettel
Member

"The following has no effect on package installations"

Yes. It would only have an effect if you compiled and linked, so that a strip step could be added at link time. But by using RSPM you are relying on pre-made binaries by design, so there is no link step. You would need something like (riffing here, untested, but I have done this ...) strip /usr/local/lib/R/site-library/*/libs/*.so (and same for /usr/lib/....).

Maybe at this point you could test the "is there any harm" question via a locally-modified container? There shouldn't ...
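
The distinction can be made concrete: `_R_SHLIB_STRIP_` is read when R builds a package's shared object, so it matters for source installs only. The sketch below assumes R >= 3.6 and local source tarballs; `install_from_source_stripped` and the `R_BIN` override are hypothetical names used for illustration.

```shell
#!/bin/sh
# Install source tarballs with stripping enabled at build time.
# The variable is set only for the R process that compiles and links;
# as far as I know, `R CMD INSTALL --strip` is the command-line equivalent.
# R_BIN (hypothetical) lets a different R executable be substituted.
install_from_source_stripped() {
    r_bin="${R_BIN:-R}"
    for tarball in "$@"; do
        _R_SHLIB_STRIP_=true "$r_bin" CMD INSTALL "$tarball" || return 1
    done
}
```

A pre-built binary from RSPM never goes through this build step, which is why setting the variable in the R session has no effect on it.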

@benz0li
Contributor Author

benz0li commented Jan 30, 2022

Not actively using rocker images myself as I have moved to Jupyter (JupyterHub with JupyterLab + code-server) for some time now.
ℹ️ Multi-arch (linux/amd64, linux/arm64/v8) Debian-based images with stripped source-installations of Git, R and add-on packages from CRAN — no problems so far.

@benz0li
Contributor Author

benz0li commented Mar 1, 2022

Closing due to inactivity.

@benz0li benz0li closed this as completed Mar 1, 2022
@eitsupi
Member

eitsupi commented Mar 1, 2022

I was hoping to hear from @bdeitte about his future plans rather than strip every installation in this repository now.

@eitsupi eitsupi reopened this Mar 1, 2022
@benz0li
Contributor Author

benz0li commented Mar 8, 2022

IMHO 1.6 Writing portable packages does not apply for RSPM.

RSPM isn't CRAN and serves pre-compiled binaries that are specific per Linux distribution + version.

@benz0li
Contributor Author

benz0li commented Apr 1, 2022

"Our suggestion right now would be to have you strip this as part of building your images, but we're still discussing it."

@bdeitte Seems to be quite a lengthy discussion. Have you come to a conclusion yet?

@eitsupi eitsupi added the enhancement and help wanted labels Apr 2, 2022
@bdeitte

bdeitte commented Apr 27, 2022

Hi @benz0li the key part of the conversation was above. We asked some other groups on their thoughts here, but there wasn't any conclusion to it. We haven't made any changes here, and I've now checked with the team to make sure others have seen the replies in here.

@benz0li
Contributor Author

benz0li commented May 4, 2022

Then stripping post-installation it remains.

@benz0li benz0li closed this as completed May 4, 2022
@eitsupi
Member

eitsupi commented May 4, 2022

Please keep it open as I will do that after #441 is merged.

@glin

glin commented Nov 2, 2022

Hey all, one more update from RSPM here - we're now stripping debug symbols from binary packages by setting _R_SHLIB_STRIP_=true, so Linux binaries for R >= 3.6 will be stripped. This affects any new packages updated since November 1, 2022, and all existing packages were left as-is without stripped symbols.

Since existing packages were left as-is, we still recommend leaving in the manual post-install strip step, but the latest CRAN packages will eventually get smaller by default as packages update over time. And any newly added distros or R versions from now on will have all their binaries stripped, including older packages down to 2017.

Sorry for taking so long. The holdup came down to being unsure whether any users would miss the debug symbols in the binaries, and then to thinking of ways to provide both stripped and unstripped versions of binaries, which makes things more complicated. In the end, we were convinced that users probably wouldn't miss the debug symbols and would likely have to compile from source anyway for debugging. And even if some do, the benefits of significantly smaller binaries (bandwidth, performance, storage costs, etc.) outweigh those niche debugging cases. Thanks for bringing up the issue originally.

@cboettig
Member

cboettig commented Nov 4, 2022

This is wonderful. Thanks @glin for the great work you do and for taking the time to share the update with us. We really appreciate it!
