Make arrow dependency optional #537

Closed
colearendt opened this issue Oct 14, 2021 · 9 comments

Comments

@colearendt
Member

RHEL 7 systems (which our customers often use) do not have a new enough gcc to compile C++11 code. This creates a problem with installing the arrow package, and the workarounds often either (1) involve IT or (2) are disallowed by policy.

As a result, making the arrow dependency optional would be beneficial for customers.

@jonkeane
Contributor

I would be curious if you have any specific examples of issues you've experienced on RHEL 7 (especially recently). We run CentOS 7 (without any additional devtools installation [1]) as part of our CI process on every commit/PR in Arrow and expect that installing Arrow should just work [2]. If it doesn't, we would very very much love to hear about it and help resolve that (feel free to open a jira, or I'm happy to have logs emailed to me if there's need for privacy).

Here's a recent run of that on our CI: https://github.com/apache/arrow/runs/3997101335?check_suite_focus=true

[1] – We also test with devtools installed to make sure that upgrading gcc also works. Here's a recent build of that nightly CI job: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=14451&view=results
[2] – There are two features which are turned off on these builds: S3 support and the mimalloc memory allocator. Both of those require gcc >= 4.9.
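To see which side of the gcc 4.9 line a given machine falls on, something like the following could be run before installing. This is only a sketch: `version_ge` is a hypothetical helper name, and it assumes GNU `sort -V` is available (it is part of coreutils on CentOS 7).

```shell
#!/usr/bin/env bash
# Sketch: compare the system gcc version against the 4.9 threshold
# mentioned above. version_ge is a hypothetical helper, not part of arrow.

version_ge() {
  # Succeeds when version $1 >= version $2 (relies on GNU `sort -V`).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
  gcc_version="$(gcc -dumpversion)"
  if version_ge "$gcc_version" "4.9"; then
    echo "gcc $gcc_version: meets the gcc >= 4.9 requirement for S3 and mimalloc"
  else
    echo "gcc $gcc_version: below 4.9, so expect S3 and mimalloc to be turned off"
  fi
else
  echo "gcc not found on PATH"
fi
```

Note that newer gcc releases print only the major version for `-dumpversion` (for example `10`), which still compares correctly here.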

@hadley
Member

hadley commented Nov 16, 2021

Ok, I'm going to leave this as is then. @colearendt please let me know if you encounter any customers that this affects.

@hadley hadley closed this as completed Nov 16, 2021
@colearendt
Member Author

IIRC this was a customer that generated this issue. I can find out who if that would be helpful. They were having trouble with pins, so we suggested moving forwards to the new version, and then installing new pins failed on the arrow package.

Apologies for missing the message @jonkeane - I'll see if I can dig up any of this old info

@colearendt
Member Author

colearendt commented Nov 22, 2021

Updated slack thread with info from the customer. They installed devtoolset-10 and then arrow 3.0.0 and that resolved their issue. I wasn't able to install arrow into the default rstudio/r-base:4.0.3-centos7 image, but that may be something else. I'm going to move on to other things, but just wanted to record this info in case I end up back here.

[ 61%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/Unity/unity_1_cxx.cxx.o] Error 4
make[1]: *** [src/arrow/dataset/CMakeFiles/arrow_dataset_objlib.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 63%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx.o
In file included from /tmp/RtmpXxLJZh/file5603479b4169/src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_20_cxx.cxx:8:0:
/tmp/RtmpvsMnw8/R.INSTALL55d9b9ee75b/arrow/tools/cpp/src/arrow/filesystem/mockfs.cc:264:23: warning: ‘arrow::fs::internal::MockFileSystem::Impl’ has a field ‘arrow::fs::internal::MockFileSystem::Impl::root’ whose type uses the anonymous namespace [enabled by default]
 class MockFileSystem::Impl {
                       ^
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/Unity/unity_21_cxx.cxx.o] Error 4
make[2]: *** Waiting for unfinished jobs....
cc1plus: warning: unrecognized command line option "-Wno-subobject-linkage" [enabled by default]
make[1]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/all] Error 2
gmake: *** [all] Error 2
**** Error building Arrow C++.
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
ERROR: configuration failed for package ‘arrow’
* removing ‘/opt/R/4.0.3/lib/R/library/arrow’

@jonkeane
Contributor

Aaah, we've seen this before. g++: internal compiler error: Killed (program cc1plus) indicates that the compilation process was killed. There are other possibilities, but almost every time I've seen it, the cause was running in a memory-constrained environment and running out of memory (OOM) during the build. How much memory did you have available where this happened?
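For anyone trying to answer the memory question inside a container, a rough sketch along these lines may help. The helper names `mem_available_mb` and `cgroup_limit_mb` are hypothetical, and the cgroup paths are the usual v2 and v1 locations, which may differ on a given host.

```shell
#!/usr/bin/env bash
# Sketch: report how much memory a build is likely to have, in MiB.
# Both helper names are hypothetical; adjust paths for your environment.

mem_available_mb() {
  # Prefer MemAvailable; fall back to MemFree on kernels that lack it.
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ {a = $2}
       /^MemFree:/      {f = $2}
       END {printf "%d\n", (a != "" ? a : f) / 1024}' "$meminfo"
}

cgroup_limit_mb() {
  # cgroup v2 exposes memory.max; cgroup v1 exposes memory.limit_in_bytes.
  local f v
  for f in /sys/fs/cgroup/memory.max \
           /sys/fs/cgroup/memory/memory.limit_in_bytes; do
    if [ -r "$f" ]; then
      v="$(cat "$f")"
      if [ "$v" = "max" ]; then
        echo "unlimited"
      else
        echo $(( v / 1024 / 1024 ))
      fi
      return 0
    fi
  done
  echo "unknown"
}

if [ -r /proc/meminfo ]; then
  echo "available memory (MiB): $(mem_available_mb)"
fi
echo "cgroup memory limit (MiB): $(cgroup_limit_mb)"
```

In a Docker container the cgroup limit is often the more relevant number, since `/proc/meminfo` usually reflects the host.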

There are a bunch of factors that contribute to how much RAM is needed during compilation (which features are being compiled, which dependencies need to be compiled, how much parallelism is enabled). However, we've found that the biggest consumer of memory in our build process was building with unity enabled (which is exactly where this failed), so we've disabled that by default (starting with the 7.0.0 release) to reduce the chances of people running into this. Unity is supposed to speed up compilation at the expense of requiring more RAM, but in environments like Docker containers with relatively limited memory, that leads to situations like this. We are also working on adding RAM requirements to our docs (apache/arrow#11205) to make this a bit clearer. Hopefully these two changes together will resolve most cases of this, and when they don't, there will be clearer guidance on minimums.
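Since parallelism is one of the factors listed, one possible mitigation on a small machine is to cap the number of parallel compile jobs. The sketch below is illustrative only: `jobs_for_memory` is a hypothetical helper, the per-job memory budget is a placeholder rather than a measured figure, and it assumes the source build honours the standard `MAKEFLAGS` environment variable.

```shell
#!/usr/bin/env bash
# Sketch: choose a job count from available memory.
# jobs_for_memory AVAILABLE_MB PER_JOB_MB MAX_JOBS
# All three inputs are supplied by the caller; PER_JOB_MB is a guess
# you would need to tune, not a documented Arrow requirement.

jobs_for_memory() {
  local avail_mb="$1" per_job_mb="$2" max_jobs="$3"
  local jobs=$(( avail_mb / per_job_mb ))
  if [ "$jobs" -lt 1 ]; then
    jobs=1            # always allow at least one job
  fi
  if [ "$jobs" -gt "$max_jobs" ]; then
    jobs="$max_jobs"  # never exceed the CPU-based cap
  fi
  echo "$jobs"
}

echo "jobs for 4096 MiB at 2048 MiB per job, cap 4: $(jobs_for_memory 4096 2048 4)"

# Possible usage (commented out; assumes Rscript is installed and that
# MAKEFLAGS is respected by the build):
# MAKEFLAGS="-j$(jobs_for_memory 4096 2048 "$(nproc)")" \
#   Rscript -e "install.packages('arrow', repos = 'https://cloud.r-project.org')"
```

Fewer jobs means a slower build, so this trades time for a lower peak memory footprint.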

@MarkEdmondson1234

I'm seeing very long build times (60+ minutes) when trying to install arrow for the pins library within a Docker container via install.packages() on my CI, Cloud Build. Is there any advice on how to speed it up aside from booting a bigger machine, perhaps by using a different FROM image to build within?

I've looked at the apache-dev/arrow images, but it seems like some work to install R, tidyverse, etc. on top. Ideally, I'm looking for an image that already has arrow/pins.

@jonkeane
Contributor

jonkeane commented Jan 8, 2022

The quickest and easiest way to install Arrow is to do one of the following:

  • Install it as a binary R package from RStudio Package Manager
  • Set the environment variable NOT_CRAN=TRUE before installing. This sets up the installation process to download a binary of Arrow as part of the install process, which will be much, much quicker.
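As a rough sketch of what the two options could look like from a shell or a Dockerfile RUN step: the function names below are hypothetical, the repository URL is left to the caller, and the default CRAN mirror in the second function is an assumption you may want to change.

```shell
#!/usr/bin/env bash
# Sketch of the two install routes described above.
# Function names are hypothetical; Rscript must be on PATH to call them.

# Route 1: install from a binary package repository.
# The caller passes the repository URL; none is assumed here.
install_arrow_from_repo() {
  local repo_url="$1"
  Rscript -e "install.packages('arrow', repos = '${repo_url}')"
}

# Route 2: set NOT_CRAN=TRUE only for the duration of the install.
# The default mirror below is an assumption; pass another URL to override.
install_arrow_not_cran() {
  local repo_url="${1:-https://cloud.r-project.org}"
  NOT_CRAN=TRUE Rscript -e "install.packages('arrow', repos = '${repo_url}')"
}

# Example calls (commented out so the file can be sourced safely).
# The RStudio Package Manager URL is an assumed example for CentOS 7:
# install_arrow_from_repo "https://packagemanager.rstudio.com/all/__linux__/centos7/latest"
# install_arrow_not_cran
```

Scoping `NOT_CRAN` to the single command keeps it from affecting other package installs in the same session.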

There's more information in our documentation. I've linked to our nightly docs because we've recently improved them and they are quite a bit clearer about this — we're in the process of releasing right now so those will be on the main page soon enough!

There are a (relatively broad!) set of OSes that RSPM/we support — if you're finding that the image you're using isn't supported, please let us know and we'll see what we can do.

One final note about the apache-dev/arrow images: yeah, those are mostly for our CI process and aren't really designed to be used downstream. As far as I know, that's not a principled decision, and the Arrow community might be interested in extending those to be (more) usable like this. If that's something you're interested in (especially helping us do that!), I would recommend either opening an issue or sending a message to the dev mailing list to discuss this.

@MarkEdmondson1234

Great, thanks, will take a look. For some reason the RStudio binaries weren't being used, even though I remember that rocker images have defaulted to them since R 4.0, but I will look at those options. In general, having an R arrow Docker image available would be nice.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 27, 2022