ARROW-17849: [R][Docs] Document changes due to C++17 for centos-7 users (apache#14440)

* `configure` checks that `R CMD config CXX17` is defined and exits early if not.
* README and install.Rmd discuss system dependencies; install.Rmd includes a bash script to run once to configure a CentOS 7 machine to install arrow. Confirmed manually via Docker that this works.
* Updated `r_docker_configure.sh` to use that script's logic, so our CI should confirm it too.
* Various cleanups in nixlibs.R: removed checks for gcc 4.8/4.9 throughout, and updated the system requirements check to handle the case of devtoolset-8 without openssl/curl.
* Unrelated: removed handling for rtools35 from the Windows build script.

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
nealrichardson and assignUser committed Oct 18, 2022
1 parent 5f5ea7b commit cd33544
Showing 11 changed files with 156 additions and 191 deletions.
1 change: 0 additions & 1 deletion .github/workflows/r.yml
@@ -211,7 +211,6 @@ jobs:
 - name: Build Arrow C++
 shell: bash
 env:
-RTOOLS_VERSION: ${{ matrix.config.rtools }}
 MINGW_ARCH: ${{ matrix.config.arch }}
 run: ci/scripts/r_windows_build.sh
 - name: Rename libarrow.zip
15 changes: 11 additions & 4 deletions ci/scripts/r_docker_configure.sh
@@ -72,11 +72,18 @@ fi
 if [[ -n "$DEVTOOLSET_VERSION" ]]; then
 $PACKAGE_MANAGER install -y centos-release-scl
 $PACKAGE_MANAGER install -y "devtoolset-$DEVTOOLSET_VERSION"
-
-# Only add make var if not set
-if ! grep -Fq "CXX17=" ~/.R/Makevars &> /dev/null; then
+
+# Enable devtoolset here so that `which gcc` finds the right compiler below
+source /opt/rh/devtoolset-${DEVTOOLSET_VERSION}/enable
+
+# Build images which require the devtoolset don't have CXX17 variables
+# set as the system compiler doesn't support C++17
+if [ ! "`{R_BIN} CMD config CXX17`" ]; then
 mkdir -p ~/.R
-echo "CXX17=g++ -std=gnu++17 -g -O2 -fpic" >> ~/.R/Makevars
+echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars
+echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars
+echo "CXX17STD = -std=c++17" >> ~/.R/Makevars
+echo "CXX17FLAGS = ${CXX11FLAGS}" >> ~/.R/Makevars
 fi
 fi
 
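The Makevars logic in `r_docker_configure.sh` can be sketched as a standalone function. This is a minimal sketch, not the CI script itself: `configure_cxx17_makevars` is a hypothetical name, and it takes the output of `R CMD config CXX17` as an argument so it can be exercised without R installed. It combines the new guard (skip when R already reports a C++17 compiler) with the idea behind the removed `grep -Fq` check (don't append a second `CXX17` line). Note that the committed guard spells `{R_BIN}` without a leading `$`: bash leaves that word literal, so the backquoted command fails with "command not found", prints nothing on stdout, and the test always passes. Taking the value as a parameter sidesteps that.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Arrow CI scripts).
#   $1: output of `R CMD config CXX17` (empty when no C++17 compiler is configured)
#   $2: Makevars path (normally ~/.R/Makevars)
#   $3: C compiler path, $4: C++ compiler path (e.g. from devtoolset)
configure_cxx17_makevars() {
  local current_cxx17="$1" makevars="$2" cc="$3" cxx="$4"

  # R already knows a C++17 compiler: nothing to do.
  if [ -n "$current_cxx17" ]; then
    return 0
  fi

  # A previous run already wrote a CXX17 line: don't append a duplicate.
  if [ -f "$makevars" ] && grep -Eq '^CXX17[[:space:]]*=' "$makevars"; then
    return 0
  fi

  mkdir -p "$(dirname "$makevars")"
  {
    echo "CC = ${cc} -fPIC"
    echo "CXX17 = ${cxx} -fPIC"
    echo "CXX17STD = -std=c++17"
  } >> "$makevars"
}

# Usage against a throwaway directory instead of the real ~/.R
tmp="$(mktemp -d)"
configure_cxx17_makevars "" "$tmp/.R/Makevars" /usr/bin/gcc /usr/bin/g++
configure_cxx17_makevars "" "$tmp/.R/Makevars" /usr/bin/gcc /usr/bin/g++  # no-op
cat "$tmp/.R/Makevars"
```

The duplicate check uses a regular expression that allows spaces around `=`, because the new lines are written as `CXX17 = ...` and the old fixed string `CXX17=` would not match them. In CI the compiler paths would come from `$(which gcc)` and `$(which g++)` after sourcing the devtoolset `enable` script.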
13 changes: 0 additions & 13 deletions ci/scripts/r_test.sh
@@ -26,19 +26,6 @@ pushd ${source_dir}
 
 printenv
 
-if [[ -n "$DEVTOOLSET_VERSION" ]]; then
-# enable the devtoolset version to use it
-source /opt/rh/devtoolset-$DEVTOOLSET_VERSION/enable
-
-# Build images which require the devtoolset don't have CXX17 variables
-# set as the system compiler doesn't support C++17
-mkdir -p ~/.R
-echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars
-echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars
-echo "CXX17STD = -std=c++17" >> ~/.R/Makevars
-echo "CXX17FLAGS = ${CXX11FLAGS}" >> ~/.R/Makevars
-fi
-
 # Run the nixlibs.R test suite, which is not included in the installed package
 ${R_BIN} -e 'setwd("tools"); testthat::test_dir(".")'
 
44 changes: 13 additions & 31 deletions ci/scripts/r_windows_build.sh
@@ -23,26 +23,15 @@ set -ex
 # Make sure it is absolute and exported
 export ARROW_HOME="$(cd "${ARROW_HOME}" && pwd)"
 
-if [ "$RTOOLS_VERSION" = "35" ]; then
-# Use rtools-backports if building with rtools35
-curl https://raw.githubusercontent.com/r-windows/rtools-backports/master/pacman.conf > /etc/pacman.conf
-pacman --noconfirm -Syy
-# lib-4.9.3 is for libraries compiled with gcc 4.9 (Rtools 3.5)
-RWINLIB_LIB_DIR="lib-4.9.3"
-# This is the default (will build for each arch) but we can set up CI to
-# do these in parallel
-: ${MINGW_ARCH:="mingw32 mingw64"}
-else
-# Uncomment L38-41 if you're testing a new rtools dependency that hasn't yet sync'd to CRAN
-# curl https://raw.githubusercontent.com/r-windows/rtools-packages/master/pacman.conf > /etc/pacman.conf
-# curl -OSsl "http://repo.msys2.org/msys/x86_64/msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz"
-# pacman -U --noconfirm msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz && rm msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz
-# pacman --noconfirm -Scc
-
-pacman --noconfirm -Syy
-RWINLIB_LIB_DIR="lib"
-: ${MINGW_ARCH:="mingw32 mingw64 ucrt64"}
-fi
+# Uncomment L38-41 if you're testing a new rtools dependency that hasn't yet sync'd to CRAN
+# curl https://raw.githubusercontent.com/r-windows/rtools-packages/master/pacman.conf > /etc/pacman.conf
+# curl -OSsl "http://repo.msys2.org/msys/x86_64/msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz"
+# pacman -U --noconfirm msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz && rm msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz
+# pacman --noconfirm -Scc
+
+pacman --noconfirm -Syy
+RWINLIB_LIB_DIR="lib"
+: ${MINGW_ARCH:="mingw32 mingw64 ucrt64"}
 
 export MINGW_ARCH
 
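The new default for `MINGW_ARCH` relies on the `${VAR:=default}` expansion behind the no-op `:` builtin: the default is assigned only when the variable is unset or empty, so a caller (for example the workflow in `r.yml`, which sets `MINGW_ARCH` from `matrix.config.arch`) can still restrict the build to one architecture. A minimal sketch; `default_arches` is a hypothetical wrapper used only to show the behavior:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the idiom used in r_windows_build.sh.
default_arches() {
  # ':' ignores its arguments; the expansion assigns the default only if
  # MINGW_ARCH is unset or empty. Quoting avoids needless word splitting.
  : "${MINGW_ARCH:=mingw32 mingw64 ucrt64}"
  echo "$MINGW_ARCH"
}

unset MINGW_ARCH
default_arches   # prints: mingw32 mingw64 ucrt64

MINGW_ARCH=ucrt64
default_arches   # prints: ucrt64
```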
@@ -78,26 +67,19 @@ fi
 if [ -d mingw64/lib/ ]; then
 ls $MSYS_LIB_DIR/mingw64/lib/
 # Make the rest of the directory structure
-# lib-4.9.3 is for libraries compiled with gcc 4.9 (Rtools 3.5)
-mkdir -p $DST_DIR/${RWINLIB_LIB_DIR}/x64
-# lib is for the new gcc 8 toolchain (Rtools 4.0)
 mkdir -p $DST_DIR/lib/x64
 # Move the 64-bit versions of libarrow into the expected location
-mv mingw64/lib/*.a $DST_DIR/${RWINLIB_LIB_DIR}/x64
-# These may be from https://dl.bintray.com/rtools/backports/
-cp $MSYS_LIB_DIR/mingw64/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/x64
+mv mingw64/lib/*.a $DST_DIR/lib/x64
 # These are from https://dl.bintray.com/rtools/mingw{32,64}/
-cp $MSYS_LIB_DIR/mingw64/lib/lib{zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
+cp $MSYS_LIB_DIR/mingw64/lib/lib{thrift,snappy,zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
 fi
 
 # Same for the 32-bit versions
 if [ -d mingw32/lib/ ]; then
 ls $MSYS_LIB_DIR/mingw32/lib/
-mkdir -p $DST_DIR/${RWINLIB_LIB_DIR}/i386
 mkdir -p $DST_DIR/lib/i386
-mv mingw32/lib/*.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
-cp $MSYS_LIB_DIR/mingw32/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
-cp $MSYS_LIB_DIR/mingw32/lib/lib{zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
+mv mingw32/lib/*.a $DST_DIR/lib/i386
+cp $MSYS_LIB_DIR/mingw32/lib/lib{thrift,snappy,zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
 fi
 
 # Do the same also for ucrt64
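The merged `cp` lines depend on bash expanding braces before globbing: `lib{thrift,snappy,brotli*}.a` first becomes one word per name, and words that still contain `*` are then matched against the directory. A small sketch with a made-up layout; the temporary directories and file names are placeholders standing in for `$MSYS_LIB_DIR/mingw64/lib` and `$DST_DIR/lib/x64`:

```bash
#!/usr/bin/env bash
set -e

# Placeholder source and destination directories.
src="$(mktemp -d)"
dst="$(mktemp -d)"
touch "$src"/lib{thrift,snappy,zstd,brotlicommon,brotlidec,brotlienc,ssl,ssh2}.a
touch "$src/libunrelated.a"

# Braces expand to libthrift.a, libsnappy.a, libzstd.a, libbrotli*.a, libss*.a;
# the two words containing '*' are then globbed against $src.
cp "$src"/lib{thrift,snappy,zstd,brotli*,ss*}.a "$dst"/

ls "$dst"
```

Without `nullglob`, a pattern that matches nothing (say no `libss*.a` exists) is passed to `cp` literally and `cp` fails; under the script's `set -ex` that stops the build, which surfaces a missing dependency early.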
13 changes: 4 additions & 9 deletions dev/tasks/r/github.packages.yml
@@ -163,7 +163,7 @@ jobs:
 rig default {{ '${{ matrix.r_version.r }}' }}$rig_arch
 rig system setup-user-lib
-rig system add-pak
+rig system add-pak
 {{ macros.github_setup_local_r_repo(false, true)|indent }}
 - name: Prepare Dependency Installation
 
@@ -275,18 +275,13 @@ jobs:
 ARROW_R_DEV: "TRUE"
 LIBARROW_BUILD: "FALSE"
 LIBARROW_BINARY: {{ '${{ matrix.config.libarrow_binary }}' }}
-DEVTOOLSET_VERSION: {{ '${{ matrix.config.devtoolset }}' }}
 shell: bash
 run: |
-if [[ "$DEVTOOLSET_VERSION" -gt 0 ]]; then
-# enable the devtoolset version to use it
-source /opt/rh/devtoolset-$DEVTOOLSET_VERSION/enable
-fi
 Rscript -e '
 {{ macros.github_test_r_src_pkg()|indent(8) }}
 '
 - name: Upload binary artifact
-if: matrix.config.devtoolset
+if: matrix.config.devtoolset
 uses: actions/upload-artifact@v3
 with:
 name: r-pkg_centos7
@@ -307,11 +302,11 @@ jobs:
pkg <- pkg[[1]]
warning("Multiple packages found! Using first one.")
}
# Install dependencies from RSPM
install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/centos7/latest")
remove.packages("arrow")
install.packages(pkg)
library(arrow)
read_parquet(system.file("v0.7.1.parquet", package = "arrow"))
23 changes: 16 additions & 7 deletions r/README.md
@@ -29,8 +29,8 @@ access to the Arrow C++ library API and higher-level access through a
 efficiency** (`read_csv_arrow()`, `read_json_arrow()`)
 - Write CSV files (`write_csv_arrow()`)
 - Manipulate and analyze Arrow data with **`dplyr` verbs**
-- Read and write files in **Amazon S3** buckets with no additional
-function calls
+- Read and write files in **Amazon S3** and **Google Cloud Storage**
+buckets with no additional function calls
 - Exercise **fine control over column types** for seamless
 interoperability with databases and data warehouse systems
 - Use **compression codecs** including Snappy, gzip, Brotli,
@@ -64,9 +64,18 @@ additional system dependencies. For macOS and Windows, CRAN hosts binary
 packages that contain the Arrow C++ library. On Linux, source package
 installation will also build necessary C++ dependencies. For a faster,
 more complete installation, set the environment variable
-`NOT_CRAN=true`. See `vignette("install", package = "arrow")` for
-details. Note that version 9.0.0 was the last version to support
-R 3.6 and lower on Windows.
+`NOT_CRAN=true`. See `vignette("install", package = "arrow")` for details.
+
+As of version 10.0.0, `arrow` requires C++17 to build. This means that:
+
+* On Windows, you need `R >= 4.0`. Version 9.0.0 was the last version to support
+R 3.6.
+* On CentOS 7, you can build the latest version of `arrow`,
+but you first need to install a newer compiler than the default system compiler,
+gcc 4.8. See `vignette("install", package = "arrow")` for guidance.
+Note that you only need the newer compiler to build `arrow`:
+installing a binary package, as from RStudio Package Manager,
+or loading a package you've already installed works fine with the system defaults.
 
 ### Installing a development version
 
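Whether a CentOS 7 machine needs the newer toolchain comes down to the gcc version. A minimal sketch, assuming the version string comes from `gcc -dumpversion`; `gcc_major_at_least` is a hypothetical helper, and the minimum of 8 simply matches the devtoolset-8 toolchain mentioned in the commit message, not a documented Arrow requirement:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a gcc version string such as "4.8.5"
# (or just "8", which some gcc builds print) has a major version >= $2.
gcc_major_at_least() {
  local version="$1" minimum="$2"
  local major="${version%%.*}"   # strip everything from the first "."
  [ "$major" -ge "$minimum" ]
}

if gcc_major_at_least "4.8.5" 8; then
  echo "new enough"
else
  echo "gcc 4.8: install a newer toolchain (e.g. devtoolset-8) before building arrow"
fi

# On a real machine: gcc_major_at_least "$(gcc -dumpversion)" 8
```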
@@ -134,7 +143,7 @@ returns an R `data.frame`. To return an Arrow `Table`, set argument
 - `read_json_arrow()`: read a JSON data file
 
 For writing data to single files, the `arrow` package provides the
-functions `write_parquet()`, `write_feather()`, and `write_csv_arrow()`.
+functions `write_parquet()`, `write_feather()`, and `write_csv_arrow()`.
 These can be used with R `data.frame` and Arrow `Table` objects.
 
 For example, let’s write the Star Wars characters data that’s included
@@ -266,7 +275,7 @@ sw %>%
 ```
 
 Additionally, equality joins (e.g. `left_join()`, `inner_join()`) are supported
-for joining multiple tables.
+for joining multiple tables.
 
 ```r
 jedi <- data.frame(
10 changes: 9 additions & 1 deletion r/configure
@@ -51,6 +51,14 @@ if [ "$ARROW_R_DEV" = "true" ] && [ -f "data-raw/codegen.R" ]; then
 ${R_HOME}/bin/Rscript data-raw/codegen.R
 fi
 
+if [ ! "`${R_HOME}/bin/R CMD config CXX17`" ]; then
+echo "------------------------- NOTE ---------------------------"
+echo "Cannot install arrow: a C++17 compiler is required."
+echo "See https://arrow.apache.org/docs/r/articles/install.html"
+echo "---------------------------------------------------------"
+exit 1
+fi
+
 if [ -f "tools/apache-arrow.rb" ]; then
 # If you want to use a local apache-arrow.rb formula, do
 # $ cp ../dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb tools/apache-arrow.rb
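The new `configure` guard treats an empty `R CMD config CXX17` as "no C++17 compiler". A minimal sketch of the same check as a function; `require_cxx17` is a hypothetical name, and it takes the config value as an argument instead of calling R so the logic can be tried on its own. `[ ! "$x" ]` in `configure` and `[ -z "$x" ]` below are equivalent tests for an empty string.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the configure check.
#   $1: output of `R CMD config CXX17` (empty when R has no C++17 compiler)
require_cxx17() {
  if [ -z "$1" ]; then
    echo "Cannot install arrow: a C++17 compiler is required." >&2
    return 1
  fi
  return 0
}

# In configure itself the value would come from R, along the lines of:
#   require_cxx17 "$("${R_HOME}/bin/R" CMD config CXX17)" || exit 1
require_cxx17 "g++ -std=gnu++17" && echo "C++17 compiler configured"
require_cxx17 "" || echo "configure would stop here"
```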
@@ -177,7 +185,7 @@ else
 # Assume nixlibs.R has handled and messaged about its failure already
 #
 # TODO: what about non-bundled deps?
-# Set CDPATH locally to prevent interference from global CDPATH (if set)
+# Set CDPATH locally to prevent interference from global CDPATH (if set)
 BUNDLED_LIBS=`CDPATH=''; cd $LIB_DIR && ls *.a`
 BUNDLED_LIBS=`echo "$BUNDLED_LIBS" | sed -e "s/\\.a lib/ -l/g" | sed -e "s/\\.a$//" | sed -e "s/^lib/-l/" | tr '\n' ' ' | sed -e "s/ $//"`
 PKG_DIRS="-L`pwd`/$LIB_DIR"
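The second `BUNDLED_LIBS` line turns the archive names listed by `ls *.a` into `-l` linker flags. A sketch of the same `sed`/`tr` pipeline as a function, with single quotes replacing the backquote-escaped `\\.`; `libs_to_flags` is a hypothetical name and the archive names are examples:

```bash
#!/usr/bin/env bash
# Hypothetical function reproducing the configure pipeline.
libs_to_flags() {
  printf '%s\n' "$@" |
    sed -e 's/\.a lib/ -l/g' |  # only matters if names arrive space-separated on one line
    sed -e 's/\.a$//' |         # drop the trailing .a
    sed -e 's/^lib/-l/' |       # libfoo -> -lfoo
    tr '\n' ' ' |               # join into one line
    sed -e 's/ $//'             # trim the trailing space
}

libs_to_flags libarrow.a libparquet.a libarrow_dataset.a
# prints: -larrow -lparquet -larrow_dataset
```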