[SPARK-31701][R][SQL] Bump up the minimum Arrow version as 0.15.1 in SparkR

### What changes were proposed in this pull request?

This PR proposes to set the minimum Arrow version to 0.15.1 to be consistent with the PySpark side.

### Why are the changes needed?

Matching the Arrow versions reduces the maintenance overhead and narrows the supported version range. Arrow optimization in SparkR is still experimental.

### Does this PR introduce _any_ user-facing change?

No, the change affects unreleased branches only.

### How was this patch tested?

0.15.x was already tested in SPARK-29378, and we are currently testing the latest version of SparkR in AppVeyor. I also tested it manually.

Closes apache#28520 from HyukjinKwon/SPARK-31701.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon authored and rshkv committed Jul 15, 2020
1 parent cff4c3f commit a659c98
Showing 2 changed files with 5 additions and 10 deletions.
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -23,7 +23,7 @@ Suggests:
 testthat,
 e1071,
 survival,
-arrow
+arrow (>= 0.15.1)
 Collate:
 'schema.R'
 'generics.R'
13 changes: 4 additions & 9 deletions docs/sparkr.md
@@ -648,20 +648,15 @@ Apache Arrow is an in-memory columnar data format that is used in Spark to effic
 
 ## Ensure Arrow Installed
 
-Arrow R library is available on CRAN as of [ARROW-3204](https://issues.apache.org/jira/browse/ARROW-3204). It can be installed as below.
+Arrow R library is available on CRAN and it can be installed as below.
 
 ```bash
 Rscript -e 'install.packages("arrow", repos="https://cloud.r-project.org/")'
 ```
+Please refer [the official documentation of Apache Arrow](https://arrow.apache.org/docs/r/) for more detials.
 
-If you need to install old versions, it should be installed directly from Github. You can use `remotes::install_github` as below.
-
-```bash
-Rscript -e 'remotes::install_github("apache/arrow@apache-arrow-0.12.1", subdir = "r")'
-```
-
-`apache-arrow-0.12.1` is a version tag that can be checked in [Arrow at Github](https://github.com/apache/arrow/releases). You must ensure that Arrow R package is installed and available on all cluster nodes.
-The current supported minimum version is 0.12.1; however, this might change between the minor releases since Arrow optimization in SparkR is experimental.
+Note that you must ensure that Arrow R package is installed and available on all cluster nodes.
+The current supported minimum version is 0.15.1; however, this might change between the minor releases since Arrow optimization in SparkR is experimental.
 
 ## Enabling for Conversion to/from R DataFrame, `dapply` and `gapply`
 
