Merge branch 'master' into gd-convergence-tolerance
Showing 2,648 changed files with 259,685 additions and 54,755 deletions.
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
license the work to the project under the project's open source license. Whether or not you
state this explicitly, by submitting any copyrighted material via pull request, email, or
other means you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
The commit also adds an ignore file for the R package, excluding compiled objects and generated documentation:

    *.o
    *.so
    *.Rd
    lib
    pkg/man
    pkg/html
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated using
`roxygen2`. After making changes to the documentation, you can generate the man
pages by running the following from an R console in the SparkR home directory:

    library(devtools)
    devtools::document(pkg="./pkg", roclets=c("rd"))

You can verify that your changes are good by running

    R CMD check pkg/
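For context, `roxygen2` builds each man page from a `#'` comment block directly above the documented function. A minimal, hypothetical annotated function (not a SparkR API, purely an illustration of the tags `devtools::document` consumes) looks like this:

```r
#' Add two numbers
#'
#' @param x A number
#' @param y A number
#' @return The sum of x and y
#' @export
add <- function(x, y) {
  x + y
}
```

Running `devtools::document` on a package containing this block generates an `add.Rd` man page from the `#'` comments.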
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run

```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

    ./bin/sparkR

The `sparkR` script automatically creates a SparkContext, running Spark in local mode by default. To specify the Spark master of a cluster for the automatically created SparkContext, you can run

    ./bin/sparkR --master "local[2]"

To set other options like driver memory, executor memory etc., you can pass in the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
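As an illustration, `--driver-memory` (a standard spark-submit option) can be combined with the master URL like any other argument:

    ./bin/sparkR --driver-memory 2g --master "local[2]"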

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:

```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes), you can re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run the existing unit tests using the `run-tests.sh` script as described below.
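For R-only changes, the edit-test loop can be sketched as follows (run from the Spark source root, after Spark itself has been built once):

```
# Re-install the SparkR package after editing R files (no Scala rebuild needed)
R/install-dev.sh
# Run the SparkR unit tests
R/run-tests.sh
```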

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, run the script `R/create-docs.sh`. This script uses `devtools` and `knitr`, so these packages need to be installed on the machine before running it.
### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

    ./bin/sparkR examples/src/main/r/dataframe.R

You can also run the unit tests for SparkR by running the following (note that you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

    R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
    ./R/run-tests.sh
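New unit tests follow the usual `testthat` conventions. A minimal, self-contained sketch (a generic example, not an actual SparkR test) looks like:

```r
library(testthat)

# test_that() groups related assertions under a description;
# expect_equal() fails the test if the two values differ.
test_that("basic arithmetic works", {
  expect_equal(1 + 1, 2)
})
```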
### Running on YARN

The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN configuration directory before doing so. For example, on CDH you can run
```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required:

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.
2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`.