Connecting RStudio to Sparkling Water
If you have connected to H2O from RStudio before, the process for connecting to Sparkling Water from RStudio is very similar.
1. Start Sparkling Shell
export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"
bin/sparkling-shell
To check the status of Sparkling Shell in the Spark web UI, go to http://localhost:4040/.
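The MASTER value above uses Spark's local-cluster[workers, cores per worker, memory per worker in MB] form, so local-cluster[3,2,1024] asks for 3 workers with 2 cores and 1024 MB each. The Scala sketch below decodes such a string and prints the totals it implies, which can help when sizing the cluster before starting Sparkling Shell. The LocalClusterSpec object is a hypothetical helper, not part of Spark or Sparkling Water.

```scala
// Hypothetical helper (not a Spark API): decodes a Spark
// "local-cluster[workers,cores,memoryMB]" master string and its totals.
object LocalClusterSpec {
  final case class Spec(workers: Int, coresPerWorker: Int, memoryPerWorkerMb: Int) {
    def totalCores: Int = workers * coresPerWorker
    def totalMemoryMb: Int = workers * memoryPerWorkerMb
  }

  // local-cluster[N, C, M], allowing whitespace around the numbers
  private val Pattern = """local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]""".r

  def parse(master: String): Option[Spec] = master.trim match {
    case Pattern(n, c, m) => Some(Spec(n.toInt, c.toInt, m.toInt))
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    // Reads MASTER from the environment, falling back to the value used above
    val master = sys.env.getOrElse("MASTER", "local-cluster[3,2,1024]")
    parse(master) match {
      case Some(s) =>
        println(s"workers=${s.workers} cores/worker=${s.coresPerWorker} " +
          s"memory/worker=${s.memoryPerWorkerMb}MB total cores=${s.totalCores} " +
          s"total memory=${s.totalMemoryMb}MB")
      case None =>
        println(s"'$master' is not a local-cluster master URL")
    }
  }
}
```

For the value above this reports 6 cores and 3072 MB in total across the three workers.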
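2. Start H2O
In Sparkling Shell, start the H2O services:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()  // starts H2O on the Spark cluster; sc is the shell's SparkContext
import h2oContext._                          // imports the context's members, including its implicit conversions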
The last line of the output (appearing above the scala command prompt in the screenshot above) identifies the IP and port number of the H2O cluster. Copy these numbers to use in the next step.
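If you would rather not copy the numbers by hand, the Scala sketch below extracts the IP and port from the last output line and prints a matching h2o.init() call. It assumes the line contains a URL of the form http://IP:port, as in the sample line used here ("Open H2O Flow in browser: http://127.0.0.1:54321 ..."); that sample text and the H2OAddress name are assumptions, so check them against your own output. The ip and port arguments belong to the h2o R package's h2o.init().

```scala
// Hypothetical helper (not part of Sparkling Water): pulls the IP and port out of
// the last line of the H2OContext output. The sample line format is an assumption.
object H2OAddress {
  // Captures host and port from the first "http(s)://host:port" in a line
  private val UrlPattern = """https?://([^:/\s]+):(\d+)""".r

  def parse(line: String): Option[(String, Int)] =
    UrlPattern.findFirstMatchIn(line).map(m => (m.group(1), m.group(2).toInt))

  def main(args: Array[String]): Unit = {
    // Pass the real output line as arguments; otherwise an assumed sample is used
    val line =
      if (args.nonEmpty) args.mkString(" ")
      else "Open H2O Flow in browser: http://127.0.0.1:54321 (CMD + click in Mac OSX)"
    parse(line) match {
      case Some((ip, port)) => println(s"""h2o.init(ip = "$ip", port = $port)""")
      case None             => println("No http://host:port address found in the line")
    }
  }
}
```

With the sample line, the program prints h2o.init(ip = "127.0.0.1", port = 54321), ready to paste into RStudio.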
3. Run h2o.init() from RStudio
In RStudio, pass the IP and port number from the output of the previous step to the h2o.init() command, for example h2o.init(ip = "<IP>", port = <port>).
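If h2o.init() cannot connect, one thing worth ruling out is that the H2O port is simply not reachable from the machine running RStudio. The Scala sketch below is a generic TCP check built on java.net.Socket; it is not an H2O or Sparkling Water API, and the PortCheck name and the 127.0.0.1:54321 defaults are illustrative assumptions.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Generic TCP reachability check (not an H2O API).
object PortCheck {
  // True if something accepts a TCP connection at ip:port within timeoutMs
  def isReachable(ip: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(ip, port), timeoutMs)
      true
    } catch {
      // Refused connections, timeouts, and unknown hosts are all IOExceptions
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Placeholder defaults; pass the IP and port from the Sparkling Shell output instead
    val ip = if (args.length > 0) args(0) else "127.0.0.1"
    val port = if (args.length > 1) args(1).toInt else 54321
    println(s"$ip:$port reachable: ${isReachable(ip, port)}")
  }
}
```

A false result suggests a network, firewall, or address problem rather than an issue with the R client; a true result only shows that the port is open, not that H2O itself is healthy.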
4. Create a Spark DataFrame
The Spark DataFrame can then be published as an H2OFrame and accessed in R.
In Sparkling Shell:
val df = sc.parallelize(1 to 100).toDF  // creates Spark DataFrame
val hf = h2oContext.asH2OFrame(df)      // publishes DataFrame as H2O's Frame
In the output in the screenshot above, the second line below the highlighted line displays the name of the published frame (
5. List Frames in RStudio
View all frames available in RStudio using
The frame can now be used in RStudio (for example, as shown in the screenshot below, using