Sparkling Water Examples

Available Demos And Applications

Example	Description
CraigslistJobTitlesStreamingApp	Streaming application that predicts the job category of incoming job descriptions.
CraigslistJobTitlesApp	Predicts the job category from a posted job description.
ChicagoCrimeApp	Builds a model predicting the probability of arrest for a given crime in Chicago, using the Chicago datasets.
CityBikeSharingDemo	Predicts the occupancy of city bike stations in NYC.
HamOrSpamDemo	Shows a spam detector built with Spark and H2O algorithms.
ProstateDemo	Runs H2O's K-means on the prostate dataset.
DeepLearningDemo	Runs Deep Learning on a subset of the airlines dataset.
AirlinesWithWeatherDemo	Joins flights data with weather data and runs Deep Learning and GBM.

You can run an example by typing ./bin/run-example.sh <name of demo>, or follow the step-by-step instructions below.

Building and Running Examples

Please see Running Sparkling Water Examples for more information on how to build and run the examples.

Configuring Sparkling Water Variables

Please see Available Sparkling Water Configuration Properties for more information about the available Sparkling Water configuration options.

Step-by-Step Weather Data Example

Run the Sparkling shell with an embedded cluster:

export SPARK_HOME="/path/to/spark/installation"
export MASTER="local[*]"
bin/sparkling-shell

To check the status of the Sparkling shell (i.e., the Spark driver), open http://localhost:4040/.

Initialize H2O services on top of the Spark cluster:

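import ai.h2o.sparkling._
// to_date, regexp_replace, year, month, dayofmonth used below
import org.apache.spark.sql.functions._
val hc = H2OContext.getOrCreate()
import spark.implicits._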

Load weather data for Chicago O'Hare International Airport (ORD):

val weatherDataFile = "examples/smalldata/chicago/Chicago_Ohare_International_Airport.csv"
val weatherTable = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv(weatherDataFile)
  .withColumn("Date", to_date(regexp_replace('Date, "(\\d+)/(\\d+)/(\\d+)", "$3-$2-$1")))
  .withColumn("Year", year('Date))
  .withColumn("Month", month('Date))
  .withColumn("DayofMonth", dayofmonth('Date))
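The regexp_replace call rewrites each slash-separated date before to_date parses it. The plain-Scala sketch below (no Spark needed) applies the same pattern and replacement with String.replaceAll, which uses the same Java regex replacement syntax, to show what ends up in the Year, Month and DayofMonth columns. Note that the replacement $3-$2-$1 treats the first field as the day and the second as the month; the sample inputs are hypothetical, not rows from the dataset.

```scala
object DateRewriteSketch {
  // Same pattern and replacement string as the regexp_replace call above.
  // Java regex replacement syntax: $1, $2, $3 refer to the captured groups.
  def rewrite(raw: String): String =
    raw.replaceAll("(\\d+)/(\\d+)/(\\d+)", "$3-$2-$1")

  // Split the rewritten "year-month-day" string into the integer values
  // that year(), month() and dayofmonth() are expected to extract.
  def fields(raw: String): (Int, Int, Int) = {
    val parts = rewrite(raw).split("-").map(_.toInt)
    (parts(0), parts(1), parts(2))
  }

  def main(args: Array[String]): Unit = {
    val sample = "15/3/2007"  // hypothetical input value
    println(rewrite(sample))  // 2007-3-15
    println(fields(sample))   // (2007,3,15)
  }
}
```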

Load airlines data:

val airlinesDataFile = "examples/smalldata/airlines/allyears2k_headers.csv"
val airlinesTable = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .option("nullValue", "NA")
  .csv(airlinesDataFile)
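The nullValue option tells Spark's CSV reader which string stands for a missing value; the airlines file marks gaps with NA. The sketch below is a simplified plain-Scala illustration of that rule for one comma-separated row of integer fields, using Option for missing values. It is not how Spark parses CSV internally, and the sample row is made up.

```scala
object NullValueSketch {
  // The token that plays the role of option("nullValue", "NA").
  val NullToken = "NA"

  // A field equal to the null token becomes None; anything else is parsed.
  def parseField(token: String): Option[Int] = {
    val t = token.trim
    if (t == NullToken) None else Some(t.toInt)
  }

  // Naive split on commas (no quoting support), one Option per column.
  def parseRow(line: String): List[Option[Int]] =
    line.split(",", -1).toList.map(parseField)

  def main(args: Array[String]): Unit = {
    // Hypothetical row: Year, Month, DayofMonth, ArrDelay (missing)
    println(parseRow("2007,3,15,NA")) // List(Some(2007), Some(3), Some(15), None)
  }
}
```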

Select flights destined for Chicago (ORD):

val flightsToORD = airlinesTable.filter('Dest === "ORD")

Compute the number of these flights:

flightsToORD.count
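In Spark, filter is a lazy transformation that keeps only rows whose Dest column equals "ORD", and count is the action that actually triggers the computation. As a rough analogy using ordinary Scala collections instead of a distributed data frame (the Flight case class and its rows are hypothetical), the same selection looks like this:

```scala
object FilterCountSketch {
  // Hypothetical, trimmed-down flight record.
  final case class Flight(origin: String, dest: String, arrDelay: Option[Int])

  val flights: List[Flight] = List(
    Flight("SFO", "ORD", Some(12)),
    Flight("ORD", "LAX", Some(-3)),
    Flight("BOS", "ORD", None)
  )

  // Analogue of airlinesTable.filter('Dest === "ORD")
  val flightsToORD: List[Flight] = flights.filter(_.dest == "ORD")

  def main(args: Array[String]): Unit =
    println(flightsToORD.size) // 2
}
```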

Join the flights data frame with the weather data frame on the Year, Month, and DayofMonth columns:

val joined = flightsToORD.join(weatherTable, Seq("Year", "Month", "DayofMonth"))
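Joining with Seq("Year", "Month", "DayofMonth") performs an inner equi-join on those three columns, and each key column appears only once in the result. Flights on days without a weather record are dropped, and a flight is repeated if several weather rows share its date. The plain-Scala sketch below mimics that inner-join behavior with a grouped lookup; the record types and the precipitation field are hypothetical.

```scala
object JoinSketch {
  type DayKey = (Int, Int, Int) // (Year, Month, DayofMonth)

  // Hypothetical records: the join key plus one payload field each.
  final case class Flight(key: DayKey, arrDelay: Int)
  final case class Weather(key: DayKey, precipitation: Double)

  // Inner join on the composite key: one output row per matching pair;
  // flights with no weather row for their day produce no output.
  def innerJoin(flights: Seq[Flight], weather: Seq[Weather]): Seq[(DayKey, Int, Double)] = {
    val weatherByDay = weather.groupBy(_.key)
    for {
      f <- flights
      w <- weatherByDay.getOrElse(f.key, Seq.empty)
    } yield (f.key, f.arrDelay, w.precipitation)
  }

  def main(args: Array[String]): Unit = {
    val flights = Seq(Flight((2007, 3, 15), 12), Flight((2007, 3, 16), 40))
    val weather = Seq(Weather((2007, 3, 15), 0.2))
    println(innerJoin(flights, weather)) // List(((2007,3,15),12,0.2))
  }
}
```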

Train an H2O Deep Learning model that estimates the arrival delay (ArrDelay):

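import ai.h2o.sparkling.ml.algos.H2ODeepLearning
val dl = new H2ODeepLearning()
  .setLabelCol("ArrDelay")                                        // column to predict: arrival delay
  .setColumnsToCategorical(Array("Year", "Month", "DayofMonth"))  // treat date parts as categories, not numbers
  .setEpochs(5)                                                   // passes over the training data
  .setActivation("RectifierWithDropout")                          // rectifier (ReLU) units with dropout
  .setHidden(Array(100, 100))                                     // two hidden layers, 100 neurons each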

val model = dl.fit(joined)
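setColumnsToCategorical asks H2O to treat Year, Month and DayofMonth as unordered categories rather than numbers, so that, for example, month 12 is not considered "larger" than month 1. Conceptually, a categorical column with k levels reaches the network as k indicator inputs (one-hot encoding). The sketch below only illustrates that idea in plain Scala; it is not H2O's internal encoding code, and the Month values are hypothetical.

```scala
object OneHotSketch {
  // Distinct levels of a column, in a fixed (sorted) order.
  def levels(values: Seq[Int]): Vector[Int] = values.distinct.sorted.toVector

  // One 0/1 indicator per level; exactly one position is set.
  def encode(value: Int, lvls: Vector[Int]): Vector[Int] =
    lvls.map(l => if (l == value) 1 else 0)

  def main(args: Array[String]): Unit = {
    val months = Seq(1, 3, 12, 3) // hypothetical Month values
    val lvls = levels(months)     // Vector(1, 3, 12)
    println(encode(12, lvls))     // Vector(0, 0, 1)
  }
}
```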

Use the model to estimate the delay on the training data:

val predictions = model.transform(joined)
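A natural follow-up is to check how far the estimates are from the observed ArrDelay values. The plain-Scala sketch below computes the mean absolute error over (actual, predicted) pairs, skipping rows whose actual delay is missing. Extracting those pairs from the predictions data frame is left out, the choice of metric is an assumption, and since the model is scored on its own training data the resulting error is likely to be optimistic.

```scala
object MaeSketch {
  // Mean absolute error over pairs whose actual value is present;
  // None when there is nothing to average.
  def meanAbsoluteError(pairs: Seq[(Option[Double], Double)]): Option[Double] = {
    val errors = pairs.collect { case (Some(actual), predicted) => math.abs(actual - predicted) }
    if (errors.isEmpty) None else Some(errors.sum / errors.size)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (actual ArrDelay, predicted) pairs, in minutes.
    val pairs: Seq[(Option[Double], Double)] =
      Seq((Some(10.0), 7.0), (None, 3.0), (Some(-5.0), -1.0))
    println(meanAbsoluteError(pairs)) // Some(3.5)
  }
}
```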
