Example | Description |
---|---|
CraigslistJobTitlesStreamingApp | Streaming application that predicts a job category from incoming job descriptions. |
CraigslistJobTitlesApp | Predicts a job category from a posted job description. |
ChicagoCrimeApp | Builds a model that predicts the probability of arrest for a given crime in Chicago, using the Chicago datasets. |
CityBikeSharingDemo | Predicts the occupancy of Citi Bike stations in NYC. |
HamOrSpamDemo | Shows a spam detector built with Spark and H2O's algorithms. |
ProstateDemo | Runs H2O's K-means on the prostate dataset. |
DeepLearningDemo | Runs Deep Learning on a subset of the airlines dataset. |
AirlinesWithWeatherDemo | Joins flights data with weather data and runs Deep Learning and GBM. |

You can run an example by typing `./bin/run-example.sh <name of demo>`, or follow the steps below.
Please see Running Sparkling Water Examples for more information on how to build and run the examples.
Please see Available Sparkling Water Configuration Properties for more information about the available Sparkling Water configuration options.
- Run Sparkling shell with an embedded cluster:
export SPARK_HOME="/path/to/spark/installation"
export MASTER="local[*]"
bin/sparkling-shell
- To see the Sparkling shell (i.e., Spark driver) status, go to http://localhost:4040/.
- Initialize H2O services on top of the Spark cluster:
import ai.h2o.sparkling._
val hc = H2OContext.getOrCreate()
import spark.implicits._
- Load weather data for Chicago O'Hare International Airport (ORD):
val weatherDataFile = "examples/smalldata/chicago/Chicago_Ohare_International_Airport.csv"
val weatherTable = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv(weatherDataFile)
  .withColumn("Date", to_date(regexp_replace('Date, "(\\d+)/(\\d+)/(\\d+)", "$3-$2-$1")))
  .withColumn("Year", year('Date))
  .withColumn("Month", month('Date))
  .withColumn("DayofMonth", dayofmonth('Date))
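
The `regexp_replace` call in the weather step reorders the three slash-separated numbers before `to_date` parses them. As a rough illustration, the same substitution can be tried on a plain string without Spark. The sample value below is hypothetical and is not taken from the weather file.

```scala
// Plain-Scala sketch of the substitution used in the weather step.
// The sample date is a made-up value for illustration only.
val datePattern = "(\\d+)/(\\d+)/(\\d+)"
val sample = "1/2/2005"

// "$3-$2-$1" emits the third group first, then the second, then the first.
val reordered = sample.replaceAll(datePattern, "$3-$2-$1")

println(reordered) // prints 2005-2-1
```

Because the first captured number ends up in the day position, this replacement treats its input as day/month/year. If a file stored dates as month/day/year instead, `"$3-$1-$2"` would be the matching replacement.
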
- Load airlines data:
val airlinesDataFile = "examples/smalldata/airlines/allyears2k_headers.csv"
val airlinesTable = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .option("nullValue", "NA")
  .csv(airlinesDataFile)
- Select flights destined for Chicago (ORD):
val flightsToORD = airlinesTable.filter('Dest === "ORD")
- Compute the number of these flights:
flightsToORD.count
- Join the flights data frame with the weather data frame:
val joined = flightsToORD.join(weatherTable, Seq("Year", "Month", "DayofMonth"))
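
Passing a `Seq` of column names to `join` performs an inner equi-join, so a flight is kept only if the weather table has a row for the same year, month, and day. The sketch below mimics that rule with plain Scala collections. The case classes and every row in it are hypothetical and exist only to show which rows survive.

```scala
// Plain-Scala sketch of an inner join on (Year, Month, DayofMonth).
// All rows below are hypothetical; they only illustrate the matching rule.
case class Flight(year: Int, month: Int, day: Int, arrDelay: Int)
case class Weather(year: Int, month: Int, day: Int, tmeanF: Int)

val flights = Seq(
  Flight(2005, 1, 1, 12),
  Flight(2005, 1, 2, -3),
  Flight(2005, 1, 9, 40) // no weather row exists for this day
)
val weather = Seq(
  Weather(2005, 1, 1, 30),
  Weather(2005, 1, 2, 28)
)

// Index the weather rows by the join key.
val weatherByDay = weather.groupBy(w => (w.year, w.month, w.day))

// Keep a flight only when a weather row shares its key.
val joinedRows = for {
  f <- flights
  w <- weatherByDay.getOrElse((f.year, f.month, f.day), Seq.empty)
} yield (f, w)

println(joinedRows.size) // prints 2
```

Comparing `joined.count` with `flightsToORD.count` in the shell is a quick way to see whether any flights were dropped for lack of a matching weather row.
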
- Run deep learning to produce a model estimating arrival delay:
import ai.h2o.sparkling.ml.algos.H2ODeepLearning
val dl = new H2ODeepLearning()
  .setLabelCol("ArrDelay")
  .setColumnsToCategorical(Array("Year", "Month", "DayofMonth"))
  .setEpochs(5)
  .setActivation("RectifierWithDropout")
  .setHidden(Array(100, 100))
val model = dl.fit(joined)
- Use the model to estimate the delay on the training data:
val predictions = model.transform(joined)
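
One common way to summarize how close the estimated delays are to the observed ones is the root mean squared error (RMSE). The helper below is a minimal plain-Scala sketch, not part of Sparkling Water. The function name `rmse` and the sample pairs are hypothetical, and in the shell the pairs would have to be collected from the `ArrDelay` column and the model's prediction column.

```scala
// Root mean squared error over (actual, predicted) pairs.
// Hypothetical helper for illustration; not a Sparkling Water API.
def rmse(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one pair")
  val squaredErrors = pairs.map { case (actual, predicted) =>
    val diff = actual - predicted
    diff * diff
  }
  math.sqrt(squaredErrors.sum / pairs.size)
}

// Made-up delays in minutes, not output of the model above.
val examplePairs = Seq((10.0, 7.0), (0.0, 4.0))

// Squared errors are 9 and 16, so the mean is 12.5.
println(rmse(examplePairs))
```

An error computed on the training data tends to be optimistic, so a held-out split would give a more realistic picture of the model's accuracy.
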