Skip to content

streamapprox/spark

Repository files navigation

Spark-based StreamApprox

This prototype implements the online adaptive stratified reservoir sampling (OASRS) algorithm using Spark 2.0.2

Build

Building Spark-based StreamApprox is the same as building Apache Spark. See scripts for Spark/Flink building and installation instructions.

Usage

This prototype supports a sampling function reservoirStratifiedSample() for Spark RDD by implementing the OASRS algorithm. Users can use this function as a PairRDD function of Spark.

    val inputStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)

    //Take a sample using the OASRS algorithm
    val sampleItems = inputStream.map(_._2)
      .map(line => (line.split(",")(0), line.split(",")(1).toDouble))
      .transform(x => x.reservoirStratifiedSample(sampleSize))
      .reduceByKeyAndWindow((a: Double, b: Double) => (a + b), Seconds(10), Seconds(5))
    sampleItems.print()

Support

  • If you have any question please shoot me an email: do.le_quoc@tu-dresden.de
  • Note that we are currently working to adapt our implementation with the new version (kafka-0-10) of Spark-Kafka connector.

License

Published under GNU General Public License v2.0 (GPLv2), see LICENSE

About

No description, website, or topics provided.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published