### Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, 
fault-tolerant stream processing of live data streams. Data can be ingested from many sources like 
Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed 
with high-level functions like map, reduce, join and window. Finally, processed data can be 
pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.

First, we import the Spark Streaming classes, including the Kafka connector, along with the implicit conversions from StreamingContext that add useful methods to other classes we need (such as DStream). StreamingContext is the main entry point for all streaming functionality. Here we create one on top of the notebook's existing sparkContext, with a batch interval of 5 seconds.
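
To make the notion of a batch interval concrete, here is a minimal sketch in plain Scala (no Spark required). It is only a conceptual illustration, not how Spark implements batching: the `Event` type, the `toBatches` helper, and the sample arrival times are all hypothetical. It groups events into fixed 5-second windows by arrival time, which is roughly how a DStream slices incoming data into one RDD per interval.

```scala
// Conceptual illustration only: plain Scala, not Spark's implementation.
object BatchIntervalSketch {
  // Hypothetical event type: arrival time in milliseconds plus a payload.
  final case class Event(arrivalMs: Long, payload: String)

  // Assign each event to the batch whose time window contains its arrival time.
  def toBatches(events: Seq[Event], batchIntervalMs: Long): Map[Long, Seq[String]] =
    events
      .groupBy(e => e.arrivalMs / batchIntervalMs) // batch index
      .map { case (index, es) => index -> es.map(_.payload) }

  // Made-up sample data.
  val events: Seq[Event] = Seq(
    Event(500L, "a"), Event(4900L, "b"), Event(5000L, "c"), Event(12000L, "d")
  )

  // 5000 ms mirrors the Seconds(5) batch interval used in this notebook.
  val batches: Map[Long, Seq[String]] = toBatches(events, batchIntervalMs = 5000L)
}
```

Unlike this sketch, Spark Streaming also emits a batch for intervals in which nothing arrived, which is why a quiet stream still triggers the per-batch code.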

In [ ]:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf


In [ ]:
// sparkConf is shown for reference only; the StreamingContext below reuses the notebook's sparkContext.
//val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
val sparkConf = new SparkConf().setAppName("Test Kafka2") //sparkContext.getConf
// Streaming context with a 5-second batch interval
val ssc = new StreamingContext(sparkContext, Seconds(5))
//ssc.checkpoint("checkpoint")
//val Array(zkQuorum, group, topics, numThreads) = args
// Kafka consumer settings: ZooKeeper quorum, consumer group, comma-separated topics, threads per topic
val zkQuorum = "hupi-factory-02-02-05-01:2181"
val group = "DEMO_HUPI_VINCENT"
val topics = "factory02_hupilytics"
val numThreads = "1"
// Destination used when writing to HDFS
val hdfsUrl = "hdfs://hupi-factory-02-01-01-01/user/factory02/hupilytics_events/"
val saveRepo = "test1"

sparkConf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@10be299e
ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@5abdc031
zkQuorum: String = hupi-factory-02-02-05-01:2181
group: String = DEMO_HUPI_VINCENT
topics: String = factory02_hupilytics
numThreads: String = 1
hdfsUrl: String = hdfs://hupi-factory-02-01-01-01/user/factory02/hupilytics_events/
saveRepo: String = test1


In [ ]:
// Map each topic to its number of consumer threads, then create a receiver-based Kafka stream of (key, message) pairs
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val streamdata = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)

topicMap: scala.collection.immutable.Map[String,Int] = Map(factory02_hupilytics -> 1)
streamdata: org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)] = org.apache.spark.streaming.kafka.KafkaInputDStream@5f7d9de3
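
The topic-map construction can be tried outside Spark. The following is a small sketch in plain Scala; the `parseTopics` helper and the second topic name `other_topic` are hypothetical, and the trimming and empty-entry filtering are additions that the notebook cell does not perform.

```scala
object TopicMapSketch {
  // Turn "t1,t2" into Map(t1 -> n, t2 -> n): topic name -> number of consumer threads.
  def parseTopics(topics: String, numThreads: String): Map[String, Int] = {
    val threads = numThreads.toInt
    topics
      .split(",")
      .map(_.trim)          // tolerate spaces around commas (not done in the notebook cell)
      .filter(_.nonEmpty)   // ignore empty entries such as a trailing comma
      .map(topic => topic -> threads)
      .toMap
  }

  // Same inputs as the notebook.
  val single: Map[String, Int] = parseTopics("factory02_hupilytics", "1")

  // "other_topic" is a made-up name used only for illustration.
  val multiple: Map[String, Int] = parseTopics("factory02_hupilytics, other_topic", "2")
}
```

With the notebook's values, `single` should equal the `topicMap` printed above; a comma-separated list simply yields one entry per topic, each with the same thread count.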


In [ ]:
val idsite = 1
streamdata.foreachRDD {
  rdd => {
    // Keep only the message value of each (key, value) pair, then filter on idsite.
    // Caution: the pattern has no closing quote after $idsite, so idsite = 1 also matches "10", "11", ...
    val lines = rdd.map(_._2).filter(myString => myString.contains(s"""\"idsite\":"$idsite"""))//.map(myString => FilterIdsite.newResult(idsite, myString))
    val nbLines = lines.count()
    // Write to HDFS
    //if (nbLines != 0) {
    //  lines.toDF("value").coalesce(1).write.mode(SaveMode.Append).save(hdfsUrl + saveRepo)
    //}
    // Print to the console
    println("lines pushed are " + nbLines)
    lines.collect().foreach(println)
  }
}

idsite: Int = 1
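
The substring filter deserves a closer look. The interpolated pattern appears to expand to `"idsite":"1` without a closing quote, so it behaves as a prefix match. The sketch below, in plain Scala, contrasts that with a stricter pattern; the helper names and the shortened sample records are hypothetical, and the stricter variant is an assumption about the intent rather than the notebook's actual code.

```scala
object IdsiteFilterSketch {
  // Pattern as the notebook cell appears to build it: no closing quote after the id.
  def prefixPattern(idsite: Int): String = "\"idsite\":\"" + idsite

  // Stricter variant (assumed intent): require the closing quote.
  def exactPattern(idsite: Int): String = "\"idsite\":\"" + idsite + "\""

  // Made-up, shortened records modelled on the events carried by the stream.
  val site1: String = """{"action_name":"Home","idsite":"1","rec":"1"}"""
  val site10: String = """{"action_name":"Accessoires de maison","idsite":"10","rec":"1"}"""
  val events: Seq[String] = Seq(site1, site10)

  // Prefix match keeps both records; exact match keeps only idsite "1".
  val loose: Seq[String] = events.filter(_.contains(prefixPattern(1)))
  val strict: Seq[String] = events.filter(_.contains(exactPattern(1)))
}
```

String matching is fragile either way (it depends on the exact spacing and quoting of the JSON), so parsing each message with a JSON library and comparing the `idsite` field would be a more robust choice.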


In [ ]:
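// Start receiving and processing data
ssc.start()
// Blocks until the context is stopped or the cell is cancelled
ssc.awaitTermination()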

lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 2
{"action_name":"Accessoires de maison","idsite":"10","rec":"1","r":"005061","h":"17","m":"26","s":"32","url":"http://localhost:8080/8-accessoires-de-maison","urlref":"http://localhost:8080/accessoires-de-maison/7-mug-the-adventure-begins.html","_id":"91c4cd4093d25475","_idts":"1542748424","_idvc":"7","_idn":"0","_refts":"0","_viewts":"1544545425","send_image":"0","pdf":"1","qt":"0","realp":"0","wma":"0","dir":"0","fla":"0","java":"0","gears":"0","ag":"0","cookie":"1","res":"1280x1024","cvar":{"1":["current_ts","1544545592"],"30":["products_impression","6,7,8,9,10,11,15,19"],"42":["lang","FR"],"43":["currency","EUR"]},"gt_ms":"560","client":"factory02","topic":"hupilytics","current_ts":"1544545592","products_impression":"6,7,8,9,10,11,15,19","lang":"FR","currency":"EUR","ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like G

The cell was cancelled.


In [ ]:
ssc.stop()