### Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput,
fault-tolerant processing of live data streams. Data can be ingested from many sources, such as
Kafka, Flume, Kinesis, or TCP sockets, and processed using complex algorithms expressed
with high-level functions like map, reduce, join and window. Finally, processed data can be
pushed out to filesystems, databases, and live dashboards. You can also apply Spark’s machine learning and graph processing algorithms to data streams.

First, we import the Spark Streaming classes, the Kafka integration, and some implicit conversions from StreamingContext that add useful methods to other classes we need (like DStream). StreamingContext is the main entry point for all streaming functionality. We then create a StreamingContext on top of the notebook's existing SparkContext (`sc`), with a batch interval of 10 seconds, and define the connection parameters for Kafka.

In [ ]:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD // needed by get_output below



In [ ]:
//val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
// Note: this SparkConf is not used below; the StreamingContext reuses the notebook's SparkContext `sc`.
val sparkConf = new SparkConf().setAppName("Test Kafka2") //sparkContext.getConf
val intervalle = 10 // batch interval, in seconds

val ssc = new StreamingContext(sc, Seconds(intervalle))
//ssc.checkpoint("checkpoint")
//val Array(zkQuorum, group, topics, numThreads) = args
val zkQuorum = "ecoles.node1.pro.hupi.loc" // ZooKeeper quorum host
val group = "DEMO_HUPI_VINCENT"            // Kafka consumer group
val topics = "ecoles_hupilytics_scandivie" // comma-separated list of topics
val numThreads = "1"                       // consumer threads per topic

sparkConf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2cf00f95
intervalle: Int = 10
ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@340b41d0
zkQuorum: String = ecoles.node1.pro.hupi.loc
group: String = DEMO_HUPI_VINCENT
topics: String = ecoles_hupilytics_scandivie
numThreads: String = 1


In [ ]:
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val streamdata = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)

topicMap: scala.collection.immutable.Map[String,Int] = Map(ecoles_hupilytics_scandivie -> 1)
streamdata: org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)] = org.apache.spark.streaming.kafka.KafkaInputDStream@7dd704e7
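The `topicMap` expression above is easier to follow with more than one topic. The following is a minimal sketch in plain Scala (no Spark needed); the topic names and the thread count are made up for illustration and are not part of this notebook's setup.

```scala
// Hypothetical inputs: two topic names instead of the single topic used in this notebook.
val exampleTopics = "topic_a,topic_b"
val exampleThreads = "2"

// Same expression as above: one (topic -> thread count) entry per comma-separated name.
val exampleTopicMap: Map[String, Int] =
  exampleTopics.split(",").map((_, exampleThreads.toInt)).toMap

println(exampleTopicMap)
```

Note that `split(",")` does not trim whitespace, so a list written as `"topic_a, topic_b"` would produce a second topic whose name starts with a space.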


In [ ]:
// Helper that prints the contents of an RDD (collects everything to the driver, so only suitable for small batches)
def get_output(rdd: RDD[String]) = {
  val li = rdd.collect()
  for(x <- li){
    println(x + " ")
  }
  println("")
  println("")
} 

get_output: (rdd: org.apache.spark.rdd.RDD[String])Unit


### Choose the desired output: for example, to print the messages, comment out nbactions, and vice versa

In [ ]:
/*
// Option 1: display the messages
val message = streamdata.map(_._2)
message.foreachRDD(l=>get_output(l))
*/

// Option 2: number of actions per batch interval
// (the label below is French for "Number of actions on the site")
val nbactions = streamdata.count().map(l => " - Nombre d'actions sur le site : "+l.toString)
nbactions.foreachRDD(l => get_output(l))

message: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.MappedDStream@60980b0f
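As the `ReceiverInputDStream[(String, String)]` type shown earlier indicates, each record in `streamdata` is a (key, message) pair, so `map(_._2)` keeps only the message. To make the two output options concrete, here is a rough sketch that applies the same transformations to ordinary Scala collections standing in for two batch intervals. The records are invented for illustration; a real DStream is processed by Spark, not by `Seq`.

```scala
// Made-up records for two batch intervals; each record is a (key, message) pair.
val batches: Seq[Seq[(String, String)]] = Seq(
  Seq(("k1", "page_view"), ("k2", "click")),
  Seq.empty
)

// Option 1: keep only the message part of each pair, as streamdata.map(_._2) does.
val messagesPerBatch = batches.map(_.map(_._2))

// Option 2: one count per batch, formatted like the nbactions line in the cell above.
val countsPerBatch =
  batches.map(b => " - Nombre d'actions sur le site : " + b.size.toString)

messagesPerBatch.foreach(println)
countsPerBatch.foreach(println)
```

The second, empty batch still yields a count of 0. `DStream.count()` behaves similarly, emitting a single-element RDD for every batch, so an interval with no Kafka messages prints a zero rather than nothing.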


## Start Spark Streaming

In [ ]:
ssc.start()
val duration = 6*intervalle*1000.toLong  // 6 batch intervals = 1 minute, in milliseconds; counts are printed during that minute
ssc.awaitTerminationOrTimeout(duration)

 - Nombre d'actions sur le site : 0 


 - Nombre d'actions sur le site : 0 


 - Nombre d'actions sur le site : 0 


 - Nombre d'actions sur le site : 0 


 - Nombre d'actions sur le site : 0 


 - Nombre d'actions sur le site : 1 


duration: Long = 60000
res7: Boolean = false
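The timeout arithmetic explains why six counts are printed above. The sketch below redoes the calculation in plain Scala; `batchSeconds`, `timeoutMs` and `expectedBatches` are illustrative names, not variables from the notebook.

```scala
val batchSeconds = 10                          // same value as `intervalle` above
val timeoutMs = 6 * batchSeconds * 1000.toLong // same formula as `duration` above

// Number of complete batch intervals that fit in the timeout.
val expectedBatches = timeoutMs / (batchSeconds * 1000L)

println(s"timeout = $timeoutMs ms, batches = $expectedBatches")
```

`awaitTerminationOrTimeout` returns `true` if the context stopped within the timeout and `false` if the timeout elapsed first, so the `false` above means the streaming context was still running after one minute. Stopping it is a separate step, for example with `ssc.stop(stopSparkContext = false)` if the notebook's SparkContext should stay alive.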
