### Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, 
fault-tolerant stream processing of live data streams. Data can be ingested from many sources like 
Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed 
with high-level functions like map, reduce, join and window. Finally, processed data can be 
pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.

First, we import the Spark Streaming classes, along with some implicit conversions from StreamingContext that add useful methods to other classes we need (like DStream). StreamingContext is the main entry point for all streaming functionality. We then create a StreamingContext on top of the notebook's existing sparkContext with a batch interval of 10 seconds, enable checkpointing, and define the Kafka connection settings (ZooKeeper quorum, consumer group, topic, and number of consumer threads).

In [ ]:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf



In [ ]:
// sparkConf is not used below: the streaming context reuses the notebook's sparkContext
val sparkConf = new SparkConf().setAppName("Test Kafka2")
// Streaming context with a 10-second batch interval
val ssc = new StreamingContext(sparkContext, Seconds(10))
// Directory where checkpoint data is written
ssc.checkpoint("checkpoint")
// Kafka connection settings, hard-coded here instead of being read from args
val zkQuorum = "hupi-factory-02-02-05-01:2181"
val group = "DEMO_HUPI_VINCENT"
val topics = "factory02_test123"
val numThreads = "1"

sparkConf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6589c25b
ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@7b7b096
zkQuorum: String = hupi-factory-02-02-05-01:2181
group: String = DEMO_HUPI_VINCENT
topics: String = factory02_test123
numThreads: String = 1
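
The cell above relies on the `sparkContext` that the notebook provides. Outside a notebook there is no such value, so the context would have to be built from a `SparkConf`. The following is only a sketch under stated assumptions: the master URL `local[2]`, the application name `StandaloneKafkaTest`, the 1-second batch interval, and the helper name `createStandaloneContext` are illustrative and do not come from this notebook.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed values: "local[2]" and the app name are illustrative only.
// A receiver-based stream occupies one thread, so a local master
// needs at least two threads to also process the received data.
val standaloneConf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("StandaloneKafkaTest")

// Hypothetical helper. Only one SparkContext can be active per JVM, so the
// creation is wrapped in a function instead of being run eagerly in a
// notebook that already provides `sparkContext`.
def createStandaloneContext(): StreamingContext = {
  val context = new StreamingContext(standaloneConf, Seconds(1))
  // Same relative checkpoint directory as in the notebook cell
  context.checkpoint("checkpoint")
  context
}
```

Wrapping the creation in a function keeps the configuration inspectable without starting a second Spark context by accident.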


In [ ]:
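// Print how many messages were read from Kafka in each batch
// Map each topic name to the number of consumer threads to use for it
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val streamdata = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
streamdata.foreachRDD { rdd =>
  // Each record is a (key, message) pair; keep only the message
  val lines = rdd.map(_._2)
  println("lines pushed are " + lines.count())
}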

topicMap: scala.collection.immutable.Map[String,Int] = Map(factory02_test123 -> 1)
streamdata: org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)] = org.apache.spark.streaming.kafka.KafkaInputDStream@6fd1ce57
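
The `topicMap` expression above turns a comma-separated topic string into a map from topic name to consumer thread count. A plain-Scala sketch of the same idea, with no Spark dependency, may make the behavior with several topics easier to see. The helper name `toTopicMap` and the topic names `topicA` and `topicB` are hypothetical; trimming whitespace and dropping empty entries are additions that the notebook's expression does not perform.

```scala
// Hypothetical helper mirroring the notebook's topicMap expression.
// Unlike the original, it trims whitespace and skips empty entries.
def toTopicMap(topics: String, numThreads: Int): Map[String, Int] =
  topics
    .split(",")
    .map(_.trim)
    .filter(_.nonEmpty)
    .map(topic => topic -> numThreads)
    .toMap

// The single topic used in this notebook
val single = toTopicMap("factory02_test123", 1)

// Two made-up topics, each read with two threads
val multi = toTopicMap("topicA, topicB", 2)

println(single)
println(multi)
```

Every topic receives the same thread count here, as in the notebook; a per-topic count would need a different input format.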


In [ ]:
ssc.start()
ssc.awaitTermination()

lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 0
lines pushed are 5
lines pushed are 0
lines pushed are 0
lines pushed are 0
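
Because `ssc.awaitTermination()` blocks until the context is stopped, the cell above keeps printing one line per 10-second batch indefinitely. In an interactive session it can be more convenient to run for a bounded time. The sketch below assumes Spark 1.3 or later, where `awaitTerminationOrTimeout` is available; the helper name `runFor` and its parameters are hypothetical.

```scala
import org.apache.spark.streaming.StreamingContext

// Hypothetical helper: start the context, wait up to timeoutMs milliseconds,
// then stop the streaming computation if it is still running.
// Returns true if the context had already terminated before the timeout.
def runFor(context: StreamingContext, timeoutMs: Long): Boolean = {
  context.start()
  val terminated = context.awaitTerminationOrTimeout(timeoutMs)
  if (!terminated) {
    // Keep the underlying SparkContext alive so the notebook stays usable,
    // and let batches that are already received finish processing.
    context.stop(stopSparkContext = false, stopGracefully = true)
  }
  terminated
}
```

With the values from this notebook, `runFor(ssc, 60000)` would process roughly six batches. A stopped StreamingContext cannot be restarted, so a new one has to be created before streaming again.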
