### Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.

<img src='http://spark.apache.org/docs/latest/img/streaming-arch.png'/>

#### Simple local counting example


In [1]:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Create a local StreamingContext with two worker threads and a batch interval of 1 second
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)
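
The batch interval passed to `StreamingContext` controls how the incoming stream is cut into small batches, each processed as one RDD. As a rough illustration only, the plain-Python sketch below groups timestamped lines into fixed 1-second batches. It does not use Spark; the function name `split_into_batches` and the arrival times are hypothetical, chosen just to show that an interval with no input still yields an (empty) batch.

```python
def split_into_batches(events, start, num_batches, interval=1.0):
    """Group (timestamp, line) events into consecutive fixed-length batches.

    Plain-Python illustration only; Spark Streaming does this internally.
    """
    batches = [[] for _ in range(num_batches)]
    for timestamp, line in events:
        # Index of the interval this event falls into
        index = int((timestamp - start) // interval)
        if 0 <= index < num_batches:
            batches[index].append(line)
    return batches


# Hypothetical arrival times in seconds; nothing arrives during the second interval
events = [(0.2, "hello world"), (0.7, "hello spark"), (2.1, "bye")]
for i, batch in enumerate(split_into_batches(events, start=0.0, num_batches=3)):
    print(i, batch)
# 0 ['hello world', 'hello spark']
# 1 []
# 2 ['bye']
```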

In [2]:
# Create a DStream that will connect to hostname:port, like localhost:9999
# Firewalls might block this!
lines = ssc.socketTextStream("localhost", 9999)

In [3]:
# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))

In [4]:
# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()
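
To make the `flatMap`, `map`, and `reduceByKey` steps above concrete, here is a minimal plain-Python sketch of what they compute for a single batch of lines. It is not Spark code and runs without a cluster; the helper name `count_words_in_batch` and the sample lines are assumptions made for illustration.

```python
from collections import Counter


def count_words_in_batch(lines):
    """Mimic flatMap -> map -> reduceByKey for one batch of text lines."""
    # flatMap: split every line and flatten the results into one list of words
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: sum the counts of identical words
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


print(count_words_in_batch(["hello world", "hello spark"]))
# {'hello': 2, 'world': 1, 'spark': 1}
```

Note that `reduceByKey` on a DStream works on each batch separately, so the counts printed by `pprint()` start from zero in every batch interval rather than accumulating over the whole stream.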

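Now open a Unix terminal and start a netcat server listening on port 9999:

     $ nc -lk 9999

Then type any text you want into that terminal, for example:

     hello world any text you want

With netcat still running, run the cell below, then press Ctrl+C to terminate it.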

In [None]:
ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate

-------------------------------------------
Time: 2020-06-17 12:12:57
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:12:58
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:12:59
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:00
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:01
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:02
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:03
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:04
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:13:05
----------

-------------------------------------------
Time: 2020-06-17 12:14:09
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:10
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:11
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:12
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:13
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:14
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:15
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:16
-------------------------------------------

-------------------------------------------
Time: 2020-06-17 12:14:17
----------