# Spark Streaming 

Credits: Example from Apache Spark website and Apache-Spark in 7 Days by Karen Yang 

* Processing streams of data is increasingly becoming a necessity
* Spark provides **DStreams** (low-level API) and **Structured Streaming** (high-level API)
* Windowing handles streaming data by grouping records into time intervals and aggregating over each interval
* Spark processes streams in micro-batches
* Typical use cases include online machine learning, real-time reporting, and incremental ETL

![img](img/Stream.png)
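
The micro-batch and windowing ideas from the list above can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: the function name `windowed_counts`, the `window_length` parameter, and the sample `batches` are all hypothetical. It treats each inner list as the lines that arrived during one batch interval and reports word counts over the most recent batches.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length):
    """Yield word counts over the last `window_length` batches, once per batch."""
    # A bounded deque drops the oldest batch automatically when a new one arrives
    window = deque(maxlen=window_length)
    for batch in batches:
        # Count the words of the current micro-batch
        window.append(Counter(word for line in batch for word in line.split()))
        # Combine the counts of every batch still inside the window
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Hypothetical input: each inner list holds the lines received in one batch interval
batches = [["a b"], ["a"], [], ["c"]]
results = list(windowed_counts(batches, window_length=2))
for index, counts in enumerate(results):
    print(index, counts)
```

With a window of two batches, the word `a` from the first batch still contributes to the second result but has dropped out by the fourth.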

In [1]:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

In [2]:
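# local[2]: two local threads, one runs the socket receiver and the other processes the data
sc = SparkContext("local[2]", "NetworkWordCount")
# Batch interval of 5 seconds: each micro-batch covers 5 seconds of input
ssc = StreamingContext(sc, 5)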

In [3]:
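# Read lines of text from a TCP socket; start a server first, e.g. `nc -lk 9999` in another terminal
lines = ssc.socketTextStream("localhost", 9999)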

In [4]:
# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))

In [5]:
# Pair each word with a count of 1, then sum the counts per word within each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

In [6]:
# Print the first elements (10 by default) of each batch's result
wordCounts.pprint()

In [7]:
# Start the computation; results are printed once per 5-second batch
ssc.start()

-------------------------------------------
Time: 2020-01-30 17:43:15
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:20
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:25
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:30
-------------------------------------------
('Science', 1)
('is', 1)
('Data', 1)
('Awesome', 1)

-------------------------------------------
Time: 2020-01-30 17:43:35
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:40
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:45
-------------------------------------------

-------------------------------------------
Time: 2020-01-30 17:43:50
-------------------------------------------
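
Only the batch at 17:43:30 shows counts in the output above; every other batch is empty because `reduceByKey` on a DStream aggregates within a single batch and keeps no running total. As a rough plain-Python sketch of what the `flatMap`, `map`, and `reduceByKey` steps compute for one batch (the helper `word_count` and the sample input line are assumptions for illustration, not part of the Spark API):

```python
def word_count(lines):
    """Count words in one batch of lines, mirroring flatMap -> map -> reduceByKey."""
    # flatMap: split every line into words and flatten the result
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


# Assumed input: one line typed into the socket during a single batch interval
batch = ["Data Science is Awesome"]
counts = word_count(batch)
print(sorted(counts.items()))

# A batch interval with no input produces no counts
empty_counts = word_count([])
print(empty_counts)
```

Each call sees only the lines of its own batch, which is why a word typed in two different intervals would be reported with a count of 1 in each.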

----------------------------

In [8]:
# Stop streaming; by default this also stops the underlying SparkContext
ssc.stop()