# Overview of Discretized Streams

**Discretized Streams** (or **DStreams**) are the basic abstraction provided by Spark Streaming. A DStream represents a continuous stream of data: either the input stream received from a source, or the processed stream generated by applying functions to the input. Internally, a DStream is a continuous series of RDDs, Spark's abstraction of an immutable, distributed dataset. Each RDD in a DStream holds the data from one batch interval.

As a consequence, any operation applied to a DStream translates to operations on the underlying RDDs. For example, in the earlier example of converting a stream of lines to words, the `flatMap` operation is applied to each RDD in the `lines` DStream to generate the RDDs of the `words` DStream.
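
The per-batch behavior described above can be illustrated without Spark. The following is a minimal plain-Python analogy, not Spark API: the DStream is modeled as a list of batches, and `flat_map_batch` and `lines_batches` are hypothetical names introduced only for this sketch.

```python
from itertools import chain


def flat_map_batch(batch, fn):
    """Apply fn to every element of one batch and flatten the results."""
    return list(chain.from_iterable(fn(x) for x in batch))


# Hypothetical stand-in for a `lines` DStream: one list per batch interval.
lines_batches = [
    ["hello world", "spark streaming"],
    ["discretized streams"],
]

# A stream-level flatMap is the same per-batch flatMap applied to every batch.
words_batches = [
    flat_map_batch(batch, lambda line: line.split(" "))
    for batch in lines_batches
]

print(words_batches)
```

Each inner list plays the role of one RDD, so the transformation never needs to look across batch boundaries.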


### Demo
To test a Spark Streaming application with test data, we create a DStream from a queue of RDDs using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue is treated as one batch of data in the DStream and processed like a stream.

In [1]:
import findspark
# Point findspark at your local Spark installation; change this path to match your machine.
findspark.init('/home/matthew/spark-2.1.0-bin-hadoop2.7')

In [2]:
import time
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

In [3]:
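if __name__ == "__main__":
    sc = SparkContext(appName="PythonStreamingQueueStream")
    # Use a batch interval of 1 second
    ssc = StreamingContext(sc, 1)

    # Build a queue of 5 RDDs, each holding the integers 1..1000 in 10 partitions
    rddQueue = []
    for _ in range(5):
        rddQueue.append(ssc.sparkContext.parallelize(list(range(1, 1001)), 10))

    # Each queued RDD becomes one batch of the input DStream
    inputStream = ssc.queueStream(rddQueue)
    # Key every number by its last digit, then count the occurrences per key
    mappedStream = inputStream.map(lambda x: (x % 10, 1))
    reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
    # Print the first ten elements of every batch
    reducedStream.pprint()

    ssc.start()
    # Let the stream run long enough to consume all 5 queued batches
    time.sleep(6)
    ssc.stop(stopSparkContext=True, stopGraceFully=True)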

-------------------------------------------
Time: 2018-01-17 17:16:21
-------------------------------------------
(0, 100)
(8, 100)
(2, 100)
(4, 100)
(6, 100)
(1, 100)
(3, 100)
(9, 100)
(5, 100)
(7, 100)

-------------------------------------------
Time: 2018-01-17 17:16:22
-------------------------------------------
(0, 100)
(8, 100)
(2, 100)
(4, 100)
(6, 100)
(1, 100)
(3, 100)
(9, 100)
(5, 100)
(7, 100)

-------------------------------------------
Time: 2018-01-17 17:16:23
-------------------------------------------
(0, 100)
(8, 100)
(2, 100)
(4, 100)
(6, 100)
(1, 100)
(3, 100)
(9, 100)
(5, 100)
(7, 100)

-------------------------------------------
Time: 2018-01-17 17:16:24
-------------------------------------------
(0, 100)
(8, 100)
(2, 100)
(4, 100)
(6, 100)
(1, 100)
(3, 100)
(9, 100)
(5, 100)
(7, 100)

-------------------------------------------
Time: 2018-01-17 17:16:25
-------------------------------------------
(0, 100)
(8, 100)
(2, 100)
(4, 100)
(6, 100)
(1, 100)
(3, 100)
(9,
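
The counts printed for each batch can be checked by hand: every batch holds the integers 1 to 1000, and each last digit occurs exactly 100 times in that range. A minimal plain-Python sketch of the same map-then-reduce step, without Spark (the names `batch`, `pairs`, and `counts` are illustrative only):

```python
from collections import Counter

# One batch of the demo: the integers 1..1000
batch = range(1, 1001)

# Same mapping as the demo: key each number by x % 10 with a count of 1
pairs = [(x % 10, 1) for x in batch]

# Same reduction as reduceByKey(lambda a, b: a + b): sum the counts per key
counts = Counter()
for key, value in pairs:
    counts[key] += value

print(sorted(counts.items()))
```

Unlike this sketch, Spark does not guarantee the order in which keys are printed, which is why the keys in the demo output are not sorted.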

### References
1. https://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
2. https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
3. https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext