# **Data Streaming using PySpark [CN7030]**

**`Dr Amin Karami, UEL Docklands Campus, March 2022`**

`E: a.karami@uel.ac.uk`

`W: www.aminkarami.com`

---

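**countByValueAndWindow(windowLength, slideInterval, [numTasks])**: Returns a new DStream of (value, count) pairs, where each count is the number of times that distinct value occurred within a sliding window. Although the Spark documentation describes the input as a DStream of (K, V) pairs, the operation counts whole elements, so it works on any DStream, such as the stream of words used in this notebook. Both `windowLength` and `slideInterval` must be multiples of the batch interval. As with `reduceByKeyAndWindow`, the number of reduce tasks is configurable through an optional argument.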
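
To make the windowing semantics concrete, here is a minimal pure-Python sketch (no Spark involved) of what the operation computes. The helper name `count_by_value_and_window` and the sample batches are hypothetical, and the sketch assumes one batch per time unit, so a window length of 4 and a slide interval of 2 mean "count over the last four batches, every two batches".

```python
from collections import Counter


def count_by_value_and_window(batches, window_length, slide_interval):
    """Hypothetical helper: emulate sliding-window value counts.

    `batches` is a list in which each element holds the values received
    during one batch interval. This is an illustration only, not Spark code.
    """
    results = []
    # One window is emitted every `slide_interval` batches.
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        # The window covers at most the last `window_length` batches.
        start = max(0, end - window_length)
        counts = Counter()
        for batch in batches[start:end]:
            counts.update(batch)
        results.append(dict(counts))
    return results


# Made-up sample data: six batches of words.
batches = [["a", "b"], ["a"], ["c"], ["a", "c"], ["b"], ["b"]]
windows = count_by_value_and_window(batches, window_length=4, slide_interval=2)
for window in windows:
    print(window)
```

Because the window (4) is longer than the slide (2), consecutive windows overlap and the same value can be counted in more than one output.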

In [None]:
# Load Spark engine
import findspark
findspark.init()

In [None]:
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

In [None]:
conf = SparkConf().setAppName("CountByValueAndWindow_example")
conf.set("spark.executor.memory", "1g")
conf.set("spark.driver.memory", "1g")
conf.set("spark.cores.max", "2")

# Create the SparkContext from the configuration above
sc = SparkContext(conf=conf)

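sc.setLogLevel("ERROR")
# Streaming context with a batch interval of 1 second
ssc = StreamingContext(sc, 1)
# Windowed counts are updated incrementally, so a checkpoint directory is required
ssc.checkpoint("checkpoint")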

In [None]:
# Before starting the stream, open a terminal and run: nc -lk 7000
lines = ssc.socketTextStream("localhost", 7000)

In [None]:
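# Split each incoming line into words
words = lines.flatMap(lambda line: line.split(" "))
# Count each distinct word over the last 10 seconds, recomputed every 5 seconds
words.countByValueAndWindow(10, 5).pprint()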
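
PySpark builds this operation on `reduceByKeyAndWindow` with an inverse function: counts from batches entering the window are added, and counts from batches leaving it are subtracted, which is why checkpointing must be enabled. The following is a minimal pure-Python sketch of that incremental update; the helper name `slide_window` and the sample data are hypothetical and it does not use Spark.

```python
from collections import Counter


def slide_window(previous_counts, entering_batches, leaving_batches):
    """Hypothetical helper: update window counts incrementally.

    Adds the values of batches that entered the window and subtracts
    the values of batches that left it, instead of recounting everything.
    """
    counts = Counter(previous_counts)
    for batch in entering_batches:
        counts.update(batch)    # reduce step: add new occurrences
    for batch in leaving_batches:
        counts.subtract(batch)  # inverse step: remove expired occurrences
    # Values whose count dropped to zero no longer appear in the output.
    return {value: n for value, n in counts.items() if n > 0}


# Made-up sample data: counts for the previous window position.
previous = {"a": 3, "b": 1, "c": 2}
entering = [["b"], ["b"]]       # batches that just entered the window
leaving = [["a", "b"], ["a"]]   # batches that just left the window
updated = slide_window(previous, entering, leaving)
print(updated)
```

The incremental form only touches the batches at the two edges of the window, which matters when the window is much longer than the slide interval.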

In [None]:
ssc.start()

In [None]:
ssc.stop(stopSparkContext=True, stopGraceFully=True)