### Streaming Terminology

**ReadStream** - `spark.readStream` reads data from a streaming source into a streaming DataFrame

**WriteStream** - `df.writeStream` writes the rows of a streaming DataFrame to a sink

**Checkpoint** - checkpoints play a key role in fault-tolerant, incremental stream-processing pipelines. A checkpoint stores the query's progress and intermediate state on an HDFS-compatible file system so that the query can recover from failures
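The idea behind incremental processing with a checkpoint can be illustrated without Spark. The plain-Python sketch below is only an analogy and not how Spark implements checkpoints: it records the names of already-processed files in a JSON file (the hypothetical `progress.json`) so that a restarted run skips them.

```python
import json
import os
import tempfile


def process_new_files(files, checkpoint_path):
    """Process only files not yet recorded in the checkpoint; return them."""
    # Load the list of already-processed file names, if a checkpoint exists.
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    new_files = [name for name in files if name not in done]

    # Persist progress so that a restart resumes where this run stopped.
    with open(checkpoint_path, "w") as fh:
        json.dump(done + new_files, fh)
    return new_files


checkpoint = os.path.join(tempfile.mkdtemp(), "progress.json")
first_run = process_new_files(["a.csv", "b.csv"], checkpoint)
# Simulated restart: one new file has arrived since the first run.
second_run = process_new_files(["a.csv", "b.csv", "c.csv"], checkpoint)
print(first_run, second_run)
```

The second call only picks up `c.csv`, which mirrors why a streaming query restarted with the same checkpoint location does not reprocess old input.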

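**Trigger** - data flows continuously into a streaming system; the trigger determines when the query processes the data that has arrived. Common options: default (a new micro-batch starts as soon as the previous one finishes), fixed interval, and one-time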

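**Output mode** - determines which rows of the result are written to the sink on each trigger: Append (only new rows), Complete (the entire result table), Update (only rows that changed since the last trigger)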
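To make the three output modes concrete, here is a small plain-Python simulation. It is a sketch of the semantics only, not Spark code: `run_batches` and the sample batches are hypothetical, and the running total stands in for an aggregation such as a per-shop sum. Note that in Spark itself, append mode on an aggregated stream generally requires a watermark, so the simulation applies append to the raw input rows.

```python
from collections import defaultdict


def run_batches(batches, mode):
    """Return what each micro-batch would emit under the given output mode."""
    totals = defaultdict(int)  # running Sale_Count per Shop
    emitted = []
    for batch in batches:
        changed = set()
        for shop, count in batch:
            totals[shop] += count
            changed.add(shop)
        if mode == "complete":
            # The whole result table is emitted every time.
            emitted.append(dict(totals))
        elif mode == "update":
            # Only the aggregate rows touched by this batch are emitted.
            emitted.append({shop: totals[shop] for shop in changed})
        elif mode == "append":
            # Only the new input rows are emitted; earlier output never changes.
            emitted.append(list(batch))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return emitted


batches = [[("A", 1), ("B", 2)], [("A", 3)]]
complete_out = run_batches(batches, "complete")
update_out = run_batches(batches, "update")
append_out = run_batches(batches, "append")
```

After the second batch, complete mode emits both shops again, update mode emits only shop `A` with its new total, and append mode emits only the newly arrived row.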

### Define Schema for Sample Streaming Data

In [0]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Streaming file sources need an explicit schema; every column is nullable (True).
schema_defind = StructType([
    StructField('File', StringType(), True),
    StructField('Shop', StringType(), True),
    StructField('Sale_Count', IntegerType(), True),
])
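For testing, an input file that matches this schema could be generated with Python's standard `csv` module. This is a minimal sketch: the row values and the file name `sample_1.csv` are hypothetical, and the file is written to a temporary directory rather than to the notebook's input folder. The header row and the `;` separator match the reader options used later in the notebook.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the column names follow the schema above.
rows = [
    {"File": "file1", "Shop": "Shop_A", "Sale_Count": 10},
    {"File": "file1", "Shop": "Shop_B", "Sale_Count": 5},
]

sample_path = os.path.join(tempfile.mkdtemp(), "sample_1.csv")
with open(sample_path, "w", newline="") as fh:
    writer = csv.DictWriter(
        fh, fieldnames=["File", "Shop", "Sale_Count"], delimiter=";"
    )
    writer.writeheader()  # the stream reader is configured with header=True
    writer.writerows(rows)

# Read the file back to inspect the exact text the stream reader would see.
with open(sample_path) as fh:
    lines = fh.read().splitlines()
print(lines)
```

Copying such a file into the input folder while the stream is running should cause the next micro-batch to pick it up.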

### Create the Folder Structure in a Unity Catalog Volume

In [0]:
# One-time setup: uncomment to create the input, output, and checkpoint folders.
# dbutils.fs.mkdirs("/Volumes/workspace/default/mandy/stream_checkpoint_read/")
# dbutils.fs.mkdirs("/Volumes/workspace/default/mandy/stream_checkpoint_write/")
# dbutils.fs.mkdirs("/Volumes/workspace/default/mandy/stream_read/")
# dbutils.fs.mkdirs("/Volumes/workspace/default/mandy/stream_write/")

# Cleanup: uncomment to delete the folders recursively and start from scratch.
# dbutils.fs.rm("/Volumes/workspace/default/mandy/stream_checkpoint_read/", True)
# dbutils.fs.rm("/Volumes/workspace/default/mandy/stream_checkpoint_write/", True)
# dbutils.fs.rm("/Volumes/workspace/default/mandy/stream_read/", True)
# dbutils.fs.rm("/Volumes/workspace/default/mandy/stream_write/", True)

### Read Streaming Data

In [0]:
# Read semicolon-separated CSV files with a header row as a stream.
df = (
    spark.readStream.format("csv")
    .schema(schema_defind)
    .option("header", True)
    .option("sep", ";")
    .load("/Volumes/workspace/default/mandy/stream_read/")
)

# Running total of Sale_Count per Shop.
df1 = df.groupBy("Shop").sum("Sale_Count")

display(df1, checkpointLocation = "/Volumes/workspace/default/mandy/stream_checkpoint_read/")

### Write Streaming Data

In [0]:
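# start() returns a StreamingQuery; awaitTermination() returns None,
# so keep the query handle separate from the blocking call.
query = (
    df.writeStream.format("parquet")
    .outputMode("append")
    .option("path", "/Volumes/workspace/default/mandy/stream_write/")
    .option("checkpointLocation", "/Volumes/workspace/default/mandy/stream_checkpoint_write/")
    .trigger(availableNow=True)  # process all available data, then stop
    .start()
)
query.awaitTermination()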
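The write above uses `availableNow=True`. As a sketch of how the other trigger types map onto `DataStreamWriter.trigger()`, the hypothetical helpers below translate a trigger name into the keyword argument that method accepts (`processingTime`, `once`, `availableNow`). The helper names, the trigger labels, and the default interval of `"10 seconds"` are assumptions made for illustration; only one trigger argument may be passed at a time.

```python
def trigger_kwargs(kind, interval="10 seconds"):
    """Map a trigger name to keyword arguments for DataStreamWriter.trigger()."""
    if kind == "default":
        # No trigger() call: a micro-batch starts as soon as the previous one ends.
        return {}
    if kind == "fixed_interval":
        # Start a micro-batch at the given interval.
        return {"processingTime": interval}
    if kind == "once":
        # Process the available data in a single batch, then stop.
        return {"once": True}
    if kind == "available_now":
        # Process all available data, possibly in several batches, then stop.
        return {"availableNow": True}
    raise ValueError(f"unknown trigger kind: {kind}")


def with_trigger(writer, kind, **options):
    """Apply the chosen trigger to a writer such as df.writeStream, if any."""
    kwargs = trigger_kwargs(kind, **options)
    return writer.trigger(**kwargs) if kwargs else writer
```

In the notebook this could be used as `with_trigger(df.writeStream.format("parquet"), "fixed_interval", interval="1 minute")` before setting the remaining options and calling `start()`.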

### Verify the Written Stream Output Data

In [0]:
display(spark.read.parquet("/Volumes/workspace/default/mandy/stream_write/*.parquet"))