## Previewing the Streaming Data

Let us see how to preview streaming data using the `console` and `memory` sinks. We have already used `console` earlier.
* Here is an example that previews streaming data using `console`. The query aggregates the data as part of its transformations, so we preview the results in `update` mode. Launch the **Pyspark CLI** and run this script.

```python
spark.conf.set('spark.sql.shuffle.partitions', '2')

import socket
hostname = socket.gethostname()

log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()

from pyspark.sql.functions import split, count, lit

department_count = log_messages. \
    filter(split(split('value', ' ')[6], '/')[1] == 'department'). \
    select(split(split('value', ' ')[6], '/')[2].alias('department')). \
    groupBy('department'). \
    agg(count(lit(1)).alias('department_count'))

department_count. \
    writeStream. \
    outputMode("update"). \
    format("console"). \
    option('truncate', 'false'). \
    trigger(processingTime='5 seconds'). \
    start()
```
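
In `update` mode, each trigger writes only the groups whose aggregate changed since the previous trigger, while the running counts are kept as state across micro-batches. The following plain-Python sketch mimics that behaviour without Spark. The function name `update_mode_output` and the sample department values are hypothetical and only serve as an illustration.

```python
from collections import Counter


def update_mode_output(batches):
    """Yield, per micro-batch, only the (department, count) rows that changed."""
    running = Counter()  # state carried across micro-batches
    for batch in batches:
        changed = set(batch)  # groups touched by this micro-batch
        running.update(batch)
        yield sorted((dept, running[dept]) for dept in changed)


# Hypothetical department values arriving in two micro-batches
sample_batches = [
    ["fitness", "golf", "fitness"],
    ["golf", "apparel"],
]

for batch_id, rows in enumerate(update_mode_output(sample_batches)):
    print(f"Batch: {batch_id}", rows)
```

The second batch does not repeat `fitness`, because its count did not change. In `complete` mode every group would be written again on each trigger.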

Launch Pyspark using one of the commands below, then run the Spark Structured Streaming code.

**Using Pyspark2**

```
export PYSPARK_PYTHON=python3

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark3**

```
export PYSPARK_PYTHON=python3

pyspark3 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

```python
from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Overview of Structured Streaming'). \
    master('yarn'). \
    getOrCreate()
```

```python
spark.conf.set('spark.sql.shuffle.partitions', '2')
```

```python
import socket
hostname = socket.gethostname()
```

```python
log_messages = spark. \
    readStream. \
    format("socket"). \
    option("host", hostname). \
    option("port", 9000). \
    load()
```
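
The socket source connects as a client to the given host and port and reads newline-delimited UTF-8 text, so something must already be listening on port 9000 before the query starts. As a minimal sketch for local experiments, the following standard-library helper serves a fixed list of lines to the first client that connects. The name `start_line_server` and its parameters are hypothetical, not part of Spark.

```python
import socket
import threading


def start_line_server(lines, host="127.0.0.1", port=0):
    """Serve each line, newline-terminated, to the first client that connects.

    Returns the bound port and the background thread doing the work.
    Passing port=0 lets the operating system pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def _serve():
        conn, _ = server.accept()  # blocks until a client connects
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=_serve, daemon=True)
    thread.start()
    return bound_port, thread
```

To feed the query in this section, one could call `start_line_server(sample_lines, host=hostname, port=9000)` before starting the stream, where `sample_lines` is a list of log lines you supply. The helper closes the connection after the last line, so a long-running feeder is a better fit for continuous testing.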

```python
log_messages.isStreaming
```

```python
log_messages.printSchema()
```

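* Here is an example that previews streaming data using `memory`. The `queryName` becomes the name of an in-memory table that we can query using `spark.sql`.

```python
# outputMode is not set, so the default (append) applies
log_messages. \
    writeStream. \
    format("memory"). \
    queryName("log_messages"). \
    start()
```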

```python
spark.sql('SELECT * FROM log_messages').show(truncate=False)
```

```python
spark.sql('SELECT count(1) FROM log_messages').show(truncate=False)
```
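
The remaining queries in this section filter and group on `split(split(value, ' ')[6], '/')`. Array indexes are zero-based, so `[6]` picks the seventh space-separated field of the log line (the request path) and `[1]` and `[2]` pick path segments. The plain-Python sketch below walks through the same steps. The sample log line and the helper names `item` and `url_segments` are hypothetical, and returning `None` for a missing index is an assumption meant to mirror Spark's default (non-ANSI) behaviour of yielding null.

```python
def item(parts, index):
    """Return parts[index], or None when the index is out of range."""
    return parts[index] if index < len(parts) else None


def url_segments(line):
    """Split the seventh space-separated field (the request path) on '/'."""
    path = item(line.split(" "), 6)
    return path.split("/") if path is not None else []


# Hypothetical access-log line, used only for illustration
sample = (
    '192.168.1.10 - - [12/Mar/2021:10:15:32 -0800] '
    '"GET /department/fitness/products HTTP/1.1" 200 1024'
)

segments = url_segments(sample)
print(segments)           # the path starts with '/', so segment 0 is empty
print(item(segments, 1))  # value compared against 'department' in the filter
print(item(segments, 2))  # value used as the department name
```

Because the path starts with `/`, the first segment is an empty string, which is why the filter checks index `1` and the department name sits at index `2`.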

```python
spark.sql("""
    SELECT * FROM log_messages
    WHERE split(split(value, ' ')[6], '/')[1] = 'department'
""").show(truncate=False)
```

```python
spark.sql("""
    SELECT count(1) FROM log_messages
    WHERE split(split(value, ' ')[6], '/')[1] = 'department'
""").show(truncate=False)
```

```python
# Aliases give the output columns readable names
spark.sql("""
    SELECT split(split(value, ' ')[6], '/')[2] AS department,
        count(1) AS department_count
    FROM log_messages
    WHERE split(split(value, ' ')[6], '/')[1] = 'department'
    GROUP BY split(split(value, ' ')[6], '/')[2]
""").show(truncate=False)
```