## Preview Data using memory

Let us preview the data consumed from a Kafka topic using Spark Structured Streaming APIs.
* We can use either `console` or `memory` as the argument to `writeStream.format` to preview the data.
* Earlier we saw how to preview the data using `console`.
* In this notebook we preview the data using `memory` as the format. The query results are registered as a temporary view whose name is set using `queryName`, so they can be queried with `spark.sql`.

In [1]:
from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.jars.packages', 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1'). \
    config('spark.ui.port', '0'). \
    config('spark.sql.warehouse.dir', f'/user/{username}/warehouse'). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Kafka and Spark Integration'). \
    master('yarn'). \
    getOrCreate()

In [2]:
kafka_bootstrap_servers = 'w01.itversity.com:9092,w02.itversity.com:9092'

In [3]:
df = spark. \
  readStream. \
  format('kafka'). \
  option('kafka.bootstrap.servers', kafka_bootstrap_servers). \
  option('subscribe', f'{username}_retail'). \
  load()
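
With the options above, a streaming read from Kafka starts at the latest offsets by default, so a preview only shows messages that arrive after the query starts. The sketch below wraps the same reader in a helper and exposes the `startingOffsets` option. The helper name `read_kafka_stream` and its parameters are assumptions made for illustration and are not part of the original notebook.

```python
def read_kafka_stream(spark, bootstrap_servers, topic, starting_offsets="latest"):
    """Build a streaming DataFrame over a Kafka topic.

    Hypothetical helper: `spark` is expected to be a SparkSession with the
    spark-sql-kafka package available, as configured earlier in the notebook.
    """
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        # "latest" is the default for streaming reads; "earliest" replays
        # the messages that are still retained in the topic.
        .option("startingOffsets", starting_offsets)
        .load()
    )
```

For example, `read_kafka_stream(spark, kafka_bootstrap_servers, f'{username}_retail', 'earliest')` would also pick up older messages, which can make the previewed table considerably larger.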

In [4]:
df.printSchema()

root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)



In [None]:
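# Kafka delivers key and value as binary, so cast both to strings.
# The memory sink keeps the results in an in-memory table named by queryName.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"). \
    writeStream. \
    format("memory"). \
    queryName("log_messages"). \
    start()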
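
Re-running the cell above while the query is still active usually fails, because the name `log_messages` is already in use by a running query. A minimal sketch of one way to handle this is to stop any active query with that name before starting a new one. The helper names `stop_query_by_name` and `start_memory_preview` are hypothetical. They rely only on `spark.streams.active`, `StreamingQuery.name` and `StreamingQuery.stop()`.

```python
def stop_query_by_name(spark, query_name):
    """Stop every active streaming query registered under query_name.

    Returns the number of queries that were stopped.
    """
    stopped = 0
    # spark.streams.active lists the streaming queries running in this session.
    for query in spark.streams.active:
        if query.name == query_name:
            query.stop()
            stopped += 1
    return stopped


def start_memory_preview(df, query_name):
    """Start a memory-sink query whose results are queryable as query_name."""
    return (
        df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
        .writeStream
        .format("memory")
        .queryName(query_name)
        .start()
    )
```

In the notebook this could be used as `stop_query_by_name(spark, 'log_messages')` followed by `query = start_memory_preview(df, 'log_messages')`. Keeping the returned `query` handle also allows calling `query.stop()` once the preview is no longer needed.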

In [7]:
spark.sql('SELECT * FROM log_messages').show(truncate=False)

+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|key |value                                                                                                                                                                                                                                |
+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|null|148.240.97.17 - - [27/Aug/2021:14:29:36 -0800] "GET /department/outdoors/categories HTTP/1.1" 200 750 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" |
|null|170.63.83.83 - - [27/Aug/2021:14:29:38 -0800] 

In [9]:
spark.sql('SELECT count(1) FROM log_messages').show(truncate=False)

+--------+
|count(1)|
+--------+
|71      |
+--------+
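
The count above is a snapshot: while the streaming query keeps running, the in-memory table keeps growing, so repeating the query can return a larger number. As a rough sketch, the count can be polled a few times to watch the table grow. The helper name `poll_row_count` and its parameters are assumptions for illustration only.

```python
import time


def poll_row_count(spark, table_name, attempts=3, interval_seconds=5.0):
    """Return the row count of table_name sampled `attempts` times."""
    counts = []
    for attempt in range(attempts):
        rows = spark.sql(f"SELECT count(1) AS cnt FROM {table_name}").collect()
        # The result has a single row with a single column holding the count.
        counts.append(rows[0][0])
        # Wait between samples, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(interval_seconds)
    return counts
```

Calling `poll_row_count(spark, 'log_messages')` returns a list of counts. The values depend entirely on how fast messages arrive in the topic.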

