## Writing Streaming Data to Files

Now that we have read the data and confirmed it is being processed using `writeStream.format('console')`, let us see how the data can be written to files.

Here are the steps we need to follow to write the data to files:
1. Ensure the logs are being redirected to the Netcat server.
2. Read the data using `spark.readStream` with `format('socket')`.
3. Use `writeStream.format` with the options required by the file format. We will be using `writeStream.format('csv')`, so we need to specify both a checkpoint location and a target path.
```
socketDF \
    .writeStream \
    .format("csv") \
    .option("checkpointLocation", "/FileStore/retail_logs/gen_logs/checkpoint") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .start()
```
4. Validate both the checkpoint location and the data location to which the files are being written.
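
For step 1, when Netcat is not available on a test machine, a small Python listener can stand in for it. The following is a minimal sketch under that assumption, not the setup used in this notebook: `serve_lines` and `read_all` are hypothetical helper names, and the sample log lines in the usage note are made up. Like Netcat in listen mode, it waits for a client (such as the Spark socket source) to connect and then sends newline-terminated text.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Listen on host:port and send each line to the first client that connects.

    Returns (server_thread, bound_port). port=0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def _run():
        # Block until one client (for example the Spark socket source) connects.
        conn, _ = server.accept()
        with conn:
            for line in lines:
                # Each record is sent as one newline-terminated line of UTF-8 text.
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread, bound_port


def read_all(host, port):
    """Connect as a client and return every line received until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()
```

Calling `serve_lines(["GET /department/apparel HTTP/1.1"], port=9000)` would expose one made-up log line on the port that the socket source in this notebook reads from; `read_all` is only a quick way to check the listener without starting Spark.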

In [None]:
from pyspark.sql import SparkSession

import getpass

# Use the OS user name to keep the warehouse directory and app name per-user.
username = getpass.getuser()

# Setting spark.ui.port to 0 lets Spark pick any free port for the UI.
spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Overview of Structured Streaming'). \
    master('yarn'). \
    getOrCreate()

In [None]:
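# Connect to the Netcat listener; each line received becomes one row
# in a single string column named `value`.
socketDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9000) \
    .load()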

In [None]:
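# No checkpointLocation is set here. The file sink requires one, so this
# query is expected to fail with an AnalysisException unless
# spark.sql.streaming.checkpointLocation is configured for the session.
socketDF \
    .writeStream \
    .format("csv") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .start()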

In [None]:
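# With both checkpointLocation and path set, the query can start.
# The trigger starts a new micro-batch every 5 seconds.
socketDF \
    .writeStream \
    .format("csv") \
    .option("checkpointLocation", "/FileStore/retail_logs/gen_logs/checkpoint") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .trigger(processingTime='5 seconds') \
    .start()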
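
Once the query is running, step 4 (validating both locations) can be sketched as follows. This is a rough check under stated assumptions rather than the notebook's own method: `summarize_stream_output` is a hypothetical helper, and it assumes both directories are reachable as local filesystem paths (on Databricks the `/FileStore` paths are typically visible under `/dbfs`; on HDFS a command such as `hdfs dfs -ls` would be used instead). It relies on the usual layout in which the file sink writes `part-*.csv` files plus a `_spark_metadata` directory, and the checkpoint contains `offsets` and `commits` directories.

```python
from pathlib import Path


def summarize_stream_output(data_dir, checkpoint_dir):
    """Report which expected files and directories exist for a streaming file sink."""
    data = Path(data_dir)
    checkpoint = Path(checkpoint_dir)
    return {
        # CSV part files written by completed micro-batches.
        "part_files": sorted(p.name for p in data.glob("part-*.csv")),
        # Log kept by the file sink that lists the files it has committed.
        "has_sink_log": (data / "_spark_metadata").is_dir(),
        # Offsets planned for each batch and markers for completed batches.
        "has_offsets": (checkpoint / "offsets").is_dir(),
        "has_commits": (checkpoint / "commits").is_dir(),
    }
```

If `part_files` stays empty while `has_offsets` is true, the query has probably started but no batch has received data yet, which usually points back to the Netcat side of the pipeline.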