# Writing Data

Just as there are many ways to read data, there are many ways to write it.

In this notebook, we will take a quick peek at how to write data back out to Parquet files.

**Technical Accomplishments:**
- Writing data to Parquet files

##![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Getting Started

Run the following cell to configure our "classroom."

In [0]:
%run "./Includes/Classroom-Setup"

In [0]:
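# If "./Includes/Classroom-Setup" failed to mount "/mnt/training", the listing
# below raises an error. In that case, unmount the stale mount point so that
# "./Includes/Dataset-Mounts-New" (next cell) can mount it again.
try:
    files = dbutils.fs.ls("/mnt/training")
except Exception:
    dbutils.fs.unmount("/mnt/training/")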


/mnt/training/ has been unmounted.


In [0]:
%run "./Includes/Dataset-Mounts-New"

##![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Writing Data

Let's start with one of our original data sources, the tab-separated file **pageviews_by_second.tsv**:

In [0]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

csvSchema = StructType([
  StructField("timestamp", StringType(), False),
  StructField("site", StringType(), False),
  StructField("requests", IntegerType(), False)
])

csvFile = "/mnt/training/wikipedia/pageviews/pageviews_by_second.tsv"

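csvDF = (spark.read
  .option('header', 'true')   # Treat the first line as a header
  .option('sep', "\t")        # Fields are separated by tabs
  .schema(csvSchema)          # Use our schema instead of inferring one
  .csv(csvFile)               # Create the DataFrame from the file
)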

Now that we have a `DataFrame`, we can write it back out as Parquet files or in a variety of other formats.

In [0]:
fileName = userhome + "/pageviews_by_second.parquet"
print("Output location: " + fileName)

(csvDF.write                       # Our DataFrameWriter
  .option("compression", "snappy") # Compression codec, e.g. none, snappy, gzip, or lzo
  .mode("overwrite")               # Replace existing files
  .parquet(fileName)               # Write DataFrame to Parquet files
)

Output location: dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet
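
The write above passes the save mode and the compression codec as plain strings, so a typo only surfaces when the job runs. As a minimal sketch, the same call can be wrapped in a helper that checks both values first. The names `write_parquet`, `SAVE_MODES`, and `PARQUET_CODECS` are hypothetical and not part of the course material, and the codec list is an assumption taken from the Spark SQL Parquet documentation; the codecs actually available can differ by runtime.

```python
# Save modes accepted by DataFrameWriter.mode() (matched case-sensitively here).
SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

# Assumption: codecs listed in the Spark SQL Parquet documentation.
PARQUET_CODECS = {"none", "uncompressed", "snappy", "gzip",
                  "lzo", "brotli", "lz4", "zstd"}


def write_parquet(df, path, mode="overwrite", compression="snappy"):
    """Hypothetical helper: validate the options, then write df as Parquet."""
    if mode not in SAVE_MODES:
        raise ValueError(f"Unknown save mode: {mode!r}")
    if compression not in PARQUET_CODECS:
        raise ValueError(f"Unknown Parquet codec: {compression!r}")
    (df.write
       .option("compression", compression)  # Codec for the data files
       .mode(mode)                          # What to do if the path exists
       .parquet(path)                       # Write the DataFrame as Parquet
    )
    return path

# Usage in this notebook (not executed here):
# write_parquet(csvDF, fileName, mode="overwrite", compression="snappy")
```

With `mode="overwrite"`, existing data at the path is replaced, whereas `"append"` adds new files, `"ignore"` silently skips the write, and `"error"` / `"errorifexists"` fail if the path already exists.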


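Now that the data has been written out, we can see it in DBFS. Note that the output path is a directory: Spark typically writes one `part-*` file per partition, alongside marker files such as `_SUCCESS`: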

In [0]:
display(
  dbutils.fs.ls(fileName)
)

path,name,size,modificationTime
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/_SUCCESS,_SUCCESS,0,1684249578000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/_committed_8295832922756971395,_committed_8295832922756971395,420,1684249577000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/_started_8295832922756971395,_started_8295832922756971395,0,1684249569000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/part-00000-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-89-1-c000.snappy.parquet,part-00000-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-89-1-c000.snappy.parquet,15379248,1684249577000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/part-00001-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-90-1-c000.snappy.parquet,part-00001-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-90-1-c000.snappy.parquet,15769715,1684249577000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/part-00002-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-91-1-c000.snappy.parquet,part-00002-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-91-1-c000.snappy.parquet,15743758,1684249576000
dbfs:/user/vishal.abnave@borregaard.com/pageviews_by_second.parquet/part-00003-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-92-1-c000.snappy.parquet,part-00003-tid-8295832922756971395-350919e7-eb9f-4ff3-81d5-60ad060dfb3d-92-1-c000.snappy.parquet,14794556,1684249576000
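
The listing above mixes data files with bookkeeping files whose names start with an underscore. As a rough sketch, a small helper can separate the two and total the size of the data files. The name `summarize_parquet_dir` is hypothetical, and the helper assumes only that each entry has `name` and `size` attributes, as the objects returned by `dbutils.fs.ls` do.

```python
def summarize_parquet_dir(entries):
    """Hypothetical helper: split a directory listing into data and marker files.

    `entries` is any iterable of objects with `name` and `size` attributes,
    such as the entries returned by dbutils.fs.ls().
    """
    entries = list(entries)  # Allow generators; we iterate more than once
    data_files = [e for e in entries if e.name.endswith(".parquet")]
    markers = [e.name for e in entries if e.name.startswith("_")]
    return {
        "data_files": len(data_files),                   # Number of part files
        "total_bytes": sum(e.size for e in data_files),  # Combined data size
        "markers": markers,                              # e.g. _SUCCESS
    }

# Usage in this notebook (not executed here):
# summarize_parquet_dir(dbutils.fs.ls(fileName))
```

For the listing shown above, this reports four data files totaling 61,687,277 bytes, plus three marker files.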


Lastly, we can read that same Parquet data back in and display the results:

In [0]:
display(
  spark.read.parquet(fileName)
)


timestamp,site,requests
2015-03-30T16:37:18,mobile,1456
2015-03-30T16:51:11,desktop,2917
2015-03-30T17:10:47,desktop,3043
2015-03-30T17:12:39,mobile,1440
2015-03-30T17:31:11,mobile,1462
2015-03-30T17:44:40,mobile,1440
2015-03-30T17:54:24,desktop,3121
2015-03-30T17:56:16,mobile,1427
2015-03-30T18:32:32,desktop,3097
2015-03-30T18:34:24,mobile,1369
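
Displaying a few rows shows that the data is readable, but not that everything survived the round trip. One possible sanity check, sketched below, compares the column names, column types, and row counts of the original and re-read `DataFrame`s. The function name `round_trip_matches` is hypothetical. The sketch compares `DataFrame.dtypes` rather than full schemas because columns read from Parquet are reported as nullable, so a strict schema comparison could fail even when the data is identical.

```python
def round_trip_matches(original_df, reread_df):
    """Hypothetical check: same column names/types and same row count."""
    # DataFrame.dtypes is a list of (column name, type string) pairs,
    # so this comparison ignores nullability.
    same_columns = original_df.dtypes == reread_df.dtypes
    # count() triggers a Spark job on each DataFrame.
    same_rows = original_df.count() == reread_df.count()
    return same_columns and same_rows

# Usage in this notebook (not executed here):
# round_trip_matches(csvDF, spark.read.parquet(fileName))
```

This check does not compare individual values or row order; Spark does not guarantee that rows come back in the order they were written.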


## Next steps

Start the next lesson, [Reading Data - Lab]($./6.Reading%20Data%20-%20Lab)