# Writing Parquet Files with PySpark
This notebook shows how to create a DataFrame with PySpark, write it as Parquet files (optionally partitioned and compressed), and read it back.

In [1]:
# Start a Spark session, or reuse one that is already running
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Write Parquet Example").getOrCreate()

## Create a Sample DataFrame
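Build a three-row DataFrame with the columns `id`, `name`, and `age`. Only column names are passed as the schema, so Spark infers the column types from the data.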

In [2]:
data = [
    (1, "Alice", 25),
    (2, "Bob", 30),
    (3, "Charlie", 28)
]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, schema=columns)
df.show()

+---+-------+---+
| id|   name|age|
+---+-------+---+
|  1|  Alice| 25|
|  2|    Bob| 30|
|  3|Charlie| 28|
+---+-------+---+



## Write DataFrame to Parquet
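Write the DataFrame to `./parquet_output/parquet_data` in Parquet format. Spark creates a directory at that path containing one or more part files, and `mode="overwrite"` replaces any output that already exists there.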

In [3]:
df.write.parquet("./parquet_output/parquet_data", mode="overwrite")
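
Because the output path is a directory rather than a single file, it can help to look at what was actually written. The sketch below uses only the standard library: it lists the data files under an output directory and checks that each one starts and ends with the 4-byte Parquet magic `PAR1`. The helper names `has_parquet_magic` and `parquet_part_files` are hypothetical, and the sketch assumes Spark's usual local layout (data files named `part-*`, alongside a `_SUCCESS` marker and hidden checksum files, which are skipped).

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def has_parquet_magic(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    path = Path(path)
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # move to the last 4 bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def parquet_part_files(output_dir):
    """Hypothetical helper: list part-* files that pass the magic check."""
    root = Path(output_dir)
    return sorted(
        p for p in root.rglob("part-*")
        if p.is_file() and has_parquet_magic(p)
    )
```

In this notebook it could be called as `parquet_part_files("./parquet_output/parquet_data")`. The magic-byte check only confirms that a file looks like Parquet; it does not validate the contents.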

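## Optional: Partition and Compress Parquet Output
`partitionBy("age")` writes one subdirectory per distinct `age` value, and the `compression` option sets the codec for the data files. Snappy is already Spark's default Parquet codec, so the option here only makes the choice explicit.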

In [4]:
# Write one subdirectory per distinct age value, with Snappy-compressed data files
(
    df.write
    .partitionBy("age")
    .option("compression", "snappy")
    .parquet("./parquet_output/parquet_partitioned", mode="overwrite")
)
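
A partitioned write encodes the partition column in directory names of the form `age=<value>`. As a minimal sketch of how to inspect that layout without Spark, the hypothetical helper `list_partitions` below maps each partition value to the data file names stored under it. It assumes a single partition column and skips marker and checksum files (names starting with `_` or `.`).

```python
from pathlib import Path


def list_partitions(root, column):
    """Map each partition value (as a string) to the data file names under it."""
    prefix = column + "="
    partitions = {}
    for child in sorted(Path(root).iterdir()):
        if child.is_dir() and child.name.startswith(prefix):
            value = child.name[len(prefix):]
            partitions[value] = sorted(
                f.name for f in child.iterdir()
                if f.is_file() and not f.name.startswith((".", "_"))
            )
    return partitions
```

For the sample data, `list_partitions("./parquet_output/parquet_partitioned", "age")` should return keys for the three distinct ages. The file names under each key are generated by Spark and vary from run to run.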

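## Read Parquet Back into DataFrame
Reading the partitioned directory restores `age` as a regular column, reconstructed from the partition directory names.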

In [6]:
df_loaded = spark.read.parquet("./parquet_output/parquet_partitioned")
df_loaded.show()

+---+-------+---+
| id|   name|age|
+---+-------+---+
|  3|Charlie| 28|
|  1|  Alice| 25|
|  2|    Bob| 30|
+---+-------+---+
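
The rows come back in a different order than they were written, which is expected: Spark does not guarantee row order when reading multiple files. To confirm the round trip preserved the data, the rows can be compared as unordered collections. The sketch below does this with plain tuples copied from the two outputs above; `same_rows` is a hypothetical helper, not part of PySpark.

```python
from collections import Counter


def same_rows(left, right):
    """Return True if both iterables hold the same rows, ignoring order."""
    return Counter(map(tuple, left)) == Counter(map(tuple, right))


# Rows as defined in cell In [2] and as displayed after reading back
written = [(1, "Alice", 25), (2, "Bob", 30), (3, "Charlie", 28)]
loaded = [(3, "Charlie", 28), (1, "Alice", 25), (2, "Bob", 30)]
print(same_rows(written, loaded))  # True
```

In the notebook itself, `same_rows(df.collect(), df_loaded.collect())` would make the same comparison, assuming both DataFrames have the same column order and are small enough to collect to the driver.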

