# Writing Multiple Partitions in Parquet with PySpark
This notebook demonstrates how to write a DataFrame partitioned by multiple columns (e.g., 'country' and 'year') using PySpark.

In [None]:
# Start a Spark session (getOrCreate reuses an active one if it exists)
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Write Multiple Partitions").getOrCreate()

## Create a Sample DataFrame
This DataFrame simulates sales data across countries and years.

In [None]:
data = [
    ("USA", 2022, "Alice", 100),
    ("USA", 2023, "Bob", 200),
    ("Canada", 2022, "Charlie", 150),
    ("Canada", 2023, "David", 175),
    ("USA", 2022, "Eve", 120)
]
columns = ["country", "year", "name", "sales"]
df = spark.createDataFrame(data, schema=columns)
df.show()

## Write the DataFrame with Multiple Partitions
This writes the data as Parquet, partitioned by both the 'country' and 'year' columns. Because the mode is "overwrite", any data already at the output path is replaced.

In [None]:
df.write \
    .partitionBy("country", "year") \
    .parquet("/tmp/output/multiple_partitions", mode="overwrite")

## Output Folder Structure
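Spark creates one directory level per partition column, in the order passed to `partitionBy`. Each leaf directory holds the Parquet part files for that country and year: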
```
/tmp/output/multiple_partitions/
  ├── country=Canada/year=2022/
  ├── country=Canada/year=2023/
  ├── country=USA/year=2022/
  └── country=USA/year=2023/
```
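
The mapping from rows to directories can be sketched in plain Python, without Spark. This is only an illustration of the `column=value` naming shown above: `partition_path` is a hypothetical helper, not a PySpark function, and it ignores the escaping Spark applies to special characters in partition values.

```python
from collections import Counter


def partition_path(row, partition_cols):
    """Build the relative 'col=value/...' directory for one row (hypothetical helper)."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


columns = ["country", "year", "name", "sales"]
data = [
    ("USA", 2022, "Alice", 100),
    ("USA", 2023, "Bob", 200),
    ("Canada", 2022, "Charlie", 150),
    ("Canada", 2023, "David", 175),
    ("USA", 2022, "Eve", 120),
]
rows = [dict(zip(columns, values)) for values in data]

# Count how many sample rows would land in each partition directory.
rows_per_dir = Counter(partition_path(row, ["country", "year"]) for row in rows)

for path in sorted(rows_per_dir):
    print(path, rows_per_dir[path])
```

Five sample rows map to four directories because Alice and Eve share the same country and year.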