### 🔹 What is pivot() in PySpark?

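In PySpark, pivot() turns the distinct values of one column into separate output columns and fills each cell with an aggregated value.
It is a method on the grouped data returned by groupBy(), so it always sits between a groupBy() call and an aggregation such as sum().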
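
To make the reshaping concrete, here is a minimal plain-Python sketch of what a groupBy/pivot/sum chain computes. It illustrates the semantics only and is not how Spark executes the query; the helper name `pivot_sum` and the four-row subset of the sample data are assumptions made for this example.

```python
from collections import defaultdict

# Same (Region, Product, Sales) shape as the notebook's sample data.
rows = [
    ("East", "Apple", 100),
    ("East", "Banana", 150),
    ("West", "Apple", 200),
    ("East", "Apple", 80),
]


def pivot_sum(rows):
    """Hypothetical helper: group by region, pivot on product, sum sales."""
    table = defaultdict(dict)  # region -> {product: running total}
    for region, product, sales in rows:
        table[region][product] = table[region].get(product, 0) + sales

    # Every distinct product becomes a column (sorted for a stable order).
    products = sorted({product for _, product, _ in rows})

    # A region with no rows for a product gets None, mirroring a Spark null.
    return {
        region: {p: totals.get(p) for p in products}
        for region, totals in table.items()
    }


result = pivot_sum(rows)
print(result)
```

Each key of `result` corresponds to one output row (a Region), and each inner key corresponds to one pivoted column (a Product).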

In [0]:
# Sample data: (Region, Product, Sales)
data = [
    ("East", "Apple", 100),
    ("East", "Banana", 150),
    ("West", "Apple", 200),
    ("West", "Banana", 120),
    ("North", "Apple", 50),
    ("North", "Banana", 90),
    ("East", "Apple", 80),
    ("West", "Banana", 70),
]

columns = ["Region", "Product", "Sales"]

df = spark.createDataFrame(data, columns)

df.display()

Region,Product,Sales
East,Apple,100
East,Banana,150
West,Apple,200
West,Banana,120
North,Apple,50
North,Banana,90
East,Apple,80
West,Banana,70


In [0]:
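# One row per Region, one column per distinct Product,
# each cell holding the total Sales for that Region/Product pair.
pivot_df = df.groupBy("Region") \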
             .pivot("Product") \
             .sum("Sales")

pivot_df.display()

Region,Apple,Banana
East,180,150
West,200,190
North,50,90


In [0]:
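# Listing the pivot values explicitly limits the output to those columns
# and saves Spark the extra pass it otherwise needs to find the distinct values.
pivot_df2 = df.groupBy("Region") \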
              .pivot("Product", ["Apple"]) \
              .sum("Sales")

pivot_df2.display()

Region,Apple
East,180
West,200
North,50
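
The effect of an explicit value list can also be sketched in plain Python. This is an illustration under stated assumptions, not Spark's implementation: the helper `pivot_sum_with_values` is hypothetical, and "Cherry" is a made-up product that does not appear in the sample data. It is included to show that a listed value with no matching rows still produces a column, filled with None (null in Spark).

```python
# The notebook's sample data: (Region, Product, Sales)
rows = [
    ("East", "Apple", 100),
    ("East", "Banana", 150),
    ("West", "Apple", 200),
    ("West", "Banana", 120),
    ("North", "Apple", 50),
    ("North", "Banana", 90),
    ("East", "Apple", 80),
    ("West", "Banana", 70),
]


def pivot_sum_with_values(rows, values):
    """Hypothetical helper: sum sales per region, keeping only listed products."""
    table = {}
    for region, product, sales in rows:
        # Every region gets a row, with one (initially empty) cell per listed value.
        cells = table.setdefault(region, {v: None for v in values})
        if product in cells:  # rows for unlisted products are ignored
            cells[product] = (cells[product] or 0) + sales
    return table


restricted = pivot_sum_with_values(rows, ["Apple", "Cherry"])
print(restricted)
```

Banana rows are dropped because "Banana" is not in the list, while the "Cherry" column exists for every region but stays empty.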


In [0]:
from pyspark.sql import functions as F

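# With several aggregations, each output column is named <pivot value>_<alias>,
# for example Apple_Total_Sales and Apple_Avg_Sales.
pivot_df3 = df.groupBy("Region") \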
              .pivot("Product") \
              .agg(F.sum("Sales").alias("Total_Sales"),
                   F.avg("Sales").alias("Avg_Sales"))

pivot_df3.display()

Region,Apple_Total_Sales,Apple_Avg_Sales,Banana_Total_Sales,Banana_Avg_Sales
East,180,90.0,150,150.0
West,200,200.0,190,95.0
North,50,50.0,90,90.0


### ✅ In summary

groupBy() defines the grouping key(s); each distinct key becomes one output row.

pivot() names the column whose distinct values become the output columns.

Aggregation functions (such as sum and avg) compute the value in each cell.