**⭐ 1. What This Pattern Solves**

Pivoting turns the distinct values of one column into separate columns and fills them by aggregating another column. It is the opposite of flattening/unpivoting and is used to create a wide table for reporting or analytics.

Use cases:

Monthly sales per region → columns: Jan, Feb, Mar.

Count of events per event_type.

Transform categorical values into columns for aggregation.

**⭐ 2. SQL Equivalent**

In [0]:
%sql
-- Columns other than month and amount become implicit grouping columns.
SELECT *
FROM sales
PIVOT (
    SUM(amount) FOR month IN ('Jan', 'Feb', 'Mar')
);

**⭐ 3. Core Idea**

Group by one or more key columns, call pivot() on the column whose distinct values should become output columns, and apply an aggregation function (sum, count, avg) to fill the cells.
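
To make the three steps concrete, here is a plain-Python model of the group → pivot → aggregate idea. It is only a sketch of the semantics, not how Spark implements it; the helper name pivot_sum and the sample rows are made up for illustration.

```python
from collections import defaultdict

def pivot_sum(rows, group_key, pivot_key, value_key):
    """Plain-Python model of groupBy(group_key).pivot(pivot_key).sum(value_key)."""
    # Step 1: the distinct pivot values become the output columns (sorted for stability).
    columns = sorted({row[pivot_key] for row in rows})
    # Step 2: cells[group][pivot_value] accumulates the sum for one output cell.
    cells = defaultdict(dict)
    for row in rows:
        group, col = row[group_key], row[pivot_key]
        cells[group][col] = cells[group].get(col, 0) + row[value_key]
    # Step 3: combinations that never occur stay None, mirroring a null cell.
    return {g: {c: vals.get(c) for c in columns} for g, vals in cells.items()}

rows = [
    {"name": "Alice", "month": "Jan", "amount": 100},
    {"name": "Alice", "month": "Feb", "amount": 150},
    {"name": "Bob", "month": "Jan", "amount": 200},
]
wide = pivot_sum(rows, "name", "month", "amount")
print(wide)
# → {'Alice': {'Feb': 150, 'Jan': 100}, 'Bob': {'Feb': None, 'Jan': 200}}
```

Bob has no Feb row, so that cell is empty rather than zero; the same thing happens in a real pivot unless the gaps are filled afterwards.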

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
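from pyspark.sql import functions as F

(df.groupBy("group_col")        # one output row per group
   .pivot("pivot_col")          # one output column per distinct value
   .agg(F.sum("value_col")))    # aggregation that fills each cell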

**⭐ 5. Detailed Example**

In [0]:
data = [
    ("Alice", "Jan", 100),
    ("Alice", "Feb", 150),
    ("Bob", "Jan", 200),
    ("Bob", "Feb", 50)
]
df = spark.createDataFrame(data, ["name", "month", "amount"])
df.show(truncate=False)

In [0]:
+-----+-----+------+
|name |month|amount|
+-----+-----+------+
|Alice|Jan  |100   |
|Alice|Feb  |150   |
|Bob  |Jan  |200   |
|Bob  |Feb  |50    |
+-----+-----+------+


In [0]:
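# Pass the month order explicitly; without it, Spark sorts the pivot values (Feb, Jan).
df_pivot = df.groupBy("name").pivot("month", ["Jan", "Feb"]).sum("amount")
df_pivot.show(truncate=False)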


In [0]:
+-----+---+---+
|name |Jan|Feb|
+-----+---+---+
|Alice|100|150|
|Bob  |200|50 |
+-----+---+---+


**⭐ 6. Mini Practice Problems**

Pivot category → count of items per user.

Pivot event_type → sum of duration per session.

Pivot region → avg(sales) per product.

**⭐ 7. Full Data Engineering Problem**

You have sales transactions with columns store_id, product, month, sales_amount.

Task: Create a monthly sales report table with one row per store_id and one column per month, where each cell holds the total sales_amount for that store and month (summed across products).

Pattern: df.groupBy("store_id").pivot("month").sum("sales_amount") → write to analytics warehouse for dashboards.

**⭐ 8. Time & Space Complexity**

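Time: roughly one pass over all input rows plus a shuffle for the groupBy. If the pivot values are not passed explicitly, Spark first runs an extra job to collect the distinct values of the pivot column.

Space: The output has up to n × m cells (n = groups, m = unique pivot values), so a pivot column with many unique values produces a very wide table that can use significant memory.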

**⭐ 9. Common Pitfalls**

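Pivoting on a column with too many unique values → produces a very wide table and slows the job; without an explicit value list, Spark also fails once the distinct count exceeds spark.sql.pivotMaxValues (10,000 by default).

Forgetting to aggregate → pivot() returns grouped data, not a DataFrame, so an aggregation such as sum(), count(), or agg() must follow.

Null values in pivot column → are typically collected into a column named null; filter them out or replace them with a label before pivoting. Separately, group/value combinations with no rows produce null cells, which fillna() can replace.

Using incorrect aggregation function → can produce misleading data, for example avg where a total is expected.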