**⭐ 1. What This Pattern Solves**

explode() flattens a column of arrays or maps into multiple rows: one row per array element, or one row per key-value pair for a map.

Use cases:

Each user has multiple orders → create one row per order.

Log messages with multiple tags → one row per tag.

Expanding JSON arrays into individual records.

**⭐ 2. SQL Equivalent**

In [0]:
%sql
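-- Spark SQL equivalent of exploding an array column.
-- ORDER is a SQL keyword, so the exploded column is named order_item.
SELECT user_id, order_item
FROM users
LATERAL VIEW explode(orders) t AS order_item;

-- In Spark SQL, explode() is the same function, used here with LATERAL VIEW.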

**⭐ 3. Core Idea**

Take a column with nested elements and turn each element into its own row, duplicating the other columns.
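To make the core idea concrete, here is a minimal plain-Python model of what explode does to each row. It is only a sketch of the semantics, not Spark code; the function name explode_rows and the sample rows are made up for illustration.

```python
# Plain-Python model of the explode semantics (not Spark code).
def explode_rows(rows):
    """Yield one (id, item) pair per element of each row's list."""
    for row_id, items in rows:
        # A None or empty list contributes no output rows, like explode().
        for item in items or []:
            # The non-exploded column (row_id) is duplicated for every element.
            yield (row_id, item)


rows = [(1, ["apple", "banana"]), (2, ["orange"]), (3, [])]
exploded = list(explode_rows(rows))
print(exploded)
```

Row 3 has an empty list, so it disappears from the result, which is the same behavior described under Common Pitfalls.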

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
from pyspark.sql.functions import explode

df.select("id", explode("array_column").alias("exploded_item"))

**⭐ 5. Detailed Example**

In [0]:
data = [
    (1, ["apple", "banana"]),
    (2, ["orange"])
]
df = spark.createDataFrame(data, ["id", "fruits"])
df.show()

In [0]:
+---+---------------+
|id |fruits         |
+---+---------------+
|1  |[apple, banana]|
|2  |[orange]       |
+---+---------------+


In [0]:
from pyspark.sql.functions import explode

df_exploded = df.select("id", explode("fruits").alias("fruit"))
df_exploded.show()

In [0]:
+---+------+
|id |fruit |
+---+------+
|1  |apple |
|1  |banana|
|2  |orange|
+---+------+


**⭐ 6. Mini Practice Problems**

Explode a column of arrays of integers into multiple rows.

Explode a map column into two columns: key and value.

Explode a nested JSON array column from a dataset of users → one row per item.

**⭐ 7. Full Data Engineering Problem**

You have web session logs: each row has a user_id and a list of pages_visited. You need a clickstream table with one row per (user_id, page).

Input: [(1, ["home","about"]), (2, ["home","contact"])]

Goal: Flatten for analytics, join with page metadata, compute metrics like most visited page per user.

Pattern: explode() to flatten → groupBy → join → aggregate.

**⭐ 8. Time & Space Complexity**

Time: O(n * m) → n = number of rows, m = average size of array/map per row

Space: O(n * m) in the output DataFrame

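Exploding increases the row count in proportion to the number of nested elements: the output has one row per element, so its size is the sum of the array/map sizes across all input rows.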
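The row-count arithmetic can be checked with a small plain-Python sketch. The array sizes below are hypothetical, None stands for a null array, and exploded_row_count is an illustrative helper, not a Spark function.

```python
# Hypothetical per-row array sizes; None stands for a null array.
array_sizes = [2, 1, 0, None, 3]


def exploded_row_count(sizes, outer=False):
    """Count output rows for explode (outer=False) or explode_outer (outer=True)."""
    total = 0
    for size in sizes:
        if size:
            # Each element becomes one output row.
            total += size
        elif outer:
            # explode_outer keeps one row (with a null) for null/empty arrays.
            total += 1
    return total


n = len(array_sizes)
inner_count = exploded_row_count(array_sizes)
outer_count = exploded_row_count(array_sizes, outer=True)
print(n, inner_count, outer_count)
```

With n input rows and average array size m, the inner count is about n * m, which is where the O(n * m) bound comes from.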

**⭐ 9. Common Pitfalls**

Forgetting to alias the exploded column → you get the generic default names: col for arrays, key and value for maps.

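Exploding a map as if it were an array → explode on a map produces two columns (key and value), so alias it with two names, or use map_keys/map_values when you only need one side.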

Exploding null or empty arrays → explode emits no rows for them, so those input rows silently disappear (use explode_outer to keep them with a null in the exploded column).