**⭐ 1. What This Pattern Solves**

Flattening applies when a DataFrame has nested columns (structs, arrays, maps) and you want a wide table in which each nested field is its own top-level column.

Use cases:

Converting JSON or nested Parquet data into a tabular format.

Preparing nested event logs for analytics or BI tools.

Flattening exploded arrays/structs for joins or aggregations.

**⭐ 2. SQL Equivalent**

In [0]:
%sql
-- Flatten struct column
SELECT 
    nested_col.field1 AS field1,
    nested_col.field2 AS field2
FROM table;

-- Flatten array of structs (after explode)
SELECT 
    exploded_col.field1,
    exploded_col.field2
FROM table
LATERAL VIEW explode(array_struct_col) t AS exploded_col;

**⭐ 3. Core Idea**

Access nested fields with dot notation (struct_col.field), exploding arrays first where needed. Flattening turns nested structures into individual top-level columns that analytics queries and joins can use directly.

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
from pyspark.sql.functions import explode

# Flatten struct: pull nested fields up as top-level columns
df.select("id", "struct_col.field1", "struct_col.field2")

# Flatten array of structs: explode to one row per element, then select the fields
(
    df.select("id", explode("array_col").alias("exploded"))
      .select("id", "exploded.field1", "exploded.field2")
)


**⭐ 5. Detailed Example**

In [0]:
from pyspark.sql import Row

data = [
    Row(id=1, info=Row(name="Alice", age=30)),
    Row(id=2, info=Row(name="Bob", age=25))
]

df = spark.createDataFrame(data)
df.show(truncate=False)

In [0]:
+---+-----------+
|id |info       |
+---+-----------+
|1  |{Alice, 30}|
|2  |{Bob, 25}  |
+---+-----------+

In [0]:
df_flat = df.select("id", "info.name", "info.age")
df_flat.show()

In [0]:
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 30|
|  2|  Bob| 25|
+---+-----+---+

In [0]:
from pyspark.sql.functions import explode

data = [(1, [Row(item="apple", qty=5), Row(item="banana", qty=3)])]
df2 = spark.createDataFrame(data, ["id", "items"])

df2_flat = df2.select("id", explode("items").alias("item"))
df2_flat.select("id", "item.item", "item.qty").show()


In [0]:
+---+------+---+
| id|  item|qty|
+---+------+---+
|  1| apple|  5|
|  1|banana|  3|
+---+------+---+

**⭐ 6. Mini Practice Problems**

Flatten a struct column with nested address (address.street, address.city).

Explode and flatten an array of structs representing orders.

Flatten a JSON-parsed struct column into multiple columns.

**⭐ 7. Full Data Engineering Problem**

You ingest nested JSON logs from S3: each row has user (struct) and events (array of structs).

Task: Flatten user → user_id, user_name and explode events → event_type, timestamp for analytics.

Pattern: from_json → explode → select nested fields → write to Silver table.

**⭐ 8. Time & Space Complexity**

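Time: O(n) for struct-only flattening (a per-row projection); O(n * m) when exploding arrays (n rows, m elements per array on average).

Space: Flattening a struct does not change the row count. Exploding an array produces one output row per element, so n rows become about n * m, with the non-array columns repeated on each.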
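
The row multiplication can be estimated before running a job. The plain-Python sketch below uses made-up array lengths and models only the arithmetic; it does not call Spark. It mirrors how `explode` counts rows (one per element, with null or empty arrays contributing none) and how `explode_outer` keeps one row for a null or empty array. The function name `exploded_row_count` is hypothetical.

```python
# Hypothetical per-row array lengths; None stands for a null array.
array_lengths = [2, 0, 3, None, 1]


def exploded_row_count(lengths, keep_empty=False):
    """Rows produced by explode (keep_empty=False) or explode_outer (keep_empty=True)."""
    total = 0
    for n in lengths:
        if n:              # non-empty array: one output row per element
            total += n
        elif keep_empty:   # null or empty array: explode_outer keeps a single row
            total += 1
    return total


print(exploded_row_count(array_lengths))                   # 6
print(exploded_row_count(array_lengths, keep_empty=True))  # 8
```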

**⭐ 9. Common Pitfalls**

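Forgetting to alias exploded columns → the output column gets the default name col, which is easy to confuse or collide with other columns.

Selecting a field through an array of structs without exploding (array_col.field) → the result is an array of that field's values, so the data is still nested.

Accessing deeply nested fields incorrectly → the path must follow the schema level by level (struct_col.inner_struct.field), and any array along the way has to be exploded before its fields become scalar columns.

Exploding too early → every other column is repeated once per array element, so joins and aggregations that only need the parent row should run first.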