**⭐ 1. What This Pattern Solves**

Complex columns (arrays, structs, and maps) let you store multiple values in a single PySpark column, which is essential for nested data modeling and for working with semi-structured data such as JSON.

Use cases:

Arrays for multi-valued attributes (tags, scores, purchased items)

Structs to group related fields (address → street, city, zip)

Maps for key-value pairs (product → quantity)

Preparing data for nested JSON export or analytics pipelines

**⭐ 2. SQL Equivalent**

In [0]:
%sql
-- Array
SELECT array(1, 2, 3) AS nums;

-- Struct
SELECT named_struct('street', '123 St', 'city', 'NY') AS address;

-- Map
SELECT map('apple', 10, 'orange', 5) AS fruit_qty;

**⭐ 3. Core Idea**

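Use the PySpark functions array(), struct(), and create_map() to build nested data types in a DataFrame column for downstream operations such as explode, flattening, or JSON serialization. Note that the Python function is create_map(); map() is the Spark SQL name and does not exist in pyspark.sql.functions.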

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
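from pyspark.sql.functions import array, struct, create_map

# Array: all inputs must share (or be coercible to) one element type
df = df.withColumn("new_array", array("col1", "col2"))

# Struct: each field keeps its own type and is named after its input column
df = df.withColumn("new_struct", struct("col1", "col2"))

# Map: arguments alternate key, value, key, value, ...
df = df.withColumn("new_map", create_map("key_col", "value_col"))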

**⭐ 5. Detailed Example**

In [0]:
data = [(1, "Alice", "NY", 30),
        (2, "Bob", "CA", 25)]
df = spark.createDataFrame(data, ["id", "name", "state", "age"])
df.show()

In [0]:
+---+-----+-----+---+
|id |name |state|age|
+---+-----+-----+---+
|1  |Alice|NY   |30 |
|2  |Bob  |CA   |25 |
+---+-----+-----+---+

In [0]:
from pyspark.sql.functions import array, struct, create_map

# create_map() is the PySpark function; map() exists only in Spark SQL
df2 = df.withColumn("info_array", array("name", "state")) \
        .withColumn("info_struct", struct("name", "age")) \
        .withColumn("info_map", create_map("name", "state"))

df2.show(truncate=False)

In [0]:
+---+-----+-----+---+-----------+-----------+-------------+
|id |name |state|age|info_array |info_struct|info_map     |
+---+-----+-----+---+-----------+-----------+-------------+
|1  |Alice|NY   |30 |[Alice, NY]|{Alice, 30}|{Alice -> NY}|
|2  |Bob  |CA   |25 |[Bob, CA]  |{Bob, 25}  |{Bob -> CA}  |
+---+-----+-----+---+-----------+-----------+-------------+


**⭐ 6. Mini Practice Problems**  

Create an array of age and id for each row.

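Create a struct with state and name, then expand it back into top-level columns later (explode() works on arrays and maps, not structs; use select("struct_col.*") instead).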

Create a map of name → age and extract keys and values.

**⭐ 7. Full Data Engineering Problem**

You have customer transaction data: customer_id, product, price.

Task: Group products purchased into an array, store total amount in a struct, and create a map of product → price for analytics.

Goal: Feed downstream to a nested JSON API or Delta Lake Silver table.

Pattern used: array(), struct(), create_map() → later can explode() or flatten() as needed.

**⭐ 8. Time & Space Complexity**

Time: O(n) in the number of rows → arrays/structs/maps are built per row, and building the column itself does not require a shuffle.

Space: Increases column size; large arrays or maps can increase memory usage per row.

**⭐ 9. Common Pitfalls**

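Mixing incompatible column types inside arrays or map values (e.g., string + int). All elements of an array must share one type, so Spark either coerces them implicitly or fails; cast explicitly instead. Structs are not affected, since each field keeps its own type.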

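Forgetting alias() for nested structs/maps → hard to access later. This matters most for computed fields inside a struct, which otherwise receive auto-generated names.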

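Using map keys that are not unique → in Spark 3.0 and later this raises an error by default; the last value per key is kept only when spark.sql.mapKeyDedupPolicy is set to LAST_WIN.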

Creating very large arrays/maps → exploding them later multiplies the row count, which may cause memory pressure and larger shuffles in subsequent stages.