### How to Update Nested Columns in a PySpark DataFrame?

- In PySpark, you can update a nested column (for example, a field inside a struct) by calling withColumn and rebuilding the struct with col, struct, and alias.
- This is especially useful when you're dealing with JSON-like structures or deeply nested data.

**Sample data:**

Suppose we have a DataFrame representing employees with nested columns that store personal information in a struct format:

In [0]:
# Sample data
data = [
    ("John", ("john.doe@example.com", "New York"), 30),
    ("Jane", ("jane.doe@example.com", "Los Angeles"), 28)
]

# Define schema
columns = ["Name", "PersonalInfo", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)
df.show(truncate=False)

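# Here, the PersonalInfo column is a struct containing two fields: the email and the city.
# Because only top-level column names were supplied, Spark infers the struct's field names as _1 (email) and _2 (city).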

+----+-----------------------------------+---+
|Name|PersonalInfo                       |Age|
+----+-----------------------------------+---+
|John|{john.doe@example.com, New York}   |30 |
|Jane|{jane.doe@example.com, Los Angeles}|28 |
+----+-----------------------------------+---+



**Coding Task: Update Nested Column**

Suppose you want to update the City field inside the PersonalInfo struct, changing "New York" to "Boston" and leaving every other city unchanged.

**Solution: Use withColumn, struct, and when**
1. First, we will extract the individual fields from the struct.
2. Then, we'll update the required field (City in this case).
3. Finally, we'll reconstruct the struct with the updated field.

In [0]:
from pyspark.sql.functions import col, struct, when

# Update the City field inside the PersonalInfo struct
updated_df = df.withColumn("PersonalInfo",
                            struct(
                                col("PersonalInfo._1").alias("Email"),
                                when(col("PersonalInfo._2") == "New York", "Boston") \
                                .otherwise(col("PersonalInfo._2")).alias("City")
                            )
                        )

updated_df.show(truncate=False)

+----+-----------------------------------+---+
|Name|PersonalInfo                       |Age|
+----+-----------------------------------+---+
|John|{john.doe@example.com, Boston}     |30 |
|Jane|{jane.doe@example.com, Los Angeles}|28 |
+----+-----------------------------------+---+

