
# Spark Struct Functions
https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#struct-functions

> A **struct** in **Apache Spark** is a **complex column type** that groups several named, typed subfields into a single **DataFrame** column, which lets you represent **nested or hierarchical data**.  
> Because the subfields are part of the column's schema, a struct gives you a **typed way** to store and query nested records, such as those read from **JSON** or nested **Parquet** files.


In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('spark-functions').getOrCreate()

In [None]:
# named_struct: build a struct column from alternating field-name / value pairs

sql = '''
create or replace temp view sample_view as
(
select 'rahul' as name,
        32 as age,
        named_struct('city','muvattupuzha',
                     'state','kerala',
                     'zip',686669) as address
)
'''
spark.sql(sql)

sql = '''
select *,
  address.city as city,
  address.state as state,
  address.zip as zip
from sample_view'''

spark.sql(sql).show(truncate = False)

+-----+---+------------------------------+------------+------+------+
|name |age|address                       |city        |state |zip   |
+-----+---+------------------------------+------------+------+------+
|rahul|32 |{muvattupuzha, kerala, 686669}|muvattupuzha|kerala|686669|
+-----+---+------------------------------+------------+------+------+



In [None]:
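# struct: with literal arguments the fields are named col1, col2, col3.
# union all keeps the column and struct field names of the first branch,
# so address.city still resolves for the row built with struct().
sql = '''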
create or replace temp view sample_view2 as
(
select *
from sample_view
union all
select
  'lathika' as name,
   32 as age,
   struct('kalady','kerala',686669) as address
)
'''
spark.sql(sql)

sql = '''
select *,
  address.city as city,
  address.state as state,
  address.zip as zip
from sample_view2'''

spark.sql(sql).show(truncate = False)

+-------+---+------------------------------+------------+------+------+
|name   |age|address                       |city        |state |zip   |
+-------+---+------------------------------+------------+------+------+
|rahul  |32 |{muvattupuzha, kerala, 686669}|muvattupuzha|kerala|686669|
|lathika|32 |{kalady, kerala, 686669}      |kalady      |kerala|686669|
+-------+---+------------------------------+------------+------+------+



| Feature | `named_struct` | `struct` |
|---------|----------------|-----------|
| **Purpose** | Create a struct with **explicit field names** | Create a struct from **existing values or columns**; field names are derived from the inputs |
| **Syntax** | `named_struct('field1', val1, 'field2', val2, ...)` | `struct(val1, val2, ...)` |
| **Field Names** | You **define the names explicitly** (`city`, `state`, `zip`) | Taken from the **column names or aliases** when the arguments are named columns; **auto-generated by position** (`col1`, `col2`, `col3`) for literals and other unnamed expressions |
| **Usage Example** | `named_struct('city','muvattupuzha','state','kerala','zip',686669)` | `struct('kalady','kerala',686669)` |
| **Access Fields** | Dot notation with the names you chose: `address.city`, `address.state`, `address.zip` | Dot notation with the derived names: `address.col1`, `address.col2`, `address.col3` for literals, or the column names when built from columns |
| **When to Use** | When you need **custom field names** and clarity in schema | When you want **quick struct grouping** of columns or literals and don't need custom names |
| **Schema Output in your Example** | `address: struct<city:string,state:string,zip:int>` | `struct<col1:string,col2:string,col3:int>` when selected on its own; inside `sample_view2` the `union all` keeps the field names of the first branch, so `address` stays `struct<city:string,state:string,zip:int>` |
