
# **Spark Map Functions**
Reference: https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#map-functions

----------------------------------------------

A **Map** in **Apache Spark** is a **complex column type** that stores **a collection of key-value pairs** in a **single DataFrame column**. All keys in a map column share one data type, and all values share one data type.
It suits **semi-structured or dynamic data** such as **JSON objects, configurations, or property lists**, where the set of keys can differ from row to row.

----------------------------------------------

### Map vs Struct

| Feature | **StructType** | **MapType** |
|---------|----------------|-------------|
| **Definition** | A complex type that groups multiple **named fields** together into a single column. | A complex type that stores **key-value pairs** inside a single column, like a dictionary. |
| **Key/Field Names** | **Fixed field names** defined at schema design time. | **Dynamic keys**: each row can have different keys. |
| **Access** | Accessed by field name: `struct_col.fieldName` | Accessed by key: `map_col['keyName']` |
| **Use Case** | When you know **all attributes upfront** and want a rigid schema. | When keys are **variable or semi-structured**, e.g., JSON with optional fields. |
| **Order** | Fields have a **fixed order**. | Key order is **not guaranteed**. |
| **Example** | `{street: "Muvattupuzha", city: "Ernakulam", zip: 686669}` | `{"city": "Ernakulam", "hobby": "Chess"}` |



In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('spark-functions').getOrCreate()

In [None]:
# map : create a map

sql = '''
with cte as
(
  select
  map('name','rahul','profession','dataengineer') as mapGenerated,
  named_struct('name','rahul','profession','dataengineer') as structGenerated
)
select mapGenerated,structGenerated
from cte
'''

spark.sql(sql).show(truncate = False)

+-------------------------------------------+---------------------+
|mapGenerated                               |structGenerated      |
+-------------------------------------------+---------------------+
|{name -> rahul, profession -> dataengineer}|{rahul, dataengineer}|
+-------------------------------------------+---------------------+



In [None]:
# map_concat : merge two or more maps into a single map

sql = '''
with cte as
(
  select
  map('name','rahul','profession','dataengineer') as mapGenerated1,
  map('age',33,'sports','cricket') as mapGenerated2
)
select map_concat(mapGenerated1,mapGenerated2) as map_concatOut
from cte
'''

spark.sql(sql).show(truncate = False)

+-------------------------------------------------------------------------+
|map_concatOut                                                            |
+-------------------------------------------------------------------------+
|{name -> rahul, profession -> dataengineer, age -> 33, sports -> cricket}|
+-------------------------------------------------------------------------+



In [None]:
# map_contains_key : checks whether a key exists in the map; the match is case-sensitive

sql = '''
with cte as
(
  select
  map('name','rahul','profession','dataengineer') as mapGenerated1
)
select map_contains_key(mapGenerated1,'name') as map_contains_keyOut
from cte
'''

spark.sql(sql).show(truncate = False)

##--

sql = '''
with cte as
(
  select
  map('name','rahul','profession','dataengineer') as mapGenerated1
)
select map_contains_key(mapGenerated1,'Name') as map_contains_keyOut
from cte
'''

spark.sql(sql).show(truncate = False)

+-------------------+
|map_contains_keyOut|
+-------------------+
|true               |
+-------------------+

+-------------------+
|map_contains_keyOut|
+-------------------+
|false              |
+-------------------+



In [None]:
# map_entries : converts the map into an array of structs, one {key, value} struct per entry

## this helps to flatten a map and apply array operations when required.

sql = '''
with cte as (
  select map('name','rahul','profession','dataengineer') as mapGenerated1
)
select
  mapGenerated1,
  map_entries(mapGenerated1) as map_entriesout
from cte
'''
spark.sql(sql).show(truncate = False)

+-------------------------------------------+-------------------------------------------+
|mapGenerated1                              |map_entriesout                             |
+-------------------------------------------+-------------------------------------------+
|{name -> rahul, profession -> dataengineer}|[{name, rahul}, {profession, dataengineer}]|
+-------------------------------------------+-------------------------------------------+



In [None]:
# map_from_arrays : builds a map from an array of keys and an array of values of the same length

sql = '''
with cte as (
  select
    array('name','profession') as keyarray,
    array('rahul','dataengineering') as valuearray
)
select
  keyarray,
  valuearray,
  map_from_arrays(keyarray,valuearray) as resultMap
from cte
'''
spark.sql(sql).show(truncate = False)



+------------------+------------------------+----------------------------------------------+
|keyarray          |valuearray              |resultMap                                     |
+------------------+------------------------+----------------------------------------------+
|[name, profession]|[rahul, dataengineering]|{name -> rahul, profession -> dataengineering}|
+------------------+------------------------+----------------------------------------------+



In [None]:
# map_from_entries : builds a map from an array of entries

# each entry is a struct of (key, value); the input is an array of such structs

sql = '''
with cte as (
  select
    array(struct('name','rahul'),
          struct('profession','dataengineer'),
          struct('sports','cricket')) as entry1
)
select
  entry1,
  map_from_entries(entry1) as map_from_entriesOut
from cte
'''
spark.sql(sql).show(truncate = False)


+--------------------------------------------------------------+--------------------------------------------------------------+
|entry1                                                        |map_from_entriesOut                                           |
+--------------------------------------------------------------+--------------------------------------------------------------+
|[{name, rahul}, {profession, dataengineer}, {sports, cricket}]|{name -> rahul, profession -> dataengineer, sports -> cricket}|
+--------------------------------------------------------------+--------------------------------------------------------------+

