# explode() vs explode_outer()


In PySpark, explode() and explode_outer() flatten nested columns (arrays or maps) by producing one output row for each array element or each key-value pair in a map. They differ only in how they treat rows whose array or map is null or empty.

### **explode()**

The explode() function takes an array or map column and creates a new row for each element in the array (or each key-value pair in the map). An array produces a single output column; a map produces two, one for the key and one for the value. If the array or map is empty or null, explode() drops the row entirely.

**Key Characteristics:**
* Converts each element in an array or each entry in a map into its own row.
* Drops rows whose array or map is null or empty.



In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, explode_outer

# Reuse the active Spark session or start a new one
spark = SparkSession.builder.getOrCreate()


In [4]:
# Sample DataFrame with arrays
data = [
    ("Alice", ["Math", "Science"]),
    ("Bob", ["History"]),
    ("Cathy", []),    # Empty array
    ("David", None),  # Null array
]

df = spark.createDataFrame(data, ['Name', 'Subject'])

df.show()


+-----+---------------+
| Name|        Subject|
+-----+---------------+
|Alice|[Math, Science]|
|  Bob|      [History]|
|Cathy|             []|
|David|           NULL|
+-----+---------------+



In [5]:
# Use explode to flatten the array

exploded_df = df.select(df['Name'], explode(df['Subject']).alias('subject'))

# Show the result
exploded_df.show()

# Explanation:
# explode() expands the Subject array into individual rows.
# Rows with an empty array ([]) or a null array (None) are removed,
# which is why Cathy and David do not appear in the output.

+-----+-------+
| Name|subject|
+-----+-------+
|Alice|   Math|
|Alice|Science|
|  Bob|History|
+-----+-------+



### **explode_outer()**

The explode_outer() function works like explode(), but it keeps rows whose array or map is null or empty. Each such row appears once in the output, with null in the exploded column.


**Key Characteristics:**
* Converts each element in an array or each entry in a map into its own row.
* Retains rows with null or empty arrays or maps, emitting one row with null in the exploded column.
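
The only difference between the two functions is what happens to a row with nothing to explode. The following is a plain-Python sketch of that rule, not Spark's implementation; `explode_rows` and its `outer` flag are hypothetical names used for illustration only.

```python
def explode_rows(rows, outer=False):
    """Plain-Python model of explode (outer=False) and explode_outer (outer=True)."""
    result = []
    for name, subjects in rows:
        if subjects:
            # Non-empty array: one output row per element.
            result.extend((name, subject) for subject in subjects)
        elif outer:
            # Null or empty array: explode_outer keeps the row with a null element.
            result.append((name, None))
        # Otherwise (explode): the row contributes nothing and disappears.
    return result


rows = [
    ("Alice", ["Math", "Science"]),
    ("Bob", ["History"]),
    ("Cathy", []),    # Empty array
    ("David", None),  # Null array
]

inner = explode_rows(rows)              # models explode()
outer = explode_rows(rows, outer=True)  # models explode_outer()
print(inner)
print(outer)
```

In this model, `outer` contains everything in `inner` plus one `(name, None)` pair each for Cathy and David.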

In [6]:
explode_outer_df = df.select(df['Name'], explode_outer(df['Subject']).alias('subject'))
explode_outer_df.show()

+-----+-------+
| Name|subject|
+-----+-------+
|Alice|   Math|
|Alice|Science|
|  Bob|History|
|Cathy|   NULL|
|David|   NULL|
+-----+-------+

