- **Name:** 10_dataframe_null_handling
- **Author:** Shamas Imran
- **Description:** Handling NULL values in DataFrames
- **Date:** 19-Aug-2025
<!--
REVISION HISTORY
Version          Date        Author           Description
01           19-Aug-2025   Shamas Imran       Replaced nulls using fillna  
                                              Dropped rows with null values  
                                              Demonstrated coalesce and na functions  
-->

In [0]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("DatapurProgram").getOrCreate()

In [0]:
# Sample rows; Python None becomes NULL in the "name" and "age" columns
data = [
    (1, "Shamas", None),
    (2, None, 25),
    (3, "Imran", 30),
    (4, None, None)
]
df = spark.createDataFrame(data, ["id", "name", "age"])

df.show()

In [0]:
# Keep only rows where "name" is NULL
df.filter(df["name"].isNull()).show()

In [0]:
# Keep only rows where "name" has a value
df.filter(df["name"].isNotNull()).show()

In [0]:
# Drop rows with any null
df.na.drop().show()

In [0]:
# Drop rows only if "name" is NULL
df_filtered = df.na.drop(subset=["name"])
df_filtered.show()

In [0]:
# Replace NULLs in the "name" column with the string "Missing"
df.fillna("Missing", subset=["name"]).show()

In [0]:
# Per-column replacements: "Unknown" for name, 0 for age
df.na.fill({"name": "Unknown", "age": 0}).show()

In [0]:
from pyspark.sql import functions as F

# Replace NULL ages with 99; non-NULL ages pass through unchanged
df.withColumn(
    "age",
    F.when(df["age"].isNull(), 99).otherwise(df["age"])
).show()