### How would you handle null values in a DataFrame? For example, drop rows with null values in the `Age` column.

In [0]:
# Sample data (assumes an active SparkSession named `spark`, as provided by Databricks or the pyspark shell)
data = [
    ("Rohish", 30),
    ("Ajit", None),
    ("Rajani", 25),
    (None, 35),
    ("Eve", None)
]

columns = ["Name", "Age"]

df = spark.createDataFrame(data, columns)
df.show()

+------+----+
|  Name| Age|
+------+----+
|Rohish|  30|
|  Ajit|null|
|Rajani|  25|
|  null|  35|
|   Eve|null|
+------+----+



**How would you handle null values in a DataFrame?**
- For a fuller walkthrough of the options, see the notebook [Handling null values in PySpark](https://github.com/rohish-zade/PySpark/blob/main/dataFrame_API/15_handling_null_values_in_pyspark.ipynb).

**Drop rows with null values in the `Age` column.**

In [0]:
# First, inspect the rows that have a null value in the Age column
from pyspark.sql.functions import col

df.filter(col("Age").isNull()).show()

+----+----+
|Name| Age|
+----+----+
|Ajit|null|
| Eve|null|
+----+----+



In [0]:
# Let's drop the rows with null values in the Age column.
# dropna() with the subset parameter checks only the listed columns,
# so a null in Name does not cause a row to be dropped.
cleaned_df = df.dropna(subset=["Age"])
cleaned_df.show()

+------+---+
|  Name|Age|
+------+---+
|Rohish| 30|
|Rajani| 25|
|  null| 35|
+------+---+



In [0]:
# Alternative approach: use filter() (or its alias where()) with isNotNull()

cleaned_df_2 = df.filter(col("Age").isNotNull())
cleaned_df_2.show()

+------+---+
|  Name|Age|
+------+---+
|Rohish| 30|
|Rajani| 25|
|  null| 35|
+------+---+

