## Dealing with Nulls

Let us understand how to deal with nulls using functions that are available in Spark.

* We can use `coalesce` to return the first non-null value from a list of columns.
* We also have traditional SQL-style functions such as `nvl`. They can be used as SQL expressions, with either `expr` or `selectExpr`.
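
As a rough mental model of what these functions do, the following plain-Python sketch mimics the null semantics of `coalesce`, `nvl`, and `nullif`, with `None` standing in for SQL null. The helper functions are illustrative stand-ins written for this sketch; they are not the Spark APIs and ignore Spark's type coercion rules.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def nvl(value, default):
    """Two-argument special case of coalesce."""
    return coalesce(value, default)


def nullif(value, other):
    """Return None when the two values are equal, otherwise the first value."""
    return None if value == other else value


# Hypothetical bonus values: two real values, one null, one empty string
bonuses = ["10", None, "", "10"]

# Only real nulls are replaced; the empty string survives.
nvl_only = [nvl(b, "0") for b in bonuses]

# Converting '' to None first lets nvl replace it as well.
nvl_nullif = [nvl(nullif(b, ""), "0") for b in bonuses]

print(nvl_only)    # ['10', '0', '', '10']
print(nvl_nullif)  # ['10', '0', '0', '10']
```

The point of the sketch is that an empty string is not a null, so it has to be turned into a null (for example with `nullif`) before `coalesce` or `nvl` will replace it.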

In [0]:
# The bonus field mixes an integer, None, and an empty string
employees = [
    (1, "Scott", "Tiger", 1000.0, 10,
     "united states", "+1 123 456 7890", "123 45 6789"
    ),
    (2, "Henry", "Ford", 1250.0, None,
     "India", "+91 234 567 8901", "456 78 9123"
    ),
    (3, "Nick", "Junior", 750.0, '',
     "united KINGDOM", "+44 111 111 1111", "222 33 4444"
    ),
    (4, "Bill", "Gomes", 1500.0, 10,
     "AUSTRALIA", "+61 987 654 3210", "789 12 6118"
    )
]

In [0]:
employeesDF = spark. \
    createDataFrame(employees,
                    schema="""employee_id INT, first_name STRING, 
                    last_name STRING, salary FLOAT, bonus STRING, nationality STRING,
                    phone_number STRING, ssn STRING"""
                   )

In [0]:
employeesDF.show()

In [0]:
from pyspark.sql.functions import coalesce

In [0]:
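# This call fails: coalesce accepts only column names or Column objects,
# so the plain Python literal 0 is rejected
employeesDF. \
    withColumn('bonus', coalesce('bonus', 0)). \
    show()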

In [0]:
from pyspark.sql.functions import lit

In [0]:
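# Wrapping the literal in lit() turns it into a Column that coalesce accepts.
# coalesce replaces only nulls, so the empty-string bonus still needs separate handling
employeesDF. \
    withColumn('bonus1', coalesce('bonus', lit(0))). \
    show()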

In [0]:
from pyspark.sql.functions import col

In [0]:
# Cast bonus to int; with ANSI mode off, a non-numeric string such as '' becomes null
employeesDF. \
    withColumn('bonus1', col('bonus').cast('int')). \
    show()

In [0]:
# Cast first, then replace the resulting nulls with 0
employeesDF. \
    withColumn('bonus1', coalesce(col('bonus').cast('int'), lit(0))). \
    show()

In [0]:
from pyspark.sql.functions import expr

In [0]:
# nvl is the SQL-style counterpart of a two-argument coalesce, evaluated here via expr
employeesDF. \
    withColumn('bonus', expr("nvl(bonus, 0)")). \
    show()

In [0]:
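# nullif(bonus, '') returns null when bonus is an empty string,
# so nvl can then replace it with 0
employeesDF. \
    withColumn('bonus', expr("nvl(nullif(bonus, ''), 0)")). \
    show()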

In [0]:
# Treat bonus as a percentage of salary; a null or empty bonus counts as 0
employeesDF. \
    withColumn('payment',
               col('salary') + (col('salary') * coalesce(col('bonus').cast('int'), lit(0)) / 100)). \
    show()
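
To sanity-check the `payment` column, the same arithmetic can be reproduced in plain Python. This is a minimal sketch that assumes, as the expression suggests, that `bonus` is a percentage of `salary`. The `to_int_or_none` helper is a hypothetical stand-in for a lenient cast to int (the behavior with ANSI mode off), not a Spark function.

```python
def to_int_or_none(value):
    """Mimic a lenient cast to int: unparseable input becomes None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def payment(salary, bonus):
    """salary plus bonus percent of salary, with a missing bonus counted as 0."""
    bonus_pct = to_int_or_none(bonus)
    if bonus_pct is None:
        bonus_pct = 0
    return salary + salary * bonus_pct / 100


# (salary, bonus) pairs taken from the employees data, with bonus as a string
rows = [(1000.0, "10"), (1250.0, None), (750.0, ""), (1500.0, "10")]
payments = [payment(salary, bonus) for salary, bonus in rows]

print(payments)  # [1100.0, 1250.0, 750.0, 1650.0]
```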