## when() and otherwise() in PySpark

The `when()` and `otherwise()` functions in PySpark implement conditional logic, similar to an if-else statement in Python or a `CASE WHEN` expression in SQL.

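`when()` is part of the `pyspark.sql.functions` module, while `otherwise()` (and any chained `.when()`) is a method on the `Column` that `when()` returns. They are commonly used with `withColumn()` or `select()` to create new columns or update existing ones based on conditions.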

**Syntax:**

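```python
from pyspark.sql.functions import when

# Conditional logic: one or more when() branches, then an optional fallback
df.withColumn("new_column",
              when(condition1, value1)
              .when(condition2, value2)
              .otherwise(default_value))
```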

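**when(condition, value)** specifies a condition (a boolean `Column` expression) and the value to return when the condition is true. Further `.when()` calls can be chained to add more branches.

**otherwise(default_value)** specifies the value to return when none of the preceding conditions is true.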

**Example**

```python
# Create a sample DataFrame (`spark` is the SparkSession predefined in Databricks notebooks).
# The schema is given as a DDL string; one row is entirely null.
data = [
    (1, 'Rohish', 27, 20000, 'india', 'IT'),
    (2, 'Melody', 25, 40000, 'germany', 'engineering'),
    (3, 'Smit', 12, 60000, 'india', 'sales'),
    (4, 'Pushpak', 44, None, 'uk', 'engineering'),
    (5, 'Faisal', 35, 70000, 'india', 'sales'),
    (6, None, 29, 200000, 'uk', 'IT'),
    (7, 'Chandra', 18, 65000, 'us', 'IT'),
    (8, 'Rajesh', 16, 40000, 'us', 'sales'),
    (None, None, None, None, None, None),
    (7, 'Babali', 37, 65000, 'us', 'IT')
]

columns = "id int, name string, age int, salary long, country string, dept string"

emp_df = spark.createDataFrame(data, columns)
```

```python
emp_df.display()
```

| id | name | age | salary | country | dept |
|---|---|---|---|---|---|
| 1 | Rohish | 27 | 20000 | india | IT |
| 2 | Melody | 25 | 40000 | germany | engineering |
| 3 | Smit | 12 | 60000 | india | sales |
| 4 | Pushpak | 44 | null | uk | engineering |
| 5 | Faisal | 35 | 70000 | india | sales |
| 6 | null | 29 | 200000 | uk | IT |
| 7 | Chandra | 18 | 65000 | us | IT |
| 8 | Rajesh | 16 | 40000 | us | sales |
| null | null | null | null | null | null |
| 7 | Babali | 37 | 65000 | us | IT |


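**Example 1:** Create a new column `Adult` that labels each row based on `age`:

- If `age` is greater than or equal to 18, the label is "Yes".
- If `age` is less than 18, the label is "No".
- If `age` is null, neither condition is true, so `otherwise()` assigns "NA".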

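```python
# Explicit imports avoid shadowing Python built-ins such as sum() and max(),
# which `from pyspark.sql.functions import *` would do
from pyspark.sql.functions import col, lit, when

emp_df.withColumn("Adult", when(col("age") >= 18, "Yes")
                           .when(col("age") < 18, "No")
                           .otherwise("NA")).show()
```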

```text
+----+-------+----+------+-------+-----------+-----+
|  id|   name| age|salary|country|       dept|Adult|
+----+-------+----+------+-------+-----------+-----+
|   1| Rohish|  27| 20000|  india|         IT|  Yes|
|   2| Melody|  25| 40000|germany|engineering|  Yes|
|   3|   Smit|  12| 60000|  india|      sales|   No|
|   4|Pushpak|  44|  null|     uk|engineering|  Yes|
|   5| Faisal|  35| 70000|  india|      sales|  Yes|
|   6|   null|  29|200000|     uk|         IT|  Yes|
|   7|Chandra|  18| 65000|     us|         IT|  Yes|
|   8| Rajesh|  16| 40000|     us|      sales|   No|
|null|   null|null|  null|   null|       null|   NA|
|   7| Babali|  37| 65000|     us|         IT|  Yes|
+----+-------+----+------+-------+-----------+-----+
```



**Example 2:** Create a new column `category` that labels each row based on `age`:
- If `age` is less than 18, the category is "Minor".
- If `age` is greater than or equal to 18, the category is "Major".
- Otherwise (when `age` is null), the category is "NA".

```python
emp_df.withColumn("category", when(col("age") < 18, "Minor")
                              .when(col("age") >= 18, "Major")
                              .otherwise("NA")).show()
```

```text
+----+-------+----+------+-------+-----------+--------+
|  id|   name| age|salary|country|       dept|category|
+----+-------+----+------+-------+-----------+--------+
|   1| Rohish|  27| 20000|  india|         IT|   Major|
|   2| Melody|  25| 40000|germany|engineering|   Major|
|   3|   Smit|  12| 60000|  india|      sales|   Minor|
|   4|Pushpak|  44|  null|     uk|engineering|   Major|
|   5| Faisal|  35| 70000|  india|      sales|   Major|
|   6|   null|  29|200000|     uk|         IT|   Major|
|   7|Chandra|  18| 65000|     us|         IT|   Major|
|   8| Rajesh|  16| 40000|     us|      sales|   Minor|
|null|   null|null|  null|   null|       null|      NA|
|   7| Babali|  37| 65000|     us|         IT|   Major|
+----+-------+----+------+-------+-----------+--------+
```



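**Handling Null Values**

A comparison such as `col("age") >= 18` evaluates to null when `age` is null, and `when()` treats a null condition as not satisfied, so a null age would fall through to `otherwise()`. The example below first replaces a null `age` with a default of 19 (overwriting the `age` column) and then applies the label.

```python
# Replace a null age with 19, then label each row using the filled-in age
emp_df.withColumn("age", when(col("age").isNull(), lit(19)).otherwise(col("age"))) \
      .withColumn("Adult", when(col("age") >= 18, "YES")
                           .otherwise("NO")).show()
```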

```text
+----+-------+---+------+-------+-----------+-----+
|  id|   name|age|salary|country|       dept|Adult|
+----+-------+---+------+-------+-----------+-----+
|   1| Rohish| 27| 20000|  india|         IT|  YES|
|   2| Melody| 25| 40000|germany|engineering|  YES|
|   3|   Smit| 12| 60000|  india|      sales|   NO|
|   4|Pushpak| 44|  null|     uk|engineering|  YES|
|   5| Faisal| 35| 70000|  india|      sales|  YES|
|   6|   null| 29|200000|     uk|         IT|  YES|
|   7|Chandra| 18| 65000|     us|         IT|  YES|
|   8| Rajesh| 16| 40000|     us|      sales|   NO|
|null|   null| 19|  null|   null|       null|  YES|
|   7| Babali| 37| 65000|     us|         IT|  YES|
+----+-------+---+------+-------+-----------+-----+
```

