## When
PySpark can check multiple conditions in sequence and return a value for the first condition that is met, using either a SQL-like `CASE WHEN` expression or the `when()` and `otherwise()` functions. These work like `switch` and `if then else` statements.
#### PySpark When Otherwise 
`when()` is a SQL function that returns a Column, and `otherwise()` is a method of Column. If `otherwise()` is not used and no condition matches, the result is None/NULL.
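
As a mental model only, the first-match behaviour can be sketched in plain Python. This is not Spark code: `case_when` and `branches` are hypothetical names introduced for illustration, and the sketch works on single values rather than columns.

```python
# Plain-Python model of a when()/otherwise() chain (illustration only, not Spark).
def case_when(value, branches, default=None):
    """Return the result of the first branch whose predicate matches value."""
    for predicate, result in branches:
        if predicate(value):
            return result
    # No branch matched: mirrors a chain without otherwise(), which yields NULL.
    return default

branches = [
    (lambda g: g == "M", "Male"),
    (lambda g: g == "F", "Female"),
]

genders = ["M", "F", None, ""]
labels_no_default = [case_when(g, branches) for g in genders]
labels_with_default = [case_when(g, branches, default="unknown") for g in genders]

print(labels_no_default)
print(labels_with_default)
```

Branches are tried in order, and the default is only used when every predicate fails.
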
#### PySpark SQL Case When
This mirrors the SQL expression. Usage: `CASE WHEN cond1 THEN result WHEN cond2 THEN result ... ELSE result END`.

In [1]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('when').getOrCreate()
data = [("James","M",60000),("Michael","M",70000),
        ("Robert",None,400000),("Maria","F",500000),
        ("Jen","",None)]

columns = ["name","gender","salary"]
df = spark.createDataFrame(data = data, schema = columns)
df.show()

+-------+------+------+
|   name|gender|salary|
+-------+------+------+
|  James|     M| 60000|
|Michael|     M| 70000|
| Robert|  null|400000|
|  Maria|     F|500000|
|    Jen|      |  null|
+-------+------+------+



### Using when() otherwise() on PySpark DataFrame
`when()` is a SQL function that must be imported from `pyspark.sql.functions` before use; it returns a Column. `otherwise()` is a method of Column; when it is omitted and none of the conditions is met, the result is None (NULL). Usage: `when(condition, value).otherwise(default)`.

`when()` takes two parameters: the first is a condition and the second is a literal value or a Column. If the condition evaluates to true, the value of the second parameter is returned.

The code below derives a `new_gender` column from `gender`: `M` becomes `Male`, `F` becomes `Female`, NULL becomes an empty string, and any other value is kept unchanged.

In [2]:
from pyspark.sql.functions import when

df2 = df.withColumn("new_gender", when(df.gender == "M","Male")
                                 .when(df.gender == "F","Female")
                                 .when(df.gender.isNull() ,"")
                                 .otherwise(df.gender))

df2.show()

+-------+------+------+----------+
|   name|gender|salary|new_gender|
+-------+------+------+----------+
|  James|     M| 60000|      Male|
|Michael|     M| 70000|      Male|
| Robert|  null|400000|          |
|  Maria|     F|500000|    Female|
|    Jen|      |  null|          |
+-------+------+------+----------+



Using `select()`

In [3]:
from pyspark.sql.functions import col

df2 = df.select(col("*"),when(df.gender == "M","Male")
                        .when(df.gender == "F","Female")
                        .when(df.gender.isNull() ,"")
                        .otherwise(df.gender).alias("new_gender"))

df2.show()

+-------+------+------+----------+
|   name|gender|salary|new_gender|
+-------+------+------+----------+
|  James|     M| 60000|      Male|
|Michael|     M| 70000|      Male|
| Robert|  null|400000|          |
|  Maria|     F|500000|    Female|
|    Jen|      |  null|          |
+-------+------+------+----------+



### PySpark SQL Case When on DataFrame
The Case When statement evaluates a sequence of conditions and returns a value for the first condition that is met, similar to `SWITCH` and `IF THEN ELSE` statements.

Similarly, the PySpark SQL Case When statement can be used on a DataFrame. Below are some examples using `withColumn()` and `select()` with the `expr()` function; `selectExpr()` accepts the same SQL expression strings directly.

#### Syntax of SQL CASE WHEN ELSE END

CASE
    WHEN condition1 THEN result_value1
    WHEN condition2 THEN result_value2
    -----
    -----
    ELSE result
END;

* CASE starts the expression.
* Each WHEN clause takes a condition; if the condition is true, the expression returns the value from its THEN.
* If the condition is false, evaluation moves on to the next WHEN, and so on.
* If none of the conditions matches, the value from the ELSE clause is returned.
* END closes the expression.
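
The rules above can be tried without a Spark session. The sketch below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that these basic `CASE` semantics (first match wins, NULL when there is no ELSE) are the same in Spark SQL. The table name `emp` and the column `no_else` are illustrative.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, gender TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("James", "M", 60000), ("Michael", "M", 70000),
     ("Robert", None, 400000), ("Maria", "F", 500000),
     ("Jen", "", None)],
)

query = """
SELECT name,
       CASE
           WHEN gender = 'M' THEN 'Male'
           WHEN gender = 'F' THEN 'Female'
           WHEN gender IS NULL THEN ''
           ELSE gender
       END AS new_gender,
       -- Without ELSE, rows that match no WHEN produce NULL.
       CASE WHEN gender = 'M' THEN 'Male' END AS no_else
FROM emp
ORDER BY rowid
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Here `no_else` comes back as `None` in Python for every row whose gender is not `M`, which is the NULL produced by a `CASE` without an `ELSE`.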

In [4]:
from pyspark.sql.functions import expr, col

#Using Case When on withColumn()
df3 = df.withColumn("new_gender", expr("CASE WHEN gender = 'M' THEN 'Male' " +
               "WHEN gender = 'F' THEN 'Female' WHEN gender IS NULL THEN '' " +
               "ELSE gender END"))
               
df3.show(truncate=False)

+-------+------+------+----------+
|name   |gender|salary|new_gender|
+-------+------+------+----------+
|James  |M     |60000 |Male      |
|Michael|M     |70000 |Male      |
|Robert |null  |400000|          |
|Maria  |F     |500000|Female    |
|Jen    |      |null  |          |
+-------+------+------+----------+



In [5]:
df4 = df.select(col("*"), expr("CASE WHEN gender = 'M' THEN 'Male' " +
           "WHEN gender = 'F' THEN 'Female' WHEN gender IS NULL THEN '' " +
           "ELSE gender END").alias("new_gender"))

df4.show()           

+-------+------+------+----------+
|   name|gender|salary|new_gender|
+-------+------+------+----------+
|  James|     M| 60000|      Male|
|Michael|     M| 70000|      Male|
| Robert|  null|400000|          |
|  Maria|     F|500000|    Female|
|    Jen|      |  null|          |
+-------+------+------+----------+



### Using Case When on SQL Expression
You can also use Case When in a SQL statement after creating a temporary view.

Using Case When in a `SELECT` statement

In [16]:
df.createOrReplaceTempView("EMP")

spark.sql("select name, gender, salary, CASE WHEN gender = 'M' THEN 'Male' " +
               "WHEN gender = 'F' THEN 'Female' WHEN gender IS NULL THEN '' " +
              "ELSE gender END as new_gender from EMP").show()

+-------+------+------+----------+
|   name|gender|salary|new_gender|
+-------+------+------+----------+
|  James|     M| 60000|      Male|
|Michael|     M| 70000|      Male|
| Robert|  null|400000|          |
|  Maria|     F|500000|    Female|
|    Jen|      |  null|          |
+-------+------+------+----------+



Multiple conditions can be combined with `|` (or) and `&` (and), with each comparison wrapped in parentheses. NULL values are tested with `isNull()`; comparing against the string `'null'` does not match them.

In [29]:
from pyspark.sql.functions import col, when

df5 = df.withColumn("new_gender", when((col("gender") == 'M') | (col("gender") == 'F'), "Straight")
                                .when(col("gender").isNull(), "unknown")
                                .otherwise("non-declared"))

df5.show(truncate=False)

+-------+------+------+------------+
|name   |gender|salary|new_gender  |
+-------+------+------+------------+
|James  |M     |60000 |Straight    |
|Michael|M     |70000 |Straight    |
|Robert |null  |400000|unknown     |
|Maria  |F     |500000|Straight    |
|Jen    |      |null  |non-declared|
+-------+------+------+------------+
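
A comparison with the string `'null'` and a test for NULL are different things in SQL. The minimal sketch below again uses Python's built-in `sqlite3` as a stand-in, on the assumption that Spark SQL follows the same three-valued logic for `=` and `IS NULL`. The column name `g` and the labels are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# g is a real NULL, not the four-character string 'null'.
equals_string, is_null_test = conn.execute("""
    SELECT
        CASE WHEN g = 'null' THEN 'matched' ELSE 'not matched' END,
        CASE WHEN g IS NULL  THEN 'matched' ELSE 'not matched' END
    FROM (SELECT NULL AS g)
""").fetchone()
conn.close()

print(equals_string)  # NULL = 'null' evaluates to NULL, so the WHEN is skipped
print(is_null_test)   # IS NULL is the test that actually matches
```

In the DataFrame API, the equivalent test is `col("gender").isNull()`.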

