## Selecting columns 6: Adding a new column based on a mapping or condition
In this lecture we learn how to:
- add a new column with a dict mapping from an existing column
- add a new column with an `if-else` condition using `pl.when`
- add a new column with a condition on multiple columns
- add a new column with multiple `if-elif` conditions

In [1]:
import polars as pl

In [2]:
csv_file = "data_titanic.csv"

In [3]:
df = pl.read_csv(csv_file)
df.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""


## Add a new column based on a mapping from another column
We can add a new column based on a Python `dict` that maps values in an existing column to a new value with the `replace` expression.

In the example we map the class number to a class name.

In [4]:
(
    df
    .with_columns(
        className = pl.col("Pclass").replace(
            {
                1:'first',
                2:'second',
                3:'third'
            }
        )
    )
    .select('Pclass','className')
    .head(2)
)

Pclass,className
i64,str
3,"""third"""
1,"""first"""


We can think of `replace` as a left join (see the section on combining `DataFrames` if you are not familiar with joins). In fact, Polars implements this as a join by converting the mapping dictionary to a `DataFrame` under the hood.

## Add a new column based on a condition on another column

We want to add a new binary column called `firstClass` based on the `Pclass` column where the value is `1` for first class passengers and `0` for second and third class passengers:

`firstClass = 1 if Pclass == 1, otherwise 0`


In [7]:
(
    df
    .with_columns(
        firstClass = pl.when(
                         pl.col("Pclass") == 1
                      )
                      .then(1)
                      .otherwise(0)
    )
    .select('Pclass','firstClass')
    .head(10)
)

Pclass,firstClass
i64,i32
3,0
1,1
3,0
1,1
3,0
3,0
1,1
3,0
3,0
2,0


## Syntax of `pl.when`

The syntax is:
```python

    pl.when(**Boolean Expression**)
    .then(**Value if True**)
    .otherwise(**Value if False**)
    .alias(**New Column Name**)
    
```

## Add a new column based on a condition on multiple other columns

We want to add a new binary column called `youngFirstClass` where first class passengers aged under 30 are `1` and all other passengers are `0`.

`youngFirstClass = 1 if (Pclass == 1 and Age < 30), otherwise 0`


We do this by combining the two conditions with `() & ()` inside `pl.when`. The parentheses around each condition are required because `&` binds more tightly than the comparison operators in Python.

In [8]:
(
    df
    .select(
        [
            pl.col("Pclass"),
            pl.col("Age"),
            pl.when(
                (pl.col("Pclass") == 1) & (pl.col("Age") < 30)
            )
            .then(1)
            .otherwise(0)
            .alias("youngFirstClass")
        ]
    )
    .tail(5)
)

Pclass,Age,youngFirstClass
i64,f64,i32
2,27.0,0
1,19.0,1
3,,0
1,26.0,1
3,32.0,0


## Add a new column based on an `if-elif-else` condition
We are not limited to a single `if-else` condition. 

For example, we want to create 3 categories with:
- `1` for young first class passengers (under 30)
- `2` for older first class passengers (30 and over)
- `0` for all other passengers

`ageClass = 1 if (Pclass == 1 and Age < 30), 2 if (Pclass == 1 and Age >= 30), otherwise 0`

We do this by repeating the `.when.then` cycle for each subsequent condition.

The syntax here for a single extra condition is:
```python

    pl.when(**Boolean Expression**)
    .then(**Value if True**)
    .when(**Boolean Expression**)
    .then(**Value if True**)
    .otherwise(**Value if False**)
    .alias(**New Column Name**)
    
```


In [9]:
(
    df
    .select(
        [
            pl.col("Pclass"),
            pl.col("Age"),
            pl.when(
                (pl.col("Pclass") == 1) & (pl.col("Age")<30)
            )
            .then(1)
            .when(
                (pl.col("Pclass") == 1) & (pl.col("Age")>=30)
            )
            .then(2)
            .otherwise(0)
            .alias("ageClass")
        ]
    )
    .head(10)
)

Pclass,Age,ageClass
i64,f64,i32
3,22.0,0
1,38.0,2
3,26.0,0
1,35.0,2
3,35.0,0
3,,0
1,54.0,2
3,2.0,0
3,27.0,0
2,14.0,0


The cycles of `.when.then` can be repeated indefinitely.

## Exercises

In the exercises you will develop your understanding of:
- adding a column based on a mapping of another column
- adding a binary column based on a condition on multiple columns
- adding a column based on a nested `if-elif` condition on another column

### Exercise 1 
Create a binary column for whether a passenger is female or male.

Add a column called `isFemale` that maps rows with a female passenger to 1 and rows with a male passenger to 0. Ensure the column has an integer dtype.

In [11]:
(
    pl.read_csv(csv_file)
    .select(['Sex',
            pl.when(
                pl.col('Sex') == 'female'
            )
             .then(1)
             .otherwise(0)
             .cast(pl.Int32)
             .alias('isFemale')])
    .head(10)
)

Sex,isFemale
str,i32
"""male""",0
"""female""",1
"""female""",1
"""female""",1
"""male""",0
"""male""",0
"""male""",0
"""male""",0
"""female""",1
"""female""",1


### Exercise 2 
Create a binary column for whether a passenger is female and has no relatives aboard.

Add a binary column called `femaleRelatives` to indicate whether a passenger is female and is travelling with no relatives (use the parents & children column `Parch` and the siblings column `SibSp`).

In [13]:
(
    pl.read_csv(csv_file)
    .select('Name', 'Sex',
            pl.when(
                (pl.col('Sex') == 'female') & ((pl.col('Parch') + pl.col('SibSp')) == 0))
                .then(1)
                .otherwise(0)
                .alias('femaleRelatives')
            )
    .head()
)

Name,Sex,femaleRelatives
str,str,i32
"""Braund, Mr. Ow…","""male""",0
"""Cumings, Mrs. …","""female""",0
"""Heikkinen, Mis…","""female""",1
"""Futrelle, Mrs.…","""female""",0
"""Allen, Mr. Wil…","""male""",0


### Exercise 3 
Create a column for whether a passenger is female and has parents & children, siblings or both aboard

Add a column called `femaleRelatives` where: 
- `1` indicates a female passenger with no relatives
- `2` indicates a female passenger with parents/children but no siblings
- `3` indicates a female passenger with siblings but no parents/children
- `4` indicates a female passenger with parents/children and siblings
- `0` indicates a male passenger

In [None]:
(
    pl.read_csv(csv_file)
    .select(
        [
            'Sex',
            "SibSp",
            "Parch", 
            <blank>
          ]
    )
    .head()
)

## Solutions

### Solution to Exercise 1 

Add a column called `isFemale` that maps rows with a female passenger to 1 and rows with a male passenger to 0. Ensure the column has an integer dtype.

In [14]:
(
    pl.read_csv(csv_file)
    .select(
        [
            'Sex',
            pl.col("Sex").replace({"female":1,"male":0}).cast(pl.Int32).alias("isFemale")
        ]
    )
    .head()
)

Sex,isFemale
str,i32
"""male""",0
"""female""",1
"""female""",1
"""female""",1
"""male""",0


### Solution to Exercise 2 
Create a binary column for whether a passenger is female and has no relatives aboard.

In [15]:
(
    pl.read_csv(csv_file)
    .select(
        [
            'Sex',
            pl.when(
                (pl.col("Sex")=="female") & (pl.col("SibSp")==0) & (pl.col("Parch") == 0)
            )
            .then(1)
            .otherwise(0)
            .alias("femaleRelatives")

        ]
    )
    .head()
)

Sex,femaleRelatives
str,i32
"""male""",0
"""female""",0
"""female""",1
"""female""",0
"""male""",0


### Solution to Exercise 3 

Create a column for whether a passenger is female and has parents & children, siblings or both aboard

In [16]:
(
    pl.read_csv(csv_file)
    .select(
        [
            'Sex',
            "SibSp",
            "Parch", 
            pl.when(
                (pl.col("Sex")=="female") & (pl.col("SibSp")==0) & (pl.col("Parch") == 0)
            )
            .then(1)
            .when(
                (pl.col("Sex")=="female") & (pl.col("SibSp")==0) & (pl.col("Parch") > 0)
            )
            .then(2)
            .when(
                (pl.col("Sex")=="female") & (pl.col("SibSp") > 0) & (pl.col("Parch") == 0)
            )
            .then(3)
            .when(
                (pl.col("Sex")=="female") & (pl.col("SibSp") > 0) & (pl.col("Parch") > 0)
            )
            .then(4)
            .otherwise(0)
            .alias("femaleRelatives")
          ]
    ).head()
)

Sex,SibSp,Parch,femaleRelatives
str,i64,i64,i32
"""male""",1,0,0
"""female""",1,0,3
"""female""",0,0,1
"""female""",1,0,3
"""male""",0,0,0
