# Data Analysis in Python - XII: Conditional Column Creation

## Introduction


In this lesson, we will review how to create derived (custom) columns in a pandas DataFrame by applying conditions to existing columns.



## Need for derived or custom columns

Often, you will want to include a column (or feature) in a DataFrame that will make subsequent analysis easier. However, this column may not be readily available. In such a case, you may need to extract the necessary information from other columns (e.g., by cleaning the data in a column or combining the data from multiple columns).
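As a minimal sketch of both ideas, the example below builds one column by combining two others and another by cleaning an existing column. The toy DataFrame, its values, and the new column names `family_size` and `embark_town_clean` are assumptions made for illustration; only the source column names are borrowed from the Titanic dataset used later in this lesson.

```python
import pandas as pd

# Toy data with hypothetical values; column names mirror the Titanic dataset.
df = pd.DataFrame({
    "sibsp": [1, 0, 3],
    "parch": [0, 0, 1],
    "embark_town": [" southampton", "Cherbourg ", "QUEENSTOWN"],
})

# Combine two columns: siblings/spouses + parents/children + the passenger.
df["family_size"] = df["sibsp"] + df["parch"] + 1

# Clean one column: trim whitespace and normalize capitalization.
df["embark_town_clean"] = df["embark_town"].str.strip().str.title()

print(df)
```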

## Applying conditional logic to create columns

A common use case in data preparation is to create columns by applying conditional logic to one or more other columns. There are several ways to do this. We will learn how to use the `numpy.where()` and `numpy.select()` functions to implement conditional logic for column creation.

Let's first load the Titanic dataset.

In [1]:
# import pandas library
import pandas as pd

# import seaborn library to load the titanic dataset
import seaborn as sns

# load the titanic data set from the seaborn library
titanic = sns.load_dataset("titanic")
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


Let's create a boolean column called 'adult' based on a passenger's age using the `where()` function from the numpy library.

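The syntax for `numpy.where()` is `numpy.where(condition, value if true, value if false)`. The condition is evaluated element by element, so the result has one value per row: the first value where the condition is true and the second value everywhere else.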
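Before applying this to the Titanic data, here is a small sketch on a hypothetical Series of ages. The values and the variable names `ages`, `labels`, and `capped` are assumptions for illustration. The two return values can be scalars or arrays of matching length.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 2.0, 38.0, 14.0])  # hypothetical ages

# Scalar return values: one label per element.
labels = np.where(ages >= 18, "adult", "minor")
print(labels)  # ['adult' 'minor' 'adult' 'minor']

# Array-like return value: keep the original age unless it exceeds 30.
capped = np.where(ages > 30, 30.0, ages)
print(capped)  # [22.  2. 30. 14.]
```

Note that `np.where()` returns a NumPy array rather than a pandas Series. Assigning it to a DataFrame column works because its length matches the number of rows.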

In [4]:
# create a boolean column called adult using the numpy.where() function.

# import numpy
import numpy as np

# create the column and assign it to the data frame.
# rows with a missing age fail the comparison and are set to False.
titanic['adult'] = np.where(titanic['age'] >= 18, True, False)
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,adult
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False,True
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,True
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False,True
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True,True
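Row 888 in the output above has a missing age and ends up with `adult` set to False, because any comparison against NaN evaluates to False. If missing ages should be labeled separately instead, one possible approach is to nest a second `np.where()` call. The sketch below uses a hypothetical three-row DataFrame, and the column name `adult_status` and the label `'unknown'` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22.0, np.nan, 2.0]})  # hypothetical ages

# A missing age fails the comparison, so it lands in the "false" branch.
df["adult"] = np.where(df["age"] >= 18, True, False)

# Nested np.where: test for a missing age first, then apply the age rule.
df["adult_status"] = np.where(
    df["age"].isna(),
    "unknown",
    np.where(df["age"] >= 18, "yes", "no"),
)
print(df)
```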


In [5]:
# filter data on age >= 18 and check that the adult column is True.
# the same check can be repeated with age < 18.
adult_passengers = titanic[titanic['age'] >= 18]
adult_passengers.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,adult
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False,True
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False,True
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False,True
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True,True


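Next, let's follow the logic used to create the adult_male column and create a column called adult_female. This requires a compound condition: each comparison is wrapped in parentheses, and the two are combined with the element-wise `&` operator. Instead of True and False, let's use 'Yes' and 'No' as values in the column. Note that the cell below also overwrites the existing adult_male column so that it uses the same 'Yes'/'No' values and the same age threshold.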

In [10]:
# create adult_female column
# age should be at least 18 and sex should be female
titanic['adult_female'] = np.where((titanic['age'] >= 18) & (titanic['sex'] == 'female'), 'Yes', 'No')

# overwrite adult_male using the same rule for male passengers
titanic['adult_male'] = np.where((titanic['age'] >= 18) & (titanic['sex'] == 'male'), 'Yes', 'No')

# display first 5 rows
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,adult,adult_female
0,0,3,male,22.0,1,0,7.25,S,Third,man,Yes,,Southampton,no,False,True,No
1,1,1,female,38.0,1,0,71.2833,C,First,woman,No,C,Cherbourg,yes,False,True,Yes
2,1,3,female,26.0,0,0,7.925,S,Third,woman,No,,Southampton,yes,True,True,Yes
3,1,1,female,35.0,1,0,53.1,S,First,woman,No,C,Southampton,yes,False,True,Yes
4,0,3,male,35.0,0,0,8.05,S,Third,man,Yes,,Southampton,no,True,True,No
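The parentheses in the compound condition above are required because `&` binds more tightly than comparison operators such as `>=` and `==`. The sketch below, on a hypothetical three-row DataFrame, names each condition first and then combines them with the element-wise operators `&` (and), `|` (or), and `~` (not). The variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows; column names follow the Titanic dataset.
df = pd.DataFrame({
    "age": [22.0, 38.0, 2.0],
    "sex": ["male", "female", "female"],
})

# Naming each condition avoids precedence mistakes and reads more clearly.
is_adult = df["age"] >= 18
is_female = df["sex"] == "female"

adult_female = is_adult & is_female     # both conditions hold
adult_or_female = is_adult | is_female  # at least one condition holds
not_adult = ~is_adult                   # negation

# Without parentheses, `df["age"] >= 18 & df["sex"] == "female"` is parsed as
# `df["age"] >= (18 & df["sex"]) == "female"`, which is not the intended test.
print(adult_female.tolist(), adult_or_female.tolist(), not_adult.tolist())
```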


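Next, let's replicate the logic used to create the 'who' column, which classifies passengers as 'man', 'woman', or 'child'. We will store our results in a column called 'who2'.

Here we will use the `numpy.select()` function. The syntax is `numpy.select(condition list, choice list, default=0)`. The conditions are checked in order for each row: the row receives the choice paired with the first condition that is true, and the default value if none of the conditions is true.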
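To see how overlapping conditions and the default value behave, here is a small sketch on a hypothetical Series of fares. The cut-off values, the labels, and the variable name `fare_band` are assumptions for illustration. The default is given as a string so that it matches the string choices.

```python
import numpy as np
import pandas as pd

fares = pd.Series([5.0, 20.0, 80.0, np.nan])  # hypothetical fares

conditions = [fares < 10, fares < 50, fares >= 50]
choices = ["low", "medium", "high"]

# 5.0 satisfies both fares < 10 and fares < 50; the first match ("low") wins.
# The missing fare satisfies no condition, so it receives the default.
fare_band = np.select(conditions, choices, default="unknown")
print(fare_band)  # ['low' 'medium' 'high' 'unknown']
```

Because the first match wins, the order of the condition list matters: placing `fares < 50` first would label the 5.0 fare as "medium".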

In [11]:
# list of conditions (checked in order)
conditions = [
    (titanic['age'] >= 18) & (titanic['sex'] == 'male'),
    (titanic['age'] >= 18) & (titanic['sex'] == 'female'),
    (titanic['age'] < 18)
]

# list of choices (who2 values), one per condition
choices = ['man', 'woman', 'child']

# call np.select and create the 'who2' column
# rows with a missing age match no condition and get the default 'unclear'
titanic['who2'] = np.select(conditions, choices, default='unclear')

# display the first 10 rows
titanic.head(10)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,adult,adult_female,who2
0,0,3,male,22.0,1,0,7.25,S,Third,man,Yes,,Southampton,no,False,True,No,man
1,1,1,female,38.0,1,0,71.2833,C,First,woman,No,C,Cherbourg,yes,False,True,Yes,woman
2,1,3,female,26.0,0,0,7.925,S,Third,woman,No,,Southampton,yes,True,True,Yes,woman
3,1,1,female,35.0,1,0,53.1,S,First,woman,No,C,Southampton,yes,False,True,Yes,woman
4,0,3,male,35.0,0,0,8.05,S,Third,man,Yes,,Southampton,no,True,True,No,man
5,0,3,male,,0,0,8.4583,Q,Third,man,No,,Queenstown,no,True,False,No,unclear
6,0,1,male,54.0,0,0,51.8625,S,First,man,Yes,E,Southampton,no,True,True,No,man
7,0,3,male,2.0,3,1,21.075,S,Third,child,No,,Southampton,no,False,False,No,child
8,1,3,female,27.0,0,2,11.1333,S,Third,woman,No,,Southampton,yes,False,True,Yes,woman
9,1,2,female,14.0,1,0,30.0708,C,Second,child,No,,Cherbourg,yes,False,False,No,child
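In the output above, 'who' and 'who2' agree except in row 5, where the age is missing. One way to check the agreement systematically is a cross-tabulation. The sketch below is only an illustration: the small DataFrame copies the 'who' and 'who2' values from rows 0, 1, 5, 7, and 9 of the output above, and the names `check`, `table`, and `mismatches` are assumptions. On the full data, the same calls could be applied to `titanic['who']` and `titanic['who2']`.

```python
import pandas as pd

# Values copied from rows 0, 1, 5, 7 and 9 of the output above.
check = pd.DataFrame({
    "who":  ["man", "woman", "man", "child", "child"],
    "who2": ["man", "woman", "unclear", "child", "child"],
})

# Count how often each original label pairs with each replicated label.
table = pd.crosstab(check["who"], check["who2"])
print(table)

# List the rows where the two columns disagree.
mismatches = check[check["who"] != check["who2"]]
print(mismatches)
```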
