### Adding a column is an integral part of any analytics or data science project. A new column can be added in any of the following cases:
- Adding a column using a single column
- Adding a column using multiple columns
- Adding a column after group by

There are many ways to do this in Python. Using a sample dataset, this post walks through a few cases with the following techniques:
- comparisons
- apply and lambda function (simple, if else and nested if else)
- map with dictionary and map with lambda function
- np.where
- np.select
- apply with a row function
- group by and transform

### Loading the libraries

In [1]:
import pandas as pd

import numpy as np

### Creating a sample dataframe

In [2]:
date = ['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02',
        '2022-01-03', '2022-01-04', '2022-01-04', '2022-01-05', '2022-01-05']

event = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

name = ['abc', 'abc', 'klm', 'xyz', 'xyz', 'abc', 'klm', 'xyz', 'abc', 'klm']

# Named type_ so that the built-in type() is not shadowed
type_ = ['AA', 'AA', 'BB', 'AA', 'BB', 'BB', 'AA', 'AA', 'AA', 'BB']

value = [20, 10, 20, 15, 30, 25, 35, 10, 10, 20]

# Create a dataframe from the lists above
df = pd.DataFrame({'date': date,
                   'event': event,
                   'name': name,
                   'type': type_,
                   'value': value})

In [3]:
# dataframe
df

Unnamed: 0,date,event,name,type,value
0,2022-01-01,A,abc,AA,20
1,2022-01-01,B,abc,AA,10
2,2022-01-01,C,klm,BB,20
3,2022-01-02,D,xyz,AA,15
4,2022-01-02,E,xyz,BB,30
5,2022-01-03,F,abc,BB,25
6,2022-01-04,G,klm,AA,35
7,2022-01-04,H,xyz,AA,10
8,2022-01-05,I,abc,AA,10
9,2022-01-05,J,klm,BB,20


### 1. Adding a column using one column

In [4]:
# Adding a flag column using one column
df['one_col_flag'] = (df['type'] == 'AA')*1

df

Unnamed: 0,date,event,name,type,value,one_col_flag
0,2022-01-01,A,abc,AA,20,1
1,2022-01-01,B,abc,AA,10,1
2,2022-01-01,C,klm,BB,20,0
3,2022-01-02,D,xyz,AA,15,1
4,2022-01-02,E,xyz,BB,30,0
5,2022-01-03,F,abc,BB,25,0
6,2022-01-04,G,klm,AA,35,1
7,2022-01-04,H,xyz,AA,10,1
8,2022-01-05,I,abc,AA,10,1
9,2022-01-05,J,klm,BB,20,0
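
The comparison `df['type'] == 'AA'` returns a boolean Series, and multiplying it by 1 turns `True`/`False` into `1`/`0`. A more explicit way to write the same conversion is `astype(int)`. Here is a minimal sketch, assuming a made-up three-row frame called `demo` that is not part of the sample dataset:

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration
demo = pd.DataFrame({'type': ['AA', 'BB', 'AA']})

# Boolean mask multiplied by 1, as in the cell above
flag_mul = (demo['type'] == 'AA') * 1

# Same values, with the cast to integer spelled out
flag_cast = (demo['type'] == 'AA').astype(int)

print(flag_mul.tolist())   # [1, 0, 1]
print(flag_cast.tolist())  # [1, 0, 1]
```

Both forms give the same flags, so the choice is mostly a matter of readability.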


### Using np.where to add a column

In [5]:
df['np_where'] = np.where(df.value > 10, df.value*2, df.value**2)

df

Unnamed: 0,date,event,name,type,value,one_col_flag,np_where
0,2022-01-01,A,abc,AA,20,1,40
1,2022-01-01,B,abc,AA,10,1,100
2,2022-01-01,C,klm,BB,20,0,40
3,2022-01-02,D,xyz,AA,15,1,30
4,2022-01-02,E,xyz,BB,30,0,60
5,2022-01-03,F,abc,BB,25,0,50
6,2022-01-04,G,klm,AA,35,1,70
7,2022-01-04,H,xyz,AA,10,1,100
8,2022-01-05,I,abc,AA,10,1,100
9,2022-01-05,J,klm,BB,20,0,40
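
`np.where(condition, a, b)` works element by element: it takes the value from `a` where the condition is true and from `b` otherwise. A small sketch of that behaviour, assuming a made-up frame `demo` with three values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for this illustration
demo = pd.DataFrame({'value': [20, 10, 15]})

# Double the values above 10, square the rest
result = np.where(demo['value'] > 10, demo['value'] * 2, demo['value'] ** 2)

print(result.tolist())  # [40, 100, 30]
```

Note that 10 is not greater than 10, so it falls into the "otherwise" branch and is squared.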


### Using Lambda Function to add a column

In [6]:
# Adding a new column using lambda function
df['value_lambda'] = df['value'].apply(lambda x: x**2)

df

Unnamed: 0,date,event,name,type,value,one_col_flag,np_where,value_lambda
0,2022-01-01,A,abc,AA,20,1,40,400
1,2022-01-01,B,abc,AA,10,1,100,100
2,2022-01-01,C,klm,BB,20,0,40,400
3,2022-01-02,D,xyz,AA,15,1,30,225
4,2022-01-02,E,xyz,BB,30,0,60,900
5,2022-01-03,F,abc,BB,25,0,50,625
6,2022-01-04,G,klm,AA,35,1,70,1225
7,2022-01-04,H,xyz,AA,10,1,100,100
8,2022-01-05,I,abc,AA,10,1,100,100
9,2022-01-05,J,klm,BB,20,0,40,400


In [7]:
# Adding a new column using if else in lambda function
df['value_lambda2'] = df['value'].apply(lambda x: x*2 if x >= 20 else x**2)

df

Unnamed: 0,date,event,name,type,value,one_col_flag,np_where,value_lambda,value_lambda2
0,2022-01-01,A,abc,AA,20,1,40,400,40
1,2022-01-01,B,abc,AA,10,1,100,100,100
2,2022-01-01,C,klm,BB,20,0,40,400,40
3,2022-01-02,D,xyz,AA,15,1,30,225,225
4,2022-01-02,E,xyz,BB,30,0,60,900,60
5,2022-01-03,F,abc,BB,25,0,50,625,50
6,2022-01-04,G,klm,AA,35,1,70,1225,70
7,2022-01-04,H,xyz,AA,10,1,100,100,100
8,2022-01-05,I,abc,AA,10,1,100,100,100
9,2022-01-05,J,klm,BB,20,0,40,400,40


In [8]:
# Adding a new column using nested if else in lambda function
df['value_lambda3'] = df['value'].apply(lambda x: x*10 if x >= 20 else (x**2 if x > 10 else x))

df

Unnamed: 0,date,event,name,type,value,one_col_flag,np_where,value_lambda,value_lambda2,value_lambda3
0,2022-01-01,A,abc,AA,20,1,40,400,40,200
1,2022-01-01,B,abc,AA,10,1,100,100,100,10
2,2022-01-01,C,klm,BB,20,0,40,400,40,200
3,2022-01-02,D,xyz,AA,15,1,30,225,225,225
4,2022-01-02,E,xyz,BB,30,0,60,900,60,300
5,2022-01-03,F,abc,BB,25,0,50,625,50,250
6,2022-01-04,G,klm,AA,35,1,70,1225,70,350
7,2022-01-04,H,xyz,AA,10,1,100,100,100,10
8,2022-01-05,I,abc,AA,10,1,100,100,100,10
9,2022-01-05,J,klm,BB,20,0,40,400,40,200


### Using map function to map values in a column to other values

In [9]:
# Using map with a dictionary to replace values; values missing from the dictionary become NaN
df['map_dict'] = df['value'].map({10:'ten', 20:'twenty'})
df


Unnamed: 0,date,event,name,type,value,one_col_flag,np_where,value_lambda,value_lambda2,value_lambda3,map_dict
0,2022-01-01,A,abc,AA,20,1,40,400,40,200,twenty
1,2022-01-01,B,abc,AA,10,1,100,100,100,10,ten
2,2022-01-01,C,klm,BB,20,0,40,400,40,200,twenty
3,2022-01-02,D,xyz,AA,15,1,30,225,225,225,
4,2022-01-02,E,xyz,BB,30,0,60,900,60,300,
5,2022-01-03,F,abc,BB,25,0,50,625,50,250,
6,2022-01-04,G,klm,AA,35,1,70,1225,70,350,
7,2022-01-04,H,xyz,AA,10,1,100,100,100,10,ten
8,2022-01-05,I,abc,AA,10,1,100,100,100,10,ten
9,2022-01-05,J,klm,BB,20,0,40,400,40,200,twenty
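
The blank cells in the output above are missing values: `map` with a dictionary returns NaN for any value that is not a key. One possible way to handle this is to chain `fillna` with a fallback label. The sketch below assumes a made-up Series and a hypothetical fallback label `'other'`:

```python
import pandas as pd

# Hypothetical values, used only for this illustration
values = pd.Series([20, 10, 15])
labels = {10: 'ten', 20: 'twenty'}

# 15 is not a key in the dictionary, so it maps to NaN
mapped = values.map(labels)
print(mapped.isna().tolist())  # [False, False, True]

# Replace the missing labels with a fallback ('other' is an assumed label)
mapped_filled = values.map(labels).fillna('other')
print(mapped_filled.tolist())  # ['twenty', 'ten', 'other']
```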


In [10]:
# Using map and lambda function
df['map_lambda'] = df['value'].map(lambda x: x*10 if x >= 20 else (x**2 if x > 10 else x))
df

Unnamed: 0,date,event,name,type,value,one_col_flag,np_where,value_lambda,value_lambda2,value_lambda3,map_dict,map_lambda
0,2022-01-01,A,abc,AA,20,1,40,400,40,200,twenty,200
1,2022-01-01,B,abc,AA,10,1,100,100,100,10,ten,10
2,2022-01-01,C,klm,BB,20,0,40,400,40,200,twenty,200
3,2022-01-02,D,xyz,AA,15,1,30,225,225,225,,225
4,2022-01-02,E,xyz,BB,30,0,60,900,60,300,,300
5,2022-01-03,F,abc,BB,25,0,50,625,50,250,,250
6,2022-01-04,G,klm,AA,35,1,70,1225,70,350,,350
7,2022-01-04,H,xyz,AA,10,1,100,100,100,10,ten,10
8,2022-01-05,I,abc,AA,10,1,100,100,100,10,ten,10
9,2022-01-05,J,klm,BB,20,0,40,400,40,200,twenty,200


In [11]:
# Dropping all the previously created columns
df.drop(['one_col_flag', 'np_where', 'value_lambda', 'value_lambda2', 'value_lambda3', 'map_dict', 'map_lambda'], axis=1, inplace=True)

### 2. Adding a column using two or more columns

In [12]:
# Adding a boolean flag column using two columns (both conditions must hold)
df['two_col_flag'] = (df['type'] == 'AA') & (df['value'] >=20)

df

Unnamed: 0,date,event,name,type,value,two_col_flag
0,2022-01-01,A,abc,AA,20,True
1,2022-01-01,B,abc,AA,10,False
2,2022-01-01,C,klm,BB,20,False
3,2022-01-02,D,xyz,AA,15,False
4,2022-01-02,E,xyz,BB,30,False
5,2022-01-03,F,abc,BB,25,False
6,2022-01-04,G,klm,AA,35,True
7,2022-01-04,H,xyz,AA,10,False
8,2022-01-05,I,abc,AA,10,False
9,2022-01-05,J,klm,BB,20,False


In [13]:
# Adding a 0/1 flag column using two columns (either condition may hold)
df['two_col_flag2'] = ((df['type'] == 'AA') | (df['value'] <20))*1

df

Unnamed: 0,date,event,name,type,value,two_col_flag,two_col_flag2
0,2022-01-01,A,abc,AA,20,True,1
1,2022-01-01,B,abc,AA,10,False,1
2,2022-01-01,C,klm,BB,20,False,0
3,2022-01-02,D,xyz,AA,15,False,1
4,2022-01-02,E,xyz,BB,30,False,0
5,2022-01-03,F,abc,BB,25,False,0
6,2022-01-04,G,klm,AA,35,True,1
7,2022-01-04,H,xyz,AA,10,False,1
8,2022-01-05,I,abc,AA,10,False,1
9,2022-01-05,J,klm,BB,20,False,0


### Adding a column using np.where

In [14]:
df['np_where2'] = np.where((df.value > 10)&(df.type=='AA'), 1, df.value**2)

df

Unnamed: 0,date,event,name,type,value,two_col_flag,two_col_flag2,np_where2
0,2022-01-01,A,abc,AA,20,True,1,1
1,2022-01-01,B,abc,AA,10,False,1,100
2,2022-01-01,C,klm,BB,20,False,0,400
3,2022-01-02,D,xyz,AA,15,False,1,1
4,2022-01-02,E,xyz,BB,30,False,0,900
5,2022-01-03,F,abc,BB,25,False,0,625
6,2022-01-04,G,klm,AA,35,True,1,1
7,2022-01-04,H,xyz,AA,10,False,1,100
8,2022-01-05,I,abc,AA,10,False,1,100
9,2022-01-05,J,klm,BB,20,False,0,400


### Adding a column using np.select

In [15]:
conditions = [(df['type'] == 'AA') & (df['value'] > 10), (df['type'] == 'BB') & (df['value'] >= 20)]

choice = [1,2]

df['np_select'] = np.select(conditions, choice, default = -1)

df

Unnamed: 0,date,event,name,type,value,two_col_flag,two_col_flag2,np_where2,np_select
0,2022-01-01,A,abc,AA,20,True,1,1,1
1,2022-01-01,B,abc,AA,10,False,1,100,-1
2,2022-01-01,C,klm,BB,20,False,0,400,2
3,2022-01-02,D,xyz,AA,15,False,1,1,1
4,2022-01-02,E,xyz,BB,30,False,0,900,2
5,2022-01-03,F,abc,BB,25,False,0,625,2
6,2022-01-04,G,klm,AA,35,True,1,1,1
7,2022-01-04,H,xyz,AA,10,False,1,100,-1
8,2022-01-05,I,abc,AA,10,False,1,100,-1
9,2022-01-05,J,klm,BB,20,False,0,400,2
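
When conditions overlap, `np.select` uses the first condition in the list that is true for each row, so the order of the list matters. A minimal sketch of this, assuming a made-up frame `demo` and illustrative choices `1` and `2`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for this illustration
demo = pd.DataFrame({'value': [5, 15, 25]})

# Broad condition first: 25 matches "> 10" before "> 20" is ever considered
broad_first = np.select([demo['value'] > 10, demo['value'] > 20], [1, 2], default=0)
print(broad_first.tolist())  # [0, 1, 1]

# Narrow condition first: 25 now gets the choice for "> 20"
narrow_first = np.select([demo['value'] > 20, demo['value'] > 10], [2, 1], default=0)
print(narrow_first.tolist())  # [0, 1, 2]
```

Listing the most specific condition first is usually the safer arrangement.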


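### Using apply and row function
A function applied row by row with axis = 1 can be used in all of the cases above, and any complex column addition can be expressed this way. It is usually slower than vectorized options such as np.where or np.select on large dataframes.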

In [16]:
def type_event(row):
    # Each row holds scalar values, so the plain boolean operator "and" is used here
    if row['type'] == 'AA' and row['value'] > 10:
        val = 1
    elif row['type'] == 'BB' and row['value'] >= 20:
        val = 2
    else:
        val = 0
    return val

In [17]:
df['row_fun'] = df.apply(type_event, axis = 1)

df

Unnamed: 0,date,event,name,type,value,two_col_flag,two_col_flag2,np_where2,np_select,row_fun
0,2022-01-01,A,abc,AA,20,True,1,1,1,1
1,2022-01-01,B,abc,AA,10,False,1,100,-1,0
2,2022-01-01,C,klm,BB,20,False,0,400,2,2
3,2022-01-02,D,xyz,AA,15,False,1,1,1,1
4,2022-01-02,E,xyz,BB,30,False,0,900,2,2
5,2022-01-03,F,abc,BB,25,False,0,625,2,2
6,2022-01-04,G,klm,AA,35,True,1,1,1,1
7,2022-01-04,H,xyz,AA,10,False,1,100,-1,0
8,2022-01-05,I,abc,AA,10,False,1,100,-1,0
9,2022-01-05,J,klm,BB,20,False,0,400,2,2
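
The same rules can often be written without a row function. The sketch below compares a row-wise `apply` with an `np.select` version on a made-up four-row frame `demo`. It is an illustration under those assumptions, not a benchmark:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for this illustration
demo = pd.DataFrame({'type': ['AA', 'AA', 'BB', 'BB'],
                     'value': [20, 10, 20, 5]})

def type_event(row):
    # Same rules as the row function above, evaluated one row at a time
    if row['type'] == 'AA' and row['value'] > 10:
        return 1
    if row['type'] == 'BB' and row['value'] >= 20:
        return 2
    return 0

row_wise = demo.apply(type_event, axis=1)

# Vectorized version of the same rules, evaluated on whole columns
vectorized = np.select(
    [(demo['type'] == 'AA') & (demo['value'] > 10),
     (demo['type'] == 'BB') & (demo['value'] >= 20)],
    [1, 2],
    default=0,
)

print(row_wise.tolist())    # [1, 0, 2, 0]
print(vectorized.tolist())  # [1, 0, 2, 0]
```

The row function is easier to extend with arbitrary logic, while the vectorized form tends to scale better as the number of rows grows.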


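### Additional: Adding a column after groupby
transform is used with the sum function to add a column holding the sum of values for each type group. Unlike a plain aggregation, the result has one value per original row, so it can be assigned directly as a new column.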

In [18]:
# 'sum' can be replaced with other aggregations such as 'min' or 'max'
df['value_sum'] = df.groupby(['type'])['value'].transform('sum')

df.sort_values(by='type')

Unnamed: 0,date,event,name,type,value,two_col_flag,two_col_flag2,np_where2,np_select,row_fun,value_sum
0,2022-01-01,A,abc,AA,20,True,1,1,1,1,100
1,2022-01-01,B,abc,AA,10,False,1,100,-1,0,100
3,2022-01-02,D,xyz,AA,15,False,1,1,1,1,100
6,2022-01-04,G,klm,AA,35,True,1,1,1,1,100
7,2022-01-04,H,xyz,AA,10,False,1,100,-1,0,100
8,2022-01-05,I,abc,AA,10,False,1,100,-1,0,100
2,2022-01-01,C,klm,BB,20,False,0,400,2,2,95
4,2022-01-02,E,xyz,BB,30,False,0,900,2,2,95
5,2022-01-03,F,abc,BB,25,False,0,625,2,2,95
9,2022-01-05,J,klm,BB,20,False,0,400,2,2,95
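
To see why `transform` suits new columns, it helps to compare it with a plain aggregation: `sum()` after `groupby` returns one row per group, while `transform('sum')` returns one value per original row. A minimal sketch, assuming a made-up frame `demo` and a hypothetical extra column `share` that is not part of the original walkthrough:

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
demo = pd.DataFrame({'type': ['AA', 'AA', 'BB'],
                     'value': [20, 10, 30]})

# Plain aggregation: one total per group
group_totals = demo.groupby('type')['value'].sum()
print(group_totals.to_dict())  # {'AA': 30, 'BB': 30}

# transform: the group total repeated on every row of the group
demo['value_sum'] = demo.groupby('type')['value'].transform('sum')
print(demo['value_sum'].tolist())  # [30, 30, 30]

# The aligned totals make row-level ratios straightforward
demo['share'] = demo['value'] / demo['value_sum']
```

Because the transformed column is aligned with the original index, it can be combined with other columns without a merge.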
