## Medium - How To Create a New Column Based on Values From Other Columns in Pandas

https://towardsdatascience.com/create-new-column-based-on-other-columns-pandas-5586d87de73d

In [1]:
import pandas as pd

df = pd.DataFrame(
    [
        (1, 'Hello', 158, True, 12.8),
        (2, 'Hey', 567, False, 74.2),
        (3, 'Hi', 123, False, 1.1),
        (4, 'Howdy', 578, True, 45.8),
        (5, 'Hello', 418, True, 21.1),
        (6, 'Hi', 98, False, 98.1),
    ],
    columns=['colA', 'colB', 'col C', 'colD', 'colE']
)

In [2]:
print(df)

   colA   colB  col C   colD  colE
0     1  Hello    158   True  12.8
1     2    Hey    567  False  74.2
2     3     Hi    123  False   1.1
3     4  Howdy    578   True  45.8
4     5  Hello    418   True  21.1
5     6     Hi     98  False  98.1


## Using the apply() method

In [3]:
# Function that derives a category from the values of columns 'col C' and colE
def categorise(row):
    if row['col C'] > 0 and row['col C'] <= 99:
        return 'A'
    elif row['col C'] > 100 and row.colE >= 45:  # row.colE is an alternative way to write row['colE']
        return 'B'
    elif row['col C'] > 200 and row['col C'] <= 299:
        return 'C'
    return 'D'

In [4]:
# Pass the function above to apply() with axis=1 so that it is called once per row (here wrapped in a lambda expression)
df['colF'] = df.apply(lambda row: categorise(row), axis=1)
df

   colA   colB  col C   colD  colE colF
0     1  Hello    158   True  12.8    D
1     2    Hey    567  False  74.2    B
2     3     Hi    123  False   1.1    D
3     4  Howdy    578   True  45.8    B
4     5  Hello    418   True  21.1    D
5     6     Hi     98  False  98.1    A
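
The lambda wrapper is not strictly required: apply() accepts any callable, so the function itself can be passed. The sketch below is a minimal check of that, using a three-row frame that reuses the column names from the example (the names `via_lambda` and `direct` are illustrative only).

```python
import pandas as pd

# Small frame reusing the column names from the example above
df = pd.DataFrame({'col C': [158, 567, 98], 'colE': [12.8, 74.2, 98.1]})

def categorise(row):
    if 0 < row['col C'] <= 99:
        return 'A'
    elif row['col C'] > 100 and row['colE'] >= 45:
        return 'B'
    elif 200 < row['col C'] <= 299:
        return 'C'
    return 'D'

# Wrapped in a lambda, as in the cell above
via_lambda = df.apply(lambda row: categorise(row), axis=1)

# Passing the function itself; apply() calls it once per row when axis=1
direct = df.apply(categorise, axis=1)

print(via_lambda.equals(direct))
```

Both calls should produce the same Series, so the choice between them is mostly a matter of readability.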


In [5]:
# For simpler operations, you can pass a lambda expression directly to apply():
df['colG'] = df.apply(lambda row: row['col C'] + row.colE, axis=1)
df

   colA   colB  col C   colD  colE colF   colG
0     1  Hello    158   True  12.8    D  170.8
1     2    Hey    567  False  74.2    B  641.2
2     3     Hi    123  False   1.1    D  124.1
3     4  Howdy    578   True  45.8    B  623.8
4     5  Hello    418   True  21.1    D  439.1
5     6     Hi     98  False  98.1    A  196.1
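
For plain arithmetic like the sum above, column-wise operators are usually a simpler option than a row-wise apply(). The sketch below compares the two on a small frame that reuses the column names from the example; `row_wise` and `vectorised` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Small frame reusing the column names from the example above
df = pd.DataFrame({'col C': [158, 567, 123], 'colE': [12.8, 74.2, 1.1]})

# Row-wise: the lambda is called once per row
row_wise = df.apply(lambda row: row['col C'] + row['colE'], axis=1)

# Column-wise: a single vectorised addition over whole columns
vectorised = df['col C'] + df['colE']

# Largest absolute difference between the two results
print((row_wise - vectorised).abs().max())
```

The two results should agree, and the vectorised form avoids a Python-level function call for every row.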


## Using NumPy’s select() function
A more vectorised approach, and potentially a faster one, is to use NumPy’s select() function.

This time, instead of defining a function, we create a list containing the desired conditions.

In [6]:
import numpy as np
conditions = [
    (df['col C'] > 0) & (df['col C'] <= 99),     # 0 < col C <= 99
    (df['col C'] > 100) & (df['col C'] <= 199),  # 100 < col C <= 199
    (df['col C'] > 200) & (df['col C'] <= 299),  # 200 < col C <= 299
]

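Then define an additional list containing the corresponding values that the new column will contain. Note that the list below does not include the default value D. Also note that the second condition here depends only on col C, whereas categorise() in the apply() section also checked colE, so the resulting colF differs from the apply() version.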

In [7]:
outputs = ['A', 'B', 'C']

Finally, we call select() with the conditions and outputs, and specify the default value that will be used whenever none of the conditions is met.

In [8]:
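# Assign the array directly; wrapping it in pd.Series would align on a fresh 0..n-1 index and misplace values if df had a different index
df['colF'] = np.select(conditions, outputs, default='D')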

print(df)

   colA   colB  col C   colD  colE colF   colG
0     1  Hello    158   True  12.8    B  170.8
1     2    Hey    567  False  74.2    D  641.2
2     3     Hi    123  False   1.1    B  124.1
3     4  Howdy    578   True  45.8    D  623.8
4     5  Hello    418   True  21.1    D  439.1
5     6     Hi     98  False  98.1    A  196.1
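
One detail worth keeping in mind is that np.select() evaluates the conditions in order and, for each element, takes the output of the first condition that is true. The sketch below illustrates this with deliberately overlapping conditions on a standalone Series; the values and the name `labels` are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([50, 150, 250, 567], name='col C')

# Overlapping conditions: every value <= 99 also satisfies <= 199 and <= 299
conditions = [s <= 99, s <= 199, s <= 299]
outputs = ['A', 'B', 'C']

# For each element, the first true condition decides the output;
# 'D' is used where no condition is true
labels = np.select(conditions, outputs, default='D')

print(labels.tolist())  # ['A', 'B', 'C', 'D']
```

Because of this ordering, explicit lower bounds such as those in the conditions list above are only needed when the ranges should leave gaps.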


## Using the loc property
Finally, another option is the loc property, which on some occasions might be more efficient than the apply() method. Note that this approach may also be a little more verbose than the solutions discussed previously.

In [9]:
df.loc[(df['col C'] > 0) & (df['col C'] <= 99), 'colF'] = 'A'
df.loc[(df['col C'] > 100) & (df['col C'] <= 199), 'colF'] = 'B'
df.loc[(df['col C'] > 200) & (df['col C'] <= 299), 'colF'] = 'C'
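# Fill any rows left unassigned (NaN) with the default; assigning back avoids inplace=True on a column selection
df['colF'] = df['colF'].fillna('D')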

print(df)

   colA   colB  col C   colD  colE colF   colG
0     1  Hello    158   True  12.8    B  170.8
1     2    Hey    567  False  74.2    D  641.2
2     3     Hi    123  False   1.1    B  124.1
3     4  Howdy    578   True  45.8    D  623.8
4     5  Hello    418   True  21.1    D  439.1
5     6     Hi     98  False  98.1    A  196.1
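
In the notebook above, colF already exists when the loc cell runs. The sketch below shows what happens on a fresh DataFrame, assuming the target column does not exist yet: rows matched by no condition are left as NaN until the default is filled in. The three-row frame and the name `missing_before` are illustrative only.

```python
import pandas as pd

# Fresh frame without a 'colF' column
df = pd.DataFrame({'col C': [158, 567, 98]})

# The first assignment creates 'colF'; unmatched rows are NaN
df.loc[(df['col C'] > 0) & (df['col C'] <= 99), 'colF'] = 'A'
df.loc[(df['col C'] > 100) & (df['col C'] <= 199), 'colF'] = 'B'

# Count the rows that no condition has labelled so far
missing_before = int(df['colF'].isna().sum())

# Fill the remaining rows with the default value
df['colF'] = df['colF'].fillna('D')

print(missing_before)
print(df['colF'].tolist())
```

Here only the row with 567 matches neither range, so it is the one that receives the default.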
