# How to compute columns as a function of multiple other columns

In [1]:
import pandas as pd
import numpy as np

#### Create a dataframe
I will eventually want to populate "a_or_b" with the following computation:
    
    if "a" OR "b", return True, else False

In [12]:
df = pd.DataFrame(columns = ["a","b","a_or_b"])
df["a"] = [True, False, False, False, True]
df["b"] = [True, True, False, False, False]
df["a_or_b"] = None

df.head()

Unnamed: 0,a,b,a_or_b
0,True,True,
1,False,True,
2,False,False,
3,False,False,
4,True,False,


### 1. Looping. This is the bad way. 
You can do this when you have a small amount of data, but it is slow, so don't do it in production.

In [13]:
new_column = []
for ix, row in df.iterrows():
    value = row["a"] or row["b"]
    print(value)
    new_column.append(value)
df["a_or_b"] = new_column

df.head()

True
True
False
False
True


Unnamed: 0,a,b,a_or_b
0,True,True,True
1,False,True,True
2,False,False,False
3,False,False,False
4,True,False,True


### 2. Applying functions. This is the good way

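Dataframes have a method called "apply" that lets you apply a function to a single column or to multiple columns. It is a little clunky to write, but it is typically faster than looping with "iterrows". Note that with axis = 1 the function is still called once per row. It requires that you write a function for your calculation.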
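The cells that follow cover the multiple-column case. For the single-column case, here is a minimal sketch, assuming a hypothetical helper `boolean_not` and a hypothetical output column `not_a` (neither is part of this notebook). When "apply" is called on one column, the function receives one value at a time, so no `axis` argument is needed.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, False, False, False, True]})

def boolean_not(x):
    """Hypothetical helper for illustration. Assumes x is True or False"""
    return not x

# Single column: call apply on the column itself (a Series).
# The function is called once per value in that column.
df["not_a"] = df["a"].apply(boolean_not)

print(df)
```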

In [11]:
def boolean_or(x,y):
    """Assumes x and y are True or False"""
    return x or y

df["a_or_b"] = None # resetting this column
df.head()

Unnamed: 0,a,b,a_or_b
0,True,True,
1,False,True,
2,False,False,
3,False,False,
4,True,False,


In [14]:
# I will now apply my new function to the dataframe

df["a_or_b"] = df.apply(lambda x: boolean_or(x["a"],x["b"]), axis = 1)

df.head()

Unnamed: 0,a,b,a_or_b
0,True,True,True
1,False,True,True
2,False,False,False
3,False,False,False
4,True,False,True
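To get a rough feel for the speed difference on your own machine, something like the sketch below may help. The frame `big`, its size (10,000 rows), and the random seed are assumptions made up for illustration. The third variant, the element-wise `|` operator on whole columns, is not used elsewhere in this notebook; it is included only as a point of comparison. Timings vary between machines and pandas versions, so no numbers are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger frame so that timing differences become visible.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "a": rng.random(10_000) < 0.5,
    "b": rng.random(10_000) < 0.5,
})

def boolean_or(x, y):
    """Assumes x and y are True or False"""
    return x or y

# 1. Looping over rows with iterrows
start = time.perf_counter()
looped = [row["a"] or row["b"] for _, row in big.iterrows()]
loop_seconds = time.perf_counter() - start

# 2. apply with axis=1 (the function is called once per row)
start = time.perf_counter()
applied = big.apply(lambda x: boolean_or(x["a"], x["b"]), axis=1)
apply_seconds = time.perf_counter() - start

# 3. Element-wise OR on whole columns (not used elsewhere in this notebook)
start = time.perf_counter()
vectorized = big["a"] | big["b"]
vector_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f}s")
print(f"apply:    {apply_seconds:.4f}s")
print(f"a | b:    {vector_seconds:.4f}s")
```

All three variants should produce the same values, so the choice between them is mostly about readability and speed.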


Below I am just dropping column "a". The "drop" method returns a new dataframe, so you need to either set "inplace" to True or assign the result of the method call back to the dataframe.

In [24]:
df = df.drop("a", axis=1)

In [25]:
df.head()

Unnamed: 0,b,a_or_b
0,True,True
1,True,True
2,False,False
3,False,False
4,False,True
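The two options for "drop" can be sketched side by side as follows. The frame is rebuilt here so the example runs on its own, and the name `reassigned` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [True, False, False, False, True],
    "b": [True, True, False, False, False],
})

# Calling drop without keeping the result leaves df unchanged.
df.drop("a", axis=1)
print(list(df.columns))  # ['a', 'b']

# Option 1: assign the returned dataframe to a name.
reassigned = df.drop("a", axis=1)
print(list(reassigned.columns))  # ['b']

# Option 2: modify df in place; the call itself returns None.
df.drop("a", axis=1, inplace=True)
print(list(df.columns))  # ['b']
```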
