# Set Operations on DataFrames

To start with, let's explain what "set" operations are. When we say "set" here, we aren't talking about the `set` data type in Python. We're talking about **set algebra** (https://en.wikipedia.org/wiki/Algebra_of_sets), applied here to the row labels (the index) of a DataFrame.

**Operations**
* UNION
* INTERSECTION
* MINUS / EXCEPT
* COMPLEMENT

**Relations**
* EQUALITY
* INCLUSION

## OPERATIONS

In [None]:
import pandas as pd
family = pd.DataFrame([['Paul','M'],['Anny','F'],['Sarahlynn','F'],['Jim','M']], columns=['Name','Gender'])
kirkwood = pd.DataFrame([['Paul','M'],['Anny','F'],['Sarahlynn','F'],['Rob','M']], columns=['Name','Gender'])

family.set_index('Name', inplace=True)
kirkwood.set_index('Name', inplace=True)

In [None]:
family

In [None]:
kirkwood

In [None]:
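# UNION: an outer join on the index keeps every name found in either frame.
# Each frame contributes its own Gender column; names missing from a frame get NaN.
pd.concat([family, kirkwood], join='outer', axis=1, sort=False)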
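The outer `concat` above unions the index but produces one `Gender` column per frame. If a row-wise union with a single set of columns is wanted instead, one possible sketch follows. It assumes that when a name appears in both frames the first occurrence should win, which is a choice made here for illustration rather than anything the notebook prescribes.

```python
import pandas as pd

family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Jim', 'M']],
    columns=['Name', 'Gender']).set_index('Name')
kirkwood = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Rob', 'M']],
    columns=['Name', 'Gender']).set_index('Name')

# Stack the rows of both frames, then drop repeated names (keeping the first).
stacked = pd.concat([family, kirkwood])
union_rows = stacked[~stacked.index.duplicated(keep='first')]

# The labels alone can also be unioned directly on the index.
union_names = family.index.union(kirkwood.index)
```

`union_rows` has a single `Gender` column and one row per distinct name, while `union_names` is just the combined set of labels.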

In [None]:
# INTERSECTION: an inner join on the index keeps only names present in both frames.
pd.concat([family, kirkwood], join='inner', axis=1, sort=False)
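As with the union, the inner `concat` returns a `Gender` column from each frame. A minimal alternative sketch, assuming only the rows of `family` are wanted for the shared names, intersects the two indexes and selects with `.loc`:

```python
import pandas as pd

family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Jim', 'M']],
    columns=['Name', 'Gender']).set_index('Name')
kirkwood = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Rob', 'M']],
    columns=['Name', 'Gender']).set_index('Name')

# Labels that appear in both indexes.
common = family.index.intersection(kirkwood.index)

# Rows of family restricted to those shared labels.
intersection_rows = family.loc[common]
```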

In [None]:
# MINUS: rows of family whose name does not appear in kirkwood.
family.loc[family.index.difference(kirkwood.index)]
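Unlike union and intersection, MINUS depends on the order of its operands. The following sketch rebuilds the two frames and takes the difference in both directions to make that visible:

```python
import pandas as pd

family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Jim', 'M']],
    columns=['Name', 'Gender']).set_index('Name')
kirkwood = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Rob', 'M']],
    columns=['Name', 'Gender']).set_index('Name')

# family MINUS kirkwood: names only in family.
only_family = family.loc[family.index.difference(kirkwood.index)]

# kirkwood MINUS family: names only in kirkwood.
only_kirkwood = kirkwood.loc[kirkwood.index.difference(family.index)]

print(list(only_family.index))    # ['Jim']
print(list(only_kirkwood.index))  # ['Rob']
```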

**COMPLEMENT** isn't really a useful concept for DataFrames, because there is no built-in notion of an "entire universe of possible values" to take the complement against.
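If a universe is supplied explicitly, a complement reduces to a MINUS against that universe. The sketch below assumes a hypothetical `everyone` index (including the made-up name `Dana`) that is not part of the original data and exists only to illustrate the idea:

```python
import pandas as pd

family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Jim', 'M']],
    columns=['Name', 'Gender']).set_index('Name')

# Hypothetical universe of all names under consideration (an assumption).
everyone = pd.Index(
    ['Paul', 'Anny', 'Sarahlynn', 'Jim', 'Rob', 'Dana'], name='Name')

# Complement of family relative to that universe: everyone MINUS family.
not_family = everyone.difference(family.index)
```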


## RELATIONS

In [None]:
import pandas as pd
family = pd.DataFrame([['Paul','M'],['Anny','F'],['Sarahlynn','F'],['Jim','M']], columns=['Name','Gender'])
kirkwood_family = pd.DataFrame([['Paul','M'],['Anny','F'],['Sarahlynn','F']], columns=['Name','Gender'])

family.set_index('Name', inplace=True)
kirkwood_family.set_index('Name', inplace=True)

In [None]:
family

In [None]:
kirkwood_family

In [None]:
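# Test for equality
# Note: Index.equals compares the labels in order, so the same names in a different order are not equal.
family.index.equals(kirkwood_family.index)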
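Because `Index.equals` is order-sensitive, it is stricter than set equality. A small sketch of an order-insensitive check, using two hypothetical indexes that hold the same names in a different order:

```python
import pandas as pd

# Same labels, different order (illustrative data, not from the cells above).
a = pd.Index(['Paul', 'Anny', 'Sarahlynn'])
b = pd.Index(['Sarahlynn', 'Paul', 'Anny'])

# Order-sensitive comparison: False, because positions differ.
ordered_equal = a.equals(b)

# Order-insensitive comparison: sort both sides first.
sorted_equal = a.sort_values().equals(b.sort_values())

# Pure set semantics: also ignores duplicate labels.
set_equal = set(a) == set(b)
```

Sorting first keeps duplicate labels significant, while converting to Python `set` objects ignores them; which behavior is appropriate depends on whether the index is expected to be unique.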

In [None]:
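# Test for inclusion: kirkwood_family is a subset of family
# if no name in kirkwood_family is left over after removing family's names.
len(kirkwood_family.index.difference(family.index)) == 0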
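An equivalent way to express inclusion, sketched here as one option among several, is to ask whether every label of the smaller index is found in the larger one with `Index.isin`. The same call in the opposite direction distinguishes a proper subset from equality:

```python
import pandas as pd

family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F'], ['Jim', 'M']],
    columns=['Name', 'Gender']).set_index('Name')
kirkwood_family = pd.DataFrame(
    [['Paul', 'M'], ['Anny', 'F'], ['Sarahlynn', 'F']],
    columns=['Name', 'Gender']).set_index('Name')

# Subset: every name in kirkwood_family also appears in family.
is_subset = bool(kirkwood_family.index.isin(family.index).all())

# Proper subset: additionally, family has at least one name not in kirkwood_family.
is_proper_subset = is_subset and not bool(
    family.index.isin(kirkwood_family.index).all())

print(is_subset, is_proper_subset)  # True True
```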