# JOIN types

This notebook follows the [Tableau tutorial](http://www.tableau.com/learn/tutorials/on-demand/join-types-8.2) on join types. *Color images of tables are taken from that page.*

In [1]:
import pandas as pd

## Base tables

- **Left**: Siblings table
- **Right**: Eye color table
- **Key field**: Name

<table>
  <tr>
    <th><img src="images/left_table.png" width=200></th>
    <th><img src="images/right_table.png" width=200></th>
  </tr>
</table>

We want to join these two tables using the Pandas `merge()` method, which is equivalent to the SQL JOIN command. There are four options for how to do this, depending on which data (and which gaps in the data) we want to retain:

**LEFT, RIGHT, INNER, and OUTER**

In [2]:
siblings = pd.DataFrame({'Name':['Taylor','Alex','Shannon','Tracy'],
                        '# of Siblings':[2,3,0,1]})
siblings

Unnamed: 0,Name,# of Siblings
0,Taylor,2
1,Alex,3
2,Shannon,0
3,Tracy,1


In [3]:
eye_color = pd.DataFrame({'Name':['Taylor','Alex','Morgan'],
                         'Eye Color':['Blue','Brown','Brown']})
eye_color

Unnamed: 0,Name,Eye Color
0,Taylor,Blue
1,Alex,Brown
2,Morgan,Brown


## LEFT JOIN

**A LEFT JOIN keeps all information from the left table**, but drops rows from the right table whose key does not appear in the left table's key column.

**Notice the Nulls (NaN in Pandas) in the resulting table where keys were missing on the Right!**

<img src="images/left_join.png" width=500>

### `pd.merge()`

We can call the top-level `pd.merge()` function with both DataFrames as arguments.

*Note: If you don't specify `on='Name'` it will still work – Pandas will default to joining on the columns that the two DataFrames have in common.* ***I think it makes your code more readable to explicitly specify the key column(s).***

In [4]:
pd.merge(siblings, eye_color, how='left', on='Name')

Unnamed: 0,Name,# of Siblings,Eye Color
0,Taylor,2,Blue
1,Alex,3,Brown
2,Shannon,0,
3,Tracy,1,
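
To check the note about omitting `on`, here is a minimal sketch that runs the same left join with and without the explicit key. The names `explicit`, `implicit`, and `same_result` are mine, for illustration only. It relies on `Name` being the only column the two tables share.

```python
import pandas as pd

siblings = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Shannon', 'Tracy'],
                         '# of Siblings': [2, 3, 0, 1]})
eye_color = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Morgan'],
                          'Eye Color': ['Blue', 'Brown', 'Brown']})

# Key column named explicitly
explicit = pd.merge(siblings, eye_color, how='left', on='Name')

# No `on`: pandas joins on the columns common to both frames (here just 'Name')
implicit = pd.merge(siblings, eye_color, how='left')

# True when both results have the same values, dtypes, and labels
same_result = explicit.equals(implicit)
print(same_result)
```

If the two tables shared a second column by accident, the implicit version would silently join on both, which is another reason to spell the key out.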


### `df.merge()`

Or, we can call the `merge()` method on the left DataFrame itself and get the same result. 

*This is what I more commonly do.*

In [5]:
siblings.merge(eye_color, how='left', on='Name')

Unnamed: 0,Name,# of Siblings,Eye Color
0,Taylor,2,Blue
1,Alex,3,Brown
2,Shannon,0,
3,Tracy,1,
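
A quick sketch to confirm that the function form and the method form agree, plus one practical reason (my own opinion, not a Pandas recommendation) to prefer the method form: it chains with other DataFrame methods. The variable names are illustrative.

```python
import pandas as pd

siblings = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Shannon', 'Tracy'],
                         '# of Siblings': [2, 3, 0, 1]})
eye_color = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Morgan'],
                          'Eye Color': ['Blue', 'Brown', 'Brown']})

via_function = pd.merge(siblings, eye_color, how='left', on='Name')
via_method = siblings.merge(eye_color, how='left', on='Name')

# Both spellings should produce the same DataFrame
identical = via_function.equals(via_method)
print(identical)

# The method form reads left to right when chained with other steps
chained = (siblings
           .merge(eye_color, how='left', on='Name')
           .sort_values('Name')
           .reset_index(drop=True))
print(chained)
```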


## RIGHT JOIN

**A RIGHT JOIN** does the opposite – it **keeps all information from the right table**, but drops rows from the left table whose key does not appear in the right table's key column.

**Notice the Nulls (NaN in Pandas) in the resulting table where keys were missing on the Left!**

<img src="images/right_join.png" width=500>

In [6]:
siblings.merge(eye_color, how='right', on='Name')

Unnamed: 0,Name,# of Siblings,Eye Color
0,Taylor,2.0,Blue
1,Alex,3.0,Brown
2,Morgan,,Brown
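
Notice that `# of Siblings` now shows `2.0` and `3.0` instead of `2` and `3`. The default integer column cannot hold NaN, so Pandas converts it to float once a missing value appears. The sketch below checks the dtypes. The last step, casting to the nullable `Int64` dtype before merging, is one possible workaround I am assuming fits this case, not something from the tutorial.

```python
import pandas as pd

siblings = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Shannon', 'Tracy'],
                         '# of Siblings': [2, 3, 0, 1]})
eye_color = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Morgan'],
                          'Eye Color': ['Blue', 'Brown', 'Brown']})

# Left join: every left row has a sibling count, so the column stays integer
left = siblings.merge(eye_color, how='left', on='Name')
left_dtype = str(left['# of Siblings'].dtype)

# Right join: Morgan has no sibling count, so the column becomes float
right = siblings.merge(eye_color, how='right', on='Name')
right_dtype = str(right['# of Siblings'].dtype)

# Possible workaround: the nullable Int64 dtype can hold missing values
siblings_nullable = siblings.astype({'# of Siblings': 'Int64'})
right_nullable = siblings_nullable.merge(eye_color, how='right', on='Name')
nullable_dtype = str(right_nullable['# of Siblings'].dtype)

print(left_dtype, right_dtype, nullable_dtype)
```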


## INNER JOIN

**An INNER JOIN keeps only the rows whose key values appear in both tables.**

**Notice there are no Nulls now, but rows are missing from both tables!**

<img src="images/inner_join.png" width=500>

In [7]:
siblings.merge(eye_color, how='inner', on='Name')

Unnamed: 0,Name,# of Siblings,Eye Color
0,Taylor,2,Blue
1,Alex,3,Brown
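
Another way to think about the inner join: the surviving names are the set intersection of the two key columns. A small sketch to check that (the names `common` and `no_nulls` are mine):

```python
import pandas as pd

siblings = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Shannon', 'Tracy'],
                         '# of Siblings': [2, 3, 0, 1]})
eye_color = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Morgan'],
                          'Eye Color': ['Blue', 'Brown', 'Brown']})

inner = siblings.merge(eye_color, how='inner', on='Name')

# Names present in both key columns
common = set(siblings['Name']) & set(eye_color['Name'])
print(common == set(inner['Name']))

# Every surviving row matched on both sides, so nothing is missing
no_nulls = not inner.isna().any().any()
print(no_nulls)
```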


## OUTER JOIN

**An OUTER JOIN keeps all information from both tables.**

**Notice there are now Nulls from mismatched keys in both tables!**

<img src="images/outer_join.png" width=500>

In [8]:
siblings.merge(eye_color, how='outer', on='Name')

Unnamed: 0,Name,# of Siblings,Eye Color
0,Taylor,2.0,Blue
1,Alex,3.0,Brown
2,Shannon,0.0,
3,Tracy,1.0,
4,Morgan,,Brown
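
With an outer join it can be hard to tell which table each row came from. `merge()` accepts `indicator=True`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. A minimal sketch follows. The row order of an outer join may differ between Pandas versions, so it looks rows up by name rather than by position. The `origin` dictionary is just for illustration.

```python
import pandas as pd

siblings = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Shannon', 'Tracy'],
                         '# of Siblings': [2, 3, 0, 1]})
eye_color = pd.DataFrame({'Name': ['Taylor', 'Alex', 'Morgan'],
                          'Eye Color': ['Blue', 'Brown', 'Brown']})

# indicator=True adds a '_merge' column recording each row's source
outer = siblings.merge(eye_color, how='outer', on='Name', indicator=True)
print(outer)

# Map each name to where its row came from
origin = dict(zip(outer['Name'], outer['_merge'].astype(str)))
print(origin['Morgan'])
```

Filtering on `_merge` is also a handy way to find the keys that failed to match, for example `outer[outer['_merge'] != 'both']`.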
