# Merges and matching

## Merges

<img align="right" style="padding-left:10px; height: 64%; width: 48%" src="Addenda/figures/sql-joins.png" >

* A number of pandas operations are patterned on the SQL `JOIN`.
* The pandas equivalent looks like `pd.merge(left=df1, right=df2, how='...', on='common_attr')`
    * `how` is `inner` by default, or can be `left`, `right`, or `outer`.
    * If the join column has a different name in each table, one can use `left_on=...` and `right_on=...`.
    * SQL JOIN statements are often paired with a `WHERE` clause. `pd.merge` has no equivalent argument, but the merged result can be filtered with pandas selectors. See "Matching" below.
* See the SQL JOIN page on [Wikipedia](https://en.wikipedia.org/wiki/Join_(SQL)) for reference.
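
As a minimal sketch of the `how`, `left_on`, and `right_on` options listed above, the example below merges two small tables. The tables and their column names (`employees`, `departments`, `dept_id`, `id`) are invented for illustration and do not come from the workbook.

```python
import pandas as pd

# Hypothetical tables, made up for this example.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Alex", "Barb", "Cory"],
    "dept_id": [10, 20, 30],
})
departments = pd.DataFrame({
    "id": [10, 20, 40],
    "dept": ["Sales", "Research", "Support"],
})

# The key column is named differently in each table,
# so left_on/right_on replace on.
inner = pd.merge(left=employees, right=departments, how="inner",
                 left_on="dept_id", right_on="id")
left = pd.merge(left=employees, right=departments, how="left",
                left_on="dept_id", right_on="id")
outer = pd.merge(left=employees, right=departments, how="outer",
                 left_on="dept_id", right_on="id")

# inner keeps only matching keys, left keeps every employee,
# outer keeps every key from both tables.
print(len(inner), len(left), len(outer))  # prints: 2 3 4
```

In the `left` result, Cory has no matching department, so the `dept` column holds `NaN` for that row.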

## Matching

<img align="right" style="padding-left:10px; height: 50%; width: 50%" src="Addenda/figures/pandas-loc-iloc.png" >

Pandas provides two idioms for matching rows and columns, shown alongside:

* Selecting data by label or by a conditional statement (`.loc`). We studied `.loc` in the 03-04-dataframes-in-pandas notebook.
* Selecting data by integer position (`.iloc`). `.iloc` works like `.loc`, except that it takes row and column numbers instead of labels.
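
The difference between the two selectors can be sketched on a small frame. The frame, its index labels, and its column names below are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"name": ["Alex", "Barb", "Cory"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

# .loc selects by label: row "b", column "age".
by_label = df.loc["b", "age"]

# .loc also accepts a boolean condition for the rows.
by_condition = df.loc[df["age"] > 30, "name"]

# .iloc selects by position: second row, second column.
by_position = df.iloc[1, 1]

print(by_label, by_position)    # prints: 25 25
print(by_condition.tolist())    # prints: ['Alex', 'Cory']
```

Both selectors use square brackets, not parentheses, because they are indexers rather than methods.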

The data manipulation workbook is based on the [merge section of the getting-started guide](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#merge) in the pandas documentation. Following up on the comment about the `WHERE` clause above, the idiom for incorporating one uses `.loc` (or `.iloc`) and looks like:

```
result = pd.merge(left=df1, right=df2, how='...', on='common_attr')
result = result.loc[<row_sel>, <column_sel>]
```
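
A concrete instance of this idiom might look like the sketch below. The `orders` and `customers` tables and the threshold of 25 are hypothetical; the filter plays roughly the role of `WHERE amount > 25` in SQL.

```python
import pandas as pd

# Hypothetical tables, made up for this example.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [50, 20, 75, 10],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Alex", "Barb", "Cory"],
})

# Step 1: the join itself.
result = pd.merge(left=orders, right=customers, how="inner", on="customer_id")

# Step 2: the row selector acts as the WHERE clause,
# the column selector as the SELECT list.
result = result.loc[result["amount"] > 25, ["name", "amount"]]

print(result["name"].tolist())    # prints: ['Alex', 'Barb']
print(result["amount"].tolist())  # prints: [50, 75]
```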


## Self Join

<img align="right" style="padding-left:10px; height: 50%; width: 50%" src="Addenda/figures/self-join.gif" >

The notion of self-joins is sometimes confusing, even though a self-join is just an ordinary join in which a table is joined with itself. Suppose a table `name` contains:

```
fname
------
Alex
Barb
Cory
```
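Then `SELECT a.*, b.* FROM name a JOIN name b;` pairs every row with every row (some databases require `CROSS JOIN` when there is no `ON` clause) and yields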

```
a.fname | b.fname
--------+--------
Alex    | Alex
Alex    | Barb
Alex    | Cory
Barb    | Alex
Barb    | Barb
Barb    | Cory
Cory    | Alex
Cory    | Barb
Cory    | Cory
```

To remove self-pairs and mirrored duplicates (such as `Barb | Alex` alongside `Alex | Barb`), the usual idiom is: `SELECT a.*, b.* FROM name a JOIN name b WHERE a.fname < b.fname;`
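
The same self-join can be sketched in pandas. This version assumes pandas 1.2 or later, which added `how="cross"` to `pd.merge`; the suffixes `_a` and `_b` are chosen here to mirror the SQL aliases.

```python
import pandas as pd

name = pd.DataFrame({"fname": ["Alex", "Barb", "Cory"]})

# Cross-merge the table with itself: every row paired with every row.
# The shared column name is disambiguated with suffixes.
pairs = pd.merge(left=name, right=name, how="cross", suffixes=("_a", "_b"))

# Equivalent of WHERE a.fname < b.fname: drops self-pairs
# and keeps one copy of each mirrored pair.
unique_pairs = pairs.loc[pairs["fname_a"] < pairs["fname_b"]]

print(len(pairs), len(unique_pairs))  # prints: 9 3
```

The three remaining rows are Alex/Barb, Alex/Cory, and Barb/Cory.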

The figure alongside depicts a more general case of self-joins.