## Inner Join

Inner joins return only rows with matching values in both tables.

To merge, find a column that both tables share and merge on it.

```<MERGED TABLE> = <TABLE 1>.merge(<TABLE2>, on='<SIMILAR COL>', suffixes=('_<SUFFIX 1>', '_<SUFFIX 2>'))```

*Note: the ```on``` argument also accepts a list of columns.*

```import pandas as pd```
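A minimal sketch of the inner join template above. The `wards` and `census` tables and their columns are made-up sample data, not from any real dataset.

```python
import pandas as pd

# Hypothetical sample tables; names and values are assumptions for illustration.
wards = pd.DataFrame({"ward": [1, 2, 3], "address": ["A St", "B St", "C St"]})
census = pd.DataFrame(
    {"ward": [2, 3, 4], "address": ["X Ave", "Y Ave", "Z Ave"], "pop_2010": [500, 600, 700]}
)

# Inner join on 'ward': only wards 2 and 3 exist in both tables.
# Both tables have an 'address' column, so the suffixes tell them apart.
wards_census = wards.merge(census, on="ward", suffixes=("_ward", "_cen"))
print(wards_census)
```

Ward 1 and ward 4 are dropped because each appears in only one of the two tables.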


## One-to-one Relationship
Every row in the left table is related to only one row in the right table.

e.g. Let's say we have a ward table and a census table.  It makes sense that there is only one population that corresponds to each ward.

## One-to-many Relationship
Every row in the left table is related to one or more rows in the right table.

e.g. Let's say we have a ward table and a businesses table.  Each ward may have multiple businesses.
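A small sketch of what a one-to-many merge does to the row count. The tables below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample tables: ward 1 has two businesses, ward 2 has one.
wards = pd.DataFrame({"ward": [1, 2], "alderman": ["Ann", "Bob"]})
businesses = pd.DataFrame({"ward": [1, 1, 2], "business": ["Cafe", "Gym", "Bakery"]})

# Each ward row is repeated once per matching business row.
ward_businesses = wards.merge(businesses, on="ward")
print(ward_businesses.shape)
```

The merged table has more rows than the left table because the left row for ward 1 is repeated for each of its businesses.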

## Left Join

A left join returns all rows from the left table and only those rows from the right table where the key columns match.

By default, `merge` does an inner join.  To do a left join, we specify the ```how='left'``` argument.

```<MERGED TABLE> = <TABLE 1>.merge(<TABLE2>, on='<SIMILAR COL>', suffixes=('_<SUFFIX 1>', '_<SUFFIX 2>'), how='left')```
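A minimal left join sketch, assuming two made-up tables `movies` and `taglines` where one movie has no tagline.

```python
import pandas as pd

# Hypothetical sample tables; movie 2 has no tagline.
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
taglines = pd.DataFrame({"id": [1, 3], "tagline": ["Tag A", "Tag C"]})

# how='left' keeps every row of movies; unmatched rows get NaN for 'tagline'.
movies_taglines = movies.merge(taglines, on="id", how="left")
print(movies_taglines)
```

The result has the same number of rows as `movies` here, with a missing `tagline` for the unmatched movie.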

## Right Join

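The mirror image of a left join: returns all rows from the right table and only those rows from the left table where the key columns match.  Use argument ```how='right'```.

If the key columns are named differently in the two tables but hold the same thing, use the ```left_on``` and ```right_on``` arguments instead of ```on```.

```<MERGED TABLE> = <TABLE 1>.merge(<TABLE2>, left_on='<LEFT COL NAME>', right_on='<RIGHT COL NAME>')```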
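A short sketch of a right join on differently named key columns. The `movies` and `genres` tables are invented sample data.

```python
import pandas as pd

# Hypothetical sample tables; the key is 'id' on the left and 'movie_id' on the right.
movies = pd.DataFrame({"id": [1, 2], "title": ["A", "B"]})
genres = pd.DataFrame({"movie_id": [2, 3], "genre": ["Drama", "Comedy"]})

# how='right' keeps every row of genres; movie_id 3 has no match in movies.
movies_genres = movies.merge(genres, how="right", left_on="id", right_on="movie_id")
print(movies_genres)
```

Both key columns (`id` and `movie_id`) appear in the result, and the unmatched right row has missing values in the columns that come from `movies`.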

## Outer Join

Will return all rows from both tables, whether or not there is a match.

Use argument ```how='outer'```.
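A minimal outer join sketch using two made-up genre tables that overlap on one movie.

```python
import pandas as pd

# Hypothetical sample tables; only movie_id 2 appears in both.
family = pd.DataFrame({"movie_id": [1, 2], "genre": ["Family", "Family"]})
comedy = pd.DataFrame({"movie_id": [2, 3], "genre": ["Comedy", "Comedy"]})

# how='outer' keeps rows from both tables; unmatched sides are filled with NaN.
family_comedy = family.merge(
    comedy, on="movie_id", how="outer", suffixes=("_fam", "_com")
)
print(family_comedy)
```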

## Self Join
Any left/right/inner join on the same table.

e.g. The movie sequels table example.  Doing inner join shows only movies with sequels.  Left join will show all movies.
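A sketch of the sequels idea with a tiny invented table. The string ids and the `sequel` column layout are assumptions for illustration, not the original dataset.

```python
import pandas as pd

# Hypothetical table: 'sequel' holds the id of the follow-up movie, if any.
sequels = pd.DataFrame(
    {
        "id": ["m1", "m2", "m3"],
        "title": ["Toy Story", "Toy Story 2", "Up"],
        "sequel": ["m2", None, None],
    }
)

# Inner self join: match each movie's 'sequel' to another row's 'id'.
with_sequels = sequels.merge(
    sequels, left_on="sequel", right_on="id", suffixes=("_org", "_seq")
)

# Left self join: keep every movie, with NaN where there is no sequel.
all_movies = sequels.merge(
    sequels, left_on="sequel", right_on="id", how="left", suffixes=("_org", "_seq")
)
print(with_sequels[["title_org", "title_seq"]])
```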

## Merging on Indexes
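Virtually the same.  If the index has the same name in both tables, pass that name to `on`; if the index names differ, pass them to `left_on` and `right_on`.  To merge on the indexes themselves, set ```left_index=True``` and `right_index=True`.  *Note: recent pandas versions raise an error when `left_on` is combined with `left_index=True` (or `right_on` with `right_index=True`), so use one approach or the other.*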
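A small sketch of two ways to merge on indexes that should work in recent pandas versions. The tables and the index name `id` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample tables, both indexed by 'id'.
movies = pd.DataFrame({"title": ["A", "B"]}, index=pd.Index([1, 2], name="id"))
taglines = pd.DataFrame(
    {"tagline": ["Tag A", "Tag B"]}, index=pd.Index([1, 2], name="id")
)

# Option 1: refer to the shared index name with 'on'.
by_name = movies.merge(taglines, on="id")

# Option 2: merge on the indexes directly.
by_index = movies.merge(taglines, left_index=True, right_index=True)
print(by_index)
```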

## Filtering Joins
Filter observations from table based on whether or not they match an observation in another table.

## Semi join
Returns the intersection similar to an inner join, but returns columns from the left table only.  No duplicates are returned.

If Table A is the left table and Table B is the right table, and we want to do a semi-join:
1. Merge Tables A and B: `<TAB_A-B> = <TAB_A>.merge(<TAB_B>, on='<COL>')`
2. Filter: `<TAB_A>[<TAB_A>['<COL>'].isin(<TAB_A-B>['<COL>'])]`
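The two semi-join steps above can be sketched as follows, assuming made-up `genres` and `top_tracks` tables keyed on `gid`.

```python
import pandas as pd

# Hypothetical sample tables; genre 1 has two top tracks, genre 2 has none.
genres = pd.DataFrame({"gid": [1, 2, 3], "name": ["Rock", "Jazz", "Pop"]})
top_tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 3]})

# Step 1: inner merge to find which keys appear in both tables.
genres_tracks = genres.merge(top_tracks, on="gid")

# Step 2: filter the left table by those keys. isin() keeps each left row
# at most once, so the duplicate match for genre 1 does not repeat it.
top_genres = genres[genres["gid"].isin(genres_tracks["gid"])]
print(top_genres)
```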

## Anti join
Return the left table, excluding the intersection. Returns columns only from left table.

If Table A is the left table and Table B is the right table, and we want to do an anti-join:
1. Left Join Tables A and B with `indicator=True`: `<TAB_A-B> = <TAB_A>.merge(<TAB_B>, on='<COL>', how='left', indicator=True)`
2. Filtered List: `<NOT_IN_B> = <TAB_A-B>.loc[<TAB_A-B>['_merge'] == 'left_only', '<COL>']`
3. Selecting from Filtered List: `<TAB_A>[<TAB_A>['<COL>'].isin(<NOT_IN_B>)]`
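The three anti-join steps above can be sketched as follows, again with invented `genres` and `top_tracks` tables.

```python
import pandas as pd

# Hypothetical sample tables; genre 2 has no top tracks.
genres = pd.DataFrame({"gid": [1, 2, 3], "name": ["Rock", "Jazz", "Pop"]})
top_tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 3]})

# Step 1: left join with indicator=True adds a '_merge' column
# that records where each row came from.
genres_tracks = genres.merge(top_tracks, on="gid", how="left", indicator=True)

# Step 2: collect the keys that were found only in the left table.
gid_list = genres_tracks.loc[genres_tracks["_merge"] == "left_only", "gid"]

# Step 3: select those rows from the original left table.
non_top_genres = genres[genres["gid"].isin(gid_list)]
print(non_top_genres)
```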

## Concatenating Tables Vertically

* If all tables have same column names, use `pd.concat([<LIST OF TABLES TO CONCATENATE>])`
* Can use `ignore_index=True` to set default index from 0 to n-1.
* When using `ignore_index=False`, we can specify argument `keys=[<AN ADDITIONAL LABEL FOR INDEX>]`
* By default, concatenation is an outer join.  If we wish to specify, we can use argument `join='inner'`
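A brief sketch of the `pd.concat` options listed above. The invoice tables and their columns are made-up sample data.

```python
import pandas as pd

# Hypothetical sample tables; inv_feb has one extra column.
inv_jan = pd.DataFrame({"iid": [1, 2], "total": [5.0, 7.5]})
inv_feb = pd.DataFrame({"iid": [3, 4], "total": [2.0, 9.0], "bill_ctry": ["US", "CA"]})

# Default outer join on columns, with a fresh 0..n-1 index.
stacked = pd.concat([inv_jan, inv_feb], ignore_index=True)

# Keep the original indexes and add an outer label per source table.
labeled = pd.concat([inv_jan, inv_feb], keys=["jan", "feb"])

# join='inner' keeps only the columns shared by all tables.
shared = pd.concat([inv_jan, inv_feb], join="inner", ignore_index=True)
print(stacked)
```

With the default outer join, `bill_ctry` is missing for the January rows; with `join='inner'` that column is dropped.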

## Verifying Integrity of Data
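When calling `.merge()` we can specify a `validate` argument, e.g. `.merge(validate='one_to_one')`.  The default, `validate=None`, performs no check.  If the tables do not have the stated relationship, the merge raises a `MergeError`.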

Possible options for validation include:
* `'one_to_one'`
* `'one_to_many'`
* `'many_to_one'`
* `'many_to_many'`

When calling `pd.concat()` we can specify a `verify_integrity` argument as follows: `pd.concat([<LIST OF TABLES TO CONCATENATE>], verify_integrity=True)`

The default value of `verify_integrity` is `False`.  When set to `True`, `pd.concat()` checks whether the concatenated index contains duplicates and raises a `ValueError` if it does.
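A minimal sketch of both checks failing on purpose. The tables are invented; `merge_ok` and `concat_ok` are hypothetical helper flags used only to show the outcome.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical sample tables; 'tid' 2 is duplicated in specs.
tracks = pd.DataFrame({"tid": [1, 2], "name": ["Song A", "Song B"]})
specs = pd.DataFrame({"tid": [1, 2, 2], "mtype": ["mp3", "aac", "wav"]})

try:
    tracks.merge(specs, on="tid", validate="one_to_one")
    merge_ok = True
except MergeError as err:
    # The right table has duplicate keys, so this is not one-to-one.
    merge_ok = False
    print("merge check failed:", err)

# Two tables that both use index label 0.
a = pd.DataFrame({"v": [1]}, index=[0])
b = pd.DataFrame({"v": [2]}, index=[0])

try:
    pd.concat([a, b], verify_integrity=True)
    concat_ok = True
except ValueError as err:
    # The concatenated index would contain a duplicate label.
    concat_ok = False
    print("concat check failed:", err)
```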

## merge_ordered()
The `merge_ordered()` function defaults to an outer join and returns the result sorted by the key.  Can use arguments `on`, `how`, and `suffixes`.

e.g.: `pd.merge_ordered(<TAB1>,<TAB2>)`

Missing values can be forward filled (each gap takes the previous row's value) by passing `fill_method='ffill'`.  Gaps in the first rows stay missing, since there is no earlier value to carry forward.
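A short sketch of `merge_ordered()` with forward filling. The `gdp` and `sp500` tables and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample tables; gdp has no row for February.
gdp = pd.DataFrame(
    {"date": pd.to_datetime(["2019-01-01", "2019-04-01"]), "gdp": [100.0, 102.0]}
)
sp500 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-04-01"]),
        "returns": [1.0, 2.0, 3.0],
    }
)

# Outer join ordered by 'date'; the February gdp gap is filled
# with the January value.
gdp_sp500 = pd.merge_ordered(gdp, sp500, on="date", fill_method="ffill")
print(gdp_sp500)
```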

## merge_asof()
Similar to an ordered left join, except that it matches on the nearest key rather than requiring exact matches.

*Note: Merged 'on' columns must be sorted.*

By default, `direction='backward'`, which takes the closest key that is less than or equal to the value in question.  We can change this to greater than or equal to by setting `direction='forward'`.  Another option is `direction='nearest'`, which takes the closest key in either direction.
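A sketch comparing the three `direction` settings. The `trades` and `quotes` tables are invented, with timestamps chosen so that no trade is equally close to two quotes.

```python
import pandas as pd

# Hypothetical sample tables, both already sorted by 'time'.
trades = pd.DataFrame(
    {
        "time": pd.to_datetime(["2019-01-01 09:00:04", "2019-01-01 09:00:13"]),
        "qty": [10, 20],
    }
)
quotes = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2019-01-01 09:00:00", "2019-01-01 09:00:10", "2019-01-01 09:00:15"]
        ),
        "price": [100.0, 101.0, 102.0],
    }
)

# Default: last quote at or before each trade.
backward = pd.merge_asof(trades, quotes, on="time")
# First quote at or after each trade.
forward = pd.merge_asof(trades, quotes, on="time", direction="forward")
# Closest quote in either direction.
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")
print(backward, forward, nearest, sep="\n")
```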

## Selecting Data with the .query() method
Very much like SQL `where` clause.  Can query on multiple conditions with `and` and `or`.

Some examples:
* `stocks.query('nike >= 90')`
* `stocks.query('nike > 90 and disney < 140')`
* `stocks.query('date == "2019-04-01"')`
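The examples above can be run against a small made-up `stocks` table. The values below are invented, and `date` is stored as plain strings here for simplicity.

```python
import pandas as pd

# Hypothetical sample table; prices are not real data.
stocks = pd.DataFrame(
    {
        "date": ["2019-03-01", "2019-04-01", "2019-05-01"],
        "nike": [85, 92, 95],
        "disney": [130, 135, 145],
    }
)

# Single condition.
high_nike = stocks.query("nike >= 90")
# Two conditions combined with 'and'.
both = stocks.query("nike > 90 and disney < 140")
# String comparison: inner double quotes inside a single-quoted query string.
on_date = stocks.query('date == "2019-04-01"')
print(both)
```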

## Melt Method
Unpivots a table from wide to long format.

The **wide** data format indicates that every row relates to one subject.  The columns correspond to different attributes of each row-wise subject.

The **long** or **tall** data format occurs when data about a subject can be found over multiple rows and each row corresponds to an attribute about the subject.

Example:
* `social_fin.melt(id_vars=['financial','company'])` where 'financial' and 'company' are the columns we want to keep as columns and all other columns are "melted"
* `social_fin.melt(id_vars=['financial','company'], value_vars=['2018','2017'])` in this case we specify which columns we want to "melt"

The `var_name` and `value_name` arguments let us name the variable and value columns in the melted table.
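A minimal sketch of `melt` with all four arguments. The `social_fin` table below uses invented numbers, and the column names `year` and `dollars` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per company, one column per year.
social_fin = pd.DataFrame(
    {
        "financial": ["total_revenue", "total_revenue"],
        "company": ["twitter", "facebook"],
        "2019": [30, 60],
        "2018": [20, 50],
        "2017": [10, 40],
    }
)

# Keep 'financial' and 'company' as columns, melt only 2018 and 2017,
# and name the two new columns.
long_fin = social_fin.melt(
    id_vars=["financial", "company"],
    value_vars=["2018", "2017"],
    var_name="year",
    value_name="dollars",
)
print(long_fin)
```

The `2019` column is dropped because it is listed in neither `id_vars` nor `value_vars`.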
