In [None]:
import pandas as pd
import numpy as np

# Create dataframes for examples
left_dataframe = pd.DataFrame({"ID": [1,2,3,4], "left_side": "LEFT"})
right_dataframe = pd.DataFrame({"ID": [3,4,5,6], "right_side": "RIGHT"})


# Merge
---
*Merging with Pandas works pretty much the same as SQL. So if you have SQL experience and you only need to know the Python syntax, you can scroll through this.*
<br><br>

When you're working on a data science project, you'll often start from multiple data tables that have various bits of information that you would like to see all in one place. Unfortunately, simply copying and pasting rows of data onto each other just isn't the way it works. Merge is an important tool to have because it allows you to efficiently combine data tables together in a nice and orderly fashion. In this tutorial we're starting from multiple data tables in the form of CSV files. So take a moment to look at the data we'll be using. 
<br><br>


# Understanding Merge

Let's begin with a basic line of code that merges two dataframes into one. With this command we're collecting all the rows of data from both dataframes and lining them up based on values in "some_column" that are common to both dataframes.

```python
pd.merge(left_dataframe, right_dataframe, on="some_column", how="left|right|inner|outer")
```
- A **LEFT** dataframe is whichever one you type first.
- A **RIGHT** dataframe is whichever one you type second.
- **"on"** is the column or list of columns that determine which rows from one table match to which rows in the second table. Sometimes the columns you want to merge on have different names in the dataframes. For example, maybe one dataframe calls it RecordID while the other dataframe calls it RowID even though they are really the same ID. In those cases you can specify the column names separately for each dataframe using the "left_on" and "right_on" arguments. 
- **"how"** is the merge method to use. By default, Pandas uses the "inner" method.

Read the [Merge documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) for more details on arguments.
<br><br>
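
As a rough sketch of the "left_on" and "right_on" case described above, the example below merges two small made-up tables. The dataframe names, the `name` and `score` columns, and all of the values are hypothetical and only there to illustrate the RecordID/RowID situation.

```python
import pandas as pd

# Hypothetical tables: the same ID is called RecordID on one side and RowID on the other
records = pd.DataFrame({"RecordID": [1, 2, 3], "name": ["a", "b", "c"]})
rows = pd.DataFrame({"RowID": [2, 3, 4], "score": [10, 20, 30]})

# Name the key column separately for each side
merged = pd.merge(records, rows, left_on="RecordID", right_on="RowID", how="inner")
print(merged)
```

Both key columns are kept in the result, so you may want to drop one of them afterwards.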



Let's talk about the four merge methods: `Left, Right, Inner, Outer`

We have four methods because when you merge dataframes, you're not always going to have a one-to-one match between the rows. These four methods affect how Pandas deals with unmatched data, and that's what this kernel is going to cover.

To illustrate an example of what I mean by "merging on" and "matching", here are a couple of dataframes that I made. The circled values are the ones that would match if we merged the dataframes on the "ID" column. 


Left Side | Right Side
:-------------------------:|:--------------------:
<img src="https://imgur.com/bKOhL4h.jpg" alt="LEFT"> |  <img src="https://imgur.com/zp9JmWL.jpg"  alt='RIGHT'>
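
If you'd like to check the matches without the pictures, one possible way is to ask Pandas which IDs on the left also appear on the right. This is only a small sketch that rebuilds the two example dataframes from the top of this tutorial; `matching_ids` and `unmatched_left_ids` are names made up for illustration.

```python
import pandas as pd

# Same example dataframes as at the top of the tutorial
left_dataframe = pd.DataFrame({"ID": [1, 2, 3, 4], "left_side": "LEFT"})
right_dataframe = pd.DataFrame({"ID": [3, 4, 5, 6], "right_side": "RIGHT"})

# True for every LEFT row whose ID also exists on the RIGHT
is_match = left_dataframe["ID"].isin(right_dataframe["ID"])

matching_ids = left_dataframe.loc[is_match, "ID"].tolist()
unmatched_left_ids = left_dataframe.loc[~is_match, "ID"].tolist()
print(matching_ids, unmatched_left_ids)
```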



## 1. Left Merge
---
In a Left Merge we are mostly concerned with data on the LEFT side, but we would like to add data from the RIGHT side if it has some of the same IDs.

To do that, we chop up the rows in the RIGHT dataframe and glue pieces of it onto the LEFT dataframe. Remember, we mostly care about the LEFT side and only want data from the RIGHT side if it has any of the same IDs. So if something in the RIGHT dataframe doesn't match or doesn't exist, then we have to do things to keep the columns the same length. We do that by either adding NaN to fill the void or discarding some RIGHT rows entirely.

In this example, the LEFT side has IDs 1, 2, 3, and 4:
- The RIGHT side does not have IDs 1 or 2, so we add NaNs because we need the columns to be the same length
- The RIGHT side does have data for IDs 3 and 4, so we add it as a new column
- The LEFT side does not have IDs 5 or 6, so we don't need that info from the RIGHT and it's discarded


Left Side |  | Right Side
:-------------------------:|:-------------------------:|:--------------------:
<img src="https://imgur.com/bKOhL4h.jpg" alt="LEFT"> |   |  <img src="https://imgur.com/zp9JmWL.jpg"  alt='RIGHT'>
| Left Merged | 
| <img src="https://imgur.com/hGg40Po.jpg" alt="left join"> |

In [None]:
# Left merge on "ID" column 
pd.merge(left=left_dataframe, right=right_dataframe, on="ID", how="left")

## 2. Right Merge
---
Right Merges work just like Left Merges. The difference is that we mostly care about the RIGHT side and would like to add data from the LEFT if it has matching IDs.



Left Side |  | Right Side
:-------------------------:|:-------------------------:|:--------------------:
<img src="https://imgur.com/bKOhL4h.jpg" alt="LEFT"> |   |  <img src="https://imgur.com/zp9JmWL.jpg"  alt='RIGHT'>
| Right Merged | 
| <img src="https://imgur.com/GDfbxmT.jpg" alt="Right Merge"> |


In [None]:
# Code for a right merge
pd.merge(left=left_dataframe, right=right_dataframe, on="ID", how="right")

## 3. Inner Merge
With an Inner Merge, we chop up both dataframes and only glue the stuff that matches. If an ID isn't in both dataframes, we don't keep it and we don't add NaNs.

Left Side |  | Right Side
:-------------------------:|:-------------------------:|:--------------------:
<img src="https://imgur.com/bKOhL4h.jpg" alt="LEFT"> |   |  <img src="https://imgur.com/zp9JmWL.jpg"  alt='RIGHT'>
| Inner Merged | 
| <img src="https://imgur.com/juL2H0R.jpg" alt="Inner Merge"> |


In [None]:
# Inner merge on ID column
pd.merge(left=left_dataframe, right=right_dataframe, on="ID", how="inner")

## 4. Outer Merge
With an Outer Merge, we chop up both dataframes and keep everything from both sides. Then we toss in NaNs to fill any blanks.

Left Side |  | Right Side
:-------------------------:|:-------------------------:|:--------------------:
<img src="https://imgur.com/bKOhL4h.jpg" alt="LEFT"> |   |  <img src="https://imgur.com/zp9JmWL.jpg"  alt='RIGHT'>
|   Outer Merged |   
| <img src="https://imgur.com/NZqgFf1.jpg" alt="Outer Merge"> |


In [None]:
# Outer merge on ID column
pd.merge(left=left_dataframe, right=right_dataframe, on="ID", how="outer")
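
One way to see where each row of an outer merge came from is the `indicator` argument of `pd.merge`, which adds a `_merge` column to the result. The sketch below rebuilds the example dataframes so it can run on its own; `outer_with_source` is just an illustrative name.

```python
import pandas as pd

# Same example dataframes as at the top of the tutorial
left_dataframe = pd.DataFrame({"ID": [1, 2, 3, 4], "left_side": "LEFT"})
right_dataframe = pd.DataFrame({"ID": [3, 4, 5, 6], "right_side": "RIGHT"})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both" for every row
outer_with_source = pd.merge(
    left=left_dataframe,
    right=right_dataframe,
    on="ID",
    how="outer",
    indicator=True,
).sort_values("ID")
print(outer_with_source)
```

Rows marked `left_only` or `right_only` are the ones where Pandas had to fill in NaN for the other side.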

# Example

In these examples, we're merging actual restaurant ratings with parking lot info. I've highlighted the placeID columns because that's what we are merging on. You might want to fork this kernel and explore the final dataframes to see the differences when using the different merge methods. 
<br><br>

Ratings |  Parking
:-------------------------:|:--------------------:
<img src="https://imgur.com/EX0MAr6.jpg" width=400 alt="LEFT">  | <img src="https://imgur.com/ZWwEZNR.jpg" width=180  alt='RIGHT'>

In [None]:
# Load restaurant ratings and parking data into dataframes
ratings = pd.read_csv("../input/rating_final.csv")
parking = pd.read_csv("../input/chefmozparking.csv")

# Left merge: Keep everything from the left, drop things from the right (if they don't match)
left_merge = pd.merge(left=ratings, right=parking, on="placeID", how="left")
left_merge.head()
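
A quick way to compare the merge methods is to count the rows each one returns. The sketch below does this on the small example dataframes from the top of the tutorial so that it runs without the CSV files; the same loop should work with `ratings` and `parking` if you swap in those dataframes and merge on "placeID". The name `row_counts` is made up for illustration.

```python
import pandas as pd

# Same example dataframes as at the top of the tutorial
left_dataframe = pd.DataFrame({"ID": [1, 2, 3, 4], "left_side": "LEFT"})
right_dataframe = pd.DataFrame({"ID": [3, 4, 5, 6], "right_side": "RIGHT"})

# Run each merge method and record how many rows come back
row_counts = {}
for method in ["left", "right", "inner", "outer"]:
    merged = pd.merge(left=left_dataframe, right=right_dataframe, on="ID", how=method)
    row_counts[method] = len(merged)

print(row_counts)
```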

# Conclusion

In this tutorial you've seen how to merge dataframes together using the four methods, `Left, Right, Inner, and Outer`. These are the basic merge methods you can use to merge data based on common values in some column. There are also different ways to merge "on." For example, you can merge "on" the indexes. Or, as mentioned at the beginning, sometimes the columns you want to merge on have different names, so you can specify the column names separately for each dataframe using the "left_on" and "right_on" arguments. This tutorial hasn't covered those in depth, so take a look at the [Merge documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) to see the other ways you can merge data with Pandas.
<br><br>
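
For a taste of merging on the indexes, here is a minimal sketch that assumes you first move the "ID" column into the index of each example dataframe. The names `left_indexed`, `right_indexed`, and `merged_on_index` are only illustrative.

```python
import pandas as pd

# Same example dataframes as at the top of the tutorial
left_dataframe = pd.DataFrame({"ID": [1, 2, 3, 4], "left_side": "LEFT"})
right_dataframe = pd.DataFrame({"ID": [3, 4, 5, 6], "right_side": "RIGHT"})

# Move "ID" into the index so there is no key column left to merge on
left_indexed = left_dataframe.set_index("ID")
right_indexed = right_dataframe.set_index("ID")

# left_index/right_index tell Pandas to match rows by index instead of by column
merged_on_index = pd.merge(
    left_indexed, right_indexed, left_index=True, right_index=True, how="inner"
)
print(merged_on_index)
```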

If you're feeling confident with Merge, then <a href="https://www.kaggle.com/crawford/python-groupby-tutorial" title="Click here to continue to the Groupby Tutorial">click here</a> to continue to the Python Groupby Tutorial.