In [1]:
%pip install numpy pandas sqlalchemy



In [2]:
import numpy as np
import pandas as pd

# Reorganizing Data in DataFrames: Merging (a.k.a. "Joining") Columns

## Merges / Joins
The `pd.merge()` function and the `DataFrame.join()` method combine two DataFrames into a single, **wider** one by matching rows that have the same values in a specified column.

For example, it can turn this `df1` DataFrame:

| Day | Weather |
| :-: | :---:   |
| Monday | Sunny   |
| Tuesday | Rainy |

and this `df2` DataFrame:

| Day | Temperature |
| :-: | :---:   |
| Tuesday | 12   |
| Monday | 18 |

into this:

| Day | Weather | Temperature |
| :-: | :---:   | :---: |
| Monday | Sunny   | 18 |
| Tuesday | Rainy | 12 |

with one line of code:

```python
df_merged = pd.merge(left=df1, right=df2, left_on="Day", right_on="Day")
```

Just specify which columns should be matched with each other, and `pd.merge()` finds the matching values automatically, regardless of row order. To match on the index instead of a column, pass `left_index=True` and/or `right_index=True` in place of `left_on` and/or `right_on`.
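To make the two matching styles concrete, here is a minimal sketch that rebuilds the weather tables from above and merges them once on the `Day` column and once on the index. The variable names `merged_on_column` and `merged_on_index` are chosen here for illustration only.

```python
import pandas as pd

# The two example tables from above; note that df2 lists the days in a different order.
df1 = pd.DataFrame({"Day": ["Monday", "Tuesday"], "Weather": ["Sunny", "Rainy"]})
df2 = pd.DataFrame({"Day": ["Tuesday", "Monday"], "Temperature": [12, 18]})

# Match rows on the "Day" column of both DataFrames.
merged_on_column = pd.merge(left=df1, right=df2, left_on="Day", right_on="Day")
print(merged_on_column)

# Same match, but with "Day" moved into the index of both DataFrames first.
merged_on_index = pd.merge(
    left=df1.set_index("Day"),
    right=df2.set_index("Day"),
    left_index=True,
    right_index=True,
)
print(merged_on_index)
```

In the index-based version, `Day` ends up as the index of the result rather than as a regular column.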

### Exercises

Let's practice merging dataframes with the `pd.merge()` function.

Dataframe 1:

In [None]:
df1 = pd.DataFrame({'Name': ['Paul', 'Arash', 'Jenny'], 'Age': [16, 19, 17]})
df1

|   | Name | Age |
| :-: | :-: | :---: |
| 0 | Paul | 16 |
| 1 | Arash | 19 |
| 2 | Jenny | 17 |


Dataframe 2:

In [None]:
df2 = pd.DataFrame({'Name': ['Arash', 'Paul', 'Sara'], 'Weight': [32, 15, 37]})
df2

|   | Name | Weight |
| :-: | :-: | :---: |
| 0 | Arash | 32 |
| 1 | Paul | 15 |
| 2 | Sara | 37 |


Dataframe 3:

In [None]:
df3 = pd.DataFrame({'Name': ['Amy', 'Paul', 'Sara'], 'Height': [170, 190, 143]})
df3

|   | Name | Height |
| :-: | :-: | :---: |
| 0 | Amy | 170 |
| 1 | Paul | 190 |
| 2 | Sara | 143 |


Merge the first two dataframes together.  Who do we know both the age and weight of?

Who do we know both the weight and height of?

Try merging all three dataframes by calling `pd.merge()` twice.  Who do we know everything about?

Note that the names that weren't present in both dataframes dropped out of the final result, because `pd.merge()` performs an inner join by default (`how="inner"`).  If you'd like to keep them and have `NaN` fill in the missing values, change the `how` parameter of `pd.merge()`.  Let's try out a few options by merging dataframes 1 and 2:

`how="outer"`

`how="left"`

`how="right"`

`how="inner"`
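As a reference for what each option does, the sketch below runs all four on a separate pair of tables so the exercise itself stays open. The `weather` and `temperature` tables and the `row_counts` dictionary are hypothetical, made up for this illustration.

```python
import pandas as pd

# Hypothetical data: each table contains one day that the other lacks.
weather = pd.DataFrame({"Day": ["Monday", "Tuesday"], "Weather": ["Sunny", "Rainy"]})
temperature = pd.DataFrame({"Day": ["Tuesday", "Wednesday"], "Temperature": [12, 15]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left=weather, right=temperature, on="Day", how=how)
    row_counts[how] = len(merged)  # how many rows survive this kind of join
    print(f"how={how!r}")
    print(merged, end="\n\n")
```

Here `"inner"` keeps only days found in both tables, `"left"` and `"right"` keep every day of the respective table, and `"outer"` keeps every day from either one, with `NaN` wherever a value is missing.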

Recognizing that multiple inner joins can result in high data attrition, what policies would you put in place in your future data analyses to both prevent data loss and keep the data easy to analyze?
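One possible ingredient of such a policy, offered only as a sketch and not as the expected answer, is to audit a merge before committing to it. The example uses the `indicator` and `validate` parameters of `pd.merge()` on hypothetical tables; the names `observations`, `readings`, `audit`, and `counts` are made up for this illustration.

```python
import pandas as pd

# Hypothetical data: each table contains one day that the other lacks.
observations = pd.DataFrame({"Day": ["Monday", "Tuesday"], "Weather": ["Sunny", "Rainy"]})
readings = pd.DataFrame({"Day": ["Tuesday", "Wednesday"], "Temperature": [12, 15]})

# An outer merge keeps every row; indicator=True adds a "_merge" column that
# records where each row came from, and validate raises an error if "Day"
# is duplicated in either table.
audit = pd.merge(
    observations,
    readings,
    on="Day",
    how="outer",
    indicator=True,
    validate="one_to_one",
)

# Count how many rows matched and how many an inner join would have dropped.
counts = audit["_merge"].value_counts()
print(counts)
```

Rows labeled `left_only` or `right_only` are exactly the ones an inner join would discard, so checking these counts first shows how much data is at stake.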