# Combining Datasets

With Pandas, we can merge, join, and concatenate datasets.

Performing these operations can help us better understand and analyze data by bringing related records together in one place.

We'll focus on the `.merge()` method and touch on `.join()` and `pd.concat()`.

## The Dataset


In [20]:
## Begin Example
import pandas as pd

In [21]:
## Begin Example
df1 = pd.read_csv("../data/spongebob1.csv")
df2 = pd.read_csv("../data/spongebob2.csv")

In [22]:
## Begin Example
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   id       4 non-null      int64 
 1   name     4 non-null      object
 2   job      4 non-null      object
 3   species  4 non-null      object
dtypes: int64(1), object(3)
memory usage: 260.0+ bytes


In [23]:
df1.head()

|   | id | name | job | species |
|---|----|------|-----|---------|
| 0 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge |
| 1 | 2 | Patrick Star | Professional Best Friend | Starfish |
| 2 | 3 | Squidward Tentacles | Cashier | Octopus |
| 3 | 4 | Sandy Cheeks | Scientist | Squirrel |


In [24]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   id                5 non-null      int64 
 1   name              5 non-null      object
 2   age               5 non-null      int64 
 3   personality_type  5 non-null      object
dtypes: int64(2), object(2)
memory usage: 292.0+ bytes


In [25]:
df2.head()

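|   | id | name | age | personality_type |
|---|----|------|-----|------------------|
| 0 | 3 | Squidward Tentacles | 49 | Grumpy |
| 1 | 4 | Sandy Cheeks | 27 | Energetic |
| 2 | 5 | Mr. Krabs | 78 | Greedy |
| 3 | 6 | Plankton | 52 | Obsessive |
| 4 | 7 | Mrs. Puff | 60 | Anxious |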


## `.merge()`

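`.merge()` is most useful when you want to combine rows from two dataframes that share key values, such as a common `id`.

Every merge involves two dataframes:

* The left dataframe (the one `.merge()` is called on, or the first argument to `pd.merge()`)
* The right dataframe (the one passed in as the argument)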

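Beyond that, several optional arguments control how the datasets are merged:

* `how`: what kind of merge to make:
    * `inner` (the default)
    * `outer`
    * `left`
    * `right`
    * `cross`
* `on`: which columns or index levels to join on
    * By default, columns from the two dataframes that share names will be used as join keys
    * When using `on`, the column/index you choose must be present in both dataframes
* `left_on`: the key column(s) in the left dataframe, for when the key names differ between the two
* `right_on`: the key column(s) in the right dataframe
* `left_index`: set to `True` to use the left dataframe's index as the join key
* `right_index`: set to `True` to use the right dataframe's index as the join key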
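
As a minimal sketch of the last four arguments, the example below joins two small frames whose key columns have different names. The frames and the column name `character_id` are hypothetical stand-ins made up for illustration, not part of the dataset above.

```python
import pandas as pd

# Hypothetical stand-in frames; `character_id` is an invented column name.
left = pd.DataFrame({"id": [3, 4], "job": ["Cashier", "Scientist"]})
right = pd.DataFrame({"character_id": [3, 4], "age": [49, 27]})

# The key columns have different names, so name one per side.
by_columns = left.merge(right, left_on="id", right_on="character_id")
print(by_columns.columns.tolist())

# Alternatively, match the left `id` column against the right frame's index.
right_indexed = right.set_index("character_id")
by_index = left.merge(right_indexed, left_on="id", right_index=True)
print(by_index.columns.tolist())
```

Note that with `left_on`/`right_on` both key columns are kept in the result, while the index-based version carries only the left `id` column.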




## `.merge()` - Inner

```python

df1.merge(df2)
```

In [26]:
## Begin Example
df1.merge(df2)
## End Example

|   | id | name | job | species | age | personality_type |
|---|----|------|-----|---------|-----|------------------|
| 0 | 3 | Squidward Tentacles | Cashier | Octopus | 49 | Grumpy |
| 1 | 4 | Sandy Cheeks | Scientist | Squirrel | 27 | Energetic |


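By default, `.merge()` performs an inner join on every column name the two dataframes share (here, `id` and `name`). Only rows whose key values appear in both dataframes are kept.

The same operation can be written explicitly, using the `how` argument to state the type of join and the `on` argument to state which column(s) to join on.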

```python
df1.merge(df2,
         how = "inner",
         on = "column name")
```

In [27]:
## Begin Example

df1.merge(df2, how="inner", on=["id", "name"])
## End Example

|   | id | name | job | species | age | personality_type |
|---|----|------|-----|---------|-----|------------------|
| 0 | 3 | Squidward Tentacles | Cashier | Octopus | 49 | Grumpy |
| 1 | 4 | Sandy Cheeks | Scientist | Squirrel | 27 | Energetic |
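
To see why `name` is included in `on` above, the sketch below joins on `id` alone. It rebuilds the two dataframes inline from the values shown earlier (an assumption made so the example runs without the CSV files). Because `name` exists on both sides but is no longer a key, pandas keeps both copies and appends suffixes.

```python
import pandas as pd

# Inline copies of the data shown above, so the CSV files are not needed.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Spongebob Squarepants", "Patrick Star",
             "Squidward Tentacles", "Sandy Cheeks"],
    "job": ["Fry Cook", "Professional Best Friend", "Cashier", "Scientist"],
    "species": ["Sea Sponge", "Starfish", "Octopus", "Squirrel"],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6, 7],
    "name": ["Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs",
             "Plankton", "Mrs. Puff"],
    "age": [49, 27, 78, 52, 60],
    "personality_type": ["Grumpy", "Energetic", "Greedy", "Obsessive", "Anxious"],
})

# Join on `id` only: the non-key `name` column is duplicated as name_x / name_y.
by_id = df1.merge(df2, how="inner", on="id")
print(by_id.columns.tolist())

# The `suffixes` argument replaces the default _x / _y labels.
labelled = df1.merge(df2, how="inner", on="id", suffixes=("_df1", "_df2"))
print(labelled.columns.tolist())
```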


## `.merge()` - Outer

```python

df1.merge(df2, how = "outer")
```

Outer joins return every row from both `df1` and `df2`. Where a row has no match on the other side, the columns from that side are filled with `NaN`.

In [28]:
## Begin Example
df1.merge(df2, how="outer")
## End Example

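|   | id | name | job | species | age | personality_type |
|---|----|------|-----|---------|-----|------------------|
| 0 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | NaN | NaN |
| 1 | 2 | Patrick Star | Professional Best Friend | Starfish | NaN | NaN |
| 2 | 3 | Squidward Tentacles | Cashier | Octopus | 49.0 | Grumpy |
| 3 | 4 | Sandy Cheeks | Scientist | Squirrel | 27.0 | Energetic |
| 4 | 5 | Mr. Krabs | NaN | NaN | 78.0 | Greedy |
| 5 | 6 | Plankton | NaN | NaN | 52.0 | Obsessive |
| 6 | 7 | Mrs. Puff | NaN | NaN | 60.0 | Anxious |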
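
With an outer join it is often useful to know which side each row came from. One way to sketch this is the `indicator=True` argument, which adds a `_merge` column. The frames below are trimmed inline copies of the data above, an assumption made to keep the example self-contained.

```python
import pandas as pd

# Trimmed inline copies of df1 and df2 (fewer columns, same ids and names).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Spongebob Squarepants", "Patrick Star",
             "Squidward Tentacles", "Sandy Cheeks"],
    "job": ["Fry Cook", "Professional Best Friend", "Cashier", "Scientist"],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6, 7],
    "name": ["Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs",
             "Plankton", "Mrs. Puff"],
    "age": [49, 27, 78, 52, 60],
})

# indicator=True adds a `_merge` column with the values
# "left_only", "right_only", or "both" for each row.
merged = df1.merge(df2, how="outer", indicator=True)
print(merged[["id", "name", "_merge"]])

# Count how many rows came from each side.
print(merged["_merge"].value_counts())
```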


## `.merge()` - Left


```python

df1.merge(df2, how = "left")
```

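A left join returns every row from `df1`, along with the `df2` columns for rows that have a match. Rows from `df1` without a match get `NaN` in the `df2` columns.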

In [29]:
## Begin Example
df1.merge(df2, how = "left")
## End Example

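|   | id | name | job | species | age | personality_type |
|---|----|------|-----|---------|-----|------------------|
| 0 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | NaN | NaN |
| 1 | 2 | Patrick Star | Professional Best Friend | Starfish | NaN | NaN |
| 2 | 3 | Squidward Tentacles | Cashier | Octopus | 49.0 | Grumpy |
| 3 | 4 | Sandy Cheeks | Scientist | Squirrel | 27.0 | Energetic |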
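
The `age` values above appear as `49.0` rather than `49` because a standard integer column cannot hold `NaN`, so pandas converts it to floats. A minimal sketch of one way to get whole numbers back is to convert to the nullable `Int64` dtype. The frames are trimmed inline copies of the data above, an assumption made so the example runs on its own.

```python
import pandas as pd

# Trimmed inline copies of df1 and df2.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Spongebob Squarepants", "Patrick Star",
             "Squidward Tentacles", "Sandy Cheeks"],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5],
    "name": ["Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs"],
    "age": [49, 27, 78],
})

left_merged = df1.merge(df2, how="left")

# Unmatched rows introduce NaN, which forces `age` from int64 to float64.
print(left_merged["age"].dtype)

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
age_as_int = left_merged["age"].astype("Int64")
print(age_as_int)
```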


## `.merge()` - Right

```python
df1.merge(df2, how = "right")
```

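A right join returns every row from `df2`, along with the `df1` columns for rows that have a match. Rows from `df2` without a match get `NaN` in the `df1` columns.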

In [30]:
## Begin Example
df1.merge(df2, how = "right")

## End Example

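|   | id | name | job | species | age | personality_type |
|---|----|------|-----|---------|-----|------------------|
| 0 | 3 | Squidward Tentacles | Cashier | Octopus | 49 | Grumpy |
| 1 | 4 | Sandy Cheeks | Scientist | Squirrel | 27 | Energetic |
| 2 | 5 | Mr. Krabs | NaN | NaN | 78 | Greedy |
| 3 | 6 | Plankton | NaN | NaN | 52 | Obsessive |
| 4 | 7 | Mrs. Puff | NaN | NaN | 60 | Anxious |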
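
A right join can be thought of as a left join with the two dataframes swapped. The sketch below checks this on trimmed inline copies of the data above (an assumption made to keep it self-contained). For this data, the only difference should be the column order.

```python
import pandas as pd

# Trimmed inline copies of df1 and df2.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Spongebob Squarepants", "Patrick Star",
             "Squidward Tentacles", "Sandy Cheeks"],
    "job": ["Fry Cook", "Professional Best Friend", "Cashier", "Scientist"],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6, 7],
    "name": ["Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs",
             "Plankton", "Mrs. Puff"],
    "age": [49, 27, 78, 52, 60],
})

right_merged = df1.merge(df2, how="right")
swapped = df2.merge(df1, how="left")

# Columns from the calling dataframe come first, so the order differs.
print(right_merged.columns.tolist())
print(swapped.columns.tolist())

# After aligning the column order, compare the two results.
same = right_merged[swapped.columns.tolist()].equals(swapped)
print(same)
```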


## `.merge()` - Cross

```python

df1.merge(df2, how = "cross")
```

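A cross join returns every possible pairing of a row from `df1` with a row from `df2`. No key columns are used, so column names that appear in both dataframes (`id` and `name`) receive `_x` and `_y` suffixes.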
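
As a quick check on the "every pairing" description, the sketch below confirms that the result has `len(df1) * len(df2)` rows. It uses trimmed inline copies of the data above and assumes pandas 1.2 or later, where `how="cross"` is available.

```python
import pandas as pd

# Trimmed inline copies of df1 and df2.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Spongebob Squarepants", "Patrick Star",
             "Squidward Tentacles", "Sandy Cheeks"],
})
df2 = pd.DataFrame({
    "id": [3, 4, 5, 6, 7],
    "name": ["Squidward Tentacles", "Sandy Cheeks", "Mr. Krabs",
             "Plankton", "Mrs. Puff"],
})

# Every row of df1 is paired with every row of df2.
cross = df1.merge(df2, how="cross")

# The row count is the product of the two input row counts.
print(len(df1), len(df2), len(cross))
print(cross.columns.tolist())
```

Because the row count multiplies, cross joins on large dataframes can grow very quickly.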

In [31]:
## Begin Example
df1.merge(df2, how = "cross")

## End Example

|   | id_x | name_x | job | species | id_y | name_y | age | personality_type |
|---|------|--------|-----|---------|------|--------|-----|------------------|
| 0 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | 3 | Squidward Tentacles | 49 | Grumpy |
| 1 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | 4 | Sandy Cheeks | 27 | Energetic |
| 2 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | 5 | Mr. Krabs | 78 | Greedy |
| 3 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | 6 | Plankton | 52 | Obsessive |
| 4 | 1 | Spongebob Squarepants | Fry Cook | Sea Sponge | 7 | Mrs. Puff | 60 | Anxious |
| 5 | 2 | Patrick Star | Professional Best Friend | Starfish | 3 | Squidward Tentacles | 49 | Grumpy |
| 6 | 2 | Patrick Star | Professional Best Friend | Starfish | 4 | Sandy Cheeks | 27 | Energetic |
| 7 | 2 | Patrick Star | Professional Best Friend | Starfish | 5 | Mr. Krabs | 78 | Greedy |
| 8 | 2 | Patrick Star | Professional Best Friend | Starfish | 6 | Plankton | 52 | Obsessive |
| 9 | 2 | Patrick Star | Professional Best Friend | Starfish | 7 | Mrs. Puff | 60 | Anxious |
