# Real Python article "Combining Data in pandas With merge(), .join(), and concat()"

Import typical data science packages

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Open our data sets

In [None]:
climate_temp = pd.read_csv('./climate_temp.csv')
climate_precip = pd.read_csv('./climate_precip.csv')

Sample the temperature data...

In [None]:
climate_temp.sample(7, random_state=1618)

... and the precipitation data.

In [None]:
climate_precip.sample(7, random_state=1618)

What is the shape of the data?

In [None]:
climate_temp.shape

In [None]:
climate_precip.shape

## Inner join

Begin by selecting a small slice of the precipitation data.

In [None]:
precip_one_station = climate_precip.query('STATION == "GHCND:USC00045721"')
precip_one_station.sample(7, random_state=1618)

In [None]:
precip_one_station.shape

Now, with only 365 rows in one of the inputs, the merge produces a small dataset.

In [None]:
inner_merged = pd.merge(precip_one_station, climate_temp)
inner_merged.sample(7, random_state=1618)

The merged result has 365 rows.

In [None]:
inner_merged.shape

By default, `merge` joins on every column name that the two DataFrames have in common.
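
To make the default behavior concrete, here is a minimal sketch on two toy DataFrames. The frames and their values are invented for illustration and are not taken from the climate data. It shows that omitting `on` gives the same result as passing the list of shared column names, and that an inner join drops rows whose keys appear on only one side.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration only.
left = pd.DataFrame({
    "STATION": ["A", "A", "B"],
    "DATE": ["2010-01-01", "2010-01-02", "2010-01-01"],
    "PRCP": [0.0, 1.2, 0.4],
})
right = pd.DataFrame({
    "STATION": ["A", "A", "C"],
    "DATE": ["2010-01-01", "2010-01-02", "2010-01-01"],
    "TAVG": [50.1, 48.7, 61.0],
})

# Column names present in both frames; merge() uses these as keys when `on` is omitted.
shared_columns = [col for col in left.columns if col in right.columns]

default_merged = pd.merge(left, right)                       # keys inferred
explicit_merged = pd.merge(left, right, on=shared_columns)   # keys spelled out

print(shared_columns)
print(default_merged)
```

Stations `B` and `C` each appear in only one frame, so the inner join keeps just the two rows for station `A`.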

Since the combination of `STATION` and `DATE` is unique,
one can limit the `merge` join to those two columns.

In [None]:
inner_merged_total = pd.merge(
    climate_temp, climate_precip, on=['STATION', 'DATE']
)
inner_merged_total.sample(7, random_state=1618)

In [None]:
inner_merged_total.shape
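
The join above relies on each `STATION` and `DATE` pair being unique. One way to check that assumption, sketched here on invented toy data rather than the real files, is to look for repeated key pairs with `DataFrame.duplicated` and to ask `merge` to enforce uniqueness through its `validate` argument.

```python
import pandas as pd

# Hypothetical toy frames standing in for the temperature and precipitation data.
temps = pd.DataFrame({
    "STATION": ["A", "A", "B"],
    "DATE": ["2010-01-01", "2010-01-02", "2010-01-01"],
    "TAVG": [50.1, 48.7, 61.0],
})
precip = pd.DataFrame({
    "STATION": ["A", "A", "B"],
    "DATE": ["2010-01-01", "2010-01-02", "2010-01-01"],
    "PRCP": [0.0, 1.2, 0.4],
})

key = ["STATION", "DATE"]

# True if any (STATION, DATE) pair occurs more than once in temps.
has_duplicate_keys = bool(temps.duplicated(subset=key).any())

# validate="one_to_one" makes merge() raise MergeError
# if the key pairs are not unique in both frames.
checked = pd.merge(temps, precip, on=key, validate="one_to_one")

# Appending a copy of the first row creates a repeated key pair, which trips the check.
temps_with_dupe = pd.concat([temps, temps.iloc[[0]]], ignore_index=True)
try:
    pd.merge(temps_with_dupe, precip, on=key, validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(has_duplicate_keys, checked.shape, raised)
```

If the check fails on real data, the duplicated rows need to be inspected before the merge, since repeated keys multiply rows in the result.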

One can also join on a single common column by passing its name
as a single value to the `on` parameter (instead of a list).
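
As a small sketch of the single-column form, the example below joins toy readings to a toy station table on `STATION` alone. Both frames, and the `ELEVATION` column in particular, are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical per-station metadata; ELEVATION is an invented column.
stations = pd.DataFrame({
    "STATION": ["A", "B"],
    "ELEVATION": [12.0, 340.5],
})
# Hypothetical daily readings; station "C" has no metadata row.
readings = pd.DataFrame({
    "STATION": ["A", "A", "B", "C"],
    "DATE": ["2010-01-01", "2010-01-02", "2010-01-01", "2010-01-01"],
    "PRCP": [0.0, 1.2, 0.4, 0.9],
})

# A single string for `on` joins on that one column.
by_station = pd.merge(readings, stations, on="STATION")

print(by_station)
```

Here `STATION` is unique in `stations`, so each reading picks up exactly one elevation. If the key repeats on both sides, every matching pair of rows appears in the result, which can make it much larger than either input.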

## Outer join