In [1]:
import pandas as pd

# Merges

In this subchapter, we will go over how to combine different DataFrames in `pandas`.

To start, let's load the same dataset we used in the first subchapter of the `Pandas` chapter. As a reminder, this dataset records beer sales across the 50 US states. It is sourced from [_Salience and Taxation: Theory and Evidence_](https://www.aeaweb.org/articles?id=10.1257/aer.99.4.1145) by Chetty, Looney, and Kroft (AER 2010), and it includes 7 columns:
- `st_name`: the state abbreviation
- `year`: the year the data was recorded
- `c_beer`: the quantity of beer consumed, in thousands of gallons
- `beer_tax`: the ad valorem tax, as a percentage
- `btax_dollars`: the excise tax, represented in dollars per case (24 cans) of beer 
- `population`: the population of the state, in thousands
- `salestax`: the sales tax percentage



In [2]:
df = pd.read_csv('data/beer_tax.csv')
df

Unnamed: 0,st_name,year,c_beer,beer_tax,btax_dollars,population,salestax
0,AL,1970,33098,72.341130,2.370,3450,4.0
1,AL,1971,37598,69.304600,2.370,3497,4.0
2,AL,1972,42719,67.149190,2.370,3539,4.0
3,AL,1973,46203,63.217026,2.370,3580,4.0
4,AL,1974,49769,56.933796,2.370,3627,4.0
...,...,...,...,...,...,...,...
1703,WY,1999,12423,0.319894,0.045,492,4.0
1704,WY,2000,12595,0.309491,0.045,494,4.0
1705,WY,2001,12808,0.300928,0.045,494,4.0
1706,WY,2002,13191,0.296244,0.045,499,4.0


Since our goal is to combine DataFrames, let's also load two additional datasets. The first has nationwide inflation data for the US, [sourced](https://www.bls.gov/cpi/data.htm) from the Bureau of Labor Statistics. The second has state-level inflation data for a subset of states, sourced from *The Slope of the Phillips Curve: Evidence from U.S. States* by Jonathon Hazell, Juan Herreño, Emi Nakamura, and Jón Steinsson (QJE 2022, [link](https://academic.oup.com/qje/article/137/3/1299/6529257)).

In [21]:
us_inf = pd.read_csv('data/us_inf.csv')
us_inf.head()

Unnamed: 0,year,US Inflation
0,1967,35.980357
1,1968,38.0616
2,1969,39.96251
3,1970,42.184851
4,1971,44.389292


In [45]:
state_inf = pd.read_csv('data/state_inf.csv')
state_inf.head()

Unnamed: 0,state,year,State Inflation
0,AL,1989,4.002484
1,AL,1990,3.54908
2,AL,1991,3.598064
3,AL,1992,1.077842
4,AL,1993,2.445113


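First, let's join the US inflation data to the beer tax table using `pd.merge()`. The simple join below matches rows on the `year` column, which both tables share. By default, `pd.merge()` performs an inner join, so only years that appear in both tables are kept.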

In [46]:
# Match each beer_tax row to the US inflation value for the same year
pd.merge(left=df, right=us_inf, left_on='year', right_on='year')

Unnamed: 0,st_name,year,c_beer,beer_tax,btax_dollars,population,salestax,US Inflation
0,AL,1970,33098,72.341130,2.370000,3450,4.0,42.184851
1,AK,1970,5372,13.735660,0.562500,304,0.0,42.184851
2,AZ,1970,38604,5.494264,0.180000,1795,3.0,42.184851
3,AR,1970,22378,16.632357,0.544900,1930,3.0,42.184851
4,CA,1970,363645,2.747132,0.090000,20023,5.0,42.184851
...,...,...,...,...,...,...,...,...
1703,VA,2003,151706,4.093625,0.636000,7386,4.5,254.390503
1704,WA,2003,116550,3.769560,0.585652,6131,6.5,254.390503
1705,WV,2003,41400,2.569457,0.399200,1810,6.0,254.390503
1706,WI,2003,151000,0.934582,0.145200,5472,5.0,254.390503
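
To make the matching behavior easier to see, here is a minimal sketch on two tiny tables. The `beer` and `inflation` DataFrames and all of their values are made up for illustration and are not taken from the datasets above. It assumes the key column has the same name in both tables, in which case `on='year'` is a shorthand for passing `left_on` and `right_on` separately.

```python
import pandas as pd

# Toy stand-ins for the beer and inflation tables (all values are made up).
beer = pd.DataFrame({
    'st_name': ['AL', 'AL', 'WY', 'WY'],
    'year': [1970, 2003, 1970, 2004],
    'c_beer': [100, 110, 10, 12],
})
inflation = pd.DataFrame({
    'year': [1969, 1970, 2003],
    'US Inflation': [40.0, 42.0, 254.0],
})

# When the key has the same name in both tables, `on` replaces left_on/right_on.
merged = pd.merge(left=beer, right=inflation, on='year')

# The default is an inner join: the 2004 beer row has no matching year and is
# dropped, and the unmatched 1969 inflation row does not appear either.
print(merged)
print(len(beer), len(merged))
```

Each 1970 row in `beer` picks up the same 1970 inflation value, which is the same many-to-one pattern seen in the merged output above, where every state in a given year shares one `US Inflation` value.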
