# Inflation

## BLS: Consumer Price Index (CPI)

Information about the survey is available [here](https://download.bls.gov/pub/time.series/cu/cu.txt).

In [1]:
import os
import pandas as pd

# set the working directory to the "project" folder (adjust the path for your machine)
path = "C:\\Users\\pedro\\OneDrive\\NYU\\CSS\\II. Data Skills\\project"
os.chdir(path)

Import all CPI data files directly from [BLS](https://download.bls.gov/pub/time.series/cu/), then save the combined table as a `.parquet` file:

In [2]:
all_itens = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.1.AllItems", delimiter="\t")
food = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.11.USFoodBeverage", delimiter="\t")
housing = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.12.USHousing", delimiter="\t")
apparel = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.13.USApparel", delimiter="\t")
transportation = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.14.USTransportation", delimiter="\t")
medical = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.15.USMedical", delimiter="\t")
recreation = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.16.USRecreation", delimiter="\t")
education = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.17.USEducationAndCommunication", delimiter="\t")
others = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.18.USOtherGoodsAndServices", delimiter="\t")
groups = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.data.20.USCommoditiesServicesSpecial", delimiter="\t")
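
The ten calls above differ only in the file name, so the same load could be written as a loop. This is a minimal sketch rather than the notebook's original code: `BASE_URL`, `CPI_FILES`, `cpi_urls`, and `load_cpi_files` are hypothetical names introduced here, and the download helper is defined but not called because it needs network access.

```python
import pandas as pd

# Base folder and file names copied from the read_csv calls above.
BASE_URL = "https://download.bls.gov/pub/time.series/cu/"
CPI_FILES = [
    "cu.data.1.AllItems",
    "cu.data.11.USFoodBeverage",
    "cu.data.12.USHousing",
    "cu.data.13.USApparel",
    "cu.data.14.USTransportation",
    "cu.data.15.USMedical",
    "cu.data.16.USRecreation",
    "cu.data.17.USEducationAndCommunication",
    "cu.data.18.USOtherGoodsAndServices",
    "cu.data.20.USCommoditiesServicesSpecial",
]


def cpi_urls(files=CPI_FILES, base_url=BASE_URL):
    """Return the full download URL for each file name (hypothetical helper)."""
    return [base_url + name for name in files]


def load_cpi_files(urls):
    """Read each tab-delimited file and stack the results (requires network access)."""
    frames = [pd.read_csv(url, delimiter="\t") for url in urls]
    return pd.concat(frames)


urls = cpi_urls()
```

Calling `load_cpi_files(urls)` would then stand in for the individual `read_csv` calls plus the `pd.concat` step.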

In [3]:
CPI = pd.concat([all_itens,food,housing,apparel,transportation,medical,recreation,education,
                 others,groups])


In [4]:
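# strip the whitespace padding from column names and series ids
CPI.columns = CPI.columns.str.strip()
CPI["series_id"] = CPI["series_id"].str.strip()
# keep monthly observations only (drops other period codes, e.g. M13, the annual average)
months = ["M01", "M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12"]
CPI = CPI[CPI["period"].isin(months)].drop(columns = "footnote_codes")
# build a first-of-month date from year and period
CPI["period"] = CPI["period"].str.replace("M","")
CPI["date"] = CPI["year"].astype(str)+"-"+CPI["period"]+"-1"
CPI["date"] = pd.to_datetime(CPI["date"])
# keep the three final columns and drop any rows repeated across the source files
CPI = CPI.drop(columns = ["year","period"])[["series_id","date","value"]].drop_duplicates()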
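
To make the period handling concrete, here is a small sketch on made-up rows in the same layout. It is an alternative formulation, not the cell's code: it filters with a regular expression and builds the date from numeric year, month, and day columns instead of string concatenation. The toy values and the names `toy` and `monthly` are assumptions for illustration.

```python
import pandas as pd

# Toy rows in the BLS layout (values are made up for illustration).
toy = pd.DataFrame({
    "series_id": ["X", "X", "X"],
    "year": [2020, 2020, 2020],
    "period": ["M01", "M02", "M13"],
    "value": [100.0, 101.0, 100.5],
})

# Keep M01..M12 only; M13 is not a calendar month.
monthly = toy[toy["period"].str.fullmatch(r"M(0[1-9]|1[0-2])")].copy()

# Build a first-of-month timestamp from the year and the month number.
monthly["date"] = pd.to_datetime(
    pd.DataFrame({
        "year": monthly["year"],
        "month": monthly["period"].str[1:].astype(int),
        "day": 1,
    })
)

# Same final column layout as the cell above.
monthly = monthly[["series_id", "date", "value"]]
```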

In [5]:
CPI[CPI["series_id"]=="CUUR0000SA0"]

Unnamed: 0,series_id,date,value
2337,CUUR0000SA0,1913-01-01,9.800
2338,CUUR0000SA0,1913-02-01,9.800
2339,CUUR0000SA0,1913-03-01,9.800
2340,CUUR0000SA0,1913-04-01,9.800
2341,CUUR0000SA0,1913-05-01,9.700
...,...,...,...
3759,CUUR0000SA0,2022-06-01,296.311
3760,CUUR0000SA0,2022-07-01,296.276
3761,CUUR0000SA0,2022-08-01,296.171
3762,CUUR0000SA0,2022-09-01,296.808
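
Since the index level itself is not an inflation rate, one common next step is the year-over-year percent change, (CPI_t / CPI_{t-12} - 1) × 100. The sketch below shows that calculation on a made-up series; it assumes monthly rows with no gaps, and the names `toy` and `yoy` are hypothetical.

```python
import pandas as pd

# Toy monthly index: 13 consecutive months, made-up values.
dates = pd.date_range("2020-01-01", periods=13, freq="MS")
toy = pd.Series([100.0] * 12 + [105.0], index=dates, name="value")

# Percent change against the same month one year earlier.
# The first 12 entries are NaN because they have no year-earlier value.
yoy = toy.pct_change(periods=12) * 100
```

On the real data the same call would be applied to one `series_id` at a time, sorted by `date`.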


In [6]:
CPI.to_parquet(os.path.join("data", "inflation", "data_bls_cpi.parquet"))


Building the data dictionary (metadata describing each series):

In [7]:
# import the lookup tables that describe the series (series list, areas, items, base periods)
series = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.series", delimiter="\t")

area = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.area", delimiter="\t")
area = area[["area_code","area_name"]]

item = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.item", delimiter="\t")
base = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.base", delimiter="\t")

In [8]:
# strip whitespace from the column names
series.columns = series.columns.str.strip()
area.columns = area.columns.str.strip()
item.columns = item.columns.str.strip()
base.columns = base.columns.str.strip()

In [25]:
# merge the item, area, and base lookup tables into the series metadata
cpi_dict = pd.merge(series, item, how = "left", on = "item_code")
cpi_dict = pd.merge(cpi_dict, area, how = "left", on = "area_code")
cpi_dict = pd.merge(cpi_dict, base, how = "left", on = "base_code")
# keep monthly series only (periodicity_code "R")
cpi_dict = cpi_dict[cpi_dict["periodicity_code"]=="R"].drop(columns="periodicity_code")
cpi_dict["series_id"] = cpi_dict["series_id"].str.strip()
# keep the U.S. city average only (area_code "0000")
cpi_dict = cpi_dict[cpi_dict["area_code"]=="0000"].drop(columns=["area_code","area_name"])
# keep the current base period only (base_code "S")
cpi_dict = cpi_dict[cpi_dict["base_code"]=="S"].drop(columns=["base_code","base_name"])

cpi_dict = cpi_dict[
    ["series_id","item_code","item_name","display_level",
     "seasonal","series_title","base_period",
     "begin_period","begin_year","end_period","end_year"]]

cpi_dict.head()

Unnamed: 0,series_id,item_code,item_name,display_level,seasonal,series_title,base_period,begin_period,begin_year,end_period,end_year
0,CUSR0000SA0,SA0,All items,0,S,"All items in U.S. city average, all urban cons...",1982-84=100,M01,1947,M10,2022
1,CUSR0000SA0E,SA0E,Energy,1,S,"Energy in U.S. city average, all urban consume...",1982-84=100,M01,1957,M10,2022
2,CUSR0000SA0L1,SA0L1,All items less food,1,S,"All items less food in U.S. city average, all ...",1982-84=100,M01,1947,M10,2022
3,CUSR0000SA0L12,SA0L12,All items less food and shelter,1,S,All items less food and shelter in U.S. city a...,1982-84=100,M01,1967,M10,2022
4,CUSR0000SA0L12E,SA0L12E,"All items less food, shelter, and energy",1,S,"All items less food, shelter, and energy in U....",1982-84=100,M01,1967,M10,2022
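
The rows above suggest that a `series_id` is a fixed-position concatenation of the codes used in the filters: survey prefix, seasonal code, periodicity code, area code, and item code. The sketch below splits an id on that assumption; the positional layout is inferred from the rows shown and the survey documentation linked at the top, and `split_series_id` is a hypothetical helper, not part of any BLS tool.

```python
def split_series_id(series_id):
    """Split a CPI series id into its component codes (assumed fixed positions)."""
    sid = series_id.strip()
    return {
        "survey": sid[:2],            # e.g. "CU"
        "seasonal": sid[2],           # "S" or "U"
        "periodicity_code": sid[3],   # e.g. "R"
        "area_code": sid[4:8],        # e.g. "0000"
        "item_code": sid[8:],         # e.g. "SA0L12E"
    }


parts = split_series_id("CUSR0000SA0L12E")
```

For the last row of the table this gives `item_code` `SA0L12E`, which matches the `item_code` column shown there.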


In [26]:
cpi_dict.to_parquet(os.path.join("data", "inflation", "dict_bls_cpi.parquet"))
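
Once both tables exist, the dictionary can label the values through a join on `series_id`. The sketch below shows the idea on made-up rows; `values`, `names`, and `labeled` are hypothetical stand-ins for the two saved tables, and `validate="m:1"` is an optional check that each `series_id` appears only once on the dictionary side.

```python
import pandas as pd

# Toy stand-in for the CPI values table (numbers are made up).
values = pd.DataFrame({
    "series_id": ["CUSR0000SA0", "CUSR0000SA0", "CUSR0000SA0E"],
    "date": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-01"]),
    "value": [1.0, 2.0, 3.0],
})

# Toy stand-in for the dictionary: one row per series.
names = pd.DataFrame({
    "series_id": ["CUSR0000SA0", "CUSR0000SA0E"],
    "item_name": ["All items", "Energy"],
})

# Left join keeps every observation; validate raises if the dictionary
# has duplicate series ids.
labeled = values.merge(names, on="series_id", how="left", validate="m:1")
```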