# Inflation

## BLS CPI Weights

Web scraping to import the monthly CPI weights.

In [6]:
import os
import pandas as pd

# path for the folder "project"
path = "C:\\Users\\pedro\\OneDrive\\NYU\\CSS\\II. Data Skills\\project"
os.chdir(path)

**CPI Weights**

We need the CPI weights, which the BLS reports as "relative importance". The BLS doesn't provide them as a ready-made dataset for us, so we collect them with web scraping techniques.

The CPI weights are published in each monthly CPI press release. The historical press releases are available [here](https://www.bls.gov/bls/news-release/cpi.htm).

Let's try to import `table 1` from the [oct-22](https://www.bls.gov/news.release/archives/cpi_11102022.htm) release:

In [3]:
# oct 22
oct22 = pd.read_html("https://www.bls.gov/news.release/archives/cpi_11102022.htm", match = "Table 1.")

names = oct22[0].iloc[0:-1,0]
weights = oct22[0].iloc[0:-1,1]

oct22 = pd.concat([names,weights],axis=1)
oct22.columns = oct22.columns.droplevel()
oct22 = oct22.rename(columns={oct22.columns[0]:"Expenditure Category"})
oct22["Expenditure Category"] = oct22["Expenditure Category"].str.strip()
oct22 = oct22.dropna()
oct22

Unnamed: 0,Expenditure Category,RelativeimportanceSep.2022
0,All items,100.0
1,Food,13.705
2,Food at home,8.507
3,Cereals and bakery products,1.105
4,"Meats, poultry, fish, and eggs",1.904
5,Dairy and related products,0.806
6,Fruits and vegetables,1.431
7,Nonalcoholic beverages and beverage materials,0.978
8,Other food at home,2.284
9,Food away from home(1),5.197
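
The cleaning steps in the cell above (drop the last row, keep the first two columns, flatten the header) can be tried offline. The sketch below is a minimal illustration that assumes `pd.read_html` returns a table with a two-level column header. The frame `raw` and its header labels are made up for the example and are not the real BLS header.

```python
import pandas as pd

# Toy stand-in for the table returned by pd.read_html (assumed two-level header).
columns = pd.MultiIndex.from_tuples([
    ("Expenditure category", "Expenditure category"),
    ("Relative importance", "Sep. 2022"),
    ("Unadjusted percent change", "Oct. 2022"),
])
raw = pd.DataFrame(
    [["All items", 100.0, 7.7],
     ["  Food", 13.705, 10.9],
     ["Footnotes", None, None]],  # trailing non-data row, like the one dropped with iloc[0:-1]
    columns=columns,
)

# Keep every row except the last one, and only the name and weight columns.
table = raw.iloc[:-1, [0, 1]].copy()

# Drop the top header level so that the columns become plain strings.
table.columns = table.columns.droplevel(0)
table = table.rename(columns={table.columns[0]: "Expenditure Category"})
table["Expenditure Category"] = table["Expenditure Category"].str.strip()

print(table)
```

Selecting both columns in one `iloc` call gives the same result as concatenating the two columns separately, at least for this toy header.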


Now let's try to get `table 1` from the `sep-22` release:

In [4]:
# sep 22
sep22 = pd.read_html("https://www.bls.gov/news.release/archives/cpi_10132022.htm", match = "Table 1.")

names = sep22[0].iloc[0:-1,0]
weights = sep22[0].iloc[0:-1,1]

sep22 = pd.concat([names,weights],axis=1)
sep22.columns = sep22.columns.droplevel()
sep22 = sep22.rename(columns={sep22.columns[0]:"Expenditure Category"})
sep22["Expenditure Category"] = sep22["Expenditure Category"].str.strip()
sep22 = sep22.dropna()
sep22

Unnamed: 0,Expenditure Category,RelativeimportanceAug.2022
0,All items,100.0
1,Food,13.635
2,Food at home,8.475
3,Cereals and bakery products,1.098
4,"Meats, poultry, fish, and eggs",1.905
5,Dairy and related products,0.804
6,Fruits and vegetables,1.413
7,Nonalcoholic beverages and beverage materials,0.973
8,Other food at home,2.283
9,Food away from home(1),5.16


It seems both releases follow the same format, so we can now join the two DataFrames.

In [5]:
cpi_weights = pd.merge(sep22,oct22, how="left", on="Expenditure Category")
cpi_weights

Unnamed: 0,Expenditure Category,RelativeimportanceAug.2022,RelativeimportanceSep.2022
0,All items,100.0,100.0
1,Food,13.635,13.705
2,Food at home,8.475,8.507
3,Cereals and bakery products,1.098,1.105
4,"Meats, poultry, fish, and eggs",1.905,1.904
5,Dairy and related products,0.804,0.806
6,Fruits and vegetables,1.413,1.431
7,Nonalcoholic beverages and beverage materials,0.973,0.978
8,Other food at home,2.283,2.284
9,Food away from home(1),5.16,5.197
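
One thing to keep in mind with `how="left"` is that categories present only in the right-hand table are silently dropped. The sketch below shows a way to make that visible. The row "Hypothetical new item" and its weight are invented for the illustration; `validate="one_to_one"` is an optional safeguard that raises an error if a category name appears more than once.

```python
import pandas as pd

KEY = "Expenditure Category"

left = pd.DataFrame({KEY: ["All items", "Food"],
                     "Aug. 2022": [100.0, 13.635]})
right = pd.DataFrame({KEY: ["All items", "Food", "Hypothetical new item"],
                      "Sep. 2022": [100.0, 13.705, 0.5]})

# Left join: only the categories of `left` survive.
merged = pd.merge(left, right, how="left", on=KEY, validate="one_to_one")

# Categories that exist in `right` but were dropped by the left join.
dropped = sorted(set(right[KEY]) - set(merged[KEY]))

print(merged)
print("dropped:", dropped)
```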


**Creating a function:**

In [7]:
def weights_update(url, n, month):
    """
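    Scrape the table with the CPI weights for a given month from the BLS website.
    ----
    Params:
        url: string. URL of the BLS news release (HTML page).
        n: int. Position of the table among the tables matching "Table 1." on the page.
        month: string. Reference month of the release, in yyyy-mm-dd format; used as the
            name of the weight column. Note that the relative-importance column of a
            release can refer to the previous month (the oct-22 release reports
            Sep. 2022 weights).
    
    ----
    Returns:
        new: pandas DataFrame with the CPI weights for the requested month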
    
    """
    
    new = pd.read_html(url, match="Table 1.")
    
    names = new[n].iloc[0:-1,0]
    weights = new[n].iloc[0:-1,1]
    
    new = pd.concat([names,weights],axis=1)
    new.columns = new.columns.droplevel()
    new = new.rename(columns={new.columns[0]:"Expenditure Category",
                              new.columns[1]:month})
    new["Expenditure Category"] = new["Expenditure Category"].str.strip()
    new = new.dropna()

    # strip footnote markers such as (1), (2), ... from the category names
    for i in range(20):
        new["Expenditure Category"] = new["Expenditure Category"].str.replace(f"({i})", "", regex=False)
    
    # harmonize a label that is spelled differently in some releases
    new["Expenditure Category"] = new["Expenditure Category"].replace("Airline fare","Airline fares")
    
    return new
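
As an alternative to looping over twenty literal markers, the footnote markers could be removed with a single regular expression. This is only a sketch on a made-up Series, not the method used in the function above. Unlike the function, it also strips whitespace after removing the marker, so a label such as "Shelter (2)" (hypothetical) ends up without a trailing space.

```python
import pandas as pd

# Made-up labels that mimic the patterns handled by weights_update.
names = pd.Series(["Food away from home(1)", "Airline fare", "Shelter (2)", "All items"])

cleaned = (
    names.str.replace(r"\(\d+\)", "", regex=True)  # remove "(1)", "(12)", ...
         .str.strip()                              # remove leftover whitespace
         .replace("Airline fare", "Airline fares") # exact-match label fix
)

print(cleaned.tolist())
```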


**2012**

In [10]:
# dict 2012
dict_2012 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_05152012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06142012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07172012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08152012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09142012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10162012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11152012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12142012.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01162013.htm"],
                          "date":["2012-04-01","2012-05-01","2012-06-01",
                                  "2012-07-01","2012-08-01","2012-09-01","2012-10-01","2012-11-01",
                                  "2012-12-01"]})

print(f"iteration: 1/{len(dict_2012)+1}")

# first release in the series: mar2012 (published in April 2012)
cpi_weights12 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_04132012.htm",
                               n = 0,
                               month = "2012-03-01")


for i in range(len(dict_2012)):
    
    print(f"iteration: {i+2}/{len(dict_2012)+1}")
    new = weights_update(url = dict_2012.iloc[i,0],
                         n = 0,
                         month = dict_2012.iloc[i,1])
    
    cpi_weights12 = pd.merge(cpi_weights12,new, how="left", on="Expenditure Category")

iteration: 1/10
iteration: 2/10
iteration: 3/10
iteration: 4/10
iteration: 5/10
iteration: 6/10
iteration: 7/10
iteration: 8/10
iteration: 9/10
iteration: 10/10


**2013**

In [11]:
# dict 2013
dict_2013 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03152013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04162013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05162013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06182013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07162013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08152013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09172013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10302013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11202013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12172013.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01162014.htm"],
                          "date":["2013-02-01","2013-03-01","2013-04-01","2013-05-01","2013-06-01",
                                  "2013-07-01","2013-08-01","2013-09-01","2013-10-01","2013-11-01",
                                  "2013-12-01"]})

print(f"iteration: 1/{len(dict_2013)+1}")

# jan2013
cpi_weights13 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02212013.htm",
                               n = 0,
                               month = "2013-01-01")


for i in range(len(dict_2013)):
    
    print(f"iteration: {i+2}/{len(dict_2013)+1}")
    new = weights_update(url = dict_2013.iloc[i,0],
                         n = 0,
                         month = dict_2013.iloc[i,1])
    
    cpi_weights13 = pd.merge(cpi_weights13,new, how="left", on="Expenditure Category")

iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2014**

In [12]:
# dict 2014
dict_2014 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03182014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04152014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05152014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06172014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07222014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08192014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09172014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10222014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11202014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12172014.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01162015.htm"],
                          "date":["2014-02-01","2014-03-01","2014-04-01","2014-05-01","2014-06-01",
                                  "2014-07-01","2014-08-01","2014-09-01","2014-10-01","2014-11-01",
                                  "2014-12-01"]})

print(f"iteration: 1/{len(dict_2014)+1}")

# jan2014
cpi_weights14 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02202014.htm",
                               n = 0,
                               month = "2014-01-01")


for i in range(len(dict_2014)):
    
    print(f"iteration: {i+2}/{len(dict_2014)+1}")
    new = weights_update(url = dict_2014.iloc[i,0],
                         n = 0,
                         month = dict_2014.iloc[i,1])
    
    cpi_weights14 = pd.merge(cpi_weights14,new, how="left", on="Expenditure Category")


iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2015**

In [13]:
# dict 2015

dict_2015 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03242015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04172015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05222015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06182015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07172015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08192015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09162015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10152015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11172015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12152015.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01202016.htm"],
                          "date":["2015-02-01","2015-03-01","2015-04-01","2015-05-01","2015-06-01",
                                  "2015-07-01","2015-08-01","2015-09-01","2015-10-01","2015-11-01",
                                  "2015-12-01"]})

print(f"iteration: 1/{len(dict_2015)+1}")

# jan2015
cpi_weights15 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02262015.htm",
                               n = 0,
                               month = "2015-01-01")


for i in range(len(dict_2015)):
    
    print(f"iteration: {i+2}/{len(dict_2015)+1}")
    new = weights_update(url = dict_2015.iloc[i,0],
                         n = 0,
                         month = dict_2015.iloc[i,1])
    
    cpi_weights15 = pd.merge(cpi_weights15,new, how="left", on="Expenditure Category")


iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2016**

In [14]:
# dict 2016 
dict_2016 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03162016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04142016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05172016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06162016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07152016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08162016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09162016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10182016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11172016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12152016.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01182017.htm"],
                          "date":["2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01",
                                  "2016-07-01","2016-08-01","2016-09-01","2016-10-01","2016-11-01",
                                  "2016-12-01"]})

print(f"iteration: 1/{len(dict_2016)+1}")

# jan2016
cpi_weights16 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02192016.htm",
                               n = 0,
                               month = "2016-01-01")


for i in range(len(dict_2016)):
    
    print(f"iteration: {i+2}/{len(dict_2016)+1}")
    new = weights_update(url = dict_2016.iloc[i,0],
                         n = 0,
                         month = dict_2016.iloc[i,1])
    
    cpi_weights16 = pd.merge(cpi_weights16,new, how="left", on="Expenditure Category")


iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2017**

In [15]:
# dict 2017

dict_2017 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03152017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04142017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05122017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06142017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07142017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08112017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09142017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10132017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11152017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12132017.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01122018.htm"],
                          "date":["2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01",
                                  "2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01",
                                  "2017-12-01"]})

print(f"iteration: 1/{len(dict_2017)+1}")

# jan2017
cpi_weights17 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02152017.htm",
                               n = 0,
                               month = "2017-01-01")

for i in range(len(dict_2017)):
    
    print(f"iteration: {i+2}/{len(dict_2017)+1}")
    new = weights_update(url = dict_2017.iloc[i,0],
                         n = 0,
                         month = dict_2017.iloc[i,1])
    
    cpi_weights17 = pd.merge(cpi_weights17,new, how="left", on="Expenditure Category")

iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2018**

In [16]:
# dict 2018

dict_2018 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03132018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04112018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05102018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06122018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07122018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08102018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09132018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10112018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11142018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12122018.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01112019.htm"],
                          "date":["2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01",
                                  "2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01",
                                  "2018-12-01"]})

print(f"iteration: 1/{len(dict_2018)+1}")

# jan2018
cpi_weights18 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02142018.htm",
                               n = 0,
                               month = "2018-01-01")

for i in range(len(dict_2018)):
    
    print(f"iteration: {i+2}/{len(dict_2018)+1}")
    new = weights_update(url = dict_2018.iloc[i,0],
                         n = 0,
                         month = dict_2018.iloc[i,1])
    
    cpi_weights18 = pd.merge(cpi_weights18,new, how="left", on="Expenditure Category")

iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2019**

In [17]:
# dict 2019

dict_2019 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03122019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04102019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05102019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06122019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07112019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08132019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09122019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10102019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11132019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12112019.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01142020.htm"],
                          "date":["2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01",
                                  "2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01",
                                  "2019-12-01"]})

print(f"iteration: 1/{len(dict_2019)+1}")

# jan2019
cpi_weights19 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02132019.htm",
                               n = 0,
                               month = "2019-01-01")

for i in range(len(dict_2019)):
    
    print(f"iteration: {i+2}/{len(dict_2019)+1}")
    new = weights_update(url = dict_2019.iloc[i,0],
                         n = 0,
                         month = dict_2019.iloc[i,1])
    
    cpi_weights19 = pd.merge(cpi_weights19,new, how="left", on="Expenditure Category")

iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2020**

In [20]:
# dict 2020
dict_2020 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03112020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04102020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05122020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06102020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07142020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08122020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09112020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10132020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11122020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12102020.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01132021.htm"],
                          "date":["2020-02-01","2020-03-01","2020-04-01","2020-05-01","2020-06-01",
                                  "2020-07-01","2020-08-01","2020-09-01","2020-10-01","2020-11-01",
                                  "2020-12-01"]})

print(f"iteration: 1/{len(dict_2020)+1}")

# jan2020
cpi_weights20 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02132020.htm",
                               n = 0,
                               month = "2020-01-01")


for i in range(len(dict_2020)):
    
    print(f"iteration: {i+2}/{len(dict_2020)+1}")
    new = weights_update(url = dict_2020.iloc[i,0],
                         n = 0,
                         month = dict_2020.iloc[i,1])
    
    cpi_weights20 = pd.merge(cpi_weights20,new, how="left", on="Expenditure Category")


iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2021**

In [21]:
# 2021 -------
dict_2021 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03102021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04132021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05122021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06102021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07132021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08112021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09142021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10132021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11102021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_12102021.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_01122022.htm"],
                          "date":["2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01",
                                  "2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01",
                                  "2021-12-01"]})

# jan2021

print(f"iteration: 1/{len(dict_2021)+1}")

cpi_weights21 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02102021.htm",
                               n = 0,
                               month = "2021-01-01")


for i in range(len(dict_2021)):
    
    print(f"iteration: {i+2}/{len(dict_2021)+1}")
    new = weights_update(url = dict_2021.iloc[i,0],
                         n = 0,
                         month = dict_2021.iloc[i,1])
    
    cpi_weights21 = pd.merge(cpi_weights21,new, how="left", on="Expenditure Category")


iteration: 1/12
iteration: 2/12
iteration: 3/12
iteration: 4/12
iteration: 5/12
iteration: 6/12
iteration: 7/12
iteration: 8/12
iteration: 9/12
iteration: 10/12
iteration: 11/12
iteration: 12/12


**2022**

In [22]:
# 2022 -------
dict_2022 = pd.DataFrame({"url":["https://www.bls.gov/news.release/archives/cpi_03102022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_04122022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_05112022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_06102022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_07132022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_08102022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_09132022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_10132022.htm",
                                 "https://www.bls.gov/news.release/archives/cpi_11102022.htm"],
                          "date":["2022-02-01","2022-03-01","2022-04-01","2022-05-01","2022-06-01",
                                  "2022-07-01","2022-08-01","2022-09-01","2022-10-01"]})

print(f"iteration: 1/{len(dict_2022)+1}")

# jan2022
cpi_weights22 = weights_update(url = "https://www.bls.gov/news.release/archives/cpi_02102022.htm",
                               n = 0,
                               month = "2022-01-01")


for i in range(len(dict_2022)):
    
    print(f"iteration: {i+2}/{len(dict_2022)+1}")
    
    new = weights_update(url = dict_2022.iloc[i,0],
                         n = 0,
                         month = dict_2022.iloc[i,1])
    
    cpi_weights22 = pd.merge(cpi_weights22,new, how="left", on="Expenditure Category")


iteration: 1/10
iteration: 2/10
iteration: 3/10
iteration: 4/10
iteration: 5/10
iteration: 6/10
iteration: 7/10
iteration: 8/10
iteration: 9/10
iteration: 10/10
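
The yearly cells above all repeat the same pattern: scrape one table per release and left-merge it onto the first one. A possible refactoring is sketched below. `build_year` and `fake_fetch` are hypothetical names introduced for this example. In the notebook the role of `fetch` would be played by something like `lambda url, month: weights_update(url, 0, month)`, while `fake_fetch` is an offline stand-in with made-up numbers.

```python
from functools import reduce

import pandas as pd

KEY = "Expenditure Category"


def build_year(releases, fetch):
    """Merge one table per (url, month) pair into a single wide frame.

    fetch(url, month) must return a DataFrame with the KEY column and a
    column named `month`.
    """
    frames = [fetch(url, month) for url, month in releases]
    return reduce(
        lambda left, right: pd.merge(left, right, how="left", on=KEY),
        frames,
    )


def fake_fetch(url, month):
    # Offline stand-in for the scraper; the numbers are illustrative only.
    data = {"a.htm": [100.0, 14.0], "b.htm": [100.0, 14.1]}
    return pd.DataFrame({KEY: ["All items", "Food"], month: data[url]})


releases = [("a.htm", "2012-03-01"), ("b.htm", "2012-04-01")]
wide = build_year(releases, fake_fetch)
print(wide)
```

Keeping the scraper as a parameter makes the merging logic testable without network access.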


**Merge**

In [75]:
# harmonize "Airline fare" with "Airline fares" (weights_update applies the same replacement, so this is a safeguard)
cpi_weights14["Expenditure Category"] = cpi_weights14["Expenditure Category"].replace("Airline fare","Airline fares")
cpi_weights15["Expenditure Category"] = cpi_weights15["Expenditure Category"].replace("Airline fare","Airline fares")
cpi_weights16["Expenditure Category"] = cpi_weights16["Expenditure Category"].replace("Airline fare","Airline fares")

In [23]:
cpi_weights12

Unnamed: 0,Expenditure Category,2012-03-01,2012-04-01,2012-05-01,2012-06-01,2012-07-01,2012-08-01,2012-09-01,2012-10-01,2012-11-01,2012-12-01
0,All items,100.000,100.000,100.000,100.000,100.000,100.000,100.000,100.000,100.0,100.000
1,Food,14.255,14.167,14.151,14.174,14.208,14.235,14.189,14.134,14.175,14.243
2,Food at home,8.608,8.550,8.537,8.539,8.552,8.558,8.526,8.484,8.518,8.553
3,Cereals and bakery products,1.240,1.227,1.228,1.232,1.228,1.235,1.225,1.215,1.22,1.226
4,"Meats, poultry, fish, and eggs",1.941,1.942,1.940,1.929,1.941,1.951,1.950,1.934,1.946,1.950
5,Dairy and related products,0.912,0.904,0.893,0.890,0.889,0.886,0.882,0.881,0.888,0.900
6,Fruits and vegetables,1.264,1.246,1.253,1.262,1.266,1.252,1.247,1.248,1.257,1.265
7,Nonalcoholic beverages and beverage materials,0.960,0.951,0.947,0.940,0.941,0.941,0.937,0.939,0.938,0.941
8,Other food at home,2.291,2.280,2.276,2.286,2.287,2.293,2.284,2.268,2.268,2.271
9,Food away from home,5.648,5.616,5.614,5.634,5.656,5.677,5.663,5.650,5.656,5.690


In [24]:
cpi_weights = pd.merge(cpi_weights12.iloc[0:-1,:], cpi_weights13, how="left",on="Expenditure Category")

In [25]:
cpi_weights = pd.merge(cpi_weights, cpi_weights14, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights15, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights16, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights17, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights18, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights19, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights20, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights21, how="left",on="Expenditure Category")
cpi_weights = pd.merge(cpi_weights, cpi_weights22, how="left",on="Expenditure Category")

In [26]:
cpi_weights.head()

Unnamed: 0,Expenditure Category,2012-03-01,2012-04-01,2012-05-01,2012-06-01,2012-07-01,2012-08-01,2012-09-01,2012-10-01,2012-11-01,...,2022-01-01,2022-02-01,2022-03-01,2022-04-01,2022-05-01,2022-06-01,2022-07-01,2022-08-01,2022-09-01,2022-10-01
0,All items,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
1,Food,14.255,14.167,14.151,14.174,14.208,14.235,14.189,14.134,14.175,...,13.37,13.388,13.405,13.361,13.421,13.423,13.372,13.527,13.635,13.705
2,Food at home,8.608,8.55,8.537,8.539,8.552,8.558,8.526,8.484,8.518,...,8.165,8.193,8.234,8.245,8.304,8.324,8.295,8.414,8.475,8.507
3,Cereals and bakery products,1.24,1.227,1.228,1.232,1.228,1.235,1.225,1.215,1.22,...,1.03,1.039,1.043,1.047,1.052,1.058,1.064,1.086,1.098,1.105
4,"Meats, poultry, fish, and eggs",1.941,1.942,1.94,1.929,1.941,1.951,1.95,1.934,1.946,...,1.888,1.878,1.878,1.878,1.899,1.906,1.887,1.9,1.905,1.904


In [27]:
cpi_weights = cpi_weights.melt(id_vars="Expenditure Category", var_name="date")

In [28]:
cpi_weights["date"] = pd.to_datetime(cpi_weights["date"])
cpi_weights["value"] = pd.to_numeric(cpi_weights["value"])
cpi_weights = cpi_weights.rename(columns = {"value":"weight", "Expenditure Category":"item_name"})
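
The reshaping from wide to long format can be checked on a small frame. The sketch below reuses two values from the 2012 table shown earlier; the weights are stored as strings here only to show what `pd.to_numeric` does, which is an assumption about the input and not something the notebook states.

```python
import pandas as pd

wide = pd.DataFrame({
    "Expenditure Category": ["All items", "Food"],
    "2012-03-01": ["100.0", "14.255"],
    "2012-04-01": ["100.0", "14.167"],
})

# One row per (category, date) pair; the former column names go into "date".
long = wide.melt(id_vars="Expenditure Category", var_name="date")
long["date"] = pd.to_datetime(long["date"])
long["value"] = pd.to_numeric(long["value"])
long = long.rename(columns={"value": "weight", "Expenditure Category": "item_name"})

print(long)
```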

In [29]:
cpi_weights[cpi_weights["weight"].isna()]

Unnamed: 0,item_name,date,weight
4731,Hospital services,2022-10-01,
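
A missing weight after a chain of left merges can come from a category label that does not match across releases, although the notebook does not establish that this is the cause here. One way to look for such mismatches is an outer merge with `indicator=True`, sketched below on made-up labels and values.

```python
import pandas as pd

KEY = "Expenditure Category"

# Made-up frames: the second release spells one label differently.
left = pd.DataFrame({KEY: ["Item A", "Item B"], "m1": [60.0, 40.0]})
right = pd.DataFrame({KEY: ["Item A", "Item B (5)"], "m2": [61.0, 39.0]})

# The _merge column tells whether each key was found in both frames.
check = pd.merge(left, right, how="outer", on=KEY, indicator=True)
unmatched = check.loc[check["_merge"] != "both", [KEY, "_merge"]]

print(unmatched)
```

Rows flagged `left_only` would produce a missing value in a left merge, and rows flagged `right_only` would be dropped.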


In [30]:
cpi_weights.head()

Unnamed: 0,item_name,date,weight
0,All items,2012-03-01,100.0
1,Food,2012-03-01,14.255
2,Food at home,2012-03-01,8.608
3,Cereals and bakery products,2012-03-01,1.24
4,"Meats, poultry, fish, and eggs",2012-03-01,1.941


In [31]:
cpi_weights.to_parquet("data\\inflation\\data_bls_cpi_weights.parquet")
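
The output path above uses Windows-style backslashes. A portable variant could build the path with `pathlib`, as in the sketch below. The two commented lines show how it might be used with the notebook's `cpi_weights` frame. They are left commented so the example runs on its own.

```python
from pathlib import Path

# Same relative location as in the cell above, built in an OS-independent way.
out_path = Path("data") / "inflation" / "data_bls_cpi_weights.parquet"

# out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folders if needed
# cpi_weights.to_parquet(out_path)                    # pandas accepts path objects

print(out_path)
```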