## U.S. Bureau of Labor Statistics - CPI Analysis
#### Eric Bottinelli

### 1. Retrieve data via BLS API v2

**Documentation**

- https://www.bls.gov/developers/api_python.htm
- https://data.bls.gov/cgi-bin/surveymost?cu

**Packages to install**

- Prettytable ('pip install prettytable')

**API Series ID**

Consumer Price Index for All Urban Consumers (CPI-U)
- *All items in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SA0
- *All items less food and energy in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SA0L1E
- *Food and beverages in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SAF
- *Food at home in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SAF11
- *Food away from home in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SEFV
- *Energy in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SA0E
- *Housing in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SAH
- *Shelter in U.S. city average, all urban consumers, not seasonally adjusted*: CUUR0000SAH1
((https://www.bls.gov/cpi/factsheets/owners-equivalent-rent-and-rent.htm))

**Calculate special CPI**

Occasionally, a user wishes to estimate a price change that is not published by BLS. For instance, suppose a user would like a CPI series for ‘services less energy services and shelter’. This can be done by estimating a special index, in this case, ‘services less energy services and shelter’.
[BLS Doc](https://www.bls.gov/cpi/factsheets/constructing-special-cpis.htm)

If SEEB01 -> CUUR0000SEEB01

Cost weight is just a sum of all the items

If I add all the values to calculate the services less energy services and shelter, it becomes a lot of data. Explore different solution (e.g. remove goods from core CPI)

**Supercore CPI**

"Fed Chair Jerome Powell cited a specific category of inflation—inflation in core services other than housing—as being perhaps “the most important category for understanding the future evolution of core inflation.” The financial press has termed this category “supercore” inflation" ([FED of St. Louis](https://www.stlouisfed.org/on-the-economy/2024/may/measuring-inflation-headline-core-supercore-services))

In [3]:
import os
import requests
import json
import prettytable
import pandas as pd
from datetime import datetime

folder_name = 'CPI_Data'

In [7]:
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

current_date = datetime.now()
current_year = current_date.year
last_year = current_year - 1

headers = {'Content-type': 'application/json'}
series_ids = ['CUUR0000SA0', 'CUUR0000SA0L1E', 'CUUR0000SAF', 'CUUR0000SAF11', 'CUUR0000SEFV', 'CUUR0000SA0E', 'CUUR0000SAH1']
data = json.dumps({"seriesid": series_ids, "startyear": str(last_year), "endyear": str(current_year)})
response = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(response.text)

series_names = {
    'CUUR0000SA0': 'All_Items',
    'CUUR0000SA0L1E': 'All_Items_Less_Food_Energy',
    'CUUR0000SAF': 'Food_Beverages',
    'CUUR0000SAF11': 'Food_At_Home',
    'CUUR0000SEFV': 'Food_Away_From_Home',
    'CUUR0000SA0E': 'Energy',
    'CUUR0000SAH1': 'Shelter'
}

all_data = []
for series in json_data['Results']['series']:
    rows = []
    for item in series['data']:
        footnotes = "".join([footnote['text'] + ',' for footnote in item['footnotes'] if footnote]).rstrip(',')
        if 'M01' <= item['period'] <= 'M12':
            rows.append([series_names[series['seriesID']], item['year'], item['period'], item['value'], footnotes])

    # Create dataframe for current series
    df = pd.DataFrame(rows, columns=["series id", "year", "period", "value", "footnotes"])
    all_data.append(df)

complete_data = pd.concat(all_data)

csv_path = os.path.join(folder_name, 'CPI_data.csv')
complete_data.to_csv(csv_path, index=False)

In [4]:
complete_data = pd.read_csv("CPI_Data/CPI_data.csv")

In [46]:
df = complete_data.copy()
df['date'] = pd.to_datetime(df['year'].astype(str) + df['period'].str.replace('M', ''), format='%Y%m')
df['series id'] = df['series id'].astype(str)  # Convert series id to string
df['value'] = pd.to_numeric(df['value'], errors='coerce')  # Ensure value is numeric
df['footnotes'] = df['footnotes'].astype(str)  # Convert footnotes to string
df.drop(['year', 'period', 'footnotes'], axis=1, inplace=True)
df.rename(columns={'series id': 'id'}, inplace=True)
df = df[['id', 'date', 'value']]

In [47]:
df['MoM_change'] = df.groupby('id')['value'].pct_change()
df['YoY_change'] = df.groupby('id')['value'].pct_change(periods=12)

In [39]:
df.head()

Unnamed: 0,id,date,value,MoM_change,YoY_change
0,All_Items,2024-07-01,314.54,,
1,All_Items,2024-06-01,314.175,-0.00116,
2,All_Items,2024-05-01,314.069,-0.000337,
3,All_Items,2024-04-01,313.548,-0.001659,
4,All_Items,2024-03-01,312.332,-0.003878,


In [62]:
df2 = df.copy()
df2['Month-Year'] = df2['date'].dt.strftime('%b-%y')

# Define mappings for IDs to Categories and Weights
category_map = {
    'All_Items': (0, 'Headline', '', ''),
    'Food_Beverages': (1, '', 'Food', ''),
    'Food_At_Home': (2, '', '', 'At home'),
    'Food_Away_From_Home': (3, '', '', 'Away Home'),
    'Energy': (4, '', 'Energy', ''),
    'All_Items_Less_Food_Energy': (5, 'Core', '', ''),
    'Shelter': (6, '', '', 'Shelter'),
}
weight_map = {
    'All_Items': '100%',
    'Food_Beverages': '10%',
    'Food_At_Home': '5%',
    'Food_Away_From_Home': '5%',
    'Energy': '8%',
    'All_Items_Less_Food_Energy': '~80%',
    'Shelter': '20%'
}

ordered_categories = ['Headline', 'Food + Energy', 'Core']
ordered_sub_categories_1 = ['Food', 'Energy', 'Services']
ordered_sub_categories_2 = ['At home', 'Away Home', 'Shelter', 'Services - shelter = core-core']

df2['Order'], df2['Category'], df2['Sub Category 1'], df2['Sub Category 2'] = zip(*df2['id'].map(category_map))
df2['Weight'] = df2['id'].map(weight_map)

# Pivot the DataFrame to create a structured table
pivot_df = df2.pivot_table(
    index=['Order', 'Category', 'Sub Category 1', 'Sub Category 2', 'Weight'],
    columns='Month-Year',
    values='MoM_change',
    aggfunc='first'
)

pivot_df = pivot_df[sorted(pivot_df.columns, key=lambda x: pd.to_datetime(x, format='%b-%y'), reverse=True)]

# Flatten the headers by removing the MultiIndex after pivot
pivot_df.columns.name = None  # Remove the aggregation name
pivot_df.reset_index(inplace=True)  # Make 'Category', 'sub category 1', 'sub category 2', 'Weight' as regular columns

pivot_df.sort_values(by='Order', inplace=True)

pivot_df.drop(columns=['Order'], inplace=True)

In [60]:
pivot_df.head()

Unnamed: 0,Category,Sub Category 1,Sub Category 2,Weight,Jun-24,May-24,Apr-24,Mar-24,Feb-24,Jan-24,...,Oct-23,Sep-23,Aug-23,Jul-23,Jun-23,May-23,Apr-23,Mar-23,Feb-23,Jan-23
0,Headline,,,100%,-0.00116,-0.000337,-0.001659,-0.003878,-0.006423,-0.006152,...,0.002019,0.000384,-0.002479,-0.004348,-0.001904,-0.003219,-0.002512,-0.005034,-0.0033,-0.005551
1,,Food,,10%,-0.002476,-0.001831,-0.001341,-0.001889,-0.000925,-0.001255,...,0.001732,-0.003134,-0.002082,-0.001735,-0.002804,-0.000965,-0.00227,-0.002341,-0.000905,-0.004229
2,,,At home,5%,-0.002906,-0.000239,9.2e-05,-0.000919,0.000141,-0.001414,...,0.005158,-0.002831,-0.000688,-0.000859,-0.003691,0.000662,-0.000684,-0.001356,0.001871,-0.003465
3,,,Away Home,5%,-0.002076,-0.004115,-0.003503,-0.003464,-0.002606,-0.000954,...,-0.004326,-0.003723,-0.00393,-0.003429,-0.001739,-0.003819,-0.00469,-0.003681,-0.00593,-0.006301
4,,Energy,,8%,-0.004144,0.012083,0.00214,-0.019803,-0.030424,-0.021391,...,0.035105,0.032258,-0.005662,-0.032277,-0.00342,-0.014226,0.012637,-0.015063,0.009277,0.005883


In [63]:
csv_path = os.path.join(folder_name, 'cleaned_CPI_data.csv')
pivot_df.to_csv(csv_path, index=False)