## US Personal Consumption Expenditures by State

- This notebook explores the BEA (https://bea.gov) Personal Consumption Expenditures dataset.
- It contains 24 categories of spending data by US region and state for the years 1997-2019.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Load the data
df = pd.read_csv('../input/us-personal-expenditures-by-state-19972019/SAEXP1__ALL_AREAS_1997_2019.csv', dtype='string')
df.shape

- This set has 1,444 rows of data in 31 columns.
- The first 8 columns describe the row (as with most BEA csv files).
- The GeoName column specifies whether the row references the entire US, or a specific state.
- The Description column specifies the type of spending (Food & Drink, Clothing & Footwear, Motor Vehicles, etc.).
- The rest of the columns are the values for each year.
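
- As a minimal sketch of that layout, the year columns can be separated from the descriptive ones by name. The toy frame below is made up for illustration (two descriptive columns, placeholder values), and it assumes the year columns are labelled with four-digit strings such as '1997'.

```python
import pandas as pd

# Toy stand-in for the BEA file: placeholder values, only a few columns
toy = pd.DataFrame({
    'GeoName': ['California', 'California'],
    'Description': ['Clothing and footwear', 'Motor vehicles and parts'],
    '1997': ['10', '20'],
    '1998': ['11', '21'],
})

# Assumption: a year column is any column whose name is exactly four digits
year_cols = [c for c in toy.columns if c.isdigit() and len(c) == 4]
info_cols = [c for c in toy.columns if c not in year_cols]

print(info_cols)
print(year_cols)
```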

In [None]:
df.head()

### Unique categories

In [None]:
# Count the unique categories (nunique() ignores missing values)
category_count = df['Description'].nunique()
print(f"There are {category_count} categories, therefore {category_count} rows for each 'area' (not counting N/A).")

In [None]:
# List all the categories
df['Description'].unique()
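
- A note on counting: `unique()` keeps a missing value as its own entry, while `nunique()` drops it by default. The sketch below uses a made-up toy column, not the BEA data, to show the difference.

```python
import pandas as pd

# Toy column with placeholder categories and one missing entry
desc = pd.Series(['Food', 'Housing', 'Food', None], dtype='string')

with_missing = len(desc.unique())    # the missing value counts as an entry
without_missing = desc.nunique()     # dropna=True by default

print(with_missing, without_missing)
```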

- We can choose any of these categories for a state. Let's get the data for California.

In [None]:
df1 = df[df['GeoName'] == 'California']
df1.shape

- This gives us 24 rows, one for each category. Let's look at a few of the rows.

In [None]:
df1.head(10)

- We'll grab the "Clothing and footwear" row by using its positional index in the original dataframe.

In [None]:
# Get a single row from the original DF
row = df.iloc[[129]]
row.head()
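
- A hard-coded position only works for this exact file. As a hedged alternative, the same row could be selected by its labels. The sketch below uses a made-up toy frame. The `str.strip()` call is an assumption, in case the Description values carry leading or trailing whitespace.

```python
import pandas as pd

# Toy stand-in for the BEA file: placeholder values only
toy = pd.DataFrame({
    'GeoName': ['California', 'California', 'Texas'],
    'Description': ['Clothing and footwear',
                    'Motor vehicles and parts',
                    'Clothing and footwear'],
    '1997': ['10', '20', '30'],
})

# Select by state and category instead of by row position
mask = (
    (toy['GeoName'] == 'California')
    & (toy['Description'].str.strip() == 'Clothing and footwear')
)
row_by_label = toy[mask]

print(row_by_label)
```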

- Now let's pivot the columns, or un-encode them, as I like to think of it.

In [None]:
# Pivot the year columns into rows: one (year, value) row per year
value_col = 'Clothing and footwear'
frames = []

# Loop through the years (1997-2019 inclusive) and get the value from each column
for year in range(1997, 2020):
    year = str(year)
    df2 = pd.DataFrame(row[[year]].astype('float'))
    df2.columns = [value_col]
    df2.insert(0, 'year', year)
    frames.append(df2)

# Stack the per-year frames into a single two-column dataframe
new_df = pd.concat(frames, ignore_index=True, axis=0)
new_df.head(10)
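
- The same wide-to-long reshaping can also be sketched with `DataFrame.melt`, which avoids the explicit loop. The single-row frame below is a made-up stand-in for `row` (placeholder values, three years only), so treat it as an illustration rather than the notebook's method.

```python
import pandas as pd

# Toy single-row frame standing in for `row`: placeholder values
row_toy = pd.DataFrame({
    'GeoName': ['California'],
    'Description': ['Clothing and footwear'],
    '1997': ['10.5'],
    '1998': ['11.0'],
    '1999': ['12.25'],
})

value_col = 'Clothing and footwear'
year_cols = ['1997', '1998', '1999']

# melt turns each year column into its own row
long_df = (
    row_toy
    .melt(id_vars=['GeoName', 'Description'], value_vars=year_cols,
          var_name='year', value_name=value_col)
    [['year', value_col]]
    .astype({'year': int, value_col: float})
)

print(long_df)
```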

- Now we can see how much Californians spent on Clothing and Footwear each year from 1997 to 2019.
- There are 24 categories of data in this set that can be extracted and recombined to make interesting maps and tables.

In [None]:
# Plot the spending on Clothing and footwear in California
plt.figure(figsize = (12,5))
sns.lineplot(data=new_df, x='year', y='Clothing and footwear');
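
- As one hedged example of recombining the data into a table, a single category can be laid out with years as rows and states as columns. The frame below is a made-up toy (two states, two categories, two years), not the BEA values.

```python
import pandas as pd

# Toy stand-in for the BEA file: placeholder values only
toy = pd.DataFrame({
    'GeoName': ['California', 'Texas', 'California', 'Texas'],
    'Description': ['Clothing and footwear'] * 2
                   + ['Motor vehicles and parts'] * 2,
    '1997': ['10', '30', '20', '40'],
    '1998': ['11', '31', '21', '41'],
})

category = 'Clothing and footwear'
year_cols = ['1997', '1998']

# Keep one category, index by state, then transpose so years become rows
subset = toy[toy['Description'] == category]
table = subset.set_index('GeoName')[year_cols].astype(float).T
table.index = table.index.astype(int)
table.index.name = 'year'

print(table)
```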