### Working with BEA Data

The Bureau of Economic Analysis (https://bea.gov) publishes a wide range of economic data for the US.

Their downloads usually pack several tables into one file.

This file has 1204 rows and 300 columns.

The rows cover three area categories: United States, Region, and State.

There are 20 rows for each geographic area (the United States, each region, and each state).

Each row represents one of the following categories:
- Personal income (millions of dollars, seasonally adjusted)
- Nonfarm personal income
- Farm income
- Population (midperiod, persons)
- Per capita personal income (dollars)
- Earnings by place of work
- Less: Contributions for government social insurance
- Employee and self-employed contributions for government social insurance
- Employer contributions for government social insurance
- Plus: Adjustment for residence
- Equals: Net earnings by place of residence
- Plus: Dividends, interest, and rent
- Plus: Personal current transfer receipts
- Wages and salaries
- Supplements to wages and salaries
- Employer contributions for employee pension and insurance funds
- Employer contributions for government social insurance
- Proprietors' income
- Farm proprietors' income
- Nonfarm proprietors' income

The columns are:
- GeoFIPS
- GeoName - Geographical location (United States, Region or State)
- Region - (1-8)
- TableName - Original BEA table name
- LineCode - (10-72) BEA code
- IndustryClassification 
- Description - Describes each row
- Unit 
- 1948:Q1 through 2020:Q4 - These are the individual quarter values
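
Since the file mixes descriptive columns with one column per quarter, it can help to split the two groups up front. The sketch below is only an illustration: it uses a tiny made-up frame (`toy`) with the same column layout, and it assumes the quarter columns are named exactly like `1948:Q1`.

```python
import re
import pandas as pd

# Toy frame with the same column layout as the BEA file (values are made up)
toy = pd.DataFrame({
    "GeoFIPS": ["06000"],
    "GeoName": ["California"],
    "Description": ["Personal income (millions of dollars, seasonally adjusted)"],
    "1948:Q1": [1.0],
    "1948:Q2": [2.0],
})

# Assumption: quarter columns look like "YYYY:Qn"; everything else is metadata
quarter_pattern = re.compile(r"^\d{4}:Q[1-4]$")
quarter_cols = [c for c in toy.columns if quarter_pattern.match(c)]
meta_cols = [c for c in toy.columns if c not in quarter_cols]

print(meta_cols)
print(quarter_cols)
```

The same two list comprehensions should work on the full `df` once it is loaded, provided the column names follow that pattern.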

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Load the data
df = pd.read_csv('../input/us-quarterly-personal-income-1948-2020/us_quarterly_personal_income_1948_2020.csv')
df.shape

In [None]:
# Take a look at the data
df.head()

Looking at the first few rows, we can see that the data needs to be extracted into individual dataframes to be more usable.

The GeoName column specifies whether the row represents the United States, a region, or a single state. Let's get some rows for California.

In [None]:
# Get all the rows from the original df that match the state we want
new_df = df[df['GeoName'] == "California"]
new_df.head()

We'll extract a single row, then reshape its quarter columns into a more usable DataFrame with one row per year. In this case, we'll take the Personal Income row, which gives personal income in millions of dollars.

In [None]:
# Pick the row (category) we're interested in. Here it's Personal Income for California, at positional index 100 in the original df
new_row = df.iloc[[100]]
new_row.head()
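
A positional index like 100 only works while the file keeps its current row order. As a hedged alternative, the row can be picked by its labels. The helper name `select_row` and the toy data below are hypothetical, and the sketch assumes the Description text for this category starts with "Personal income" (possibly with leading whitespace).

```python
import pandas as pd

# Toy rows shaped like the BEA file (values are made up)
toy = pd.DataFrame({
    "GeoName": ["California", "California", "West Virginia"],
    "Description": [
        "Personal income (millions of dollars, seasonally adjusted)",
        " Farm income",
        "Personal income (millions of dollars, seasonally adjusted)",
    ],
    "1948:Q1": [10.0, 1.0, 3.0],
})

def select_row(df, geo_name, description_prefix):
    """Return the rows for one area whose Description starts with the given text."""
    # Strip whitespace first in case the descriptions are indented
    desc = df["Description"].str.strip()
    mask = (df["GeoName"] == geo_name) & desc.str.startswith(description_prefix)
    return df[mask]

ca_income = select_row(toy, "California", "Personal income")
print(ca_income)
```

Matching on a prefix keeps "Per capita personal income" and "Nonfarm personal income" out of the result, since neither starts with "Personal income".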

In [None]:
# Make DF to pivot the columns into rows
new_df = pd.DataFrame(columns = ['year','Q1','Q2','Q3','Q4'])

# Loop through the years and get the quarterly values from each column
# Note: range(1948, 2020) stops at 2019, so the 2020 columns are not included
for year in range(1948,2020):
    year = str(year)
    df2 = pd.DataFrame(new_row[[year + ':Q1',year + ':Q2',year + ':Q3',year + ':Q4']].astype('float'))

    df2.columns = ['Q1','Q2','Q3','Q4']
    df2.insert(0, 'year', year)
    new_df = pd.concat([new_df,df2], ignore_index=True, axis=0)
    
new_df.head()
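
The same reshape can also be done without a loop. The following is a minimal sketch using `melt` and `pivot` on a made-up one-row frame; it assumes quarter columns contain `:Q` and that no other column does.

```python
import pandas as pd

# One toy row with two years of quarterly values (values are made up)
row = pd.DataFrame({
    "GeoName": ["California"],
    "1948:Q1": [1.0], "1948:Q2": [2.0], "1948:Q3": [3.0], "1948:Q4": [4.0],
    "1949:Q1": [5.0], "1949:Q2": [6.0], "1949:Q3": [7.0], "1949:Q4": [8.0],
})

# Assumption: only the quarter columns contain ":Q"
quarter_cols = [c for c in row.columns if ":Q" in c]

# Wide to long: one record per quarter column
long = row[quarter_cols].melt(var_name="period", value_name="value")
long[["year", "quarter"]] = long["period"].str.split(":", expand=True)
long["value"] = long["value"].astype(float)

# Long to one row per year with Q1..Q4 as columns
by_year = long.pivot(index="year", columns="quarter", values="value").reset_index()
by_year.columns.name = None

print(by_year)
```

This picks up every quarter column that is present, so the year range does not need to be typed in by hand.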

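Now we'll sum the four quarter columns and add a total for each year. (BEA generally reports quarterly personal income at annual rates; where that applies, the mean of the four quarters, rather than the sum, corresponds to the annual figure.)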

In [None]:
# Sum the quarterly amounts
new_df['total'] = new_df[['Q1','Q2','Q3','Q4']].sum(axis=1)
new_df.head()

Now we have 72 rows (1948 through 2019) and 6 columns of quarterly personal income by year for California, including the yearly total.

In [None]:
# Plot the Annual Personal Income by year for California
plt.figure(figsize = (12,5))
plot = sns.lineplot(data=new_df, x='year', y='total')
plt.title("California - Annual Personal Income (in millions)")
plt.xticks(rotation=45)
plt.show()

- Let's grab another state and compare. We'll use West Virginia.

In [None]:
# Get all the rows from the original df that match the state we want
new_df = df[df['GeoName'] == "West Virginia"]
new_df.head()

In [None]:
# Pick the row (category) we're interested in. Here it's Personal Income again, for West Virginia, at positional index 980 in the original df
new_row = df.iloc[[980]]
new_row.head()

In [None]:
# Make a DF to pivot the columns into rows. This is the same as the previous step, so it could be a function
new_df = pd.DataFrame(columns = ['year','Q1','Q2','Q3','Q4'])

# Loop through the years and get the quarterly values from each column
for year in range(1948,2020):
    year = str(year)
    df2 = pd.DataFrame(new_row[[year + ':Q1',year + ':Q2',year + ':Q3',year + ':Q4']].astype('float'))

    df2.columns = ['Q1','Q2','Q3','Q4']
    df2.insert(0, 'year', year)
    new_df = pd.concat([new_df,df2], ignore_index=True, axis=0)
    
new_df.head()

# Sum the quarterly amounts
new_df['total'] = new_df[['Q1','Q2','Q3','Q4']].sum(axis=1)
new_df.head()

In [None]:
# Plot the Annual Personal Income by year for the state of West Virginia
plt.figure(figsize = (12,5))
plot = sns.lineplot(data=new_df, x='year', y='total')
plt.title("West Virginia - Annual Personal Income (in millions)")
plt.xticks(rotation=45)
plt.show()

- Tables for different categories or areas can be built the same way to compare areas, industry types, and so on.

- This is a simple example, but the technique can be used on most of the datasets from BEA.
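
Because the California and West Virginia steps are identical, they can be wrapped in a function and the results placed side by side. This is a sketch under assumptions, not the notebook's method: `annual_totals` is a hypothetical helper, the data is a made-up toy frame, and the Description is again assumed to start with "Personal income". It sums the quarters to mirror the steps above.

```python
import pandas as pd

def annual_totals(df, geo_name, description_prefix="Personal income"):
    """Sum the quarter columns of one area/category row into one value per year."""
    desc = df["Description"].str.strip()
    mask = (df["GeoName"] == geo_name) & desc.str.startswith(description_prefix)
    # Assumption: only the quarter columns contain ":Q"
    quarter_cols = [c for c in df.columns if ":Q" in c]
    values = df.loc[mask, quarter_cols].iloc[0].astype(float)
    # Group the "YYYY:Qn" labels by their year part and sum each group
    return values.groupby(lambda col: col.split(":")[0]).sum().rename(geo_name)

# Toy data shaped like the BEA file (values are made up)
toy = pd.DataFrame({
    "GeoName": ["California", "California", "West Virginia"],
    "Description": [
        "Personal income (millions of dollars, seasonally adjusted)",
        "Farm income",
        "Personal income (millions of dollars, seasonally adjusted)",
    ],
    "1948:Q1": [1.0, 0.1, 0.5], "1948:Q2": [2.0, 0.1, 0.5],
    "1948:Q3": [3.0, 0.1, 0.5], "1948:Q4": [4.0, 0.1, 0.5],
    "1949:Q1": [5.0, 0.2, 1.0], "1949:Q2": [6.0, 0.2, 1.0],
    "1949:Q3": [7.0, 0.2, 1.0], "1949:Q4": [8.0, 0.2, 1.0],
})

# One column per state, one row per year
comparison = pd.concat(
    [annual_totals(toy, "California"), annual_totals(toy, "West Virginia")],
    axis=1,
)
print(comparison)
```

With the real `df` in place of `toy`, calling `comparison.plot()` should draw both states on one chart, which makes the difference in scale easy to see.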