## US Gross Domestic Product by State

- This notebook explores the BEA (https://bea.gov) GDP by State dataset.
- It contains GDP broken down by industry category for the US as a whole, its regions, and each state, for the years 1997-2020.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Load the data
df = pd.read_csv('../input/us-gdp-by-state-19972020/SAGDP2N__ALL_AREAS_1997_2020.csv')
df.shape

In [None]:
df.head()

- This set has 5,528 rows of data in 32 columns.
- The first 8 columns describe the row.
- The GeoName column specifies whether the row refers to the entire US, a region, or a specific state.
- The Description column specifies the industry (e.g. Agriculture, Farms, Textile mills).
- The remaining 24 columns hold the GDP value for each year from 1997 to 2020.
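Because every column after the descriptors is named after a year, the two groups can be separated by name rather than by position. A minimal sketch on a tiny made-up frame that only mimics the layout: `GeoName` and `Description` come from the notebook, while the other six descriptor names are assumed from the usual BEA layout, and all values are placeholders, not real BEA figures.

```python
import pandas as pd

# Assumed descriptor column names (only GeoName and Description are confirmed above)
descriptor_names = ['GeoFIPS', 'GeoName', 'Region', 'TableName',
                    'LineCode', 'IndustryClassification', 'Description', 'Unit']

# Build one toy row: placeholder descriptors, then a few year columns with made-up values
row_values = {name: 'placeholder' for name in descriptor_names}
row_values.update({'GeoName': 'New York', 'Description': 'Oil and gas extraction'})
row_values.update({'1997': 1.0, '1998': 2.0, '1999': 3.0})
toy = pd.DataFrame([row_values])

# Year columns are the ones whose names are all digits; everything else describes the row
year_cols = [col for col in toy.columns if col.isdigit()]
descriptor_cols = [col for col in toy.columns if not col.isdigit()]

print(descriptor_cols)
print(year_cols)
```

On the real `df`, the same two comprehensions should give 8 descriptor columns and 24 year columns, matching the 32-column shape.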

In [None]:
# Count the unique industry categories (nunique() skips missing values)
category_count = df['Description'].nunique()
print(f"There are {category_count} categories, therefore {category_count} rows for each 'area'.")
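The "one row per category for each area" conclusion holds only if every area lists every category exactly once. A sketch of checking that directly with `groupby(...).size()`, shown on a small illustrative frame (the areas and category names are examples, not the full BEA lists):

```python
import pandas as pd

# Toy long-format frame: every area repeats the same set of industry categories
areas = ['United States', 'New York', 'Texas']
categories = ['All industry total', 'Farms', 'Oil and gas extraction']
toy = pd.DataFrame(
    [{'GeoName': area, 'Description': cat} for area in areas for cat in categories]
)

# Number of distinct categories
category_count = toy['Description'].nunique()

# Rows per area; if every area lists each category once, all counts equal category_count
rows_per_area = toy.groupby('GeoName').size()

print(f"There are {category_count} categories")
print(rows_per_area)
print("Every area has one row per category:", (rows_per_area == category_count).all())
```

Running the same `groupby` on the real `df` would reveal any area whose row count differs from the category count.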

In [None]:
# List all the categories
df['Description'].unique()

- These are the industry categories available in the dataset.
- Let's select a specific state and look at its data.

In [None]:
df1 = df[df['GeoName'] == 'New York']
df1.shape

- We get 92 rows for each state, one for each industry category. Let's look at the first few.

In [None]:
df1.head(10)

- Now we can pick a specific industry and grab the row.

In [None]:
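# Select New York's 'Oil and gas extraction' row by label rather than by a hard-coded
# position (positional index 3046 in the full file). Descriptions are stripped in case
# they carry leading spaces that indicate nesting.
row = df1[df1['Description'].str.strip() == 'Oil and gas extraction']
row.head()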
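Selecting by label keeps the lookup correct even if the file's row order changes. A sketch of a reusable helper; `select_row` is a hypothetical name, and stripping `Description` assumes some entries may be indented with leading spaces. The toy values are made up.

```python
import pandas as pd


def select_row(frame: pd.DataFrame, geo_name: str, industry: str) -> pd.DataFrame:
    """Hypothetical helper: rows matching a GeoName and an industry Description.

    Description values are stripped before comparing, assuming some carry
    leading spaces used to show nesting.
    """
    mask = (frame['GeoName'] == geo_name) & (frame['Description'].str.strip() == industry)
    return frame[mask]


# Toy data with made-up values; note the indented Description entries
toy = pd.DataFrame({
    'GeoName': ['New York', 'New York', 'Texas'],
    'Description': ['All industry total', '  Oil and gas extraction', '  Oil and gas extraction'],
    '1997': [1.0, 2.0, 3.0],
})

row = select_row(toy, 'New York', 'Oil and gas extraction')
print(row)
```

With the real data, `select_row(df, 'New York', 'Oil and gas extraction')` should return the same single row as the positional lookup.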

- Now we'll iterate over the year columns and pivot them into a long-format table with one row per year.

In [None]:
# Build one record per year by iterating over the year columns
records = []
for year in range(1997, 2021):  # range() excludes its end value, so 2021 includes 2020
    # to_numeric turns numeric strings into numbers; non-numeric entries become NaN
    value = pd.to_numeric(row[str(year)].iloc[0], errors='coerce')
    records.append({'year': year, 'Oil and gas extraction': value})

# Create the DataFrame once instead of concatenating inside the loop
new_df = pd.DataFrame(records)
new_df.head(10)
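pandas can do the same wide-to-long pivot without an explicit loop via `DataFrame.melt`. A sketch on a made-up single-row frame; the `'(D)'` entry is an assumed example of a non-numeric suppression code, included to show it being converted to NaN.

```python
import pandas as pd

# Toy single-row frame in the wide layout (made-up values, not real BEA figures)
row = pd.DataFrame({
    'GeoName': ['New York'],
    'Description': ['Oil and gas extraction'],
    '1997': ['1.0'], '1998': ['(D)'], '1999': ['3.0'],
})

year_cols = [col for col in row.columns if col.isdigit()]

# melt() turns each year column into its own row: one (year, value) pair per row
long_df = row.melt(id_vars=['GeoName', 'Description'], value_vars=year_cols,
                   var_name='year', value_name='Oil and gas extraction')

# Years become ints; values become floats, with non-numeric placeholders coerced to NaN
long_df['year'] = long_df['year'].astype(int)
long_df['Oil and gas extraction'] = pd.to_numeric(
    long_df['Oil and gas extraction'], errors='coerce')

print(long_df[['year', 'Oil and gas extraction']])
```

Integer years also give the line plot a numeric x-axis rather than a categorical one.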

In [None]:
# Plot the GDP of oil and gas extraction in New York, 1997-2020
plt.figure(figsize = (12,5))
plt.title('Oil and Gas Extraction in New York 1997-2020 (millions of dollars)')
sns.lineplot(data=new_df, x='year', y='Oil and gas extraction');

- This is just a simple example of how to drill down into BEA data; there are certainly more efficient methods.
- There's lots of interesting data to explore in this set!
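One way to extend the drill-down is to select a single industry for several areas at once and reshape them together. A sketch under these assumptions: `industry_by_area` is a hypothetical helper, year columns are identified by all-digit names, and the toy values are made up.

```python
from typing import List

import pandas as pd


def industry_by_area(frame: pd.DataFrame, industry: str, areas: List[str]) -> pd.DataFrame:
    """Hypothetical helper: long-format (GeoName, year, value) rows for one industry."""
    year_cols = [col for col in frame.columns if col.isdigit()]
    # Keep only the requested areas and the requested (whitespace-stripped) industry
    subset = frame[frame['GeoName'].isin(areas)
                   & (frame['Description'].str.strip() == industry)]
    # One row per (area, year) pair
    long_df = subset.melt(id_vars='GeoName', value_vars=year_cols,
                          var_name='year', value_name='value')
    long_df['year'] = long_df['year'].astype(int)
    long_df['value'] = pd.to_numeric(long_df['value'], errors='coerce')
    return long_df.sort_values(['GeoName', 'year'], ignore_index=True)


# Toy data with made-up values
toy = pd.DataFrame({
    'GeoName': ['New York', 'Texas', 'Texas'],
    'Description': ['  Oil and gas extraction', '  Oil and gas extraction', '  Farms'],
    '1997': [1.0, 10.0, 5.0],
    '1998': [2.0, 20.0, 6.0],
})

result = industry_by_area(toy, 'Oil and gas extraction', ['New York', 'Texas'])
print(result)
```

With the real `df`, the result could be passed to `sns.lineplot(data=result, x='year', y='value', hue='GeoName')` to draw one line per state.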