This is my "sandbox" to play around with the BEA api. My plan is to use this for my regional cycle project. Below I explain how to use the BEA api (stuff has changed since I last messed around with it)

In [1]:
import pandas as pd
import requests
import numpy as np

In [2]:
BEA_ID = "6BF79D8C-8042-4196-88DC-0E0C55B0C3B6"

After getting your key. [Here is some basic documentation](https://www.bea.gov/API/bea_web_service_api_user_guide.htm). Like a lot APIs, basic idea is to specify the URL in the proper way and it will return some output in a specified format. We can then convert the format (its .json) into a dataframe:



In [147]:
API_URL = "https://bea.gov/api/data?&UserID=6BF79D8C-8042-4196-88DC-0E0C55B0C3B6&method=GETDATASETLIST&ResultFormat=JSON&"
    
r = requests.get(API_URL)

In [152]:
type(r)


requests.models.Response

In [153]:
type(r.json())

dict

In [160]:
print(r.json().keys())

print(r.json()['BEAAPI'].keys())

print(r.json()['BEAAPI']["Results"].keys())

dict_keys(['BEAAPI'])
dict_keys(['Results', 'Request'])
dict_keys(['Dataset'])


This then leads to the insight that the `.json()` is a bunch of dictionaries within the dictionaries and that by working through it we can find a "root dictionary" that can be converted into a usable DataFrame

In [161]:
df = pd.DataFrame(r.json()["BEAAPI"]["Results"]['Dataset'])

In [162]:
df

Unnamed: 0,DatasetDescription,DatasetName
0,The RegionalData dataset is obsolete. Please u...,RegionalData
1,Standard NIPA tables,NIPA
2,Standard NI underlying detail tables,NIUnderlyingDetail
3,Multinational Enterprises,MNE
4,Standard Fixed Assets tables,FixedAssets
5,International Transactions Accounts,ITA
6,International Investment Position,IIP
7,GDP by Industry,GDPbyIndustry
8,Regional Income data sets,RegionalIncome
9,Regional Product data sets,RegionalProduct


Then this gives us the different datasets that are available through the BEA api. Now below, I'm going to grab personal income at the county level. The documentation for grabing this is [here](https://www.bea.gov/API/bea_web_service_api_user_guide.htm)

In [42]:
years = range(1969,2016)
years = list(years)

years = "".join(str(years))

years = years[1:-1]

In [43]:
years

'1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015'

In [44]:
my_key = "https://bea.gov/api/data?&UserID=6BF79D8C-8042-4196-88DC-0E0C55B0C3B6&method=GetData&"

data_set = "datasetname=RegionalIncome&"

table_and_line_income = "TableName=CA1&LineCode=3&"

table_and_line_population = "TableName=CA1&LineCode=2&"

year = "Year=" + years + "&"

location = "GeoFips=COUNTY&"

form = "ResultFormat=json"

In [45]:
API_URL = my_key + data_set + table_and_line_income + year + location + form

r = requests.get(API_URL)

df_income = pd.DataFrame(r.json()["BEAAPI"]["Results"]["Data"])

In [46]:
df_income.drop(['CL_UNIT', 'Code',"NoteRef", "UNIT_MULT"], axis=1, inplace = True)

#df["DataValue"].column = "IncomePC"

df_income.rename(columns={"DataValue":"IncomePC"}, inplace=True)

df_income.head()

Unnamed: 0,IncomePC,GeoFips,GeoName,TimePeriod
0,28627,0,United States,1999
1,31540,0,United States,2001
2,38144,0,United States,2006
3,19985,0,United States,1991
4,21698,0,United States,1993


In [47]:
API_URL = my_key + data_set + table_and_line_population + year + location + form

r = requests.get(API_URL)

population = pd.DataFrame(r.json()["BEAAPI"]["Results"]["Data"])

population.drop(['CL_UNIT', 'Code',"NoteRef", "UNIT_MULT", "GeoName"], axis=1, inplace = True)

#df["DataValue"].column = "IncomePC"

population.rename(columns={"DataValue":"Population"}, inplace=True)

population.head()

Unnamed: 0,Population,GeoFips,TimePeriod
0,306771529,0,2009
1,219760875,0,1977
2,215456585,0,1975
3,266278393,0,1995
4,233792014,0,1983


In [48]:
combo = pd.merge(population, df_income,   # left df, right df
                 how='inner',      # Try the different options, inner, outer, left, right...what happens.
                 on=['GeoFips',"TimePeriod"],       # link with cntry
                 indicator=True)  # Tells us what happend

In [49]:
combo["TimePeriod"] = pd.to_datetime(combo["TimePeriod"], infer_datetime_format = True)

#combo.sort_values(by = "TimePeriod", inplace = True)

In [50]:
combo.set_index(["GeoFips","TimePeriod"],inplace = True)

In [51]:
combo.sort_index(level="GeoFips", inplace = True)

In [53]:
combo.head(50)

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,IncomePC,GeoName,_merge
GeoFips,TimePeriod,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,1969-01-01,201298000,3930,United States,both
0,1970-01-01,203798722,4196,United States,both
0,1971-01-01,206817509,4468,United States,both
0,1972-01-01,209274882,4853,United States,both
0,1973-01-01,211349205,5352,United States,both
0,1974-01-01,213333635,5824,United States,both
0,1975-01-01,215456585,6312,United States,both
0,1976-01-01,217553859,6856,United States,both
0,1977-01-01,219760875,7494,United States,both
0,1978-01-01,222098244,8338,United States,both


In [57]:
combo.loc["02020"]

Unnamed: 0_level_0,Population,IncomePC,GeoName,_merge
TimePeriod,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1969-01-01,123265,6232,"Anchorage Municipality, AK",both
1970-01-01,127569,6823,"Anchorage Municipality, AK",both
1971-01-01,134619,7102,"Anchorage Municipality, AK",both
1972-01-01,143203,7348,"Anchorage Municipality, AK",both
1973-01-01,147314,7853,"Anchorage Municipality, AK",both
1974-01-01,152386,9320,"Anchorage Municipality, AK",both
1975-01-01,165035,11439,"Anchorage Municipality, AK",both
1976-01-01,174496,12853,"Anchorage Municipality, AK",both
1977-01-01,177003,14071,"Anchorage Municipality, AK",both
1978-01-01,179642,13998,"Anchorage Municipality, AK",both


In [288]:
df.DataValue.replace("(NA)", np.nan, inplace = True)

In [289]:
df.DataValue = df.DataValue.astype(float)

In [290]:
df.dtypes

CL_UNIT        object
Code           object
DataValue     float64
GeoFips        object
GeoName        object
NoteRef        object
TimePeriod     object
UNIT_MULT      object
dtype: object

In [291]:
df["TimePeriod"] = pd.to_datetime(df["TimePeriod"], infer_datetime_format = True)

In [292]:
df.sort_values(by = "TimePeriod", inplace = True)


In [294]:
df.head()

Unnamed: 0,CL_UNIT,Code,DataValue,GeoFips,GeoName,NoteRef,TimePeriod,UNIT_MULT
0,dollars,CA1-3,3930.0,0,United States,,1969-01-01,0
3535,dollars,CA1-3,2800.0,31123,"Morrill, NE",,1969-01-01,0
3536,dollars,CA1-3,3093.0,31125,"Nance, NE",,1969-01-01,0
3539,dollars,CA1-3,3727.0,31127,"Nemaha, NE",,1969-01-01,0
3540,dollars,CA1-3,3106.0,31129,"Nuckolls, NE",,1969-01-01,0


In [346]:
grp = df.groupby("GeoFips")

In [310]:
def log_diff_income(df):
    df["growth"] = np.log(df.DataValue).diff() / 46
    
    return df

In [388]:
transform_dict = {"DataValue": ["first","last", lambda x: np.diff(np.log(x))/46]}

new_df = grp.agg(transform_dict)

In [394]:
new_df["DataValue"]["last"].corr(new_df["DataValue"]["<lambda>"])

0.414791642511866

In [367]:
test =  new_df.diff(axis = 1).toseries

AttributeError: 'DataFrame' object has no attribute 'toseries'

In [385]:
np.diff?

In [309]:
grp = df.groupby("GeoFips")
grp.get_group("44007").TimePeriod.diff()

Object `TimePeriod.diff` not found.


In [None]:
grp.get_group("44007").TimePeriod.diff

In [None]:
grp.get_group("44007").sort_value

In [None]:
grp.get_group("44007").sort_values

In [277]:
new_df.sort_values?

In [332]:
new_df.growth.corr(new_df.loc["1969-01-01"].DataValue)

nan

In [325]:
new_df.set_index("TimePeriod", inplace = True)

In [376]:
df.diff?