### Messing around with the FRED API

In [1]:
# Import stuff
import pandas as pd
import numpy as np
import time
import requests
import json

- K: number of features
- T: number of timesteps
- N: number of examples

At the end of the day, we want our dataset in the following format (NxTxK):
- each example will have an associated county and timeframe
- each example will have inputs (KxT)
- each example will have targets (1xT)
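To make the target layout concrete, here is a minimal sketch with placeholder sizes. The sizes, the zero-filled arrays, and the metadata fields are illustrative assumptions, not values from FRED. It treats each example's KxT input matrix as the transpose of one TxK slice of the stacked array.

```python
import numpy as np

# Placeholder sizes, chosen only for illustration.
N, T, K = 4, 12, 3  # examples, timesteps, features

# Stacked dataset: one (T x K) slice per example.
inputs = np.zeros((N, T, K))
# One target value per timestep for each example.
targets = np.zeros((N, T, 1))

# Metadata kept alongside the arrays: the county and timeframe of each example.
# The names and dates here are hypothetical.
example_meta = [
    {"county": f"county_{i}", "start": "2017-01-01", "end": "2017-12-01"}
    for i in range(N)
]

# A single example viewed as (K x T), matching the per-example description.
first_example_kt = inputs[0].T
print(inputs.shape, targets.shape, first_example_kt.shape)
```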

#### How the data is organized in the FRED API:
- series: what we might think of as a dataset; a list of (time, value) pairs
- category: a general bucket containing many series or child categories

#### Individual categories:
- county: each county has a unique id
- msa: each metro statistical area has a unique id and will be associated with many counties
- state: each state has a unique id and will be associated with many counties
- country: USA has a unique id and will be associated with all counties

Series: each series has a unique id and is tied to one geography (county, metro statistical area (MSA), state, or country) through that geography's category id

#### Getting the actual data
We can build the feature matrix for one example as follows:
- Choose a county
- Request all series titles for the category_id corresponding to the county
- Request the full series for each title from before (we can fill some of our KxT)
- Get the state that the county is in
- Include the features for the state (fill in some more of our KxT)
- Include the features for the country (fill in some more of our KxT)
- Get the msa associated with the county
- Include the features for the msa (fill in the rest of our KxT)
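The steps above amount to collecting one table of series per geography and joining them on date. The sketch below shows only that joining step, on toy data. `assemble_features`, the column names, and the values are hypothetical; the real tables would come from the API requests.

```python
import pandas as pd

def assemble_features(frames_by_geography):
    """Join per-geography tables (indexed by date) into one T x K table.

    Column names are prefixed with the geography so that a county-level and
    a state-level series with the same title stay distinct.
    """
    prefixed = [
        frame.add_prefix(f"{geography}: ")
        for geography, frame in frames_by_geography.items()
    ]
    # Outer join keeps every date that appears in any table; gaps become NaN.
    return pd.concat(prefixed, axis=1, join="outer").sort_index()

dates = pd.to_datetime(["2017-01-01", "2018-01-01"])
frames = {
    "county": pd.DataFrame({"feature_a": [1.0, 2.0]}, index=dates),
    "state": pd.DataFrame({"feature_a": [3.0, 4.0]}, index=dates),
    "country": pd.DataFrame({"feature_b": [5.0, 6.0]}, index=dates),
    "msa": pd.DataFrame({"feature_c": [7.0, 8.0]}, index=dates),
}
features = assemble_features(frames)  # rows are timesteps, columns are features
kxt = features.to_numpy().T           # the K x T matrix for this example
```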

To get the features for each state:
- Request the series_ids for the category_id corresponding to the state

To get the features for the country:
- Request the series_ids for the category_id corresponding to the country

To get the features for each msa:
- Request the series_ids for the category_id corresponding to the msa
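Since the state, country, and MSA lookups differ only in the category_id, one small helper could cover all three. This is a sketch using only the standard library to build the URL; `fred_url` and `series_ids_from_response` are hypothetical names, and `"YOUR_KEY"` is a placeholder.

```python
from urllib.parse import urlencode

FRED_BASE = "https://api.stlouisfed.org/fred"

def fred_url(endpoint, api_key, **params):
    """Build a request URL for a FRED endpoint such as 'category/series'."""
    query = {"api_key": api_key, "file_type": "json", **params}
    return f"{FRED_BASE}/{endpoint}?{urlencode(query)}"

def series_ids_from_response(response_json):
    """Pull the series ids out of a parsed 'category/series' response."""
    return [series["id"] for series in response_json.get("seriess", [])]

# Only the category_id changes between a state, the country, and an MSA.
url = fred_url("category/series", "YOUR_KEY", category_id=27282)

# A hand-written stand-in for a parsed response, so no request is sent here.
fake_response = {"seriess": [{"id": "2020RATIO001001"}]}
ids = series_ids_from_response(fake_response)
```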

In [2]:
# Get the category id of each state (the children of category 27281)

params = {
            'category_id':27281,
            'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
            'file_type': 'json'
         }

r = requests.get(url='https://api.stlouisfed.org/fred/category/children', params=params)
res = r.json()

states = res['categories']

for state in states:
    print(state)

{'id': 27282, 'name': 'Alabama', 'parent_id': 27281}
{'id': 27283, 'name': 'Alaska', 'parent_id': 27281}
{'id': 27284, 'name': 'Arizona', 'parent_id': 27281}
{'id': 149, 'name': 'Arkansas', 'parent_id': 27281}
{'id': 27286, 'name': 'California', 'parent_id': 27281}
{'id': 27287, 'name': 'Colorado', 'parent_id': 27281}
{'id': 27288, 'name': 'Connecticut', 'parent_id': 27281}
{'id': 27289, 'name': 'Delaware', 'parent_id': 27281}
{'id': 27290, 'name': 'District of Columbia', 'parent_id': 27281}
{'id': 27291, 'name': 'Florida', 'parent_id': 27281}
{'id': 27292, 'name': 'Georgia', 'parent_id': 27281}
{'id': 27293, 'name': 'Hawaii', 'parent_id': 27281}
{'id': 27294, 'name': 'Idaho', 'parent_id': 27281}
{'id': 150, 'name': 'Illinois', 'parent_id': 27281}
{'id': 151, 'name': 'Indiana', 'parent_id': 27281}
{'id': 27297, 'name': 'Iowa', 'parent_id': 27281}
{'id': 27298, 'name': 'Kansas', 'parent_id': 27281}
{'id': 152, 'name': 'Kentucky', 'parent_id': 27281}
{'id': 27300, 'name': 'Louisiana', 'p
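The ids in this listing are not contiguous (Arkansas is 149 while its neighbors are in the 27000s), so looking a state up by name is safer than relying on position or id arithmetic. A minimal sketch, using three rows copied from the output above; `sample_states` and `state_id_by_name` are names introduced here:

```python
# Three entries copied from the response above.
sample_states = [
    {"id": 27282, "name": "Alabama", "parent_id": 27281},
    {"id": 27283, "name": "Alaska", "parent_id": 27281},
    {"id": 149, "name": "Arkansas", "parent_id": 27281},
]

# Map each state name to its category id.
state_id_by_name = {state["name"]: state["id"] for state in sample_states}

print(state_id_by_name["Arkansas"])
```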

In [3]:
# Intermediate request: each state has a 'Counties' child category,
# and we need its id before we can list the state's counties
state_id = states[0]['id']
state_name = states[0]['name']
params = {
    'category_id': state_id,
    'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
    'file_type': 'json'
}
r = requests.get(url = 'https://api.stlouisfed.org/fred/category/children', params=params)
res = r.json()

state_counties_id = None
for child in res['categories']:
    if child['name'] == 'Counties':
        state_counties_id = child['id']
        break  # stop at the first match
        
print(state_counties_id)

27335
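The same lookup can be written as a small function that returns `None` when a state has no matching child, which makes the missing case explicit. A sketch; `find_child_id` is a hypothetical helper, and the second entry in `sample_children` is invented for illustration.

```python
def find_child_id(children, name):
    """Return the id of the first child category called `name`, else None."""
    return next((child["id"] for child in children if child["name"] == name), None)

sample_children = [
    # Hypothetical sibling category, included only to show the filtering.
    {"id": 0, "name": "Some other child", "parent_id": 27282},
    # The 'Counties' child found for Alabama above.
    {"id": 27335, "name": "Counties", "parent_id": 27282},
]

counties_id = find_child_id(sample_children, "Counties")
missing_id = find_child_id(sample_children, "Does not exist")
```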


In [4]:
# Here we actually get the counties
params = {
    'category_id': state_counties_id,
    'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
    'file_type': 'json'
}
r = requests.get(url = 'https://api.stlouisfed.org/fred/category/children', params=params)
res = r.json()
print(json.dumps(res['categories'], indent=4))

counties_in_state = res['categories']

[
    {
        "id": 27336,
        "name": "Autauga County, AL",
        "parent_id": 27335
    },
    {
        "id": 27337,
        "name": "Baldwin County, AL",
        "parent_id": 27335
    },
    {
        "id": 27338,
        "name": "Barbour County, AL",
        "parent_id": 27335
    },
    {
        "id": 27339,
        "name": "Bibb County, AL",
        "parent_id": 27335
    },
    {
        "id": 27340,
        "name": "Blount County, AL",
        "parent_id": 27335
    },
    {
        "id": 27341,
        "name": "Bullock County, AL",
        "parent_id": 27335
    },
    {
        "id": 27342,
        "name": "Butler County, AL",
        "parent_id": 27335
    },
    {
        "id": 27343,
        "name": "Calhoun County, AL",
        "parent_id": 27335
    },
    {
        "id": 27344,
        "name": "Chambers County, AL",
        "parent_id": 27335
    },
    {
        "id": 27345,
        "name": "Cherokee County, AL",
        "parent_id": 27335
    },
    {
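Repeating the last two requests for every state gives the full county list. The sketch below walks state → 'Counties' → county with the request function passed in, so it can be tried without touching the network. `iter_counties` and `fetch_children` are hypothetical names, and the pause between requests is an assumption about being polite to the API, not a documented requirement.

```python
import time

def iter_counties(states, fetch_children, delay=0.0):
    """Yield (state_name, county_dict) for every county under each state.

    fetch_children(category_id) must return the list of child categories.
    """
    for state in states:
        children = fetch_children(state["id"])
        counties_id = next(
            (c["id"] for c in children if c["name"] == "Counties"), None
        )
        if counties_id is None:
            continue  # no 'Counties' bucket under this state
        time.sleep(delay)
        for county in fetch_children(counties_id):
            yield state["name"], county
        time.sleep(delay)

# Stand-in for the API: a dict from category id to its children,
# filled with ids and names seen in the outputs above.
fake_tree = {
    27282: [{"id": 27335, "name": "Counties", "parent_id": 27282}],
    27335: [
        {"id": 27336, "name": "Autauga County, AL", "parent_id": 27335},
        {"id": 27337, "name": "Baldwin County, AL", "parent_id": 27335},
    ],
}
sample_states = [{"id": 27282, "name": "Alabama", "parent_id": 27281}]
pairs = list(iter_counties(sample_states, lambda cid: fake_tree.get(cid, [])))
```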
     

In [5]:
# Get the series titles for a county
county_id = counties_in_state[0]['id']
county_name = counties_in_state[0]['name']
print(county_name)
params = {
    'category_id': county_id,
    'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
    'file_type': 'json'
}
r = requests.get(url = 'https://api.stlouisfed.org/fred/category/series', params=params)
res = r.json()
series_for_county = res['seriess']  # 'seriess' (double s) is the key in the response, not a typo
print(json.dumps(series_for_county[0:5], indent=4))

Autauga County, AL
[
    {
        "id": "2020RATIO001001",
        "realtime_start": "2020-04-10",
        "realtime_end": "2020-04-10",
        "title": "Income Inequality in Autauga County, AL",
        "observation_start": "2010-01-01",
        "observation_end": "2018-01-01",
        "frequency": "Annual",
        "frequency_short": "A",
        "units": "Ratio",
        "units_short": "Ratio",
        "seasonal_adjustment": "Not Seasonally Adjusted",
        "seasonal_adjustment_short": "NSA",
        "last_updated": "2019-12-19 11:06:37-06",
        "popularity": 1,
        "group_popularity": 1,
        "notes": "This data represents the ratio of the mean income for the highest quintile (top 20 percent) of earners divided by the mean income of the lowest quintile (bottom 20 percent) of earners in a particular county. Multiyear estimates from the American Community Survey (ACS) are \"period\" estimates derived from a data sample collected over a period of time, as opposed to \"p
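Each series carries its own `frequency_short`, which matters later because series with different frequencies do not share timesteps. A small sketch of grouping series ids by frequency; `group_ids_by_frequency` is a hypothetical helper, and the second sample entry is invented.

```python
from collections import defaultdict

def group_ids_by_frequency(series_list):
    """Map each frequency_short code to the ids of the series that use it."""
    groups = defaultdict(list)
    for series in series_list:
        groups[series["frequency_short"]].append(series["id"])
    return dict(groups)

sample_series = [
    # Taken from the response above.
    {"id": "2020RATIO001001", "frequency_short": "A"},
    # Hypothetical monthly series; the id is invented for illustration.
    {"id": "EXAMPLE_MONTHLY", "frequency_short": "M"},
]
by_frequency = group_ids_by_frequency(sample_series)
```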

In [7]:
# Get the observations for a series
current_series = series_for_county[0]
series_id = current_series['id']

params = {
    'series_id': series_id,
    'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
    'file_type': 'json'
}
r = requests.get(url = 'https://api.stlouisfed.org/fred/series/observations', params=params)
res1 = r.json()

In [8]:
print(series_id)
print(json.dumps(res1, indent=4))

2020RATIO001001
{
    "realtime_start": "2020-04-10",
    "realtime_end": "2020-04-10",
    "observation_start": "1600-01-01",
    "observation_end": "9999-12-31",
    "units": "lin",
    "output_type": 1,
    "file_type": "json",
    "order_by": "observation_date",
    "sort_order": "asc",
    "count": 9,
    "offset": 0,
    "limit": 100000,
    "observations": [
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2010-01-01",
            "value": "10.938249904470768"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2011-01-01",
            "value": "10.690538383398636"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2012-01-01",
            "value": "11.112458162885831"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "
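Note that every `value` in the response is a string. Before the observations can go into a feature matrix they need to become numbers with real dates. A sketch of one way to do that with pandas; `observations_to_series` is a hypothetical helper, and it assumes that a missing value arrives as a non-numeric placeholder such as `"."`, which `errors="coerce"` turns into NaN.

```python
import pandas as pd

def observations_to_series(observations, name):
    """Turn a list of FRED observation dicts into a float Series indexed by date."""
    frame = pd.DataFrame(observations, columns=["date", "value"])
    index = pd.DatetimeIndex(pd.to_datetime(frame["date"]), name="date")
    # Anything that cannot be parsed as a number becomes NaN.
    values = pd.to_numeric(frame["value"], errors="coerce")
    return pd.Series(values.to_numpy(), index=index, name=name)

sample_observations = [
    # First two rows copied from the response above.
    {"date": "2010-01-01", "value": "10.938249904470768"},
    {"date": "2011-01-01", "value": "10.690538383398636"},
    # Hypothetical row with a missing-value placeholder.
    {"date": "2019-01-01", "value": "."},
]
income_inequality = observations_to_series(sample_observations, "Income Inequality")
```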

In [9]:
# Get the observations for another series
next_series = series_for_county[1]
series_id = next_series['id']

params = {
    'series_id': series_id,
    'api_key': 'e76fcf746d3ca3cc025c0803dd212fc8',
    'file_type': 'json'
}
r = requests.get(url = 'https://api.stlouisfed.org/fred/series/observations', params=params)
res2 = r.json()
print(json.dumps(res2, indent=4))

{
    "realtime_start": "2020-04-10",
    "realtime_end": "2020-04-10",
    "observation_start": "1600-01-01",
    "observation_end": "9999-12-31",
    "units": "lin",
    "output_type": 1,
    "file_type": "json",
    "order_by": "observation_date",
    "sort_order": "asc",
    "count": 45,
    "offset": 0,
    "limit": 100000,
    "observations": [
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2016-07-01",
            "value": "269.0"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2016-08-01",
            "value": "258.0"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2016-09-01",
            "value": "252.0"
        },
        {
            "realtime_start": "2020-04-10",
            "realtime_end": "2020-04-10",
            "date": "2016-10-01",
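The cell that combines the observations calls `get_series_name`, which none of the cells shown here define. Judging from the column names in its output, it appears to strip the county from the series title. The version below is a guess at that behavior, not the notebook's actual definition.

```python
def get_series_name(county_name, series_title):
    """Drop a trailing ' in <county_name>' from a series title, if present.

    Guessed behavior: the original definition is not shown in the notebook.
    """
    suffix = f" in {county_name}"
    if series_title.endswith(suffix):
        return series_title[: -len(suffix)]
    return series_title

name = get_series_name("Autauga County, AL", "Income Inequality in Autauga County, AL")
print(name)
```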
       

In [11]:
# Combine two sets of observations into one DataFrame
# (get_series_name is not defined in any of the cells shown here)
county_data = pd.DataFrame(columns=['date'])

for series, res in zip(series_for_county[:2], [res1, res2]):
    df_series = pd.DataFrame(res['observations'])
    # Name the value column after the series, and drop the realtime columns
    df_series = df_series.rename(columns={'value': get_series_name(county_name, series['title'])})
    df_series = df_series.drop(['realtime_start', 'realtime_end'], axis=1)
    county_data = pd.merge(county_data, df_series, how='outer', on=['date'])

county_data.set_index('date', inplace=True)
print(county_data)

             Income Inequality Housing Inventory: Active Listing Count
date                                                                  
2010-01-01  10.938249904470768                                     NaN
2011-01-01  10.690538383398636                                     NaN
2012-01-01  11.112458162885831                                     NaN
2013-01-01  10.792501714546978                                     NaN
2014-01-01  11.025451749327182                                     NaN
2015-01-01  11.869328638873329                                     NaN
2016-01-01  12.374940823733628                                     NaN
2017-01-01  13.917740895907187                                   177.0
2018-01-01  15.725662194702442                                   192.0
2016-07-01                 NaN                                   269.0
2016-08-01                 NaN                                   258.0
2016-09-01                 NaN                                   252.0
2016-1
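In the merged output the values are still strings, the rows are not in date order, and the annual series is NaN on every monthly date. One possible cleanup is sketched below on a few values copied from the outputs above. Forward-filling the annual series is an assumption (that an annual value holds until the next one arrives), not something the notebook decides.

```python
import pandas as pd

# A few observations copied from the outputs above, already converted to floats.
income = pd.Series(
    [12.374940823733628, 13.917740895907187],
    index=pd.to_datetime(["2016-01-01", "2017-01-01"]),
    name="Income Inequality",
)
listings = pd.Series(
    [269.0, 258.0, 177.0],
    index=pd.to_datetime(["2016-07-01", "2016-08-01", "2017-01-01"]),
    name="Housing Inventory: Active Listing Count",
)

# Outer-join on the date index, then put the rows in chronological order.
combined = pd.concat([income, listings], axis=1).sort_index()
combined.index.name = "date"

# Assumption: carry each annual value forward until the next annual observation.
combined["Income Inequality"] = combined["Income Inequality"].ffill()
print(combined)
```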