# Downloading Data with APIs

* Contact: Lachlan Deer, @ldeer [econgit], @lachlandeer [github]

In [1]:
import requests
import json
import pandas as pd
import csv

## Import BLS Area Codes

In [2]:
bls_area_codes = pd.read_csv('data/bls_area_codes.csv', sep=",")

How does this data look?

In [3]:
bls_area_codes.head()

Unnamed: 0,area_type_code,area_code,area_text
0,A,ST0100000000000,Alabama
1,A,ST0200000000000,Alaska
2,A,ST0400000000000,Arizona
3,A,ST0500000000000,Arkansas
4,A,ST0600000000000,California


This suggests that `area_type_code == "A"` selects all the states. The BLS help pages confirm this: https://www.bls.gov/help/def/la.htm

Let's extract only the state codes that we want:

In [4]:
states = bls_area_codes.query('area_type_code == "A"')

And check that the data frame has the expected shape (52 rows: the 50 states plus the District of Columbia and Puerto Rico):

In [5]:
states.shape

(52, 3)

## Generating Series IDs

To extract data from the BLS API we have to pass an ID for each series that we want to get. To start, take a look here: https://www.bls.gov/help/hlpforma.htm#LA to get a sense of how we need to assemble a series ID.

We learn that:
* `LA` denotes the local area unemployment data
* `U` denotes unadjusted data, while `S` denotes seasonally adjusted data
* We then have to pass an area code; the `area_code` column of the BLS file above already contains the complete code we need
* The final two digits select the measure we want:
    * 03 is unemployment rate
    * 04 is unemployment
    * 05 is employment
    * 06 is labor force
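
The assembly rules above can be sketched as a small helper. The function name `build_series_id` and its argument names are illustrative assumptions, not part of the BLS documentation; the layout is taken directly from the list above.

```python
def build_series_id(area_code, measure_code, seasonal_adjustment="U"):
    """Assemble a local area unemployment series ID.

    Layout assumed from the list above: 'LA', one adjustment letter
    ('U' or 'S'), the BLS area code, then a two-digit measure code.
    """
    if seasonal_adjustment not in ("U", "S"):
        raise ValueError("seasonal_adjustment must be 'U' or 'S'")
    if measure_code not in ("03", "04", "05", "06"):
        raise ValueError("unknown measure code: " + measure_code)
    return "LA" + seasonal_adjustment + area_code + measure_code


# Unadjusted unemployment rate for California
print(build_series_id("ST0600000000000", "03"))
```

Validating the two short codes up front means a typo fails immediately rather than producing an ID the API does not recognise.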

So if we want the Unemployment rate for California, the code is

In [6]:
CA_unemp = 'LAU' + 'ST0600000000000' + '03'
print(CA_unemp)

LAUST060000000000003


## Setting up the API Call - Downloading a Single Series

Now we need to set up a call to the BLS API that returns the data to us, loosely following the BLS instructions.

We need an API key to make the number of calls we will make today. Register here: https://data.bls.gov/registrationEngine/.

Set up a Python file called `api_key.py` that defines one variable, `BLS_KEY`, holding our private key. Make sure this file is ignored by your version control software so you don't share the key with the world.
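
If you would rather not keep a key file at all, one alternative is to read the key from an environment variable. This is only a sketch: the helper `load_bls_key` and the variable name `BLS_KEY` in the environment are assumptions made for illustration, and the rest of this notebook continues to use `api_key.py`.

```python
import os


def load_bls_key(env_var="BLS_KEY"):
    """Return the API key stored in an environment variable (hypothetical setup)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("Set the " + env_var + " environment variable to your BLS key")
    return key
```
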

In [7]:
# Load BLS API key
from api_key import BLS_KEY

In [8]:
# set up api call
headers = {'Content-type': 'application/json'}
data_request = json.dumps({"seriesid":[ CA_unemp ],
                           "startyear": "2000", 
                           "endyear": "2016",
                           "registrationkey": BLS_KEY})

# make the call
payload = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', 
                        data=data_request, headers=headers)
# get the data
json_data = json.loads(payload.text)
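
Before working with the result, it can be worth checking that the request succeeded. The sketch below assumes the parsed response carries a top-level `status` string and a `message` list alongside `Results`; treat those field names as an assumption to verify against a real response. The `sample` dictionary is hand-written for illustration and is not real BLS output.

```python
def check_bls_response(response_json):
    """Return the list of series, raising if the API reported a failure."""
    status = response_json.get("status")
    if status != "REQUEST_SUCCEEDED":
        messages = "; ".join(response_json.get("message", []))
        raise RuntimeError("BLS request failed (" + str(status) + "): " + messages)
    return response_json["Results"]["series"]


# Hand-written stand-in for a parsed response
sample = {
    "status": "REQUEST_SUCCEEDED",
    "message": [],
    "Results": {"series": [{"seriesID": "EXAMPLE", "data": []}]},
}
series_list = check_bls_response(sample)
print(len(series_list))
```
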

Let's see what the data looks like:

In [16]:
#json_data

Wow, that's a little ugly, but it has all the data that we need. Let's find a way to get that into a nice table format.

In [10]:
fields = ["state","year","period","value","footnotes"]
data   = json_data['Results']['series']

Getting values out of `data` is a little tricky because of how it is returned: it is a list of length 1, so we index its first element.

In [11]:
seriesID = data[0]['seriesID']
print(seriesID)

LAUST060000000000003


We will put the data in a csv file with the following destination:

In [12]:
outfile = 'data/bls-employment-state/' + seriesID + '.csv'

For each row of the data, we want the State Name, plus the fields specified above as columns.
To make the code readily extendable to extracting multiple series, we will get the state code from the `seriesID` and look up the corresponding State Name from the BLS area code data set we have in the session 

In [13]:
iState = seriesID[3:-2]
# .iloc[0] takes the first (and only) match; get_value() was removed in pandas 1.0
state = states.query('area_code == @iState')['area_text'].iloc[0]

print(state)

California
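
The slice `seriesID[3:-2]` works because the first three characters and the last two are not part of the area code. A minimal sketch that names every component makes this explicit; `split_series_id` is a hypothetical helper, not something the BLS provides.

```python
def split_series_id(series_id):
    """Split a local area unemployment series ID into its named parts."""
    return {
        "prefix": series_id[:2],          # 'LA'
        "seasonal": series_id[2],         # 'U' or 'S'
        "area_code": series_id[3:-2],     # matches the area_code column
        "measure_code": series_id[-2:],   # '03', '04', '05' or '06'
    }


parts = split_series_id("LAU" + "ST0600000000000" + "03")
print(parts["area_code"], parts["measure_code"])
```
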


Now we are ready to write the data to a csv file:

In [14]:
for series in json_data['Results']['series']:
    # newline='' stops the csv module from writing blank rows on Windows
    with open(outfile, 'w', newline='') as iFile:
        writer = csv.writer(iFile)
        writer.writerow(fields)

        for item in series['data']:
            year = item['year']
            period = item['period']
            value = item['value']
            footnotes = ""
            for footnote in item['footnotes']:
                if footnote:
                    footnotes = footnotes + footnote['text'] + ','

            # keep monthly observations only
            if 'M01' <= period <= 'M12':
                writer.writerow([state, year, period, value, footnotes[0:-1]])
    # the with block closes the file, so no explicit close() is needed
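
The same cleaning step can be separated from the file writing, which makes it easier to test without calling the API. The sketch below is one possible arrangement: `series_to_rows` is a hypothetical helper, and the values in `sample_series` are invented purely to exercise the code. It writes to an in-memory buffer instead of a file.

```python
import csv
import io

FIELDS = ["state", "year", "period", "value", "footnotes"]


def series_to_rows(series, state):
    """Turn one series from the response into a list of row dictionaries."""
    rows = []
    for item in series["data"]:
        if not ("M01" <= item["period"] <= "M12"):
            continue  # skip anything that is not a monthly observation
        notes = [f["text"] for f in item.get("footnotes", []) if f and "text" in f]
        rows.append({
            "state": state,
            "year": item["year"],
            "period": item["period"],
            "value": item["value"],
            "footnotes": ",".join(notes),
        })
    return rows


# Invented values, shaped like the fields used in the loop above
sample_series = {
    "seriesID": "EXAMPLE",
    "data": [
        {"year": "2016", "period": "M02", "value": "5.5", "footnotes": [{}]},
        {"year": "2016", "period": "M01", "value": "5.7",
         "footnotes": [{"code": "X", "text": "Example note"}]},
        {"year": "2016", "period": "M13", "value": "5.4", "footnotes": [{}]},
    ],
}

rows = series_to_rows(sample_series, "California")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Joining the footnote texts with `",".join(...)` avoids having to trim a trailing comma afterwards.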

Did that work?

In [15]:
import glob
print(glob.glob("data/bls-employment-state/*.csv"))

['data/bls-employment-state/LAUST060000000000003.csv']


Cool! Now let's try to get a bunch of data using what we just learned.

## Setting up the API Call - Downloading Multiple Series

Now we want to get the unemployment rate, the number of unemployed, the number of employed and the size of the labor force for each state.

One important constraint we will face is that we can only request 50 series in each API call. We will get around this by writing some code that will chunk up our list of series into groups of 50 and send them set by set to the BLS API.


### Assembling All Series

We can use a lambda function to construct lists of all the series of data we need:

In [17]:
unempRate_codes   = list(states.area_code.apply(lambda x: 'LAU' + x + '03'))
unemp_codes       = list(states.area_code.apply(lambda x: 'LAU' + x + '04'))
employment_codes  = list(states.area_code.apply(lambda x: 'LAU' + x + '05'))
labourForce_codes = list(states.area_code.apply(lambda x: 'LAU' + x + '06'))

allSeries = unempRate_codes + unemp_codes + employment_codes + labourForce_codes
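
The four near-identical lines above can also be written as a single comprehension over the measure codes. This is a sketch on plain Python lists: `all_series_ids` and `MEASURES` are illustrative names, and the two area codes stand in for the full `states.area_code` column.

```python
MEASURES = {
    "03": "unemployment rate",
    "04": "unemployment",
    "05": "employment",
    "06": "labor force",
}


def all_series_ids(area_codes, measures=MEASURES, prefix="LAU"):
    """Build every area/measure combination, grouped by measure."""
    return [prefix + area + measure for measure in measures for area in area_codes]


example_areas = ["ST0100000000000", "ST0200000000000"]
ids = all_series_ids(example_areas)
print(len(ids))
```

With the real data, `all_series_ids(list(states.area_code))` should give the same ordering as `allSeries`: all unemployment rate series first, then unemployment, employment and labor force.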

### Chunking `allSeries` into groups of 50


In [20]:
chunkedList = [allSeries[i:i+50] for i in range(0,len(allSeries),50)]

So now we have a list of lists:

In [26]:
len(chunkedList)

5

And each list has the following size:

In [28]:
for iList in range(0,len(chunkedList)):
    print('List', iList, 'has', len(chunkedList[iList]), 'series')

List 0 has 50 series
List 1 has 50 series
List 2 has 50 series
List 3 has 50 series
List 4 has 8 series
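
The chunking step can be wrapped in a small reusable helper. The sketch below uses placeholder strings rather than real series IDs, and the name `chunked` is an assumption for illustration; with 208 items (52 areas times 4 measures) it reproduces the sizes printed above.

```python
def chunked(items, size=50):
    """Split a list into consecutive sub-lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


dummy_series = ["series_" + str(i) for i in range(208)]
sizes = [len(chunk) for chunk in chunked(dummy_series)]
print(sizes)  # [50, 50, 50, 50, 8]
```
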


### Making multiple API Calls efficiently

To send all these requests to the API, we will:

* Loop over each chunk, `iChunk` to
    * Build the API call for each chunk
    * For each series returned, we will clean the data to create a csv
    
The way we have set up our code above means we can nest it all inside a loop over the chunks as follows:

In [29]:
# Load BLS API key
from api_key import BLS_KEY

for iChunk in range(0,len(chunkedList)):
    data = chunkedList[iChunk]
    print('Completing chunk:', iChunk)
    
    # set up api call
    headers = {'Content-type': 'application/json'}
    data_request = json.dumps({"seriesid":  data,
                               "startyear": "2000", 
                               "endyear":   "2016",
                               "registrationkey": BLS_KEY})
    payload = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', 
                              data=data_request, headers=headers)
    json_data = json.loads(payload.text)
    
    # save the output to a csv file
    fields=["state","year","period","value","footnotes"]

    for series in json_data['Results']['series']:

        seriesID = series['seriesID']
        outfile = 'data/bls-employment-state/' + seriesID + '.csv'

        iState = seriesID[3:-2]
        # .iloc[0] takes the first (and only) match; get_value() was removed in pandas 1.0
        state = states.query('area_code == @iState')['area_text'].iloc[0]

        with open(outfile, 'w', newline='') as iFile:
            writer = csv.writer(iFile)
            writer.writerow(fields)

            for item in series['data']:
                year = item['year']
                period = item['period']
                value = item['value']
                footnotes=""
                for footnote in item['footnotes']:
                    if footnote:
                        footnotes = footnotes + footnote['text'] + ','

                if 'M01' <= period <= 'M12':
                    writer.writerow([state,year,period,value,footnotes[0:-1]])

        # the with block closes the file, so no explicit close() is needed

Completing chunk: 0
Completing chunk: 1
Completing chunk: 2
Completing chunk: 3
Completing chunk: 4
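
One way to make the loop above easier to test is to separate building the request from sending it. In the sketch below, `build_request_body` and `download_chunks` are hypothetical helpers, and `send` is any function that takes the JSON body and returns the parsed response. A fake sender stands in for the network here; the optional pause between calls is a courtesy to the server, not a documented BLS requirement.

```python
import json
import time


def build_request_body(series_ids, key, start_year="2000", end_year="2016"):
    """Serialise one chunk of series IDs into the JSON body used above."""
    return json.dumps({"seriesid": list(series_ids),
                       "startyear": start_year,
                       "endyear": end_year,
                       "registrationkey": key})


def download_chunks(chunks, key, send, pause_seconds=0.0):
    """Send each chunk through `send` and collect every returned series."""
    all_series = []
    for i, chunk in enumerate(chunks):
        result = send(build_request_body(chunk, key))
        all_series.extend(result["Results"]["series"])
        if pause_seconds and i < len(chunks) - 1:
            time.sleep(pause_seconds)  # small gap between consecutive calls
    return all_series


def fake_send(body):
    """Stand-in for the API: echoes back one empty series per requested ID."""
    ids = json.loads(body)["seriesid"]
    return {"Results": {"series": [{"seriesID": s, "data": []} for s in ids]}}


collected = download_chunks([["A", "B"], ["C"]], key="not-a-real-key", send=fake_send)
print([s["seriesID"] for s in collected])
```

For real downloads, `send` could wrap the same `requests.post(...)` call used in the loop above and return the parsed JSON.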


Let's quickly check out our results:

In [30]:
print(glob.glob("data/bls-employment-state/*.csv"))

['data/bls-employment-state/LAUST010000000000005.csv', 'data/bls-employment-state/LAUST240000000000003.csv', 'data/bls-employment-state/LAUST200000000000006.csv', 'data/bls-employment-state/LAUST470000000000005.csv', 'data/bls-employment-state/LAUST260000000000003.csv', 'data/bls-employment-state/LAUST360000000000004.csv', 'data/bls-employment-state/LAUST480000000000006.csv', 'data/bls-employment-state/LAUST110000000000005.csv', 'data/bls-employment-state/LAUST490000000000004.csv', 'data/bls-employment-state/LAUST130000000000006.csv', 'data/bls-employment-state/LAUST720000000000005.csv', 'data/bls-employment-state/LAUST480000000000005.csv', 'data/bls-employment-state/LAUST170000000000003.csv', 'data/bls-employment-state/LAUST120000000000004.csv', 'data/bls-employment-state/LAUST180000000000004.csv', 'data/bls-employment-state/LAUST060000000000005.csv', 'data/bls-employment-state/LAUST060000000000003.csv', 'data/bls-employment-state/LAUST720000000000003.csv', 'data/bls-employment-state/

## Final Thoughts

Getting data from APIs takes a little getting used to, and the first time you work with a new API will take time: they don't all look alike or work in exactly the same way.

*BUT*, using them means the data collection (where possible) becomes reproducible and the output can be made to be neat quite quickly. This is a **big** positive in my book, and reason enough to spend an hour or so learning how to use an API to get data from the data sources that have it available.

The remaining notebooks in this directory will use this data, along with some supplementary data to demonstrate how to use the `pandas` library to work efficiently with data sets.