# BLS_API - Tool 3 of 3 for accessing ATUS files

## This notebook takes the series list you generated in the BLS Series Finder notebook or copied from the BLS website and downloads the data for each series.

## It also cleans the data and adds the demographic data as well as the standard errors from the aspect.txt flat file.

### Once downloaded, you can save as a new csv or append it to an existing csv that has the same number of columns.

### This code assumes you have a registered API key from BLS. Unregistered users are limited in what they can download. A key can be acquired here: [BLS API Key](https://www.bls.gov/developers/)

In [319]:
#import modules
import requests
import json
import pandas as pd
import pyperclip
from flatten_dict import flatten
import re



## If you obtained your series list from the BLS website and copied it to your clipboard, use the code below to format it for use in the API.

## DO NOT RUN IF YOU HAVE YOUR LIST IN A CSV FILE


In [320]:
#Paste the data from the clipboard
list_clip = pyperclip.paste()
#remove messy formatting: split on any line ending and drop blank lines and stray spaces
series = [s.strip() for s in list_clip.splitlines() if s.strip()]
#print the list
series

['TUU10101AA01008829',
 'TUU10101AA01013237',
 'TUU10101AA01009381',
 'TUU10101AA01012652',
 'TUU10101AA01010374',
 'TUU10101AA01011174',
 'TUU10101AA01011987',
 'TUU10101AA01012335',
 'TUU10101AA01015010',
 'TUU10101AA01013585',
 'TUU10101AA01015586',
 'TUU10101AA01015925']
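
The clipboard step above can be tried without a clipboard by running the same cleaning on a plain string. This is a minimal sketch; `parse_series_text` and the sample string are hypothetical names used only for illustration.

```python
def parse_series_text(text):
    """Split pasted text into a clean list of series IDs."""
    # splitlines() handles \r\n, \n and \r, so the result does not depend on the platform
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical pasted text with mixed line endings, a trailing space and a blank line
pasted = "TUU10101AA01008829\r\nTUU10101AA01013237 \n\nTUU10101AA01009381\n"
series_ids = parse_series_text(pasted)
print(series_ids)
```

Passing `pyperclip.paste()` in place of `pasted` would apply the same cleaning to the real clipboard contents.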

## This section is for users who have their list in a csv file or used the BLS Series Selector notebook

## DO NOT RUN IF YOU COPIED YOUR LIST FROM THE CLIPBOARD

The code assumes your list is in the /data directory.

This function reads your csv and returns the series IDs as a list. The list is stored in the `series` variable, which is passed directly to `json.dumps` in the API request, so you do not need to copy and paste it from the output cell.

In [234]:
#Enter the filename of your list
filename = 'data/List_for_API.csv'

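def list_for_json(filename=filename):
    """Takes the csv of series IDs and returns a list for the API request"""
    #The first column is read as the index and dropped; the series IDs are in the next column
    ids = pd.read_csv(filename, index_col=0).reset_index(drop=True)
    ids = ids.iloc[:,0].tolist()
    #Cast to str and strip stray spaces around each series ID
    ids = [str(x).strip() for x in ids]
    return ids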

In [235]:
#Stores the list in a variable for the json request
series = list_for_json()

In [236]:
series

['TUU10101AA01005684',
 'TUU10101AA01006341',
 'TUU10101AA01008860',
 'TUU10101AA01009412',
 'TUU10101AA01010405',
 'TUU10101AA01011205',
 'TUU10101AA01012018',
 'TUU10101AA01012366',
 'TUU10101AA01012683',
 'TUU10101AA01013268',
 'TUU10101AA01013622',
 'TUU10101AA01013988',
 'TUU10101AA01014273',
 'TUU10101AA01014547',
 'TUU10101AA01015041',
 'TUU10101AA01015617',
 'TUU10101AA01015956',
 'TUU10101AA01016313',
 'TUU10101AA01016391']

## The API code starts here! Once you have either pasted or uploaded your series start from here to get your data.

> Put in your start and end year

> Put your registration key in the `regkey` variable; it is sent as the `"registrationkey"` value in the request

> Leave `"catalog":"true"` to include demographic data

In [321]:
start_yr = "2008"
end_yr = "2018"
#Replace with your registration key. If you do not have one, remove "registrationkey" from the json.dumps call
regkey = "YOUR_REGISTRATION_KEY"
headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid":series,"startyear":start_yr, "endyear":end_yr, "registrationkey":regkey,"catalog":"true"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
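
The request body built above can also be assembled by a small helper that leaves out the registration key when none is supplied and splits a long series list into batches. This is a sketch only: `build_payloads` is a hypothetical helper, and the default batch size of 50 is an assumption based on the per-request series limit BLS has published for registered users, so check the current limits before relying on it.

```python
import json


def build_payloads(series_ids, start_year, end_year, registration_key=None, batch_size=50):
    """Return a list of JSON request bodies, one per batch of series IDs."""
    payloads = []
    for start in range(0, len(series_ids), batch_size):
        body = {
            "seriesid": series_ids[start:start + batch_size],
            "startyear": str(start_year),
            "endyear": str(end_year),
            "catalog": "true",  # same value the notebook sends
        }
        # Only include the key when one was provided
        if registration_key:
            body["registrationkey"] = registration_key
        payloads.append(json.dumps(body))
    return payloads


# Three placeholder IDs split into batches of two, with no registration key
payloads = build_payloads(["A", "B", "C"], 2008, 2018, batch_size=2)
print(len(payloads))
print(payloads[0])
```

Each string in `payloads` could then be sent with the same `requests.post` call used in the cell above.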


In [322]:
#You can now look at the returned data. The 'status' and 'message' fields will tell you if the request failed
json_data = json.loads(p.text)
json_data

{'status': 'REQUEST_SUCCEEDED',
 'responseTime': 1588,
 'message': [],
 'Results': {'series': [{'seriesID': 'TUU10101AA01008829',
    'catalog': {'series_title': 'Avg hrs per day - Personal care activities (includes travel)',
     'series_id': 'TUU10101AA01008829',
     'seasonality': 'Not Seasonally Adjusted',
     'survey_name': 'American Time Use',
     'survey_abbreviation': 'TU',
     'measure_data_type': 'Average hours per day',
     'cps_labor_force_status': 'All persons',
     'demographic_age': '15 years and over',
     'demographic_race': 'All races',
     'demographic_gender': 'Both sexes',
     'demographic_children': 'All persons',
     'demographic_education': 'All education levels',
     'cps_family_children': 'All persons',
     'cps_activity': 'Personal care activities (includes travel)',
     'day_of_week': 'All days',
     'earnings': 'All persons'},
    'data': [{'year': '2018',
      'period': 'A01',
      'periodName': 'Annual',
      'latest': 'true',
      'valu
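
Rather than reading the response by eye, the `status` field shown above can be checked in code before any further processing. The sketch below assumes the response has the `status`, `message` and `Results` keys seen in the output; `check_bls_response` is a hypothetical helper, and the failing payload is made up for illustration.

```python
def check_bls_response(json_data):
    """Return the list of series if the request succeeded, otherwise raise ValueError."""
    status = json_data.get("status")
    if status != "REQUEST_SUCCEEDED":
        messages = "; ".join(json_data.get("message", []))
        raise ValueError(f"BLS request failed ({status}): {messages}")
    return json_data["Results"]["series"]


# Trimmed stand-in for a successful response
ok_response = {
    "status": "REQUEST_SUCCEEDED",
    "message": [],
    "Results": {"series": [{"seriesID": "TUU10101AA01008829", "data": []}]},
}
print(len(check_bls_response(ok_response)))

# Made-up failing response, for illustration only
bad_response = {"status": "REQUEST_NOT_PROCESSED", "message": ["example error text"], "Results": {}}
try:
    check_bls_response(bad_response)
except ValueError as err:
    print(err)
```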

## This next section will take the nested dictionary and produce a DataFrame.

**You can edit this with your own column names.**

In [323]:
#Flatten the nested dictionary into flat dictionary
flat_j = flatten(json_data, enumerate_types=(list,))

## This section takes the flat dictionary and orders it into a dataframe with the desired column names.

> column_names = the names of the columns you want to extract from your dictionary. These do not need to match the keys in the data, but they should make sense for analysis.

> dict1 = the dictionary the flattened data will be put into. Use the same column names and order as column_names.

> num_ser = the index positions of the series, calculated from the number of series you sent to the API

> num_data = the index positions of the yearly observations in each series. The number of observations is the end year minus the start year, plus 1 (2008 to 2018 gives 11).

**These values are used to iterate through the flattened dictionary and place the data in the correct columns.**

This section needs some optimization for allowing users to customize it more easily. Currently, it requires the user to enter the names manually.
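
One way to reduce the manual bookkeeping described above is to walk the nested response directly, so the number of series and years never has to be entered by hand. This is only a sketch of that idea, not the notebook's method: `response_to_rows` and `COLUMN_KEYS` are hypothetical names, and the sample response is a trimmed stand-in that reuses values from the output shown earlier.

```python
# Output column name -> catalog key, taken from the catalog fields shown in the response
COLUMN_KEYS = {
    "education": "demographic_education",
    "gender": "demographic_gender",
    "ethnicity": "demographic_race",
    "age": "demographic_age",
    "activity": "cps_activity",
}


def response_to_rows(json_data):
    """Build one flat dict per (series, year) observation."""
    rows = []
    for entry in json_data["Results"]["series"]:
        catalog = entry.get("catalog", {})
        for obs in entry["data"]:
            row = {"series_id": entry["seriesID"], "year": obs["year"]}
            for column, key in COLUMN_KEYS.items():
                row[column] = catalog.get(key)
            row["avg_time_per_week"] = obs["value"]
            rows.append(row)
    return rows


# Trimmed sample with the same shape as the API response
sample = {
    "Results": {
        "series": [
            {
                "seriesID": "TUU10101AA01008829",
                "catalog": {
                    "demographic_education": "All education levels",
                    "demographic_gender": "Both sexes",
                    "demographic_race": "All races",
                    "demographic_age": "15 years and over",
                    "cps_activity": "Personal care activities (includes travel)",
                },
                "data": [
                    {"year": "2018", "value": "9.58"},
                    {"year": "2017", "value": "9.59"},
                ],
            }
        ]
    }
}
rows = response_to_rows(sample)
print(rows[0])
```

Because the inner loop follows whatever observations each series actually contains, a series with a missing year would not raise a `KeyError`. `pd.DataFrame(rows)` would turn the result into a DataFrame, and editing `COLUMN_KEYS` changes which catalog fields are kept.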


In [324]:
column_names = ['series_id', 'year', 'education', 'gender', 'ethnicity', 'age', 'activity', 'avg_time_per_week']
#One empty list per output column
dict1 = {name: [] for name in column_names}
#Index positions of the series and of the yearly observations within each series
num_ser = range(len(series))
num_data = range(int(end_yr) - int(start_yr) + 1) #11 observations for 2008-2018

for i in num_ser:
    for j in num_data:
        #Series-level fields come from the catalog; year and value come from each observation
        dict1['series_id'].append(flat_j['Results', 'series', i, 'seriesID'])
        dict1['year'].append(flat_j['Results', 'series', i, 'data', j, 'year'])
        dict1['education'].append(flat_j['Results', 'series', i, 'catalog', 'demographic_education'])
        dict1['gender'].append(flat_j['Results', 'series', i, 'catalog', 'demographic_gender'])
        dict1['ethnicity'].append(flat_j['Results', 'series', i, 'catalog', 'demographic_race'])
        dict1['age'].append(flat_j['Results', 'series', i, 'catalog', 'demographic_age'])
        dict1['activity'].append(flat_j['Results', 'series', i, 'catalog', 'cps_activity'])
        #Note: the catalog reports this measure as 'Average hours per day' (see measure_data_type)
        dict1['avg_time_per_week'].append(flat_j['Results', 'series', i, 'data', j, 'value'])

        
        

In [325]:
#loads the dictionary into a DataFrame
data = pd.DataFrame.from_dict(dict1)

In [326]:
data

Unnamed: 0,series_id,year,education,gender,ethnicity,age,activity,avg_time_per_week
0,TUU10101AA01008829,2018,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58
1,TUU10101AA01008829,2017,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.59
2,TUU10101AA01008829,2016,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58
3,TUU10101AA01008829,2015,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.64
4,TUU10101AA01008829,2014,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58
...,...,...,...,...,...,...,...,...
127,TUU10101AA01015925,2012,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.24
128,TUU10101AA01015925,2011,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.29
129,TUU10101AA01015925,2010,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.35
130,TUU10101AA01015925,2009,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.24


## This section adds in the Standard Error information from the aspects flat file
### The formula used to generate this statistic is found on page 40 of the ATUS users guide, which can be downloaded here: [ATUS Users Guide](https://www.bls.gov/tus/atususersguide.pdf)
### This step is optional, but the standard errors are useful when evaluating the data and are not returned by the API.


In [328]:
#Function for returning the standard error
#read in your std_err file
std_err = pd.read_csv('data/aspect.txt', sep="\t")
#The original files from ATUS have extra spaces in the series id column that can cause issues
std_err['series_id'] = [x.strip(' ') for x in std_err['series_id']]

def add_std_err(df = data,error = std_err):
    '''Takes the DataFrame built from the API response and adds the
    standard errors from the ATUS aspect file. Returns the merged DataFrame
    (the data plus a std_err column) and the trimmed std_err DataFrame for use
    in the error checker. The defaults are set for the variables used in this notebook.'''
    error = error[['series_id', 'year', 'value']].copy(deep=True).reset_index(drop=True)
    #Cast everything to str so the merge keys match the string values returned by the API
    error = error.astype(str)
    error = error.rename(columns = {'value': 'std_err'})
    #Left merge keeps every API row, even if no standard error is found for it
    merge = pd.merge(df, error, on=['series_id', 'year'], how='left').reset_index(drop=True)
    return merge, error

In [329]:
#Apply the function. Here I applied it to the data I have already generated
#The first return is the datafile with the estimates and the new error column
#The second return is the std_err file that can be used in the error checker
data,std_err = add_std_err()

In [330]:
#You should now see the std_err column at the far right
data

Unnamed: 0,series_id,year,education,gender,ethnicity,age,activity,avg_time_per_week,std_err
0,TUU10101AA01008829,2018,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58,0.034
1,TUU10101AA01008829,2017,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.59,0.025
2,TUU10101AA01008829,2016,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58,0.028
3,TUU10101AA01008829,2015,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.64,0.028
4,TUU10101AA01008829,2014,All education levels,Both sexes,All races,15 years and over,Personal care activities (includes travel),9.58,0.027
...,...,...,...,...,...,...,...,...,...
127,TUU10101AA01015925,2012,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.24,0.011
128,TUU10101AA01015925,2011,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.29,0.014
129,TUU10101AA01015925,2010,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.35,0.012
130,TUU10101AA01015925,2009,All education levels,Both sexes,All races,15 years and over,"Other activities, not elsewhere classified (in...",0.24,0.009
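
With the std_err column in place, one common use is a rough confidence interval around each estimate. The sketch below is an assumption, not something taken from this notebook or confirmed from the ATUS guide: it uses a normal approximation with a 90% level (z = 1.645), and `confidence_interval` is a hypothetical helper. The inputs are converted with `float()` because the estimate and std_err values are stored as strings.

```python
def confidence_interval(estimate, std_err, z=1.645):
    """Return (low, high) for estimate +/- z * std_err under a normal approximation."""
    value = float(estimate)
    margin = z * float(std_err)
    return value - margin, value + margin


# First row of the table above: estimate 9.58 with a standard error of 0.034
low, high = confidence_interval("9.58", "0.034")
print(f"{low:.3f} to {high:.3f}")
```

Passing `z=1.96` would give the usual 95% interval under the same approximation.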


## This function runs a random check to make sure that the mapping was accurate.

### The defaults are set to run using the names of the files generated from the add_std_err() function.



In [331]:
def mapping_error_checker(df = data, error = std_err):
    """Takes a random row from df and pulls its series id and year. It then
    checks that the std_err value for that series and year matches in both
    df and the error file, and prints a message"""
    rand_samp =  df.sample().reset_index(drop=True)
    test_ID = rand_samp['series_id']
    test_year = rand_samp['year']
    first = error.loc[(error['series_id']== test_ID[0]) & (error['year'] == test_year[0])].reset_index(drop=True)
    #Filter df itself (not the global data) so the check works for any DataFrame passed in
    second = df.loc[(df['series_id']==test_ID[0]) & (df['year'] == test_year[0])].reset_index(drop=True)
    if first['std_err'].equals(second['std_err']):
        print('Everything looks good!')
    else:
        print('Mapping Error!')

In [332]:
#Run the check
mapping_error_checker()

Everything looks good!
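
A random spot check only looks at one row per run. As a complement, every row can be compared against the error file in a single pass. This is a sketch using plain lists of dicts; `find_mapping_mismatches` is a hypothetical helper, and the sample rows reuse values from the tables above.

```python
def find_mapping_mismatches(data_rows, error_rows):
    """Return the (series_id, year) pairs whose std_err differs from the error file."""
    # Index the error file by (series_id, year) for direct lookup
    expected = {(r["series_id"], r["year"]): r["std_err"] for r in error_rows}
    return [
        (r["series_id"], r["year"])
        for r in data_rows
        if expected.get((r["series_id"], r["year"])) != r["std_err"]
    ]


error_rows = [
    {"series_id": "TUU10101AA01008829", "year": "2018", "std_err": "0.034"},
    {"series_id": "TUU10101AA01008829", "year": "2017", "std_err": "0.025"},
]
data_rows = [
    {"series_id": "TUU10101AA01008829", "year": "2018", "std_err": "0.034"},
    {"series_id": "TUU10101AA01008829", "year": "2017", "std_err": "0.025"},
]
mismatches = find_mapping_mismatches(data_rows, error_rows)
print(mismatches or "Everything looks good!")
```

With the notebook's DataFrames, `data.to_dict('records')` and `std_err.to_dict('records')` would supply the two lists. Rows with no matching standard error are reported as mismatches as well.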


## This function will save the data to a new file or append it to an existing file 

In [333]:
#Write or append
filename = 'data/all_by_year.csv' #define the name of your file here

def new_or_append(choice,df=data,filename = filename):
    """Creates a new csv file or appends df to an existing file.
    choice options are 'new' or 'append'. df is the DataFrame you want
    to save or append. filename defaults to the variable defined above
    or can be passed in manually"""
    if choice == 'new':
        df.to_csv(filename)
    elif choice == 'append':
        df.to_csv(filename, mode = 'a', header = False)
        #Re-read the combined file and rewrite it with a fresh index, keeping the header row
        test_append = pd.read_csv(filename, index_col=0).reset_index(drop=True)
        test_append.to_csv(filename, mode = 'w')
    else:
        raise ValueError("choice must be 'new' or 'append'")

In [334]:
#Run the function
new_or_append('new')
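
When appending, the main thing to get right is that the header row is written exactly once. The sketch below shows one way to handle that with the standard library `csv` module; `append_rows` is a hypothetical helper, and the demo writes to a temporary directory rather than to `data/`.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append rows to a csv, writing the header only when the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


fields = ["series_id", "year", "std_err"]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# The first call creates the file with a header; the second only adds a data row
append_rows(demo_path, [{"series_id": "TUU10101AA01008829", "year": "2018", "std_err": "0.034"}], fields)
append_rows(demo_path, [{"series_id": "TUU10101AA01008829", "year": "2017", "std_err": "0.025"}], fields)

with open(demo_path, newline="") as handle:
    lines = handle.read().splitlines()
print(lines)
```

The same check on whether the file already exists could decide between `'new'` and `'append'` automatically before calling `new_or_append`.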