## U.S. Bureau of Labor Statistics

The BLS website provides a public API that returns data in JSON format: https://www.bls.gov/developers/

Sample code for extracting data with Python: https://www.bls.gov/developers/api_python.htm#python2

BLS Public Data API Signatures: https://www.bls.gov/developers/api_signature_v2.htm

### Series ID Formats

There are several datasets that can be extracted using the API. 
* Series ID Formats: https://www.bls.gov/help/hlpforma.htm#OEUS
  * Local Area Unemployment Rate: https://www.bls.gov/help/hlpforma.htm#LA
  * Inflation & Prices: Consumer Price Index - Avg price data: https://www.bls.gov/help/hlpforma.htm#AP
* For each series listed, follow its link and read the Survey Overview section to understand what data the series contains.

### Method 1: Using the PyPI package

The Python library for the Bureau of Labor Statistics API (imported below as `bls`) is still being developed, but it looks promising. To retrieve data, call the `get_series()` function, which takes three arguments: a series ID (or multiple IDs), a start year, and an end year.

#### Series Overview

Example
  
  **Series ID    LASST120000000000003**
  
  **Positions:    Value:            Field Name**
  
  `1-2          LA               Prefix (LA-Local Area Unemployment)`
  
  `3            S                Seasonal Adjustment Code (S-Seasonally adjusted)`
  
  `4-18         ST1200000000000  Area Code (Florida state)`
  
  `19-20        03               Measure Code (03-Unemployment Rate)`
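The positional layout above can be sketched in code. This is a minimal illustration that assumes the 20-character Local Area Unemployment layout shown in the example; `parse_la_series_id` is a hypothetical helper written for this page, not part of any BLS library.

```python
def parse_la_series_id(series_id):
    """Split a Local Area Unemployment series ID into its positional fields."""
    # Assumption: the ID follows the 20-character LA layout described above
    if len(series_id) != 20 or not series_id.startswith("LA"):
        raise ValueError(f"not a 20-character LA series ID: {series_id!r}")
    return {
        "prefix": series_id[0:2],             # positions 1-2
        "seasonal_adjustment": series_id[2],  # position 3
        "area_code": series_id[3:18],         # positions 4-18
        "measure_code": series_id[18:20],     # positions 19-20
    }


fields = parse_la_series_id("LASST120000000000003")
print(fields)
```

Other surveys use different field widths, so a helper like this would need one layout per prefix.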

In [0]:
import bls
# Use get_series and specify the start and end year
FL_unemployment_rate = bls.get_series('LASST120000000000003', 2000, 2020)
FL_unemployment_rate.head()

The result is a pandas Series indexed by period (year-month), with the unemployment rate as its values.

### Method 2: Using Requests

We can also request a series directly from the API and load it into pandas.
* To find the series ID and its fields, refer to the Series ID Formats page.
* Once you know the series ID for the data you want, append it to the version 1 API URL: https://api.bls.gov/publicAPI/v1/timeseries/data/<SERIES_NAME>
* Use the `requests` Python package to fetch the response and parse the JSON into Python dictionaries and lists.

In [0]:
import pandas as pd
import requests

# Florida unemployment rate. For a second example, swap in CUUR0000SA0,
# a Consumer Price Index series from the Inflation & Prices group.
url = 'https://api.bls.gov/publicAPI/v1/timeseries/data/LASST120000000000003'
data = requests.get(url).json()
print('Status: ' + data['status'])

# Top-level keys of the response
print(data.keys())
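The cell above prints the status but carries on regardless of its value. A minimal sketch of a stricter check follows. It assumes a successful response reports `REQUEST_SUCCEEDED` and that any explanations arrive in a `message` list; `extract_series` is a hypothetical helper, and both payloads are hand-written stand-ins rather than real API output.

```python
def extract_series(payload):
    """Return the list of series from a BLS response, or raise if the request failed."""
    status = payload.get("status")
    if status != "REQUEST_SUCCEEDED":
        # Assumption: failure details are provided as a list of strings under "message"
        messages = "; ".join(payload.get("message", []))
        raise RuntimeError(f"BLS request failed with status {status!r}: {messages}")
    return payload["Results"]["series"]


# Hand-written stand-ins shaped like the response, not real API output
ok_payload = {
    "status": "REQUEST_SUCCEEDED",
    "message": [],
    "Results": {"series": [{"seriesID": "LASST120000000000003", "data": []}]},
}
failed_payload = {
    "status": "REQUEST_NOT_PROCESSED",
    "message": ["example failure message"],
    "Results": {},
}

series_list = extract_series(ok_payload)
print(series_list[0]["seriesID"])
```

With a live response, `extract_series(data)` would replace the direct `data['Results']['series']` lookups used in the following cells.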

In [0]:
data

In [0]:
data['Results']

In [0]:
# Look at the keys under Results
data['Results'].keys()

In [0]:
# Print the series ID of each series
for i in data['Results']['series']:
    print(i['seriesID'])

In [0]:
data['Results']['series']

We are interested in the `data` element of the JSON, so we convert it into a pandas DataFrame.

In [0]:
df = pd.json_normalize(data['Results']['series'], 'data')
df.head(10)

| | year | period | periodName | latest | value | footnotes |
|---|---|---|---|---|---|---|
| 0 | 2020 | M07 | July | True | 11.3 | [{'code': 'P', 'text': 'Preliminary.'}] |
| 1 | 2020 | M06 | June | | 10.3 | [{}] |
| 2 | 2020 | M05 | May | | 13.7 | [{}] |
| 3 | 2020 | M04 | April | | 13.8 | [{}] |
| 4 | 2020 | M03 | March | | 4.4 | [{}] |
| 5 | 2020 | M02 | February | | 2.8 | [{}] |
| 6 | 2020 | M01 | January | | 2.8 | [{}] |
| 7 | 2019 | M12 | December | | 2.9 | [{}] |
| 8 | 2019 | M11 | November | | 2.8 | [{}] |
| 9 | 2019 | M10 | October | | 2.9 | [{}] |


In [0]:
# Drop the unwanted columns
df = df.drop(['latest', 'footnotes'], axis=1)
# Rename the value column
df = df.rename(columns={"value": "unemploymentRate"})
df.head()

| | year | period | periodName | unemploymentRate |
|---|---|---|---|---|
| 0 | 2020 | M07 | July | 11.3 |
| 1 | 2020 | M06 | June | 10.3 |
| 2 | 2020 | M05 | May | 13.7 |
| 3 | 2020 | M04 | April | 13.8 |
| 4 | 2020 | M03 | March | 4.4 |
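For plotting or arithmetic it usually helps to turn the rate into a number and to build a proper date from `year` and `period`. The sketch below does this on a small hand-built frame that copies three rows from the table above. It assumes the `year` and rate fields arrive as strings, as they appear to in the JSON; `add_date_and_numeric` is a hypothetical helper name.

```python
import pandas as pd

# Three rows copied from the table above, with values kept as strings
sample_df = pd.DataFrame([
    {"year": "2020", "period": "M07", "periodName": "July", "unemploymentRate": "11.3"},
    {"year": "2020", "period": "M06", "periodName": "June", "unemploymentRate": "10.3"},
    {"year": "2020", "period": "M05", "periodName": "May", "unemploymentRate": "13.7"},
])


def add_date_and_numeric(frame, value_col):
    """Convert value_col to numbers and add a first-of-month date column."""
    out = frame.copy()
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    # Keep monthly periods only (M01 to M12); other codes cannot map to a month
    monthly = out["period"].str.match(r"M(?:0[1-9]|1[0-2])$")
    out = out[monthly].copy()
    date_text = out["year"].astype(str) + "-" + out["period"].str[1:] + "-01"
    out["date"] = pd.to_datetime(date_text, format="%Y-%m-%d")
    return out.sort_values("date").reset_index(drop=True)


tidy = add_date_and_numeric(sample_df, "unemploymentRate")
print(tidy[["date", "unemploymentRate"]])
```

On the real frame, `add_date_and_numeric(df, "unemploymentRate")` would sort the observations oldest first, which is the order most plotting functions expect.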


### Method 3: Using a POST request with the v2 API

This code sends a POST request to version 2 of the API and writes the data to text files, each named after the series it comes from. You can simply replace the series IDs and years in the sample code if you like, but hard-coding them becomes limiting once you need other series or date ranges.

In [0]:
import json

import pandas as pd
import prettytable
import requests

headers = {'Content-type': 'application/json'}
# Request two series in a single call; the years are passed as strings
data = json.dumps({"seriesid": ['CUUR0000SA0', 'SUUR0000SA0'], "startyear": "2000", "endyear": "2020"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = p.json()
for series in json_data['Results']['series']:
    x = prettytable.PrettyTable(["series id", "year", "period", "value", "footnotes"])
    seriesId = series['seriesID']
    for item in series['data']:
        year = item['year']
        period = item['period']
        value = item['value']
        # Join the text of all non-empty footnotes
        footnotes = ','.join(footnote['text'] for footnote in item['footnotes'] if footnote)
        # Keep monthly observations only (M01 to M12)
        if 'M01' <= period <= 'M12':
            x.add_row([seriesId, year, period, value, footnotes])
    # Write one text file per series, named after the series ID
    with open(seriesId + '.txt', 'w') as output:
        output.write(x.get_string())
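The request above asks for 2000 to 2020, yet the output further down stops at 2009, which suggests the API trims long year ranges. One way around this is to split the range into smaller windows and post one request per window. The sketch below only builds the request bodies and sends nothing. It assumes a 10-year cap per unregistered query and a `registrationkey` field for registered users, both of which should be checked against the BLS documentation; `year_chunks` and `build_payload` are hypothetical helpers.

```python
import json


def year_chunks(start_year, end_year, max_years=10):
    """Split an inclusive year range into windows of at most max_years."""
    chunks = []
    chunk_start = start_year
    while chunk_start <= end_year:
        chunk_end = min(chunk_start + max_years - 1, end_year)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + 1
    return chunks


def build_payload(series_ids, start_year, end_year, registration_key=None):
    """Build the JSON body for one request, with years as strings."""
    payload = {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if registration_key:
        # Assumption: registered users pass their key in this field
        payload["registrationkey"] = registration_key
    return json.dumps(payload)


chunks = year_chunks(2000, 2020)
payloads = [build_payload(['CUUR0000SA0', 'SUUR0000SA0'], s, e) for s, e in chunks]
print(chunks)
```

Each element of `payloads` can then be sent with `requests.post` as in the cell above, and the resulting series lists combined afterwards.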

In [0]:
json_data['Results']['series']

In [0]:
multi_df = pd.json_normalize(json_data['Results']['series'])
multi_df

| | seriesID | data |
|---|---|---|
| 0 | CUUR0000SA0 | [{'year': '2009', 'period': 'M12', 'periodName... |
| 1 | SUUR0000SA0 | [{'year': '2009', 'period': 'M12', 'periodName... |


In [0]:
inflation_and_prices = multi_df[multi_df['seriesID'] == "CUUR0000SA0"]
inflation_and_prices

| | seriesID | data |
|---|---|---|
| 0 | CUUR0000SA0 | [{'year': '2009', 'period': 'M12', 'periodName... |


In [0]:
inflation_series = pd.json_normalize(inflation_and_prices.explode('data')['data'])
inflation_series

| | year | period | periodName | value | footnotes |
|---|---|---|---|---|---|
| 0 | 2009 | M12 | December | 215.949 | [{}] |
| 1 | 2009 | M11 | November | 216.330 | [{}] |
| 2 | 2009 | M10 | October | 216.177 | [{}] |
| 3 | 2009 | M09 | September | 215.969 | [{}] |
| 4 | 2009 | M08 | August | 215.834 | [{}] |
| ... | ... | ... | ... | ... | ... |
| 115 | 2000 | M05 | May | 171.5 | [{}] |
| 116 | 2000 | M04 | April | 171.3 | [{}] |
| 117 | 2000 | M03 | March | 171.2 | [{}] |
| 118 | 2000 | M02 | February | 169.8 | [{}] |
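Filtering one series and exploding its `data` column works, but all series can also be flattened in a single step with the `record_path` and `meta` arguments of `pd.json_normalize`. The sketch below runs on a hand-written sample: the CUUR0000SA0 values are copied from the table above, while the SUUR0000SA0 value is a placeholder and not real BLS data.

```python
import pandas as pd

# Hand-written sample shaped like json_data['Results']['series']
sample_series = [
    {"seriesID": "CUUR0000SA0", "data": [
        {"year": "2009", "period": "M12", "periodName": "December",
         "value": "215.949", "footnotes": [{}]},
        {"year": "2009", "period": "M11", "periodName": "November",
         "value": "216.330", "footnotes": [{}]},
    ]},
    {"seriesID": "SUUR0000SA0", "data": [
        # Placeholder value, NOT real BLS data
        {"year": "2009", "period": "M12", "periodName": "December",
         "value": "0.0", "footnotes": [{}]},
    ]},
]

# One row per observation, with the parent seriesID repeated on each row
all_series = pd.json_normalize(sample_series, record_path="data", meta=["seriesID"])

# Selecting one series is then an ordinary boolean filter
cpi = all_series[all_series["seriesID"] == "CUUR0000SA0"]
print(all_series)
```

Keeping `seriesID` as a column means several series can stay in one long table, which suits `groupby` or pivot operations later on.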


**References**

* Series information: https://www.bls.gov/help/hlpforma.htm#LA
* Reference article: http://danstrong.tech/blog/BLS-API/