![EIA API header](https://raw.githubusercontent.com/johnmrudolph/jupyter_blog/master/headers/eia_api_header.png)

<h2 style="color:#003f5b">My setup & workflow</h2>

This is my first post, so before I dive into the content I thought it would be a good idea to give a short description of my setup and workflow. I do most of my work in Linux (Ubuntu 14.04) and run Python 3.5. Most of my development is done in a text editor (Sublime), IPython and the command line. If I need to share or demo code then I use Jupyter.

<h2 style="color:#003f5b">What's an API?</h2>

If you already know the answer to this question then you may not find much use in reading the rest of this post, but if you don't know what I'm talking about then I suggest you keep reading, since APIs can make life much easier for data science types.

APIs are important because they allow for ease of communication between applications. If your application needs to interact with another application then an API helps to define the rules of this interaction. Each API will have its own nuances but generally you request information from a web API by passing in parameters via a URL. Interacting with an API is a much easier way to grab data than trying to scrape HTML.

<h2 style="color:#003f5b">A class to interact with an API</h2>

Most public APIs limit the size and number of requests that you can make. There are tools that can help ensure that you are a courteous API user. The requests-cache package is great for limiting unnecessary API calls: https://pypi.python.org/pypi/requests-cache. You can also mock up API requests and responses, which is helpful during development and unit testing: https://realpython.com/blog/python/testing-third-party-apis-with-mocks/.
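
To make the mocking idea a bit more concrete, here is a minimal sketch that uses only the standard library's unittest.mock. The helper fetch_series_name, the fake key, the fake series ID and the shape of the fake payload are all made up for illustration. The point is only that code which accepts a requests.get-style callable can be exercised without ever touching the network.

```python
from unittest.mock import Mock


def fetch_series_name(http_get, api_key, series_id):
    """Hypothetical helper: http_get is any callable that behaves like requests.get."""
    response = http_get('http://api.eia.gov/series/',
                        params={'api_key': api_key, 'series_id': series_id})
    return response.json()['series'][0]['name']


# build a stand-in for requests.get that returns a canned response
fake_response = Mock()
fake_response.json.return_value = {'series': [{'name': 'Fake series'}]}
fake_get = Mock(return_value=fake_response)

name = fetch_series_name(fake_get, 'FAKE_KEY', 'FAKE.SERIES.W')

# the mock records how it was called, so the request itself can be checked
fake_get.assert_called_once_with(
    'http://api.eia.gov/series/',
    params={'api_key': 'FAKE_KEY', 'series_id': 'FAKE.SERIES.W'})
print(name)
```

In real code you would pass requests.get instead of fake_get, so the API only gets called when you actually need fresh data.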

Before making a request you usually need an API key. Here is a link to the EIA Open Data page where you can obtain an EIA API key: http://www.eia.gov/opendata/register.cfm. We are going to have to import some packages to help us make the request and parse out the information that is returned.

In [1]:
import requests # provides an easy way to send an HTTP/1.1 request and manage the response

Let's start by defining a class for our API call. If you're like me and you come from more of a functional programming background then classes may seem a bit foreign. We could use functions as a means to the same end, but classes offer a practical way to keep track of information that we might need later on. We might also want to inherit from this class at some point in the future to create a new class which interacts with the API differently.

In [2]:
class GetSeries(object):
    '''
    A class to call the EIA API and capture the response
    '''
    
    # setting as class variable because same url will be used for all calls
    eia_url = 'http://api.eia.gov/series/'
    
    # (1)
    def __init__(self, api_key, series_id):
        '''
        :param api_key: a valid API key provided by EIA
        :param series_id: a valid EIA series ID
        '''
        self.api_key = api_key
        self.series_id = series_id
        self.response = self.get_response()
    
    # (2)
    def get_response(self):
        '''
        Calls the EIA API and returns response object
        '''
        api_parms = (
            ('api_key', self.api_key),
            ('series_id', self.series_id),
        )
        return requests.get(self.eia_url, params=api_parms)

I numbered the 2 main elements of the GetSeries class which are worth explaining:

(1) When the class is instantiated 3 attributes are created:
    1. api_key: the API key used to make the request
    2. series_id: the ID of the series that we want
    3. response: a response object returned by the EIA API call
    
(2) The api_key and series_id are packed into a tuple of tuples, which the requests package encodes into the query string of the URL before sending it to the API. The URL is interpreted by the EIA API and the relevant information is returned as a response object.
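
If you are curious what that encoding step looks like, here is a rough sketch using urlencode from the standard library. This is not how requests does it internally, just an approximation of the result, and YOUR_API_KEY is a placeholder rather than a real key.

```python
from urllib.parse import urlencode

eia_url = 'http://api.eia.gov/series/'

# same tuple-of-tuples layout as api_parms in get_response
api_parms = (
    ('api_key', 'YOUR_API_KEY'),  # placeholder, not a real key
    ('series_id', 'NG.NW2_EPG0_SWO_R48_BCF.W'),
)

# urlencode turns the pairs into key=value strings joined by &
full_url = eia_url + '?' + urlencode(api_parms)
print(full_url)
```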

<h2 style="color:#003f5b">What does the API response look like?</h2>

The api_key and series_id are required inputs so these need to be defined before the GetSeries class can be instantiated.

In [3]:
# in production you would want to keep the API key somewhere secret
api_key = '5F4109570C68FDE20F42C25F5152D879'

# a list of valid EIA series can be found here: http://www.eia.gov/opendata/qb.cfm
# I'll use "Natural gas lower 48 weekly"
series_id = 'NG.NW2_EPG0_SWO_R48_BCF.W'

# create a new GetSeries object based on the parameters defined above
ng_stor = GetSeries(api_key, series_id)
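
On the point about keeping the key secret: one common option is to read it from an environment variable instead of typing it into the script. The sketch below assumes a variable called EIA_API_KEY, which is a name I picked for illustration, and load_api_key is a hypothetical helper.

```python
import os


def load_api_key(env=os.environ, name='EIA_API_KEY'):
    """Return the API key stored in an environment variable.

    EIA_API_KEY is an assumed variable name, not something EIA requires.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError('Set the {} environment variable first'.format(name))
    return key


# a plain dict stands in for os.environ so the sketch runs anywhere
demo_key = load_api_key(env={'EIA_API_KEY': 'YOUR_API_KEY'})
print(demo_key)
```

With the variable set in your shell, calling load_api_key() with no arguments would return the real key without it ever appearing in the code.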

Above we created a new instance of the GetSeries class called ng_stor. Let's start by taking a look at some of the information that we can get from the response attribute of the GetSeries class.

In [4]:
# if we sent a valid request to the API then we should get a Response [200] status code
print('The API status code is {}'.format(ng_stor.response))
# this is the URL that we sent to the api
print('The URL sent to the API is {}'.format(ng_stor.response.url))
# this is some information about the response that the API sent to us
print('The API header is {}'.format(ng_stor.response.headers))

The API status code is <Response [200]>
The URL sent to the API is http://api.eia.gov/series/?api_key=5F4109570C68FDE20F42C25F5152D879&series_id=NG.NW2_EPG0_SWO_R48_BCF.W
The API header is {'Content-Language': 'en', 'Content-Encoding': 'gzip', 'Connection': 'keep-alive', 'Server': 'Apache', 'Content-Length': '2248', 'Date': 'Fri, 09 Sep 2016 03:56:16 GMT', 'Vary': 'Accept-Encoding', 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*', 'Expires': 'Fri, 09 Sep 2016 03:56:16 GMT', 'Pragma': 'no-cache', 'Cache-Control': 'max-age=0, no-cache, no-store'}


The headers attribute of the response object gives us some helpful information about the API server and the content that was sent. The 'Content-Type' is application/json so the Requests library can decode the response into a json object directly.

In [5]:
# create a new variable to make it easier to work with the json
ng_stor_json = ng_stor.response.json()
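
If you want to be a little more defensive, you could check the status code and content type before decoding. Here is a minimal sketch: decode_json is a hypothetical helper that relies only on the status_code, headers and json() members that a requests response provides, and FakeResponse is a stand-in so the example runs offline.

```python
def decode_json(response):
    """Return the decoded JSON body, or raise if the response looks wrong."""
    if response.status_code != 200:
        raise ValueError('Unexpected status code: {}'.format(response.status_code))
    content_type = response.headers.get('Content-Type', '')
    if 'application/json' not in content_type:
        raise ValueError('Expected JSON but got: {!r}'.format(content_type))
    return response.json()


class FakeResponse(object):
    """Minimal stand-in for a requests response (illustration only)."""

    def __init__(self, status_code, headers, payload):
        self.status_code = status_code
        self.headers = headers
        self._payload = payload

    def json(self):
        return self._payload


ok = FakeResponse(200, {'Content-Type': 'application/json'}, {'series': []})
print(decode_json(ok))
```

With the real object this would be called as decode_json(ng_stor.response).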

<h2 style="color:#003f5b">What can we do with the JSON?</h2>

Printing the entire json object is a bit messy so let's examine the json by using keys. The API documentation (http://www.eia.gov/opendata/commands.cfm) provides detail about the json structure.

In [6]:
# create variables for the request and the series
request = ng_stor_json['request']
series = ng_stor_json['series']

print('Here is a list of all the items available to us from json:')

# print each key in the request
print('request:')
for key in request.keys():
    print('\t{}'.format(key))
    
# print each key in series - note that the level below series is a dictionary nested in a list - series[0]
print('series:')
for key in series[0].keys():
    print('\t{}'.format(key))

Here is a list of all the items available to us from json:
request:
	series_id
	command
series:
	updated
	source
	units
	f
	end
	start
	series_id
	data
	name
	copyright
	unitsshort
	description
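
If you would rather not hard-code each level, the same key listing can be done with a small recursive function. This is just a sketch: outline is a hypothetical helper, and the sample payload is a cut-down, made-up imitation of the EIA structure rather than a real response.

```python
def outline(obj, indent=0):
    """Return the nested keys of a decoded json object as tab-indented lines."""
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append('\t' * indent + str(key))
            lines.extend(outline(value, indent + 1))
    elif isinstance(obj, list) and obj:
        # describe a list by looking at its first element only
        lines.extend(outline(obj[0], indent))
    return lines


# made-up payload that mimics the request / series layout
sample = {
    'request': {'command': 'series', 'series_id': 'FAKE.SERIES.W'},
    'series': [{'series_id': 'FAKE.SERIES.W', 'f': 'W',
                'data': [['20160902', 3437]]}],
}

for line in outline(sample):
    print(line)
```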


The data item that is nested under series is what we are after, so let's extract it, convert it to a pandas dataframe and index it by date to make it a proper timeseries.

<h2 style="color:#003f5b">Parse json to pandas dataframe</h2>

In [7]:
# we need to import pandas - ideally this import would happen at the start of the script
# and datetime to convert a string to a timestamp
import pandas as pd
from datetime import datetime

I think a function or class would work to parse the json into a dataframe. I'm going to use a class because I used a class to interact with the API so might as well keep the object oriented thing going....

In [8]:
class CreateDataFrame(object):
    '''Creates the dataframe for Energy API call'''
    
    # (1)
    def __init__(self, json):
        """:param json: eia json"""
        self.json = json
        self.series = self.json['series']
        self.data = self.series[0]['data']
        self.df = self.create_dataframe()
    
    # (2)
    def create_dataframe(self):
        """Function to create dataframe from json['series'] """
        values = [x[1] for x in self.data]
        dates = self.get_dates()
        return pd.DataFrame(values, index=dates, columns=['values'])
    
    # (3)
    def get_dates(self):
        """Parses text dates and returns them as formatted date strings"""
        freq = {'A': '%Y', 'M': '%Y%m', 'W': '%Y%m%d',
                'D': '%Y%m%d', 'H': '%Y%m%d %H'}
        date_list = []
        for x in self.data:
            # need to add this ugly bit to remove hourly time format from EIA
            time = x[0].replace('T', ' ')
            time = time.replace('Z', '')
            time = datetime.strptime(time, freq[self.series[0]['f']])
            date_list.append(time.strftime('%Y-%m-%d %H:%M:%S'))
        return date_list

(1) When the class is instantiated 4 attributes are created:
    1. json: the json that was passed in from the EIA API
    2. series: the series key of the json object
    3. data: the data key that is embedded in series
    4. df: a pandas dataframe
    
(2) A method that takes the data key values from the json and converts them to a pandas dataframe. The data key holds a list of lists, each containing a date and a value. This code extracts the value from each inner list and calls another method to extract and convert the dates. The dataframe is created by using the converted dates as the index.

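(3) A method that parses the string representation of each date with datetime and returns it as a formatted date string. I built this to handle the different frequencies that could potentially be passed in, since EIA data can be annual, monthly, weekly, daily or hourly. Because the dates come back as strings, pd.to_datetime can be applied to the index if a true DatetimeIndex is needed.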
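
To see the date handling on its own, here is a small standalone sketch that reuses the same frequency-to-format table with only the standard library. parse_eia_date is a hypothetical helper, and the sample strings are my assumption of what the raw dates look like for each frequency.

```python
from datetime import datetime

# same frequency-to-format table as the get_dates method
FORMATS = {'A': '%Y', 'M': '%Y%m', 'W': '%Y%m%d',
           'D': '%Y%m%d', 'H': '%Y%m%d %H'}


def parse_eia_date(text, freq):
    """Parse one EIA date string into a datetime object."""
    # strip the T and Z markers that the hourly format carries
    cleaned = text.replace('T', ' ').replace('Z', '')
    return datetime.strptime(cleaned, FORMATS[freq])


# illustrative inputs (assumed raw string shapes, not taken from a real response)
samples = [('2016', 'A'), ('201609', 'M'), ('20160902', 'W'),
           ('20160902T03Z', 'H')]
for text, freq in samples:
    print(freq, parse_eia_date(text, freq))
```

Returning datetime objects rather than strings is a deliberate difference from get_dates, since it makes sorting and date arithmetic possible straight away.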

Let's create the dataframe to check if the CreateDataFrame class is working as expected.

In [9]:
# here we create the dataframe from the json
ng_stor_df = CreateDataFrame(ng_stor_json).df
# print out the first 20 obs
print(ng_stor_df.head(n=20))

                     values
2016-09-02 00:00:00    3437
2016-08-26 00:00:00    3401
2016-08-19 00:00:00    3350
2016-08-12 00:00:00    3339
2016-08-05 00:00:00    3317
2016-07-29 00:00:00    3288
2016-07-22 00:00:00    3294
2016-07-15 00:00:00    3277
2016-07-08 00:00:00    3243
2016-07-01 00:00:00    3179
2016-06-24 00:00:00    3140
2016-06-17 00:00:00    3103
2016-06-10 00:00:00    3041
2016-06-03 00:00:00    2972
2016-05-27 00:00:00    2907
2016-05-20 00:00:00    2825
2016-05-13 00:00:00    2754
2016-05-06 00:00:00    2681
2016-04-29 00:00:00    2625
2016-04-22 00:00:00    2557


There it is. We have our dataframe and can model away.