![EIA API header](https://raw.githubusercontent.com/johnmrudolph/jupyter_blog/master/headers/eia_api_header.png)

<h2 style="color:#003f5b">My setup & workflow</h2>

This is my first post so before I dive into the content I thought it would be a good idea to give a short description of my setup and workflow. I do most of my work in Linux (Ubuntu 16.04) and run Python 3.5. Most of my development is done in a text editor (Sublime), IPython and the command line. If I need to share or demo code then I use Jupyter, but otherwise I find it a bit clunky for development.

<h2 style="color:#003f5b">What's an API?</h2>

APIs are important because they allow for ease of communication between applications. A good API can make life easier because it has built-in methods to do "stuff" which otherwise might require some tricky programming. For data science types, interacting with an API is a much easier way to dynamically grab data from an online source than trying to scrape HTML or download some kind of static file. Each API will have its own nuances but generally you request information from an API by passing in parameters via a URL - basically a query string.
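
To make the "parameters via a URL" idea a bit more concrete, here is a small sketch that pulls a URL apart using only the standard library. The series id and key in the URL are placeholders that I made up, so this is not a real request.

```python
from urllib.parse import urlsplit, parse_qs

# placeholder URL: the series id and key are made-up values
url = 'http://api.eia.gov/series/?series_id=EXAMPLE.SERIES.W&api_key=YOUR_API_KEY'

parts = urlsplit(url)
# everything after the "?" is the query string
query = parse_qs(parts.query)

print(parts.netloc)  # the host that receives the request
print(parts.path)    # the endpoint on that host
print(query)         # the parameters, as a dict of lists
```

The host and path say where the request goes and the query string says what is being asked for.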

<h2 style="color:#003f5b">Creating a class to interact with an API</h2>

Most public APIs limit the size and number of requests that you can make. There are tools that can help ensure that you are a respectful API user. The requests-cache package is great for limiting unnecessary API calls: https://pypi.python.org/pypi/requests-cache. You can also mock up API requests and responses which is helpful during development and unit testing: https://realpython.com/blog/python/testing-third-party-apis-with-mocks/.
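
As a rough illustration of the mocking idea, the sketch below swaps the HTTP session for a `unittest.mock.Mock` so that no traffic reaches the API. The `fetch_json` helper and the canned payload are hypothetical names and values that I made up for this example, and it is only one of several ways to do this.

```python
from unittest.mock import Mock


def fetch_json(session, url, params):
    '''Call session.get and return the decoded JSON body.'''
    response = session.get(url, params=params)
    response.raise_for_status()
    return response.json()


# stand-in for a requests session: nothing is sent over the network
fake_response = Mock()
fake_response.json.return_value = {'series': [{'data': []}]}
fake_session = Mock()
fake_session.get.return_value = fake_response

result = fetch_json(fake_session, 'http://api.eia.gov/series/',
                    {'series_id': 'EXAMPLE'})

# the mock records how it was called, so a test can check the request
fake_session.get.assert_called_once_with(
    'http://api.eia.gov/series/', params={'series_id': 'EXAMPLE'})
print(result)
```

Passing the session in as an argument is a design choice that makes the swap easy: the real code would hand over a requests session and the test hands over the mock.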

Before making a request you usually need an API key. Here is a link to the EIA Open Data page where you can obtain an EIA API key: http://www.eia.gov/opendata/register.cfm. We are going to use the requests package to call the EIA API and handle the response.

In [3]:
import requests

We should probably take a look at the EIA API documentation: http://www.eia.gov/opendata/commands.cfm so that we know what a valid call should look like. Here is our example: http://api.eia.gov/series/?series_id=sssssss&api_key=YOUR_API_KEY_HERE[&num=][&out=xml|json]. The parameters needed by the API follow the "?" in the URL string, so it looks like we need a series_id and an api_key. The documentation also tells us that there are some optional parameters that we can pass in to filter the data that is returned to us in the response - we'll keep this in mind as we develop.
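
Here is a minimal sketch of how a URL in that shape could be assembled by hand with the standard library. The series id and key are placeholders, and the values for the optional `num` and `out` parameters are my own guesses at reasonable inputs rather than anything I have checked against the API.

```python
from urllib.parse import urlencode

base_url = 'http://api.eia.gov/series/'

# a list of pairs keeps the parameters in a predictable order
params = [
    ('series_id', 'EXAMPLE.SERIES.W'),   # placeholder series id
    ('api_key', 'YOUR_API_KEY_HERE'),    # placeholder key
    ('num', 5),                          # optional, value assumed
    ('out', 'json'),                     # optional, value assumed
]

url = base_url + '?' + urlencode(params)
print(url)
```

In practice the requests package does this encoding for us, which is why the class below only has to hand over the parameters.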

Let's start by creating a class to handle the API call.

In [8]:
class GetSeries(object):
    '''
    A class to call the EIA API and capture the response
    '''
    
    # setting as class variable because the same url will be used for all calls
    eia_url = 'http://api.eia.gov/series/'
    
    # (1)
    # **kwargs will allow us to accept an unspecified number of keyword arguments
    # this gives us flexibility to modify the class later on to handle optional parameters
    def __init__(self, **kwargs):
        '''
        valid kwargs:
        :param api_key: a valid API key provided by EIA
        :param series_id: a valid EIA series ID
        '''
        self.kwargs = kwargs
        self.parms = self.create_parms()
        self.response = self.get_response()
        
    # (2)
    def create_parms(self):
        '''
        Convert kwargs into a list to pass into api call
        '''
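        try:
            kwargs_list = [['api_key', self.kwargs['api_key']]]
            kwargs_list.append(['series_id', self.kwargs['series_id']])
        except KeyError as err:
            # fail early with a clear message instead of sending a bad request
            raise KeyError('missing required keyword argument: {}'.format(err))
        return kwargs_list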
    
    # (3)
    def get_response(self):
        '''
         Calls the EIA API and returns response object
        '''
        api_parms = [tuple(parm) for parm in self.parms]
        return requests.get(self.eia_url, params=api_parms)

I numbered the 3 main elements of the GetSeries class which are worth explaining:

(1) I set up the init to accept keyword arguments to allow for flexibility. We will only be making an API call using 2 parameters: an api_key and a series_id. When the class is instantiated 3 attributes are created:
    1. kwargs: this is a dictionary of keyword arguments that can be passed into the class
    2. parms: converts the kwargs into a list to pass into the api call
    3. response: a response object returned from the EIA API
    
(2) When we pass in kwargs we get a dictionary of key, value pairs based on whatever the user has supplied. This means that we have to check for and handle the keyword arguments. In this case we need an api_key and a series_id which we can check for using a try and except. If both arguments have been passed in then we create a list of key, value pairs. If either one is missing then the dictionary lookup raises a KeyError, which is what the try and except is there to handle.

(3) The api_key and series_id list is converted into a list of tuples and then encoded into a URL by the requests package. Note that requests accepts the parameters either as a dictionary or as a list of (key, value) tuples. The URL is interpreted by the EIA API and the relevant information is returned as a response object.
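
The checking described in (2) could also be written as a small standalone function. This is only a sketch of one possible approach: `build_params`, `REQUIRED` and `OPTIONAL` are hypothetical names that are not part of the GetSeries class, and treating `num` and `out` as the optional parameters is an assumption based on the URL pattern in the documentation.

```python
REQUIRED = ('api_key', 'series_id')
OPTIONAL = ('num', 'out')  # assumed from the documented URL pattern


def build_params(**kwargs):
    '''Return a list of (key, value) tuples or raise if a key is missing.'''
    missing = [key for key in REQUIRED if key not in kwargs]
    if missing:
        raise KeyError('missing required keyword argument(s): {}'.format(
            ', '.join(missing)))
    params = [(key, kwargs[key]) for key in REQUIRED]
    # optional parameters are only included when supplied;
    # any other keyword is silently ignored
    params.extend((key, kwargs[key]) for key in OPTIONAL if key in kwargs)
    return params


print(build_params(api_key='YOUR_API_KEY', series_id='EXAMPLE.SERIES.W', num=5))

try:
    build_params(api_key='YOUR_API_KEY')
except KeyError as err:
    print('rejected:', err)
```

Checking for every required key up front means the error message can name everything that is missing in one go.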

<h2 style="color:#003f5b">What does the API response look like?</h2>

The api_key and series_id are required inputs so these need to be defined before the GetSeries class can be instantiated.
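
One common way to keep the key out of the source code is to read it from an environment variable. The sketch below assumes a variable called `EIA_API_KEY`, which is a name I picked for illustration, and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(var_name='EIA_API_KEY'):
    '''Read the API key from the environment (variable name is assumed).'''
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            'set the {} environment variable first'.format(var_name))
    return key
```

After setting the variable in the shell, `api_key = load_api_key()` would take the place of the hard-coded string.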

In [18]:
# in production you would want to keep the API key somewhere secret
api_key = 'YOUR_API_KEY'

# a list of valid EIA series can be found here: http://www.eia.gov/opendata/qb.cfm
# I'll use "Natural gas lower 48 weekly"
series_id = 'NG.NW2_EPG0_SWO_R48_BCF.W'

# create a new GetSeries object based on the paramters defined above
ng_stor = GetSeries(api_key=api_key, series_id=series_id)

Above we created a new instance of the GetSeries class called ng_stor. Let's start by taking a look at some of the information that we can get from the response attribute of the GetSeries class.

In [17]:
# if we sent a valid request to the API then printing the response should show Response [200]
print('The API status code is {}'.format(ng_stor.response))
# this is the URL that we sent to the api - note I didn't show it in the output below because it has my API key embedded!
print('The URL sent to the API is {}'.format(ng_stor.response.url))
# this is some additional information about the response that the API sent to us
print('The API header is {}'.format(ng_stor.response.headers))

The API status code is <Response [200]>
The API header is {'Access-Control-Allow-Origin': '*', 'Pragma': 'no-cache', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Content-Length': '2264', 'Date': 'Fri, 30 Sep 2016 03:44:11 GMT', 'Content-Language': 'en', 'Server': 'Apache', 'Vary': 'Accept-Encoding', 'Cache-Control': 'max-age=0, no-cache, no-store', 'Expires': 'Fri, 30 Sep 2016 03:44:11 GMT', 'Connection': 'keep-alive'}


The headers attribute of the response object gives us some helpful information about the API server and the content that was sent. The 'Content-Type' is application/json so the Requests library can decode the response body directly into Python objects.
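
If you want to be defensive about it, the content type can be checked before decoding. The sketch below uses a plain dictionary and a made-up JSON string in place of a live response, and `decode_body` is a hypothetical helper rather than anything provided by Requests.

```python
import json


def decode_body(headers, body):
    '''Decode body as JSON only if the headers say it is JSON.'''
    content_type = headers.get('Content-Type', '')
    # ignore anything after ";" such as a charset declaration
    media_type = content_type.split(';')[0].strip().lower()
    if media_type != 'application/json':
        raise ValueError('expected JSON but got {!r}'.format(content_type))
    return json.loads(body)


# made-up stand-ins for response.headers and response.text
sample_headers = {'Content-Type': 'application/json'}
sample_body = '{"request": {"command": "series"}, "series": []}'

decoded = decode_body(sample_headers, sample_body)
print(decoded['request']['command'])
```

Note that a plain dictionary is case sensitive, so the header name has to match exactly here.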

In [10]:
# create a new variable to make it easier to work with the json
ng_stor_json = ng_stor.response.json()

<h2 style="color:#003f5b">What can we do with the JSON?</h2>

Printing the entire json object is a bit messy so let's examine the json by using its keys, since at this point it is really a dictionary as far as Python is concerned. The API documentation: http://www.eia.gov/opendata/commands.cfm provides some detail about the json structure (it looks like a nested Python dictionary). The first level of the json has 2 keys: [request] and [series] and the data that we want is nested somewhere under [series].

In [11]:
# create variables for the request and the series
request = ng_stor_json['request']
series = ng_stor_json['series']

print('Here is a list of all the items available to us from json:')

# print each key in the request
print('request:')
for key in request.keys():
    print('\t{}'.format(key))
    
# print each key in series - note that the level below series is a dictionary nested in a list - series[0]
print('series:')
for key in series[0].keys():
    print('\t{}'.format(key))

Here is a list of all the items available to us from json:
request:
	command
	series_id
series:
	start
	units
	f
	updated
	name
	description
	copyright
	end
	source
	series_id
	unitsshort
	data


The data key that is nested under series is what we are after, so let's extract it, convert it to a pandas dataframe and create a datetime index to make it a proper time series.
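
Before bringing pandas in, here is a quick sketch of walking down to the data by hand. The dictionary is a cut-down imitation of the response that I typed in myself: the series id is a placeholder, and the date strings assume a YYYYMMDD layout for weekly data.

```python
# hand-made imitation of the response, trimmed to a few keys
sample_json = {
    'request': {'command': 'series', 'series_id': 'EXAMPLE.SERIES.W'},
    'series': [{
        'series_id': 'EXAMPLE.SERIES.W',
        'f': 'W',
        'data': [['20160923', 3600], ['20160916', 3551], ['20160909', 3499]],
    }],
}

# series is a list, so step into its first element before asking for data
first_series = sample_json['series'][0]
data = first_series['data']

# each row is a [date, value] pair
dates = [row[0] for row in data]
values = [row[1] for row in data]
print(dates)
print(values)
```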

<h2 style="color:#003f5b">Parse the JSON to a Pandas dataframe</h2>

In [12]:
# we need to import pandas - ideally this import would happen at the start of the script
# and datetime to convert a string to a timestamp
import pandas as pd
from datetime import datetime

I think either a function or a class could be used here to parse the json into a dataframe. I'm going to use a class because I used a class to interact with the API so I might as well keep the object-oriented thing going. I'm also going to build in some flexibility to handle different datetime frequencies so that this class can be used to handle all EIA time series.

In [13]:
class CreateDataFrame(object):
    '''Creates the dataframe for Energy API call'''
    
    # (1)
    def __init__(self, json):
        """:param json: eia json"""
        self.json = json
        self.series = self.json['series']
        self.data = self.series[0]['data']
        self.df = self.create_dataframe()
    
    # (2)
    def create_dataframe(self):
        """Function to create dataframe from json['series'] """
        values = [x[1] for x in self.data]
        dates = self.get_dates()
        return pd.DataFrame(values, index=dates, columns=['values'])
    
    # (3)
    def get_dates(self):
        """Parses text dates to datetime index"""
        freq = {'A': '%Y', 'M': '%Y%m', 'W': '%Y%m%d',
                'D': '%Y%m%d', 'H': '%Y%m%d %H'}
        date_list = []
        for x in self.data:
            # need to add this ugly bit to remove hourly time format from EIA
            time = x[0].replace('T', ' ')
            time = time.replace('Z', '')
            time = datetime.strptime(time, freq[self.series[0]['f']])
            date_list.append(time.strftime('%Y-%m-%d %H:%M:%S'))
        return date_list

(1) When the class is instantiated 4 attributes are created:
    1. json: the json that was passed in from the EIA API
    2. series: the series key of the json object
    3. data: the data key that is embedded in series
    4. df: a pandas dataframe
    
(2) A method that takes the data key values from the json and converts them to a pandas dataframe. The data key holds a list of lists, each containing a date and a value. This code extracts the values from the list and calls another method to extract and convert the dates. The dataframe is created by using the converted dates as the index.

(3) A method that parses the string representation of each date into a datetime object and then writes it back out as a standard 'YYYY-MM-DD HH:MM:SS' timestamp string. I built this to handle the different frequencies that could potentially be passed in since EIA data can be annual, monthly, weekly, daily or hourly.
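
The frequency handling in (3) is easier to see in isolation. The sketch below reuses the same format table in a standalone function; `parse_eia_date` is a hypothetical helper, and the input strings are my assumption of what each frequency looks like rather than values copied from the API.

```python
from datetime import datetime

# same frequency-to-format table as in the class above
FORMATS = {'A': '%Y', 'M': '%Y%m', 'W': '%Y%m%d',
           'D': '%Y%m%d', 'H': '%Y%m%d %H'}


def parse_eia_date(text, freq):
    '''Parse one date string using the format for its frequency.'''
    # hourly values carry a "T" separator and a trailing "Z"
    cleaned = text.replace('T', ' ').replace('Z', '')
    return datetime.strptime(cleaned, FORMATS[freq])


# assumed example inputs, one per format
examples = [('2016', 'A'), ('201609', 'M'),
            ('20160923', 'W'), ('20160929T23Z', 'H')]
for text, freq in examples:
    print(freq, text, '->', parse_eia_date(text, freq))
```

Any part of the date that the format does not cover falls back to its lowest value, so an annual value lands on January 1st at midnight.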

Let's create the dataframe to check if the CreateDataFrame class is working as expected.

In [14]:
# here we instantiate the CreateDataFrame class and assign the df attribute
ng_stor_df = CreateDataFrame(ng_stor_json).df
# print out the first 20 obs
print(ng_stor_df.head(n=20))

                     values
2016-09-23 00:00:00    3600
2016-09-16 00:00:00    3551
2016-09-09 00:00:00    3499
2016-09-02 00:00:00    3437
2016-08-26 00:00:00    3401
2016-08-19 00:00:00    3350
2016-08-12 00:00:00    3339
2016-08-05 00:00:00    3317
2016-07-29 00:00:00    3288
2016-07-22 00:00:00    3294
2016-07-15 00:00:00    3277
2016-07-08 00:00:00    3243
2016-07-01 00:00:00    3179
2016-06-24 00:00:00    3140
2016-06-17 00:00:00    3103
2016-06-10 00:00:00    3041
2016-06-03 00:00:00    2972
2016-05-27 00:00:00    2907
2016-05-20 00:00:00    2825
2016-05-13 00:00:00    2754
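
One thing to be aware of is that the index built by get_dates holds timestamp strings, and the rows come back newest first. If you want a true pandas DatetimeIndex in chronological order, something along the lines of the sketch below should do it. The small dataframe here is rebuilt by hand from the first three rows of the output above so that the example runs without calling the API.

```python
import pandas as pd

# first three rows of the output above, with the index as plain strings
df = pd.DataFrame(
    {'values': [3600, 3551, 3499]},
    index=['2016-09-23 00:00:00', '2016-09-16 00:00:00', '2016-09-09 00:00:00'])

# convert the string index to a DatetimeIndex, then sort oldest to newest
df.index = pd.to_datetime(df.index)
df = df.sort_index()

print(type(df.index).__name__)
print(df)
```

With a DatetimeIndex in place, pandas time series tools such as date slicing and resampling become available.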


The dataframe looks good - it gives us a nice clean time series that is ready to be modelled. I'll be using the EIA API and the GetSeries and CreateDataFrame classes in my next post where I play around with outlier detection.