# Working with APIs



# Lesson Goals

    Understand what an API is and what it does.
    Learn how to make simple calls to an API and retrieve JSON data.
    Learn how to handle nested JSON API results.

# Introduction

Thus far in the program, we have learned how to obtain data from files and from relational databases. However, sometimes the data we need is not readily available from either of these sources. In some cases, the data we need may be contained within an application. Application owners will often create APIs (Application Programming Interfaces) so that their applications can talk to other applications. An API is a set of programmatic instructions for accessing a software application, and the data that comes from an API is typically structured (for example, as JSON). This structure makes working with API data preferable to crawling websites and scraping content off of web pages.
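
To make the idea of structured data concrete, here is a minimal sketch of how a JSON document maps onto Python objects. The JSON string is made up for illustration and only the standard library json module is used.

```python
import json

# A made-up JSON document, similar in shape to what an API might return.
raw = '[{"userId": 1, "id": 1, "title": "example task", "completed": false}]'

# json.loads turns a JSON array into a list and a JSON object into a dict.
records = json.loads(raw)
first = records[0]

print(type(records).__name__, type(first).__name__)
print(first['title'], first['completed'])
```

Because every record carries named fields, we can pick out values by key instead of searching through the text of a web page.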

In this lesson, we are going to learn how to make API calls to an application and retrieve data in JSON format, learn about API authentication, and use Python libraries to obtain data from APIs.

# Simple API Example with Requests

There are a few libraries that can be used for working with APIs in Python, but the Requests library is one of the most intuitive. It has a get method that allows you to send an HTTP GET request to an application and receive a response. Let's take a look at a basic API call using the Requests library.

In [1]:
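import requests

# Send a GET request and decode the JSON body into Python objects
response = requests.get('https://jsonplaceholder.typicode.com/todos')
results = response.json()

# The result is a list of dictionaries; look at the first one
results[0]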

{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

In this example, we used the get method to send a request to the JSONPlaceholder API, and we received back a response in the form of JSON structured data. If we wanted to analyze this data, we could easily use Pandas to convert the results into a data frame to which we can then apply various analytical methods. 
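
The call above assumes that the request succeeds. As a hedged sketch of a slightly more defensive version, the hypothetical helper below (fetch_json is our own name, not part of Requests) sets a timeout and uses raise_for_status so that an HTTP error is raised instead of being parsed as data. The get parameter is an assumption added only so the helper can be tried without network access.

```python
import requests


def fetch_json(url, timeout=10, get=requests.get):
    """Return the decoded JSON body of `url`, raising on HTTP errors.

    `get` defaults to requests.get; passing a stand-in function is a
    hypothetical convenience for trying the helper offline.
    """
    response = get(url, timeout=timeout)
    # Raises requests.HTTPError if the status code is 4xx or 5xx.
    response.raise_for_status()
    return response.json()
```

Calling fetch_json('https://jsonplaceholder.typicode.com/todos') should return the same kind of list of dictionaries as the cell above.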

In [2]:
import pandas as pd

data = pd.DataFrame(results)
data.head()

Unnamed: 0,completed,id,title,userId
0,False,1,delectus aut autem,1
1,False,2,quis ut nam facilis et officia qui,1
2,False,3,fugiat veniam minus,1
3,True,4,et porro tempora,1
4,False,5,laboriosam mollitia et enim quasi adipisci qui...,1
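
As one small illustration of the kind of analysis this makes possible, the sketch below computes the share of completed to-dos per user. It uses four records copied by hand from the output above, so it runs without calling the API; the variable name completion_rate is our own.

```python
import pandas as pd

# Four records copied by hand from the output above (no network call needed).
results = [
    {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False},
    {'userId': 1, 'id': 2, 'title': 'quis ut nam facilis et officia qui', 'completed': False},
    {'userId': 1, 'id': 3, 'title': 'fugiat veniam minus', 'completed': False},
    {'userId': 1, 'id': 4, 'title': 'et porro tempora', 'completed': True},
]
data = pd.DataFrame(results)

# Share of completed to-dos per user (True counts as 1, False as 0).
completion_rate = data.groupby('userId')['completed'].mean()
print(completion_rate)
```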


# More Complex Requests API Example

In the previous section, the data we received from the API was not very complex. It was all at a single level and fit neatly into a data frame. However, sometimes API responses contain nested data, and we must find a way to flatten the JSON so that it fits nicely into a data frame. Let's make an API call to the GitHub public API, create a Pandas data frame from the results, and examine the structure of the data.



In [3]:
response = requests.get('https://api.github.com/events')

data = pd.DataFrame(response.json())
data.head()

Unnamed: 0,actor,created_at,id,org,payload,public,repo,type
0,"{'id': 50721655, 'login': 'Jrose3797', 'displa...",2019-07-08T14:22:26Z,9967820122,,"{'push_id': 3793486088, 'size': 1, 'distinct_s...",True,"{'id': 195824522, 'name': 'Jrose3797/dsc-intro...",PushEvent
1,"{'id': 3761375, 'login': 'cdcabrera', 'display...",2019-07-08T14:22:26Z,9967820118,,"{'push_id': 3793486084, 'size': 2, 'distinct_s...",True,"{'id': 190663766, 'name': 'cdcabrera/curiosity...",PushEvent
2,"{'id': 26219511, 'login': 'heaptracetechnology...",2019-07-08T14:22:26Z,9967820120,,"{'ref': 'Standard-OMG-mongodb', 'ref_type': 'b...",True,"{'id': 195819196, 'name': 'heaptracetechnology...",CreateEvent
3,"{'id': 9443847, 'login': 'hendrikebbers', 'dis...",2019-07-08T14:22:26Z,9967820117,"{'id': 1673867, 'login': 'AdoptOpenJDK', 'grav...","{'action': 'created', 'issue': {'url': 'https:...",True,"{'id': 176502087, 'name': 'AdoptOpenJDK/IcedTe...",IssueCommentEvent
4,"{'id': 6710696, 'login': 'nbuonin', 'display_l...",2019-07-08T14:22:26Z,9967820111,"{'id': 52456, 'login': 'ccnmtl', 'gravatar_id'...","{'push_id': 3793486074, 'size': 1, 'distinct_s...",True,"{'id': 183269109, 'name': 'ccnmtl/ohcoe-hugo',...",PushEvent
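
With wider results it can be hard to spot nested fields by eye. The sketch below is one possible way to find them programmatically; find_nested_columns is a hypothetical helper of our own, and the sample frame contains made-up values shaped like the events data.

```python
import pandas as pd


def find_nested_columns(frame):
    """Return the names of columns in which at least one value is a dict."""
    return [
        name for name in frame.columns
        if frame[name].map(lambda value: isinstance(value, dict)).any()
    ]


# Hypothetical sample shaped like two rows of the events data (made-up values).
sample = pd.DataFrame({
    'id': ['1', '2'],
    'type': ['PushEvent', 'CreateEvent'],
    'actor': [{'id': 10, 'login': 'alice'}, {'id': 20, 'login': 'bob'}],
    'org': [None, {'id': 30, 'login': 'acme'}],
})
print(find_nested_columns(sample))
```

A column is reported as soon as one of its values is a dictionary, so columns such as org that are empty for some rows are still detected.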


When we look at the data frame, we can see that there are dictionaries nested in several fields. We need to extract the information in these fields and add it to the data frame as columns. To do this, we are going to create our own flatten function that accepts a data frame and a list of the columns that contain nested dictionaries. Our function is going to iterate through the columns and, for each column, it is going to:

    Turn the nested dictionaries into a data frame with a column for each key
    Assign column names to each column in this new data frame
    Add these new columns to the original data frame
    Drop the column with the nested dictionaries
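
Before wrapping these steps in a function, it may help to trace them once on a single column. The sketch below does so on a tiny hand-made frame; the values and the name expanded are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame with one nested column (made-up values).
data = pd.DataFrame({
    'id': [1, 2],
    'actor': [{'id': 10, 'login': 'alice'}, {'id': 20, 'login': 'bob'}],
})

# Step 1: turn the nested dictionaries into a frame with one column per key.
expanded = pd.DataFrame(dict(data['actor'])).transpose()

# Step 2: prefix each new column with the name of the original column.
expanded.columns = ['actor_' + str(name) for name in expanded.columns]

# Step 3: add the new columns to the original frame.
data = pd.concat([data, expanded], axis=1)

# Step 4: drop the column that held the dictionaries.
data = data.drop('actor', axis=1)

print(list(data.columns))
```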


In [4]:
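def flatten(data, col_list):
    for column in col_list:
        # Build a frame with one row per record and one column per nested key
        flattened = pd.DataFrame(dict(data[column])).transpose()
        # Prefix each new column with the name of the original column
        columns = [str(col) for col in flattened.columns]
        flattened.columns = [column + '_' + colname for colname in columns]
        # Attach the new columns, then drop the column of dictionaries
        data = pd.concat([data, flattened], axis=1)
        data = data.drop(column, axis=1)
    return data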

Now that we have our function, let's apply it to the columns that have nested dictionaries and get back a revised data frame.

In [5]:
nested_columns = ['actor', 'org', 'payload', 'repo']

flat = flatten(data, nested_columns)
flat.head()

Unnamed: 0,created_at,id,public,type,actor_avatar_url,actor_display_login,actor_gravatar_id,actor_id,actor_login,actor_url,...,payload_number,payload_pull_request,payload_push_id,payload_pusher_type,payload_ref,payload_ref_type,payload_size,repo_id,repo_name,repo_url
0,2019-07-08T14:22:26Z,9967820122,True,PushEvent,https://avatars.githubusercontent.com/u/50721655?,Jrose3797,,50721655,Jrose3797,https://api.github.com/users/Jrose3797,...,,,3793486088.0,,refs/heads/wip,,1.0,195824522,Jrose3797/dsc-intro-to-sets-lab-houston-ds-060319,https://api.github.com/repos/Jrose3797/dsc-int...
1,2019-07-08T14:22:26Z,9967820118,True,PushEvent,https://avatars.githubusercontent.com/u/3761375?,cdcabrera,,3761375,cdcabrera,https://api.github.com/users/cdcabrera,...,,,3793486084.0,,refs/heads/master,,2.0,190663766,cdcabrera/curiosity-frontend,https://api.github.com/repos/cdcabrera/curiosi...
2,2019-07-08T14:22:26Z,9967820120,True,CreateEvent,https://avatars.githubusercontent.com/u/26219511?,heaptracetechnology,,26219511,heaptracetechnology,https://api.github.com/users/heaptracetechnology,...,,,,user,Standard-OMG-mongodb,branch,,195819196,heaptracetechnology/mongodb,https://api.github.com/repos/heaptracetechnolo...
3,2019-07-08T14:22:26Z,9967820117,True,IssueCommentEvent,https://avatars.githubusercontent.com/u/9443847?,hendrikebbers,,9443847,hendrikebbers,https://api.github.com/users/hendrikebbers,...,,,,,,,,176502087,AdoptOpenJDK/IcedTea-Web,https://api.github.com/repos/AdoptOpenJDK/Iced...
4,2019-07-08T14:22:26Z,9967820111,True,PushEvent,https://avatars.githubusercontent.com/u/6710696?,nbuonin,,6710696,nbuonin,https://api.github.com/users/nbuonin,...,,,3793486074.0,,refs/heads/domain-rev-progress-bars,,1.0,183269109,ccnmtl/ohcoe-hugo,https://api.github.com/repos/ccnmtl/ohcoe-hugo


Alternatively, we can flatten nested data using the json_normalize function, which is part of the Pandas library. It flattens the nested dictionaries and names each new column after the original column and the nested key, separated by a period. For example, the avatar_url key nested under actor becomes actor.avatar_url.

Here is an example of how to use this function. In Pandas 1.0 and later, it is available at the top level of the library, so it can be imported directly from pandas (or called as pd.json_normalize); the older pandas.io.json import path is deprecated.

In [6]:
from pandas import json_normalize

results = response.json()
flattened_data = json_normalize(results)

flattened_data.head()

Unnamed: 0,actor.avatar_url,actor.display_login,actor.gravatar_id,actor.id,actor.login,actor.url,created_at,id,org.avatar_url,org.gravatar_id,...,payload.push_id,payload.pusher_type,payload.ref,payload.ref_type,payload.size,public,repo.id,repo.name,repo.url,type
0,https://avatars.githubusercontent.com/u/50721655?,Jrose3797,,50721655,Jrose3797,https://api.github.com/users/Jrose3797,2019-07-08T14:22:26Z,9967820122,,,...,3793486000.0,,refs/heads/wip,,1.0,True,195824522,Jrose3797/dsc-intro-to-sets-lab-houston-ds-060319,https://api.github.com/repos/Jrose3797/dsc-int...,PushEvent
1,https://avatars.githubusercontent.com/u/3761375?,cdcabrera,,3761375,cdcabrera,https://api.github.com/users/cdcabrera,2019-07-08T14:22:26Z,9967820118,,,...,3793486000.0,,refs/heads/master,,2.0,True,190663766,cdcabrera/curiosity-frontend,https://api.github.com/repos/cdcabrera/curiosi...,PushEvent
2,https://avatars.githubusercontent.com/u/26219511?,heaptracetechnology,,26219511,heaptracetechnology,https://api.github.com/users/heaptracetechnology,2019-07-08T14:22:26Z,9967820120,,,...,,user,Standard-OMG-mongodb,branch,,True,195819196,heaptracetechnology/mongodb,https://api.github.com/repos/heaptracetechnolo...,CreateEvent
3,https://avatars.githubusercontent.com/u/9443847?,hendrikebbers,,9443847,hendrikebbers,https://api.github.com/users/hendrikebbers,2019-07-08T14:22:26Z,9967820117,https://avatars.githubusercontent.com/u/1673867?,,...,,,,,,True,176502087,AdoptOpenJDK/IcedTea-Web,https://api.github.com/repos/AdoptOpenJDK/Iced...,IssueCommentEvent
4,https://avatars.githubusercontent.com/u/6710696?,nbuonin,,6710696,nbuonin,https://api.github.com/users/nbuonin,2019-07-08T14:22:26Z,9967820111,https://avatars.githubusercontent.com/u/52456?,,...,3793486000.0,,refs/heads/domain-rev-progress-bars,,1.0,True,183269109,ccnmtl/ohcoe-hugo,https://api.github.com/repos/ccnmtl/ohcoe-hugo,PushEvent


The result looks much cleaner, and we now have access to the information that was enclosed within those dictionaries. Sometimes multiple rounds of flattening will be required if the JSON data returned from the API you are working with is nested several levels deep.
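
One common case that needs an extra step is a list of records nested inside a dictionary. As a hedged sketch, the example below uses the record_path and meta parameters of json_normalize on made-up events; the field names imitate a push payload but are assumptions, not actual API output.

```python
import pandas as pd

# Hypothetical events with a list of commits nested inside a payload dict.
events = [
    {'id': 1, 'payload': {'size': 2, 'commits': [
        {'sha': 'a1', 'message': 'first'},
        {'sha': 'b2', 'message': 'second'},
    ]}},
    {'id': 2, 'payload': {'size': 1, 'commits': [
        {'sha': 'c3', 'message': 'third'},
    ]}},
]

# Nested dictionaries are flattened; the list of commits stays in one cell.
flat = pd.json_normalize(events)

# record_path walks into the list and yields one row per commit;
# meta copies the parent event's id onto each of those rows.
commits = pd.json_normalize(events, record_path=['payload', 'commits'], meta=['id'])
print(commits)
```

Here flat keeps one row per event, while commits has one row per commit, each still linked to its event through the id column.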
